MiniDOM is a minimal implementation of the Document Object Model interface. It is intended to be simpler than the full DOM but full-featured enough to be useful in most applications.
MiniDOM is fully documented and unit tested. It can be used on iOS, macOS, watchOS, and tvOS. The library is released under the MIT license.
To parse an XML document, create a Parser object and call parse():

```swift
import Foundation
import MiniDOM

func parseXML(url: URL) -> Document? {
    // Parser(contentsOf:) is failable, hence the optional chaining below.
    let parser = Parser(contentsOf: url)
    let result = parser?.parse()
    return result?.document
}
```
The resulting structure is a tree of objects conforming to the Node protocol: Document, Element, Text, ProcessingInstruction, Comment, and CDATASection. MiniDOM provides accessor methods and properties similar to those in the DOM specification. DOM trees can be traversed using search methods, path evaluation, or the visitor design pattern; each of these is discussed in detail below.
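For orientation, the six node kinds map onto the nodeName values defined by the W3C DOM specification, which is also what path expressions match against. The sketch below is a standalone illustration, not MiniDOM code: the NodeKind enum is hypothetical (MiniDOM models each kind as its own type conforming to Node), and we are assuming MiniDOM follows the specification's values, as the "#text" path component in the playground example suggests.

```swift
// Hypothetical, illustration only: mirrors the six node kinds and the
// nodeName value the W3C DOM specification assigns to each of them.
enum NodeKind {
    case document
    case element(tagName: String)
    case text
    case processingInstruction(target: String)
    case comment
    case cdataSection

    var nodeName: String {
        switch self {
        case .document: return "#document"
        case .element(let tagName): return tagName
        case .text: return "#text"
        case .processingInstruction(let target): return target
        case .comment: return "#comment"
        case .cdataSection: return "#cdata-section"
        }
    }
}

let kinds: [NodeKind] = [.document, .element(tagName: "rss"), .text, .comment]
print(kinds.map { $0.nodeName })  // ["#document", "rss", "#text", "#comment"]
```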
MiniDOM supports installation via CocoaPods, Carthage, and the Swift Package Manager.
Add the following to your Podfile:

```ruby
pod 'MiniDOM'
```

Add the following to your Cartfile:

```
github "MiniDOM/MiniDOM"
```

Add the following dependency to your Package.swift file:

```swift
.package(url: "https://github.com/MiniDOM/MiniDOM/", from: "1.0.0")
```
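In a complete Package.swift, the dependency also has to be listed on the target that uses it. A minimal manifest might look like the sketch below; the MyApp package and target names are hypothetical, and the library product name MiniDOM is an assumption.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyApp",  // hypothetical package name
    dependencies: [
        .package(url: "https://github.com/MiniDOM/MiniDOM/", from: "1.0.0"),
    ],
    targets: [
        .target(
            name: "MyApp",  // hypothetical target name
            dependencies: [
                // Assumes the library product is named "MiniDOM".
                .product(name: "MiniDOM", package: "MiniDOM"),
            ]
        ),
    ]
)
```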
MiniDOM has no third-party dependencies. It uses only Foundation classes, including XMLParser. Unit tests are written using XCTest.
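Since XMLParser reports a document as a stream of events rather than a tree, a DOM library built on it has to assemble the tree itself. One common approach is to keep a stack of the elements that are currently open, as in the standalone sketch below. This is not MiniDOM's own parser; SimpleElement, TreeBuilder, and buildTree(from:) are hypothetical names, and the model keeps only element names, attributes, children, and concatenated character data.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical minimal element: name, attributes, children, and character data.
final class SimpleElement {
    let name: String
    let attributes: [String: String]
    var children: [SimpleElement] = []
    var text = ""

    init(name: String, attributes: [String: String]) {
        self.name = name
        self.attributes = attributes
    }
}

// Turns XMLParser's event callbacks into a tree by keeping a stack of open elements.
final class TreeBuilder: NSObject, XMLParserDelegate {
    private var stack: [SimpleElement] = []
    private(set) var root: SimpleElement?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        let element = SimpleElement(name: elementName, attributes: attributeDict)
        if let parent = stack.last {
            parent.children.append(element)  // nested element: attach to the open parent
        } else {
            root = element  // first element seen: the document element
        }
        stack.append(element)
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Character data belongs to the innermost open element.
        if let current = stack.last {
            current.text += string
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        stack.removeLast()
    }
}

// Returns the document element, or nil if the XML is not well-formed.
func buildTree(from xml: String) -> SimpleElement? {
    let parser = XMLParser(data: Data(xml.utf8))
    let builder = TreeBuilder()
    parser.delegate = builder
    return parser.parse() ? builder.root : nil
}

if let root = buildTree(from: "<rss><channel><title>Deeplinks</title></channel></rss>") {
    print(root.name)                                        // rss
    print(root.children.first?.children.first?.text ?? "")  // Deeplinks
}
```

A full DOM builder would instead create separate Text (and Comment, CDATASection, and so on) nodes interleaved with the child elements, which is what a node tree like MiniDOM's requires.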
MiniDOM provides a mechanism for traversing a document via a path. Call Document.evaluate(path:), passing an array of strings representing element node names (Node.nodeName). For example, consider the following document:

```xml
<a id="1">
  <b id="2">
    <z id="3"/>
  </b>
  <c id="4">
    <z id="5"/>
  </c>
  <b id="6"/>
  <z id="7"/>
  <b id="8">
    <z id="9"/>
  </b>
</a>
```
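Evaluating the path ["a", "b", "z"] (by calling document.evaluate(path: ["a", "b", "z"])) returns an array of two Element objects representing the <z> elements with IDs 3 and 9. The <z> element with ID 5 is not included because its parent is <c> rather than <b>, and the one with ID 7 is not included because it is a direct child of <a>; the <b> element with ID 6 has no children, so it contributes nothing.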
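The selection rule can be reproduced in a few lines. In the standalone sketch below, each path component picks out the children with that name of every node selected by the previous step, starting from the document node; this reading matches the example above, but it is our assumption rather than MiniDOM's documented algorithm. PathNode and evaluate(path:from:) are hypothetical stand-ins for Node and Document.evaluate(path:).

```swift
// Hypothetical stand-in for MiniDOM's Node, used only in this sketch.
struct PathNode {
    let nodeName: String
    let id: String?
    let children: [PathNode]

    init(_ nodeName: String, id: String? = nil, _ children: [PathNode] = []) {
        self.nodeName = nodeName
        self.id = id
        self.children = children
    }
}

// Each component replaces the current selection with the matching children
// of every selected node, so a node is reached only via a full chain of matches.
func evaluate(path: [String], from start: PathNode) -> [PathNode] {
    var current = [start]
    for name in path {
        current = current.flatMap { node in
            node.children.filter { $0.nodeName == name }
        }
    }
    return current
}

// The example document from the text, wrapped in a "#document" root node.
let document = PathNode("#document", [
    PathNode("a", id: "1", [
        PathNode("b", id: "2", [PathNode("z", id: "3")]),
        PathNode("c", id: "4", [PathNode("z", id: "5")]),
        PathNode("b", id: "6"),
        PathNode("z", id: "7"),
        PathNode("b", id: "8", [PathNode("z", id: "9")]),
    ])
])

let matches = evaluate(path: ["a", "b", "z"], from: document)
print(matches.compactMap { $0.id })  // ["3", "9"]
```

Under this reading a path never skips levels; finding a name at any depth is the job of a descendant search instead.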
The visitor design pattern is used throughout the MiniDOM library to implement algorithms that traverse the DOM tree. It separates an algorithm from the object structure it operates on, so new operations can be added without modifying the node types themselves.
A Visitor object is passed to Node.accept(_:) to start the traversal. Each Node calls the appropriate methods on the Visitor before calling accept(_:) on its child nodes, which performs the recursive traversal.

The Visitor protocol defines methods that correspond to each of the Node types in the DOM. Types conforming to Visitor do not need to handle the traversal themselves; their methods are called by the traversal algorithm built into the DOM classes.
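The core of the pattern is double dispatch: each node's accept(_:) calls the visitor method for its own type and then recurses into its children. The standalone sketch below shows that shape. All type names are hypothetical and MiniDOM's actual protocol may differ; the begin/end pair for elements is this sketch's own design choice (the playground example below relies only on beginVisit(_:) for elements), and the default no-op implementations let a visitor implement only the callbacks it needs.

```swift
// Hypothetical types for illustration only; not MiniDOM's Node and Visitor.
protocol SketchNode {
    func accept(_ visitor: SketchVisitor)
}

protocol SketchVisitor {
    func beginVisit(_ element: SketchElement)
    func endVisit(_ element: SketchElement)
    func visit(_ text: SketchText)
}

// Default no-op implementations: a visitor implements only what it needs.
extension SketchVisitor {
    func beginVisit(_ element: SketchElement) {}
    func endVisit(_ element: SketchElement) {}
    func visit(_ text: SketchText) {}
}

final class SketchElement: SketchNode {
    let tagName: String
    let children: [SketchNode]

    init(_ tagName: String, _ children: [SketchNode] = []) {
        self.tagName = tagName
        self.children = children
    }

    // The node drives the traversal: notify the visitor, recurse, notify again.
    func accept(_ visitor: SketchVisitor) {
        visitor.beginVisit(self)
        for child in children {
            child.accept(visitor)
        }
        visitor.endVisit(self)
    }
}

final class SketchText: SketchNode {
    let value: String

    init(_ value: String) {
        self.value = value
    }

    func accept(_ visitor: SketchVisitor) {
        visitor.visit(self)
    }
}

// A visitor that only cares about text nodes.
final class TextCollector: SketchVisitor {
    private(set) var pieces: [String] = []

    func visit(_ text: SketchText) {
        pieces.append(text.value)
    }
}

// A visitor that uses the begin/end pair to track nesting depth.
final class DepthCounter: SketchVisitor {
    private var depth = 0
    private(set) var maxDepth = 0

    func beginVisit(_ element: SketchElement) {
        depth += 1
        maxDepth = max(maxDepth, depth)
    }

    func endVisit(_ element: SketchElement) {
        depth -= 1
    }
}

let tree = SketchElement("rss", [
    SketchElement("channel", [
        SketchElement("title", [SketchText("Deeplinks")]),
        SketchElement("item"),
    ]),
])

let collector = TextCollector()
tree.accept(collector)
print(collector.pieces)  // ["Deeplinks"]

let depthCounter = DepthCounter()
tree.accept(depthCounter)
print(depthCounter.maxDepth)  // 3
```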
For a simple example of a visitor, see the ElementSearch class in Search.swift. For a more complex example, see the PrettyPrinter class in Formatter.swift.
The following is taken from MiniDOM.playground in the root of the project. Feel free to open it up and experiment on your own.

We have an XML document saved in the resources section of the playground. It contains a snapshot of the EFF Updates RSS feed. We'll begin by parsing the document.

```swift
let url = Bundle.main.url(forResource: "eff-updates", withExtension: "rss")!
let parser = Parser(contentsOf: url)
let document = parser?.parse().document
```
The document's structure is something like this:

```xml
<rss>
  <channel>
    <title>...</title>
    <link>...</link>
    <description>...</description>
    <item>
      <title>...</title>
      <link>...</link>
      <description>...</description>
    </item>
    <item>...</item>
    ...
  </channel>
</rss>
```
Let's begin by getting the document element, that is, the root element of the document.

```swift
let rss = document?.documentElement
rss?.nodeName  // Result: "rss"
```

The <rss> element should have one child: a <channel> element.

```swift
let channel = rss?.firstChildElement
channel?.nodeName  // Result: "channel"
```
The <channel> element should have 50 <item> children.

```swift
let items = channel?.childElements(withName: "item")
items?.count  // Result: 50
```

Each of the <item> elements should have a <title> child.
```swift
// compactMap skips items with no <title> child or whose title has no text.
let itemTitles = items?.compactMap { itemElement -> String? in
    let titleElement = itemElement.childElements(withName: "title").first
    return titleElement?.textValue
}
itemTitles
// Result:
// 0 "Stupid Patent of the Month: Storing Files in Folders"
// 1 "NAFTA Renegotiation Will Resurrect Failed TPP Proposals"
// 2 "New Report Aims to Help Criminal Defense Attorneys Challenge Secretive Government Hacking"
// 3 "The Most Powerful Single Click in Your Facebook Privacy Settings"
// 4 "Repealing Broadband Privacy Rules, Congress Sides with the Cable and Telephone Industry"
// ...
```
There are <link> elements directly under the <channel> element as well as under each of the <item> elements. Because elements(withTagName:) searches the whole document, it finds all of them.

```swift
let linkElementsFromDocument = document?.elements(withTagName: "link")
let linkURLsFromDocument = linkElementsFromDocument?.compactMap { $0.textValue }
linkURLsFromDocument
// Result:
// 0 "https://www.eff.org/rss/updates.xml"
// 1 "https://www.eff.org/deeplinks/2017/03/stupid-patent-month-storing-files-folders"
// 2 "https://www.eff.org/deeplinks/2017/03/nafta-renegotiation-will-resurrect-failed-tpp-proposals"
// 3 "https://www.eff.org/deeplinks/2017/03/eff-says-no-so-called-moral-rights-copyright-expansion"
// 4 "https://www.eff.org/deeplinks/2017/03/new-report-aims-help-criminal-defense-attorneys-challenge-secretive-government"
// 5 "https://www.eff.org/deeplinks/2017/03/most-powerful-single-click-your-facebook-privacy-settings"
// ...
```
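The two search calls used so far differ in scope: childElements(withName:) looked only at the <channel> element's direct children, while elements(withTagName:) also returned the <channel> element's own <link>, so it evidently searches all descendants. The sketch below contrasts the two scopes with a hypothetical MiniElement type; directChildren(named:) and descendants(named:) are illustrative names, not MiniDOM API. A depth-first, pre-order walk is what puts the channel's link first, as in the result above.

```swift
// Hypothetical minimal element type; not MiniDOM's Element class.
struct MiniElement {
    let tagName: String
    var text: String? = nil
    var children: [MiniElement] = []
}

extension MiniElement {
    // Direct children only, comparable in scope to childElements(withName:).
    func directChildren(named name: String) -> [MiniElement] {
        return children.filter { $0.tagName == name }
    }

    // Every matching descendant, visited depth-first in document order.
    func descendants(named name: String) -> [MiniElement] {
        var result: [MiniElement] = []
        for child in children {
            if child.tagName == name {
                result.append(child)
            }
            result += child.descendants(named: name)
        }
        return result
    }
}

// Placeholder text values stand in for the feed's real URLs.
let rss = MiniElement(tagName: "rss", children: [
    MiniElement(tagName: "channel", children: [
        MiniElement(tagName: "link", text: "channel-link"),
        MiniElement(tagName: "item", children: [
            MiniElement(tagName: "link", text: "item-1-link"),
        ]),
        MiniElement(tagName: "item", children: [
            MiniElement(tagName: "link", text: "item-2-link"),
        ]),
    ]),
])

let channel = rss.children[0]
print(channel.directChildren(named: "link").compactMap { $0.text })
// ["channel-link"]
print(rss.descendants(named: "link").compactMap { $0.text })
// ["channel-link", "item-1-link", "item-2-link"]
```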
Each <item> child of the <channel> element should have a <link> child. Using a path expression, we can collect the text children of just those item-level <link> elements: the final "#text" component selects text nodes, and the <channel> element's own <link> is not included.

```swift
let linkTextNodesViaPath = document?.evaluate(path: ["rss", "channel", "item", "link", "#text"])
let linkURLsViaPath = linkTextNodesViaPath?.compactMap { $0.nodeValue }
linkURLsViaPath
// Result:
// 0 "https://www.eff.org/deeplinks/2017/03/stupid-patent-month-storing-files-folders"
// 1 "https://www.eff.org/deeplinks/2017/03/nafta-renegotiation-will-resurrect-failed-tpp-proposals"
// 2 "https://www.eff.org/deeplinks/2017/03/eff-says-no-so-called-moral-rights-copyright-expansion"
// 3 "https://www.eff.org/deeplinks/2017/03/new-report-aims-help-criminal-defense-attorneys-challenge-secretive-government"
// 4 "https://www.eff.org/deeplinks/2017/03/most-powerful-single-click-your-facebook-privacy-settings"
// ...
```
We can collect the text of all the <title> elements in the document using a visitor.

```swift
class TitleCollector: Visitor {
    var titles: [String] = []

    public func beginVisit(_ element: Element) {
        if element.tagName == "title", let title = element.textValue {
            titles.append(title)
        }
    }
}

let titleCollector = TitleCollector()
document?.accept(titleCollector)
titleCollector.titles
// Result:
// 0 "Deeplinks"
// 1 "Stupid Patent of the Month: Storing Files in Folders"
// 2 "NAFTA Renegotiation Will Resurrect Failed TPP Proposals"
// 3 "New Report Aims to Help Criminal Defense Attorneys Challenge Secretive Government Hacking"
// 4 "The Most Powerful Single Click in Your Facebook Privacy Settings"
// 5 "Repealing Broadband Privacy Rules, Congress Sides with the Cable and Telephone Industry"
// ...
```
Please report any issues you find.
Pull requests are welcome. Please make sure any additions are documented and unit tested. We aim to maintain 100% documentation and test coverage.