A Swift regex class abstracting away some of the complexities of NSRegularExpression
Every time I have to use NSRegularExpression in Swift I make the same mistakes over and over regarding ranges and range conversions between NSRange and Range<String.Index>.
Also, pulling content out using capture groups is tedious and a little error-prone. I wanted to abstract away some of the things that I kept stuffing up.
let inputText: String = "…" // some text to match against
// Build the regex to match against (in this case, <number>\t"<string>")
// This regex has two capture groups, one for the number and one for the string.
let regex = try DSFRegex(#"(\d*)\t\"([^\"]+)\""#)
// Retrieve ALL the matches for the supplied text
let searchResult = regex.matches(for: inputText)
// Loop over each of the matches found, and print them out
searchResult.forEach { match in
let foundStr = inputText[match.range] // The text of the entire match
let numberVal = inputText[match.captures[0]] // Retrieve the first capture group text.
let stringVal = inputText[match.captures[1]] // Retrieve the second capture group text.
Swift.print("Number is \(numberVal), String is \(stringVal)")
}
The basic structure of a 'matches' result is as follows:
Matches
> matches: An array of regex matches
  > range: A match range. This range specifies the match range within the original text being searched
  > captures: An array of capture groups
    > A capture range. This range represents the range of a capture within the original text being searched
All ranges provided back to the caller (and, conversely, ranges passed to the regex object) are expressed in terms of the Swift String passed in for the match.
This is important, as NSRegularExpression works on NSString, which measures text in UTF-16 code units, whereas String indexes text by Character (extended grapheme cluster). The two disagree as soon as the text contains characters outside the Basic Multilingual Plane or characters made of several Unicode scalars, such as the emoji 🇦🇲 👨‍👩‍👦.
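The mismatch is easy to demonstrate with Foundation alone. The sketch below does not use DSFRegex; it only shows that NSRegularExpression reports positions as UTF-16 offsets, and how Range(_:in:) converts such an NSRange back into a Range<String.Index> that can safely slice the original String.

```swift
import Foundation

// A flag emoji is one Character in Swift, but four UTF-16 code units
// (two regional-indicator scalars, each outside the Basic Multilingual Plane).
let text = "🇦🇲 flag"

let characterCount = text.count              // counts Characters (grapheme clusters)
let utf16Length = (text as NSString).length  // counts UTF-16 code units

// NSRegularExpression works in NSString (UTF-16) coordinates.
let regex = try! NSRegularExpression(pattern: "flag")
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
let nsMatch = regex.firstMatch(in: text, options: [], range: fullRange)!

// Convert the UTF-16 based NSRange back into a String range before slicing.
let swiftRange = Range(nsMatch.range, in: text)!
let matchedText = String(text[swiftRange])
let characterOffset = text.distance(from: text.startIndex, to: swiftRange.lowerBound)

print(characterCount, utf16Length)              // 6 9
print(nsMatch.range.location, characterOffset)  // 5 2
print(matchedText)                              // flag
```

Using the NSRange location (5) directly as a Character offset would land two characters past the real start of "flag", which is exactly the class of mistake a String-based wrapper avoids.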
You create a regex matching object using the constructor and a regex pattern. If the regex is badly formatted or cannot be compiled, this constructor will throw.
// Match against dummy phone numbers XXXX-YYY-ZZZ
let phoneNumberRegex = try DSFRegex(#"(\d{4})-(\d{3})-(\d{3})"#)
To check whether a string matches against the regex, use the hasMatch method.
let hasAMatch = phoneNumberRegex.hasMatch("0499-999-999") // true
let noMatch = phoneNumberRegex.hasMatch("0499 999 999") // false
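For comparison, a similar yes/no check can be written with plain Foundation by asking NSRegularExpression for its first match anywhere in the string. This is a sketch under that assumption, not DSFRegex's implementation; the helper name containsMatch is hypothetical.

```swift
import Foundation

// Hypothetical helper: true if `regex` matches anywhere within `text`.
func containsMatch(_ regex: NSRegularExpression, in text: String) -> Bool {
    let searchRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.firstMatch(in: text, options: [], range: searchRange) != nil
}

let phoneNumber = try! NSRegularExpression(pattern: #"(\d{4})-(\d{3})-(\d{3})"#)
print(containsMatch(phoneNumber, in: "0499-999-999"))  // true
print(containsMatch(phoneNumber, in: "0499 999 999"))  // false
```

This tests for a match anywhere in the string; anchor the pattern with ^ and $ if the whole string must match.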
If you want to extract all the match information, use the matches method.
let result = phoneNumberRegex.matches(for: "0499-999-999 0491-111-444 4324-222-123")
result.forEach { match in
let matchText = result.text(for: match)
Swift.print("Match `\(matchText)`")
for capture in match.captures {
let captureText = result.text(for: capture)
Swift.print(" - `\(captureText)`")
}
}
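Under the hood, every NSTextCheckingResult holds NSRange values that must be converted before they can index a String. The following Foundation-only sketch shows one way to do that conversion for a match and its capture groups. SimpleMatch and simpleMatches(of:in:) are hypothetical names chosen for illustration; they loosely mirror the match/capture structure described earlier but are not DSFRegex types.

```swift
import Foundation

// Hypothetical value type loosely mirroring the match/capture structure described above.
struct SimpleMatch {
    let range: Range<String.Index>        // the whole match
    let captures: [Range<String.Index>?]  // nil if a group did not take part in the match
}

func simpleMatches(of regex: NSRegularExpression, in text: String) -> [SimpleMatch] {
    let searchRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: searchRange).compactMap { result -> SimpleMatch? in
        // Convert the UTF-16 based NSRange into a String range.
        guard let matchRange = Range(result.range, in: text) else { return nil }
        // Range 0 is the whole match; capture groups start at index 1.
        let captures: [Range<String.Index>?] = (1..<result.numberOfRanges).map {
            Range(result.range(at: $0), in: text)
        }
        return SimpleMatch(range: matchRange, captures: captures)
    }
}

let phoneNumber = try! NSRegularExpression(pattern: #"(\d{4})-(\d{3})-(\d{3})"#)
let input = "0499-999-999 0491-111-444 4324-222-123"
for match in simpleMatches(of: phoneNumber, in: input) {
    print("Match `\(input[match.range])`")
    for case let capture? in match.captures {
        print(" - `\(input[capture])`")
    }
}
```

Capture ranges are optional in this sketch because a group such as (b)? may not take part in a match; NSRegularExpression then reports a range with location NSNotFound, and Range(_:in:) returns nil for it.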
If you have a large input text, a complex regex that will take a while to process, or constrained memory, you can choose to enumerate the match results rather than process everything up front.
The enumeration method lets you stop processing at any point (e.g. if you are under time constraints, or are looking for a specific match within a text).
/// Find all email addresses within a text
let inputString = "… some input string …"
let emailRegex = try DSFRegex("… some regex …")
emailRegex.enumerateMatches(in: inputString) { (match) -> Bool in
// Extract match information
let matchRange = match.range
let matchText = inputString[match.range]
Swift.print("Found '\(matchText)' at range \(matchRange)")
// Continue processing
return true
}
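Stopping early is also possible with NSRegularExpression directly, by setting the stop flag passed to its enumeration block. The sketch below assumes you simply want the text of the first few matches; firstMatches(of:in:limit:) is a hypothetical helper, not part of DSFRegex or Foundation.

```swift
import Foundation

// Hypothetical helper: text of at most `limit` matches, scanning no further than needed.
func firstMatches(of regex: NSRegularExpression, in text: String, limit: Int) -> [String] {
    guard limit > 0 else { return [] }
    var found: [String] = []
    let searchRange = NSRange(text.startIndex..<text.endIndex, in: text)
    regex.enumerateMatches(in: text, options: [], range: searchRange) { result, _, stop in
        guard let result = result, let range = Range(result.range, in: text) else { return }
        found.append(String(text[range]))
        if found.count == limit {
            stop.pointee = true  // tells NSRegularExpression to end the enumeration
        }
    }
    return found
}

let words = try! NSRegularExpression(pattern: #"\w+"#)
print(firstMatches(of: words, in: "alpha beta gamma delta", limit: 2))  // ["alpha", "beta"]
```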
A string search cursor is useful when you are searching sporadically within a string, say in response to a user clicking on the 'next' button. The cursor keeps track of the current match, and is used when locating the next match in the string.
var searchCursor: DSFRegex.Cursor?
var content: String
@IBAction func startSearch(_ sender: Any) {
// The DSFRegex initializer throws if the pattern is invalid
guard let regex = try? DSFRegex("… some pattern …") else { return }
// Find the first match in the string
self.searchCursor = self.content.firstMatch(for: regex)
self.displayForCurrentSearch()
}
@IBAction func nextSearchResult(_ sender: Any) {
if let previous = self.searchCursor {
// Find the next match in the string, starting from the previous match
self.searchCursor = self.content.nextMatch(for: previous)
}
self.displayForCurrentSearch()
}
internal func displayForCurrentSearch() {
// Update the UI reflecting the search result found in self.searchCursor
...
}
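The cursor idea can be sketched in Foundation terms: remember where the current match ends and resume the search from there. SearchCursor, cursor(for:in:from:) and nextCursor(for:in:after:) are hypothetical names; this is an assumption about the general approach, not the implementation of DSFRegex.Cursor.

```swift
import Foundation

// Hypothetical cursor: the range of the current match within the searched text.
struct SearchCursor {
    let range: Range<String.Index>
}

// Returns the first match at or after `start`, or nil if there is none.
func cursor(for regex: NSRegularExpression, in text: String,
            from start: String.Index) -> SearchCursor? {
    // Let lookarounds see text outside the search range, and stop ^/$ matching at its edges.
    let options: NSRegularExpression.MatchingOptions = [.withTransparentBounds, .withoutAnchoringBounds]
    let searchRange = NSRange(start..<text.endIndex, in: text)
    guard let result = regex.firstMatch(in: text, options: options, range: searchRange),
          let range = Range(result.range, in: text) else { return nil }
    return SearchCursor(range: range)
}

// Returns the match following `previous`, or nil when the text is exhausted.
func nextCursor(for regex: NSRegularExpression, in text: String,
                after previous: SearchCursor) -> SearchCursor? {
    var start = previous.range.upperBound
    if previous.range.isEmpty {
        // An empty match would be found again at the same spot, so step past it.
        guard start < text.endIndex else { return nil }
        start = text.index(after: start)
    }
    return cursor(for: regex, in: text, from: start)
}

let text = "cat dog cow"
let regex = try! NSRegularExpression(pattern: #"c\w+"#)
var found: [String] = []
var current = cursor(for: regex, in: text, from: text.startIndex)
while let match = current {
    found.append(String(text[match.range]))
    current = nextCursor(for: regex, in: text, after: match)
}
print(found)  // ["cat", "cow"]
```

The two matching options make a resumed search behave as if it were still scanning the whole string, and stepping past empty matches prevents an endless loop for patterns such as x*.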
stringByReplacingMatches(in:withTemplate:) returns a new string in which every match of the regex is replaced with a template string.
// Redact email addresses within the text
let emailRegex = try DSFRegex("… some regex …")
let redacted = emailRegex.stringByReplacingMatches(
in: inputString,
withTemplate: NSRegularExpression.escapedTemplate(for: "<REDACTED-EMAIL-ADDRESS>")
)
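Because the template is interpreted by NSRegularExpression, $1, $2 and so on insert the corresponding capture group, and escapedTemplate(for:) is what keeps a replacement string literal. The Foundation-only sketch below (it does not go through DSFRegex) shows both behaviours using the phone number pattern from earlier.

```swift
import Foundation

let phoneNumber = try! NSRegularExpression(pattern: #"(\d{4})-(\d{3})-(\d{3})"#)
let input = "Call 0499-999-999 or 0491-111-444"
let fullRange = NSRange(input.startIndex..<input.endIndex, in: input)

// $1, $2 and $3 refer to the capture groups of each match.
let reformatted = phoneNumber.stringByReplacingMatches(
    in: input, options: [], range: fullRange, withTemplate: "($1) $2 $3")
print(reformatted)  // Call (0499) 999 999 or (0491) 111 444

// escapedTemplate(for:) escapes `$` and `\`, so "$1" is inserted literally here.
let literal = phoneNumber.stringByReplacingMatches(
    in: input, options: [], range: fullRange,
    withTemplate: NSRegularExpression.escapedTemplate(for: "$1"))
print(literal)      // Call $1 or $1
```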
DSFRegex: The primary class used to perform a regex match.
Matches: An object that contains all of the results of the regex matched against a text. It also provides a number of methods to help extract text from a match and/or capture object.
Match: A single match object. It stores the range of the match within the original string and, if capture groups were defined within the regex, an array of the capture group objects.
Capture: A capture represents a single range matching a capture group within a regex result. Each match may contain zero or more captures, depending on the capture groups defined in the regex.
Cursor: An incremental cursor object used when searching via the String extension.
CocoaPods:
pod 'DSFRegex', :git => 'https://github.com/dagronf/DSFRegex/'
Swift Package Manager: add https://github.com/dagronf/DSFRegex to your project.
Direct: copy the files in the Sources/DSFRegex folder into your project.
For more examples and usage, you can find a series of tests in the Tests folder.
let phoneNumberRegex = try DSFRegex(#"(\d{4})-(\d{3})-(\d{3})"#)
let results = phoneNumberRegex.matches(for: "4499-999-999 3491-111-444 4324-222-123")
// results.numberOfMatches == 3
// results.text(match: 0) == "4499-999-999"
// results.text(match: 1) == "3491-111-444"
// results.text(match: 2) == "4324-222-123"
// Just retrieve the text for each of the matches
let textMatches = results.textMatching() // == ["4499-999-999", "3491-111-444", "4324-222-123"]
If you're only interested in the first match, use:
let first = phoneNumberRegex.firstMatch(in: "4499-999-999 3491-111-444 4324-222-123")
let allMatches = phoneNumberRegex.matches(for: "0499-999-999 0491-111-444 4324-222-123")
for match in allMatches.matches.enumerated() {
let matchText = allMatches.text(for: match.element)
Swift.print("Match (\(match.offset)) -> `\(matchText)`")
for capture in match.element.captures.enumerated() {
let captureText = allMatches.text(for: capture.element)
Swift.print(" Capture (\(capture.offset)) -> `\(captureText)`")
}
}
The output:
Match (0) -> `0499-999-999`
 Capture (0) -> `0499`
 Capture (1) -> `999`
 Capture (2) -> `999`
Match (1) -> `0491-111-444`
 Capture (0) -> `0491`
 Capture (1) -> `111`
 Capture (2) -> `444`
Match (2) -> `4324-222-123`
 Capture (0) -> `4324`
 Capture (1) -> `222`
 Capture (2) -> `123`
/// Find all email addresses within a text
let emailRegex = try DSFRegex("… some regex …")
let inputString = "This is a test.\n noodles@compuserve4.nginix.com and sillytest32@gmail.com, grubby@supernoodle.org lives here"
var count = 0
emailRegex.enumerateMatches(in: inputString) { (match) -> Bool in
count += 1
// Extract match information
let matchRange = match.range
let nsRange = NSRange(matchRange, in: inputString)
let matchText = inputString[match.range]
Swift.print("\(count) - Found '\(matchText)' at range \(nsRange)")
// Stop processing once two matches have been found
return count < 2
}
Output:
1 - Found 'noodles@compuserve4.nginix.com' at range {17, 30}
2 - Found 'sillytest32@gmail.com' at range {52, 21}
MIT License
Copyright (c) 2024 Darren Ford
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.