DFSearchKit
A framework implementing a search index and summary generator using SKSearchKit for both Swift and Objective-C
Why?
I was interested in learning about SKSearchKit and wanted a nice, simple object that uses native Swift types to abstract away some of the unpleasantries of dealing with a C-style interface in Swift.
Usage
The base library is split into three classes and an async controller.
Copy
Copy everything from the DFSearchKit sub-folder into your own project.
Cocoapods
Add the following to your Podfile:
pod 'DFSearchKit', :git => 'https://github.com/dagronf/DFSearchKit'
Swift Package Manager
Import via Xcode.
Classes
DFSearchIndex.Memory
A class inheriting from DFSearchIndex that implements an in-memory index. This index exists purely in memory, and will be destroyed when the index is deallocated.
// Create a new memory index using the default settings
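guard let indexer = DFSearchIndex.Memory() else {
    // A guard body must exit the scope; assert(false) does not, so use fatalError
    fatalError("Unable to create the memory index")
}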
indexer.add(textURL: "doc-url://d1.txt", text: "This is my first document")
let fileURL = // <the url for some file on disk>
indexer.add(fileURL, mimeType: "application/pdf")
// ... add more documents
indexer.flush()
let searchresult = indexer.search("first")
// Do something with the search results
DFSearchIndex.Memory provides methods to get the raw index data for storing, and to load an index from that data.
Load from a raw Data object
let indexData = Data(...)
guard let indexer = DFSearchIndex.Memory(data: indexData) else {
    fatalError("Unable to load the memory index from data")
}
Extract the raw Data object from the search index
let newIndexData = indexer.data()
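As a minimal sketch of what "storing" could look like, the raw bytes can be written to disk and read back with plain Foundation calls. The helper names `saveIndexData` and `loadIndexData`, the file name, and the placeholder bytes are assumptions made for illustration; they are not part of DFSearchKit.

```swift
import Foundation

/// Hypothetical helper (not part of DFSearchKit): write raw index bytes to disk.
func saveIndexData(_ data: Data, to url: URL) throws {
    // .atomic writes to a temporary file first, so a crash cannot leave a half-written index
    try data.write(to: url, options: .atomic)
}

/// Hypothetical helper (not part of DFSearchKit): read raw index bytes back from disk.
func loadIndexData(from url: URL) throws -> Data {
    return try Data(contentsOf: url)
}

// Placeholder bytes standing in for the result of `indexer.data()`
let original = Data([0x01, 0x02, 0x03])
let storeURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("example-index.data")

try saveIndexData(original, to: storeURL)
let restored = try loadIndexData(from: storeURL)
// `restored` could now be passed to DFSearchIndex.Memory(data:)
```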
DFSearchIndex.File
A class inheriting from DFSearchIndex that allows the creation and use of an index on disk.
// Create an index on disk
let newFileURL = // <some file url>
let createProperties = DFSearchIndex.CreateProperties() // search index properties
guard let newIndex = DFSearchIndex.File(fileURL: newFileURL, properties: createProperties) else {
    fatalError("Unable to create the file index")
}
// Open a file index
let existingFileURL = // <some file url>
guard let fileIndex = DFSearchIndex.File(fileURL: existingFileURL, writable: true) else {
    fatalError("Unable to open the file index")
}
let documentURL = URL(string: "doc-url://d1.txt")!
fileIndex.add(documentURL, text: "This is my first document")
let fileURL = // <the url for some file on disk>
fileIndex.add(fileURL, mimeType: "application/pdf")
// Flush the index so that it is updated for searching
fileIndex.flush()
// Perform a basic search for the word 'first'
let result = fileIndex.search("first")
fileIndex.save()
fileIndex.close()
DFSearchIndex.AsyncController
DFSearchIndex.AsyncController is a simple controller that takes an index object and provides a safe method for handling async requests.
For example, to add a number of files asynchronously:
guard let searchIndex = DFSearchIndex.Memory() else {
    fatalError("Unable to create the memory index")
}
let asyncController = DFSearchIndex.AsyncController(index: searchIndex, delegate: nil)
// Create a file task containing the URLs to be indexed
let addTask = DFSearchIndex.AsyncController.FileTask(<file urls to add>)
asyncController.addURLs(async: addTask, complete: { task in
// <block that is executed when the files have been added to the index>
})
...
// Create a file task containing the URLs to be removed
let removeTask = DFSearchIndex.AsyncController.FileTask(<file urls to remove>)
asyncController.removeURLs(async: removeTask, complete: { task in
// <block that is executed when the files have been removed from the index>
})
Internally the async controller uses an operation queue for handling requests.
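The queueing idea can be sketched with Foundation's OperationQueue. This is a minimal illustration and not the library's implementation: `SerialTaskRunner`, its `run` method, and the completion signature are hypothetical names. Limiting the queue to one concurrent operation means that changes to the index never overlap.

```swift
import Foundation

/// Hypothetical helper (not part of DFSearchKit): runs work items one at a
/// time on a background operation queue, then reports each result.
final class SerialTaskRunner {
    private let queue: OperationQueue = {
        let q = OperationQueue()
        q.maxConcurrentOperationCount = 1 // serialise all access to the index
        return q
    }()

    /// Queue `work`, then call `complete` with its result on the same background queue.
    func run<T>(_ work: @escaping () -> T, complete: @escaping (T) -> Void) {
        queue.addOperation {
            complete(work())
        }
    }

    /// Block the caller until every queued operation has finished.
    func waitUntilFinished() {
        queue.waitUntilAllOperationsAreFinished()
    }
}

// Usage: queue three stand-in "add" tasks and record each completion.
let runner = SerialTaskRunner()
var completed: [String] = []
for name in ["a.txt", "b.txt", "c.txt"] {
    runner.run({ name.uppercased() }, complete: { completed.append($0) })
}
runner.waitUntilFinished()
print(completed)
```

Because only one operation runs at a time, the completion blocks here never run concurrently with each other, which is why appending to a shared array is acceptable in this sketch.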
Searching
There are two ways to search an index:
Search all
Search all is available on the indexer object and returns every result it can find in a single call. For large indexes this may take quite a while to return, so it is provided mostly as a convenience function for small indexes.
guard let searchIndex = DFSearchIndex.Memory() else {
    fatalError("Unable to create the memory index")
}
// Add some documents...
searchIndex.add(textURL: "doc-url://d1.txt", text: "This is my first document")
// Flush the index
searchIndex.flush()
// Search for the word 'first'
let searchResult = searchIndex.search("first")
// searchResult.count == 1
// searchResult[0].url == firstURL
// searchResult[0].score == 1
searchIndex.save()
searchIndex.close()
Progressive Search
For large indexes, the results may take quite a while to return. A progressive search is more useful in this case: it returns limited sets of results one chunk at a time, and it can be run on a background thread (SKSearchIndex is thread safe) so that results are retrieved without blocking the caller.
/// ... load documents ...
let search = indexer.progressiveSearch(query: "dog")
var hasMoreResults = true
repeat {
    let searchChunk = search.next(10)
    // ... do something with searchChunk...
    hasMoreResults = searchChunk.moreResults
} while hasMoreResults
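For reference, the chunked pattern above maps onto Search Kit's own SKSearchFindMatches call. The following is a rough sketch written directly against the C API on macOS, not DFSearchKit's implementation. The document text, the chunk size of 10, and the 1 second time limit are arbitrary choices made for illustration.

```swift
import CoreServices
import Foundation

// Build a small in-memory index so the sketch is self-contained (macOS only).
let storage = NSMutableData()
let index = SKIndexCreateWithMutableData(storage, nil, kSKIndexInverted, nil).takeRetainedValue()

for i in 1...25 {
    let url = URL(string: "doc-url://d\(i).txt")!
    let document = SKDocumentCreateWithURL(url as CFURL).takeRetainedValue()
    _ = SKIndexAddDocumentWithText(index, document, "The dog barked \(i) times" as CFString, true)
}
_ = SKIndexFlush(index) // make the new documents visible to searches

let search = SKSearchCreate(index, "dog" as CFString,
                            SKSearchOptions(kSKSearchOptionDefault)).takeRetainedValue()

let chunkSize = 10
var found: [URL] = []
var hasMoreResults = true
repeat {
    // Buffers that Search Kit fills for this chunk
    var documentIDs = [SKDocumentID](repeating: 0, count: chunkSize)
    var scores = [Float](repeating: 0, count: chunkSize)
    var urls = [Unmanaged<CFURL>?](repeating: nil, count: chunkSize)
    var foundCount: CFIndex = 0

    // Returns true while the search may still have more matches to deliver
    hasMoreResults = SKSearchFindMatches(search, chunkSize, &documentIDs, &scores, 1.0, &foundCount)

    // Convert the document IDs found in this chunk back into URLs
    SKIndexCopyDocumentURLsForDocumentIDs(index, foundCount, &documentIDs, &urls)
    for i in 0..<foundCount {
        if let url = urls[i] {
            found.append(url.takeRetainedValue() as URL)
        }
    }
} while hasMoreResults

print("Found \(found.count) matching documents")
```

Each pass through the loop handles at most `chunkSize` results, so the loop body is a natural place to hand a chunk to the UI or to stop early.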
Summary Generation
The DFSearchIndex.Summarizer class provides a wrapper around the SKSummary interface, providing text rankings and orderings.
let text = // <some text>
let summary = DFSummarizer(text)
// Get the number of sentences in the text
let sentenceCount = summary.sentenceCount()
// Get the number of paragraphs in the text
let paragraphCount = summary.paragraphCount()
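For comparison, here is a short sketch of the underlying SKSummary calls that such a wrapper would sit on top of. It runs on macOS only and uses the C API directly rather than DFSearchKit; the sample text is made up for illustration.

```swift
import CoreServices
import Foundation

// Sample text: two paragraphs of two sentences each (illustrative only)
let sampleText = """
Search Kit can index documents. It can also summarise text.

This is a second paragraph. It contains two short sentences.
"""

// Create the summary object from the text
let rawSummary = SKSummaryCreateWithString(sampleText as CFString).takeRetainedValue()

// Counts as determined by Search Kit's own text analysis
let rawSentenceCount = SKSummaryGetSentenceCount(rawSummary)
let rawParagraphCount = SKSummaryGetParagraphCount(rawSummary)

// Ask for a summary made up of the single highest-ranked sentence
let topSentence = SKSummaryCopySentenceSummaryString(rawSummary, 1).takeRetainedValue() as String

print("sentences: \(rawSentenceCount), paragraphs: \(rawParagraphCount)")
print("summary: \(topSentence)")
```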
Samples
- SearchToy is a (very!) basic UI to show integration.
- dfsearchindex is a simple command line tool (currently very unforgiving about its parameters) that uses DFSearchIndexFile to provide a command line interface to the index.
Tests
- DFSearchKitTests.swift: Swift tests. Comprehensive.
- DFSearchKitTests_objc.m: Objective-C tests, mainly for validating objc integration.
- DFSearchIndexAsyncTests.swift: Basic test suite to validate the async controller aspect of the library.
- DFSearchIndexSummaryTests.swift: Basic summary tests.
Thanks
Mattt Thompson (NSHipster)
http://nshipster.com/search-kit/
Marc Charbonneau
https://blog.mbcharbonneau.com/2009/02/26/searchkit-example-project/
Apple
Philip Dow (SPSearchStore)