Native Swift SDK for offline Quran verse recognition on iOS.
The package is a Swift implementation of the offline-tarteel pipeline shape:
- Capture or provide 16 kHz mono audio.
- Compute 80-bin NeMo-compatible mel spectrogram features.
- Run the ONNX FastConformer CTC model with ONNX Runtime.
- Greedy CTC decode and fuzzy-match the transcript against all 6,236 Quran verses.
- Track recitation progress across verses and recover into discovery when the user starts another surah.
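The greedy CTC decode step in the list above can be sketched in plain Swift. This is a minimal illustration of the general technique, not the SDK's actual decoder: the function name, the `[frame][token]` logits layout, and the blank token index are all assumptions made for the example.

```swift
/// Greedy CTC decoding sketch: pick the best token per frame, collapse
/// consecutive repeats, then drop the blank token.
/// `logits` is laid out as [frame][token]; `blankID` is supplied by the caller
/// (hypothetical here, the real blank index depends on the model vocabulary).
func greedyCTCDecode(logits: [[Float]], blankID: Int) -> [Int] {
    var decoded: [Int] = []
    var previous: Int? = nil
    for frame in logits {
        // Index of the highest-scoring token in this frame; skip empty frames.
        guard let best = frame.indices.max(by: { frame[$0] < frame[$1] }) else { continue }
        if best != previous && best != blankID {
            decoded.append(best)
        }
        previous = best
    }
    return decoded
}

// Toy example: 3 tokens, with token 2 acting as the blank.
let frames: [[Float]] = [
    [0.9, 0.05, 0.05], // token 0
    [0.8, 0.1, 0.1],   // token 0 again (collapsed into the previous one)
    [0.1, 0.1, 0.8],   // blank
    [0.7, 0.2, 0.1],   // token 0 (kept, because a blank separates it)
    [0.1, 0.8, 0.1],   // token 1
]
let tokens = greedyCTCDecode(logits: frames, blankID: 2)
print(tokens) // [0, 0, 1]
```

The resulting token IDs would then be mapped to text through the vocabulary before fuzzy matching.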
The SDK bundles the zipped ONNX model, vocab.json, and quran.json through Bundle.module.
- Swift 6.2 or newer.
- iOS 17 or newer.
- Xcode with a Swift 6.2 toolchain or newer for app integration.
- Microphone permission if you use live recognition through startListening.
QuranRecognitionKit depends on Microsoft's onnxruntime-swift-package-manager package. Swift Package Manager resolves this dependency automatically.
In Xcode:
- Open the app project.
- Choose File > Add Package Dependencies.
- Choose Add Local.
- Select this package directory.
- Add the QuranRecognitionKit product to the app target.
Then import it:
import QuranRecognitionKit
Alternatively, to add the package from GitHub in Xcode:
- Choose File > Add Package Dependencies.
- Enter https://github.com/akhandafm17/QuranRecognitionKit.git.
- Select version 0.1.8 or newer.
- Add the QuranRecognitionKit product to the app target.
In a Swift package manifest:
.package(url: "https://github.com/akhandafm17/QuranRecognitionKit.git", from: "0.1.8")
The package requires iOS 17 or newer.
Live recognition uses the device microphone. Add a microphone usage description to the host app's Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is used to recognize Quran recitation on device.</string>
The SDK configures an AVAudioSession for recording when startListening is called. If your app also plays audio, coordinate your own audio session setup with recognition start and stop.
ModelDownloader uses URLSession. If your model URL is not HTTPS, configure App Transport Security in the host app.
QuranRecognitionKit includes a bundled zipped ONNX model. For the common path, create a ready-to-use recognizer from the bundled model:
let recognizer = try await QuranRecognizer.bundled()
try await recognizer.prepare()
On first use, the SDK extracts and verifies the bundled model into the app cache directory. Later calls reuse the extracted model if the checksum still matches.
The bundled archive is about 96 MB, and the extracted ONNX model is about 126 MB. This makes the Swift package larger, but keeps integration ready-to-go and fully offline after installation.
The expected audio input for direct recognition is 16 kHz mono Float PCM samples. Live recognition captures microphone input and converts it internally before inference.
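If your audio arrives as interleaved 16-bit PCM, it has to be converted to mono Float samples before direct recognition. The sketch below shows one way to do the format and channel conversion; the function name is hypothetical and not part of the SDK, and it assumes the audio is already at 16 kHz (resampling is not covered).

```swift
/// Converts interleaved Int16 PCM to mono Float samples in [-1, 1].
/// Only sample format and channel count are handled here; the input is
/// assumed to already be sampled at 16 kHz.
func monoFloatSamples(fromInterleaved pcm: [Int16], channelCount: Int) -> [Float] {
    precondition(channelCount > 0, "channelCount must be positive")
    let frameCount = pcm.count / channelCount
    var output = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channelCount {
            // Scale each 16-bit sample into the [-1, 1) range.
            sum += Float(pcm[frame * channelCount + channel]) / 32768.0
        }
        // Downmix by averaging the channels of this frame.
        output[frame] = sum / Float(channelCount)
    }
    return output
}

let stereo: [Int16] = [16384, 16384, -32768, 0]
let mono = monoFloatSamples(fromInterleaved: stereo, channelCount: 2)
print(mono) // [0.5, -0.5]
```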
You can still pass your own compatible ONNX model URL:
let recognizer = QuranRecognizer(modelURL: modelURL)
try await recognizer.prepare()
The model must produce CTC logits for the same vocabulary bundled in vocab.json. If the model vocabulary size does not match, prepare() throws RecognitionError.vocabModelMismatch.
If you use ModelDownloader, an expected SHA-256 is required.
let downloader = ModelDownloader()
let localModelURL = try await downloader.download(
    from: modelArchiveURL,
    expectedSHA256: expectedChecksum,
    destinationURL: destinationURL
) { progress in
    print("Download progress: \(progress)")
}
ModelDownloader verifies and stores bytes. It does not unzip archives; extract compressed model files in the host app before passing the ONNX URL to QuranRecognizer.
import QuranRecognitionKit
let configuration = QuranRecognizer.Configuration(
    processingInterval: 0.20,
    discoveryWindowSeconds: 3.5,
    trackingWindowSeconds: 2.25,
    minimumDiscoveryWindowSeconds: 1.75,
    minimumTrackingWindowSeconds: 0.90,
    discoveryFreshAudioSeconds: 0.30,
    trackingFreshAudioSeconds: 0.20,
    maximumBufferedSeconds: 6.0,
    intraOpThreadCount: 1,
    minimumSpeechRMS: 0.0015,
    minimumSpeechPeak: 0.006,
    minimumSpeechFrameRatio: 0.03,
    suppressLowInformationTranscriptions: true,
    debugLogging: false
)
let recognizer = try await QuranRecognizer.bundled(configuration: configuration)
try await recognizer.prepare()
let session = try recognizer.startListening(surahHint: 1)
for await event in session.events {
    switch event {
    case .audioInput(let quality):
        if !quality.isSpeechLikely {
            print("Waiting for clear recitation: \(quality.status)")
        }
    case .transcription(let text):
        // Intended for live UI. Low-information fragments are suppressed by default.
        print(text)
    case .verseDetected(let verse):
        print("Detected \(verse.surahNumber):\(verse.verseNumber)")
    case .stateChanged(let state):
        print(state)
    case .error(let error):
        print(error)
    }
}
Stop safely:
session.stop()
Pass the current surah number as surahHint when recognition starts from a Quran reader. Discovery then tries that surah first, which speeds up startup and reduces false jumps in the common case where the user recites from the displayed surah.
Use recognize(samples:surahHint:) when you already have 16 kHz mono Float samples:
let recognizer = QuranRecognizer(modelURL: modelURL)
try await recognizer.prepare()
if let verse = try await recognizer.recognize(samples: samples, surahHint: 1) {
    print("\(verse.surahNumber):\(verse.verseNumber)")
}
You can also run a non-streaming manual check with:
swift run QuranRecognitionManualHarness /path/to/FastConformerQuranCTC.onnx /path/to/audio.wav 1 1
public final class QuranRecognizer: @unchecked Sendable {
    public init(modelURL: URL, configuration: Configuration = Configuration())
    public static func bundled(configuration: Configuration = Configuration()) async throws -> QuranRecognizer
    public func prepare() async throws
    public func startListening(surahHint: Int? = nil) throws -> QuranRecognitionSession
    public func recognize(samples: [Float], surahHint: Int? = nil) async throws -> RecognizedVerse?
}
Call prepare() once before recognition. It loads bundled resources, creates the ONNX Runtime session, and validates the model against the bundled vocabulary.
public enum BundledQuranModel {
    public static let archiveFileName: String
    public static let modelFileName: String
    public static func modelURL() async throws -> URL
    public static func removeExtractedModel() throws
}
modelURL() extracts and verifies the bundled model if needed, then returns the local extracted ONNX file URL. removeExtractedModel() removes the cached extracted model so it can be recreated from the bundled archive.
public final class QuranRecognitionSession: @unchecked Sendable {
    public let events: AsyncStream<RecognitionEvent>
    public func stop()
}
events yields microphone quality updates, decoded transcripts, verse detections, state changes, and errors. Call stop() when the user leaves the recognition flow or disables listening.
public enum RecognitionState: Sendable, Equatable {
    case idle
    case preparing
    case listening
    case processing
    case stopped
}
public struct RecognizedVerse: Sendable, Equatable {
    public let surahNumber: Int
    public let verseNumber: Int
    public let ayahEnd: Int?
    public let confidence: Double
    public let arabicText: String
}
public enum AudioInputStatus: Sendable, Equatable {
    case silence
    case tooLittleSpeech
    case speech
    case clipped
}
public struct AudioInputQuality: Sendable, Equatable {
    public let rms: Float
    public let peak: Float
    public let rmsDecibels: Float
    public let speechFrameRatio: Double
    public let windowSeconds: Double
    public let status: AudioInputStatus
    public let isSpeechLikely: Bool
}
public enum RecognitionEvent: Sendable, Equatable {
    case audioInput(AudioInputQuality)
    case transcription(String)
    case verseDetected(RecognizedVerse)
    case stateChanged(RecognitionState)
    case error(RecognitionError)
}
ayahEnd is set when the matcher identifies a span that covers multiple ayahs. For single-ayah detections it is nil.
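A host app will typically want to show single-ayah and multi-ayah detections differently. The sketch below formats a display string from the three relevant fields. `VerseReference` is a local stand-in so the example compiles without the SDK, and treating ayahEnd as the inclusive last ayah number in the same surah is an assumption, not something the API documentation above states.

```swift
/// Local stand-in mirroring three fields of `RecognizedVerse`, so this
/// sketch compiles on its own. It is not an SDK type.
struct VerseReference {
    let surahNumber: Int
    let verseNumber: Int
    let ayahEnd: Int?
}

/// Formats "2:255" for a single ayah and "2:1-5" for a multi-ayah span.
/// Assumes `ayahEnd` is the inclusive last ayah of the span (hypothetical).
func displayReference(for verse: VerseReference) -> String {
    if let end = verse.ayahEnd, end > verse.verseNumber {
        return "\(verse.surahNumber):\(verse.verseNumber)-\(end)"
    }
    return "\(verse.surahNumber):\(verse.verseNumber)"
}

print(displayReference(for: VerseReference(surahNumber: 2, verseNumber: 255, ayahEnd: nil))) // 2:255
print(displayReference(for: VerseReference(surahNumber: 2, verseNumber: 1, ayahEnd: 5)))     // 2:1-5
```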
public actor ModelDownloader {
    public init()
    public func download(
        from sourceURL: URL,
        expectedSHA256: String,
        destinationURL: URL,
        progress: (@Sendable (Double) -> Void)? = nil
    ) async throws -> URL
}
ModelDownloader streams a remote file to disk and verifies SHA-256 before replacing the destination file.
public enum RecognitionError: Error, Sendable, Equatable {
    case resourceMissing(String)
    case resourceCorrupt(String)
    case modelMissing(String)
    case modelCorrupt(String)
    case vocabModelMismatch(expected: Int, actual: Int)
    case microphonePermissionDenied
    case microphoneUnavailable(String)
    case unsupportedPlatform
    case invalidAudioSampleRate(expected: Double, actual: Double)
    case inferenceFailed(String)
    case downloadFailed(String)
    case downloadChecksumMismatch(expected: String, actual: String)
    case notPrepared
    case alreadyStopped
}
extension QuranRecognizer {
    public struct Configuration: Sendable, Equatable {
        public var processingInterval: TimeInterval
        public var discoveryWindowSeconds: Double
        public var trackingWindowSeconds: Double
        public var minimumDiscoveryWindowSeconds: Double
        public var minimumTrackingWindowSeconds: Double
        public var discoveryFreshAudioSeconds: Double
        public var trackingFreshAudioSeconds: Double
        public var maximumBufferedSeconds: Double
        public var intraOpThreadCount: Int
        public var minimumSpeechRMS: Float
        public var minimumSpeechPeak: Float
        public var minimumSpeechFrameRatio: Double
        public var suppressLowInformationTranscriptions: Bool
        public var debugLogging: Bool
    }
}
The default streaming setup is tuned for mobile responsiveness while keeping rolling windows for recognition stability:
- Processing cadence: 200 ms. Inference is serialized, so slow devices skip overlapping cycles instead of piling up work.
- Discovery window: 3.5 seconds.
- Tracking window: 2.25 seconds.
- First inference gate: 1.75 seconds in discovery, 0.9 seconds in tracking.
- Fresh audio gate after the first inference: 300 ms in discovery and 200 ms in tracking.
- ONNX thread count: 1 by default to avoid CPU contention and UI stutter on phones.
- Audio quality gate: skips silence, very weak speech, and clipped windows before ONNX inference.
- Transcript quality gate: suppresses short fragments such as single letters from the public .transcription event by default.
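The first-inference and fresh-audio gates above can be read as a small decision function. The sketch below is one plausible interpretation using the default values listed; the `Mode` and `InferenceGate` types and the exact ordering of the checks are assumptions for illustration and may differ from how the SDK schedules inference internally.

```swift
/// Hypothetical recognition modes, named after the windows described above.
enum Mode { case discovery, tracking }

/// Sketch of an inference gate using the documented default thresholds.
struct InferenceGate {
    var minimumDiscoveryWindowSeconds = 1.75
    var minimumTrackingWindowSeconds = 0.90
    var discoveryFreshAudioSeconds = 0.30
    var trackingFreshAudioSeconds = 0.20

    /// Returns true when a new inference pass looks worthwhile:
    /// enough audio is buffered, and (after the first pass) enough of it is new.
    func shouldRun(mode: Mode, bufferedSeconds: Double,
                   freshSeconds: Double, hasRunBefore: Bool) -> Bool {
        let minimumWindow = mode == .discovery
            ? minimumDiscoveryWindowSeconds : minimumTrackingWindowSeconds
        guard bufferedSeconds >= minimumWindow else { return false }
        guard hasRunBefore else { return true }
        let freshNeeded = mode == .discovery
            ? discoveryFreshAudioSeconds : trackingFreshAudioSeconds
        return freshSeconds >= freshNeeded
    }
}

let gate = InferenceGate()
// Only 1.0 s buffered in discovery: below the 1.75 s first-inference gate.
print(gate.shouldRun(mode: .discovery, bufferedSeconds: 1.0, freshSeconds: 1.0, hasRunBefore: false)) // false
// Tracking with 0.25 s of new audio since the last pass: above the 0.20 s gate.
print(gate.shouldRun(mode: .tracking, bufferedSeconds: 2.0, freshSeconds: 0.25, hasRunBefore: true)) // true
```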
The SDK avoids main-thread inference and audio processing:
- prepare() loads resources and ONNX Runtime on a background queue.
- Streaming audio capture appends bounded buffers from the audio callback.
- Inference runs on a serial background queue and reuses the ONNX session.
- Mel computation reuses FFT setup, Hann window, and mel filterbank.
- The audio buffer is capped by maximumBufferedSeconds.
- Low-speech audio windows are skipped before ONNX inference.
- A capture watchdog detects a silent input tap (no buffers while the engine reports running) and automatically restarts the audio engine, logging the active input route for diagnosis.
- Verse matching uses an evidence index and bounded span search instead of scanning every possible span.
- Tracking mode searches locally around the current verse before returning to global discovery.
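To illustrate the buffer cap mentioned in the list above, here is a minimal sketch of a sample buffer that keeps only the most recent audio. `BoundedSampleBuffer` is a hypothetical type written for this example; the SDK's internal buffer may be implemented differently (for instance as a ring buffer).

```swift
/// Keeps only the most recent `maximumSeconds` of audio samples.
struct BoundedSampleBuffer {
    let capacity: Int
    private(set) var samples: [Float] = []

    init(maximumSeconds: Double, sampleRate: Double = 16_000) {
        capacity = max(0, Int(maximumSeconds * sampleRate))
    }

    /// Appends a chunk and drops the oldest samples beyond the capacity.
    mutating func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
        let overflow = samples.count - capacity
        if overflow > 0 {
            samples.removeFirst(overflow)
        }
    }
}

// With the default maximumBufferedSeconds of 6.0 at 16 kHz, the cap is 96,000 samples.
var buffer = BoundedSampleBuffer(maximumSeconds: 6.0)
buffer.append([Float](repeating: 0, count: 100_000))
print(buffer.samples.count) // 96000
```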
Current validation:
- swift test passes on macOS arm64 with 87 tests.
- The test suite covers hinted discovery, same-surah tracking, low-information noise, near-end recovery, post-completion surah switching, ambiguous candidate rejection, and audio-window quality analysis.
- A real-recitation replay test (RealRecitationReplayTests) feeds 12 minutes of an actual Surah Al-Baqarah recitation (2:1-2:59) through the real tracker, using per-window decodes produced by the bundled ONNX model with the session's exact streaming window policy, and asserts sequential no-skip/no-regression following with bounded tracking losses.
- Scenario tests (RecitationScenarioTests) simulate full recitation sessions across structurally different surahs (Al-Fatihah, Al-Kahf, Al-Mulk, Al-Ikhlas, An-Nas) and recitation styles (clean per-ayah windows, rolling boundary-spanning windows, short fragments, noisy clipped decodes), asserting sequential no-skip/no-regression tracking and per-window latency bounds.
- Generic iOS package builds pass with xcodebuild -scheme QuranRecognitionKit-Package -destination 'generic/platform=iOS' build.
- App-side iOS generic builds passed with this package integrated.
For release profiling and maintainer checks, see CONTRIBUTING.md.
If you see RecognitionError.notPrepared, call try await recognizer.prepare() before startListening or recognize(samples:).
If you see modelMissing or modelCorrupt on the bundled model path, call BundledQuranModel.removeExtractedModel() and try again. For custom models, verify the local model URL points to an existing, non-empty .onnx file.
RecognitionError.vocabModelMismatch means the ONNX model output vocabulary does not match the bundled vocab.json. Use a model exported for the same tokenizer/vocabulary as this SDK.
Make sure the host app includes NSMicrophoneUsageDescription. On device, also check iOS Settings if the user previously denied microphone access.
If debug logs show waiting for audio ... bufferSamples=0 repeating after audio engine started, the input tap is not delivering buffers even though the engine is running. This is a system audio state issue, not a recognition issue: the microphone may be held by another app or call, the audio route (often Bluetooth) may be broken, or CoreAudio is in a stale state. The SDK restarts the audio engine automatically up to three times and logs the active input route. The session also emits .audioInput silence events so the host app can show feedback. If it does not recover: disconnect Bluetooth audio devices, close apps that use the microphone, or restart the device.
Check these first:
- The model URL is correct and prepare() succeeded.
- The device microphone is receiving clear speech.
- The app is not feeding silence, clipped audio, or the wrong sample rate to one-shot recognition.
- surahHint matches the current reader context when the user is likely reciting the displayed surah.
- debugLogging: true is enabled while diagnosing recognition quality.
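Before passing samples to one-shot recognition, a host app can run a quick sanity check for silence or clipping. The sketch below computes RMS and peak and compares them with the default minimumSpeechRMS and minimumSpeechPeak values shown earlier. The helper names and the clipping threshold of 0.999 are assumptions for illustration; the SDK's own quality analysis (which also uses a speech frame ratio) may classify windows differently.

```swift
/// RMS and peak amplitude of one window of Float samples.
struct WindowStats {
    let rms: Float
    let peak: Float
}

func windowStats(_ samples: [Float]) -> WindowStats {
    guard !samples.isEmpty else { return WindowStats(rms: 0, peak: 0) }
    var sumOfSquares: Float = 0
    var peak: Float = 0
    for sample in samples {
        sumOfSquares += sample * sample
        peak = max(peak, abs(sample))
    }
    let rms = (sumOfSquares / Float(samples.count)).squareRoot()
    return WindowStats(rms: rms, peak: peak)
}

/// Rough pre-check: loud enough to contain speech, but not clipped.
/// `clippingPeak` is a hypothetical threshold chosen for this sketch.
func looksLikeUsableSpeech(_ samples: [Float],
                           minimumRMS: Float = 0.0015,
                           minimumPeak: Float = 0.006,
                           clippingPeak: Float = 0.999) -> Bool {
    let stats = windowStats(samples)
    return stats.rms >= minimumRMS
        && stats.peak >= minimumPeak
        && stats.peak < clippingPeak
}

print(looksLikeUsableSpeech([0, 0, 0, 0]))          // false (silence)
print(looksLikeUsableSpeech([0.1, -0.1, 0.1, -0.1])) // true
print(looksLikeUsableSpeech([1.0, -1.0, 1.0, -1.0])) // false (clipped)
```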
If the package fails to resolve or link in the host app, remove and re-add the package dependency, then choose File > Packages > Reset Package Caches. Confirm the app target links the QuranRecognitionKit product.
Swift Package Index and shields.io cache build and badge results. After pushing a fix or tag, wait for SPI to rescan the package and rebuild the compatibility matrix.
Run:
swift test
Current tests cover:
- Audio quality gating and low-information transcript suppression.
- Arabic normalization.
- CTC decoding.
- Levenshtein distance and word alignment.
- Verse matching.
- Recitation tracking, surah hints, post-completion discovery, and recovery.
- Regression cases replayed from real on-device recitation logs (premature next-ayah advances, garbled rolling-window decodes, ending-word stem collisions, multi-ayah span resolution).
- Resource loading.
- Model path validation.
- Recognition session start/stop lifecycle with a mock capture source.
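As background for the Levenshtein distance item in the list above, here is a compact sketch of the classic two-row dynamic programming algorithm, plus a normalized similarity score. It illustrates the general technique only; the function names are hypothetical and the SDK's matcher may use a different formulation or additional normalization.

```swift
/// Levenshtein edit distance between two strings, compared character by
/// character (Swift `Character`, so combined marks count as one unit).
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    // previous[j] holds the distance between s[0..<i-1] and t[0..<j].
    var previous = Array(0...t.count)
    var current = [Int](repeating: 0, count: t.count + 1)
    for i in 1...s.count {
        current[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,        // deletion
                             current[j - 1] + 1,     // insertion
                             previous[j - 1] + cost) // substitution or match
        }
        swap(&previous, &current)
    }
    return previous[t.count]
}

/// Similarity in [0, 1], where 1 means the strings are identical.
func similarity(_ a: String, _ b: String) -> Double {
    let longest = max(a.count, b.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(levenshtein(a, b)) / Double(longest)
}

print(levenshtein("kitten", "sitting")) // 3
print(similarity("abc", "abc"))         // 1.0
```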
The manual harness is the integration path for real model + audio validation.
QuranRecognitionKit source code is available under the MIT license. See LICENSE.
The bundled model has separate upstream attribution and license terms. See MODEL_NOTICE.md.