Text To Speech (TTS) library for Swift
TTSKit was developed to solve the problem of intelligibility when synthesizing text. This is often an issue when generating short, single-word utterances without additional context; spelling drills and similar use cases are especially troublesome. While the built-in AVFoundation utilities work well for longer texts with more contextual support, TTSKit prioritizes intelligibility over naturalness. TTSKit uses CMU Flite for synthesis.
TTSKit is available as an SPM package.
- Add TTSKit to your Xcode project by selecting File, Add Package Dependencies... and entering the TTSKit GitHub repository URL https://github.com/ameter/TTSKit in the search box.
- Select the Dependency Rule you want to use. Up to Next Major Version is a good choice to allow non-breaking updates for dependencies that use semantic versioning.
- Click Add Package. Xcode will display the products. TTSKit ships with 2 products:
  - TTSKit - the core library.
  - TTSVoiceLibrary - a collection of optional voices.
- Select your app's target for each product you want to add. For prototyping and development, go ahead and add both products to your app's target. For additional information on voices, including best practices for production apps, see the Voices section of this README.
- Click Add Package.
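
If your project is driven by a Package.swift manifest rather than the Xcode UI, the same dependency can probably be declared along these lines. This is a sketch, not a tested configuration: the package name MyApp and the version "1.0.0" are placeholders, and it assumes the package identity is TTSKit. Check the repository's releases for the tag you actually want.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder name for your own package
    dependencies: [
        // "1.0.0" is a placeholder; use a version that exists in the repository.
        .package(url: "https://github.com/ameter/TTSKit", from: "1.0.0"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // The core library.
                .product(name: "TTSKit", package: "TTSKit"),
                // Optional; leave this out to keep the bundle smaller.
                .product(name: "TTSVoiceLibrary", package: "TTSKit"),
            ]
        ),
    ]
)
```
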
import TTSKit
...
let tts = TTSKit()
try? tts.speak(text: "Hello, World!")

By default, speak(text:) starts speaking immediately and interrupts any audio that is already playing. If you want multiple utterances to play back-to-back, use the queueing API.
Pass queue: true to append an utterance to the current playback queue instead of interrupting the one that is already in progress.
import TTSKit
let tts = TTSKit()
do {
try tts.speak(text: "The first message starts right away.")
try tts.speak(text: "This message is queued and will play after the first one finishes.", queue: true)
} catch {
print("TTS error: \(error)")
}

Use the optional completion handler when you want to update your UI or trigger additional work after an utterance finishes playing.
import TTSKit
let tts = TTSKit()
try? tts.speak(text: "Hello, World!") {
print("Finished speaking.")
}

Call pause() to temporarily halt playback and resume() to continue from the paused position.
tts.pause()
tts.resume()

Call stop() to stop the current utterance immediately. This also clears any queued audio.
tts.stop()

You can also inspect tts.isPlaying to drive the enabled state of playback controls in your UI.
See the Demo App for additional usage examples. It also provides a quick way to get started with TTSKit and to experiment with different voices.
TTSKit ships with built-in male and female voices. Additional voices are available in the optional TTSVoiceLibrary product. You can also load your own voices. All flitevox compatible voices are supported. If you do not load a specific voice, the default female voice will be used.
There are three different ways to load voices.
Load a built-in voice:
import TTSKit
...
let tts = TTSKit()
tts.loadVoice(.male)

Load a voice from TTSVoiceLibrary:
import TTSKit
import TTSVoiceLibrary
...
let tts = TTSKit()
try? tts.loadVoice(fromLibrary: .cmuUsRms)

Load a voice file from a URL:
import TTSKit
...
let tts = TTSKit()
if let voiceURL = Bundle.main.url(forResource: "cmu_us_rms", withExtension: "flitevox") {
try? tts.loadVoice(at: voiceURL)
}

For production apps, include the full TTSVoiceLibrary only if you specifically want to ship all available voices. Otherwise, you can decrease your app's bundle size by leaving TTSVoiceLibrary out of your app's target. If you previously added it, you can remove it by navigating to your app's target, opening General, then Frameworks, Libraries, and Embedded Content, selecting TTSVoiceLibrary, and clicking the minus (-) button.
You can import individual voices by copying the .flitevox files into your app and then loading them via the URL.
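
To see which voice files actually ended up in your app, something like the following may help. It is a minimal sketch: bundledVoiceURLs is a hypothetical helper, not part of TTSKit, and it relies only on Foundation's Bundle.urls(forResourcesWithExtension:subdirectory:).

```swift
import Foundation

// Hypothetical helper (not part of TTSKit): returns the URLs of all
// .flitevox files at the top level of the given bundle, sorted by file name.
func bundledVoiceURLs(in bundle: Bundle = .main) -> [URL] {
    let urls = bundle.urls(forResourcesWithExtension: "flitevox", subdirectory: nil) ?? []
    return urls.sorted { $0.lastPathComponent < $1.lastPathComponent }
}

// Print the voice names, e.g. to populate a voice picker.
for url in bundledVoiceURLs() {
    print(url.deletingPathExtension().lastPathComponent)
}
```

Each returned URL can then be passed to loadVoice(at:).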
TTSKit exposes voice tuning through tts.settings. Settings are stored on the TTSKit instance, applied immediately to the currently loaded voice, and automatically re-applied when you load a different voice later.
import TTSKit
let tts = TTSKit()
tts.settings.duration = 0.9
tts.settings.shift = 1.05
tts.settings.pitchMean = 160
tts.settings.pitchStdDeviation = 20
try? tts.speak(text: "Customized speech.")

The duration setting controls overall speaking rate by stretching the synthesized duration.
- 1.0 is the default rate.
- Values greater than 1.0 slow speech down.
- Values less than 1.0 speed speech up.
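
As a rough worked example of the duration setting, the sketch below assumes it acts as a simple linear multiplier on utterance length. That is an assumption for illustration, and estimatedSeconds is a hypothetical helper rather than TTSKit API. The 2-second baseline is a made-up figure.

```swift
// Hypothetical helper for illustration; not part of TTSKit.
// Assumes settings.duration is a linear multiplier on utterance length.
func estimatedSeconds(baseline: Double, duration: Double) -> Double {
    return baseline * duration
}

let baseline = 2.0 // seconds at duration = 1.0 (made-up figure)
for duration in [0.9, 1.0, 1.25] {
    let seconds = estimatedSeconds(baseline: baseline, duration: duration)
    print("duration \(duration): about \(seconds) s")
}
```

Under that assumption, 0.9 shortens the utterance by about a tenth and 1.25 lengthens it by a quarter.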
The shift setting scales the voice's pitch contour.
- 1.0 is neutral.
- Values greater than 1.0 raise the perceived pitch.
- Values less than 1.0 lower the perceived pitch.
The pitchMean setting sets the baseline pitch in Hz. This is useful when you want to move the overall voice higher or lower without changing how much variation it has.
The pitchStdDeviation setting controls how much the pitch varies around pitchMean, in Hz.
- Higher values sound more expressive.
- Lower values sound flatter and more even.
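
To make the relationship between pitchMean and pitchStdDeviation concrete, here is a simplified model, offered as an illustration rather than a description of the synthesizer's internals. It assumes a pitch target that sits z standard deviations from the mean lands at pitchMean + z * pitchStdDeviation Hz. The pitchTarget function is hypothetical and not part of TTSKit.

```swift
// Simplified model for illustration; pitchTarget is not TTSKit API.
// Assumes a target z standard deviations from the mean lands at
// pitchMean + z * pitchStdDeviation (in Hz).
func pitchTarget(z: Double, pitchMean: Double, pitchStdDeviation: Double) -> Double {
    return pitchMean + z * pitchStdDeviation
}

let offsets = [-1.0, 0.0, 1.0]

// Same baseline, small deviation: targets stay close to 160 Hz.
let flat = offsets.map { pitchTarget(z: $0, pitchMean: 160, pitchStdDeviation: 5) }
// Same baseline, larger deviation: targets spread further apart.
let lively = offsets.map { pitchTarget(z: $0, pitchMean: 160, pitchStdDeviation: 20) }

print(flat)   // [155.0, 160.0, 165.0]
print(lively) // [140.0, 160.0, 180.0]
```

In this model the baseline stays at 160 Hz in both cases; only the spread around it changes.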
The substitutionsEnabled setting enables TTSKit's pronunciation substitutions for short utterances and single words. It defaults to true and can improve intelligibility for spelling drills and similar use cases.
tts.settings.substitutionsEnabled = false

Call clear() to remove any custom duration, shift, pitchMean, and pitchStdDeviation values and return to the voice defaults. clear() does not change substitutionsEnabled.
tts.settings.clear()

I ❤️ community contributions. An effective workflow for library development is to create a workspace (e.g. TTSKit.xcworkspace) outside the TTSKit repo folder and then add both the Demo App and the TTSKit folder to the workspace. Please submit pull requests for any changes you would like to see included in the library.


