A Swift Package for Microsoft Edge Text-to-Speech (TTS) API integration. This package provides a clean, simple interface to generate high-quality audio files from text using Edge-TTS without any Python dependencies.
- ✅ Pure Swift implementation - no external dependencies or Python required
- ✅ Simple, clean API - easy to use async/await interface
- ✅ 400+ neural voices across 100+ languages and locales
- ✅ High-quality MP3 audio output (24 kHz, 48 kbps)
- ✅ Automatic SSML generation with proper XML escaping
- ✅ Thread-safe token caching and clock synchronization
- ✅ iOS 15+ and macOS 12+ support
- ✅ Comprehensive error handling with detailed error types
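
The package handles SSML generation internally. As a rough illustration of what XML escaping involves, here is a minimal, self-contained sketch. The helper names `escapeXML` and `makeSSML` are hypothetical, and the element layout is an assumed simplification, not the markup the package actually sends.

```swift
import Foundation

/// Escapes the five XML special characters so arbitrary text can sit inside SSML.
/// Hypothetical helper for illustration; not part of the SwiftEdgeTTS API.
func escapeXML(_ text: String) -> String {
    var escaped = ""
    escaped.reserveCapacity(text.count)
    for character in text {
        switch character {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        case "\"": escaped += "&quot;"
        case "'": escaped += "&apos;"
        default: escaped.append(character)
        }
    }
    return escaped
}

/// Wraps escaped text in a minimal SSML document for the given voice.
/// Assumption: a bare <speak>/<voice> layout; the real request may carry more attributes.
func makeSSML(text: String, voice: String, locale: String) -> String {
    return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"\(escapeXML(locale))\">"
        + "<voice name=\"\(escapeXML(voice))\">\(escapeXML(text))</voice>"
        + "</speak>"
}

let ssml = makeSSML(text: "Tom & Jerry <live>", voice: "en-US-JennyNeural", locale: "en-US")
print(ssml)
```

Without escaping, the `&` and `<` in the input would produce malformed XML, which is why callers of `synthesize` can pass raw text and leave this step to the package.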
Add the package to your Package.swift:
dependencies: [
    .package(url: "https://github.com/herrkaefer/SwiftEdgeTTS.git", from: "1.0.0")
]

Or in Xcode:
- File → Add Packages...
- Enter the repository URL: https://github.com/herrkaefer/SwiftEdgeTTS.git
- Select the version or branch
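
For the Package.swift route, the dependency entry also has to be referenced from a target. The manifest below is a sketch of how that might look in full. The package and target name `MyApp` is hypothetical, and the product name `SwiftEdgeTTS` is an assumption based on the module name used in `import SwiftEdgeTTS`; check the package's own manifest if resolution fails.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.iOS(.v15), .macOS(.v12)],
    dependencies: [
        .package(url: "https://github.com/herrkaefer/SwiftEdgeTTS.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumption: the library product shares the package name.
                .product(name: "SwiftEdgeTTS", package: "SwiftEdgeTTS")
            ]
        )
    ]
)
```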
import SwiftEdgeTTS
// Create a TTS service instance
let ttsService = EdgeTTSService()
// Synthesize text to audio file
let outputURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("output.mp3")
do {
    let audioURL = try await ttsService.synthesize(
        text: "Hello, world!",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
    print("Audio saved to: \(audioURL.path)")
} catch {
    print("Error: \(error)")
}

Generate multiple audio files at once:
let texts = [
    "First sentence.",
    "Second sentence.",
    "Third sentence."
]
let outputDirectory = FileManager.default.temporaryDirectory
    .appendingPathComponent("audio")
let results = try await ttsService.synthesizeMultiple(
    texts: texts,
    voice: "en-US-JennyNeural",
    outputDirectory: outputDirectory
)
// Process results (nil indicates a failed synthesis)
for (index, url) in results.enumerated() {
    if let url = url {
        print("File \(index + 1) saved: \(url.path)")
    } else {
        print("File \(index + 1) failed to generate")
    }
}

Discover available voices and filter by language:
let voices = try await ttsService.getAvailableVoices()
// Filter by language
let chineseVoices = voices.filter { $0.locale.hasPrefix("zh") }
let englishVoices = voices.filter { $0.locale.hasPrefix("en") }
// Print voice information
for voice in chineseVoices.prefix(5) {
    print("\(voice.name) - \(voice.locale) - \(voice.gender)")
}

// Chinese (Mandarin)
try await ttsService.synthesize(
    text: "你好,世界!",
    voice: "zh-CN-XiaoxiaoNeural",
    outputURL: chineseURL
)
// Japanese
try await ttsService.synthesize(
    text: "こんにちは、世界!",
    voice: "ja-JP-NanamiNeural",
    outputURL: japaneseURL
)
// Spanish
try await ttsService.synthesize(
    text: "Hola, mundo.",
    voice: "es-ES-ElviraNeural",
    outputURL: spanishURL
)
// French
try await ttsService.synthesize(
    text: "Bonjour, le monde.",
    voice: "fr-FR-DeniseNeural",
    outputURL: frenchURL
)
// German
try await ttsService.synthesize(
    text: "Hallo, Welt.",
    voice: "de-DE-KatjaNeural",
    outputURL: germanURL
)

protocol EdgeTTSClient {
    func synthesize(text: String, voice: String, outputURL: URL) async throws -> URL
    func synthesizeMultiple(texts: [String], voice: String, outputDirectory: URL) async throws -> [URL?]
    func getAvailableVoices() async throws -> [EdgeTTSVoice]
}

EdgeTTSService is the default implementation of EdgeTTSClient:
let client = EdgeTTSService()

The package uses EdgeTTSError for error handling:
enum EdgeTTSError: Error {
    case synthesisFailed
    case invalidVoice
    case networkError(Error)
    case invalidResponse
    case fileWriteFailed(Error)
}

Example:
do {
    let audioURL = try await client.synthesize(
        text: "Hello",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
} catch EdgeTTSError.synthesisFailed {
    print("Synthesis failed")
} catch EdgeTTSError.networkError(let error) {
    print("Network error: \(error)")
} catch {
    print("Unknown error: \(error)")
}

Voices follow the format: {locale}-{VoiceName}Neural
- en-US-JennyNeural - English (US), Female
- en-US-GuyNeural - English (US), Male
- en-GB-LibbyNeural - English (UK), Female
- en-AU-NatashaNeural - English (Australia), Female

- zh-CN-XiaoxiaoNeural - Chinese (Mandarin), Female
- zh-CN-YunjianNeural - Chinese (Mandarin), Male
- zh-TW-HsiaoChenNeural - Chinese (Taiwan), Female
- zh-HK-HiuGaaiNeural - Chinese (Hong Kong), Female

- ja-JP-NanamiNeural - Japanese, Female
- ja-JP-KeitaNeural - Japanese, Male

- es-ES-ElviraNeural - Spanish (Spain), Female
- fr-FR-DeniseNeural - French, Female
- de-DE-KatjaNeural - German, Female
- ko-KR-SunHiNeural - Korean, Female
- it-IT-ElsaNeural - Italian, Female
- pt-BR-FranciscaNeural - Portuguese (Brazil), Female
- ru-RU-SvetlanaNeural - Russian, Female
Use getAvailableVoices() to discover all available voices for your use case.
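
When working with voice identifiers as plain strings, it can help to split them according to the {locale}-{VoiceName}Neural format described above. The following is a small sketch of one way to do that; `VoiceIdentifier` and `parseVoiceIdentifier` are hypothetical names for illustration and are not part of the SwiftEdgeTTS API.

```swift
import Foundation

/// Components of a voice identifier such as "en-US-JennyNeural".
/// Hypothetical type for illustration; not part of the SwiftEdgeTTS API.
struct VoiceIdentifier: Equatable {
    let locale: String   // e.g. "en-US"
    let name: String     // e.g. "Jenny"
}

/// Splits an identifier at its last hyphen, so locales with extra subtags stay intact.
/// Returns nil when the input does not match {locale}-{VoiceName}Neural.
func parseVoiceIdentifier(_ identifier: String) -> VoiceIdentifier? {
    let suffix = "Neural"
    guard identifier.hasSuffix(suffix),
          let lastHyphen = identifier.lastIndex(of: "-") else { return nil }
    let locale = String(identifier[..<lastHyphen])
    let nameStart = identifier.index(after: lastHyphen)
    let name = String(identifier[nameStart...].dropLast(suffix.count))
    guard !locale.isEmpty, !name.isEmpty else { return nil }
    return VoiceIdentifier(locale: locale, name: name)
}

// Keep only the English voices from a list of identifiers.
let identifiers = ["en-US-JennyNeural", "zh-CN-XiaoxiaoNeural", "ja-JP-NanamiNeural", "en-GB-LibbyNeural"]
let english = identifiers.compactMap(parseVoiceIdentifier).filter { $0.locale.hasPrefix("en") }
print(english.map(\.name)) // ["Jenny", "Libby"]
```

This only checks the shape of the string. Whether a voice actually exists is best confirmed against the result of getAvailableVoices().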
The package provides detailed error types for better error handling:
do {
    let audioURL = try await ttsService.synthesize(
        text: "Hello",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
} catch EdgeTTSError.synthesisFailed {
    print("Audio synthesis failed")
} catch EdgeTTSError.invalidVoice {
    print("Invalid voice identifier")
} catch EdgeTTSError.networkError(let error) {
    print("Network error: \(error.localizedDescription)")
} catch EdgeTTSError.fileWriteFailed(let error) {
    print("Failed to write file: \(error.localizedDescription)")
} catch {
    print("Unknown error: \(error)")
}

You can provide a custom URLSession for advanced configuration (proxy, timeouts, etc.):
let configuration = URLSessionConfiguration.default
configuration.timeoutIntervalForRequest = 30
configuration.timeoutIntervalForResource = 60
let customSession = URLSession(configuration: configuration)
let ttsService = EdgeTTSService(session: customSession)

- iOS 15.0+ / macOS 12.0+
- Swift 5.9+
This work was inspired by the Python edge-tts library by rany2.
MIT License
Copyright (c) 2024 SwiftEdgeTTS Contributors