Kotlin Speech Features

📒 Introduction

This library is a complete port of python_speech_features in pure Kotlin.

It provides common speech features for automatic speech recognition (ASR), including MFCCs and filterbank energies.
To know more about MFCCs read more.

🙋 How to use

The library supports multiple platforms through Kotlin Multiplatform.

Android

Integration

Add jitpack.io to your project's repositories:

allprojects {
  repositories {
    google()
    maven { url 'https://jitpack.io' }
  }
}

Add the artifact to your project:

dependencies {
    implementation "com.github.MerlynMind:kotlin_speech_features:${version}"
}

Example implementation

A sample app is included in this repo to help understand the implementation.

Convert your audio signal into a float array (a demo is provided in the sample app).
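As a rough illustration of this conversion step, the sketch below decodes raw 16-bit little-endian mono PCM bytes into a `FloatArray`. The helper name `pcm16ToFloatArray` is hypothetical and not part of this library. It also assumes the bytes are headerless PCM, so a WAV header would have to be skipped first. The sample app may do this differently.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not part of the library): decodes raw 16-bit
// little-endian mono PCM bytes into a FloatArray of sample values.
// Assumption: the input holds only sample data, with no WAV header.
fun pcm16ToFloatArray(pcm: ByteArray): FloatArray {
    // View the bytes as 16-bit samples; a trailing odd byte is ignored.
    val samples = ByteBuffer.wrap(pcm)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
    // Copy each signed 16-bit sample into a float without rescaling.
    return FloatArray(samples.remaining()) { i -> samples.get(i).toFloat() }
}
```

The samples are left unscaled here because the example calls below pass the signal through `MathUtils.normalize` before extracting features.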

Initialize speech features

private val speechFeatures = SpeechFeatures()

Perform any of the 4 operations:

val result = speechFeatures.mfcc(MathUtils.normalize(wav), nFilt = 64)
val result = speechFeatures.fbank(MathUtils.normalize(wav), nFilt = 64)
val result = speechFeatures.logfbank(MathUtils.normalize(wav), nFilt = 64)
val result = speechFeatures.ssc(MathUtils.normalize(wav), nFilt = 64)

The result will contain matrices with the expected features. Pass these features on for further processing (e.g. classification, speech recognition).
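To get a feel for the size of these matrices, the sketch below estimates how many frames (rows) a signal produces. It assumes this port follows the framing rule of python_speech_features, where the last frame is zero-padded, and it uses that library's defaults (16 kHz sample rate, 25 ms window, 10 ms step). The helper name `expectedFrameCount` is hypothetical and not part of this library.

```kotlin
import kotlin.math.ceil
import kotlin.math.roundToInt

// Hypothetical helper (not part of the library): number of analysis frames
// for a signal, assuming the framing rule of python_speech_features.
fun expectedFrameCount(
    numSamples: Int,
    sampleRate: Int = 16000,
    winLen: Double = 0.025, // window length in seconds
    winStep: Double = 0.01  // step between window starts in seconds
): Int {
    val frameLen = (winLen * sampleRate).roundToInt()   // 400 samples by default
    val frameStep = (winStep * sampleRate).roundToInt() // 160 samples by default
    // A signal no longer than one window yields a single frame.
    if (numSamples <= frameLen) return 1
    // Otherwise one frame, plus one per step needed to cover the remainder.
    return 1 + ceil((numSamples - frameLen).toDouble() / frameStep).toInt()
}
```

Under these assumptions, one second of 16 kHz audio (16000 samples) gives 99 frames. The number of columns would then depend on the operation, for example the number of cepstral coefficients for `mfcc` or `nFilt` for `logfbank`.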

iOS

Integration

In Xcode, go to File > Add Packages...
Paste the URL of this repo into the search box
Select the package found
Click the Add Package button

Example implementation

A sample app is included in this repo to help understand the implementation.

Convert your audio signal into a KotlinIntArray and normalize it.

import KotlinSpeechFeatures

let signal = [Int](1...1000) // Example signal
let normalized = MathUtils.Companion.init().normalize(sig: toKotlinIntArray(arr: signal))

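func toKotlinIntArray(arr: [Int]) -> KotlinIntArray {
    // Size the Kotlin array by element count, not by the Swift array's capacity.
    let result = KotlinIntArray(size: Int32(arr.count))
    // Iterating over indices is also safe for an empty array.
    for i in arr.indices {
        result.set(index: Int32(i), value: Int32(arr[i]))
    }
    return result
}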

Initialize speech features
```
let speechFeatures = SpeechFeatures()
```

Perform any of the 4 operations:

let result = speechFeatures.mfcc(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, numCep: 13, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, ceplifter: 22, appendEnergy: true, winFunc: nil)
let result = speechFeatures.fbank(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)
let result = speechFeatures.logfbank(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)
let result = speechFeatures.ssc(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)

The result will contain matrices with the expected features. Pass these features on for further processing (e.g. classification, speech recognition).

JavaScript

Coming soon...

✍️ Contributing

Interested in contributing to the library? Thank you so much for your interest! We are always looking for improvements to the project and contributions from open-source developers are greatly appreciated.

Clone the repo and create a new branch:

git clone https://github.com/merlynmind/kotlin_speech_features && cd kotlin_speech_features
git checkout -b name_for_new_branch

Make changes and test them
Submit a pull request with a comprehensive description of the changes

🌟 Spread the word!

If you want to say thank you and/or support active development of this library:

Add a GitHub Star to the project!
Tweet about the project! Tag @MerlynMind and/or #heyMerlyn

Thank you so much for your interest in growing the reach of our library!

🧡 Credits

Arjun Sunil - Original author of Kotlin Speech Features
Raquib-Ul Alam - For major refactoring and making the code presentable
Rob Smith - For mentoring and helping us navigate the task

📝 References

Original library - Python Speech Features
Reference Library - C Speech Features
The sample english.wav was obtained with:

wget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au
sox english.au -e signed-integer english.wav
