kotlin_speech_features

1.0.0

This library provides common speech features for ASR including MFCCs and filterbank energies for Android and iOS.
EmergenceAI/kotlin_speech_features

What's New

v1.0.0

2022-09-19T15:14:09Z

Kotlin Speech Features


📒 Introduction

This library is a complete port of python_speech_features in pure Kotlin.

It provides common speech features for automatic speech recognition (ASR), including MFCCs and filterbank energies.
To know more about MFCCs read more.
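
As a small illustration of what sits underneath filterbank energies and MFCCs, the sketch below shows the Hz-to-mel conversion that python_speech_features uses to space its filters. The function names `hzToMel` and `melToHz` are hypothetical and chosen for this example; they are not confirmed to be part of this library's public API.

```kotlin
import kotlin.math.log10
import kotlin.math.pow

// Hypothetical helpers for illustration only (not taken from this library).
// They mirror the mel-scale formulas used by python_speech_features.

// Convert a frequency in Hz to the mel scale.
fun hzToMel(hz: Double): Double = 2595.0 * log10(1.0 + hz / 700.0)

// Convert a mel value back to a frequency in Hz.
fun melToHz(mel: Double): Double = 700.0 * (10.0.pow(mel / 2595.0) - 1.0)
```

Filter centre frequencies are typically spaced evenly on the mel scale between `lowFreq` and `highFreq` and then mapped back to Hz, which packs more filters into the lower frequencies.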

Features

🙋 How to use

We support multiple platforms using Kotlin Multiplatform.

Android

Integration

Add jitpack.io to your project's repositories:

allprojects {
  repositories {
    google()
    maven { url 'https://jitpack.io' }
  }
}

Add artifact to your project:

dependencies {
    implementation "com.github.MerlynMind:kotlin_speech_features:${version}"
}
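
If your build uses the Gradle Kotlin DSL (`build.gradle.kts`) instead of Groovy, a rough equivalent of the two snippets above might look like the following. The `speechFeaturesVersion` value is a placeholder introduced for this sketch; replace it with the release you want. Newer Gradle project templates declare repositories in `settings.gradle.kts` instead, so adjust the location to match your project.

```kotlin
// Root build.gradle.kts (assumed project layout)
allprojects {
    repositories {
        google()
        maven { url = uri("https://jitpack.io") }
    }
}

// Module build.gradle.kts
// Placeholder: set this to the release you want to use.
val speechFeaturesVersion = "<version>"

dependencies {
    implementation("com.github.MerlynMind:kotlin_speech_features:$speechFeaturesVersion")
}
```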

Example implementation

A sample app is included in this repo to help understand the implementation.

  1. Convert your audio signal into a float array (a demo is provided in the sample app).
  2. Initialize speech features
    private val speechFeatures = SpeechFeatures()
  3. Perform any of the 4 operations:
    val result = speechFeatures.mfcc(MathUtils.normalize(wav), nFilt = 64)
    val result = speechFeatures.fbank(MathUtils.normalize(wav), nFilt = 64)
    val result = speechFeatures.logfbank(MathUtils.normalize(wav), nFilt = 64)
    val result = speechFeatures.ssc(MathUtils.normalize(wav), nFilt = 64)
  4. The result will contain matrices with the expected features. Pass these features on for further processing (e.g. classification, speech recognition).
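
Step 1 above leaves the conversion to the sample app. As a minimal sketch, assuming the audio is raw 16-bit little-endian mono PCM (for example, the data chunk of a WAV file with its header already stripped), the samples could be decoded as shown below. `pcm16ToSamples` is a hypothetical helper, not part of the library. It returns integer samples because the iOS example passes an integer array to `MathUtils.normalize`; check the sample app for the exact type the Android API expects.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not part of the library): decodes raw 16-bit
// little-endian mono PCM bytes into one Int per sample.
// A trailing odd byte, if any, is ignored.
fun pcm16ToSamples(pcm: ByteArray): IntArray {
    val samples = ByteBuffer.wrap(pcm)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
    return IntArray(samples.remaining()) { i -> samples.get(i).toInt() }
}
```

The resulting array would then take the place of `wav` in the calls from step 3.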

iOS

Integration

  1. In Xcode, go to File > Add Packages...
  2. Paste in the URL of this repo in the search box
  3. Select the package found
  4. Click Add Package button

Example implementation

A sample app is included in this repo to help understand the implementation.

  1. Convert your audio signal into a KotlinIntArray and normalize it.
    import KotlinSpeechFeatures
    
    // Copy a Swift [Int] into the KotlinIntArray type expected by the library.
    func toKotlinIntArray(arr: [Int]) -> KotlinIntArray {
        // Use count, not capacity: capacity may exceed the number of elements.
        let result = KotlinIntArray(size: Int32(arr.count))
        for (i, value) in arr.enumerated() {
            result.set(index: Int32(i), value: Int32(value))
        }
        return result
    }
    
    let signal = [Int](1...1000) // Example signal
    let normalized = MathUtils.Companion.init().normalize(sig: toKotlinIntArray(arr: signal))
  2. Initialize speech features
    let speechFeatures = SpeechFeatures()
  3. Perform any of the 4 operations:
    let result = speechFeatures.mfcc(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, numCep: 13, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, ceplifter: 22, appendEnergy: true, winFunc: nil)
    let result = speechFeatures.fbank(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)
    let result = speechFeatures.logfbank(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)
    let result = speechFeatures.ssc(signal: normalized, sampleRate: 16000, winLen: 0.025, winStep: 0.01, nFilt: 64, nfft: 512, lowFreq: 0, highFreq: nil, preemph: 0.97, winFunc: nil)
  4. The result will contain matrices with the expected features. Pass these features on for further processing (e.g. classification, speech recognition).
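
On either platform, it can help to know how many rows (frames) to expect in the result. The sketch below assumes this port frames the signal the same way python_speech_features does, which has not been verified against the Kotlin source. `expectedFrameCount` is a hypothetical helper written for this example; its defaults match the `sampleRate`, `winLen`, and `winStep` values used in the calls above.

```kotlin
import kotlin.math.ceil
import kotlin.math.roundToInt

// Hypothetical helper (not part of the library): estimates the number of
// frames produced for a signal, assuming python_speech_features-style framing.
fun expectedFrameCount(
    numSamples: Int,
    sampleRate: Int = 16000,
    winLen: Double = 0.025, // window length in seconds
    winStep: Double = 0.01  // step between windows in seconds
): Int {
    val frameLen = (winLen * sampleRate).roundToInt()   // samples per frame
    val frameStep = (winStep * sampleRate).roundToInt() // samples per step
    return if (numSamples <= frameLen) {
        1
    } else {
        1 + ceil((numSamples - frameLen).toDouble() / frameStep).toInt()
    }
}
```

Under these assumptions, one second of 16 kHz audio gives 99 frames, so an MFCC result with `numCep` of 13 would have 99 rows of 13 coefficients.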

JavaScript

Coming soon...

โœ๏ธ Contributing

Interested in contributing to the library? Thank you so much for your interest! We are always looking for improvements to the project, and contributions from open-source developers are greatly appreciated.

  1. Clone the repo and create a new branch:
git clone https://github.com/merlynmind/kotlin_speech_features
cd kotlin_speech_features
git checkout -b name_for_new_branch
  2. Make changes and test
  3. Submit a Pull Request with a comprehensive description of the changes

🌟 Spread the word!

If you want to say thank you and/or support active development of this library:

  • Add a GitHub Star to the project!
  • Tweet about the project on your Twitter! Tag @MerlynMind and/or #heyMerlnyn

Thank you so much for your interest in growing the reach of our library!

🧡 Credits

  • Arjun Sunil - Original Author of kotlin speech features
  • Raquib-Ul Alam - For major refactoring and making the code presentable
  • Rob Smith - For Mentoring and helping us to navigate through the task

๐Ÿ“ References

wget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au
sox english.au -e signed-integer english.wav

Dependencies

  • None
Last updated: Thu Oct 17 2024 17:56:38 GMT-0900 (Hawaii-Aleutian Daylight Time)