Algorithms

Sound Event Detection

  1. YAMNet, a real-time sound event detection GitHub repository, and the event-type labels list - Relevant labels: 420:430
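As a rough illustration of how the relevant label range might be used, the sketch below filters a YAMNet-style score matrix (one row per audio frame, one column per class, 521 classes) down to classes 420:430. It rests on assumptions: "420:430" is read as a half-open Python slice, the scores are synthetic rather than produced by the model, and the helper names and the 0.3 threshold are hypothetical.

```python
import numpy as np

NUM_CLASSES = 521            # number of classes in the YAMNet class map
RELEVANT = slice(420, 430)   # assumption: "420:430" read as a half-open slice


def relevant_event_scores(scores: np.ndarray) -> np.ndarray:
    """Average per-frame scores over time and keep only the relevant classes."""
    if scores.ndim != 2 or scores.shape[1] != NUM_CLASSES:
        raise ValueError("expected scores of shape (frames, 521)")
    return scores.mean(axis=0)[RELEVANT]


def detect(scores: np.ndarray, threshold: float = 0.3) -> list:
    """Return the relevant class indices whose mean score exceeds the threshold."""
    clip_scores = relevant_event_scores(scores)
    return [RELEVANT.start + i for i, s in enumerate(clip_scores) if s > threshold]


# Synthetic stand-in for model output: low background scores, one strong class.
rng = np.random.default_rng(0)
demo_scores = rng.uniform(0.0, 0.1, size=(10, NUM_CLASSES))
demo_scores[:, 425] = 0.9

detected = detect(demo_scores)
print(detected)
```

In practice the score matrix would come from the model itself, and the threshold would need tuning per event type.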

Query-based separation

  1. Zero Shot Audio Source Separation (paper, interface) - a three-component pipeline for training an audio source separator that can separate any source from a track. It needs only the mixture audio to separate and a sample of the target source as a query; the model then separates the specified source from the track.

Audio Source Separation

  1. AudioSep - a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement.

Blind Source Separation

  1. Deep Audio Prior - the deep audio prior enables several audio applications: blind sound source separation, interactive mask-based editing, audio texture synthesis, and audio watermarker removal.

  2. BSS (EM source separation) - This repository covers EM algorithms to separate speech sources in multi-channel recordings. In particular, it contains methods to integrate Deep Clustering (a neural network-based source separation algorithm) with a probabilistic spatial mixture model, as proposed in the paper "Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings", presented at Interspeech 2017 in Stockholm.

Image embeddings and others

  1. OpenL3 - open-source deep audio and image embeddings.

  2. Speaker recognition - the identification of a person given an audio file. It answers the question "Who is speaking?" Speaker verification (also called speaker authentication) is similar, but instead of returning the identity of the speaker, it returns whether the speaker's claimed identity is genuine. Speaker verification is considered somewhat easier than speaker recognition.
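The difference between the two tasks can be sketched with speaker embeddings and cosine similarity. This is a minimal illustration, not the method of any linked project: the three-dimensional embeddings are made up, the function names are hypothetical, and the 0.7 threshold is an arbitrary assumption. Recognition picks the best match among all enrolled speakers, while verification only checks one claimed identity against a threshold.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify(embedding: np.ndarray, enrolled: dict) -> str:
    """Speaker recognition: return the enrolled speaker most similar to the query."""
    return max(enrolled, key=lambda name: cosine_similarity(embedding, enrolled[name]))


def verify(embedding: np.ndarray, claimed: str, enrolled: dict,
           threshold: float = 0.7) -> bool:
    """Speaker verification: accept or reject a single claimed identity."""
    return cosine_similarity(embedding, enrolled[claimed]) >= threshold


# Made-up embeddings; a real system would compute them from audio.
enrolled = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
query = np.array([0.9, 0.1, 0.0])

print(identify(query, enrolled))
print(verify(query, "alice", enrolled))
print(verify(query, "bob", enrolled))
```

Verification compares against one enrolled speaker only, which is one way to see why it is usually treated as the easier problem.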

Other Tools

  1. Kaldi - a speech recognition toolkit with many SOTA models.

  2. Speech recognition with DL - how to convert sounds to vectors and feed them into an RNN.

  3. Gecko (github.com/gong-io/gecko, youtube) - an open-source tool for annotating the linguistic content of conversations. It can be used for segmentation, diarization, and transcription. With Gecko, you can create and refine audio-based datasets, compare the results of multiple models simultaneously, and highlight differences between transcriptions.
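For the "convert sounds to vectors" step mentioned in item 2, one common approach is to cut the waveform into short overlapping frames and take a log-magnitude spectrum of each. The sketch below shows that idea with NumPy only; it is not necessarily what the linked article does, and the function name, the 25 ms frame and 10 ms hop (at 16 kHz), and the synthetic 440 Hz tone are assumptions chosen for illustration. Each row of the result is one vector that could be fed to an RNN as one time step.

```python
import numpy as np


def frames_to_vectors(signal: np.ndarray, frame_len: int = 400,
                      hop: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and return one
    log-magnitude spectrum (feature vector) per frame."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spectrum + 1e-8)  # small offset avoids log(0)


# Synthetic one-second 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

features = frames_to_vectors(tone)
print(features.shape)  # (time steps, frequency bins)
```

With these settings each frequency bin spans 40 Hz, so the tone's energy concentrates in bin 11 of every frame.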
