Commits · ffeba11aff8bc9f1c3f7b214da74e44f97f456ef · OpenDAS / Torchaudio

02 Sep, 2024 1 commit
- UPDATE · ffeba11a
  mayp777 authored Sep 02, 2024
  
  ffeba11a
09 Aug, 2022 1 commit

Add NNLM support to CTC Decoder (#2528) · 03a0d68e

Caroline Chen authored Aug 09, 2022

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the path to the KenLM language model to `lm`.
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM` and pass the class to `lm`. Additionally, create a file that lists the LM words in order of LM index, one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default).
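
The word file described for custom LMs can be sketched as follows. This is a minimal illustration, not part of the PR: the helper `write_lm_word_file`, the toy `vocab` mapping, and the assumption that LM indices are contiguous and start at 0 are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_lm_word_file(word_to_index, path):
    """Write one word per line, ordered by LM index (hypothetical helper)."""
    ordered = sorted(word_to_index.items(), key=lambda kv: kv[1])
    indices = [idx for _, idx in ordered]
    # Assumption: the line position doubles as the LM index, so the indices
    # must be 0..N-1 with no gaps.
    if indices != list(range(len(indices))):
        raise ValueError("LM indices must be contiguous and start at 0")
    text = "".join(f"{word}\n" for word, _ in ordered)
    Path(path).write_text(text, encoding="utf-8")


# Toy vocabulary for illustration only; a real LM supplies its own mapping.
vocab = {"hello": 1, "<unk>": 0, "world": 2}
out_path = Path(tempfile.mkdtemp()) / "lm_words.txt"
write_lm_word_file(vocab, out_path)

lines = out_path.read_text(encoding="utf-8").splitlines()
print(lines)
```

The resulting file path is what would then be handed to the decoder alongside the custom LM.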

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

03a0d68e

26 Apr, 2022 1 commit

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experiments.

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7

97ed428d

21 Jan, 2022 1 commit

Add video test asset for streaming API (#2167) · db10bdfb

moto authored Jan 21, 2022

Summary:
Split from https://github.com/pytorch/audio/issues/2164
Add new test assets. Adding this commit separately so that
this commit message about the origin of the file is easier to find.

The original video is in the public domain per
- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html
(The YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

So, the video is modified to fit the needs for testing.
1. multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rate
4. Versions without audio or without video

```
#!/usr/bin/env bash
original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000  tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without audio / video
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```
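
One way to check the "6 in total" stream count of the merged file is to parse `ffprobe` output, sketched below. This is an illustration rather than part of the commit: the helper names are hypothetical, `probe_streams` needs `ffprobe` on `PATH`, and `sample_report` is a hand-written stand-in that assumes each intermediate file contributes one video, one audio, and one subtitle stream.

```python
import json
import subprocess
from collections import Counter


def probe_streams(path):
    """Run ffprobe on `path` and return its JSON report (needs ffprobe on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def count_stream_types(report):
    """Count streams per codec type ("video", "audio", "subtitle", ...)."""
    return Counter(s["codec_type"] for s in report.get("streams", []))


# Hand-written stand-in for an ffprobe report, not a real measurement.
sample_report = {
    "streams": [{"codec_type": t} for t in ("video", "audio", "subtitle") * 2]
}
counts = count_stream_types(sample_report)
print(dict(counts))
```

With the real asset, `count_stream_types(probe_streams("nasa_13013.mp4"))` would replace the stand-in report.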

Pull Request resolved: https://github.com/pytorch/audio/pull/2167

Reviewed By: carolineechen

Differential Revision: D33712954

Pulled By: mthrok

fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211

db10bdfb

23 Dec, 2021 2 commits

Add Python CTC decoder API (#2089) · a76b0066

Caroline Chen authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review

This PR adds Python decoder API and basic README

Pull Request resolved: https://github.com/pytorch/audio/pull/2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc

a76b0066

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

18 Nov, 2021 1 commit
- Re-sync with internal repository (#2017) · b4184dc6
  Facebook Community Bot authored Nov 18, 2021
```
Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
```
  b4184dc6
17 Nov, 2021 1 commit

Remove facebook folder in wav2vec unittests (#2015) · 2a5fe5ff

Zhaoheng Ni authored Nov 17, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2015

as titled

Reviewed By: hwangjeff, mthrok

Differential Revision: D32495691

fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a

2a5fe5ff

22 Oct, 2021 1 commit
- Refactor integration test (#1922) · 19d8f1c2
  moto authored Oct 22, 2021
```
- Make the test support other languages
- Fetch test asset on-the-fly
```
  19d8f1c2
05 Oct, 2021 1 commit
- Add HUBERT_BASE and HUBERT_ASR_LARGE pretrained models (#1821) · 358e9e93
  moto authored Oct 05, 2021
  
  358e9e93
28 Sep, 2021 1 commit

Add HuBERT model architectures (#1769) · a7854f33

moto authored Sep 28, 2021

This commit adds the following HuBERT model architectures

 - `base` (pre-training)
 - `large` (pre-training / fine-tuning)
 - `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as `Wav2Vec2Model`, it reuses the existing modules.
With these models, it is possible to
- import the pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for a downstream task.

a7854f33

22 Sep, 2021 1 commit

Update reference from master to main elsewhere (#1784) · 1b4b82e0

moto authored Sep 22, 2021

Summary: Update fairseq reference from master to main elsewhere

Reviewed By: alexeib

Differential Revision: D30938472

fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8
Co-authored-by: Diana Liskovich <dianaml@fb.com>

1b4b82e0

04 Jun, 2021 1 commit

[BC-Breaking] Remove kaldi.resample_waveform (#1555) · 30de797c

moto authored Jun 04, 2021

`torchaudio.compliance.kaldi.resample_waveform` has been replaced with `torchaudio.functional.resample`.

30de797c

01 Jun, 2021 1 commit
- Add wav2vec2 fairseq importer (#1531) · f1a0b605
  moto authored Jun 01, 2021
  
  f1a0b605
27 May, 2021 1 commit
- Add wav2vec2 HuggingFace importer (#1530) · c8239c64
  moto authored May 27, 2021
  
  c8239c64
15 Mar, 2021 1 commit
- Fix JSON Lines file extension (#1392) · 32cd700a
  moto authored Mar 15, 2021
  
  32cd700a
09 Feb, 2021 1 commit
- Add Kaldi Pitch feature (#1243) · 7ee1c46b
  moto authored Feb 09, 2021
  
  7ee1c46b
30 Dec, 2020 1 commit
- Make Dataset utility test independent of CommonVoice (#1132) · 93c3025f
  Aziz authored Dec 30, 2020
  
  93c3025f
22 Dec, 2020 1 commit
- Add format override to load and related I/O functions (#1104) · be442561
  moto authored Dec 22, 2020
  
  be442561
05 Aug, 2020 1 commit

[CI] Run unit test with non-editable installation (#845) · 9ba02d5b

moto authored Aug 04, 2020

We have been running unit tests with an editable installation (i.e. `python setup.py develop`), which caused us to miss issues like #842.

This change makes the installation in CI non-editable and changes the test directory structure so that the source code does not shadow the installed version of `torchaudio`. With a plain `pytest test`, `pytest` modifies `sys.path` and prepends the checked-out repository, which shadows the installed version.

To remedy this, the whole test suite has been moved from `./test` to `./test/torchaudio_unittest`. This gives our test code a clean module structure and allows absolute imports in each test module, which makes it possible again to run tests with `python -m unittest torchaudio_unittest/XXX.py`.

This change does not affect the regular development process (`python setup.py develop` && `pytest test`).
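
The shadowing effect described above can be reproduced with a small standalone sketch. It is an illustration only: `demo_pkg`, the two temporary directories, and the helper functions are hypothetical stand-ins for `torchaudio`, the installed location, and the checked-out repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, version):
    """Create a tiny package under `root` whose __version__ marks its origin."""
    pkg = Path(root) / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"__version__ = {version!r}\n")
    return str(root)


def import_fresh(name):
    """Import `name` from scratch, ignoring any previously cached module."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


# Two copies of the same package: one "installed", one in a "checkout".
installed = make_package(tempfile.mkdtemp(), "demo_pkg", "installed")
checkout = make_package(tempfile.mkdtemp(), "demo_pkg", "checkout")

# With only the installed location on sys.path, the installed copy is found.
sys.path.append(installed)
before = import_fresh("demo_pkg").__version__

# Prepending the checkout puts it ahead of the installed copy, so it wins.
sys.path.insert(0, checkout)
after = import_fresh("demo_pkg").__version__

print(before, after)
```

Because Python searches `sys.path` in order, whichever directory comes first supplies the module, which is why moving the tests into their own package avoids the collision.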

9ba02d5b