- 02 Sep, 2024 1 commit
-
-
mayp777 authored
-
- 09 Aug, 2022 1 commit
-
-
Caroline Chen authored
Summary: Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs. The `ctc_decoder` API is as follows - To decode with KenLM, pass in KenLM language model path to `lm` variable - To decode with custom LM, create Python class with `CTCDecoderLM` subclass, and pass in the class to `lm` variable. Additionally create a file of LM words listed in order of the LM index, with a word per line, and pass in the file to `lm_path`. - To decode without a language model, set `lm` to `None` (default) Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM. Follow ups: - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/2528 Reviewed By: mthrok Differential Revision: D38243802 Pulled By: carolineechen fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation Follow ups - Add pretrained LM support for lex free decoding - Add example in tutorial - Replace flashlight C++ source code with flashlight text submodule - [optional] fairseq compatibility test Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 21 Jan, 2022 1 commit
-
-
moto authored
Summary: Split from https://github.com/pytorch/audio/issues/2164 Add new test assets. Adding this commit separately so that this commit message about the origin of the file is easier to find. The original video is in public domain par - https://svs.gsfc.nasa.gov/13013 - https://www.nasa.gov/multimedia/guidelines/index.html (The YouTube page directly says so) - https://www.youtube.com/watch?v=6zNsc0e3Zns So, the video is modified to fit the needs for testing. 1. multiple audio/video streams 2. Non-audio/video (subtitle) streams 3. Different FPS and sampling rate 4. Ones without audio and video. ``` #!/usr/bin/env bash original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt # Fetch the original video, embed the subtitle ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y # Extract, rescale video and resample audio ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y # Merge them, retaining all the streams (6 in total) ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y # Make versions without audio / video ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_video.mp4 -y ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2167 Reviewed By: carolineechen Differential Revision: D33712954 Pulled By: mthrok fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211
-
- 23 Dec, 2021 2 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review This PR adds Python decoder API and basic README Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
- 18 Nov, 2021 1 commit
-
-
Facebook Community Bot authored
Co-authored-by:Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
-
- 17 Nov, 2021 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2015 as titled Reviewed By: hwangjeff, mthrok Differential Revision: D32495691 fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a
-
- 22 Oct, 2021 1 commit
-
-
moto authored
- Make the test support other languages - Fetch tetst asset on-the-fly
-
- 05 Oct, 2021 1 commit
-
-
moto authored
-
- 28 Sep, 2021 1 commit
-
-
moto authored
This commit adds the following HuBERT model architectures - `base` (pre-training) - `large` (pre-training / fine-tuning) - `xlarge` (pre-training / fine-tuning) Since the internal components are same as `Wav2Vec2Model`, it reuses the existing modules.. With these models, it is possible to - import the pre-trained model published by `fairseq` and TorchScript it. - fine-tune the existing model for downstream task.
-
- 22 Sep, 2021 1 commit
-
-
moto authored
Summary: Update fairseq reference from master to main elsewhere Reviewed By: alexeib Differential Revision: D30938472 fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8 Co-authored-by:Diana Liskovich <dianaml@fb.com>
-
- 04 Jun, 2021 1 commit
-
-
moto authored
`torchaudio.compliance.kaldi.resample_waveform` has been replaced with `torchaudio.funcitonal.resample`.
-
- 01 Jun, 2021 1 commit
-
-
moto authored
-
- 27 May, 2021 1 commit
-
-
moto authored
-
- 15 Mar, 2021 1 commit
-
-
moto authored
-
- 09 Feb, 2021 1 commit
-
-
moto authored
-
- 30 Dec, 2020 1 commit
-
-
Aziz authored
-
- 22 Dec, 2020 1 commit
-
-
moto authored
-
- 05 Aug, 2020 1 commit
-
-
moto authored
We have been running unit test with editable installation. (i.e. `python setup.py develop`), with which we missed issues like #842. This CC makes installation in CI non-editable, and change test directory structure so that the source code will not shadow the installed version of `torchaudio`. With simple `pytest test`, `pytest` modifies `sys.path` and prepend checked out repository, which shadows the installed version. To remedy this, the whole test suites has been moved from `./test` to `./test/torchaudio_unittest`. This adds nice module structure to our test code and we can do absolute import in each test module, which makes it possible again to run test with `python -m unittest torchaudio_unittest/XXX.py` This change does not affect the regular development process (`python setup.py develop` && `pytest test`)
-