- 30 Dec, 2021 5 commits
-
-
hwangjeff authored
Summary: * Removes redundant declaration `right_context_blocks = []`, as flagged by kobenaxie. * Adds random seed to tests, as flagged by carolineechen in other PRs. Pull Request resolved: https://github.com/pytorch/audio/pull/2091 Reviewed By: mthrok Differential Revision: D33340964 Pulled By: hwangjeff fbshipit-source-id: a9de43e28d1bae7bd4806b280717b0d822bb42fc
-
moto authored
Summary: This PR adds a `BUILD_FFMPEG` switch to the torchaudio build process so that ffmpeg-related features can be built. The flag is false by default because handling the dependencies around ffmpeg is a bit tricky, so no CI jobs or development workflows are affected. Currently, the CMake file uses `pkg-config` to find an ffmpeg installation on the system. This works fine for both conda-based installations and system-managed installations (e.g. via `apt`). In subsequent PRs, I will find a solution that works for local development and binary distributions. Pull Request resolved: https://github.com/pytorch/audio/pull/2048 Reviewed By: hwangjeff, nateanl Differential Revision: D33367260 Pulled By: mthrok fbshipit-source-id: 94517acecb62bd6d4e96d4b7cbc3ab3c2a25706c
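The sketch below shows one way an opt-in build switch like `BUILD_FFMPEG` could be read so that it stays off unless it is explicitly enabled. The commit does not say how torchaudio's setup code consumes the flag. The environment-variable mechanism, the helper name `get_build_flag`, and the accepted spellings are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Accepted spellings are an assumption for this sketch.
_TRUE_VALUES = {"1", "on", "yes", "true", "y"}
_FALSE_VALUES = {"0", "off", "no", "false", "n"}


def get_build_flag(name: str, default: bool = False,
                   env: Optional[Mapping[str, str]] = None) -> bool:
    """Return the boolean value of a build switch such as BUILD_FFMPEG.

    An unset variable falls back to ``default`` (False), so the feature is opt-in
    and builds that never set the variable are unaffected.
    """
    env = os.environ if env is None else env
    if name not in env:
        return default
    value = env[name].strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"Unexpected value for {name}: {env[name]!r}")
```

A hypothetical setup script would compile the ffmpeg-related sources only when `get_build_flag("BUILD_FFMPEG")` is true. Raising an error on unrecognized values, rather than silently treating them as false, is a design choice of this sketch.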
-
moto authored
Summary: - Introduce AudioBuffer and VideoBuffer for the different ways audio and video frames are handled - Update the way the option dictionary is passed - Remove unused AutoFrameUnref - Add SrcStreamInfo/OutputStreamInfo classes Pull Request resolved: https://github.com/pytorch/audio/pull/2113 Reviewed By: nateanl Differential Revision: D33356144 Pulled By: mthrok fbshipit-source-id: e837e84fae48baa7befd5c70599bcd2cbb61514d
-
CodemodService Bot authored
Reviewed By: zertosh Differential Revision: D33361077 fbshipit-source-id: 007db010bd38c28f597ea66f68f97b13309e878c
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add `Streamer` TorchBind. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2046. Pull Request resolved: https://github.com/pytorch/audio/pull/2047 Reviewed By: hwangjeff Differential Revision: D33355190 Pulled By: mthrok fbshipit-source-id: a3ad4c2822ed3a7ddc19b1aaca9dddabd59ce2f8
-
- 29 Dec, 2021 9 commits
-
-
hwangjeff authored
Summary: Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779). Pull Request resolved: https://github.com/pytorch/audio/pull/2090 Reviewed By: carolineechen Differential Revision: D33344772 Pulled By: hwangjeff fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf
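The pure-Python sketch below illustrates the SpecAugment bound. The time-mask width is drawn from `[0, T]`, where `T` is additionally capped at `floor(p * num_frames)`, so at most a proportion `p` of the time steps can be masked. It is not torchaudio's `TimeMasking` implementation. The function name, the list-of-lists spectrogram layout (freq x time), and the injectable `rng` are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional


def time_mask(spec: List[List[float]], time_mask_param: int, p: float = 1.0,
              mask_value: float = 0.0,
              rng: Optional[random.Random] = None) -> List[List[float]]:
    """Mask one random span of time steps in a (freq x time) spectrogram.

    The mask width is at most min(time_mask_param, floor(p * num_frames)).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng if rng is not None else random.Random()
    num_frames = len(spec[0])
    max_width = min(time_mask_param, math.floor(p * num_frames))
    width = rng.randint(0, max_width)            # inclusive upper bound
    start = rng.randint(0, num_frames - width)   # span always fits in the input
    # The same time span is masked across every frequency bin.
    return [[mask_value if start <= t < start + width else v
             for t, v in enumerate(row)]
            for row in spec]
```

With `p=1.0`, only `time_mask_param` limits the width. Smaller values of `p` matter for short utterances, where a fixed `time_mask_param` could otherwise cover most of the input.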
-
Caroline Chen authored
Summary: Additionally accept a list of tokens as CTC decoder input. This makes it possible to pass something like `bundles.get_labels()` directly into the decoder factory function instead of requiring a separate tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2112 Reviewed By: hwangjeff, nateanl, mthrok Differential Revision: D33352909 Pulled By: carolineechen fbshipit-source-id: 6d22072e34f6cd7c6f931ce4eaf294ae4cf0c5cc
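The sketch below shows how a factory function could accept either form of token input described above. A path is read as a tokens file with one token per line, and a list, such as the labels of a pretrained bundle, is used as-is. The helper name `load_tokens` and the one-token-per-line file format are assumptions. This is not the decoder's actual code.

```python
import os
from typing import List, Sequence, Union


def load_tokens(tokens: Union[str, os.PathLike, Sequence[str]]) -> List[str]:
    """Return the token list, reading it from a file if a path is given.

    Hypothetical helper: files are read as one token per line (blank lines are
    skipped); any other sequence of strings is copied into a new list.
    """
    if isinstance(tokens, (str, os.PathLike)):
        with open(tokens, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f if line.rstrip("\n")]
    return list(tokens)
```

Because a single `str` is treated as a path, a caller who wants to pass tokens directly has to pass them as a list or tuple.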
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add `Streamer` class that bundles `StreamProcessor` and handles input. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2045. Pull Request resolved: https://github.com/pytorch/audio/pull/2046 Reviewed By: carolineechen Differential Revision: D33299863 Pulled By: mthrok fbshipit-source-id: 6470cbe061057c8cb970ce7bb5692be04efb5fe9
-
hwangjeff authored
Summary: Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`. Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html Pull Request resolved: https://github.com/pytorch/audio/pull/2110 Reviewed By: carolineechen, mthrok Differential Revision: D33354116 Pulled By: hwangjeff fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add StreamProcessor class that bundles `Buffer`, `FilterGraph` and `Decoder`. Note: The API to retrieve the buffered Tensors is tentative. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2044. Pull Request resolved: https://github.com/pytorch/audio/pull/2045 Reviewed By: carolineechen Differential Revision: D33299858 Pulled By: mthrok fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed
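The toy Python model below sketches how the pieces described in these commits compose. A `Sink` bundles a `FilterGraph` and a `Buffer`, and a `StreamProcessor` feeds decoded frames into sinks. The real classes are C++ wrappers around libav structures, and their APIs are still tentative. Everything here is hypothetical: frames are plain lists of floats, decoding is an injected callable, the Tensor conversion is omitted, and letting one processor drive several sinks is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

Frame = List[float]  # stand-in for a decoded AVFrame


@dataclass
class FilterGraph:
    """Stand-in for an AVFilterGraph: applies a per-frame transform."""
    fn: Callable[[Frame], Frame] = lambda frame: frame

    def process(self, frame: Frame) -> Frame:
        return self.fn(frame)


@dataclass
class Buffer:
    """Accumulates filtered frames until retrieved (Tensor conversion omitted)."""
    frames: List[Frame] = field(default_factory=list)

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)

    def pop_all(self) -> List[Frame]:
        out, self.frames = self.frames, []
        return out


@dataclass
class Sink:
    """FilterGraph + Buffer."""
    filter_graph: FilterGraph
    buffer: Buffer = field(default_factory=Buffer)

    def push(self, frame: Frame) -> None:
        self.buffer.push(self.filter_graph.process(frame))


@dataclass
class StreamProcessor:
    """Decoder + sinks for one source stream (multiple sinks is an assumption)."""
    decode: Callable[[bytes], Iterable[Frame]]
    sinks: List[Sink]

    def process_packet(self, packet: bytes) -> None:
        for frame in self.decode(packet):
            for sink in self.sinks:
                sink.push(frame)
```

In this model, a `Streamer` would sit one level above. It would read packets from the input and route each one to the `StreamProcessor` for that packet's stream.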
-
moto authored
Summary: Add Sink class that bundles FilterGraph and Buffer. Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Pull Request resolved: https://github.com/pytorch/audio/pull/2111 Reviewed By: carolineechen Differential Revision: D33350388 Pulled By: mthrok fbshipit-source-id: 8f42c5fe4be39ef2432c51fc0d0ac72ba3f06a26
-
CodemodService Bot authored
Reviewed By: zertosh Differential Revision: D33347867 fbshipit-source-id: 7672f65392e363c0359de2d86e745782a09cf9dc
-
moto authored
Summary: ### Change list * Split the documentation of prototypes * Add a new API reference section dedicated to prototypes. * Hide the signature of the KenLMLexiconDecoder constructor. (cc carolineechen) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder * Hide the signature of the RNNT constructor. (cc hwangjeff) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT * Tweak CTC tutorial * Replace hyperlinks to API reference with backlinks * Add `progress=False` to download ### Follow-up The RNN-T decoder and the CTC decoder return their own `Hypothesis` classes. When I tried to add the CTC decoder's Hypothesis to the documentation, the build process complained that the name is ambiguous. I think the Hypothesis classes could be put inside each decoder (if TorchScript supports it), or they could be given different names; in that case, though, the interface of each Hypothesis has to be generic enough.
### Before https://pytorch.org/audio/main/prototype.html ### After https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html Pull Request resolved: https://github.com/pytorch/audio/pull/2108 Reviewed By: hwangjeff, carolineechen, nateanl Differential Revision: D33340816 Pulled By: mthrok fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
-
hwangjeff authored
Summary: Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR. Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below). https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2093 Reviewed By: mthrok Differential Revision: D33340776 Pulled By: hwangjeff fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac
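Streaming ASR with Emformer consumes input in fixed-size segments. Each segment comes with a small right context, which is a look-ahead of frames taken from the next segment. The sketch below only shows that chunking step. It does not cover the pipeline's API or the model. The function name, the use of plain lists, and zero-padding of the final look-ahead are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def stream_segments(frames: Sequence[float], segment_length: int,
                    right_context_length: int) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (segment, right_context) pairs for streaming inference.

    Each segment is followed by up to ``right_context_length`` look-ahead frames
    from the next segment; a short final look-ahead is zero-padded (an assumption).
    The last segment itself may be shorter than ``segment_length``.
    """
    if segment_length <= 0 or right_context_length < 0:
        raise ValueError("segment_length must be > 0 and right_context_length >= 0")
    for start in range(0, len(frames), segment_length):
        end = start + segment_length
        segment = list(frames[start:end])
        lookahead = list(frames[end:end + right_context_length])
        lookahead += [0.0] * (right_context_length - len(lookahead))
        yield segment, lookahead
```

Non-streaming inference corresponds to feeding the whole utterance at once. Streaming inference trades some accuracy for latency, which is bounded roughly by the segment length plus the right-context length.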
-
- 28 Dec, 2021 6 commits
-
-
Zhaoheng Ni authored
Summary: Remove it as it's already introduced in the [gallery](https://github.com/pytorch/audio/blob/main/examples/tutorials/mvdr_tutorial.py). Pull Request resolved: https://github.com/pytorch/audio/pull/2109 Reviewed By: carolineechen Differential Revision: D33341574 Pulled By: nateanl fbshipit-source-id: e5c1c8537063b9453947dc3ecafa70e9b6c74146
-
Caroline Chen authored
Summary: Demonstrate usage of the CTC beam search decoder with a lexicon constraint and KenLM support, on a LibriSpeech sample, using a pretrained wav2vec2 model. Rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html Follow-ups: - incorporate `nbest` - demonstrate customizability of different beam search parameters Pull Request resolved: https://github.com/pytorch/audio/pull/2106 Reviewed By: mthrok Differential Revision: D33340946 Pulled By: carolineechen fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7
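A lexicon- and KenLM-constrained beam search is too involved for a short example. As a swapped-in illustration, the sketch below instead shows greedy (best-path) CTC decoding, the simpler rule that beam search generalizes: take the most likely token per frame, merge consecutive repeats, then drop blanks. It is not the flashlight-based decoder. A blank at index 0 and `|` as the word delimiter are assumptions modeled on typical wav2vec2 label sets.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(emissions: Sequence[Sequence[float]], labels: Sequence[str],
                      blank: int = 0, word_delimiter: str = "|") -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    tokens: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # A repeated index only counts again if something (e.g. a blank) separates it.
        if idx != prev and idx != blank:
            tokens.append(labels[idx])
        prev = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

Beam search keeps several candidate prefixes instead of one. It can rescore them with a lexicon and a language model such as KenLM, which greedy decoding cannot do.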
-
Zhaoheng Ni authored
Summary: - Add three factory functions, `hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`, to enable training the HuBERT model from scratch. - Add a `num_classes` argument to the `hubert_pretrain_base` factory function because the base model is trained in two iterations: in the first iteration `num_cluster` is 100, and in the second iteration `num_cluster` is 500. - The model takes `waveforms`, `labels`, and `lengths` as inputs. - The model outputs the last transformer layer's embedding, `logit_m`, and `logit_u`. Pull Request resolved: https://github.com/pytorch/audio/pull/2064 Reviewed By: hwangjeff, mthrok Differential Revision: D33338587 Pulled By: nateanl fbshipit-source-id: 534bc17c576c5f344043d8ba098204b8da6e630a
-
moto authored
Summary: *Before:* https://pytorch.org/audio/main/tutorials/audio_data_augmentation_tutorial.html#effects-applied *After:* https://484994-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html#effects-applied Pull Request resolved: https://github.com/pytorch/audio/pull/2107 Reviewed By: carolineechen Differential Revision: D33337164 Pulled By: mthrok fbshipit-source-id: 20e3309f0d11d46619f516dc46d967b34f22ec95
-
moto authored
Summary: This commit updates the documentation configuration so that, if an API (function or class) is used in tutorials, links to those tutorials are automatically added to its documentation. It also adds `py:func:` roles so that it's easy to jump from tutorials to the API reference. Note: Sphinx-gallery does not need `py:func:` in order to recognize that an API is used. * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr Pull Request resolved: https://github.com/pytorch/audio/pull/2101 Reviewed By: hwangjeff Differential Revision: D33311283 Pulled By: mthrok fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288
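Sphinx-gallery can generate these tutorial links ("back-references") from the names used in the tutorial code. The fragment below sketches the kind of `conf.py` settings involved. `backreferences_dir` and `doc_module` are real Sphinx-gallery options. The directory paths are assumptions and may not match torchaudio's actual configuration.

```python
# docs/source/conf.py (fragment; the paths are assumptions)
sphinx_gallery_conf = {
    "examples_dirs": ["../../examples/tutorials"],  # tutorial sources
    "gallery_dirs": ["tutorials"],                  # generated tutorial pages
    # Write, for each documented API, a file listing the tutorials that use it.
    "backreferences_dir": "gen_modules/backreferences",
    # Only record back-references for objects from this module.
    "doc_module": ("torchaudio",),
}
```

An API page can then list the tutorials that use a given object with Sphinx-gallery's `minigallery` directive, for example `.. minigallery:: torchaudio.transforms.MVDR`.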
-
moto authored
Summary: *Before* *After* https://app.circleci.com/pipelines/github/pytorch/audio/8870/workflows/0445f1ac-ad48-412f-8045-2400d0cef4f4/jobs/482060 Pull Request resolved: https://github.com/pytorch/audio/pull/2102 Reviewed By: carolineechen Differential Revision: D33311253 Pulled By: mthrok fbshipit-source-id: 6944921a8be58a2062b66a7dfd2c7ffe8c0866c3
-
- 24 Dec, 2021 2 commits
-
-
CodemodService Bot authored
Reviewed By: zertosh Differential Revision: D33307283 fbshipit-source-id: 55a95689b8c20b17b7c882070bc3e24706c44444
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add Buffer class that is responsible for converting `AVFrame` to `Tensor`. Note: The API to retrieve the buffered Tensors is tentative. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2043. Pull Request resolved: https://github.com/pytorch/audio/pull/2044 Reviewed By: carolineechen Differential Revision: D32940553 Pulled By: mthrok fbshipit-source-id: 8b8b2222ad7b47edc17e9139420e8a71c00d726a
-
- 23 Dec, 2021 6 commits
-
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add FilterGraph class that is responsible for handling AVFilterGraph structure and the application of filters. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2042. Pull Request resolved: https://github.com/pytorch/audio/pull/2043 Reviewed By: carolineechen Differential Revision: D32940535 Pulled By: mthrok fbshipit-source-id: 231e3ad17df2d67b6c7b323e5c89e718a3d48d0d
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072, split up for easier review. This PR adds the Python decoder API and a basic README. Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
moto authored
Summary: Follow-up of https://github.com/pytorch/audio/issues/2086. The CI job that downloads and caches the third-party code daily had not been properly updated. Pull Request resolved: https://github.com/pytorch/audio/pull/2095 Reviewed By: hwangjeff Differential Revision: D33291738 Pulled By: mthrok fbshipit-source-id: 6fc61f76b35c6f032085eda9d6053eefd2a1e0a9
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2094 Reviewed By: nateanl Differential Revision: D33288439 fbshipit-source-id: 385e0e4257755dbaf143287f612e19bede189757
-
hwangjeff authored
Summary: Adds implementation of Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
-
- 22 Dec, 2021 2 commits
-
-
Joao Gomes authored
Summary: - Deprecates data utils (with a warning that they will be removed in v0.12) - Replaces all usages of `torchaudio.datasets.utils.download_url` with `torch.hub.download_url_to_file` - Replaces all MD5 hashes with SHA256 hashes. Addresses https://github.com/pytorch/audio/issues/1883 Pull Request resolved: https://github.com/pytorch/audio/pull/2073 Reviewed By: mthrok Differential Revision: D33241756 Pulled By: jdsgomes fbshipit-source-id: 49388ec5965bfc91d9a1d8d0786eeafb2969f6cf
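`torch.hub.download_url_to_file` accepts a `hash_prefix` argument and checks the downloaded file's SHA256 digest against it. The stdlib-only sketch below shows the same kind of check on a local file, for example to verify a dataset archive against a published SHA256 value. The helper names are hypothetical, and prefix matching is chosen to mirror `hash_prefix`.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected_prefix: str) -> None:
    """Raise RuntimeError unless the file's SHA256 digest starts with the prefix."""
    actual = sha256_of_file(path)
    if not actual.startswith(expected_prefix.lower()):
        raise RuntimeError(
            f"SHA256 mismatch for {path}: expected {expected_prefix}..., got {actual}")
```

Reading the file in chunks keeps memory use constant for large archives. SHA256 is used instead of MD5 because MD5 is no longer collision-resistant.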
-
Joao Gomes authored
Summary: After discussing with Moto Hira, we decided to revert the previously introduced linting exemptions, in order to keep the entire audio project as consistently formatted as possible and to reduce the time we spend discussing formatting. Pull Request resolved: https://github.com/pytorch/audio/pull/2087 Reviewed By: mthrok Differential Revision: D33236949 Pulled By: jdsgomes fbshipit-source-id: e13079f532c4534d8a168059b0ded6fa375fdecf
-
- 21 Dec, 2021 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2092 Reviewed By: carolineechen Differential Revision: D33169110 fbshipit-source-id: e422ad93efe50b91f1ac5d572dc82768c1000c05
-
moto authored
Summary: 1. Reorder the audio display so that audio clips are playable from the browser in the docs 2. Add links to the function documentation https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2082 Reviewed By: carolineechen Differential Revision: D33227725 Pulled By: mthrok fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab
-
moto authored
Summary: ## Bug description When 24-bit-per-sample audio is loaded via a file-like object, the loaded Tensor is wrong. Loading the same audio from a local file works fine. ## Cause of the bug The core of sox's decoding mechanism is the `sox_read` function, one of whose parameters is the maximum number of samples to decode from the given buffer. https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f The `sox_read` function is called in the callback of the so-called `drain` effect, and this callback receives the output buffer and its size in bytes. The previous implementation passed this size value to `sox_read` as the maximum number of samples to read. Since the buffer size in bytes is larger than the number of samples that fit in the buffer, `sox_read` always consumed the entire buffer. (This behavior is harmless except when the input is 24 bits per sample and read from a file-like object.) When the input is read from a file-like object, new data are fetched inside the drain callback via Python's `read` method and loaded into a fixed-size memory region. The size of this memory region can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096. If the input format is 24 bits per sample, the end of the memory region does not necessarily correspond to the end of a valid sample. When `sox_read` consumes all the data in the buffer region, the data at the end introduces unexpected values. This causes the aforementioned bug. ## Fix Pass a proper (better estimated) maximum number of decodable samples to `sox_read`. Pull Request resolved: https://github.com/pytorch/audio/pull/2084 Reviewed By: carolineechen Differential Revision: D33236947 Pulled By: mthrok fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
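The alignment problem can be reproduced without sox. A 24-bit sample occupies 3 bytes, and 8096 = 3 × 2698 + 2, so a buffer of that size ends two bytes into a sample. The sketch below decodes signed 24-bit little-endian PCM arriving in fixed-size chunks. It decodes only whole samples per chunk and carries the leftover bytes over to the next chunk, which is the same principle as limiting `sox_read` to the number of samples the data can actually hold. It is an illustration, not torchaudio's C++ fix. Little-endian byte order and the helper names are assumptions.

```python
from typing import Iterable, Iterator, List


def decode_s24le(chunks: Iterable[bytes]) -> Iterator[int]:
    """Yield signed 24-bit little-endian samples from arbitrarily sized chunks.

    Only whole 3-byte samples are decoded; a trailing partial sample is kept and
    prepended to the next chunk instead of being decoded as garbage.
    """
    pending = b""
    for chunk in chunks:
        data = pending + chunk
        usable = len(data) - len(data) % 3  # bytes covering whole samples only
        for i in range(0, usable, 3):
            yield int.from_bytes(data[i:i + 3], "little", signed=True)
        pending = data[usable:]
    if pending:
        raise ValueError(f"stream ended with {len(pending)} stray byte(s)")


def split(data: bytes, size: int) -> List[bytes]:
    """Cut a byte string into fixed-size chunks, like a fixed read buffer."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Consuming every byte of each chunk would instead misinterpret the two trailing bytes of an 8096-byte chunk as part of a sample. That is the kind of corruption described above.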
-
- 20 Dec, 2021 3 commits
-
-
moto authored
Summary: Previously, sox-related third-party source code was archived at `third_party/sox/archives`. Recently, KenLM-related third-party source code was added, and it is archived at `third_party/archives`. This PR changes the sox archive location to `third_party/archives`, so that all the archives are cached in the same location. Pull Request resolved: https://github.com/pytorch/audio/pull/2086 Reviewed By: carolineechen Differential Revision: D33236927 Pulled By: mthrok fbshipit-source-id: 2f2aa5f4b386fefb46d7c98f7179c04995219f3c
-
Joao Gomes authored
Summary: The URLs for this dataset seem to have changed, so I am updating them to the new location. Pull Request resolved: https://github.com/pytorch/audio/pull/2074 Reviewed By: mthrok Differential Revision: D33234996 Pulled By: jdsgomes fbshipit-source-id: e09c35a122e8227fcce7fa97aeeeea312cb89173
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2085 Reviewed By: carolineechen Differential Revision: D33235225 Pulled By: mthrok fbshipit-source-id: 47fe9ec4c93a26322b3a362202ddd3c4654c3f8c
-
- 18 Dec, 2021 1 commit
-
-
moto authored
Summary: Once all the C++ code from https://github.com/pytorch/audio/issues/2072 has been added, this commit will enable decoder/KenLM integration in the build process. Pull Request resolved: https://github.com/pytorch/audio/pull/2078 Reviewed By: carolineechen Differential Revision: D33198183 Pulled By: mthrok fbshipit-source-id: 9d7fa76151d06fbbac3785183c7c2ff9862d3128
-
- 17 Dec, 2021 3 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072, split up for easier review. Add C++ files for binding CTC decoder functionality to Python. Note: the code here will not be compiled until the build process is changed. Pull Request resolved: https://github.com/pytorch/audio/pull/2079 Reviewed By: mthrok Differential Revision: D33196286 Pulled By: carolineechen fbshipit-source-id: 9fe4a8635b60ebfb594918bab00f5c3dccf96bd2
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072, split up for easier review. Add C++ files from [flashlight](https://github.com/flashlight/flashlight) that are needed for building the CTC decoder with Lexicon and KenLM support. Note: the code here will not be compiled until the build process is changed (future PR). Pull Request resolved: https://github.com/pytorch/audio/pull/2075 Reviewed By: mthrok Differential Revision: D33186825 Pulled By: carolineechen fbshipit-source-id: 5b69eea7634f3fae686471d988422942bb784cd9
-
moto authored
Summary: Add KenLM and its dependencies required for static build (`zlib`, `bzip2`, `lzma` and `boost-thread`). KenLM and its dependencies are built, but since no corresponding code on the torchaudio side is changed, the resulting torchaudio extension module is unchanged (therefore, as long as the build passes on CI, this PR should be good to go). Pull Request resolved: https://github.com/pytorch/audio/pull/2076 Reviewed By: carolineechen Differential Revision: D33189980 Pulled By: mthrok fbshipit-source-id: 6096113128b939f3cf70990c99aacc4aaa954584
-