research/audioset/yamnet/features.py · 9b179e8ecd21d236b2e814d1a67f32e23387b67b · ModelZoo / ResNet50_tensorflow

Input/Output tweaks for YAMNet and VGGish. (#9092) · 9b179e8e

Manoj Plakal authored Aug 12, 2020

* Input/Output tweaks for YAMNet and VGGish.

- Waveform input for YAMNet is now padded so that we get at least
  one patch of log mel spectrogram. The VGGish TF-Hub exporter
  uses YAMNet's feature computation so the VGGish export will
  also pad waveform input similarly.
- Added a 1024-D embedding output to YAMNet so we now produce
  predicted scores, log mel spectrogram features, and embeddings,
  to satisfy a variety of uses: class prediction, acoustic
  feature visualization, semantic feature extraction.
- Simplified usage of YAMNet in inference mode. Instead of trying
  to work around implicit batch size issues in the Model.predict()
  API, we simply __call__() the Model.
- Switched inference.py to TF 2 and Eager execution.
- Updated the visualization notebook: now uses TF2/Eager and
  can be loaded and run in Google Colab.

* Responded to Dan's comments in https://github.com/tensorflow/models/pull/9092

- Merged spectrogram computation and framing into a single function
  that returns both spectrogram and framed features.
- Extended waveform padding to pad up to an integral number of hops
  in addition to the final STFT analysis window.
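
The padding rule described above can be sketched roughly as follows. This is
a NumPy illustration, not the code in features.py. The constants (16 kHz
audio, 25 ms STFT window, 10 ms STFT hop, 96-frame patches hopped by 48
frames) and the helper names pad_waveform and num_patches are assumptions
made for the example.

```python
import numpy as np

# Assumed framing parameters, expressed in samples / frames for clarity.
STFT_WINDOW = 400        # samples per STFT analysis window (25 ms at 16 kHz)
STFT_HOP = 160           # samples between successive STFT windows (10 ms)
PATCH_FRAMES = 96        # spectrogram frames per patch (0.96 s)
PATCH_HOP_FRAMES = 48    # spectrogram frames between patches (0.48 s)


def pad_waveform(waveform):
    """Zero-pads a 1-D waveform so it yields a whole number of patches."""
    # One patch needs PATCH_FRAMES STFT frames; the last of those frames
    # still needs a full analysis window of samples.
    min_samples = (PATCH_FRAMES - 1) * STFT_HOP + STFT_WINDOW
    hop_samples = PATCH_HOP_FRAMES * STFT_HOP

    n = len(waveform)
    # Samples beyond the first patch are rounded up to whole patch hops.
    extra = max(n - min_samples, 0)
    num_hops = -(-extra // hop_samples)  # integer ceiling division
    target = min_samples + num_hops * hop_samples
    return np.pad(waveform, (0, target - n))  # pads the end with zeros


def num_patches(num_samples):
    """Counts the patches a waveform of the given length produces."""
    num_frames = 1 + (num_samples - STFT_WINDOW) // STFT_HOP
    return 1 + (num_frames - PATCH_FRAMES) // PATCH_HOP_FRAMES


short = pad_waveform(np.zeros(100))      # much shorter than one patch
longer = pad_waveform(np.zeros(16000))   # one second of audio
print(len(short), num_patches(len(short)))
print(len(longer), num_patches(len(longer)))
```

Under these assumptions, an input shorter than one patch is padded to exactly
one patch, and a longer input is padded so that the final partial hop becomes
a complete patch instead of being dropped.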


features.py 4.57 KB
