-
Manoj Plakal authored
* Input/Output tweaks for YAMNet and VGGish.

- Waveform input for YAMNet is now padded so that we get at least one patch of log mel spectrogram. The VGGish TF-Hub exporter uses YAMNet's feature computation, so the VGGish export will also pad waveform input similarly.
- Added a 1024-D embedding output to YAMNet so we now produce predicted scores, log mel spectrogram features, and embeddings, to satisfy a variety of uses: class prediction, acoustic feature visualization, semantic feature extraction.
- Simplified usage of YAMNet in inference mode. Instead of trying to work around implicit batch size issues in the Model.predict() API, we simply __call__() the Model.
- Switched inference.py to TF 2 and Eager execution.
- Updated the visualization notebook: now uses TF2/Eager and can be loaded and run in Google Colab.

* Responded to Dan's comments in https://github.com/tensorflow/models/pull/9092

- Merged spectrogram computation and framing into a single function that returns both spectrogram and framed features.
- Extended waveform padding to pad up to an integral number of hops in addition to the final STFT analysis window.
9b179e8e
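
The waveform padding described in this commit can be illustrated with a minimal NumPy sketch. This is not the code from the commit: the function name `pad_waveform_sketch`, its signature, and the sample counts used below (15600 samples for the first patch including the final STFT analysis window, 7680 samples per patch hop) are assumptions chosen for illustration only. The idea is to zero-pad so that there is always at least one full patch, and so that any audio past the first patch is rounded up to a whole number of hops.

```python
import numpy as np


def pad_waveform_sketch(waveform, min_samples, hop_samples):
    """Zero-pad a 1-D waveform (hypothetical helper, not the commit's code).

    The result has length min_samples + k * hop_samples for the smallest
    integer k >= 0 that still covers every input sample.
    """
    num_samples = len(waveform)
    # Samples that extend past the first patch (0 if the input is shorter).
    extra = max(0, num_samples - min_samples)
    # Ceiling division: number of whole hops needed to cover `extra`.
    num_hops = -(-extra // hop_samples)
    target = min_samples + num_hops * hop_samples
    # Pad only at the end, with zeros (np.pad's default constant mode).
    return np.pad(waveform, (0, target - num_samples))


# Assumed illustrative values, not taken from the commit:
MIN_SAMPLES = 15600  # first patch plus final STFT window, in samples
HOP_SAMPLES = 7680   # patch hop, in samples

short = pad_waveform_sketch(np.ones(1000), MIN_SAMPLES, HOP_SAMPLES)
longer = pad_waveform_sketch(np.ones(15601), MIN_SAMPLES, HOP_SAMPLES)
print(len(short), len(longer))
```

With these assumed values, a 1000-sample input is padded up to one full patch, and an input one sample longer than a patch is padded up to one patch plus one hop.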