Input/Output tweaks for YAMNet and VGGish. (#9092)
* Input/Output tweaks for YAMNet and VGGish.
  - Waveform input for YAMNet is now padded so that we get at least one patch of log mel spectrogram (see the padding sketch after this list). The VGGish TF-Hub exporter uses YAMNet's feature computation, so the VGGish export will also pad waveform input similarly.
  - Added a 1024-D embedding output to YAMNet, so we now produce predicted scores, log mel spectrogram features, and embeddings to satisfy a variety of uses: class prediction, acoustic feature visualization, and semantic feature extraction.
  - Simplified usage of YAMNet in inference mode: instead of trying to work around implicit batch size issues in the Model.predict() API, we simply __call__() the Model (see the inference sketch after this list).
  - Switched inference.py to TF 2 and Eager execution.
  - Updated the visualization notebook: it now uses TF 2/Eager and can be loaded and run in Google Colab.
* Responded to DAn's comments in https://github.com/tensorflow/models/pull/9092
  - Merged spectrogram computation and framing into a single function that returns both the spectrogram and the framed features.
  - Extended waveform padding to pad up to an integral number of hops in addition to the final STFT analysis window.
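A minimal sketch of the padding behavior described above, in plain NumPy with illustrative parameter values (the actual parameter names, values, and TF-op implementation live in the yamnet/ code and may differ): the waveform is first padded to the minimum length that yields one full patch of log mel spectrogram frames, and any audio beyond that is rounded up to a whole number of patch hops.

```python
import numpy as np

# Illustrative hyperparameters (assumed values; the real ones are defined in
# the YAMNet params file and applied with TF ops inside the model).
SAMPLE_RATE = 16000
STFT_WINDOW_SECONDS = 0.025
STFT_HOP_SECONDS = 0.010
PATCH_WINDOW_SECONDS = 0.96
PATCH_HOP_SECONDS = 0.48


def pad_waveform(waveform):
  """Pads a 1-D waveform so at least one log mel spectrogram patch is produced,
  then rounds any remaining audio up to a whole number of patch hops."""
  # Minimum samples for one full patch: one patch window of spectrogram frames
  # plus the tail of the final STFT analysis window.
  min_seconds = PATCH_WINDOW_SECONDS + STFT_WINDOW_SECONDS - STFT_HOP_SECONDS
  min_samples = int(np.ceil(min_seconds * SAMPLE_RATE))
  num_samples = waveform.shape[0]
  padding = max(0, min_samples - num_samples)

  # Audio beyond the first patch is rounded up to an integral number of patch
  # hops so no trailing samples are silently dropped.
  hop_samples = int(round(PATCH_HOP_SECONDS * SAMPLE_RATE))
  extra = max(0, num_samples - min_samples)
  if extra % hop_samples:
    padding += hop_samples - extra % hop_samples

  return np.pad(waveform, (0, padding), mode='constant')
```

With these illustrative settings, a 0.5 s clip would be padded out to 0.975 s (exactly one patch), and a 1.5 s clip would be padded to 0.975 + 2 × 0.48 = 1.935 s, yielding three patches.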
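And a minimal inference sketch under TF 2 / eager execution, calling the Keras Model directly rather than going through Model.predict(). The module, function, and file names used here (params, yamnet_frames_model, class_names, yamnet.h5, yamnet_class_map.csv) are taken from the yamnet/ directory layout and should be treated as assumptions, as should the output ordering.

```python
import numpy as np
import soundfile as sf

import params                   # YAMNet hyperparameters (assumed module name).
import yamnet as yamnet_model   # Model-building code from the yamnet/ directory.

# Build the frames model and load the released weights (file names assumed).
model = yamnet_model.yamnet_frames_model(params)
model.load_weights('yamnet.h5')
class_names = yamnet_model.class_names('yamnet_class_map.csv')

# A mono 16 kHz waveform in [-1.0, +1.0]; 'input.wav' is a placeholder. Short
# clips are padded internally so at least one patch of log mel spectrogram
# (and hence one score frame) is produced.
waveform, sr = sf.read('input.wav', dtype='float32')

# Calling the Model directly avoids Model.predict()'s implicit batching and
# returns all three outputs eagerly (assumed order: scores, embeddings,
# log mel spectrogram).
scores, embeddings, log_mel_spectrogram = model(waveform)

print(scores.shape)                # (num_patches, num_classes) per-patch scores
print(embeddings.shape)            # (num_patches, 1024) embeddings
print(log_mel_spectrogram.shape)   # (num_frames, num_mel_bands) features
print('Top class:', class_names[np.mean(scores, axis=0).argmax()])
```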