    Input/Output tweaks for YAMNet and VGGish. (#9092) · 9b179e8e
    Manoj Plakal authored
    * Input/Output tweaks for YAMNet and VGGish.
    
    - Waveform input for YAMNet is now padded so that we get at least
      one patch of log mel spectrogram. The VGGish TF-Hub exporter
      uses YAMNet's feature computation so the VGGish export will
      also pad waveform input similarly.
    - Added a 1024-D embedding output to YAMNet so we now produce
      predicted scores, log mel spectrogram features, and embeddings,
      to satisfy a variety of uses: class prediction, acoustic
      feature visualization, semantic feature extraction.
    - Simplified usage of YAMNet in inference mode. Instead of trying
      to work around implicit batch size issues in the Model.predict()
      API, we simply __call__() the Model.
    - Switched inference.py to TF 2 and Eager execution.
    - Updated the visualization notebook: now uses TF2/Eager and
      can be loaded and run in Google Colab.
    
    * Responded to Dan's comments in https://github.com/tensorflow/models/pull/9092
    
    - Merged spectrogram computation and framing into a single function
      that returns both spectrogram and framed features.
    - Extended waveform padding to pad up to an integral number of hops
      in addition to the final STFT analysis window.
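
    The padding rule described above can be sketched in plain Python. This is
    an illustration, not the code in features.py. The constants (16 kHz audio,
    400-sample STFT window, 160-sample STFT hop, 96-frame patches, 48-frame
    patch hop) and the helper names padded_length and num_patches are
    assumptions made for the example.

```python
# Assumed parameters, expressed in samples and frames (hypothetical values).
STFT_WINDOW = 400        # samples per STFT analysis window
STFT_HOP = 160           # samples between successive STFT windows
PATCH_FRAMES = 96        # spectrogram frames per patch
PATCH_HOP_FRAMES = 48    # spectrogram frames between successive patches

# Shortest waveform that yields one full patch: the first STFT window plus
# one STFT hop for each of the remaining frames in the patch.
MIN_SAMPLES = STFT_WINDOW + (PATCH_FRAMES - 1) * STFT_HOP
# Waveform samples covered by one patch hop.
PATCH_HOP_SAMPLES = PATCH_HOP_FRAMES * STFT_HOP


def padded_length(num_samples):
    """Length after padding to one patch plus a whole number of patch hops."""
    if num_samples <= MIN_SAMPLES:
        return MIN_SAMPLES
    extra = num_samples - MIN_SAMPLES
    num_hops = -(-extra // PATCH_HOP_SAMPLES)  # integer ceiling division
    return MIN_SAMPLES + num_hops * PATCH_HOP_SAMPLES


def num_patches(num_samples):
    """Number of full patches obtained from a waveform of this length."""
    if num_samples < STFT_WINDOW:
        return 0
    frames = 1 + (num_samples - STFT_WINDOW) // STFT_HOP
    if frames < PATCH_FRAMES:
        return 0
    return 1 + (frames - PATCH_FRAMES) // PATCH_HOP_FRAMES
```

    Under these assumptions, any input shorter than one patch is padded up to
    exactly one patch, and a longer input is padded so that its trailing
    samples fill out one more patch hop instead of being dropped.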
features.py 4.57 KB