Cleaned up dependences and install instructions for vggish and yamnet. (#8059)

- Made code work with either TF v1.x or TF v2.x, while explicitly enabling v1.x behavior.
- Pulled slim from tf_slim package instead of through tensorflow contrib. Note that tf_slim itself uses tensorflow contrib so it requires using TF v1.x for now (referenced a relevant PR which should remove this limitation once it gets merged).
- Removed all mention of scipy. Switched wav writing to soundfile.
- Switched package name to soundfile instead of pysoundfile. The former is the newer name.
- Updated installation instructions for both vggish and yamnet to reflect these changes.
- Tested new installation procedures. vggish works with TF v1.15, yamnet works with TF v1.15.0 as well as TF v2.1.0.

Commit 831281ce authored Jan 18, 2020 by Manoj Plakal Committed by Dan Ellis Jan 18, 2020
7 changed files
--- a/research/audioset/vggish/README.md
+++ b/research/audioset/vggish/README.md
@@ -15,19 +15,18 @@ the released embedding features.
 VGGish depends on the following Python packages:
 * [`numpy`](http://www.numpy.org/)
-* [`scipy`](http://www.scipy.org/)
 * [`resampy`](http://resampy.readthedocs.io/en/latest/)
-* [`tensorflow`](http://www.tensorflow.org/)
+* [`tensorflow`](http://www.tensorflow.org/) (currently, only TF v1.x)
+* [`tf_slim`](https://github.com/google-research/tf-slim)
 * [`six`](https://pythonhosted.org/six/)
-* [`pysoundfile`](https://pysoundfile.readthedocs.io/)
+* [`soundfile`](https://pysoundfile.readthedocs.io/)
 These are all easily installable via, e.g., `pip install numpy` (as in the
-example command sequence below).
+sample installation session below).
-Any reasonably recent version of these packages should work. TensorFlow should
+Any reasonably recent version of these packages should work. Note that we currently only support
-be at least version 1.0.  We have tested that everything works on Ubuntu and
+TensorFlow v1.x due to a [`tf_slim` limitation](https://github.com/google-research/tf-slim/pull/1).
-Windows 10 with Python 3.6.6, Numpy v1.15.4, SciPy v1.1.0, resampy v0.2.1,
+TensorFlow v1.15 (the latest version as of Jan 2020) has been tested to work.
-TensorFlow v1.3.0, Six v1.11.0 and PySoundFile 0.9.0.
 VGGish also requires downloading two data files:
@@ -57,14 +56,11 @@ Here's a sample installation and test session:
 #   $ deactivate
 # Within the virtual environment, do not use 'sudo'.
-# Upgrade pip first.
+# Upgrade pip first. Also make sure wheel is installed.
-$ sudo python -m pip install --upgrade pip
+$ sudo python -m pip install --upgrade pip wheel
-# Install dependences. Resampy needs to be installed after NumPy and SciPy
+# Install all dependences.
-# are already installed.
+$ sudo pip install numpy resampy tensorflow==1.15 tf_slim six soundfile
-$ sudo pip install numpy scipy soundfile
-$ sudo pip install resampy six
-$ sudo pip install tensorflow==1.14
 # Clone TensorFlow models repo into a 'models' directory.
 $ git clone https://github.com/tensorflow/models.git
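As a quick sanity check after an installation session like the one above, the following sketch reports which of the VGGish dependencies listed in the README cannot be found in the current environment. The helper name `missing_packages` and the `VGGISH_DEPS` tuple are hypothetical and not part of the repository; only the package names come from the README, and the sketch assumes each import name matches the name used in the `pip install` command.

```python
import importlib.util

# Import names of the packages listed in the VGGish README (assumed to match
# the pip package names used in the install command).
VGGISH_DEPS = ("numpy", "resampy", "tensorflow", "tf_slim", "six", "soundfile")


def missing_packages(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(VGGISH_DEPS)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All VGGish dependencies were found.")
```

This only checks that each package can be located. It does not check versions, so it would not catch a TensorFlow v2.x install where VGGish currently needs v1.x.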

--- a/research/audioset/vggish/vggish_inference_demo.py
+++ b/research/audioset/vggish/vggish_inference_demo.py
@@ -47,9 +47,10 @@ Usage:
 from __future__ import print_function
 import numpy as np
-from scipy.io import wavfile
 import six
-import tensorflow as tf
+import soundfile
+import tensorflow.compat.v1 as tf
+tf.disable_v2_behavior()
 import vggish_input
 import vggish_params
@@ -93,7 +94,7 @@ def main(_):
    # Convert to signed 16-bit samples.
    samples = np.clip(x * 32768, -32768, 32767).astype(np.int16)
    wav_file = six.BytesIO()
-    wavfile.write(wav_file, sr, samples)
+    soundfile.write(wav_file, samples, sr, format='WAV', subtype='PCM_16')
    wav_file.seek(0)
  examples_batch = vggish_input.wavfile_to_examples(wav_file)
  print(examples_batch)

--- a/research/audioset/vggish/vggish_input.py
+++ b/research/audioset/vggish/vggish_input.py
@@ -27,7 +27,7 @@ try:
  def wav_read(wav_file):
    wav_data, sr = sf.read(wav_file, dtype='int16')
    return wav_data, sr
 except ImportError:
  def wav_read(wav_file):

--- a/research/audioset/vggish/vggish_slim.py
+++ b/research/audioset/vggish/vggish_slim.py
@@ -31,10 +31,10 @@ https://github.com/tensorflow/models/blob/master/research/slim/nets/vgg.py
 """
 import tensorflow.compat.v1 as tf
-from tensorflow.contrib import slim as contrib_slim
+tf.disable_v2_behavior()
-import vggish_params as params
+import tf_slim as slim
-slim = contrib_slim
+import vggish_params as params
 def define_vggish_slim(training=False):

--- a/research/audioset/vggish/vggish_smoke_test.py
+++ b/research/audioset/vggish/vggish_smoke_test.py
@@ -32,7 +32,8 @@ Usage:
 from __future__ import print_function
 import numpy as np
-import tensorflow as tf
+import tensorflow.compat.v1 as tf
+tf.disable_v2_behavior()
 import vggish_input
 import vggish_params

--- a/research/audioset/vggish/vggish_train_demo.py
+++ b/research/audioset/vggish/vggish_train_demo.py
@@ -48,14 +48,15 @@ from __future__ import print_function
 from random import shuffle
 import numpy as np
-import tensorflow as tf
+import tensorflow.compat.v1 as tf
+tf.disable_v2_behavior()
+import tf_slim as slim
 import vggish_input
 import vggish_params
 import vggish_slim
 flags = tf.app.flags
-slim = tf.contrib.slim
 flags.DEFINE_integer(
    'num_batches', 30,

--- a/research/audioset/yamnet/README.md
+++ b/research/audioset/yamnet/README.md
@@ -13,7 +13,6 @@ for applying the model to input sound files.
 YAMNet depends on the following Python packages:
 * [`numpy`](http://www.numpy.org/)
-* [`scipy`](http://www.scipy.org/)
 * [`resampy`](http://resampy.readthedocs.io/en/latest/)
 * [`tensorflow`](http://www.tensorflow.org/)
 * [`pysoundfile`](https://pysoundfile.readthedocs.io/)
@@ -22,9 +21,9 @@ These are all easily installable via, e.g., `pip install numpy` (as in the
 example command sequence below).
 Any reasonably recent version of these packages should work. TensorFlow should
-be at least version 1.8 to ensure Keras support is included.  We have tested
+be at least version 1.8 to ensure Keras support is included. Note that while
-that everything works on Ubuntu and MacOS with Python 3.7.2, Numpy v1.15.4,
+the code works fine with TensorFlow v1.x or v2.x, we explicitly enable v1.x
-SciPy v1.1.0, resampy v0.2.1, TensorFlow v1.14.0, and PySoundFile 0.9.0.
+behavior.
 YAMNet also requires downloading the following data file:
@@ -38,13 +37,11 @@ runs some synthetic signals through the model and checks the outputs.
 Here's a sample installation and test session:
 ```shell
-# Upgrade pip first.
+# Upgrade pip first. Also make sure wheel is installed.
-python -m pip install --upgrade pip
+python -m pip install --upgrade pip wheel
-# Install dependences. Resampy needs to be installed after NumPy and SciPy
+# Install dependences.
-# are already installed.
+pip install numpy resampy tensorflow soundfile
-pip install numpy scipy
-pip install resampy tensorflow soundfile
 # Clone TensorFlow models repo into a 'models' directory.
 git clone https://github.com/tensorflow/models.git