Merge pull request #5722 from SNeugber/master

Switching to more robust pysoundfile for reading wav files

Commit d299118e authored Nov 20, 2018 by Manoj Plakal; committed by GitHub Nov 20, 2018.

Showing with 7 additions and 5 deletions

research/audioset/README.md +4 -3

research/audioset/vggish_input.py +3 -2

--- a/research/audioset/README.md
+++ b/research/audioset/README.md
@@ -49,14 +49,15 @@ VGGish depends on the following Python packages:
 * [`resampy`](http://resampy.readthedocs.io/en/latest/)
 * [`tensorflow`](http://www.tensorflow.org/)
 * [`six`](https://pythonhosted.org/six/)
+* [`pysoundfile`](https://pysoundfile.readthedocs.io/)
 These are all easily installable via, e.g., `pip install numpy` (as in the
 example command sequence below).
 Any reasonably recent version of these packages should work. TensorFlow should
-be at least version 1.0.  We have tested with Python 2.7.6 and 3.4.3 on an
+be at least version 1.0.  We have tested that everything works on Ubuntu and
-Ubuntu-like system with NumPy v1.13.1, SciPy v0.19.1, resampy v0.1.5, TensorFlow
+Windows 10 with Python 3.6.6, Numpy v1.15.4, SciPy v1.1.0, resampy v0.2.1,
-v1.2.1, and Six v1.10.0.
+TensorFlow v1.3.0, Six v1.11.0 and PySoundFile 0.9.0.
 VGGish also requires downloading two data files:

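The README lists the versions it was tested with rather than hard minimums (apart from TensorFlow at least 1.0). One way to compare a local setup against that list is a small script that imports each package and reads its `__version__`. This is a sketch: the helper name `report_versions` is hypothetical, and it assumes each package exposes `__version__`, falling back to `'unknown'` when it does not. pysoundfile is imported as `soundfile`, matching `vggish_input.py`.

```python
import importlib

# Import names for the packages listed in the README (hypothetical list for
# this sketch); pysoundfile is imported as `soundfile`.
REQUIRED = ['numpy', 'scipy', 'resampy', 'tensorflow', 'six', 'soundfile']


def report_versions(modules=REQUIRED):
    """Return {module name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            # Package is missing from this environment.
            versions[name] = None
            continue
        # Not every module defines __version__; report 'unknown' in that case.
        versions[name] = getattr(mod, '__version__', 'unknown')
    return versions
```

Printing `report_versions()` gives a quick side-by-side against the tested versions; a `None` entry means that package still needs to be installed with `pip`.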
--- a/research/audioset/vggish_input.py
+++ b/research/audioset/vggish_input.py
@@ -17,11 +17,12 @@
 import numpy as np
 import resampy
-from scipy.io import wavfile
 import mel_features
 import vggish_params
+import soundfile as sf
 def waveform_to_examples(data, sample_rate):
  """Converts audio waveform into an array of examples for VGGish.
@@ -80,7 +81,7 @@ def wavfile_to_examples(wav_file):
  Returns:
    See waveform_to_examples.
  """
-  sr, wav_data = wavfile.read(wav_file)
+  wav_data, sr = sf.read(wav_file, dtype='int16')
  assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
  samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
  return waveform_to_examples(samples, sr)