Update squim tutorial (#3313)

Summary: Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313
Reviewed By: hwangjeff
Differential Revision: D45620311
Pulled By: nateanl
fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557

Commit 05ef7dc6 authored May 05, 2023 by Zhaoheng Ni Committed by Facebook GitHub Bot May 05, 2023

Showing with 41 additions and 0 deletions

examples/tutorials/squim_tutorial.py +41 -0

--- a/examples/tutorials/squim_tutorial.py
+++ b/examples/tutorials/squim_tutorial.py
@@ -357,3 +357,44 @@ print(f"Estimated MOS for distorted speech at {snr_dbs[0]}dB is MOS: {mos[0]}")
 mos = subjective_model(WAVEFORM_DISTORTED[1:2, :], WAVEFORM_NMR)
 print(f"Estimated MOS for distorted speech at {snr_dbs[1]}dB is MOS: {mos[0]}")
+######################################################################
+# 8. Comparison with ground truths and baselines
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#
+# Visualizing the metrics estimated by the ``SquimObjective`` and
+# ``SquimSubjective`` models can help users better understand how the
+# models can be applied in real-world scenarios. The graph below shows
+# scatter plots for three different systems: MOSA-Net [1], AMSA [2], and
+# the ``SquimObjective`` model, where the y-axis represents the estimated
+# STOI, PESQ, and Si-SDR scores, and the x-axis represents the
+# corresponding ground truth.
+#
+# .. image:: https://download.pytorch.org/torchaudio/tutorial-assets/objective_plot.png
+#    :width: 500px
+#    :align: center
+#
+# [1] Zezario, Ryandhimas E., Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh,
+# Hsin-Min Wang, and Yu Tsao. “Deep learning-based non-intrusive
+# multi-objective speech assessment model with cross-domain features.”
+# IEEE/ACM Transactions on Audio, Speech, and Language Processing 31
+# (2022): 54-70.
+#
+# [2] Dong, Xuan, and Donald S. Williamson. “An attention enhanced
+# multi-task model for objective speech assessment in real-world
+# environments.” In ICASSP 2020-2020 IEEE International Conference on
+# Acoustics, Speech and Signal Processing (ICASSP), pp. 911-915. IEEE,
+# 2020.
+#
+######################################################################
+# The graph below shows a scatter plot for the ``SquimSubjective`` model,
+# where the y-axis represents the estimated MOS score, and the x-axis
+# represents the corresponding ground truth.
+#
+# .. image:: https://download.pytorch.org/torchaudio/tutorial-assets/subjective_plot.png
+#    :width: 500px
+#    :align: center
+#
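
The scatter plots added in this commit compare estimated scores against ground truth visually. As a rough numeric companion, the sketch below summarizes the same kind of agreement with a Pearson correlation and a mean absolute error. It is not part of the commit or the tutorial: the helper names and the sample values are hypothetical, chosen only for illustration, and it uses only the Python standard library.

```python
import math


def pearson_correlation(estimated, reference):
    """Pearson correlation between estimated and ground-truth scores."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    # Un-normalized covariance and variances (the 1/n factors cancel out).
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_e * var_r)


def mean_absolute_error(estimated, reference):
    """Average absolute gap between estimated and ground-truth scores."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical STOI values, NOT taken from the tutorial or the plots.
estimated_stoi = [0.62, 0.75, 0.81, 0.90, 0.95]
reference_stoi = [0.60, 0.78, 0.80, 0.92, 0.97]

print(f"Pearson correlation: {pearson_correlation(estimated_stoi, reference_stoi):.3f}")
print(f"Mean absolute error: {mean_absolute_error(estimated_stoi, reference_stoi):.3f}")
```

A correlation near 1 together with a small mean absolute error would correspond to points lying close to the diagonal of such a scatter plot.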