Unverified commit b6a0434a authored by nateanl, committed by GitHub

Add equations to MVDR docstring (#1789)

parent 78d41d57
@@ -1560,16 +1560,43 @@ class PSD(torch.nn.Module):
class MVDR(torch.nn.Module):
    """Minimum Variance Distortionless Response (MVDR) module that performs MVDR beamforming with Time-Frequency masks.

    Based on https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py

    We provide three solutions for MVDR beamforming. One is based on *reference channel selection*
    [:footcite:`souden2009optimal`] (``solution=ref_channel``).
The other two solutions are based on the steering vector. We apply either *eigenvalue decomposition* .. math::
\\textbf{w}_{\\text{MVDR}}(f) =\
\\frac{{{\\bf{\\Phi}_{\\textbf{NN}}^{-1}}(f){\\bf{\\Phi}_{\\textbf{SS}}}}(f)}\
{\\text{Trace}({{{\\bf{\\Phi}_{\\textbf{NN}}^{-1}}(f) \\bf{\\Phi}_{\\textbf{SS}}}(f))}}\\bm{u}
where :math:`\\bf{\\Phi}_{\\textbf{SS}}` and :math:`\\bf{\\Phi}_{\\textbf{NN}}` are the covariance\
matrices of speech and noise, respectively. :math:`\\bf{u}` is an one-hot vector to determine the\
reference channel.
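
A minimal NumPy sketch of the reference-channel formula, assuming the speech and noise PSD matrices are stacked as complex arrays of shape (F, M, M), with frequency first. The helper name `mvdr_weights_ref_channel` is hypothetical and is not the torchaudio API. It replaces the explicit inverse of the noise PSD with a linear solve, and multiplying by the one-hot vector u reduces to taking one column of the matrix product.

```python
import numpy as np


def mvdr_weights_ref_channel(psd_s: np.ndarray, psd_n: np.ndarray, ref_channel: int) -> np.ndarray:
    """Reference-channel MVDR weights (illustrative sketch, hypothetical helper).

    psd_s, psd_n: complex arrays of shape (F, M, M), the speech and noise PSD matrices.
    Returns complex weights of shape (F, M).
    """
    # Phi_NN^{-1}(f) Phi_SS(f) for every frequency bin, computed with a linear solve
    ratio = np.linalg.solve(psd_n, psd_s)  # (F, M, M)
    # Trace(Phi_NN^{-1}(f) Phi_SS(f)) for every frequency bin
    trace = np.trace(ratio, axis1=-2, axis2=-1)  # (F,)
    # Multiplying by the one-hot vector u selects the reference channel's column
    return ratio[..., ref_channel] / trace[..., None]


# Usage: a rank-1 speech PSD v v^H and a random Hermitian positive-definite noise PSD
rng = np.random.default_rng(0)
M = 4
v = rng.standard_normal(M) + 1j * rng.standard_normal(M)
a = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
psd_s = np.outer(v, v.conj())[None]  # (1, M, M)
psd_n = (a @ a.conj().T + M * np.eye(M))[None]  # (1, M, M)
w = mvdr_weights_ref_channel(psd_s, psd_n, ref_channel=0)
# The beamformer passes the reference-channel component of v unchanged
print(np.allclose(w[0].conj() @ v, v[0]))  # prints True
```

With a rank-1 speech PSD, the weights reproduce the source exactly as it appears at the reference channel, which is the distortionless property the method is named for.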

    The other two solutions are based on the steering vector (``solution=stv_evd`` or ``solution=stv_power``).

    .. math::
        \\textbf{w}_{\\text{MVDR}}(f) =\
        \\frac{{{\\bf{\\Phi}_{\\textbf{NN}}^{-1}}(f){\\bm{v}}(f)}}\
        {{\\bm{v}^{\\mathsf{H}}}(f){\\bf{\\Phi}_{\\textbf{NN}}^{-1}}(f){\\bm{v}}(f)}

    where :math:`\\bm{v}` is the acoustic transfer function or the steering vector, and\
    :math:`(\\cdot)^{\\mathsf{H}}` denotes the Hermitian (conjugate) transpose.
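
The steering-vector formula can be sketched the same way, again assuming NumPy arrays with frequency as the leading axis; `mvdr_weights_steering` is a hypothetical name, not part of torchaudio. The usage lines check the distortionless constraint w^H(f) v(f) = 1 on random data.

```python
import numpy as np


def mvdr_weights_steering(steering: np.ndarray, psd_n: np.ndarray) -> np.ndarray:
    """Steering-vector MVDR weights (illustrative sketch, hypothetical helper).

    steering: complex array of shape (F, M), the steering vector v(f) per frequency bin.
    psd_n: complex array of shape (F, M, M), the noise PSD matrices.
    Returns complex weights of shape (F, M).
    """
    # Phi_NN^{-1}(f) v(f) for every frequency bin
    numerator = np.linalg.solve(psd_n, steering[..., None])[..., 0]  # (F, M)
    # v^H(f) Phi_NN^{-1}(f) v(f): one scalar per frequency bin
    denominator = np.einsum("fm,fm->f", steering.conj(), numerator)  # (F,)
    return numerator / denominator[..., None]


# Usage with random data for F = 3 frequency bins and M = 4 channels
rng = np.random.default_rng(0)
F, M = 3, 4
v = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
a = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
psd_n = a @ a.conj().transpose(0, 2, 1) + M * np.eye(M)  # Hermitian positive definite
w = mvdr_weights_steering(v, psd_n)
# Distortionless constraint: w^H(f) v(f) = 1 in every frequency bin
print(np.allclose(np.einsum("fm,fm->f", w.conj(), v), 1.0))  # prints True
```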

    We apply either *eigenvalue decomposition* [:footcite:`higuchi2016robust`] or the *power method*
    [:footcite:`mises1929praktische`] to estimate the steering vector from the PSD matrix of speech.
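
Both ways of estimating the steering vector from the speech PSD can be illustrated as follows. This is a sketch, assuming the principal eigenvector of each speech PSD matrix serves as v(f). The names `steering_vector_evd` and `steering_vector_power` are hypothetical, and the plain power iteration (all-ones start, 50 iterations) is a textbook version, not necessarily what ``solution=stv_power`` does internally.

```python
import numpy as np


def steering_vector_evd(psd_s: np.ndarray) -> np.ndarray:
    """Principal eigenvector of each (M, M) speech PSD matrix; psd_s has shape (F, M, M)."""
    # eigh sorts eigenvalues in ascending order, so the last column is the principal eigenvector
    _, eigvecs = np.linalg.eigh(psd_s)
    return eigvecs[..., -1]  # (F, M), unit norm


def steering_vector_power(psd_s: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Plain power iteration on each speech PSD matrix; psd_s has shape (F, M, M)."""
    # Start from an all-ones vector in every frequency bin (assumed initialization)
    v = np.ones(psd_s.shape[:-1], dtype=psd_s.dtype)  # (F, M)
    for _ in range(n_iter):
        # v <- Phi_SS(f) v for every frequency bin
        v = np.einsum("fij,fj->fi", psd_s, v)
        # Renormalize to unit norm so the iterate does not overflow or underflow
        v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    return v


# Usage: rank-1 speech PSD plus a small diagonal term, whose principal eigenvector is v_true / ||v_true||
rng = np.random.default_rng(0)
F, M = 2, 3
v_true = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
psd_s = np.einsum("fi,fj->fij", v_true, v_true.conj()) + 0.01 * np.eye(M)
for estimate in (steering_vector_evd(psd_s), steering_vector_power(psd_s)):
    # |cos| between estimate and v_true equals 1 when they agree up to a complex phase
    cos = np.abs(np.einsum("fm,fm->f", estimate.conj(), v_true)) / np.linalg.norm(v_true, axis=-1)
    print(np.allclose(cos, 1.0))
```

Both functions return unit-norm vectors that are only defined up to a complex phase, so the check compares the magnitude of an inner product rather than the vectors themselves.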

    After estimating the beamforming weight, the Short-time Fourier Transform (STFT) of the enhanced speech
    is obtained by

    .. math::
        \\hat{\\bf{S}} = {\\bf{w}^\\mathsf{H}}{\\bf{Y}}, {\\bf{w}} \\in \\mathbb{C}^{M \\times F}

    where :math:`\\bf{Y}` and :math:`\\hat{\\bf{S}}` are the STFT of the multi-channel noisy speech and\
    the single-channel enhanced speech, respectively, :math:`M` is the number of channels, and\
    :math:`F` is the number of frequency bins. The product is applied separately in each frequency bin.
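
Applying the weights is one contraction over the channel axis in every frequency bin. The sketch assumes the multi-channel STFT is laid out as (M, F, T) (channel, frequency, time) and the weights as (F, M); `apply_beamforming` is a hypothetical helper, not a torchaudio function.

```python
import numpy as np


def apply_beamforming(weights: np.ndarray, specgram: np.ndarray) -> np.ndarray:
    """Apply per-frequency beamforming weights (illustrative sketch, hypothetical helper).

    weights: complex array of shape (F, M); specgram: multi-channel STFT of shape (M, F, T).
    Returns the single-channel enhanced STFT of shape (F, T).
    """
    # S_hat(f, t) = w^H(f) Y(f, t) = sum_m conj(w_m(f)) * Y_m(f, t)
    return np.einsum("fm,mft->ft", weights.conj(), specgram)


rng = np.random.default_rng(0)
M, F, T = 3, 5, 7
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
# One-hot weights that pick channel 2 in every frequency bin just return that channel
w = np.zeros((F, M), dtype=complex)
w[:, 2] = 1.0
print(np.allclose(apply_beamforming(w, Y), Y[2]))  # prints True
```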

    For online streaming audio, we provide a *recursive method* [:footcite:`higuchi2017online`] to update the
    PSD matrices of speech and noise.
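
The recursive update is not spelled out in the docstring. The sketch below assumes one simple form, a mask-weighted running average over STFT chunks, and is not necessarily the exact rule of the cited method or of torchaudio; `RecursivePSD` is a hypothetical helper, and the (M, F, T) STFT and (F, T) mask layouts are assumptions.

```python
import numpy as np


class RecursivePSD:
    """Mask-weighted running PSD estimate per frequency bin (hypothetical helper, assumed update rule)."""

    def __init__(self) -> None:
        self.psd = None  # (F, M, M) running PSD estimate
        self.mask_sum = None  # (F,) total mask weight accumulated so far

    def update(self, specgram: np.ndarray, mask: np.ndarray) -> np.ndarray:
        """specgram: (M, F, T) complex STFT chunk; mask: (F, T) real weights. Returns the updated PSD."""
        # Mask-weighted sum of the outer products Y(f, t) Y^H(f, t) over this chunk's frames
        chunk_psd = np.einsum("ft,mft,nft->fmn", mask, specgram, specgram.conj())
        chunk_mask = mask.sum(axis=-1)  # (F,)
        if self.psd is None:
            total = chunk_mask
            self.psd = chunk_psd / total[:, None, None]
        else:
            total = self.mask_sum + chunk_mask
            # Re-weight the previous estimate by its accumulated mask weight, add the new chunk, renormalize
            self.psd = (self.psd * self.mask_sum[:, None, None] + chunk_psd) / total[:, None, None]
        self.mask_sum = total
        return self.psd


# Usage: feeding two chunks gives the same result as one offline pass over all frames
rng = np.random.default_rng(0)
M, F, T = 2, 4, 10
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
mask = rng.uniform(0.1, 1.0, size=(F, T))
tracker = RecursivePSD()
tracker.update(Y[..., :6], mask[:, :6])
online = tracker.update(Y[..., 6:], mask[:, 6:])
offline = np.einsum("ft,mft,nft->fmn", mask, Y, Y.conj()) / mask.sum(axis=-1)[:, None, None]
print(np.allclose(online, offline))  # prints True
```

Because it keeps the accumulated mask weight, this form weights every past frame equally; a variant with a forgetting factor would discount older frames instead, which can help when the acoustic scene changes over time.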
......