# Whisper

This version was tested with the [ROCm 5.7](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/tree/rocm-5.7.0) revision of MIGraphX.

## Jupyter notebook

There is a dedicated step-by-step notebook. See [whisper.ipynb](./whisper.ipynb).

## Console application

To run the console application, follow the steps below.

Set up the Python environment

```bash
# this requires the Python venv module to be installed (e.g. apt install python3.8-venv)
python3 -m venv w_venv
. w_venv/bin/activate
```

Install the dependencies

`ffmpeg` is needed to handle audio files.

```bash
apt install ffmpeg
```

```bash
pip install -r requirements.txt
```

Make the MIGraphX Python module available

```bash
export PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
```
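The setup steps so far can be sanity-checked before downloading the model. The following is a minimal sketch, not part of this repository: the function names (`in_virtualenv`, `tool_on_path`, `module_importable`, `preflight`) are hypothetical, and it assumes that the MIGraphX Python module is importable as `migraphx` once `PYTHONPATH` is set as shown above.

```python
import importlib.util
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def tool_on_path(name: str) -> bool:
    """Return True when an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def module_importable(name: str) -> bool:
    """Return True when a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def preflight() -> dict:
    """Collect the checks that correspond to the setup steps above."""
    return {
        "virtualenv active": in_virtualenv(),
        "ffmpeg on PATH": tool_on_path("ffmpeg"),
        # Assumption: the module name is `migraphx`.
        "migraphx importable": module_importable("migraphx"),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```

A `FAIL` line points at the step to revisit: activating `w_venv`, installing `ffmpeg`, or exporting `PYTHONPATH`.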

Use the helper script to download the model with Optimum.
The decoder's `attention_mask` is not exposed by default, but MIGraphX requires it.

```bash
python download_whisper.py
```

*Note: the scripts use `models/whisper-tiny.en_modified`.*

There are *optional* samples that can be downloaded, but the example can be tested without them.

```bash
./download_samples.sh
```

Run the automatic-speech-recognition script with the following example input:

```bash
python asr.py --audio audio/sample1.flac --log-process
```

Alternatively, run it without any audio input to use the [Hugging Face dummy dataset](https://huggingface.co/datasets/hf-internal-testing/librispeech_asr_dummy) samples.


## Gradio application

Note: this requires the `Console application` setup to be completed first.

Install the Gradio dependencies

```bash
pip install -r gradio_requirements.txt
```

Usage

```bash
python gradio_app.py
```

This loads the models (which can take several minutes) and, once setup is complete, starts a server on `http://127.0.0.1:7860`.
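
Because loading can take a while, it may be convenient to poll until the server accepts connections. The following is a minimal sketch using only the Python standard library. The helper names (`port_open`, `wait_for_server`) and the default timeout values are illustrative assumptions; only the host and port come from the address above.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def wait_for_server(
    host: str = "127.0.0.1",
    port: int = 7860,
    total_wait: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll host:port until it accepts connections or total_wait seconds pass."""
    deadline = time.monotonic() + total_wait
    while time.monotonic() < deadline:
        if port_open(host, port):
            return True
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` from a second terminal, after starting `python gradio_app.py`, returns `True` once the port is reachable and `False` if the wait time runs out. It checks only that the port accepts TCP connections, not that the application responds correctly.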