Commit 524d5540 authored by moto, committed by Facebook GitHub Bot

Add note about ffmpeg code (#2115)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2115

Reviewed By: carolineechen

Differential Revision: D33370700

Pulled By: mthrok

fbshipit-source-id: 591b67870247f69cc542f649cd62444ee3c45934
parent 8ed14782
# FFmpeg binding dev note
The FFmpeg binding is based on FFmpeg version 4.1.
## Learning material
Tutorials such as the following are useful for understanding the concept of stream processing:
https://github.com/leandromoreira/ffmpeg-libav-tutorial
The best way to learn how to use FFmpeg is to read the official examples.
Practically all of the binding code is a reorganization of these examples:
https://ffmpeg.org/doxygen/4.1/examples.html
## Streamer Architecture
The top-level class is `Streamer`. It handles the input (via `AVFormatContext*`) and manages a `StreamProcessor` for each stream in the input.
The `Streamer` object slices the input data into a series of `AVPacket` objects and feeds each packet to the corresponding `StreamProcessor`.
```
Streamer
┌─────────────────────────────────────────────────┐
│ │
│ AVFormatContext* ┌──► StreamProcessor[0] │
│ │ │ │
│ └─────────────┼──► StreamProcessor[1] │
│ AVPacket* │ │
│ └──► ... │
│ │
└─────────────────────────────────────────────────┘
```
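The routing shown in the diagram can be sketched without FFmpeg. The block below is a minimal illustration, not the binding's actual code: `MockPacket`, `MockStreamProcessor`, `MockStreamer`, and `dispatch` are hypothetical names, and the only assumption carried over from FFmpeg is that a packet records the index of the stream it belongs to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for AVPacket: only the routing field is modeled.
struct MockPacket {
  int stream_index;
};

// Hypothetical stand-in for StreamProcessor: counts the packets it receives.
struct MockStreamProcessor {
  int num_packets = 0;
  void process_packet(const MockPacket&) { ++num_packets; }
};

// Hypothetical stand-in for Streamer: owns one processor per input stream
// and routes each packet to the processor at the packet's stream index.
class MockStreamer {
 public:
  explicit MockStreamer(std::size_t num_streams) : processors_(num_streams) {
    assert(num_streams > 0);  // an input with no streams is not modeled
  }

  // Returns 0 on success, -1 if the packet refers to an unknown stream.
  int dispatch(const MockPacket& packet) {
    if (packet.stream_index < 0 ||
        static_cast<std::size_t>(packet.stream_index) >= processors_.size()) {
      return -1;
    }
    processors_[static_cast<std::size_t>(packet.stream_index)]
        .process_packet(packet);
    return 0;
  }

  int num_packets(std::size_t i) const { return processors_.at(i).num_packets; }

 private:
  std::vector<MockStreamProcessor> processors_;
};
```

In this sketch every stream has a processor; the real `Streamer` may leave some streams without one (see the ON/OFF section).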
The `StreamProcessor` class is composed of one `Decoder` and multiple `Sink` objects.
`Sink` objects correspond to the output streams that users configure.
The `Sink` class is a wrapper around the `FilterGraph` and `Buffer` classes.
The `AVPacket*` passed to `StreamProcessor` is first passed to `Decoder`.
`Decoder` generates audio / video frames (`AVFrame`) and passes them to the `Sink`s.
The `Sink` class first passes the incoming frame to `FilterGraph`.
`FilterGraph` is a class based on the [`AVFilterGraph` structure](https://ffmpeg.org/doxygen/4.1/structAVFilterGraph.html),
and it can apply various filters.
At minimum, it performs format conversion so that the resulting data is suitable for Tensor representation,
such as YUV to RGB.
The output `AVFrame` from `FilterGraph` is passed to the `Buffer` class, which converts it to a Tensor.
```
StreamProcessor
┌─────────────────────────────────────────────────────────┐
│ AVPacket* │
│ │ │
│ │ AVFrame* AVFrame* │
│ └► Decoder ──┬─► FilterGraph ─────► Buffer ───► Tensor │
│ │ │
│ ├─► FilterGraph ─────► Buffer ───► Tensor │
│ │ │
│ └─► ... │
│ │
└─────────────────────────────────────────────────────────┘
```
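The fan-out from one decoder to several sinks can be sketched as follows. This is an illustration under simplifying assumptions: all `Mock*` types are hypothetical, a frame is reduced to a single integer, the "filter" is an arbitrary function, and the "buffer" is a plain vector rather than a Tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct MockPacket { int payload; };  // hypothetical stand-in for AVPacket
struct MockFrame { int value; };     // hypothetical stand-in for AVFrame

// Hypothetical decoder: turns one packet into one frame.
struct MockDecoder {
  MockFrame decode(const MockPacket& p) const { return MockFrame{p.payload}; }
};

// Hypothetical sink: applies a filter, then stores the result in a buffer.
class MockSink {
 public:
  using Filter = std::function<MockFrame(const MockFrame&)>;

  explicit MockSink(Filter filter) : filter_(std::move(filter)) {
    assert(filter_);  // a sink without a filter is not modeled
  }
  void process_frame(const MockFrame& frame) {
    buffer_.push_back(filter_(frame).value);
  }
  const std::vector<int>& buffer() const { return buffer_; }

 private:
  Filter filter_;
  std::vector<int> buffer_;
};

// Hypothetical stream processor: one decoder feeding every sink.
class MockStreamProcessor {
 public:
  void add_sink(MockSink sink) { sinks_.push_back(std::move(sink)); }
  void process_packet(const MockPacket& packet) {
    const MockFrame frame = decoder_.decode(packet);  // decode once
    for (auto& sink : sinks_) {
      sink.process_frame(frame);  // each sink filters independently
    }
  }
  const MockSink& sink(std::size_t i) const { return sinks_.at(i); }

 private:
  MockDecoder decoder_;
  std::vector<MockSink> sinks_;
};
```

The point of the structure is that decoding happens once per packet, while each output stream applies its own filter to the same decoded frame.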
## Implementation guideline
### Memory management and object lifecycle
FFmpeg uses raw pointers, which need to be allocated and freed with dedicated functions.
In the binding code, these pointers are encapsulated in classes with RAII semantics and
held in `std::unique_ptr<>` to guarantee sole ownership.
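As a rough sketch of this pattern, the block below pairs `std::unique_ptr<>` with a custom deleter. It does not link against FFmpeg: `MockFrame`, `mock_frame_alloc`, and `mock_frame_free` are hypothetical stand-ins that mimic the convention of `av_frame_alloc` / `av_frame_free`, where the free function takes a pointer to the pointer and resets it to null. The names used in the actual binding may differ.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an FFmpeg struct such as AVFrame.
struct MockFrame {
  int width = 0;
};

// Number of frames currently allocated (for demonstration only).
inline int& live_frames() {
  static int count = 0;
  return count;
}

// Hypothetical allocation function, mimicking av_frame_alloc().
inline MockFrame* mock_frame_alloc() {
  ++live_frames();
  return new MockFrame();
}

// Hypothetical free function, mimicking av_frame_free(AVFrame**):
// frees the object and resets the caller's pointer to null.
inline void mock_frame_free(MockFrame** frame) {
  if (frame && *frame) {
    delete *frame;
    *frame = nullptr;
    --live_frames();
  }
}

// Deleter that adapts the pointer-to-pointer free function to unique_ptr.
struct MockFrameDeleter {
  void operator()(MockFrame* p) const { mock_frame_free(&p); }
};

// Sole-ownership handle: the frame is freed when the handle is destroyed.
using MockFramePtr = std::unique_ptr<MockFrame, MockFrameDeleter>;

inline MockFramePtr make_frame() {
  MockFramePtr frame(mock_frame_alloc());
  assert(frame);  // allocation in this sketch never returns null
  return frame;
}
```

Because `std::unique_ptr<>` is move-only, a frame created by `make_frame()` can be handed to another owner with `std::move`, but it cannot be copied, so it is freed exactly once.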
**Decoder lifecycle**
```c++
// Default construction (no memory allocation)
decoder = Decoder(...);
// Decode
decoder.process_packet(pPacket);
// Retrieve result
decoder.get_frame(pFrame);
// Release resources
decoder::~Decoder();
```
**FilterGraph lifecycle**
```c++
// Default construction (no memory allocation)
filter_graph = FilterGraph();
// Filter configuration
...
filter_graph.create_filter();
// Apply filter
filter_graph.add_frame(pFrame);
// Retrieve result
filter_graph.get_frame(pFrame);
// Release resources
filter_graph::~FilterGraph();
```
**StreamProcessor lifecycle**
```c++
// Default construction (no memory allocation)
processor = Processor(...);
// Define the process stream
processor.add_audio_stream(...);
processor.add_audio_stream(...);
// Process the packet
processor.process_packet(pPacket);
// Retrieve result
tensor = processor.get_chunk(...);
// Release resources
processor::~Processor();
```
### ON/OFF semantics and `std::unique_ptr<>`
Since we want to make some components (such as stream processors and filters)
separately configurable, we introduce ON/OFF states for them.
To keep the code simple, we represent these states with `std::unique_ptr<>`:
`nullptr` means the component is turned off.
This pattern applies to `StreamProcessor` (output streams).
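A minimal sketch of the ON/OFF pattern is shown below. The class and method names (`MockProcessor`, `MockStreamer`, `turn_on`, `turn_off`) are hypothetical and chosen for illustration; only the idea that a null `std::unique_ptr<>` means "off" comes from the note above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical component that can be switched on and off.
struct MockProcessor {
  int num_packets = 0;
  void process_packet() { ++num_packets; }
};

// Holds one slot per stream; a null slot means the component is OFF.
class MockStreamer {
 public:
  // All slots start as nullptr, so every stream is initially OFF.
  explicit MockStreamer(std::size_t num_streams) : processors_(num_streams) {
    assert(num_streams > 0);
  }

  void turn_on(std::size_t i) {
    processors_.at(i) = std::make_unique<MockProcessor>();
  }
  void turn_off(std::size_t i) { processors_.at(i).reset(); }
  bool is_on(std::size_t i) const { return processors_.at(i) != nullptr; }

  // Returns true if the packet was processed, false if the stream is OFF.
  bool process_packet(std::size_t i) {
    auto& processor = processors_.at(i);
    if (!processor) {
      return false;  // OFF: skip the packet without any extra state flag
    }
    processor->process_packet();
    return true;
  }

 private:
  std::vector<std::unique_ptr<MockProcessor>> processors_;
};
```

With this layout no separate boolean flag is needed: turning a component off also releases its resources, and turning it on again constructs a fresh instance.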
### Exception and return value
To report errors during the configuration and initialization of objects,
we throw exceptions. However, throwing exceptions is expensive during streaming,
so there we report errors through return values instead.
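The split between the two error-reporting styles might look like the sketch below. `MockDecoder` is a hypothetical class, and the choice of `std::runtime_error` and of a negative integer as the failure code are assumptions made for illustration, not a statement about the types the binding actually uses.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical decoder illustrating the two error-reporting styles.
class MockDecoder {
 public:
  // Configuration path: runs once, so failures are reported by throwing.
  explicit MockDecoder(std::string codec_name)
      : codec_name_(std::move(codec_name)) {
    if (codec_name_.empty()) {
      throw std::runtime_error("codec name must not be empty");
    }
  }

  // Streaming path: runs per packet, so failures are reported through
  // the return value (0 on success, negative on failure) and never thrown.
  int process_packet(int payload) noexcept {
    assert(!codec_name_.empty());  // invariant established by the constructor
    if (payload < 0) {
      return -1;  // treat a negative payload as a malformed packet
    }
    ++num_decoded_;
    return 0;
  }

  int num_decoded() const noexcept { return num_decoded_; }

 private:
  std::string codec_name_;
  int num_decoded_ = 0;
};
```

A caller would wrap construction in `try` / `catch` once, then check the integer result of `process_packet` inside the streaming loop.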