Commit cbd35438 authored by moto's avatar moto Committed by Facebook GitHub Bot
Browse files

Update precise seek behavior for t=0 (#2915)

Summary:
It was reported that when videos with invalid PTS values are fed to StreamReader, StreamReader returns only the last frame.

https://github.com/pytorch/vision/blob/677fc939b21a8893f07db4c1f90482b648b6573f/test/assets/videos/RATRACE_wave_f_nm_np1_fr_goo_37.avi

```
import torchaudio

src = "RATRACE_wave_f_nm_np1_fr_goo_37.avi"

streamer = torchaudio.io.StreamReader(src=src)
streamer.add_basic_video_stream(frames_per_chunk=-1)
streamer.process_all_packets()
video, = streamer.pop_chunks()

print(video.size(0))  # prints 1, but there are more than 70 frames
```

The reason why all the frames are not returned is due to invalid PTS values. All the frames's PTS values are `-9223372036854775808` so the internal mechanism discards them.

The reason why the last frame is output is because when entering drain mode, the discard value of -1 is used, which is interpreted as no discard.

For the second issue, the discard behavior should be consistent across regular decoding and drain mode.

For the first issue, although the normal behavior is not guaranteed for such invalid input, we can support the case where one reads video from start (or when one seeks into t=0)

 ---

This commits make the following changes to address the above two.

1. Define the discard_before_pts attribtue on StreamProcessor, so that StreamProcessor is aware of the discard behavior without being told by StreamReader, and its behavior is consistent between regular decoding and drain.

   This gets rid of the discard_before_pts computation that is currently happening at the every time a frame is processed, so this should improve the peformance a bit.

2. Change the meaning of discard_before_pts, so that when it's 0, no discard happens. With this change, the negative value is not necessary so we put it a UB status.

Note:
   Even with those changes seeking videos with invalid PTS is not plausible, client codes can implement a fallback which decodes frames first and discard undesired ones.

Pull Request resolved: https://github.com/pytorch/audio/pull/2915

Reviewed By: nateanl

Differential Revision: D41957784

Pulled By: mthrok

fbshipit-source-id: 2dafdbada5aa33bfc81c986306f80642ba6277df
parent 123c7319
......@@ -47,6 +47,12 @@ void StreamProcessor::remove_stream(KeyType key) {
sinks.erase(key);
}
void StreamProcessor::set_discard_timestamp(int64_t timestamp) {
TORCH_CHECK(timestamp >= 0, "timestamp must be non-negative.");
discard_before_pts =
av_rescale_q(timestamp, av_get_time_base_q(), stream->time_base);
}
////////////////////////////////////////////////////////////////////////////////
// Query methods
////////////////////////////////////////////////////////////////////////////////
......@@ -68,9 +74,7 @@ bool StreamProcessor::is_buffer_ready() const {
////////////////////////////////////////////////////////////////////////////////
// 0: some kind of success
// <0: Some error happened
int StreamProcessor::process_packet(
AVPacket* packet,
int64_t discard_before_pts) {
int StreamProcessor::process_packet(AVPacket* packet) {
int ret = decoder.process_packet(packet);
while (ret >= 0) {
ret = decoder.get_frame(pFrame1);
......@@ -83,7 +87,22 @@ int StreamProcessor::process_packet(
if (ret < 0)
return ret;
if (discard_before_pts < 0 || pFrame1->pts >= discard_before_pts) {
// When the value of discard_before_pts is 0, we consider that the seek is
// not performed and all the frames are passed to downstream
// unconditionally.
//
// Two reasons for this behavior;
// 1. When seek mode is not precise, we do not discard any frame.
// In this case discard_before_pts is set to zero.
// 2. When users seek to zero, what they expect is to get to the beginning
// of the data.
// There are many videos with invalid PTS values, such as
// -9223372036854775808, and though it is not possible to seek videos
// without decoding, we can still support `seek(0)` as a special case,
// and just not discard any.
//
// Note: discard_before_pts < 0 is UB.
if (discard_before_pts <= 0 || pFrame1->pts >= discard_before_pts) {
send_frame(pFrame1);
}
......
......@@ -25,6 +25,13 @@ class StreamProcessor {
KeyType current_key = 0;
std::map<KeyType, Sink> sinks;
// Used for precise seek.
// 0: no discard
// Positive Values: decoded frames with PTS values less than this are
// discarded.
// Negative values: UB. Should not happen.
int64_t discard_before_pts = 0;
public:
StreamProcessor(
AVStream* stream,
......@@ -57,6 +64,10 @@ class StreamProcessor {
// 1. Remove the stream
void remove_stream(KeyType key);
// Set discard
// The input timestamp must be expressed in AV_TIME_BASE unit.
void set_discard_timestamp(int64_t timestamp);
//////////////////////////////////////////////////////////////////////////////
// Query methods
//////////////////////////////////////////////////////////////////////////////
......@@ -70,7 +81,7 @@ class StreamProcessor {
// 2. pass the decoded data to filters
// 3. each filter store the result to the corresponding buffer
// - Sending NULL will drain (flush) the internal
int process_packet(AVPacket* packet, int64_t discard_before_pts = -1);
int process_packet(AVPacket* packet);
// flush the internal buffer of decoder.
// To be use when seeking
......
......@@ -174,13 +174,13 @@ void StreamReader::seek(double timestamp_s, int64_t mode) {
int flag = AVSEEK_FLAG_BACKWARD;
switch (mode) {
case 0:
seek_timestamp =
-1; // reset seek_timestap as it is only used for precise seek
// reset seek_timestap as it is only used for precise seek
seek_timestamp = 0;
break;
case 1:
flag |= AVSEEK_FLAG_ANY;
seek_timestamp =
-1; // reset seek_timestap as it is only used for precise seek
// reset seek_timestap as it is only used for precise seek
seek_timestamp = 0;
break;
case 2:
seek_timestamp = timestamp_av_tb;
......@@ -192,12 +192,13 @@ void StreamReader::seek(double timestamp_s, int64_t mode) {
int ret = av_seek_frame(pFormatContext, -1, timestamp_av_tb, flag);
if (ret < 0) {
seek_timestamp = -1;
seek_timestamp = 0;
TORCH_CHECK(false, "Failed to seek. (" + av_err2string(ret) + ".)");
}
for (const auto& it : processors) {
if (it) {
it->flush();
it->set_discard_timestamp(seek_timestamp);
}
}
}
......@@ -278,6 +279,7 @@ void StreamReader::add_stream(
if (!processors[i]) {
processors[i] = std::make_unique<StreamProcessor>(
stream, decoder, decoder_option, device);
processors[i]->set_discard_timestamp(seek_timestamp);
}
stream->discard = AVDISCARD_DEFAULT;
int key = processors[i]->add_stream(
......@@ -328,12 +330,7 @@ int StreamReader::process_packet() {
return 0;
}
AVRational stream_tb =
pFormatContext->streams[pPacket->stream_index]->time_base;
int64_t seek_timestamp_in_stream_tb =
av_rescale_q(seek_timestamp, av_get_time_base_q(), stream_tb);
ret = processor->process_packet(packet, seek_timestamp_in_stream_tb);
ret = processor->process_packet(packet);
return (ret < 0) ? ret : 0;
}
......
......@@ -18,9 +18,16 @@ class StreamReader {
std::vector<std::pair<int, int>> stream_indices;
// timestamp to seek to expressed in AV_TIME_BASE
// < 0 : No seek
//
// 0 : No seek
// Positive value: Skip AVFrames with timestamps before it
int64_t seek_timestamp = -1;
// Negative value: UB. Should not happen
//
// Note:
// When precise seek is performed, this value is set to the value provided
// by client code, and PTS values of decoded frames are compared against it
// to determine whether the frames should be passed to downstream.
int64_t seek_timestamp = 0;
public:
explicit StreamReader(AVFormatInputContextPtr&& p);
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment