Unverified Commit 32e16805 authored by Francisco Massa, committed by GitHub

Update video reader to use new decoder (#1978)

* Base decoder for video. (#1747)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1747

Pull Request resolved: https://github.com/pytorch/vision/pull/1746

Added the implementation of an FFmpeg-based decoder with functionality that can be used in VUE and TorchVision.

Reviewed By: fmassa

Differential Revision: D19358914

fbshipit-source-id: abb672f89bfaca6351dda2354f0d35cf8e47fa0f

* Integrated base decoder into VideoReader class and video_utils.py (#1766)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1766

Replaced FfmpegDecoder (incompatible with VUE) with the base decoder (compatible with VUE).
Modified the Python utilities in video_utils.py for internal simplification. The public interface is preserved.

Reviewed By: fmassa

Differential Revision: D19415903

fbshipit-source-id: 4d7a0158bd77bac0a18732fe4183fdd9a57f6402

* Optimizing base decoder performance. (#1852)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1852

Changed the base decoder internals for faster clip processing.

Reviewed By: stephenyan1231

Differential Revision: D19748379

fbshipit-source-id: 58a435f0a0b25545e7bd1a3edb0b1d558176a806

* Minor fix and decoder class member access.

Summary:
Found and fixed a bug in the cropping algorithm (a simple typo).
Derived classes also need access to some decoder class members, such as the initialization parameters, so those members are now protected.

Reviewed By: stephenyan1231, fmassa

Differential Revision: D19895076

fbshipit-source-id: 691336c8e18526b085ae5792ac3546bc387a6db9

* Added missing header for fewer dependencies. (#1898)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1898

The streams/samplers includes shouldn't depend on the decoder headers. Add the dependencies directly where they are required.

Reviewed By: stephenyan1231

Differential Revision: D19911404

fbshipit-source-id: ef322a053708405c02cee4562b456b1602fb12fc

* Implemented VUE Asynchronous Decoder

Summary: For Mothership we have found that an asynchronous decoder provides better performance.

Differential Revision: D20026194

fbshipit-source-id: 627b91844b4e3f917002031dd32cb19c239f4ba8

* Fix a bug in the API read_video_from_memory (#1942)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1942

D18720474 introduced a bug in the `read_video_from_memory` API. Thanks to weiyaowang for reporting it.

Reviewed By: weiyaowang

Differential Revision: D20270179

fbshipit-source-id: 66348c99a5ad1f9129b90e934524ddfaad59de03

* Extend decoder to support new video_max_dimension argument (#1924)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1924

Extend the `video_reader` decoder Python API in TorchVision to support a new argument, `video_max_dimension`. This enables new video decoding use cases. When setting `video_width=0`, `video_height=0`, `video_min_dimension != 0`, and `video_max_dimension != 0`, we rescale the video clips so that their spatial resolution (height, width) becomes
 - (video_min_dimension, video_max_dimension) if original height < original width
 - (video_max_dimension, video_min_dimension) if original height >= original width

This is useful at the video model testing stage, where we perform fully convolutional evaluation and take entire video frames, without cropping, as input. Previously we could only set, for instance, `video_width=0`, `video_height=0`, `video_min_dimension = 128`, which preserves the aspect ratio. In production datasets there are a small number of videos whose aspect ratio is either extremely large or small, and when the shorter edge is rescaled to 128 the longer edge is still large. This easily causes GPU OOM when we sample multiple video clips and put them in a single minibatch.

Now we can set (for instance) `video_width=0`, `video_height=0`, `video_min_dimension = 128`, and `video_max_dimension = 171` so that the rescaled resolution is either (128, 171) or (171, 128), depending on whether the original height is smaller than the original width. Thus we are less likely to hit GPU OOM, because the spatial size of the video clips is fixed.
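
As a sketch (a hypothetical helper, not part of this diff), the rule above amounts to the following; it mirrors case #8 of `Util::setFormatDimensions` in this diff:

```cpp
#include <cstddef>
#include <utility>

// Returns the rescaled (height, width) for the min/max-dimension case
// (video_width == 0, video_height == 0, both min and max non-zero):
// the output size is fixed regardless of the source aspect ratio.
std::pair<size_t, size_t> rescaledSize(
    size_t srcW, size_t srcH, size_t minDimension, size_t maxDimension) {
  if (srcH < srcW) {
    return {minDimension, maxDimension}; // landscape: short edge is height
  }
  return {maxDimension, minDimension}; // portrait or square
}
```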

Reviewed By: putivsky

Differential Revision: D20182529

fbshipit-source-id: f9c40afb7590e7c45e6908946597141efa35f57c

* Fixing samplers initialization (#1967)

Summary:
Pull Request resolved: https://github.com/pytorch/vision/pull/1967



No-op for the torchvision diff; fixes sampler initialization.

Differential Revision: D20397218

fbshipit-source-id: 6dc4d04364f305fbda7ca4f67a25ceecd73d0f20

* Exclude C++ test files
Co-authored-by: Yuri Putivsky <yuri@fb.com>
Co-authored-by: Zhicheng Yan <zyan3@fb.com>
parent 8b9859d3
#pragma once
#include "stream.h"
#include "subtitle_sampler.h"
namespace ffmpeg {
/**
* Class uses FFMPEG library to decode one subtitle stream.
*/
struct AVSubtitleKeeper : AVSubtitle {
int64_t release{0};
};
class SubtitleStream : public Stream {
public:
SubtitleStream(
AVFormatContext* inputCtx,
int index,
bool convertPtsToWallTime,
const SubtitleFormat& format);
~SubtitleStream() override;
protected:
void setFramePts(DecoderHeader* header, bool flush) override;
private:
int initFormat() override;
int analyzePacket(const AVPacket* packet, bool* gotFrame) override;
int copyFrameBytes(ByteStorage* out, bool flush) override;
void releaseSubtitle();
private:
SubtitleSampler sampler_;
AVSubtitleKeeper sub_;
};
} // namespace ffmpeg
#include "sync_decoder.h"
#include <c10/util/Logging.h>
namespace ffmpeg {
SyncDecoder::AVByteStorage::AVByteStorage(size_t n) {
ensure(n);
}
SyncDecoder::AVByteStorage::~AVByteStorage() {
av_free(buffer_);
}
void SyncDecoder::AVByteStorage::ensure(size_t n) {
if (tail() < n) {
capacity_ = offset_ + length_ + n;
buffer_ = static_cast<uint8_t*>(av_realloc(buffer_, capacity_));
}
}
uint8_t* SyncDecoder::AVByteStorage::writableTail() {
CHECK_LE(offset_ + length_, capacity_);
return buffer_ + offset_ + length_;
}
void SyncDecoder::AVByteStorage::append(size_t n) {
CHECK_LE(n, tail());
length_ += n;
}
void SyncDecoder::AVByteStorage::trim(size_t n) {
CHECK_LE(n, length_);
offset_ += n;
length_ -= n;
}
const uint8_t* SyncDecoder::AVByteStorage::data() const {
return buffer_ + offset_;
}
size_t SyncDecoder::AVByteStorage::length() const {
return length_;
}
size_t SyncDecoder::AVByteStorage::tail() const {
CHECK_LE(offset_ + length_, capacity_);
return capacity_ - offset_ - length_;
}
void SyncDecoder::AVByteStorage::clear() {
offset_ = 0;
length_ = 0;
}
std::unique_ptr<ByteStorage> SyncDecoder::createByteStorage(size_t n) {
return std::make_unique<AVByteStorage>(n);
}
void SyncDecoder::onInit() {
eof_ = false;
queue_.clear();
}
int SyncDecoder::decode(DecoderOutputMessage* out, uint64_t timeoutMs) {
if (eof_ && queue_.empty()) {
return ENODATA;
}
if (queue_.empty()) {
int result = getFrame(timeoutMs);
// assign EOF
eof_ = result == ENODATA;
// check unrecoverable error, any error but ENODATA
if (result && result != ENODATA) {
return result;
}
// still empty
if (queue_.empty()) {
if (eof_) {
return ENODATA;
} else {
LOG(INFO) << "Queue is empty";
return ETIMEDOUT;
}
}
}
*out = std::move(queue_.front());
queue_.pop_front();
return 0;
}
void SyncDecoder::push(DecoderOutputMessage&& buffer) {
queue_.push_back(std::move(buffer));
}
} // namespace ffmpeg
#pragma once
#include <list>
#include "decoder.h"
namespace ffmpeg {
/**
* Class uses the FFMPEG library to decode media streams.
* Media bytes can be explicitly provided through a read callback
* or fetched internally by the FFMPEG library.
*/
class SyncDecoder : public Decoder {
public:
// Allocation of memory must be done with a proper alignment.
class AVByteStorage : public ByteStorage {
public:
explicit AVByteStorage(size_t n);
~AVByteStorage() override;
void ensure(size_t n) override;
uint8_t* writableTail() override;
void append(size_t n) override;
void trim(size_t n) override;
const uint8_t* data() const override;
size_t length() const override;
size_t tail() const override;
void clear() override;
private:
size_t offset_{0};
size_t length_{0};
size_t capacity_{0};
uint8_t* buffer_{nullptr};
};
public:
int decode(DecoderOutputMessage* out, uint64_t timeoutMs) override;
private:
void push(DecoderOutputMessage&& buffer) override;
void onInit() override;
std::unique_ptr<ByteStorage> createByteStorage(size_t n) override;
private:
std::list<DecoderOutputMessage> queue_;
bool eof_{false};
};
} // namespace ffmpeg
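// ---------------------------------------------------------------------------
// A minimal usage sketch of SyncDecoder (not part of the diff): init with a
// file uri, drain frames, shut down. It mirrors the tests below; the asset
// path and the requested formats are taken from those tests.
// ---------------------------------------------------------------------------
// #include "sync_decoder.h"
// using namespace ffmpeg;
//
// int countFrames() {
//   SyncDecoder decoder;
//   DecoderParameters params;
//   params.timeoutMs = 10000;
//   params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
//   params.uri = "pytorch/vision/test/assets/videos/R6llTwEh07w.mp4";
//   if (!decoder.init(params, nullptr, nullptr)) {
//     return -1;
//   }
//   int frames = 0;
//   DecoderOutputMessage out;
//   // decode() returns 0 per message, ENODATA at end of stream and
//   // ETIMEDOUT when no frame arrived within timeoutMs.
//   while (decoder.decode(&out, params.timeoutMs) == 0) {
//     ++frames;
//   }
//   decoder.shutdown();
//   return frames;
// }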
#include <c10/util/Logging.h>
#include <dirent.h>
#include <gtest/gtest.h>
#include "memory_buffer.h"
#include "sync_decoder.h"
#include "util.h"
using namespace ffmpeg;
namespace {
struct VideoFileStats {
std::string name;
size_t durationPts{0};
int num{0};
int den{0};
int fps{0};
};
void gotAllTestFiles(
const std::string& folder,
std::vector<VideoFileStats>* stats) {
DIR* d = opendir(folder.c_str());
CHECK(d);
struct dirent* dir;
while ((dir = readdir(d))) {
if (dir->d_type != DT_DIR && 0 != strcmp(dir->d_name, "README")) {
VideoFileStats item;
item.name = folder + '/' + dir->d_name;
LOG(INFO) << "Found video file: " << item.name;
stats->push_back(std::move(item));
}
}
closedir(d);
}
void gotFilesStats(std::vector<VideoFileStats>& stats) {
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.formats = {MediaFormat(0)};
params.headerOnly = true;
params.preventStaleness = false;
size_t avgProvUs = 0;
const size_t rounds = 100;
for (auto& item : stats) {
LOG(INFO) << "Decoding video file in memory: " << item.name;
FILE* f = fopen(item.name.c_str(), "rb");
CHECK(f != nullptr);
fseek(f, 0, SEEK_END);
std::vector<uint8_t> buffer(ftell(f));
rewind(f);
CHECK_EQ(buffer.size(), fread(buffer.data(), 1, buffer.size(), f));
fclose(f);
for (size_t i = 0; i < rounds; ++i) {
SyncDecoder decoder;
std::vector<DecoderMetadata> metadata;
const auto now = std::chrono::steady_clock::now();
CHECK(decoder.init(
params,
MemoryBuffer::getCallback(buffer.data(), buffer.size()),
&metadata));
const auto then = std::chrono::steady_clock::now();
decoder.shutdown();
avgProvUs +=
std::chrono::duration_cast<std::chrono::microseconds>(then - now)
.count();
CHECK_EQ(metadata.size(), 1);
item.num = metadata[0].num;
item.den = metadata[0].den;
item.fps = metadata[0].fps;
item.durationPts =
av_rescale_q(metadata[0].duration, AV_TIME_BASE_Q, {1, item.fps});
}
}
LOG(INFO) << "Probing (us) " << avgProvUs / stats.size() / rounds;
}
size_t measurePerformanceUs(
const std::vector<VideoFileStats>& stats,
size_t rounds,
size_t num,
size_t stride) {
size_t avgClipDecodingUs = 0;
std::srand(time(nullptr));
for (const auto& item : stats) {
FILE* f = fopen(item.name.c_str(), "rb");
CHECK(f != nullptr);
fseek(f, 0, SEEK_END);
std::vector<uint8_t> buffer(ftell(f));
rewind(f);
CHECK_EQ(buffer.size(), fread(buffer.data(), 1, buffer.size(), f));
fclose(f);
for (size_t i = 0; i < rounds; ++i) {
// randomly select a clip
size_t rOffset = std::rand();
size_t fOffset = rOffset % item.durationPts;
size_t clipFrames = num + (num - 1) * stride;
if (fOffset + clipFrames > item.durationPts) {
fOffset = item.durationPts - clipFrames;
}
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.preventStaleness = false;
for (size_t n = 0; n < num; ++n) {
std::list<DecoderOutputMessage> msgs;
params.startOffset =
av_rescale_q(fOffset, {1, item.fps}, AV_TIME_BASE_Q);
params.endOffset = params.startOffset + 100;
auto now = std::chrono::steady_clock::now();
SyncDecoder decoder;
CHECK(decoder.init(
params,
MemoryBuffer::getCallback(buffer.data(), buffer.size()),
nullptr));
DecoderOutputMessage out;
while (0 == decoder.decode(&out, params.timeoutMs)) {
msgs.push_back(std::move(out));
}
decoder.shutdown();
const auto then = std::chrono::steady_clock::now();
fOffset += 1 + stride;
avgClipDecodingUs +=
std::chrono::duration_cast<std::chrono::microseconds>(then - now)
.count();
}
}
}
return avgClipDecodingUs / rounds / num / stats.size();
}
void runDecoder(SyncDecoder& decoder) {
DecoderOutputMessage out;
size_t audioFrames = 0, videoFrames = 0, totalBytes = 0;
while (0 == decoder.decode(&out, 10000)) {
if (out.header.format.type == TYPE_AUDIO) {
++audioFrames;
} else if (out.header.format.type == TYPE_VIDEO) {
++videoFrames;
} else if (out.header.format.type == TYPE_SUBTITLE && out.payload) {
// deserialize
LOG(INFO) << "Deserializing subtitle";
AVSubtitle sub;
memset(&sub, 0, sizeof(sub));
EXPECT_TRUE(Util::deserialize(*out.payload, &sub));
LOG(INFO) << "Found subtitles"
<< ", num rects: " << sub.num_rects;
for (int i = 0; i < sub.num_rects; ++i) {
std::string text = "picture";
if (sub.rects[i]->type == SUBTITLE_TEXT) {
text = sub.rects[i]->text;
} else if (sub.rects[i]->type == SUBTITLE_ASS) {
text = sub.rects[i]->ass;
}
LOG(INFO) << "Rect num: " << i << ", type:" << sub.rects[i]->type
<< ", text: " << text;
}
avsubtitle_free(&sub);
}
if (out.payload) {
totalBytes += out.payload->length();
}
}
LOG(INFO) << "Decoded audio frames: " << audioFrames
<< ", video frames: " << videoFrames
<< ", total bytes: " << totalBytes;
}
} // namespace
TEST(SyncDecoder, TestSyncDecoderPerformance) {
// Measure the average decoding time per clip
// 1. list the videos in the test directory
// 2. for each video get the number of frames with timestamps
// 3. randomly select a frame offset
// 4. adjust the offset for the number of frames and the stride,
//    if it falls beyond the upper boundary
// 5. repeat multiple times, measuring and accumulating the decoding time
//    per clip.
/*
1) 4 x 2
2) 8 x 8
3) 16 x 8
4) 32 x 4
*/
const std::string kFolder = "pytorch/vision/test/assets/videos";
std::vector<VideoFileStats> stats;
gotAllTestFiles(kFolder, &stats);
gotFilesStats(stats);
const size_t kRounds = 10;
auto new4x2 = measurePerformanceUs(stats, kRounds, 4, 2);
auto new8x8 = measurePerformanceUs(stats, kRounds, 8, 8);
auto new16x8 = measurePerformanceUs(stats, kRounds, 16, 8);
auto new32x4 = measurePerformanceUs(stats, kRounds, 32, 4);
LOG(INFO) << "Clip decoding (us)"
<< ", new(4x2): " << new4x2 << ", new(8x8): " << new8x8
<< ", new(16x8): " << new16x8 << ", new(32x4): " << new32x4;
}
TEST(SyncDecoder, Test) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
params.uri = "pytorch/vision/test/assets/videos/R6llTwEh07w.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestSubtitles) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
params.uri = "vue/synergy/data/robotsub.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestHeadersOnly) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.headerOnly = true;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
params.uri = "pytorch/vision/test/assets/videos/R6llTwEh07w.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
params.uri = "pytorch/vision/test/assets/videos/SOX5yA1l24A.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
params.uri = "pytorch/vision/test/assets/videos/WUzgd7C1pWA.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestHeadersOnlyDownSampling) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.headerOnly = true;
MediaFormat format;
format.type = TYPE_AUDIO;
format.format.audio.samples = 8000;
params.formats.insert(format);
format.type = TYPE_VIDEO;
format.format.video.width = 224;
format.format.video.height = 224;
params.formats.insert(format);
params.uri = "pytorch/vision/test/assets/videos/R6llTwEh07w.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
params.uri = "pytorch/vision/test/assets/videos/SOX5yA1l24A.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
params.uri = "pytorch/vision/test/assets/videos/WUzgd7C1pWA.mp4";
CHECK(decoder.init(params, nullptr, nullptr));
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestInitOnlyNoShutdown) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.seekAccuracy = 100000;
params.headerOnly = false;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
params.uri = "pytorch/vision/test/assets/videos/R6llTwEh07w.mp4";
std::vector<DecoderMetadata> metadata;
CHECK(decoder.init(params, nullptr, &metadata));
}
TEST(SyncDecoder, TestMemoryBuffer) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.endOffset = 9000000;
params.seekAccuracy = 10000;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
FILE* f = fopen(
"pytorch/vision/test/assets/videos/RATRACE_wave_f_nm_np1_fr_goo_37.avi",
"rb");
CHECK(f != nullptr);
fseek(f, 0, SEEK_END);
std::vector<uint8_t> buffer(ftell(f));
rewind(f);
CHECK_EQ(buffer.size(), fread(buffer.data(), 1, buffer.size(), f));
fclose(f);
CHECK(decoder.init(
params,
MemoryBuffer::getCallback(buffer.data(), buffer.size()),
nullptr));
LOG(INFO) << "Decoding from memory bytes: " << buffer.size();
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestMemoryBufferNoSeekableWithFullRead) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.endOffset = 9000000;
params.seekAccuracy = 10000;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
FILE* f = fopen("pytorch/vision/test/assets/videos/R6llTwEh07w.mp4", "rb");
CHECK(f != nullptr);
fseek(f, 0, SEEK_END);
std::vector<uint8_t> buffer(ftell(f));
rewind(f);
CHECK_EQ(buffer.size(), fread(buffer.data(), 1, buffer.size(), f));
fclose(f);
params.maxSeekableBytes = buffer.size() + 1;
MemoryBuffer object(buffer.data(), buffer.size());
CHECK(decoder.init(
params,
[object](uint8_t* out, int size, int whence, uint64_t timeoutMs) mutable
-> int {
if (out) { // see defs.h file
// read mode
return object.read(out, size);
}
// seek mode
if (!timeoutMs) {
// seek capability probe: -1 means not seekable
return -1;
}
return object.seek(size, whence);
},
nullptr));
runDecoder(decoder);
decoder.shutdown();
}
TEST(SyncDecoder, TestMemoryBufferNoSeekableWithPartialRead) {
SyncDecoder decoder;
DecoderParameters params;
params.timeoutMs = 10000;
params.startOffset = 1000000;
params.endOffset = 9000000;
params.seekAccuracy = 10000;
params.formats = {MediaFormat(), MediaFormat(0), MediaFormat('0')};
FILE* f = fopen("pytorch/vision/test/assets/videos/R6llTwEh07w.mp4", "rb");
CHECK(f != nullptr);
fseek(f, 0, SEEK_END);
std::vector<uint8_t> buffer(ftell(f));
rewind(f);
CHECK_EQ(buffer.size(), fread(buffer.data(), 1, buffer.size(), f));
fclose(f);
params.maxSeekableBytes = buffer.size() / 2;
MemoryBuffer object(buffer.data(), buffer.size());
CHECK(!decoder.init(
params,
[object](uint8_t* out, int size, int whence, uint64_t timeoutMs) mutable
-> int {
if (out) { // see defs.h file
// read mode
return object.read(out, size);
}
// seek mode
if (!timeoutMs) {
// seek capability probe: -1 means not seekable
return -1;
}
return object.seek(size, whence);
},
nullptr));
}
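// ---------------------------------------------------------------------------
// The callback contract exercised by the two tests above (as inferred from
// their lambdas and defs.h): a single callable serves both reads and seeks.
//   - out != nullptr                 -> read mode: copy up to `size` bytes
//                                       into `out`, return the bytes read.
//   - out == nullptr, timeoutMs == 0 -> capability probe: a negative return
//                                       marks the source as not seekable.
//   - out == nullptr, timeoutMs != 0 -> seek mode: seek(`size`, `whence`)
//                                       and return the new position.
// With a non-seekable source, init succeeds when maxSeekableBytes covers the
// whole buffer (first test) and fails when it covers only half (second test).
// ---------------------------------------------------------------------------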
#include "time_keeper.h"
#include "defs.h"
namespace ffmpeg {
namespace {
const long kMaxTimeBaseDifference = 10;
}
long TimeKeeper::adjust(long& decoderTimestamp) {
const long now = std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::system_clock::now().time_since_epoch())
.count();
if (startTime_ == 0) {
startTime_ = now;
}
if (streamTimestamp_ == 0) {
streamTimestamp_ = decoderTimestamp;
}
const auto runOut = startTime_ + decoderTimestamp - streamTimestamp_;
if (std::labs((now - runOut) / AV_TIME_BASE) > kMaxTimeBaseDifference) {
streamTimestamp_ = startTime_ - now + decoderTimestamp;
}
const auto sleepAdvised = runOut - now;
decoderTimestamp += startTime_ - streamTimestamp_;
return sleepAdvised > 0 ? sleepAdvised : 0;
}
} // namespace ffmpeg
#pragma once
#include <stdlib.h>
#include <chrono>
namespace ffmpeg {
/**
* Class keeps track of the decoded timestamps (us) for media streams.
*/
class TimeKeeper {
public:
TimeKeeper() = default;
// adjusts the provided decoderTimestamp to the corrected value and
// returns the advised sleep time (us) before processing the next frame
long adjust(long& decoderTimestamp);
private:
long startTime_{0};
long streamTimestamp_{0};
};
} // namespace ffmpeg
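// ---------------------------------------------------------------------------
// A brief sketch (not part of the diff) of how a caller is expected to use
// TimeKeeper::adjust: rebase the decoder timestamp onto the wall clock and
// sleep by the advised amount to pace frame delivery in real time.
// ---------------------------------------------------------------------------
#include <chrono>
#include <thread>
#include "time_keeper.h"

void paceFrame(ffmpeg::TimeKeeper& keeper, long& framePtsUs) {
  // adjust() rewrites framePtsUs in place and returns the advised sleep (us).
  const long sleepUs = keeper.adjust(framePtsUs);
  if (sleepUs > 0) {
    std::this_thread::sleep_for(std::chrono::microseconds(sleepUs));
  }
}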
#include "util.h"
#include <c10/util/Logging.h>
namespace ffmpeg {
namespace Serializer {
// fixed size types
template <typename T>
inline size_t getSize(const T& x) {
return sizeof(x);
}
template <typename T>
inline bool serializeItem(
uint8_t* dest,
size_t len,
size_t& pos,
const T& src) {
VLOG(6) << "Generic serializeItem";
const auto required = sizeof(src);
if (len < pos + required) {
return false;
}
memcpy(dest + pos, &src, required);
pos += required;
return true;
}
template <typename T>
inline bool deserializeItem(
const uint8_t* src,
size_t len,
size_t& pos,
T& dest) {
const auto required = sizeof(dest);
if (len < pos + required) {
return false;
}
memcpy(&dest, src + pos, required);
pos += required;
return true;
}
// AVSubtitleRect specialization
inline size_t getSize(const AVSubtitleRect& x) {
auto rectBytes = [](const AVSubtitleRect& y) -> size_t {
size_t s = 0;
switch (y.type) {
case SUBTITLE_BITMAP:
for (int i = 0; i < y.nb_colors; ++i) {
s += sizeof(y.pict.linesize[i]);
s += y.pict.linesize[i];
}
break;
case SUBTITLE_TEXT:
s += sizeof(size_t);
s += strlen(y.text);
break;
case SUBTITLE_ASS:
s += sizeof(size_t);
s += strlen(y.ass);
break;
default:
break;
}
return s;
};
return getSize(x.x) + getSize(x.y) + getSize(x.w) + getSize(x.h) +
getSize(x.nb_colors) + getSize(x.type) + getSize(x.flags) + rectBytes(x);
}
// AVSubtitle specialization
inline size_t getSize(const AVSubtitle& x) {
auto rectBytes = [](const AVSubtitle& y) -> size_t {
size_t s = getSize(y.num_rects);
for (unsigned i = 0; i < y.num_rects; ++i) {
s += getSize(*y.rects[i]);
}
return s;
};
return getSize(x.format) + getSize(x.start_display_time) +
getSize(x.end_display_time) + getSize(x.pts) + rectBytes(x);
}
inline bool serializeItem(
uint8_t* dest,
size_t len,
size_t& pos,
const AVSubtitleRect& src) {
auto rectSerialize =
[](uint8_t* d, size_t l, size_t& p, const AVSubtitleRect& x) -> size_t {
switch (x.type) {
case SUBTITLE_BITMAP:
for (int i = 0; i < x.nb_colors; ++i) {
if (!serializeItem(d, l, p, x.pict.linesize[i])) {
return false;
}
if (p + x.pict.linesize[i] > l) {
return false;
}
memcpy(d + p, x.pict.data[i], x.pict.linesize[i]);
p += x.pict.linesize[i];
}
return true;
case SUBTITLE_TEXT: {
const size_t s = strlen(x.text);
if (!serializeItem(d, l, p, s)) {
return false;
}
if (p + s > l) {
return false;
}
memcpy(d + p, x.text, s);
p += s;
return true;
}
case SUBTITLE_ASS: {
const size_t s = strlen(x.ass);
if (!serializeItem(d, l, p, s)) {
return false;
}
if (p + s > l) {
return false;
}
memcpy(d + p, x.ass, s);
p += s;
return true;
}
default:
return true;
}
};
return serializeItem(dest, len, pos, src.x) &&
serializeItem(dest, len, pos, src.y) &&
serializeItem(dest, len, pos, src.w) &&
serializeItem(dest, len, pos, src.h) &&
serializeItem(dest, len, pos, src.nb_colors) &&
serializeItem(dest, len, pos, src.type) &&
serializeItem(dest, len, pos, src.flags) &&
rectSerialize(dest, len, pos, src);
}
inline bool serializeItem(
uint8_t* dest,
size_t len,
size_t& pos,
const AVSubtitle& src) {
auto rectSerialize =
[](uint8_t* d, size_t l, size_t& p, const AVSubtitle& x) -> bool {
bool res = serializeItem(d, l, p, x.num_rects);
for (unsigned i = 0; res && i < x.num_rects; ++i) {
res = serializeItem(d, l, p, *(x.rects[i]));
}
return res;
};
VLOG(6) << "AVSubtitle serializeItem";
return serializeItem(dest, len, pos, src.format) &&
serializeItem(dest, len, pos, src.start_display_time) &&
serializeItem(dest, len, pos, src.end_display_time) &&
serializeItem(dest, len, pos, src.pts) &&
rectSerialize(dest, len, pos, src);
}
inline bool deserializeItem(
const uint8_t* src,
size_t len,
size_t& pos,
AVSubtitleRect& dest) {
auto rectDeserialize =
[](const uint8_t* y, size_t l, size_t& p, AVSubtitleRect& x) -> bool {
switch (x.type) {
case SUBTITLE_BITMAP:
for (int i = 0; i < x.nb_colors; ++i) {
if (!deserializeItem(y, l, p, x.pict.linesize[i])) {
return false;
}
if (p + x.pict.linesize[i] > l) {
return false;
}
x.pict.data[i] = (uint8_t*)av_malloc(x.pict.linesize[i]);
memcpy(x.pict.data[i], y + p, x.pict.linesize[i]);
p += x.pict.linesize[i];
}
return true;
case SUBTITLE_TEXT: {
size_t s = 0;
if (!deserializeItem(y, l, p, s)) {
return false;
}
if (p + s > l) {
return false;
}
x.text = (char*)av_malloc(s + 1);
memcpy(x.text, y + p, s);
x.text[s] = 0;
p += s;
return true;
}
case SUBTITLE_ASS: {
size_t s = 0;
if (!deserializeItem(y, l, p, s)) {
return false;
}
if (p + s > l) {
return false;
}
x.ass = (char*)av_malloc(s + 1);
memcpy(x.ass, y + p, s);
x.ass[s] = 0;
p += s;
return true;
}
default:
return true;
}
};
return deserializeItem(src, len, pos, dest.x) &&
deserializeItem(src, len, pos, dest.y) &&
deserializeItem(src, len, pos, dest.w) &&
deserializeItem(src, len, pos, dest.h) &&
deserializeItem(src, len, pos, dest.nb_colors) &&
deserializeItem(src, len, pos, dest.type) &&
deserializeItem(src, len, pos, dest.flags) &&
rectDeserialize(src, len, pos, dest);
}
inline bool deserializeItem(
const uint8_t* src,
size_t len,
size_t& pos,
AVSubtitle& dest) {
auto rectDeserialize =
[](const uint8_t* y, size_t l, size_t& p, AVSubtitle& x) -> bool {
bool res = deserializeItem(y, l, p, x.num_rects);
if (res && x.num_rects) {
x.rects =
(AVSubtitleRect**)av_malloc(x.num_rects * sizeof(AVSubtitleRect*));
}
for (unsigned i = 0; res && i < x.num_rects; ++i) {
x.rects[i] = (AVSubtitleRect*)av_malloc(sizeof(AVSubtitleRect));
memset(x.rects[i], 0, sizeof(AVSubtitleRect));
res = deserializeItem(y, l, p, *x.rects[i]);
}
return res;
};
return deserializeItem(src, len, pos, dest.format) &&
deserializeItem(src, len, pos, dest.start_display_time) &&
deserializeItem(src, len, pos, dest.end_display_time) &&
deserializeItem(src, len, pos, dest.pts) &&
rectDeserialize(src, len, pos, dest);
}
} // namespace Serializer
namespace Util {
std::string generateErrorDesc(int errorCode) {
std::array<char, 1024> buffer;
if (av_strerror(errorCode, buffer.data(), buffer.size()) < 0) {
return std::string("Unknown error code: ") + std::to_string(errorCode);
}
buffer.back() = 0;
return std::string(buffer.data());
}
size_t serialize(const AVSubtitle& sub, ByteStorage* out) {
const auto len = size(sub);
CHECK_LE(len, out->tail());
size_t pos = 0;
if (!Serializer::serializeItem(out->writableTail(), len, pos, sub)) {
return 0;
}
out->append(len);
return len;
}
bool deserialize(const ByteStorage& buf, AVSubtitle* sub) {
size_t pos = 0;
return Serializer::deserializeItem(buf.data(), buf.length(), pos, *sub);
}
size_t size(const AVSubtitle& sub) {
return Serializer::getSize(sub);
}
bool validateVideoFormat(const VideoFormat& f) {
/*
Valid parameters values for decoder
____________________________________________________________________________________
| W | H | minDimension | maxDimension | cropImage | algorithm |
|__________________________________________________________________________________|
| 0 | 0 | 0 | 0 | N/A | original |
|__________________________________________________________________________________|
| >0 | 0 | N/A | N/A | N/A | scale keeping W |
|__________________________________________________________________________________|
| 0 | >0 | N/A | N/A | N/A | scale keeping H |
|__________________________________________________________________________________|
| >0 | >0 | N/A | N/A | 0 | stretch/scale |
|__________________________________________________________________________________|
| >0 | >0 | N/A | N/A | >0 | scale/crop |
|__________________________________________________________________________________|
| 0 | 0 | >0 | 0 | N/A |scale to min dimension |
|__________________________________________________________________________________|
| 0 | 0 | 0 | >0 | N/A |scale to max dimension |
|__________________________________________________________________________________|
| 0 | 0 | >0 | >0 | N/A |stretch to min/max dimension|
|_____|_____|______________|______________|___________|____________________________|
*/
return (f.width == 0 && // #1, #6, #7 and #8
f.height == 0 && f.cropImage == 0) ||
(f.width != 0 && // #4 and #5
f.height != 0 && f.minDimension == 0 && f.maxDimension == 0) ||
(((f.width != 0 && // #2
f.height == 0) ||
(f.width == 0 && // #3
f.height != 0)) &&
f.minDimension == 0 && f.maxDimension == 0 && f.cropImage == 0);
}
void setFormatDimensions(
size_t& destW,
size_t& destH,
size_t userW,
size_t userH,
size_t srcW,
size_t srcH,
size_t minDimension,
size_t maxDimension,
size_t cropImage) {
// Rounding rule: round half up when converting double to int,
// i.e. int result = int(double(value) + 0.5): the fraction is rounded
// up if it is >= 0.5 and down otherwise.
// #1, #6, #7 and #8
if (userW == 0 && userH == 0) {
if (minDimension > 0 && maxDimension == 0) { // #6
if (srcW > srcH) {
// landscape
destH = minDimension;
destW = round(double(srcW * minDimension) / srcH);
} else {
// portrait
destW = minDimension;
destH = round(double(srcH * minDimension) / srcW);
}
}
else if (minDimension == 0 && maxDimension > 0) { // #7
if (srcW > srcH) {
// landscape
destW = maxDimension;
destH = round(double(srcH * maxDimension) / srcW);
} else {
// portrait
destH = maxDimension;
destW = round(double(srcW * maxDimension) / srcH);
}
}
else if (minDimension > 0 && maxDimension > 0) { // #8
if (srcW > srcH) {
// landscape
destW = maxDimension;
destH = minDimension;
} else {
// portrait
destW = minDimension;
destH = maxDimension;
}
}
else { // #1
destW = srcW;
destH = srcH;
}
} else if (userW != 0 && userH == 0) { // #2
destW = userW;
destH = round(double(srcH * userW) / srcW);
} else if (userW == 0 && userH != 0) { // #3
destW = round(double(srcW * userH) / srcH);
destH = userH;
} else { // userW != 0 && userH != 0
if (cropImage == 0) { // #4
destW = userW;
destH = userH;
} else { // #5
double userSlope = double(userH) / userW;
double srcSlope = double(srcH) / srcW;
if (srcSlope < userSlope) {
destW = round(double(srcW * userH) / srcH);
destH = userH;
} else {
destW = userW;
destH = round(double(srcH * userW) / srcW);
}
}
}
// prevent zeros
destW = std::max(destW, 1UL);
destH = std::max(destH, 1UL);
}
} // namespace Util
} // namespace ffmpeg
#pragma once
#include "defs.h"
namespace ffmpeg {
/**
* FFMPEG library utility functions.
*/
namespace Util {
std::string generateErrorDesc(int errorCode);
size_t serialize(const AVSubtitle& sub, ByteStorage* out);
bool deserialize(const ByteStorage& buf, AVSubtitle* sub);
size_t size(const AVSubtitle& sub);
void setFormatDimensions(
size_t& destW,
size_t& destH,
size_t userW,
size_t userH,
size_t srcW,
size_t srcH,
size_t minDimension,
size_t maxDimension,
size_t cropImage);
bool validateVideoFormat(const VideoFormat& format);
} // namespace Util
} // namespace ffmpeg
#include <c10/util/Logging.h>
#include <dirent.h>
#include <gtest/gtest.h>
#include "util.h"
TEST(Util, TestSetFormatDimensions) {
const size_t test_cases[][9] = {
// (userW, userH, srcW, srcH, minDimension, maxDimension, cropImage, destW, destH)
{0, 0, 172, 128, 0, 0, 0, 172, 128}, // #1
{86, 0, 172, 128, 0, 0, 0, 86, 64}, // #2
{64, 0, 128, 172, 0, 0, 0, 64, 86}, // #2
{0, 32, 172, 128, 0, 0, 0, 43, 32}, // #3
{32, 0, 128, 172, 0, 0, 0, 32, 43}, // #3
{60, 50, 172, 128, 0, 0, 0, 60, 50}, // #4
{50, 60, 128, 172, 0, 0, 0, 50, 60}, // #4
{86, 40, 172, 128, 0, 0, 1, 86, 64}, // #5
{86, 92, 172, 128, 0, 0, 1, 124, 92}, // #5
{0, 0, 172, 128, 256, 0, 0, 344, 256}, // #6
{0, 0, 128, 172, 256, 0, 0, 256, 344}, // #6
{0, 0, 128, 172, 0, 344, 0, 256, 344}, // #7
{0, 0, 172, 128, 0, 344, 0, 344, 256}, // #7
{0, 0, 172, 128, 100, 344, 0, 344, 100},// #8
{0, 0, 128, 172, 100, 344, 0, 100, 344} // #8
};
for (const auto& tc : test_cases) {
size_t destW = 0;
size_t destH = 0;
ffmpeg::Util::setFormatDimensions(destW, destH, tc[0], tc[1], tc[2], tc[3], tc[4], tc[5], tc[6]);
CHECK(destW == tc[7]);
CHECK(destH == tc[8]);
}
}
#include "video_sampler.h"
#include <c10/util/Logging.h>
#include "util.h"
// www.ffmpeg.org/doxygen/0.5/swscale-example_8c-source.html
namespace ffmpeg {
namespace {
int preparePlanes(
const VideoFormat& fmt,
const uint8_t* buffer,
uint8_t** planes,
int* lineSize) {
int result;
if ((result = av_image_fill_arrays(
planes,
lineSize,
buffer,
(AVPixelFormat)fmt.format,
fmt.width,
fmt.height,
1)) < 0) {
LOG(ERROR) << "av_image_fill_arrays failed, err: "
<< Util::generateErrorDesc(result);
}
return result;
}
int transformImage(
SwsContext* context,
const uint8_t* const srcSlice[],
int srcStride[],
VideoFormat inFormat,
VideoFormat outFormat,
uint8_t* out,
uint8_t* planes[],
int lines[]) {
int result;
if ((result = preparePlanes(outFormat, out, planes, lines)) < 0) {
return result;
}
if ((result = sws_scale(
context, srcSlice, srcStride, 0, inFormat.height, planes, lines)) <
0) {
LOG(ERROR) << "sws_scale failed, err: " << Util::generateErrorDesc(result);
return result;
}
return 0;
}
} // namespace
VideoSampler::VideoSampler(int swsFlags, int64_t loggingUuid)
: swsFlags_(swsFlags), loggingUuid_(loggingUuid) {}
VideoSampler::~VideoSampler() {
cleanUp();
}
void VideoSampler::shutdown() {
cleanUp();
}
bool VideoSampler::init(const SamplerParameters& params) {
cleanUp();
if (params.out.video.cropImage != 0) {
if (!Util::validateVideoFormat(params.out.video)) {
LOG(ERROR) << "Invalid video format"
<< ", width: " << params.out.video.width
<< ", height: " << params.out.video.height
<< ", format: " << params.out.video.format
<< ", minDimension: " << params.out.video.minDimension
<< ", crop: " << params.out.video.cropImage;
return false;
}
scaleFormat_.format = params.out.video.format;
Util::setFormatDimensions(
scaleFormat_.width,
scaleFormat_.height,
params.out.video.width,
params.out.video.height,
params.in.video.width,
params.in.video.height,
0,
0,
1);
if (!(scaleFormat_ == params_.out.video)) { // crop required
cropContext_ = sws_getContext(
params.out.video.width,
params.out.video.height,
(AVPixelFormat)params.out.video.format,
params.out.video.width,
params.out.video.height,
(AVPixelFormat)params.out.video.format,
swsFlags_,
nullptr,
nullptr,
nullptr);
if (!cropContext_) {
LOG(ERROR) << "sws_getContext failed for crop context";
return false;
}
const auto scaleImageSize = av_image_get_buffer_size(
(AVPixelFormat)scaleFormat_.format,
scaleFormat_.width,
scaleFormat_.height,
1);
scaleBuffer_.resize(scaleImageSize);
}
} else {
scaleFormat_ = params.out.video;
}
VLOG(1) << "Input format #" << loggingUuid_ << ", width "
<< params.in.video.width << ", height " << params.in.video.height
<< ", format " << params.in.video.format << ", minDimension "
<< params.in.video.minDimension << ", cropImage "
<< params.in.video.cropImage;
VLOG(1) << "Scale format #" << loggingUuid_ << ", width "
<< scaleFormat_.width << ", height " << scaleFormat_.height
<< ", format " << scaleFormat_.format << ", minDimension "
<< scaleFormat_.minDimension << ", cropImage "
<< scaleFormat_.cropImage;
VLOG(1) << "Crop format #" << loggingUuid_ << ", width "
<< params.out.video.width << ", height " << params.out.video.height
<< ", format " << params.out.video.format << ", minDimension "
<< params.out.video.minDimension << ", cropImage "
<< params.out.video.cropImage;
scaleContext_ = sws_getContext(
params.in.video.width,
params.in.video.height,
(AVPixelFormat)params.in.video.format,
scaleFormat_.width,
scaleFormat_.height,
(AVPixelFormat)scaleFormat_.format,
swsFlags_,
nullptr,
nullptr,
nullptr);
// set output format
params_ = params;
return scaleContext_ != nullptr;
}
int VideoSampler::sample(
const uint8_t* const srcSlice[],
int srcStride[],
ByteStorage* out) {
int result;
// scaled and cropped image
int outImageSize = av_image_get_buffer_size(
(AVPixelFormat)params_.out.video.format,
params_.out.video.width,
params_.out.video.height,
1);
out->ensure(outImageSize);
uint8_t* scalePlanes[4] = {nullptr};
int scaleLines[4] = {0};
// perform scale first
if ((result = transformImage(
scaleContext_,
srcSlice,
srcStride,
params_.in.video,
scaleFormat_,
// for crop use internal buffer
cropContext_ ? scaleBuffer_.data() : out->writableTail(),
scalePlanes,
scaleLines))) {
return result;
}
// is crop required?
if (cropContext_) {
uint8_t* cropPlanes[4] = {nullptr};
int cropLines[4] = {0};
if (params_.out.video.height < scaleFormat_.height) {
// Destination image is wider than the source image: cut top and bottom
for (size_t i = 0; i < 4 && scalePlanes[i] != nullptr; ++i) {
scalePlanes[i] += scaleLines[i] *
(scaleFormat_.height - params_.out.video.height) / 2;
}
} else {
// Source image is wider than the destination image: cut the sides
for (size_t i = 0; i < 4 && scalePlanes[i] != nullptr; ++i) {
scalePlanes[i] += scaleLines[i] *
(scaleFormat_.width - params_.out.video.width) / 2 /
scaleFormat_.width;
}
}
// crop image
if ((result = transformImage(
cropContext_,
scalePlanes,
scaleLines,
params_.out.video,
params_.out.video,
out->writableTail(),
cropPlanes,
cropLines))) {
return result;
}
}
out->append(outImageSize);
return outImageSize;
}
int VideoSampler::sample(AVFrame* frame, ByteStorage* out) {
if (!frame) {
return 0; // no flush for videos
}
return sample(frame->data, frame->linesize, out);
}
int VideoSampler::sample(const ByteStorage* in, ByteStorage* out) {
if (!in) {
return 0; // no flush for videos
}
int result;
uint8_t* inPlanes[4] = {nullptr};
int inLineSize[4] = {0};
if ((result = preparePlanes(
params_.in.video, in->data(), inPlanes, inLineSize)) < 0) {
return result;
}
return sample(inPlanes, inLineSize, out);
}
void VideoSampler::cleanUp() {
if (scaleContext_) {
sws_freeContext(scaleContext_);
scaleContext_ = nullptr;
}
if (cropContext_) {
sws_freeContext(cropContext_);
cropContext_ = nullptr;
scaleBuffer_.clear();
}
}
} // namespace ffmpeg
#pragma once
#include "defs.h"
namespace ffmpeg {
/**
* Class transcodes video frames from one format into another.
*/
class VideoSampler : public MediaSampler {
public:
VideoSampler(int swsFlags = SWS_AREA, int64_t loggingUuid = 0);
~VideoSampler() override;
// MediaSampler overrides
bool init(const SamplerParameters& params) override;
int sample(const ByteStorage* in, ByteStorage* out) override;
void shutdown() override;
// returns the number of processed/scaled bytes
int sample(AVFrame* frame, ByteStorage* out);
int getImageBytes() const;
private:
// close resources
void cleanUp();
// helper functions for rescaling, cropping, etc.
int sample(
const uint8_t* const srcSlice[],
int srcStride[],
ByteStorage* out);
private:
VideoFormat scaleFormat_;
SwsContext* scaleContext_{nullptr};
SwsContext* cropContext_{nullptr};
int swsFlags_{SWS_AREA};
std::vector<uint8_t> scaleBuffer_;
int64_t loggingUuid_{0};
};
} // namespace ffmpeg
#include "video_stream.h"
#include <c10/util/Logging.h>
#include "util.h"
namespace ffmpeg {
namespace {
bool operator==(const VideoFormat& x, const AVFrame& y) {
return x.width == y.width && x.height == y.height && x.format == y.format;
}
bool operator==(const VideoFormat& x, const AVCodecContext& y) {
return x.width == y.width && x.height == y.height && x.format == y.pix_fmt;
}
VideoFormat& toVideoFormat(VideoFormat& x, const AVFrame& y) {
x.width = y.width;
x.height = y.height;
x.format = y.format;
return x;
}
VideoFormat& toVideoFormat(VideoFormat& x, const AVCodecContext& y) {
x.width = y.width;
x.height = y.height;
x.format = y.pix_fmt;
return x;
}
} // namespace
VideoStream::VideoStream(
AVFormatContext* inputCtx,
int index,
bool convertPtsToWallTime,
const VideoFormat& format,
int64_t loggingUuid)
: Stream(
inputCtx,
MediaFormat::makeMediaFormat(format, index),
convertPtsToWallTime,
loggingUuid) {}
VideoStream::~VideoStream() {
if (sampler_) {
sampler_->shutdown();
sampler_.reset();
}
}
int VideoStream::initFormat() {
// set output format
if (!Util::validateVideoFormat(format_.format.video)) {
LOG(ERROR) << "Invalid video format"
<< ", width: " << format_.format.video.width
<< ", height: " << format_.format.video.height
<< ", format: " << format_.format.video.format
<< ", minDimension: " << format_.format.video.minDimension
<< ", crop: " << format_.format.video.cropImage;
return -1;
}
// keep aspect ratio
Util::setFormatDimensions(
format_.format.video.width,
format_.format.video.height,
format_.format.video.width,
format_.format.video.height,
codecCtx_->width,
codecCtx_->height,
format_.format.video.minDimension,
format_.format.video.maxDimension,
0);
if (format_.format.video.format == AV_PIX_FMT_NONE) {
format_.format.video.format = codecCtx_->pix_fmt;
}
return format_.format.video.width != 0 && format_.format.video.height != 0 &&
format_.format.video.format != AV_PIX_FMT_NONE
? 0
: -1;
}
int VideoStream::copyFrameBytes(ByteStorage* out, bool flush) {
if (!sampler_) {
sampler_ = std::make_unique<VideoSampler>(SWS_AREA, loggingUuid_);
}
// check if input format gets changed
if (flush ? !(sampler_->getInputFormat().video == *codecCtx_)
: !(sampler_->getInputFormat().video == *frame_)) {
// - reinit sampler
SamplerParameters params;
params.type = format_.type;
params.out = format_.format;
params.in = FormatUnion(0);
flush ? toVideoFormat(params.in.video, *codecCtx_)
: toVideoFormat(params.in.video, *frame_);
if (!sampler_->init(params)) {
return -1;
}
VLOG(1) << "Set input video sampler format"
<< ", width: " << params.in.video.width
<< ", height: " << params.in.video.height
<< ", format: " << params.in.video.format
<< " : output video sampler format"
<< ", width: " << format_.format.video.width
<< ", height: " << format_.format.video.height
<< ", format: " << format_.format.video.format
<< ", minDimension: " << format_.format.video.minDimension
<< ", crop: " << format_.format.video.cropImage;
}
return sampler_->sample(flush ? nullptr : frame_, out);
}
void VideoStream::setHeader(DecoderHeader* header, bool flush) {
Stream::setHeader(header, flush);
if (!flush) { // no frames for video flush
header->keyFrame = frame_->key_frame;
header->fps = av_q2d(av_guess_frame_rate(
inputCtx_, inputCtx_->streams[format_.stream], nullptr));
}
}
} // namespace ffmpeg
#pragma once
#include "stream.h"
#include "video_sampler.h"
namespace ffmpeg {
/**
* Class uses FFMPEG library to decode one video stream.
*/
class VideoStream : public Stream {
public:
VideoStream(
AVFormatContext* inputCtx,
int index,
bool convertPtsToWallTime,
const VideoFormat& format,
int64_t loggingUuid);
~VideoStream() override;
private:
int initFormat() override;
int copyFrameBytes(ByteStorage* out, bool flush) override;
void setHeader(DecoderHeader* header, bool flush) override;
private:
std::unique_ptr<VideoSampler> sampler_;
};
} // namespace ffmpeg
#include "FfmpegAudioSampler.h"
#include <memory>
#include "FfmpegUtil.h"
using namespace std;
FfmpegAudioSampler::FfmpegAudioSampler(
const AudioFormat& in,
const AudioFormat& out)
: inFormat_(in), outFormat_(out) {}
FfmpegAudioSampler::~FfmpegAudioSampler() {
if (swrContext_) {
swr_free(&swrContext_);
}
}
int FfmpegAudioSampler::init() {
swrContext_ = swr_alloc_set_opts(
nullptr, // we're allocating a new context
av_get_default_channel_layout(outFormat_.channels), // out_ch_layout
static_cast<AVSampleFormat>(outFormat_.format), // out_sample_fmt
outFormat_.samples, // out_sample_rate
av_get_default_channel_layout(inFormat_.channels), // in_ch_layout
static_cast<AVSampleFormat>(inFormat_.format), // in_sample_fmt
inFormat_.samples, // in_sample_rate
0, // log_offset
nullptr); // log_ctx
if (swrContext_ == nullptr) {
LOG(ERROR) << "swr_alloc_set_opts fails";
return -1;
}
int result = 0;
if ((result = swr_init(swrContext_)) < 0) {
LOG(ERROR) << "swr_init failed, err: " << ffmpeg_util::getErrorDesc(result)
<< ", in -> format: " << inFormat_.format
<< ", channels: " << inFormat_.channels
<< ", samples: " << inFormat_.samples
<< ", out -> format: " << outFormat_.format
<< ", channels: " << outFormat_.channels
<< ", samples: " << outFormat_.samples;
return -1;
}
return 0;
}
int64_t FfmpegAudioSampler::getSampleBytes(const AVFrame* frame) const {
auto outSamples = getOutNumSamples(frame->nb_samples);
return av_samples_get_buffer_size(
nullptr,
outFormat_.channels,
outSamples,
static_cast<AVSampleFormat>(outFormat_.format),
1);
}
// https://www.ffmpeg.org/doxygen/3.2/group__lswr.html
unique_ptr<DecodedFrame> FfmpegAudioSampler::sample(const AVFrame* frame) {
if (!frame) {
return nullptr; // no flush for audio
}
auto inNumSamples = frame->nb_samples;
auto outNumSamples = getOutNumSamples(frame->nb_samples);
auto outSampleSize = getSampleBytes(frame);
AvDataPtr frameData(static_cast<uint8_t*>(av_malloc(outSampleSize)));
uint8_t* outPlanes[AVRESAMPLE_MAX_CHANNELS];
int result = 0;
if ((result = av_samples_fill_arrays(
outPlanes,
nullptr, // linesize is not needed
frameData.get(),
outFormat_.channels,
outNumSamples,
static_cast<AVSampleFormat>(outFormat_.format),
1)) < 0) {
LOG(ERROR) << "av_samples_fill_arrays failed, err: "
<< ffmpeg_util::getErrorDesc(result)
<< ", outNumSamples: " << outNumSamples
<< ", format: " << outFormat_.format;
return nullptr;
}
if ((result = swr_convert(
swrContext_,
&outPlanes[0],
outNumSamples,
(const uint8_t**)&frame->data[0],
inNumSamples)) < 0) {
LOG(ERROR) << "swr_convert faield, err: "
<< ffmpeg_util::getErrorDesc(result);
return nullptr;
}
// result returned by swr_convert is the No. of actual output samples.
// So update the buffer size using av_samples_get_buffer_size
result = av_samples_get_buffer_size(
nullptr,
outFormat_.channels,
result,
static_cast<AVSampleFormat>(outFormat_.format),
1);
return make_unique<DecodedFrame>(std::move(frameData), result, 0);
}
/*
Because of the resampler delay, the returned value is an upper bound on the
number of output samples.
*/
int64_t FfmpegAudioSampler::getOutNumSamples(int inNumSamples) const {
return av_rescale_rnd(
swr_get_delay(swrContext_, inFormat_.samples) + inNumSamples,
outFormat_.samples,
inFormat_.samples,
AV_ROUND_UP);
}
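// Worked example (illustrative numbers, not from the diff): ignoring the
// buffered resampler delay, converting 1024 input samples from 44100 Hz to
// 16000 Hz yields av_rescale_rnd(1024, 16000, 44100, AV_ROUND_UP)
// = ceil(1024 * 16000 / 44100) = 372 output samples at most; swr_get_delay()
// adds whatever is still queued inside the resampler, which is why the
// result is only an upper bound.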
#pragma once
#include "FfmpegSampler.h"
#define AVRESAMPLE_MAX_CHANNELS 32
/**
* Class transcodes audio frames from one format into another.
*/
class FfmpegAudioSampler : public FfmpegSampler {
public:
explicit FfmpegAudioSampler(const AudioFormat& in, const AudioFormat& out);
~FfmpegAudioSampler() override;
int init() override;
int64_t getSampleBytes(const AVFrame* frame) const;
// FfmpegSampler overrides
// returns number of bytes of the sampled data
std::unique_ptr<DecodedFrame> sample(const AVFrame* frame) override;
const AudioFormat& getInFormat() const {
return inFormat_;
}
private:
int64_t getOutNumSamples(int inNumSamples) const;
AudioFormat inFormat_;
AudioFormat outFormat_;
SwrContext* swrContext_{nullptr};
};
#include "FfmpegAudioStream.h"
#include "FfmpegUtil.h"
using namespace std;
namespace {
bool operator==(const AudioFormat& x, const AVCodecContext& y) {
return x.samples == y.sample_rate && x.channels == y.channels &&
x.format == y.sample_fmt;
}
AudioFormat& toAudioFormat(
AudioFormat& audioFormat,
const AVCodecContext& codecCtx) {
audioFormat.samples = codecCtx.sample_rate;
audioFormat.channels = codecCtx.channels;
audioFormat.format = codecCtx.sample_fmt;
return audioFormat;
}
} // namespace
FfmpegAudioStream::FfmpegAudioStream(
AVFormatContext* inputCtx,
int index,
enum AVMediaType avMediaType,
MediaFormat mediaFormat,
double seekFrameMargin)
: FfmpegStream(inputCtx, index, avMediaType, seekFrameMargin),
mediaFormat_(mediaFormat) {}
FfmpegAudioStream::~FfmpegAudioStream() {}
void FfmpegAudioStream::checkStreamDecodeParams() {
auto timeBase = getTimeBase();
if (timeBase.first > 0) {
CHECK_EQ(timeBase.first, inputCtx_->streams[index_]->time_base.num);
CHECK_EQ(timeBase.second, inputCtx_->streams[index_]->time_base.den);
}
}
void FfmpegAudioStream::updateStreamDecodeParams() {
auto timeBase = getTimeBase();
if (timeBase.first == 0) {
mediaFormat_.format.audio.timeBaseNum =
inputCtx_->streams[index_]->time_base.num;
mediaFormat_.format.audio.timeBaseDen =
inputCtx_->streams[index_]->time_base.den;
}
mediaFormat_.format.audio.duration = inputCtx_->streams[index_]->duration;
}
int FfmpegAudioStream::initFormat() {
AudioFormat& format = mediaFormat_.format.audio;
if (format.samples == 0) {
format.samples = codecCtx_->sample_rate;
}
if (format.channels == 0) {
format.channels = codecCtx_->channels;
}
if (format.format == AV_SAMPLE_FMT_NONE) {
format.format = codecCtx_->sample_fmt;
VLOG(2) << "set stream format sample_fmt: " << format.format;
}
checkStreamDecodeParams();
updateStreamDecodeParams();
if (format.samples > 0 && format.channels > 0 &&
format.format != AV_SAMPLE_FMT_NONE) {
return 0;
} else {
return -1;
}
}
unique_ptr<DecodedFrame> FfmpegAudioStream::sampleFrameData() {
AudioFormat& audioFormat = mediaFormat_.format.audio;
if (!sampler_ || !(sampler_->getInFormat() == *codecCtx_)) {
AudioFormat newInFormat;
newInFormat = toAudioFormat(newInFormat, *codecCtx_);
sampler_ = make_unique<FfmpegAudioSampler>(newInFormat, audioFormat);
VLOG(1) << "Set sampler input audio format"
<< ", samples: " << newInFormat.samples
<< ", channels: " << newInFormat.channels
<< ", format: " << newInFormat.format
<< " : output audio sampler format"
<< ", samples: " << audioFormat.samples
<< ", channels: " << audioFormat.channels
<< ", format: " << audioFormat.format;
int ret = sampler_->init();
if (ret < 0) {
VLOG(1) << "Fail to initialize audio sampler";
return nullptr;
}
}
return sampler_->sample(frame_);
}
#pragma once
#include <utility>
#include "FfmpegAudioSampler.h"
#include "FfmpegStream.h"
/**
* Class uses the FFMPEG library to decode one audio stream.
*/
class FfmpegAudioStream : public FfmpegStream {
public:
explicit FfmpegAudioStream(
AVFormatContext* inputCtx,
int index,
enum AVMediaType avMediaType,
MediaFormat mediaFormat,
double seekFrameMargin);
~FfmpegAudioStream() override;
// FfmpegStream overrides
MediaType getMediaType() const override {
return MediaType::TYPE_AUDIO;
}
FormatUnion getMediaFormat() const override {
return mediaFormat_.format;
}
int64_t getStartPts() const override {
return mediaFormat_.format.audio.startPts;
}
int64_t getEndPts() const override {
return mediaFormat_.format.audio.endPts;
}
// return numerator and denominator of time base
std::pair<int, int> getTimeBase() const {
return std::make_pair(
mediaFormat_.format.audio.timeBaseNum,
mediaFormat_.format.audio.timeBaseDen);
}
void checkStreamDecodeParams();
void updateStreamDecodeParams();
protected:
int initFormat() override;
std::unique_ptr<DecodedFrame> sampleFrameData() override;
private:
MediaFormat mediaFormat_;
std::unique_ptr<FfmpegAudioSampler> sampler_{nullptr};
};
#include "FfmpegDecoder.h"
#include "FfmpegAudioStream.h"
#include "FfmpegUtil.h"
#include "FfmpegVideoStream.h"
using namespace std;
static AVPacket avPkt;
namespace {
unique_ptr<FfmpegStream> createFfmpegStream(
MediaType type,
AVFormatContext* ctx,
int idx,
MediaFormat& mediaFormat,
double seekFrameMargin) {
enum AVMediaType avType;
CHECK(ffmpeg_util::mapMediaType(type, &avType));
switch (type) {
case MediaType::TYPE_VIDEO:
return make_unique<FfmpegVideoStream>(
ctx, idx, avType, mediaFormat, seekFrameMargin);
case MediaType::TYPE_AUDIO:
return make_unique<FfmpegAudioStream>(
ctx, idx, avType, mediaFormat, seekFrameMargin);
default:
return nullptr;
}
}
} // namespace
FfmpegAvioContext::FfmpegAvioContext()
: workBuffersize_(VIO_BUFFER_SZ),
workBuffer_((uint8_t*)av_malloc(workBuffersize_)),
inputFile_(nullptr),
inputBuffer_(nullptr),
inputBufferSize_(0) {}
int FfmpegAvioContext::initAVIOContext(const uint8_t* buffer, int64_t size) {
inputBuffer_ = buffer;
inputBufferSize_ = size;
avioCtx_ = avio_alloc_context(
workBuffer_,
workBuffersize_,
0,
reinterpret_cast<void*>(this),
&FfmpegAvioContext::readMemory,
nullptr, // no write function
&FfmpegAvioContext::seekMemory);
return 0;
}
FfmpegAvioContext::~FfmpegAvioContext() {
/* note: the internal buffer could have changed, and be != workBuffer_ */
if (avioCtx_) {
av_freep(&avioCtx_->buffer);
av_freep(&avioCtx_);
} else {
av_freep(&workBuffer_);
}
if (inputFile_) {
fclose(inputFile_);
}
}
int FfmpegAvioContext::read(uint8_t* buf, int buf_size) {
if (inputBuffer_) {
return readMemory(this, buf, buf_size);
} else {
return -1;
}
}
int FfmpegAvioContext::readMemory(void* opaque, uint8_t* buf, int buf_size) {
FfmpegAvioContext* h = static_cast<FfmpegAvioContext*>(opaque);
if (buf_size < 0) {
return -1;
}
int remainder = h->inputBufferSize_ - h->offset_;
int r = buf_size < remainder ? buf_size : remainder;
if (r < 0) {
return AVERROR_EOF;
}
memcpy(buf, h->inputBuffer_ + h->offset_, r);
h->offset_ += r;
return r;
}
int64_t FfmpegAvioContext::seek(int64_t offset, int whence) {
if (inputBuffer_) {
return seekMemory(this, offset, whence);
} else {
return -1;
}
}
int64_t FfmpegAvioContext::seekMemory(
void* opaque,
int64_t offset,
int whence) {
FfmpegAvioContext* h = static_cast<FfmpegAvioContext*>(opaque);
switch (whence) {
case SEEK_CUR: // from current position
h->offset_ += offset;
break;
case SEEK_END: // from eof
h->offset_ = h->inputBufferSize_ + offset;
break;
case SEEK_SET: // from beginning of file
h->offset_ = offset;
break;
case AVSEEK_SIZE:
return h->inputBufferSize_;
}
return h->offset_;
}
int FfmpegDecoder::init(
const std::string& filename,
bool isDecodeFile,
FfmpegAvioContext& ioctx,
DecoderOutput& decoderOutput) {
cleanUp();
int ret = 0;
if (!isDecodeFile) {
formatCtx_ = avformat_alloc_context();
if (!formatCtx_) {
LOG(ERROR) << "avformat_alloc_context failed";
return -1;
}
formatCtx_->pb = ioctx.get_avio();
formatCtx_->flags |= AVFMT_FLAG_CUSTOM_IO;
// Determining the input format:
int probeSz = AVPROBE_SIZE + AVPROBE_PADDING_SIZE;
uint8_t* probe((uint8_t*)av_malloc(probeSz));
memset(probe, 0, probeSz);
int len = ioctx.read(probe, probeSz - AVPROBE_PADDING_SIZE);
if (len < probeSz - AVPROBE_PADDING_SIZE) {
LOG(ERROR) << "Insufficient data to determine video format";
av_freep(&probe);
return -1;
}
// seek back to start of stream
ioctx.seek(0, SEEK_SET);
unique_ptr<AVProbeData> probeData(new AVProbeData());
probeData->buf = probe;
probeData->buf_size = len;
probeData->filename = "";
// Determine the input-format:
formatCtx_->iformat = av_probe_input_format(probeData.get(), 1);
// this is to avoid the double-free error
if (formatCtx_->iformat == nullptr) {
LOG(ERROR) << "av_probe_input_format fails";
return -1;
}
VLOG(1) << "av_probe_input_format succeeds";
av_freep(&probe);
ret = avformat_open_input(&formatCtx_, "", nullptr, nullptr);
} else {
ret = avformat_open_input(&formatCtx_, filename.c_str(), nullptr, nullptr);
}
if (ret < 0) {
LOG(ERROR) << "avformat_open_input failed, error: "
<< ffmpeg_util::getErrorDesc(ret);
cleanUp();
return ret;
}
ret = avformat_find_stream_info(formatCtx_, nullptr);
if (ret < 0) {
LOG(ERROR) << "avformat_find_stream_info failed, error: "
<< ffmpeg_util::getErrorDesc(ret);
cleanUp();
return ret;
}
if (!initStreams()) {
LOG(ERROR) << "Cannot activate streams";
cleanUp();
return -1;
}
for (auto& stream : streams_) {
MediaType mediaType = stream.second->getMediaType();
decoderOutput.initMediaType(mediaType, stream.second->getMediaFormat());
}
VLOG(1) << "FfmpegDecoder initialized";
return 0;
}
int FfmpegDecoder::decodeFile(
unique_ptr<DecoderParameters> params,
const string& fileName,
DecoderOutput& decoderOutput) {
VLOG(1) << "decode file: " << fileName;
FfmpegAvioContext ioctx;
int ret = decodeLoop(std::move(params), fileName, true, ioctx, decoderOutput);
return ret;
}
int FfmpegDecoder::decodeMemory(
unique_ptr<DecoderParameters> params,
const uint8_t* buffer,
int64_t size,
DecoderOutput& decoderOutput) {
VLOG(1) << "decode video data in memory";
FfmpegAvioContext ioctx;
int ret = ioctx.initAVIOContext(buffer, size);
if (ret == 0) {
ret =
decodeLoop(std::move(params), string(""), false, ioctx, decoderOutput);
}
return ret;
}
int FfmpegDecoder::probeFile(
unique_ptr<DecoderParameters> params,
const string& fileName,
DecoderOutput& decoderOutput) {
VLOG(1) << "probe file: " << fileName;
FfmpegAvioContext ioctx;
return probeVideo(std::move(params), fileName, true, ioctx, decoderOutput);
}
int FfmpegDecoder::probeMemory(
unique_ptr<DecoderParameters> params,
const uint8_t* buffer,
int64_t size,
DecoderOutput& decoderOutput) {
VLOG(1) << "probe video data in memory";
FfmpegAvioContext ioctx;
int ret = ioctx.initAVIOContext(buffer, size);
if (ret == 0) {
ret =
probeVideo(std::move(params), string(""), false, ioctx, decoderOutput);
}
return ret;
}
void FfmpegDecoder::cleanUp() {
if (formatCtx_) {
for (auto& stream : streams_) {
// Drain stream buffers.
DecoderOutput decoderOutput;
stream.second->flush(1, decoderOutput);
stream.second.reset();
}
streams_.clear();
avformat_close_input(&formatCtx_);
}
}
FfmpegStream* FfmpegDecoder::findStreamByIndex(int streamIndex) const {
auto it = streams_.find(streamIndex);
return it != streams_.end() ? it->second.get() : nullptr;
}
/*
Reference implementation:
https://ffmpeg.org/doxygen/3.4/demuxing_decoding_8c-example.html
*/
int FfmpegDecoder::decodeLoop(
unique_ptr<DecoderParameters> params,
const std::string& filename,
bool isDecodeFile,
FfmpegAvioContext& ioctx,
DecoderOutput& decoderOutput) {
params_ = std::move(params);
int ret = init(filename, isDecodeFile, ioctx, decoderOutput);
if (ret < 0) {
return ret;
}
// init package
av_init_packet(&avPkt);
avPkt.data = nullptr;
avPkt.size = 0;
int result = 0;
bool ptsInRange = true;
while (ptsInRange) {
result = av_read_frame(formatCtx_, &avPkt);
if (result == AVERROR(EAGAIN)) {
VLOG(1) << "Decoder is busy";
ret = 0;
break;
} else if (result == AVERROR_EOF) {
VLOG(1) << "Stream decoding is completed";
ret = 0;
break;
} else if (result < 0) {
VLOG(1) << "av_read_frame fails. Break decoder loop. Error: "
<< ffmpeg_util::getErrorDesc(result);
ret = result;
break;
}
ret = 0;
auto stream = findStreamByIndex(avPkt.stream_index);
if (stream == nullptr) {
// the packet is from a stream the caller is not interested in; ignore it
VLOG(2) << "avPkt ignored. stream index: " << avPkt.stream_index;
// Free the AVPacket's memory; otherwise it leaks
av_packet_unref(&avPkt);
continue;
}
do {
result = stream->sendPacket(&avPkt);
if (result == AVERROR(EAGAIN)) {
VLOG(2) << "avcodec_send_packet returns AVERROR(EAGAIN)";
// start receiving available frames from the internal buffer
stream->receiveAvailFrames(params_->getPtsOnly, decoderOutput);
if (isPtsExceedRange()) {
// exit the most-outer while loop
VLOG(1) << "In all streams, exceed the end pts. Exit decoding loop";
ret = 0;
ptsInRange = false;
break;
}
} else if (result < 0) {
LOG(WARNING) << "avcodec_send_packet failed. Error: "
<< ffmpeg_util::getErrorDesc(result);
ret = result;
break;
} else {
VLOG(2) << "avcodec_send_packet succeeds";
// success: read the next AVPacket and send it out
break;
}
} while (ptsInRange);
// Free the AVPacket's memory; otherwise it leaks
av_packet_unref(&avPkt);
}
/* flush cached frames */
flushStreams(decoderOutput);
return ret;
}
int FfmpegDecoder::probeVideo(
unique_ptr<DecoderParameters> params,
const std::string& filename,
bool isDecodeFile,
FfmpegAvioContext& ioctx,
DecoderOutput& decoderOutput) {
params_ = std::move(params);
return init(filename, isDecodeFile, ioctx, decoderOutput);
}
bool FfmpegDecoder::initStreams() {
for (auto it = params_->formats.begin(); it != params_->formats.end(); ++it) {
AVMediaType mediaType;
if (!ffmpeg_util::mapMediaType(it->first, &mediaType)) {
LOG(ERROR) << "Unknown media type: " << it->first;
return false;
}
int streamIdx =
av_find_best_stream(formatCtx_, mediaType, -1, -1, nullptr, 0);
if (streamIdx >= 0) {
VLOG(2) << "find stream index: " << streamIdx;
auto stream = createFfmpegStream(
it->first,
formatCtx_,
streamIdx,
it->second,
params_->seekFrameMargin);
CHECK(stream);
if (stream->openCodecContext() < 0) {
LOG(ERROR) << "Cannot open codec. Stream index: " << streamIdx;
return false;
}
streams_.emplace(streamIdx, move(stream));
} else {
VLOG(1) << "Cannot open find stream of type " << it->first;
}
}
// Seek frames in each stream
int ret = 0;
for (auto& stream : streams_) {
auto startPts = stream.second->getStartPts();
VLOG(1) << "stream: " << stream.first << " startPts: " << startPts;
if (startPts > 0 && (ret = stream.second->seekFrame(startPts)) < 0) {
LOG(WARNING) << "seekFrame in stream fails";
return false;
}
}
VLOG(1) << "initStreams succeeds";
return true;
}
bool FfmpegDecoder::isPtsExceedRange() {
bool exceed = true;
for (auto& stream : streams_) {
exceed = exceed && stream.second->isFramePtsExceedRange();
}
return exceed;
}
void FfmpegDecoder::flushStreams(DecoderOutput& decoderOutput) {
for (auto& stream : streams_) {
stream.second->flush(params_->getPtsOnly, decoderOutput);
}
}
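// ---------------------------------------------------------------------------
// For context: a minimal sketch (not part of the diff, written against the
// raw FFmpeg API rather than this file's wrapper classes) of the
// send/receive protocol the loop above follows. AVERROR(EAGAIN) from
// avcodec_send_packet means "drain decoded frames first, then retry";
// AVERROR(EAGAIN) from avcodec_receive_frame means "feed more input".
// ---------------------------------------------------------------------------
extern "C" {
#include <libavcodec/avcodec.h>
}

static int drainFrames(AVCodecContext* ctx, AVFrame* frame) {
  int ret;
  while ((ret = avcodec_receive_frame(ctx, frame)) == 0) {
    // ... hand the decoded frame to the sampler/consumer ...
    av_frame_unref(frame);
  }
  // EAGAIN (needs more input) and EOF are the expected ways a drain ends.
  return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}

static int feedPacket(AVCodecContext* ctx, AVPacket* pkt, AVFrame* frame) {
  int ret = avcodec_send_packet(ctx, pkt);
  if (ret == AVERROR(EAGAIN)) {
    // Decoder input queue is full: drain pending frames, then retry.
    if ((ret = drainFrames(ctx, frame)) < 0) {
      return ret;
    }
    ret = avcodec_send_packet(ctx, pkt);
  }
  return ret;
}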