Knowledge required for writing a video codec? [Archive]

orion44

3rd February 2015, 06:09

Hi,

I was wondering what level of knowledge in math and programming is required for one
to be able to write a video codec?

The highest level of math that I presently understand is pre-calculus mathematics.
What are the specific names of the high level maths that I must learn to be able to write a video codec?

What programming languages do I need to learn?

Is there any other area, beside math and programming, that I need to learn?

Please recommend some books to get me started.

Thank you.

LoRd_MuldeR

3rd February 2015, 20:51

I think most video encoders are written in C or C++ nowadays - with a good portion of assembly (x86, ARM, etc.) for the "performance critical" functions.

As for the maths, I think a profound understanding of signal processing (especially frequency transforms, like FFT or DCT) and how to solve optimization problems will be essential here.

Xiph.org has some great introduction videos on "the technical foundations of modern digital media", which you might want to check out: http://xiph.org/video/

orion44

3rd February 2015, 23:19

Thank you.

pandy

5th February 2015, 10:33

My two cents - "video codec" is a very wide term, so the knowledge required to write one varies. From a general programmer's perspective almost no math is required if you store the video uncompressed or losslessly compressed through one of the general-purpose compression libraries; the knowledge required is more or less how to open a file, seek within it, read/write, close it, and make API calls. Beyond that come fancier codecs (specialized video compression, but still a relatively simple task) and then fancy and highly advanced video coding (LoRd_MuldeR described this a bit).
In fact you may do your own transformation in YCgCo (http://en.wikipedia.org/wiki/YCgCo) style and call this a new video codec - purely up to you.
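
To make the "simple end of the scale" described above concrete, here is a minimal sketch of such a toy lossless "codec": a YCgCo-style colour transform followed by a general-purpose compressor. It is only an illustration under stated assumptions. It uses the reversible lifting variant usually called YCgCo-R (not the plain YCgCo matrix), Python's standard zlib module as the compression library, and the function names and the planar 16-bit sample layout are invented for this example.

```python
import zlib
from array import array

def rgb_to_ycgco_r(r, g, b):
    """Forward YCgCo-R lifting steps (integer-only, exactly reversible)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_r_to_rgb(y, cg, co):
    """Inverse lifting steps, undoing the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

def encode_frame(pixels):
    """pixels: list of (r, g, b) tuples with 8-bit components.

    Hypothetical layout: all Y samples, then all Cg, then all Co,
    stored as signed 16-bit integers (Cg and Co need 9 bits).
    """
    planes = ([], [], [])
    for r, g, b in pixels:
        for plane, value in zip(planes, rgb_to_ycgco_r(r, g, b)):
            plane.append(value)
    samples = array("h", planes[0] + planes[1] + planes[2])
    return zlib.compress(samples.tobytes(), 9)

def decode_frame(data):
    """Inverse of encode_frame; returns the original list of (r, g, b) tuples."""
    samples = array("h")
    samples.frombytes(zlib.decompress(data))
    n = len(samples) // 3
    y, cg, co = samples[:n], samples[n:2 * n], samples[2 * n:]
    return [ycgco_r_to_rgb(*p) for p in zip(y, cg, co)]
```

Nothing here needs more than integer arithmetic and a library call, which is the point being made: decode_frame(encode_frame(pixels)) returns the input unchanged. A real format would also have to record frame dimensions and byte order, which this sketch leaves out.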

Katie Boundary

6th February 2015, 03:50

Yeah, those discrete cosine transforms are pretty damn important for video compression.

*.mp4 guy

8th February 2015, 01:28

The truth is, very very few people, if any, fully understand every part of a modern lossy codec at its deepest level. Not many of those who could implement, say, a DCT from scratch really understand why DCTs are used, other than perhaps citing a paper showing "superior energy compaction properties". Why does the DCT have those properties? Why is that good? If everyone working on programming video codecs had to really understand them, very little work would get done.

If all you want to do is /implement/ a video codec, you can get by with "just" extremely good algebra, coding, and specification-reading skills. If you want to design any fundamental parts from scratch, you will essentially need a B.A. in mathematics, with a focus on information theory and related fields, such as signal processing, as a minimum.
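
As a small illustration of what "energy compaction" means in practice, here is a from-scratch sketch of an orthonormal 1-D DCT-II applied to a smooth 8-sample ramp. The function names and the test signal are invented for this example, and the direct O(n^2) summation is the textbook definition, not how a production codec computes it.

```python
import math

def dct_ii(x):
    """Orthonormal 1-D DCT-II computed directly from its definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def energy_fraction(coeffs, keep):
    """Share of the total squared magnitude held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total

# A smooth signal: a linear ramp, loosely standing in for a row of slowly varying pixels.
ramp = [10.0 * (i + 1) for i in range(8)]
coeffs = dct_ii(ramp)
```

For this ramp, energy_fraction(coeffs, 2) comes out above 0.99: two of the eight coefficients carry almost all of the signal energy, so the remaining six can be quantized coarsely or dropped at little cost. That is the property the papers refer to. The sketch does not explain why the DCT behaves this way on smooth data, which is exactly the deeper question raised above.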

pandy

9th February 2015, 13:57

Yeah, those discrete cosine transforms are pretty damn important for video compression.

H.264* (AFAIR also H.265) doesn't use the DCT at all - there are "plenty" of various transforms that can be used instead of the DCT.

New transform design features, including:

An exact-match integer 4×4 spatial block transform, allowing precise placement of residual signals with little of the "ringing" often found with prior codec designs. This design is conceptually similar to that of the well-known discrete cosine transform (DCT), introduced in 1974 by N. Ahmed, T. Natarajan and K. R. Rao. However, it is simplified and made to provide exactly specified decoding.
An exact-match integer 8×8 spatial block transform, allowing highly correlated regions to be compressed more efficiently than with the 4×4 transform. This design is conceptually similar to that of the well-known DCT, but simplified and made to provide exactly specified decoding.
Adaptive encoder selection between the 4×4 and 8×8 transform block sizes for the integer transform operation.
A secondary Hadamard transform performed on "DC" coefficients of the primary spatial transform applied to chroma DC coefficients (and also luma in one special case) to obtain even more compression in smooth regions.

http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC#Features
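
For readers who want to see what the "exact-match integer 4×4 transform" looks like, here is a minimal sketch using the forward core matrix commonly cited for H.264. The helper names are invented, and the inverse shown is a plain mathematical inverse using exact fractions, chosen only to demonstrate that the integer matrix has orthogonal rows like a DCT basis. It is not the decoding process of the standard, which as far as I know folds the scaling into quantization and uses its own integer inverse.

```python
from fractions import Fraction

# Forward core matrix commonly cited for the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def forward_core(block):
    """Integer-only forward transform: Y = CF * X * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))

def inverse_exact(coeffs):
    """Exact rational inverse (illustration only, not the standard's decoder).

    The rows of CF are mutually orthogonal with squared norms 4, 10, 4, 10,
    so X = CF^T * (Y[i][j] / (norm_i * norm_j)) * CF.
    """
    norms = [sum(v * v for v in row) for row in CF]
    scaled = [[Fraction(coeffs[i][j], norms[i] * norms[j]) for j in range(4)]
              for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)

# Arbitrary sample residual block, made up for this example.
block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
coeffs = forward_core(block)
```

The forward pass uses only integer additions and multiplications by 1 or 2, so every encoder and decoder gets bit-identical results, which is what "exact-match" refers to. The top-left coefficient is simply the sum of all sixteen samples, playing the role of the DC term.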

vivan

9th February 2015, 14:32

Using integer approximation of DCT is not "not using DCT at all".
If there were plenty of others, others would've used them instead of copypasting H.264 spec ;)

nevcairiel

9th February 2015, 14:39

The application is really the same, you just use specialized approximations for performance reasons.

feisty2

9th February 2015, 14:42

Using integer approximation of DCT is not "not using DCT at all".
If there were plenty of others, others would've used them instead of copypasting H.264 spec ;)

Wavelets, maybe?
JPEG 2000 uses them.

vivan

9th February 2015, 15:05

Yeah, wavelets are close to being a real alternative, but see http://x264dev.multimedia.cx/archives/317
There were attempts to use them - but all of them (Dirac, Snow) are dead.

feisty2

9th February 2015, 15:25

ah... it's a shame to know the only alternative I ever heard of doesn't work as expected... ;P

filler56789

9th February 2015, 15:28

And for a distant and uncertain future......

http://en.wikipedia.org/wiki/Fractal_compression

feisty2

9th February 2015, 15:41

Once upon a time, I heard that "fractal" might be the key to high-quality upscaling... then I found a piece of commercial software that claims to use a fractal algorithm to upscale images, and to me the results look the same as over-warpsharped images...

pandy

10th February 2015, 09:53

Using integer approximation of DCT is not "not using DCT at all".
If there were plenty of others, others would've used them instead of copypasting H.264 spec ;)

That is as true as saying that everything is some form of 2D DFT...
DCT is DCT, DST is DST, Walsh-Hadamard is Walsh-Hadamard.
Even if H.264 uses a DCT-derived integer implementation, it is not the DCT, even if it looks like the DCT at first glance.

And btw - future is vector ;)

nhakobian

11th February 2015, 03:54

No matter what technique someone uses to write a codec, it is a good idea to understand the techniques that have come before it, and hopefully try to improve upon them.

The original question seemed to ask whether pre-calculus would be enough to understand these concepts. Signal processing techniques, whether they are derived from DCTs or wavelets, depend heavily on calculus and evolved from it. It's important to understand the concepts of differential and integral calculus, then build on that with Fourier transforms first.

One thing you might find surprising is that math you already know will make much more sense. Those volume formulas (like the volume of a sphere) are derived with calculus. All those "strange" relationships between sin, cos, and tan can be derived with calculus. Some parts of it will be a pain (learning all the techniques to integrate odd functions) and some are really only useful in certain subsets of math and physics, but I would say going at least until Fourier theory is introduced is a must. A lot of EE/Physics/Astro departments have courses on signal processing, and they all require a minimum of 2-3 semesters of calculus to really understand.

foxyshadis

11th February 2015, 10:32

Unless your goal is to reinvent the DCT, I don't see how advanced math is important, although it can help you understand potential pitfalls before you run into them. It's OK to use an off-the-shelf implementation, especially when you're first starting. When it comes to writing a basic video codec, I don't think math is anywhere near the most important thing, and fetishizing it to the exclusion of more practical knowledge is silly. Entirely new and groundbreaking codecs, the kind that Xiph specializes in, are the ones that need all the math chops you can muster, but H.265 and VP9 prove how far you can go pairing lots of other interesting ideas with off-the-shelf math.

I'd put the most important as: Solid experience working with video, so you know what it's like and what pathologies you'll encounter; solid experience in a major programming language; the ability to read and reimplement interesting ideas from academic papers, which will probably require familiarity with at least sums and basic trig; and solid knowledge of at least one video standard, preferably several. Implementing at least a working subset of an existing video standard is the best crash course in video codec development you can get; you can probably make an intra-only MPEG-1 codec in a few days.
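
To give a feel for how little math the non-transform parts of such an intra-only exercise need, here is a rough sketch of the steps that follow the 8×8 DCT in an MPEG-1-style or JPEG-style coder: quantization, zigzag scan, and run-length pairing. Everything specific here is an assumption made for illustration: the function names, the single flat quantizer step (real standards use a quantization matrix), and treating the DC coefficient like any other. The variable-length code tables that would follow are not reproduced.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in the classic zigzag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def quantize(block, step):
    """Flat quantizer: divide every coefficient by one step and round (assumption)."""
    return [[int(round(v / step)) for v in row] for row in block]

def run_length(levels):
    """Turn a scanned list into (zeros_before, level) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs

def encode_block(coeffs, step):
    """Quantize, zigzag-scan, and run-length pair one block of transform coefficients."""
    q = quantize(coeffs, step)
    scanned = [q[r][c] for r, c in zigzag_order(len(q))]
    return run_length(scanned)

# Made-up coefficient block: energy concentrated in the low-frequency corner.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 96.0
coeffs[0][1] = -24.0
coeffs[1][0] = 16.0
coeffs[2][1] = 8.0
coeffs[3][3] = 3.0  # small enough to quantize to zero at step 8
```

With a step of 8, encode_block(coeffs, 8.0) reduces the 64 coefficients to four (run, level) pairs. The zigzag scan exists precisely so that the many zero high-frequency coefficients end up together at the tail, where they cost almost nothing to code.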

The more esoteric your knowledge, the more you might be able to fine-tune your codec, but to get started you need a basic but broad grounding in a lot of areas.

pandy

11th February 2015, 10:56

The more esoteric your knowledge, the more you might be able to fine-tune your codec, but to get started you need a basic but broad grounding in a lot of areas.

True - HEVC (H.265) is an example - there is nothing new (I mean there is no real breakthrough approach, only more complexity added - so the gain comes from higher computational complexity, not from a radically new approach).

A radically new approach would probably need to involve neural networks, image segmentation, etc. - this is beyond the capabilities of modern HW/SW. To exploit this, basic medical knowledge (human visual system studies) is probably also required - the problem is trivial (the solution is not) - invent a codec that behaves like the human brain ;) .