Thor: a new codec from Cisco (reference implementation available) - Page 2

Parabola · 4th August 2015, 11:49

Hi colours,

This non-separable filter is a Gaussian-like low pass filter. I don't think it is intended to be better than the separable 6-tap, rather it is designed to give the encoder more flexibility.

There are 16 sub pixel offsets per integer MV position. They break down as follows:

(0,0) - the integer position, no filtering - motion compensation is a block copy
[6 positions] - e.g. (0.25, 0) or (0, 0.5) - one direction filtered with a short, band-pass-ish filter - prediction may suffer softening, ringing or aliasing
[8 positions] - e.g. (0.25, 0.25) or (0.75, 0.5) - both directions filtered with a short, band-pass-ish filter - prediction may suffer softening, ringing or aliasing
(0.5,0.5) - the mid position - motion compensation is a low pass filter - prediction will be soft

By selecting the mid position, the encoder is low-pass filtering the prediction. If motion hits (0.5, 0.5) but low-pass filtering is not beneficial then the encoder would likely use the neighbouring separable vector having lowest cost, e.g. (0.25, 0.5). I doubt this would hit RD badly.
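As a quick sanity check of the breakdown above, here is a minimal sketch that enumerates the 16 offsets and sorts them into the four groups. It assumes quarter-pel units (offsets 0, 0.25, 0.5, 0.75 stored as integers 0..3); the function name and category labels are made up for illustration and are not taken from the Thor sources.

Code:

```python
from collections import Counter

def classify(qx, qy):
    """Classify a sub-pixel offset given in quarter-pel units (0..3 each)."""
    if qx == 0 and qy == 0:
        return "integer"          # block copy, no filtering
    if qx == 2 and qy == 2:
        return "mid"              # (0.5, 0.5): the non-separable low-pass case
    if qx == 0 or qy == 0:
        return "one_direction"    # filtered horizontally or vertically only
    return "both_directions"      # separable filter applied in both directions

# Count how many of the 4 x 4 offsets fall into each group.
counts = Counter(classify(qx, qy) for qx in range(4) for qy in range(4))
for name in ("integer", "one_direction", "both_directions", "mid"):
    print(name, counts[name])
```

The counts come out as 1, 6, 8 and 1, matching the list.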

Of course an alternate way of achieving this would be to have a separate flag or index to specify the motion compensation filter to be applied. Pure speculation on my part, but perhaps this approach has prior art that the Thor team wishes to avoid. Or perhaps combining the MC filter mode with MVD signalling is cheaper since Thor does not use arithmetic coding.

Kurtnoise · 12th August 2015, 08:08

fyi, Daala's Entropy Coder has been pulled to Thor and vice-versa...

Nice work btw.

From Monty Montgomery...

hajj_3 · 12th August 2015, 08:30

Thor has been officially announced: http://blogs.cisco.com/collaboration...ee-video-codec

pandy · 12th August 2015, 09:34

Quote:

Originally Posted by hajj_3

Thor has been officially announced: http://blogs.cisco.com/collaboration...ee-video-codec

Yupi, another video codec...

mandarinka · 12th August 2015, 21:05

Not necessarily, at this point it is just a draft parallel to Daala. IETF will probably standardize just one codec through their NetVC programme. It will likely take years anyway.

LigH · 13th August 2015, 13:20

German technology sites and blogs on various topics are already starting to discuss Thor. Fefe's "conspiracy" blog (or rather a blog of political and social interests) points at the fact that Cisco used to be happy while being a member of the H.264 patent pool, being on the earners' side ... but because H.265 has two independent patent pools, and licenses are not even capped, Cisco got afraid of becoming a main payer this time.

Capitalism is a great concept as long as you make profits.

pandy · 13th August 2015, 14:11

Quote:

Originally Posted by mandarinka

Not necessarily, at this point it is just a draft parallel to Daala. IETF will probably standardize just one codec through their NetVC programme. It will likely take years anyway.

Well... I believe Thor will not be Daala and Daala will not be Thor.
From my perspective Cisco is interested in a niche codec capable of real-time video streaming with reasonable resource usage, even at lower quality. Daala, on the other hand, pursues quality with reasonable resource usage; it is almost the same, but not quite.
Probably both will share some ideas, but at some point Daala should be capable of doing more, whereas Thor will remain mostly a niche codec.

Quote:

Originally Posted by LigH

German technology sites and blogs on various topics are already starting to discuss Thor. Fefe's "conspiracy" blog (or rather a blog of political and social interests) points at the fact that Cisco used to be happy while being a member of the H.264 patent pool, being on the earners' side ... but because H.265 has two independent patent pools, and licenses are not even capped, Cisco got afraid of becoming a main payer this time.

Capitalism is a great concept as long as you make profits.

To my knowledge, Cisco is no longer under strict H.265 pressure, as they are selling the division where H.265 seems to be most used to Technicolor.

LigH · 13th August 2015, 14:20

A codec for a limited purpose may be easier to optimize than a general purpose codec. Not sure which interest Cisco may have mainly, but e.g. "video telephony" with face presentation may be narrow enough to optimize strongly for it.

mandarinka · 13th August 2015, 16:56

I don't think there is that much space to differentiate with the format itself. Most of the needs can be addressed by encoder/decoder configuration/architecture, so it IMHO makes more sense to use a general-purpose format that will benefit from network effects and such.

benwaggoner · 13th August 2015, 21:07

Quote:

Originally Posted by mandarinka

I don't think there is that much space to differentiate with the format itself. Most of the needs can be addressed by encoder/decoder configuration/architecture, so it IMHO makes more sense to use a general-purpose format that will benefit from network effects and such.

Yes, generally a flexible format can be used more narrowly for specific tasks.

However, codecs for extremely low latency don't need a lot of the features useful for file-based encoding or streaming. For example, B-frames aren't used in video conferencing since even a single one adds several frames of latency end-to-end. Robust error concealment and correction are also a lot more important than in other scenarios.

pandy · 20th August 2015, 10:32

Quote:

Originally Posted by LigH

A codec for a limited purpose may be easier to optimize than a general purpose codec. Not sure which interest Cisco may have mainly, but e.g. "video telephony" with face presentation may be narrow enough to optimize strongly for it.

Build 3D face model, apply picture of face as texture, do motion capture from video - is there anything more efficient than this?

mandarinka · 20th August 2015, 23:31

Quote:

Originally Posted by pandy

Build 3D face model, apply picture of face as texture, do motion capture from video - is there anything more efficient than this?

Then you subtract that prediction from the source and you realize that you have a residual that is ten times worse than what you had in 2002 with H.263.

pandy · 27th August 2015, 21:59

Quote:

Originally Posted by mandarinka

Then you subtract that prediction from the source and you realize that you have a residual that is ten times worse than what you had in 2002 with H.263.

Maybe, and... how is this important from a conferencing perspective?

kuchikirukia · 27th August 2015, 23:56

Quote:

Originally Posted by pandy

Maybe, and... how is this important from a conferencing perspective?

If H.263 would give better quality while being less resource intensive, why wouldn't you just use H.263?

pandy · 31st August 2015, 09:07

Quote:

Originally Posted by kuchikirukia

If H.263 would give better quality while being less resource intensive, why wouldn't you just use H.263?

I doubt there is anything more efficient (in bitrate) than mocap 3D recreation... functionally similar to LPC in audio.

LigH · 31st August 2015, 10:17

Back to the purpose of this thread ... we have a new codec. Who was able to test it?

I'm just trying it, and it is a little annoying. First, the CLI lacks documentation ('-?' doesn't show a brief help, '-h' doesn't either, and neither does omitting all parameters). Then, the available config examples suggest that only raw YUV is supported, not even Y4M, and possibly no pipe either (so I got afraid I might have to waste disk space on header-less copies of my Y4M test samples). Luckily, thorenc supports the use of config files as input:

Code:

Thorenc -cf config.txt

So reading through the sources a bit ... oh, Y4M is supported in the encoder ... as reconstructed output. And ... oh, as input too, so I may be able to omit these parameters from the config file?
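For anyone who does end up needing header-less copies, here is a minimal sketch of stripping a Y4M stream down to raw planar YUV. It assumes plain 8-bit 4:2:0 content and relies only on the general Y4M layout (one "YUV4MPEG2 ..." stream header line, then a "FRAME" line before each frame's samples); the function name is made up, and this is not how thorenc itself reads Y4M.

Code:

```python
import io

# Colourspace tags treated as 8-bit 4:2:0 in this sketch (an assumption).
_C420_8BIT = (b"C420", b"C420jpeg", b"C420mpeg2", b"C420paldv")

def y4m_to_raw_yuv420(src, dst):
    """Copy frame payloads from a binary Y4M stream to dst, dropping headers.

    Returns the number of frames written.
    """
    header = src.readline()
    if not header.startswith(b"YUV4MPEG2"):
        raise ValueError("not a Y4M stream")
    width = height = None
    for token in header.split()[1:]:
        if token[:1] == b"W":
            width = int(token[1:])
        elif token[:1] == b"H":
            height = int(token[1:])
        elif token[:1] == b"C" and token not in _C420_8BIT:
            raise ValueError("only 8-bit 4:2:0 is handled in this sketch")
    if width is None or height is None:
        raise ValueError("stream header lacks W or H")
    # One luma plane plus two chroma planes subsampled 2x in both directions.
    frame_bytes = width * height + 2 * ((width + 1) // 2) * ((height + 1) // 2)
    frames = 0
    while True:
        line = src.readline()
        if not line:
            break  # clean end of stream
        if not line.startswith(b"FRAME"):
            raise ValueError("expected a FRAME header")
        payload = src.read(frame_bytes)
        if len(payload) != frame_bytes:
            raise ValueError("truncated frame")
        dst.write(payload)
        frames += 1
    return frames

# Tiny in-memory demo: two 4x2 frames of 12 bytes each.
demo = b"YUV4MPEG2 W4 H2 F60:1 Ip A1:1 C420jpeg\n"
demo += b"FRAME\n" + bytes(range(12)) + b"FRAME\n" + bytes(range(12, 24))
out = io.BytesIO()
print(y4m_to_raw_yuv420(io.BytesIO(demo), out), len(out.getvalue()))
```

With real files you would pass two file objects opened in binary mode instead of the in-memory buffers.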

Here are some defaults from enc/strings.c:

Code:

  add_param_to_list(&list, "-cf",                   NULL, ARG_FILENAME, NULL);
  add_param_to_list(&list, "-if",                   NULL, ARG_FILENAME, &params->infilestr);
  add_param_to_list(&list, "-ph",                    "0", ARG_INTEGER,  &params->file_headerlen);
  add_param_to_list(&list, "-fh",                    "0", ARG_INTEGER,  &params->frame_headerlen);
  add_param_to_list(&list, "-of",                   NULL, ARG_FILENAME, &params->outfilestr);
  add_param_to_list(&list, "-rf",                   NULL, ARG_FILENAME, &params->reconfilestr);
  add_param_to_list(&list, "-stat",                 NULL, ARG_FILENAME, &params->statfilestr);
  add_param_to_list(&list, "-n",                   "600", ARG_INTEGER,  &params->num_frames);
  add_param_to_list(&list, "-skip",                  "0", ARG_INTEGER,  &params->skip);
  add_param_to_list(&list, "-width",              "1920", ARG_INTEGER,  &params->width);
  add_param_to_list(&list, "-height",             "1080", ARG_INTEGER,  &params->height);
  add_param_to_list(&list, "-qp",                   "32", ARG_INTEGER,  &params->qp);  
  add_param_to_list(&list, "-f",                    "60", ARG_FLOAT,    &params->frame_rate);
  add_param_to_list(&list, "-lambda_coeffI",       "1.0", ARG_FLOAT,    &params->lambda_coeffI);
  add_param_to_list(&list, "-lambda_coeffP",       "1.0", ARG_FLOAT,    &params->lambda_coeffP);
  add_param_to_list(&list, "-lambda_coeffB",       "1.0", ARG_FLOAT,    &params->lambda_coeffB);
  add_param_to_list(&list, "-early_skip_thr",      "0.0", ARG_FLOAT,    &params->early_skip_thr);
  add_param_to_list(&list, "-enable_tb_split",       "0", ARG_INTEGER,  &params->enable_tb_split);
  add_param_to_list(&list, "-enable_pb_split",       "0", ARG_INTEGER,  &params->enable_pb_split);
  add_param_to_list(&list, "-max_num_ref",           "1", ARG_INTEGER,  &params->max_num_ref);
  add_param_to_list(&list, "-HQperiod",              "1", ARG_INTEGER,  &params->HQperiod);
  add_param_to_list(&list, "-num_reorder_pics",      "0", ARG_INTEGER,  &params->num_reorder_pics);
  add_param_to_list(&list, "-dqpP",                  "0", ARG_INTEGER,  &params->dqpP);
  add_param_to_list(&list, "-dqpB",                  "0", ARG_INTEGER,  &params->dqpB);
  add_param_to_list(&list, "-mqpP",                "1.0", ARG_FLOAT,    &params->mqpP);
  add_param_to_list(&list, "-mqpB",                "1.0", ARG_FLOAT,    &params->mqpB);
  add_param_to_list(&list, "-dqpI",                  "0", ARG_INTEGER,  &params->dqpI);
  add_param_to_list(&list, "-intra_period",          "0", ARG_INTEGER,  &params->intra_period);
  add_param_to_list(&list, "-intra_rdo",             "0", ARG_INTEGER,  &params->intra_rdo);
  add_param_to_list(&list, "-rdoq",                  "0", ARG_INTEGER,  &params->rdoq);
  add_param_to_list(&list, "-max_delta_qp",          "0", ARG_INTEGER,  &params->max_delta_qp);
  add_param_to_list(&list, "-encoder_speed",         "0", ARG_INTEGER,  &params->encoder_speed);
  add_param_to_list(&list, "-deblocking",            "1", ARG_INTEGER,  &params->deblocking);
  add_param_to_list(&list, "-clpf",                  "1", ARG_INTEGER,  &params->clpf);
  add_param_to_list(&list, "-snrcalc",               "1", ARG_INTEGER,  &params->snrcalc);
  add_param_to_list(&list, "-use_block_contexts",    "0", ARG_INTEGER,  &params->use_block_contexts);
  add_param_to_list(&list, "-enable_bipred",         "0", ARG_INTEGER,  &params->enable_bipred);

Let's start with a test file I have available as raw YUV 4:2:0 anyway:

Code:

Thorenc.exe -if Johnny_1280x720_60.yuv -width 1280 -height 720 -f 60 -n 600 -of Johnny_1280x720_60.bit -rf Johnny_1280x720_60.recon.y4m

It does something ... at 2 seconds per frame on an AMD Phenom-II X4. But the LC result already looks quite decent with the rather low-complexity defaults (not even B frames or multiple refs).
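To put the disk space worry and the speed into numbers, here is a small back-of-the-envelope sketch for the 1280x720, 600-frame sample. It assumes 8-bit samples (one byte each), which the thread does not state explicitly; the helper name is made up.

Code:

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Size of one planar YUV 4:2:0 frame (assumes 8-bit samples by default)."""
    luma = width * height
    chroma = ((width + 1) // 2) * ((height + 1) // 2)  # per chroma plane
    return (luma + 2 * chroma) * bytes_per_sample

frame_bytes = yuv420_frame_bytes(1280, 720)
total_bytes = frame_bytes * 600          # whole raw sequence
encode_minutes = 600 * 2 / 60            # at roughly 2 seconds per frame

print(frame_bytes)       # 1382400
print(total_bytes)       # 829440000
print(encode_minutes)    # 20.0
```

So the raw file is a bit over 800 MB, and the low-complexity run takes about 20 minutes.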

Preparing a config for higher efficiency, testing again...

Code:

-if                 Johnny_1280x720_60.yuv                        ; Input yuv sequence
-of                 Johnny_1280x720_60_HE.bit                     ; Output bitstream
-rf                 Johnny_1280x720_60_HE.recon.y4m               ; Reconstructed yuv sequence
-stat               Johnny_1280x720_60_HE.stat.txt                ; Statistics to file
-width              1280                                          ; Width of luminance
-height             720                                           ; Height of luminance
-n                  600                                           ; Number of frames to encode
-f                  60                                            ; Frame rate in Hz
-qp                 32                                            ; Quantization parameter
-HQperiod           12                                            ; Period of high quality frames
-mqpP               1.2                                           ; QP multiplier for low quality P frames
-dqpI              -2                                             ; QP offset for intra frames
-lambda_coeffI      1.2                                           ; Multiplier for lambda - I frames
-lambda_coeffP      1.2                                           ; Multiplier for lambda - P frames

;
;High complexity operating point
;
-intra_rdo          1                                             ; Use RDO for choosing intra mode
-enable_tb_split    1                                             ; Enable splitting of a block in 4 transform blocks
-enable_pb_split    1                                             ; Enable splitting of an inter block in 4 prediction blocks
-early_skip_thr     0.3                                           ; Early skip threshold
-max_num_ref        4                                             ; Number of reference frames
-use_block_contexts 1                                             ; Use block contexts
-enable_bipred      1                                             ; Enable biprediction
-encoder_speed      0                                             ; Encoder complexity parameter (0: Slow, 1: Moderate, 2: Fast)
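The config layout above is simple enough to read with a few lines of script, for example to compare two configs. The following is only a sketch of the visible format (one "-name value" pair per line, with ";" starting a comment); it is not thorenc's own parser, and the function name is hypothetical.

Code:

```python
def parse_config_sketch(text):
    """Parse '-name value ; comment' lines into a dict of strings."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.split(";", 1)[0].strip()   # drop the comment part
        if not line:
            continue                               # blank or comment-only line
        parts = line.split(None, 1)                # name, then the rest as value
        if len(parts) != 2 or not parts[0].startswith("-"):
            raise ValueError("unexpected config line: %r" % raw_line)
        params[parts[0]] = parts[1].strip()
    return params

sample = """
-qp                 32     ; Quantization parameter
-dqpI              -2      ; QP offset for intra frames
;
;High complexity operating point
;
-max_num_ref        4      ; Number of reference frames
"""
print(parse_config_sketch(sample))
```

Splitting only once on whitespace keeps negative values such as "-2" attached to their parameter instead of being mistaken for a new option.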

Encoding speed drops to more than 10 seconds per frame. After 3.5 hours for 600 frames, that works out to about 21 seconds per frame, or less than 3 frames per minute. Well, the x265 encoder for HEVC is extremely optimized in comparison. But apart from that ... the HE result looks rather decent, too (just blurred, yet structured, hardly any ringing), at a much smaller size even.

The stats file tells little, just a few statistical values; it's not like a 2-pass bitrate distribution stats file:

Code:

 NFR     kbps     PSNRY  PSNRU  PSNRV
 600      161.215 38.010 44.001 44.578
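For a feel of what that bitrate means, here is a short sketch converting the stats line into a bitstream size. It assumes that "kbps" means 1000 bits per second averaged over all 600 frames at the configured 60 fps; how thorenc actually computes the figure is not confirmed here.

Code:

```python
frames = 600
fps = 60.0
kbps = 161.215                       # from the stats file above

duration_s = frames / fps            # length of the clip in seconds
bitstream_bytes = kbps * 1000 * duration_s / 8
bytes_per_frame = bitstream_bytes / frames

print(duration_s)                    # 10.0
print(round(bitstream_bytes))        # about 201519 bytes
print(round(bytes_per_frame))        # about 336 bytes per frame
```

Under those assumptions the whole 10-second clip fits in roughly 200 KB.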

I doubt I will easily do "same size" comparisons very soon, due to the very low encoding speed, but maybe in some future... a MediaFire shareable folder is reserved for samples.

Havokdan · 1st September 2015, 19:12

Maybe offtopic: http://aomedia.org

Quote:

WHAT IS THE ALLIANCE?
The Alliance for Open Media is founded by leading Internet companies focused on developing next-generation media formats, codecs and technologies in the public interest. The new Alliance is committing its collective technology and expertise to meet growing Internet demand for top-quality video, audio, imagery and streaming across devices of all kinds and for users worldwide.

The initial project will pursue a new, open royalty-free video codec specification and open-source implementation based on the contributions of members, along with binding specifications for media format, content encryption and adaptive streaming, thereby creating opportunities for next-generation media experiences.

Day one founding members are Amazon, Cisco, Google, Intel Corporation, Microsoft, Mozilla and Netflix.

BadFrame · 2nd September 2015, 04:56

Quote:

Originally Posted by Havokdan

Maybe offtopic: http://aomedia.org

This needs its own topic I think, it's quite a bomb in the video codec world.

dapperdan · 2nd September 2015, 10:45

I gathered a few of the announcement posts about the Alliance and started a thread here:

http://forum.doom9.org/showthread.php?t=172550

mandarinka · 23rd September 2015, 23:29

Presentation about Thor from VDD 2015: https://www.youtube.com/watch?v=g6m_N3QlqOI

I didn't yet have time to watch it whole, but it seems that Thor is basically something that Cisco originally proposed for HEVC, but with lower complexity (which means the lack of CABAC-grade entropy coding is intentional, meh!), and now it is basically being recycled. Which also means that Thor as a format is not going to be competitive with HEVC, even if it had a competitively-tuned encoder available which it AFAIK doesn't have anyway.

I hope the influence of these low-complexity aims won't cripple NetVC's overall compression strength, as Cisco's Thor is another input into the NetVC programme apart from Daala.
