
View Full Version : x264 development



rallymax
26th September 2011, 18:43
I want to build an encoding rig too with either a 2600K or Bulldozer 8150. I'm in no hurry though so I'll wait until Bulldozer is out before making a decision. Are you in a hurry? If you're not, you may as well wait and see how Bulldozer performs.

w.r.t. your Intel choice: Are you going for the "K" version because of the Intel HD 3000 vs 2000 GPU? Because if you're not then this "Intel Core i7-2600 vs 2600K" (http://ark.intel.com/compare/52213,52214) comparison on the Intel website shows you might as well get the 2600 coz it's exactly the same in all other categories, is cheaper and has things like Vpro that the 2600K does not.

Also at The Intel Developer's Forum ("IDF") (http://www.intel.com/idf/) last week I saw the new Sandy Bridge CPU & chipset that's coming out in NOVEMBER '11 with 4x 1600MHz DDR3 Memory Busses vs the 2600's 2 BUSSES. If memory throughput is important to you then you may want to wait for 2 months.

My 2c.

Dark Shikari
26th September 2011, 18:58
nice to see that you have got your hands on a bulldozer cpu. Will there be new optimisations for bulldozer within a few weeks of bulldozer's release?

Can't wait to see the speed of bulldozer compared to intel.

I saw some benchmarks yesterday that show the highend bulldozer 8 core (£200) is about the same speed with x264 as the highend intel core i7 990x which is £800. It is supposed to be around 20% faster than the core i7 2600k SKT1155 (£240) too.

Have you got your hands on the bulldozer 16 core server cpu too?

I have access to a 2x16 core Bulldozer server, plus AMD is mailing us a physical system.

Yes, there will be XOP optimizations. Yes, they will help, at least a % or 2.

vivan
26th September 2011, 20:07
rallymax,
"K" = unlocked multiplier.

Ice009
26th September 2011, 21:03
w.r.t. your Intel choice: Are you going for the "K" version because of the Intel HD 3000 vs 2000 GPU? Because if you're not then this "Intel Core i7-2600 vs 2600K" (http://ark.intel.com/compare/52213,52214) comparison on the Intel website shows you might as well get the 2600 coz it's exactly the same in all other categories, is cheaper and has things like Vpro that the 2600K does not.

Also at The Intel Developer's Forum ("IDF") (http://www.intel.com/idf/) last week I saw the new Sandy Bridge CPU & chipset that's coming out in NOVEMBER '11 with 4x 1600MHz DDR3 Memory Busses vs the 2600's 2 BUSSES. If memory throughput is important to you then you may want to wait for 2 months.

My 2c.

I'm looking for a good value solution so that rules out socket 2011, which will probably be as expensive as X58 was on release. The main reason I'd choose the K model is for unlocked multipliers. I think either the 2600K or Bulldozer 8150 would be the best value for money for an encoding box.

rallymax
26th September 2011, 22:00
rallymax,
"K" = unlocked multiplier.

ah.
thx for the education.

mandarinka
26th September 2011, 22:44
In other words, not a very good value for the money if you aren't willing to go overclocking.

Blue_MiSfit
27th September 2011, 05:57
Woooow.... 2x 16 core. That sounds like an awful lot of fun to me!

G_M_C
28th September 2011, 14:19
[...]But also for a short question of mine, on 10bit: When i use an 8-bit source on 10-bit x264. When does x264 do the converting from 8-bit to 10-bit ? Is that done before the filtering stage or after ?
( ex:8bit input -> conversion to 10bit -> resize -> encoding stage )

After. Depth filter is the last in chain (before sending data to libx264).

K, thx :)

But doesn't conversion to 10-bit before resizing theoretically lead to better quality?

The problem is probably that all of the filters are 8 bit only.

Expanding on this short discussion;
I see a lot of 1080p ->720p encodes in the wild. For myself i have some captures in 1080i that I deinterlace and resize to 720p.

Concluding: x264's built-in resize/crop filter is used relatively often.

Would i be correct in thinking that making x264's resize/crop filters able to work in higher bitdepth (*) could help further improve x264's overall quality (as perceived by people who watch the encodes)?

(*) i.e. moving the 8->10bit conversion to the beginning, instead of it being the last step before encoding, and making the pipeline in between able to process in a higher bit depth (16-bit integer or something?).
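
To make the question concrete, here is a minimal Python sketch (not x264 code) of why the order of operations can matter. It assumes the 8-bit to 10-bit conversion is a plain left shift by 2 and uses a rounded two-sample average as a stand-in for one tap of a resize filter. Both `to_10bit` and `average` are hypothetical helpers, and x264's actual depth and resize filters may behave differently.

```python
def to_10bit(v8):
    """Promote an 8-bit sample to 10 bits with a plain left shift (assumed conversion)."""
    return v8 << 2


def average(a, b):
    """Average two samples, rounding half up; a stand-in for a resize filter tap."""
    return (a + b + 1) >> 1


a, b = 100, 101  # two neighbouring 8-bit samples

# Filter in 8-bit, convert last: the 100.5 average is rounded before the shift.
late = to_10bit(average(a, b))

# Convert first, filter in 10-bit: the half step survives as 2 of 4 sub-levels.
early = average(to_10bit(a), to_10bit(b))

print("convert last :", late)
print("convert first:", early)
```

With these two samples, converting last gives 404 while converting first gives 402, which is the exact value of 100.5 on the 10-bit scale. Whether a difference this small is visible in a real encode is a separate question.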

MasterNobody
28th September 2011, 17:05
Would i be correct in thinking that making x264's resize/crop filters able to work in higher bitdepth
x264's resize/crop filters already can work with high bit depth (you can use the 'csp' param of the resize filter and specify depth=16). The question is, are these official filters enough for you or not (and is this worth it).

G_M_C
28th September 2011, 19:12
x264's resize/crop filters already can work with high bit depth (you can use the 'csp' param of the resize filter and specify depth=16). The question is, are these official filters enough for you or not (and is this worth it).

For basic resizing it is good enough, i think. Avisynth doesn't work in higher bit depth yet afaik.

Cman21
30th September 2011, 02:48
question: will there be any quality gain encoding a 4:2:0 Blu-Ray to 4:2:2 or does the source need to be 4:2:2 to retain quality? i dont fully understand how the chroma sampling works so thought i would just ask.

rallymax
30th September 2011, 03:38
question: will there be any quality gain encoding a 4:2:0 Blu-Ray to 4:2:2 or does the source need to be 4:2:2 to retain quality? i dont fully understand how the chroma sampling works so thought i would just ask.

no gain. the damage is done to the src footage already.
This Wikipedia YUV (http://en.wikipedia.org/wiki/Yuv#Luminance.2Fchrominance_systems_in_general) page is an ok description of 4:4:4, 4:2:2 and 4:2:0

Cman21
30th September 2011, 04:50
no gain. the damage is done to the src footage already.
This Wikipedia YUV (http://en.wikipedia.org/wiki/Yuv#Luminance.2Fchrominance_systems_in_general) page is an ok description of 4:4:4, 4:2:2 and 4:2:0

ya i figured as much from what i could kinda put together from my own research, thanks anyway i guess.

ajp_anton
30th September 2011, 21:28
If you downsample it to 720p or even smaller, you might gain something.
1080p has 960x540 of chroma resolution, keep as much of it as you can =).
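
A quick way to check these numbers: the Python sketch below computes chroma plane dimensions for the common subsampling schemes. `chroma_plane_size` is an illustrative helper, not part of any x264 API, and it assumes even frame dimensions.

```python
# Horizontal and vertical chroma divisors for each subsampling scheme.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # full-resolution chroma
    "4:2:2": (2, 1),  # half horizontal, full vertical
    "4:2:0": (2, 2),  # half horizontal, half vertical
}


def chroma_plane_size(width, height, scheme):
    """Return (width, height) of one chroma plane for the given luma size."""
    h_div, v_div = SUBSAMPLING[scheme]
    return width // h_div, height // v_div


source = chroma_plane_size(1920, 1080, "4:2:0")
print("1080p 4:2:0 source chroma:", source)

for scheme in ("4:2:0", "4:2:2", "4:4:4"):
    print("720p", scheme, "chroma:", chroma_plane_size(1280, 720, scheme))
```

The 1080p 4:2:0 source carries 960x540 chroma samples per plane. A 720p 4:2:0 encode can hold only 640x360 of that, while 720p 4:4:4 has room for 1280x720, which is more than the source ever had.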

hajj_3
12th October 2011, 08:33
Now that the NDA for bulldozer is over and is available for pre-order will you be releasing an updated version of x264 with bulldozer optimisations anytime soon?

Would be nice to see benchmarks of an optimised x264 compared to v2085 build.

the_weirdo
12th October 2011, 08:55
Now that the NDA for bulldozer is over and is available for pre-order will you be releasing an updated version of x264 with bulldozer optimisations anytime soon?


You might find the answer here:
http://forum.doom9.org/showthread.php?p=1530754#post1530754

rallymax
12th October 2011, 19:35
If I'm feeding i444 to x264 is there any upside in feeding in 10bit vs 8bit if the output is Bluray (ie i420 8bit)?
x264 is doing no scaling or cropping.
thx

rallymax
12th October 2011, 19:36
is x264bluray.com's x264 for bluray settings accurate these days?
specifically the
--open-gop
--pulldown double and 32
--fake-interlaced

thx

sneaker_ger
12th October 2011, 19:45
Yes, it is.

rallymax
12th October 2011, 20:02
Yes, it is.

thx!! :thanks:

benwaggoner
13th October 2011, 19:56
If I'm feeding i444 to x264 is there any upside in feeding in 10bit vs 8bit if the output is Bluray (ie i420 8bit)?
x264 is doing no scaling or cropping.
thx
In theory, a codec could make use of the 10-bit luma range in figuring out how to most efficiently preserve gradients et cetera, instead of the codec-blind brute-force dithering that normally happens. Basically it would do a rate-distortion optimization finding the 8-bit output that would best match the 10-bit input. In a lot of cases, I bet the codec could discover that adjoining pixels could be more similar than a typical dithering algorithm, reducing low amplitude high frequencies a bit.

I don't know if x264 does that, but it sure would be cool if it did :). But it's probably not the kind of feature one would knock off in a weekend.

kieranrk
13th October 2011, 22:24
In theory, a codec could make use of the 10-bit luma range in figuring out how to most efficiently preserve gradients et cetera, instead of the codec-blind brute-force dithering that normally happens. Basically it would do a rate-distortion optimization finding the 8-bit output that would best match the 10-bit input. In a lot of cases, I bet the codec could discover that adjoining pixels could be more similar than a typical dithering algorithm, reducing low amplitude high frequencies a bit.

I don't know if x264 does that, but it sure would be cool if it did :). But it's probably not the kind of feature one would knock off in a weekend.

I thought about doing this at VDD but Dark_Shikari said it wouldn't help much

Dark Shikari
14th October 2011, 00:37
Well, we can't be quite sure without trying, but more practically there's the issue that 10-bit sources are incredibly rare, so it wouldn't affect the vast majority of users.

wlee15
14th October 2011, 22:28
AMD has publicly posted the BIOS and Kernel Developer’s Guide for Bulldozer. I imagine that the performance monitoring section might be useful for x264 development (although I hope that AMD had already provided the documents).

http://support.amd.com/us/Processor_TechDocs/42301.pdf

Biggiesized
15th October 2011, 04:22
In theory, a codec could make use of the 10-bit luma range in figuring out how to most efficiently preserve gradients et cetera, instead of the codec-blind brute-force dithering that normally happens. Basically it would do a rate-distortion optimization finding the 8-bit output that would best match the 10-bit input. In a lot of cases, I bet the codec could discover that adjoining pixels could be more similar than a typical dithering algorithm, reducing low amplitude high frequencies a bit.

I don't know if x264 does that, but it sure would be cool if it did :). But it's probably not the kind of feature one would knock off in a weekend.

What do professional MPEG-2 broadcast encoders do?

kieranrk
15th October 2011, 12:27
What do professional MPEG-2 broadcast encoders do?

Most organisations using an MPEG-2 broadcast encoder would not be using a "true" 10-bit playout system - the playout system would be based on MPEG-2 intra most likely and not be 10-bit.

Blue_MiSfit
15th October 2011, 22:48
Indeed. All of the playout servers I work with use some form of "long" GOP MPEG-2. XDCAM HD422, generic MPEG-2, Omneon etc... it's all 8 bit. Of course, it gets played out over (10 bit) HD-SDI, so I would imagine the extra bits simply become padding? Assuming this, a downstream broadcast encoder would simply ignore the top bits? Things could get more complex with graphics devices between the playout server and the broadcast encoder, since these could possibly produce true 10 bit video...

I imagine some facilities playout 10 bit (DNxHD maybe, or AVC Intra), but I doubt most would have the budget for the storage this would entail...

Derek

benwaggoner
17th October 2011, 01:02
Well, we can't be quite sure without trying, but more practically there's the issue that 10-bit sources are incredibly rare, so it wouldn't affect the vast majority of users.
It depends on where you are in the production chain. Most TV and movies are produced in >8-bit, and lots of production codecs (Cineform, DNxHD, ProRes) can operate in 10-bit or even 12-bit modes. If there were encoders that could improve efficiency by using 10-bit sources directly, they would be used with 10-bit sources in many professional workflows.

jpsdr
20th October 2011, 14:03
Any status on the future trellis patch for which subme=11 has been added?
A very, very rough ETA?
Just to be clear: I'm not asking in order to push or anything. I'm not in a hurry and I'll wait as long as needed, but I would like to know roughly how to plan my jobs.
If you said "maybe in one or two months", I might hold off on what I have to do and wait for it, allowing me to do better quality encodes. If you said "probably not before 6 months", I'll arrange and plan things differently.
But not knowing and being in the fog...

hajj_3
22nd October 2011, 17:21
v2106 has just been released, it contains the bulldozer improvements.

Hope someone with a bulldozer cpu can do some benchmarks of 720p and 1080p encodes with the v2085 and v2106 builds to compare the speed.

TheRyuu
23rd October 2011, 06:14
v2106 has just been released, it contains the bulldozer improvements.

Hope someone with a bulldozer cpu can do some benchmarks of 720p and 1080p encodes with the v2085 and v2106 builds to compare the speed.

anandtech did. (http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/7)

Dark Shikari
23rd October 2011, 07:04
anandtech did. (http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/7)

No they didn't. They did a review with an extremely old build of x264, and a new one, but didn't list the actual revision numbers, and claimed the only difference was that the new one was "AVX-enabled". In reality, AVX doesn't help on Bulldozer because Bulldozer already has move-elimination (except indirectly by saving code size, which helps marginally): the difference was because the versions they used were years apart.

They're incompetent at best, and I wouldn't trust their other benchmarks either.

CruNcher
23rd October 2011, 09:28
Yep, many of their tests are flawed in terms of multimedia. The best example is their GPU encoding test, where they draw conclusions based on different vendor frameworks :(

When the first x264 benchmarks became public it was clear that a Bulldozer system isn't as efficient, but i wouldn't have thought the system-level difference would be as heavy as these 3 graphs let you conclude from the power consumption behavior?

http://images.anandtech.com/graphs/graph4955/41715.png
http://images.anandtech.com/graphs/graph4955/41697.png
http://images.anandtech.com/graphs/graph4955/41717.png


PS: AMD released their first Bulldozer benchmarks (alongside IDF, a few blocks down at a press conference called Fusion Zone: http://nl.hardware.info/nieuws/24619/eerste-officiele-benchmarks-amd-fx-processor ) and, surprise surprise, they used x264 in the form of Handbrake as the benchmark, winning it by 19% with 4 modules vs 4 cores. And, surprise surprise, Anandtech did not report on this but stayed focused on IDF and Ivy Bridge.
Though you can most probably guess that this win was bought at almost the full 125W (nothing was really revealed about the benchmark itself, just the result), and everyone knows it is not hard to win against Intel's more efficient 95W part when you drive your max power to the edge.
At least it seems they almost reach Intel's efficiency now and don't hang 2 generations behind anymore, which is already a good sign that they are back in a fully recovered state (after all the Fusion investment, including ATI).

And i guess you can currently take only Anandtech's test as a reference here; i saw no others comparing power consumption on x264 vs Intel Sandy Bridge :(
Unfortunately there is no power consumption result for the newer AVX build on either system (fail again) :(

@Dark Shikari
Is it really like this? Somehow it's hard to believe. Or are you able to improve this further for Bulldozer, either down (consumption) or, eh, up (speed)?

aegisofrime
24th October 2011, 05:36
Hopefully somebody competent can do a proper benchmark then. I'm kinda bored with my i7-2600K. :p

Also I wish somebody would do a CRF benchmark.

noee
24th October 2011, 12:40
^^
I hacked up this batch runner (http://www.techarp.com/showarticle.aspx?artno=520) to use the new x64 10bit and changed it to CRF22 medium and a 4100 did about 10-12% better than a 3.5Ghz Propus box I have. Not really a benchmark, fwiw, but a quick look.

Atak_Snajpera
24th October 2011, 16:02
Hopefully somebody competent can do a proper benchmark then. I'm kinda bored with my i7-2600K.

Also I wish somebody would do a CRF benchmark.

Convince them to use my benchmark ;)
http://i.imgur.com/lq6Bi.gif

Download
http://www.mediafire.com/?mcic8cupimcy4fm

mandarinka
24th October 2011, 17:41
^^
I hacked up this batch runner (http://www.techarp.com/showarticle.aspx?artno=520) to use the new x64 10bit and changed it to CRF22 medium and a 4100 did about 10-12% better than a 3.5Ghz Propus box I have. Not really a benchmark, fwiw, but a quick look.

offtopic, but could you perhaps benchmark the performance of nnedi3 (or other significant avisynth filters - nnedi3 is well multithreaded though, so a good test)?

noee
24th October 2011, 19:42
offtopic, but could you perhaps benchmark the performance of nnedi3 (or other significant avisynth filters - nnedi3 is well multithreaded though, so a good test)?

It's not my machine, but the dude helping out is willing to test anything except 64bit AVISynth. ;)

It's on AMDZone.com, under the "Fusion, etc." forum board (http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=138886)....If you wanted to post a script or offer a particular example.

I can help make it work with the batch runner, if needed.

aegisofrime
25th October 2011, 08:41
Convince them to use my benchmark ;)
http://i.imgur.com/lq6Bi.gif

Download
http://www.mediafire.com/?mcic8cupimcy4fm

For some reason your benchmark recognizes my i7-2600K as a 8C/16T CPU :p

Midzuki
25th October 2011, 16:41
BTW, animated GIFs are pesky and annoying. :mad: :p :D

jpsdr
26th October 2011, 08:27
My I7@980 was recognized as 16C/32T CPU. Result was 25.1.

epitaxial
27th October 2011, 19:23
Indeed. All of the playout servers I work with use some form of "long" GOP MPEG-2. XDCAM HD422, generic MPEG-2, Omneon etc... it's all 8 bit. Of course, it gets played out over (10 bit) HD-SDI, so I would imagine the extra bits simply become padding? Assuming this, a downstream broadcast encoder would simply ignore the top bits? Things could get more complex with graphics devices between the playout server and the broadcast encoder, since these could possibly produce true 10 bit video...

I imagine some facilities playout 10 bit (DNxHD maybe, or AVC Intra), but I doubt most would have the budget for the storage this would entail...

Derek

Yes, servers are one of the places where 10 bit sources become 8 bit, and for the reason you note. Most (live) production is actually 10 bit; if it is delivered (backhauled) using MPEG-2 it becomes 8 bit, but J2K and AVC 10 bit solutions are becoming prevalent, so the 10 bits are carried through. And most devices in a broadcast plant now process 10 bit.

When 10 bit is converted (brute force) to 8 bit the LSBs are chopped off.
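
As a rough illustration of that brute-force conversion, the Python sketch below drops the two least significant bits of a 10-bit sample and, for comparison, shows a rounded conversion. It also assumes that 8-bit video carried over a 10-bit link sits in the upper 8 bits with zero LSBs, in which case truncation recovers it exactly. All function names are made up for this example.

```python
def truncate_10_to_8(v10):
    """Brute-force conversion: chop off the two least significant bits."""
    return v10 >> 2


def round_10_to_8(v10):
    """Round to the nearest 8-bit level instead of truncating, clamped to 255."""
    return min((v10 + 2) >> 2, 255)


def pad_8_to_10(v8):
    """Carry an 8-bit sample in a 10-bit word with zero LSBs (assumed layout)."""
    return v8 << 2


sample = 403  # a true 10-bit value between 8-bit levels 100 and 101
print("truncated:", truncate_10_to_8(sample))
print("rounded  :", round_10_to_8(sample))

# Padded 8-bit content survives a 10-bit link unchanged after truncation.
print("round trip ok:", all(truncate_10_to_8(pad_8_to_10(v)) == v for v in range(256)))
```

Truncation always rounds down, so 403 becomes 100 even though 101 is the nearer 8-bit level. Padded 8-bit material is unaffected because its low bits are already zero.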

As for the availability of 10 bit source material - my Decklink Extreme HD capture card does a very nice job of capturing raw 10 bit from cameras, TSGs, etc... so there's plenty available.

Greg

rallymax
28th October 2011, 19:31
For AVC 10 Intra do you have any preferred settings?
I like the idea of standardizing on i444 10bit as a digital intermediate permanent storage type.
thx

hajj_3
28th October 2011, 21:35
@Dark Shikari: I don't know if this is news for you or not but there are some ways to get 10% more out of bulldozer cpu's by changing some scheduling things in win7, see this info: http://techreport.com/articles.x/21865

Maybe x264 already does this I don't know, thought i'd let you know anyway.

mandarinka
28th October 2011, 23:27
It probably won't help if all the cores are already utilized.
Maybe for 1st pass, IF the scheduler is clever enough to shut some idle module or two to get headroom for turbo and put the critical lookahead thread onto the turbo-boosted module while fully dedicating it just to the 1 thread. Lots of ifs...
Whether I am right here or not, one thing is certain: don't expect any miracles or anything significant there.

kieranrk
29th October 2011, 02:33
For AVC 10 Intra do you have any preferred settings?
I like the idea of standardizing on i444 10bit as a digital intermediate permanent storage type.
thx

Well Panasonic are pushing 12-bit 4:4:4 as their new format. I had a long discussion about it with the people at IBC.

MetalPhreak
1st November 2011, 15:49
Is there any ETA on when we can expect the 10-bit color shift bug to be resolved?

mandarinka
1st November 2011, 16:12
You can at least use patched builds in the meantime: http://x264.fushizen.eu/

Dark Shikari
1st November 2011, 18:00
Is there any ETA on when we can expect the 10-bit color shift bug to be resolved?

When kemuri9 finishes the range patch.

rallymax
1st November 2011, 18:20
How about the Google Summer of Code MVC extensions? Me want some x264 3D! :)