View Full Version : x265 HEVC Encoder
Motenai Yoda
25th March 2016, 10:16
Some time ago, I tried using --dither when the output was 10bit. The encoder didn't complain but the output was garbage.
Then I checked the docs and it says:
Only applicable when the input bit depth is larger than 8bits and internal bit depth is 8bits.
Maybe there should be a sanity check?
From what I've seen, for anything but 8 bit the dither isn't applied; instead it discards some bits, i.e. going from 16 to 10 it looks like it discards the first 6 (or 8) bits.
LigH
25th March 2016, 11:49
@ J1Man:
The 10 bit precision of internal parameters in the HEVC stream, after a transformation to the frequency domain, is not related to the 10 bit precision of raw RGB or YUV video "pixels"/components.
MeteorRain
26th March 2016, 01:51
I did some visual comparisons but I could not decide which method is the best, quality wise.
Even if there is any difference, it will be so small that your monitor cannot even show it.
I wouldn't worry about that at all.
pingfr
26th March 2016, 15:03
I'm in the process of building a dedicated x265 encoding/test box but am still hesitating between two architectures. At the moment I'm leaning towards the quad socket, but I could use more input from you guys, so here goes:
- On one side we have a dual-socket system; we'll call it "duo" for the sake of argument. It is built on a high-end consumer/enthusiast motherboard, the ASUS Z10PE-D16 WS, which benefits from latest-generation DDR4. The processors are two latest-generation Xeon E5-2699v4 CPUs (retail, not ES), which Intel is about to release to the public in a few days. For the sake of comparison, they have 22 cores and 44 threads each, run at a 2.3GHz stock speed with a Turbo speed of 3.6GHz, and of course feature the AVX2 instruction set.
So in a dual socket system, properly configured, properly cooled, etc, we are looking at a total of 44 cores for 88 threads.
- On the opposite side we have a quad-socket system; we'll call it "quad" for the sake of argument. It uses a true professional server-grade board, the Supermicro X10QBL, which unfortunately is limited to DDR3. The processors are four Xeon E7-4890v2 CPUs (not retail, they are Engineering Samples), which were introduced to the public in Q1 2014. For the sake of comparison, each CPU sports 15 cores and 30 threads and runs at a 2.8GHz stock speed (higher than the CPUs in the first machine) with a Turbo speed of 3.4GHz (lower than the CPUs in the first machine). As they are older CPUs, they lack the latest instruction sets such as AVX2 but still feature AVX.
So in the present configuration in a quad socket system, we are looking at a grand total of 60 cores for no less than 120 threads.
The remaining, minor drawbacks besides the lack of the AVX2 instruction set are the "limited" 32 PCIe lanes, coupled with a restrictive operating system requirement that forces a switch to either Linux or Windows Server 2012 (or the 2016 beta), as my good ole' Windows 7 can't see quad-socket configurations.
So there you have it:
- A newer dual-socket system with all the nice, latest features of the upcoming fourth generation of Xeon E5s, for a total of 44 cores/88 threads.
Versus:
- A slightly older quad-socket system from two generations ago, which trades off AVX2 and some PCIe lanes (40 down to 32) for many more cores/threads, with a total of 60 cores/120 threads.
With all that being said, my only interest being faster x265 encoding, which route should I take?
Go for 4-way E7/AVX 60c/120t from 2 generations ago which still "packs a punch" or rather aim for modern 2-way E5/AVX2 44c/88t from this upcoming generation which seems just "meh, whatever" to me?
Thanks.
Edit: Added two relevant links, but they should be taken with a pinch of salt as they are for E5 v3; I have access to E5 v4 processors in my case.
http://www.spec.org/cpu2006/results/res2015q1/cpu2006-20150209-34979.html
http://www.spec.org/cpu2006/results/res2014q1/cpu2006-20140210-28464.html
MeteorRain
28th March 2016, 00:02
pingfr: I think you should first settle on the goal. Are you looking for the ultimate box regardless of the price? Or are you looking for the most performance for a given amount of money?
The fastest way to do the encoding is to have a cluster of servers, and dispatch tasks to be executed in parallel.
x265_Project
28th March 2016, 05:43
60 Ivy Bridge Xeon cores running at 2.8 GHz have more x265 compute power than 44 Broadwell Xeon cores running at 2.3 GHz. Forget about turbo clock speeds... if you're running x265 you will hit the thermal limits of your chips, and there will be no turbo boost.
Again, one x265 instance can't keep that many threads working efficiently, even with all of the parallelism that we've implemented. You will need to run multiple instances in parallel to fully exploit either of these many-core servers. This could either be multiple videos, or multiple chunks of the same video.
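A rough sketch of the "multiple chunks of the same video" idea, in Python. It only plans the work: it splits a clip into contiguous frame ranges and builds one x265 command line per range using the encoder's --seek and --frames options. The function names, file names, chunk count and CRF value are placeholders made up for illustration, not something MulticoreWare ships, and launching the instances and joining the resulting streams is left out.

```python
def plan_chunks(total_frames, num_chunks):
    """Split total_frames into contiguous (start_frame, frame_count) ranges."""
    base, extra = divmod(total_frames, num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional frame each.
        count = base + (1 if i < extra else 0)
        if count:
            chunks.append((start, count))
        start += count
    return chunks


def build_commands(source, total_frames, num_chunks, crf=20):
    """Build one x265 argument list per chunk (hypothetical output names)."""
    commands = []
    for idx, (start, count) in enumerate(plan_chunks(total_frames, num_chunks)):
        commands.append([
            "x265", "--input", source,
            "--seek", str(start),      # first frame of this chunk
            "--frames", str(count),    # number of frames to encode
            "--crf", str(crf),
            "--output", f"chunk_{idx:03d}.hevc",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("input.y4m", total_frames=10, num_chunks=3):
        print(" ".join(cmd))
```

Each command could then be handed to its own process (or its own machine), so that several x265 instances share the cores instead of one instance trying to feed all of them.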
zioneed
28th March 2016, 09:31
Hi all,
I am encoding a lot of TV series, shrinking them from BD:
1080p to 720p.
So far the quality is pretty good with the settings below, via MeGUI; what I need are some hints on whether these are the best settings for preserving as much quality as possible.
Clearly it's a matter of compromises.
Const. Quality, preset medium.
additional commands :program --crf 17.0 --limit-refs=3 --no-sao --no-deblock --qcomp=0.9 --early-skip --qg-size=32 --psy-rd=2 --rdpenalty=2 --limit-modes --vbv-maxrate=5000 --vbv-bufsize=5000 --rdoq-level=2 --tu-intra-depth=3 --output "output" "input"
These settings are giving good results in terms of speed vs. quality,
but I have a couple of doubts:
1) What is the relationship between qcomp and bitrate?
2) Is QG size set correctly? Specifically, the original BD videos are a bit blocky (Heroes TV series) and my understanding is that having the deblock option activated (together with AQ mode) would make the results a bit blurry.
So please can anyone help and give some hints about what could be set differently or if there's any problem with my preset?
Many thanks in advance!!!
Motenai Yoda
28th March 2016, 17:04
Const. Quality, preset medium.
additional commands :program --crf 17.0 --limit-refs=3 --no-sao --no-deblock --qcomp=0.9 --early-skip --qg-size=32 --psy-rd=2 --rdpenalty=2 --limit-modes --vbv-maxrate=5000 --vbv-bufsize=5000 --rdoq-level=2 --tu-intra-depth=3 --output "output" "input"
1) What is the relationship between qcomp and bitrate?
2) Is QG size set correctly? Specifically, the original BD videos are a bit blocky (Heroes TV series) and my understanding is that having the deblock option activated (together with AQ mode) would make the results a bit blurry.
I would set deblock to at least -2:-1 or -1:-1 (disabling it isn't a good idea) and qg-size to 16. Also, why vbv at 5Mbps? Maybe increase ref to 6 and set rdpenalty to 1 if you need it.
1- qcomp "adjusts" how variable the bitrate will be: 0 = constant bitrate, 1 = constant quantizer. Usually the default value of .6 is good. If you have a mid/high-complexity video with a few low-complexity scenes, setting it to .5 will help bump the quality of those low-complexity scenes; otherwise, with a mid/low-complexity video and a few high-complexity scenes, raise it to .7 or .8.
2- Maybe you should deblock first, e.g. with deblock_qed.
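To make the two extremes of qcomp a bit more concrete, here is a toy model in Python. It assumes, loosely following x264-style rate control, that the quantizer scale grows as complexity ** (1 - qcomp), so the bits a scene receives grow roughly as complexity ** qcomp. The scene names and complexity numbers are invented, and the real encoder works on blurred per-frame complexity from the lookahead, so treat this as an illustration only.

```python
def relative_bits(complexity, qcomp):
    """Toy model: bits given to a scene scale as complexity ** qcomp."""
    return complexity ** qcomp


# Hypothetical scenes: the "busy" one is four times as complex as the "calm" one.
scenes = {"calm": 1.0, "busy": 4.0}

for qcomp in (0.0, 0.6, 1.0):
    shares = {name: relative_bits(c, qcomp) for name, c in scenes.items()}
    print(f"qcomp={qcomp}: {shares}")
```

In this model qcomp 0 gives every scene the same number of bits (constant bitrate), qcomp 1 lets bits follow complexity one to one (constant quantizer), and values in between land somewhere in the middle.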
zioneed
28th March 2016, 17:47
Many thanks for the reply!!!
Sorry for questions that might seem dumb, but HEVC is still a bit of a bet for me, and I'm trying to get to grips with some tuning. The goal is to keep a "good" image quality and avoid blocky images, keeping as much sharpness as possible at the same time.
Been reading a lot lately on specific topics, but I'm still confused about some options :)
Back to your kind reply:
1) will try a sample with deblock -1:-1
2) qcomp: not clear how it works, sorry. Assuming that scenes are usually quite complex and full of action, which value works best? (In theory; I know there's no definitive answer on that.) The higher the better for action scenes? The default is 0.6 if I'm not mistaken, so would raising it to 0.8/0.9 be any good?
3) will modify qgsize as per your suggestion.
4) "also why vbv at 5Mbps? Maybe increase ref to 6 and set rdpenalty to 1 if you need it": can you please explain this? I have tried different settings for bitrate, and between 4 and 5k is giving the best results with my settings. By "ref" do you mean "limit-refs"?
Thanks a lot again, the technical discussions and topics are always exhaustive and clear, but it's hard to get a hold on the interactions between different settings :)
Btw, what do you mean by "deblock_qed"?
LigH
28th March 2016, 20:45
Btw, what do you mean by "deblock_qed"?
http://avisynth.nl/index.php/Deblock_QED (discussion (http://forum.doom9.org/showthread.php?t=154777))
Motenai Yoda
28th March 2016, 20:56
2- qcomp: when you raise or lower the bitrate (or crf), the change is "spread" over the whole video. If you have only a few high-complexity scenes, most of the bits go to the other, "already-good-quality" scenes; a higher qcomp allows the codec to assign more bits to the hot ones. Vice versa, with a few low-complexity scenes the codec will assign them too few bits, and you have to lower qcomp to "spread" the bits better.
4- vbv doesn't control the bitrate; it restricts the operating space of the codec. 5Mbps is a low value for vbv; unless you are targeting streaming you can raise it to 10Mbps or even more (level 3.1 main tier, the lowest for 720p, allows up to 10Mbps; level 4.0 (1080p) allows up to 12Mbps in main tier and 30Mbps in high tier).
If you are aiming for a specific bitrate you should go with 2-pass ABR rate control, not CRF.
ref is for reference frames, default 3
deblock_qed is an AviSynth function written by Didée, and further modified by others, which aims to adaptively reduce blocking artifacts
description and dependencies http://avisynth.nl/index.php/Deblock_QED
latest version http://forum.doom9.org/showpost.php?p=1697386&postcount=13
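For the VBV point (4) above, a small Python sketch that checks a --vbv-maxrate value against the level/tier ceilings mentioned in this post. The table holds only the three figures quoted here; the function name is made up, and anything beyond those three entries would have to be taken from the HEVC specification.

```python
# Maximum bitrates in kbps, limited to the figures quoted in the post above.
LEVEL_MAX_KBPS = {
    ("3.1", "main"): 10000,
    ("4.0", "main"): 12000,
    ("4.0", "high"): 30000,
}


def vbv_headroom(vbv_maxrate_kbps, level, tier="main"):
    """Return how far (in kbps) a vbv-maxrate sits below the level/tier cap.

    A negative result means the value exceeds the cap for that level/tier.
    """
    cap = LEVEL_MAX_KBPS[(level, tier)]
    return cap - vbv_maxrate_kbps


if __name__ == "__main__":
    # The 5000 kbps from the settings discussed in this thread, checked against level 3.1.
    print(vbv_headroom(5000, "3.1"))
```

With the 5000 kbps setting from the command line under discussion, there is still plenty of room below the level 3.1 ceiling, which is the point being made: the VBV values can be raised without leaving the level.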
jlpsvk
28th March 2016, 21:16
Why is x265 about 50% slower with --pmode than without it, on a 6-core i7-3930K?
Motenai Yoda
29th March 2016, 01:24
@jlpsvk read the docs http://x265.readthedocs.org/en/default/cli.html#cmdoption--pmode
pingfr
29th March 2016, 09:31
pingfr: I think you should first settle on the goal. Are you looking for the ultimate box regardless of the price? Or are you looking for the most performance for a given amount of money?
Alrighty, here it goes, here's a bit of reading for you:
I actually never took the proper time to introduce myself, but I work as a support staff/library clerk for a major city in France (read: we're in charge of roughly 1 million citizens).
In my department, we handle responsibilities ranging from managing a book loan system to managing DVD and Blu-ray loans and rentals to our citizens.
As time passed over the years, we, the library staffers, came to the conclusion that with our 7-day turnover system we need at least 3 copies of every single movie we have ever owned/in our catalog at any given time.
Now if you're a video enthusiast like me, you are aware that owning every Blu-ray movie ever released is nearly impossible. We're talking about roughly 5000 physical discs, multiplied by 3 because of our 7-day rotation ("loan and must return within 7 days") system; that's a total of no less than 15000 discs.
Now with that set aside, both our local and nationwide broadband penetration statistics have shown that, over the years, it has become possible to have fiber at home with unlimited downloads for, say, less than 25$/month.
So we came to the conclusion that, as we were running out of physical space (read: proper office space), we should probably "digitize" all our DVD and Blu-ray content to simple files and maybe broadcast them over the internet instead of loaning the physical discs back and forth between users.
Now, as you've probably guessed from my post history, I'm not really a highly tech-savvy person and I don't know all the details regarding the streaming part (not our department); however, as the source discs are in my department, I have been tasked to "digitize" all our content and render it into nearly "acceptable" quality digital content.
I at first went the x264 route, but I've more or less been told by our other IT departments that, based on their calculations, x264-encoded files with an average bitrate of 4k for 720p content and 8k for 1080p would have a huge impact on the network's performance and bandwidth use (huge costs for us).
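As a back-of-the-envelope illustration of those numbers, here is a short Python calculation of what such bitrates mean per film. It assumes "4k" and "8k" mean 4000 and 8000 kbps of video, uses a hypothetical two-hour running time, and ignores audio and container overhead.

```python
def stream_size_gb(bitrate_kbps, minutes):
    """Size in gigabytes (10**9 bytes) of a stream at a constant average bitrate."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1e9


if __name__ == "__main__":
    for label, kbps in (("720p", 4000), ("1080p", 8000)):
        size = stream_size_gb(kbps, 120)
        print(f"{label}: {size:.1f} GB per 2-hour film")
```

Every viewing transfers roughly that much data, so halving the bitrate at the same subjective quality roughly halves both the storage and the bandwidth bill.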
It was stated at that point that the trade-off for bandwidth expenses (not even counting the hardware costs and the encoding expenses/time spent: electricity bill) wasn't worth the effort.
But at the same time, I've been told that if I could find a way to retain the same kind of subjective visual quality with a lower bitrate (read: lower storage and bandwidth costs altogether), it would be "green lighted" as a project.
Keeping in mind, that this is taxpayer money we're talking about here, as such: every single dollar/euro spent has to be properly justified and allocated.
In the very end, this is naturally how x265 came into play; I am constantly looking for ways to reduce our encoding file sizes while retaining "upper" quality and saving as much as we can on the bandwidth costs/fees.
Now that was for the professional side of things.
Now for the personal bit, I'm also a very geeky person and am always interested in newer technologies... I remember when the very first MP3 format came out, shortly followed by the original DivX 3.11; back in the day, everyone around me was going bat-shit crazy! But again I digress. ;)
So now on the personal side of things, I'm also interested in x265 encoding, but that's just a "side-project" or a "hobby" as I can put it.
Over the past months, I've been saving personal funds here and there, shaving the rough edges off my personal budget, and have saved roughly $3k USD. My intent is to be on the lookout for the cheapest refurbished, second-hand hardware and Engineering Sample deals on eBay and similar sites, and pretty much build the "best and fastest" encoding box for me, my own personal toybox at home, on a restricted $3k budget.
Of course the machine wouldn't be a purely dedicated x265 box, it has to be my own personal computer at home, but you get the idea: build something strong enough to encode things "mildly rapidly" at the cheapest cost that fits my budget, which is again roughly $3k USD.
So there you have it: grabbing CPUs from eBay, a motherboard and other parts from shady retailers, DDR3 ECC memory sticks from retired or "defunct" servers from work, cooking it all together and hoping for the best.
Sorry if that was lengthy, but at least you guys know me a bit better now, understand what my approach is and are aware of what kind of machine I'm building here.
As always, any feedback (even negative as long as it brings something to the table) is appreciated. :)
The fastest way to do the encoding is to have a cluster of servers, and dispatch tasks to be executed in parallel.
As it stands, the best machines I can get remote access to are 4x Dell R930 at work, from the IT department, but I've been warned they can only be accessed at odd times (mostly at night), whenever they aren't in heavy use by the departments they belong to and have spare resources. I don't even know if they are part of a cluster or standalone machines, but requesting "privileged" access to them shouldn't be much of an issue.
As of today, all I have access to is an i7 5960X at work for remote encoding and an i7 6700K at home, my own personal machine... and let's just say this: for everyday usage it's fine, but when it comes to x265 (with the quality parameters littlepox kindly cooked up for us), it just doesn't cut it, at all. So at least on the personal side of things, I'm just throwing more hardware resources (read: money) at it, hoping it will "fix it", but then again, at the end of the day, keep in mind it's just a (costly) hobby. :)
pingfr
29th March 2016, 09:47
60 Ivy Bridge Xeon cores running at 2.8 GHz have more x265 compute power than 44 Broadwell Xeon cores running at 2.3 GHz.
Hey Tom thanks for your input!
So if I get you right, 60 Ivy Bridge Xeon cores at 2.8GHz *without* the AVX2, BMI and LZCNT instruction sets still provide more x265 compute power than 44 Broadwell Xeon cores at 2.3GHz, even though those do have AVX2, BMI and LZCNT to compensate for the lower core/thread count?
Forget about turbo clock speeds... if you're running x265 you will hit the thermal limits of your chips, and there will be no turbo boost.
This one machine I'm building is a personal system for my own use at home. Do you think aggressive water cooling could help lower the chips' temperature and thereby keep turbo boost active, or is that a "hopeless cause" to you?
You will need to run multiple instances in parallel to fully exploit either of these many-core servers. This could either be multiple videos, or multiple chunks of the same video.
Regarding multiple-chunk encoding, is there a more or less "official" and practical solution/kit from MulticoreWare, or do we still have to rely on programs such as RipBot264 etc.?
Thanks for your feedback.
nevcairiel
29th March 2016, 09:50
This one machine I'm building is a personal system for my own use at home. Do you think aggressive water cooling could help lower the chips' temperature and thereby keep turbo boost active, or is that a "hopeless cause" to you?
You don't even need water cooling for this, a good air cooler can keep CPUs in full turbo at all times as well - of course such huge air coolers don't find their way into server chassis, since they are too big.
pingfr
29th March 2016, 10:09
You don't even need water cooling for this, a good air cooler can keep CPUs in full turbo at all times as well - of course such huge air coolers don't find their way into server chassis, since they are too big.
It looks like I'm gonna have to "hack" a case anyway; this one is for my own use at home, sitting under my desk. If I do go for a quad E7 v2, the only reliable motherboard I found is the Supermicro X10QBL, and it would seem the board uses a proprietary format (16.79" x 16.4" (42.6cm x 41.7cm)).
http://www.supermicro.com/products/motherboard/Xeon/C600/X10QBL.cfm
Not sure what kind of case will fit this baby, but it doesn't look like most standard ATX cases would do the trick.
So yeah, chassis "hacking", case air cooler "hacking" and water cooling "hacking".
Sounds like my kind of fun. :)
nevcairiel
29th March 2016, 10:12
For a quad setup water cooling is probably best indeed, as 4 decent air coolers would have trouble fitting on there!
pingfr
29th March 2016, 10:17
For a quad setup water cooling is probably best indeed, as 4 decent air coolers would have trouble fitting on there!
Thanks nevcairiel. :D
zioneed
29th March 2016, 12:47
Thanks, was just checking avisynth but have no clue how to use it, will pass :)
More than bitrate, I'm interested in spreading it properly across different scenes, trying to find the best settings to properly encode both action scenes and slow scenes. Size really doesn't matter much, since I'm more into finding a good compromise between quality and encoding speed.
If I might ask one last question (since I have read quite a lot and still have no final answer on this):
what's the role of this part?
I have set up " --psy-rd=1 --psy-rdoq 1.10 --rdpenalty=2 --rdoq-level 2", according to different posts I read here. The results are quite good in terms of quality, but I'm wondering if there's anything in these options that might conflict.
Thanks again!
jlpsvk
30th March 2016, 01:21
this is x264:
http://s12.postimg.org/4r368zrl9/SW_x264.png
and this x265:
http://s8.postimg.org/sldo3qj51/SW_x265.png
No matter what I do, the transition on the banner (from red to black, highlighted) is smoother on x264. On x265 you see the squares after the close-up. What do you advise?
My settings:
--crf 22 --preset slow --output-depth 10 --tu-intra-depth 2 --rdoq-level 1 --qg-size 16 --ipratio 1.3 --pbratio 1.2 --max-merge 2 --rc-lookahead 60 --ref 3 --min-keyint 23 --keyint 240 --colorprim bt709 --colormatrix bt709 --transfer bt709 --deblock -3:-3 --psy-rdoq 5
MeteorRain
30th March 2016, 01:24
Keeping in mind, that this is taxpayer money we're talking about here, as such: every single dollar/euro spent has to be properly justified and allocated.
So you have 2 options for the maximum performance for the money:
1) i7-4790 + B85 mobo + 8GB RAM + PSU
2) 2xE5-2670 (Used) + C602 mobo + Some ECC RAM (Used) + Good PSU
The first choice is about $450 without HDD, monitor and computer case.
The second is about $600, again without that stuff.
So if you have, let's say, 5 Blu-rays to encode, distributing them to 5 different servers is going to maximize the speed. Given $3000, running 5 i7s will be faster than a single E7, let alone the E7 being much more expensive.
The whole point is that, you'll be paying a lot more for the high density (the amount of space on the rack).
pingfr
30th March 2016, 07:47
I think you misread it real fast.
Regarding my interest in x265 in general, there are two things:
Professionally, i.e. at my work, as my job, it's impossible for me to bring in "flaky hardware" from dubious sources without proper invoices and warranties, or without a corporate contract directly from HP, Dell or any other major vendor, because it's actually a budget that has to be allocated and voted on at city hall. Which implies that what hardware I eventually get my hands on depends on things that are out of my control.
Now, what I have complete control over is what I do with my personal funds, at home, as a leisure/hobby. What I have at the moment is a 3k budget to use as I see fit, and I have every reason to believe my best option at home is a Quad Xeon E7 v2, since I'm acquiring parts for reaaaaaaallly cheap compared to the official prices.
But then again if someone has an idea of "what performs better than what for cheaper than whatever else grabbed from over here rather than over there", please let me know. I'm all ears. ;)
kypec
30th March 2016, 12:17
Professionally, i.e. at my work, as my job, it's impossible for me to bring in "flaky hardware" from dubious sources without proper invoices and warranties, or without a corporate contract directly from HP, Dell or any other major vendor, because it's actually a budget that has to be allocated and voted on at city hall. Which implies that what hardware I eventually get my hands on depends on things that are out of my control.
Maybe I'll go a bit off topic here but did you also consider HEVC licensing fees (http://forum.doom9.org/showthread.php?t=172387) apart from technological (better compression ratio) aspects of choosing HEVC over AVC? I don't want to spoil your fun and really hope that your project of digital movie library ends up successfully but I think you should engage lawyers as well into the process before spending too much time & electricity on encoding the content... ;)
MeteorRain
30th March 2016, 17:14
Sorry for misreading your post.
For the professional part, unless you are running out of physical space, I wouldn't recommend rack servers, as they tend to be more expensive and less cost-effective. Buying multiple Dell or HP business-class i7 computers could be a good use of money, compared to those E5s (2x+ more expensive with almost the same performance).
For the hobby part, if you can get a dirt cheap E7 then go ahead. The E5 that I mentioned was also quite a cheap option for you. Just take care of the noise control and power consumption.
Performance-wise, google "passmark <whatever cpu here>" and you can compare the scores easily.
pingfr
30th March 2016, 18:43
Sorry for misreading your post.
No problems, no harm, I know you were trying to be helpful. :)
For the hobby part, if you can get a dirt cheap E7 then go ahead.
400$ per CPU, not sure if that's "dirt cheap", but we're talking about E7 Xeons here, the fastest/highest-frequency ones from that generation.
Performance-wise, google "passmark <whatever cpu here>" and you can compare the scores easily.
Passmark tends to be "unreliable", especially if you're looking at multi-socket benchmarks; I believe SPEC benchmarks are more reliable.
Regardless, thanks a lot for your input. :)
MeteorRain
30th March 2016, 20:26
400$ per CPU, not sure if that's "dirt cheap", but we're talking about E7 Xeons here, the fastest/highest-frequency ones from that generation.
That's cheap enough. Usually you can only get an i7 with that money.
Regardless, thanks a lot for your input. :)
You're welcome. Hope it helps ;)
pingfr
30th March 2016, 20:38
That's cheap enough. Usually you can only get an i7 with that money.
It's really a bargain, otherwise I wouldn't even bother to begin with.
It seems the CPU power ranking, from worst to best, is:
E5-2699 v3 x2 -> E7-4890 v2 x4 = E7-4850 v3 x4 -> E5-4669 v3 x4 -> E7-8895 v3 x8 -> Nothing yet? maybe some high-end E5 or E7 v4 around Q3-Q4 2016 -> E5 or E7 v5 with AVX512 H2-2017.
You're welcome. Hope it helps ;)
Anything helps. :)
MeteorRain
30th March 2016, 20:50
It seems the CPU power ranking, from worst to best, is:
I actually don't have much of an idea how much performance you can get with one of these. I'd personally go with a cluster solution, which fits me better.
foxyshadis
31st March 2016, 00:53
Another datapoint you might be interested in: VMware, both ESXi and Workstation, gives you native encoding speed. Benchmarks of KVM and Xen show the same. With VirtualBox you take a hit, unfortunately, but other virtual solutions allow 99%+ native speed in x264 and x265. (One of the benefits of heavy number crunching is that you don't need to call into the kernel or do I/O much.) You could use that to set up a test cluster for working out how a real cluster would perform at work, and get experience in setting it up.
I know Multicoreware has demonstrated a cluster, but I don't know how polished vs spit and duct tape it is, since they don't seem to offer it for sale. Aside from that, the only one I know of is lancoder (https://github.com/jdupl/lancoder), though there's x264farm (http://forum.doom9.org/showthread.php?t=117889) & ELDER (https://forum.doom9.org/showthread.php?t=102119) that might be modified to call x265 instead of x264, and Media Encoding Cluster (http://www.codergrid.de) that uses ffmpeg. None of those have been touched since 2009-2010 though, so there's still a bit of a gap for someone who wants to come in and fill it.
Maybe you can ask Netflix if they're willing to share part of their solution? You never know.
2themax
31st March 2016, 17:30
Experimental UHD Bluray support (--uhd-bd) in x265 1.9+106-c8ec86965e54 (https://www.mediafire.com/download/b11kgzti9w4gglu/x265_1.9+106-c8ec86965e54.7z); is there anyone who can test and confirm?
This is very close to working. The only error I am getting is that GOPs are longer than they are specified to be.
sneaker_ger
31st March 2016, 19:43
x265 command-line? x265 log? Exact error message of your authoring software? (What software?)
Vesdaris
31st March 2016, 19:47
Forget about turbo clock speeds... if you're running x265 you will hit the thermal limits of your chips, and there will be no turbo boost.
Why would he hit thermal limits with proper cooling?
I can run my Ivy Bridge 3770K @4.5 on air using a voltage that's pretty high for these CPUs (1.33V under 100% load -> unfortunately I haven't won the silicon lottery) and I still don't hit thermal limits. Ever.
foxyshadis
31st March 2016, 21:39
Why would he hit thermal limits with proper cooling?
I can run my Ivy Bridge 3770K @4.5 on air using a voltage that's pretty high for these CPUs (1.33V under 100% load -> unfortunately I haven't won the silicon lottery) and I still don't hit thermal limits. Ever.
I think the initial assumption was that it would be a pizza box (rack mount), which is much, much harder to cool than a standalone server, because there just isn't enough room. You don't see too many quad sockets that aren't racked these days. Only heavy-duty watercooling can keep those running at full tilt, and even that's not easy in the confines of a rack. Since it'll be standalone, though, it'll be loud but easily capable of being cooled.
x265_Project
31st March 2016, 22:04
Why would he hit thermal limits with proper cooling?
I can run my Ivy Bridge 3770K @4.5 on air using a voltage that's pretty high for these CPUs (1.33V under 100% load -> unfortunately I haven't won the silicon lottery) and I still don't hit thermal limits. Ever.
I'm guessing that you hit thermal throttling all the time, but you aren't aware of it. Thermal management is all done automatically. You would have to use an advanced performance profiling tool like Intel's Performance Counter Monitor to log the clock speed in small increments.
Your Core i7-3770K has a TDP of 77 watts, or roughly 19 watts per core. Many-core Xeons have higher TDPs, but much lower TDP per core (hence, the slower clock speeds). Fully saturating them with compute-intensive applications will heat them up good, and you are very unlikely to have enough thermal headroom to allow turbo mode to be engaged.
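To put rough numbers on the per-core argument, here is a minimal Python sketch. The i7-3770K figures are the ones quoted above, and the Xeon core counts are the ones given earlier in the thread; the Xeon TDP values (145 W and 155 W) are assumptions based on Intel's published specifications, and TDP divided by core count is only a crude proxy for thermal headroom.

```python
# Rough TDP-per-core comparison.
# The Xeon TDP values below are assumptions (not stated in this thread).
cpus = {
    "Core i7-3770K": {"tdp_w": 77, "cores": 4},      # figures quoted in the post
    "Xeon E5-2699 v4": {"tdp_w": 145, "cores": 22},  # TDP assumed
    "Xeon E7-4890 v2": {"tdp_w": 155, "cores": 15},  # TDP assumed
}


def watts_per_core(tdp_w, cores):
    """Return the TDP budget per physical core, in watts."""
    return tdp_w / cores


for name, spec in cpus.items():
    print(f"{name}: {watts_per_core(**spec):.1f} W per core")
```

Under these assumptions the many-core Xeons land well below the desktop chip's roughly 19 W per core, which is the point being made above.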
Vesdaris
1st April 2016, 09:07
I'm guessing that you hit thermal throttling all the time, but you aren't aware of it.
You are guessing wrong. :) I'm always running a monitoring tool (HWiNFO). I always stay below my CPU's thermal limits and I have the turbo clock at its max (clocks & performance).
pingfr
1st April 2016, 10:16
No fighting here boys! Keep it cool. (You see what I did there?). :)
MeteorRain
1st April 2016, 20:12
You are guessing wrong. :) I'm always running a monitoring tool (HWiNFO). I always stay below my CPU's thermal limits and I have the turbo clock at its max (clocks & performance).
I guess the TDP is the maximum heat generated, not the maximum heat evacuated. So in most cases, if you max out your CPU, it will limit itself to a certain point regardless of the cooler.
I know that some motherboards can work around this a bit to keep turbo boost running at all times. In that case the only requirement would be a decent heatsink and cooler.
benwaggoner
1st April 2016, 20:31
Passmark tends to be "unreliable", especially if you're looking at multi-socket benchmarks; by contrast, I believe SPEC benchmarks are more reliable.
Regardless, thanks a lot for your input. :)
Also, x265 makes heavy use of AVX2 instructions, so perf-per-clock gets a lot better with a chip that supports them. I wouldn't suggest buying anything pre-AVX for use with x265.
benwaggoner
1st April 2016, 20:34
I guess the TDP is the maximum heat generated, not the maximum heat evacuated. So in most cases, if you max out your CPU, it will limit itself to a certain point regardless of the cooler.
I know that some motherboards can work around this a bit to keep turbo boost running at all times. In that case the only requirement would be a decent heatsink and cooler.
And we also know that lots of sustained multicore AVX2 operations (heavily used to good effect by x265) can also trigger thermal throttling, apparently irrespective of the level of cooling in place. I know MCW has been looking at this.
Even with that, we see big throughput increases (>50%) on c4.8xlarge instances versus c3.8xlarge instances for x265 encoding with recent builds.
C3.8xlarge: E5-2680v2 (Ivy Bridge), logical CPUs=32
C4.8xlarge: E5-2666v3 (Haswell), logical CPUs=36
While there are a few more cores, it's mainly going from v2 to v3.
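As a back-of-the-envelope check on how much of that gain comes from the architecture rather than the extra logical CPUs, here is a small Python sketch. It takes 1.5x as the lower bound implied by ">50%", ignores any clock-speed difference between the two chips, and the function name is purely illustrative.

```python
def per_cpu_speedup(total_speedup, old_cpus, new_cpus):
    """Throughput gain per logical CPU, after factoring out the change in CPU count."""
    return total_speedup / (new_cpus / old_cpus)


# c3.8xlarge has 32 logical CPUs, c4.8xlarge has 36; 1.5x is the stated lower bound.
gain = per_cpu_speedup(1.5, old_cpus=32, new_cpus=36)
print(f"at least {gain:.2f}x per logical CPU")
```

The 12.5% increase in logical CPUs accounts for only a small part of the improvement, so most of it has to come from the move to the v3 generation.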
x265_Project
2nd April 2016, 21:16
You are guessing wrong. :) I'm always running a monitoring tool (HWiNFO). I always stay below my CPU's thermal limits and I have the turbo clock at its max (clocks & performance).
I certainly could be wrong, but I'm not sure that HWInfo would be the definitive way to know if there is any thermal throttling going on. I know you can trust Intel's Performance Counter Monitor to tell you what's really going on. We've built support for PCM into x265, so if you can get a build you can run it alongside x265.
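For anyone without a PCM build, a much cruder way to look for clock dips is to poll the OS-reported frequency while an encode runs. The sketch below is not Intel PCM and does not read hardware counters; it assumes a Linux x86 system where /proc/cpuinfo exposes "cpu MHz" lines, and the function names, the 5% tolerance and the polling interval are all illustrative choices.

```python
import re
import time


def parse_core_mhz(cpuinfo_text):
    """Return the per-core clock readings (MHz) found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def below_turbo(samples_mhz, turbo_mhz, tolerance=0.05):
    """Return the samples that fall more than `tolerance` below the turbo clock."""
    floor = turbo_mhz * (1.0 - tolerance)
    return [s for s in samples_mhz if s < floor]


def sample_clocks(path="/proc/cpuinfo", interval_s=0.1, count=50):
    """Poll the slowest core's clock at short intervals (Linux only, assumed layout)."""
    readings = []
    for _ in range(count):
        with open(path) as f:
            clocks = parse_core_mhz(f.read())
        if clocks:
            readings.append(min(clocks))
        time.sleep(interval_s)
    return readings
```

Calling `below_turbo(sample_clocks(), turbo_mhz=3600.0)` during an encode would list the dips; 3600 MHz is just the turbo figure quoted earlier in the thread for the E5-2699 v4, so substitute your own chip's value. OS-level polling can miss short throttling events, so an empty result is not proof that none occurred.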
2themax
4th April 2016, 22:08
x265 command-line? x265 log? Exact error message of your authoring software? (What software?)
Here's the command-line:
x265-64bit-10bit-2016-03-31.exe --preset veryslow --output "outpt.hevc" --input "input.yuv" --input-res 3840x2160 --input-csp i420 --fps 24 --profile main10 --level-idc 51 --high-tier --open-gop --keyint 24 --min-keyint 1 --bitrate 75000 --vbv-maxrate 95000 --vbv-bufsize 100000 --pass 1 --colorprim bt709 --transfer bt709 --colormatrix bt709 --hrd --aud --sar 1:1 --output-depth 10 --uhd-bd --repeat-headers --no-scenecut --rc-lookahead 20
And here's the error being thrown by the Panasonic UHD verifier:
The number of frames displayed in a GOP exceeds the maximum number defined in Table 9-97.
Actual number of frames in a GOP= 27
frame_rate_value = 24
Maximum number of frames displayed in a GOP = 24
LigH
5th April 2016, 07:21
So despite using the option --keyint 24 the verifier found a longer GOP?
MasterNobody
5th April 2016, 08:00
Maybe there is confusion about whether GOP length should be counted in presentation order or decode order. For x264 (H.264), at least, there was a difference, because the default is presentation order while the --bluray-compat case needed decode order (Blu-ray order). This difference in GOP length exists only for open GOP.
x265_Project
5th April 2016, 16:20
And here's the error being thrown by the Panasonic UHD verifier:
The number of frames displayed in a GOP exceeds the maximum number defined in Table 9-97.
Actual number of frames in a GOP= 27
frame_rate_value = 24
Maximum number of frames displayed in a GOP = 24
This is great feedback. Please email me so that we can connect directly to debug this issue. I just sent you a private message with my email address. Thanks, Tom
2themax
5th April 2016, 18:11
So despite using the option --keyint 24 the verifier found a longer GOP?
Yes. I tried adding --no-scenecut and --rc-lookahead 20 thinking it might change the encode but it did not.
x265_Project
5th April 2016, 20:05
Yes. I tried adding --no-scenecut and --rc-lookahead 20 thinking it might change the encode but it did not.
Thanks for connecting via email. Our engineers will take a look at this today, and we'll follow up here and if needed (for sharing confidential files or information) privately.
I suspect that --open-gop may be an issue, but again, we'll see what our team thinks. We may need a frame-by-frame log file (--csv) to see the exact GOP structure and references you produced for this file.
nandaku2
6th April 2016, 17:06
Thanks for connecting via email. Our engineers will take a look at this today, and we'll follow up here and if needed (for sharing confidential files or information) privately.
I suspect that --open-gop may be an issue, but again, we'll see what our team thinks. We may need a frame-by-frame log file (--csv) to see the exact GOP structure and references you produced for this file.
Yes, open-gop is the issue. The reason the Panasonic UHD Verifier shows 27 frames is because 3 frames after the I frame refer to frames in the previous GOP. Can you test with --no-open-gop?
The UHD-BD documentation was not specific about enforcing closed GOPs.
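One way to read that explanation, sketched in Python: if the verifier attributes the frames that follow the I-frame but still reference the previous GOP to the same GOP, the count becomes keyint plus those frames. That the verifier counts this way is an assumption drawn from the explanation above, and the function names are made up for illustration.

```python
def frames_counted_in_gop(keyint, leading_frames):
    """Frames attributed to one GOP if the leading frames are counted with it (assumed)."""
    return keyint + leading_frames


def violates_limit(keyint, leading_frames, max_frames):
    """True if the counted GOP length exceeds the verifier's maximum."""
    return frames_counted_in_gop(keyint, leading_frames) > max_frames


# Values from the report: --keyint 24, 3 frames referring back, limit of 24 at 24 fps.
print(frames_counted_in_gop(24, 3))   # 27, matching the verifier's "Actual number"
print(violates_limit(24, 3, 24))      # True with open GOP
print(violates_limit(24, 0, 24))      # False when no frames refer to the previous GOP
```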
sneaker_ger
6th April 2016, 17:09
OpenGOP is allowed according to the white paper on the BluRay website. Chapter 3.2 lists GOP restrictions.
http://www.blu-raydisc.com/assets/Downloadablefile/BD-ROM_Part3_V3.0_WhitePaper_150724.pdf
zerowalker
6th April 2016, 21:25
Anyone know how to set up Matrix, Transfer, Range etc on x265 in ffmpeg?