Old 14th August 2018, 09:37   #6281  |  Link
Ma
Registered User
 
Join Date: Feb 2015
Posts: 325
Quote:
Originally Posted by Magik Mark View Post
Same problem ma
Thanks for info!

Did you check ver. 2.8+48 (from test.7z in post #6260)?
Old 16th August 2018, 20:44   #6282  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,150
Ryzen Threadripper 2990wx uses 4 NUMA nodes and I would like to check if running 4 instances with manually adjusted --numa-pools could improve performance.
Can somebody verify if those are correct switches?

Code:
Instance 1 = --numa-pools "+,-,-,-" 
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Without any adjustments 5 instances give this

2990wx@3.4GHz(all core turbo) is only 20% faster than 1950@3.4GHz

Last edited by Atak_Snajpera; 16th August 2018 at 20:51.
Old 17th August 2018, 00:14   #6283  |  Link
Sagittaire
Testeur de codecs
 
Join Date: May 2003
Location: France
Posts: 2,419
Quote:
Originally Posted by Atak_Snajpera View Post
Ryzen Threadripper 2990wx uses 4 NUMA nodes and I would like to check if running 4 instances with manually adjusted --numa-pools could improve performance.
Can somebody verify if those are correct switches?

Code:
Instance 1 = --numa-pools "+,-,-,-" 
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Without any adjustments 5 instances give this

2990wx@3.4GHz(all core turbo) is only 20% faster than 1950@3.4GHz
Well, 5 instances is simply too few for a 1080p source...

32C/64T shared by 5 instances means more than 6C/12T for each 1080p instance. Unfortunately, x265 has threading problems at 8 threads (and more) for a 1080p source.

If you really want to saturate a 64-thread CPU, you must use at least 8 instances for a 1080p source or at least 2 instances for a 2160p source. And 8 1080p instances may well saturate RAM bandwidth because of the particular CCX interconnect (even with quad-channel DDR4).
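The arithmetic behind this can be sketched in a few lines of Python. The function names are made up for illustration, and the figure of about 8 useful threads per 1080p instance is the claim from this post, not a measured value.

```python
def threads_per_instance(total_threads: int, instances: int) -> float:
    """Average number of hardware threads available to each encoder instance."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return total_threads / instances


def min_instances(total_threads: int, useful_threads: int) -> int:
    """Smallest instance count at which no instance is handed more threads
    than it can use (useful_threads is an assumed per-instance ceiling)."""
    if useful_threads <= 0:
        raise ValueError("useful_threads must be positive")
    return -(-total_threads // useful_threads)  # ceiling division


if __name__ == "__main__":
    # 64 threads shared by 5 instances: more than 12 threads each.
    print(threads_per_instance(64, 5))
    # With an assumed ceiling of 8 useful threads per 1080p instance.
    print(min_instances(64, 8))
```

With 64 threads and a ceiling of 8 threads per instance this gives the 8 instances mentioned above. Whether memory bandwidth holds up at that point is a separate question that this sketch does not model.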
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 17th August 2018 at 00:23.
Old 17th August 2018, 11:17   #6284  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,150
No it is not too low. Dual socket (2 NUMA) Intel Xeon E5-4660 v3 (56 threads total) still scales much better than single socket (4 NUMA) 2990WX.
It would probably scale even better if I set numa pools manually.

According to x265 documentation ( https://x265.readthedocs.io/en/default/threading.html )
Quote:
If you are running multiple encoders on a system with multiple NUMA nodes, it is recommended to isolate each of them to a single node in order to avoid the NUMA overhead of remote memory access.
Can somebody verify that I'm setting numa pools correctly in my previous post?
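For reference, the pattern in those four switches can be generated mechanically. This is a minimal sketch, assuming the syntax described on the x265 threading documentation page linked above, where each comma-separated field stands for one NUMA node, "+" means use all cores of that node and "-" means use none. The function name is hypothetical, and the sketch says nothing about how many nodes the OS actually reports for a 2990WX.

```python
from typing import List


def numa_pool_strings(node_count: int) -> List[str]:
    """Build one --numa-pools value per encoder instance, each pinning
    the instance to exactly one NUMA node ("+") and excluding the rest ("-")."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        ",".join("+" if node == selected else "-" for node in range(node_count))
        for selected in range(node_count)
    ]


if __name__ == "__main__":
    for index, pools in enumerate(numa_pool_strings(4), start=1):
        print(f'Instance {index} = --numa-pools "{pools}"')
```

For four nodes this reproduces the four strings from the earlier post, so at least the string format is consistent with the documented syntax.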

Last edited by Atak_Snajpera; 17th August 2018 at 11:22.
Old 17th August 2018, 12:58   #6285  |  Link
zub35
Registered User
 
Join Date: Oct 2016
Posts: 56
x264 has a good optimization option
--tune film [--deblock -1:-1 --psy-rd <unset>:0.15]

why not have the same for x265 ?
--tune film [--no-sao --no-strong-intra-smoothing --psy-rd 4]

Last edited by zub35; 17th August 2018 at 13:00.
Old 17th August 2018, 15:09   #6286  |  Link
RieGo
Registered User
 
Join Date: Nov 2009
Posts: 57
Quote:
Originally Posted by zub35 View Post
x264 has a good optimization option
--tune film [--deblock -1:-1 --psy-rd <unset>:0.15]

why not have the same for x265 ?
--tune film [--no-sao --no-strong-intra-smoothing --psy-rd 4]
afaik a film preset is on the todo list... probably may take a while till they/we figure out all sane parameters.

now... why does everyone think it's a good idea to switch off Sample Adaptive Offset in-loop filter? i read about it and it sounds like a nice feature to improve efficiency - no matter what kind of video content is encoded.
i understand that there was supposedly a little problem in the early stages of x265 with sao integration. but is this still a thing or is everybody just blindly turning off sao?
Old 17th August 2018, 15:36   #6287  |  Link
froggy1
ffx264/ffhevc author
 
Join Date: May 2007
Location: Belgium
Posts: 1,550
Quote:
Originally Posted by RieGo View Post
afaik a film preset is on the todo list... probably may take a while till they/we figure out all sane parameters.

now... why does everyone think it's a good idea to switch off Sample Adaptive Offset in-loop filter? i read about it and it sounds like a nice feature to improve efficiency - no matter what kind of video content is encoded.
i understand that there was supposedly a little problem in the early stages of x265 with sao integration. but is this still a thing or is everybody just blindly turning off sao?
SAO still blurs too much, so many people disable it if they want to retain as much detail as possible. However, at very low bitrates where other artifacts are more visible/present, the blur of SAO produces "better looking" images than an encode without it
__________________
ffx264--ffhevc--ffxvid
Old 17th August 2018, 17:51   #6288  |  Link
RieGo
Registered User
 
Join Date: Nov 2009
Posts: 57
Quote:
Originally Posted by froggy1 View Post
SAO still blurs too much, so many people disable it if they want to retain as much detail as possible. However, at very low bitrates where other artifacts are more visible/present, the blur of SAO produces "better looking" images than an encode without it
thanks.
i did some visual comparisons lately but wasn't able to detect any kind of differences at high bitrate - i didn't look at still images, only at video scenes.
at very low bitrate (300kbit/s) there were a lot of quality differences with different parameters, but I didn't look at no-sao...

so probably i'm just a bad quality judge.
Old 17th August 2018, 18:58   #6289  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,964
Quote:
Originally Posted by RieGo View Post
thanks.
i did some visual comparisons lately but wasn't able to detect any kind of differences at high bitrate - i didn't look at still images, only at video scenes.
at very low bitrate (300kbit/s) there were a lot of quality differences with different parameters, but I didn't look at no-sao...

so probably i'm just a bad quality judge.
SAO should do less as QP goes down, so what you see is how it should work.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 18th August 2018, 17:56   #6290  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,699
Quote:
Originally Posted by Atak_Snajpera View Post
Can somebody verify if those are correct switches?
Code:
Instance 1 = --numa-pools "+,-,-,-" 
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Quote:
Originally Posted by Atak_Snajpera View Post
Can somebody verify that I'm setting numa pools correctly in my previous post?
Please, don't expect answers regarding AMD optimizations in this thread.

They are all Intel fans or worse fanboys.

Even the developers.
__________________
Win 10 x64 (18362.449) - Core i3-9100F - nVidia 1660 (436.15)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 18th August 2018, 18:28   #6291  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,940
In general, generalizations are wrong. If I could afford a new PC, I would buy a Ryzen. But I still could not buy insight into its NUMA structure.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 18th August 2018, 21:32   #6292  |  Link
FranceBB
Broadcast Encoder
 
FranceBB's Avatar
 
Join Date: Nov 2013
Location: Germany
Posts: 668
Quote:
Originally Posted by Sagittaire View Post
If you really want to saturate a 64-thread CPU, you must use at least 2 instances for a 2160p source.
Not just 64-thread CPUs: at work I have two Intel Xeon E5-2660 v4 (14c/28t each, 28c/56t in total) and I still can't saturate both CPUs with 2160p 10-bit HDR10 content encoded with --preset medium and Blu-ray compatible specs.


Quote:
Originally Posted by NikosD View Post
They are all Intel fans or worse fanboys.
Some consumers are moving to AMD, but the majority of businesses use Intel Xeon CPUs (my company included), so that's the hardware they ask optimizations for.
They are simply following the market needs, nothing more.
__________________
Broadcast Encoder
Avisynth memes: 1 - 2 - 3
Videotek - Audacity XP
Old 20th August 2018, 17:15   #6293  |  Link
Barough
Registered User
 
Barough's Avatar
 
Join Date: Feb 2007
Location: Sweden
Posts: 334
x265 v2.8+66-88ee12651e30 (32 & 64-bit 8/10/12bit Multilib Windows Binaries)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default

Last edited by Barough; 20th August 2018 at 17:25.
Old 21st August 2018, 01:49   #6294  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,964
Quote:
Originally Posted by FranceBB View Post
Not just 64-thread CPUs: at work I have two Intel Xeon E5-2660 v4 (14c/28t each, 28c/56t in total) and I still can't saturate both CPUs with 2160p 10-bit HDR10 content encoded with --preset medium and Blu-ray compatible specs.
That's not surprising. Something like --preset slower would probably be better, but there's only so much threading that can be usefully done in a single instance, and Blu-ray restrictions reduce even that (b-frames can encode in parallel, but BD only allows 2 consecutive). Increasing -F will help, but high values can cause rate control issues.
Old 21st August 2018, 03:35   #6295  |  Link
jlpsvk
Registered User
 
Join Date: Dec 2014
Posts: 191
Quote:
Originally Posted by FranceBB View Post
Not just 64-thread CPUs: at work I have two Intel Xeon E5-2660 v4 (14c/28t each, 28c/56t in total) and I still can't saturate both CPUs with 2160p 10-bit HDR10 content encoded with --preset medium and Blu-ray compatible specs.
could you post your uhd bd compatible command line?
__________________
Core i9-7960X, 64GB DDR4, RTX 2070, 1TB NVMe SSD, 56TB NAS
Old 21st August 2018, 16:22   #6296  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 115
Quote:
Originally Posted by RieGo View Post
afaik a film preset is on the todo list... probably may take a while till they/we figure out all sane parameters.

now... why does everyone think it's a good idea to switch off Sample Adaptive Offset in-loop filter? i read about it and it sounds like a nice feature to improve efficiency - no matter what kind of video content is encoded.
i understand that there was supposedly a little problem in the early stages of x265 with sao integration. but is this still a thing or is everybody just blindly turning off sao?
Using --no-sao for a tune film is imo valid. In my experience no-sao improves fine detail a lot with almost no negative effects for general "film" content at lower crf values. Preset slow together with no-sao is imo enough for detail retention nowadays. Not sure which setting does it, but I find preset medium to be way softer than preset slow (imo there should only be a bitrate difference between them when doing a CRF encode, but it doesn't work like that I guess).

I have found sao to be useful for both animation and low bitrate content though (as expected).

Quote:
Originally Posted by FranceBB View Post
Not just 64-thread CPUs: at work I have two Intel Xeon E5-2660 v4 (14c/28t each, 28c/56t in total) and I still can't saturate both CPUs with 2160p 10-bit HDR10 content encoded with --preset medium and Blu-ray compatible specs.
To add to this, I see around 70-80% utilization on dual Xeon E5-2680 v3 (48t) systems for 2160p content using preset slow. Imo that is a very reasonable amount of multithread performance. For 1080p I wouldn't bother with anything more than 8-12C. Start using chunk-encoding if better multithread utilization is needed.
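Chunk-encoding can be sketched roughly as follows, assuming the x265 CLI options --input, --seek, --frames and -o. The file names, the fixed equal-length split and the helper names are all hypothetical. A real setup would normally cut at scene changes or keyframes and needs care when the chunks are joined, neither of which is handled here.

```python
from typing import List, Tuple


def split_into_chunks(total_frames: int, chunk_count: int) -> List[Tuple[int, int]]:
    """Return (seek, frames) pairs covering total_frames with no gaps or overlap."""
    if total_frames < 0 or chunk_count < 1:
        raise ValueError("need total_frames >= 0 and chunk_count >= 1")
    base, extra = divmod(total_frames, chunk_count)
    chunks = []
    start = 0
    for index in range(chunk_count):
        length = base + (1 if index < extra else 0)  # spread the remainder
        if length == 0:
            continue  # more chunks requested than frames available
        chunks.append((start, length))
        start += length
    return chunks


def chunk_commands(source: str, total_frames: int, chunk_count: int) -> List[List[str]]:
    """One x265 argument list per chunk; each can run as a separate process."""
    return [
        ["x265", "--input", source, "--seek", str(seek), "--frames", str(frames),
         "--preset", "slow", "-o", f"chunk_{index:03d}.hevc"]
        for index, (seek, frames) in enumerate(split_into_chunks(total_frames, chunk_count))
    ]


if __name__ == "__main__":
    for args in chunk_commands("source.y4m", 100, 3):
        print(" ".join(args))
```

Each argument list could be handed to a separate process (for example via subprocess), so that several moderately threaded encodes run side by side instead of one encode trying to fill every core.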

But I still think Atak's question is valid: does the 2990WX need any NUMA tweaking to perform correctly?

Last edited by excellentswordfight; 21st August 2018 at 16:42.
Old 21st August 2018, 17:27   #6297  |  Link
RieGo
Registered User
 
Join Date: Nov 2009
Posts: 57
Quote:
Originally Posted by excellentswordfight View Post
Using --no-sao for a tune film is imo valid. In my experience no-sao improves fine detail a lot with almost no negative effects for general "film" content at lower crf values. Preset slow together with no-sao is imo enough for detail retention nowadays. Not sure which setting does it, but I find preset medium to be way softer than preset slow (imo there should only be a bitrate difference between them when doing a CRF encode, but it doesn't work like that I guess).

I have found sao to be useful for both animation and low bitrate content though (as expected).
thanks for your opinion
i will do some more visual tests with high and low bitrates and only film content. maybe i can finally understand all your motivation to turn off sao.

update: wow.
so I did a quick test @6000/1000/100 kbit/s with and without sao.
so you are 100% right. no-sao looks just much sharper and retains more details. even I can see it...
not just true on high bitrate but also on medium/low bitrate. but I can understand why it would make kinda sense to have a smooth low bitrate encoding.
so basically i'm sorry for not believing, need to change my presets now. lol

Last edited by RieGo; 21st August 2018 at 18:56.
Old 21st August 2018, 22:51   #6298  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,964
Quote:
Originally Posted by RieGo View Post
update: wow.
so I did a quick test @6000/1000/100 kbit/s with and without sao.
so you are 100% right. no-sao looks just much sharper and retains more details. even I can see it...
not just true on high bitrate but also on medium/low bitrate. but I can understand why it would make kinda sense to have a smooth low bitrate encoding.
so basically i'm sorry for not believing, need to change my presets now. lol
Can you share the bitrates and/or command lines you were using?
Old 22nd August 2018, 04:01   #6299  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 668
Quote:
Originally Posted by benwaggoner View Post
That's not surprising. Something like --preset slower would probably be better, but there's only so much threading that can be usefully done in a single instance, and Blu-ray restrictions reduce even that (b-frames can encode in parallel, but BD only allows 2 consecutive). Increasing -F will help, but high values can cause rate control issues.
Yes... With slower I might get something more but still, it's still acceptable.

Quote:
Originally Posted by excellentswordfight View Post
I see around 70-80% utilization on dual Xeon E5-2680 v3 (48t) systems for 2160p content using preset slow. Imo that is a very reasonable amount of multithread performance.
Yes, it kinda is.

Quote:
Originally Posted by jlpsvk View Post
could you post your uhd bd compatible command line?
Sure.

UHD HDR10 BD50:
Code:
x265.exe --y4m - --dither --preset medium --level 5.1 --tune fastdecode --no-high-tier --ref 4 --profile main10 --bitrate 75000 --deblock -1:-1 --hdr-opt --hrd --min-luma 64 --max-luma 940 --chromaloc 2 --range limited --videoformat component --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,0.0050)" --max-cll 1000,400 --overscan show --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3 --vbv-maxrate 75000 --vbv-bufsize 75000 --asm=avx2  --wpp -o "H:\raw_video.hevc"
I know that some of you might be screaming "Ah!!" at that "brutal" clipping, but the uncompressed 16-bit stream that x265 is gonna encode has already been brought into TV range with a proper LUT that tries to avoid clipping too much, so that's just for "safety reasons", especially 'cause otherwise QC refuses it, even if there's just a single scene in the video that is out of range. No, they don't stare at the video-scope all the time; such a process is automated by a machine that checks the file 1:1 and reports details about luma, chroma, whether there are freeze-frames, blocking of whatever type and so on. Sometimes it fails and it is spot-checked by a human, but still, they refuse the content if it's out of range.


Still, using --preset slow might help a bit, but I would have to specify the parameters myself, especially 'cause I'm not using --uhd-bd, and I would end up limiting myself anyway.
Increasing --ref from 4 to 6 might also help.
As to the 75Mbit/s, the specs require the bitrate to stay below 82Mbit/s, but 75Mbit/s plus a bit of oscillation up and down and audio tracks is gonna be fine.
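As a side note on the --master-display string in the command line above: a small decoder makes the units easier to check. This sketch assumes the format given in the x265 documentation, where all fields are unsigned integers, chromaticity coordinates are in units of 0.00002 and luminance is in units of 0.0001 cd/m². The function name and the returned dictionary layout are made up for illustration.

```python
import re

# Integer-only pattern, in the G, B, R, WP, L order used by --master-display.
_MASTER_DISPLAY = re.compile(
    r"G\((\d+),(\d+)\)B\((\d+),(\d+)\)R\((\d+),(\d+)\)"
    r"WP\((\d+),(\d+)\)L\((\d+),(\d+)\)"
)


def decode_master_display(value: str) -> dict:
    """Convert a --master-display string into CIE xy coordinates and cd/m2."""
    match = _MASTER_DISPLAY.fullmatch(value)
    if match is None:
        raise ValueError("expected integer fields in G()B()R()WP()L() order")
    gx, gy, bx, by, rx, ry, wx, wy, l_max, l_min = map(int, match.groups())
    return {
        "green": (gx / 50000, gy / 50000),   # units of 0.00002
        "blue": (bx / 50000, by / 50000),
        "red": (rx / 50000, ry / 50000),
        "white": (wx / 50000, wy / 50000),
        "max_nits": l_max / 10000,           # units of 0.0001 cd/m2
        "min_nits": l_min / 10000,
    }


if __name__ == "__main__":
    info = decode_master_display(
        "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,50)"
    )
    for key, val in info.items():
        print(key, val)
```

Under that assumption the primaries above decode to P3 with a D65 white point and a 1000 cd/m² peak. A minimum of 0.005 cd/m² would then be written as 50 rather than 0.0050, so the fractional value in the posted string may not be parsed as intended. That is worth checking against the documentation of the x265 build in use.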

Last edited by FranceBB; 22nd August 2018 at 04:08.
Old 22nd August 2018, 10:30   #6300  |  Link
RieGo
Registered User
 
Join Date: Nov 2009
Posts: 57
Quote:
Originally Posted by benwaggoner View Post
Can you share the bitrates and/or command lines you were using?
nothing fancy, just a simple "--pass x --bitrate 6000 --preset slow --pmode [--no-sao]"
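For anyone who wants to repeat the comparison, the test matrix can be written out as below. This is only a sketch of the setup described in this thread (two-pass encodes at 6000/1000/100 kbit/s, with and without --no-sao). The input and output file names, the use of --stats to keep the pass logs apart and the function name are assumptions, not part of the original command.

```python
from itertools import product
from typing import List


def sao_test_matrix(bitrates=(6000, 1000, 100), source: str = "source.y4m") -> List[List[str]]:
    """Build two-pass x265 argument lists for each bitrate, with and without SAO."""
    runs = []
    for bitrate, sao in product(bitrates, (True, False)):
        label = f"{bitrate}k_{'sao' if sao else 'nosao'}"
        for pass_number in (1, 2):
            args = [
                "x265", "--input", source,
                "--pass", str(pass_number),
                "--stats", f"{label}.log",   # separate stats file per variant
                "--bitrate", str(bitrate),
                "--preset", "slow", "--pmode",
            ]
            if not sao:
                args.append("--no-sao")
            args += ["-o", f"{label}.hevc"]
            runs.append(args)
    return runs


if __name__ == "__main__":
    for args in sao_test_matrix():
        print(" ".join(args))
```

Keeping everything except --no-sao identical between the paired runs means any visible difference can be attributed to SAO alone.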