
View Full Version : x265 (@ffmpeg) faster than ultrafast settings (and qp if possible) - any ideas?


pandy
16th August 2016, 08:26
I need to encode H.265, and bitrate is not a problem (I have around 40 Mbps available, and quality is not relevant). The video is extremely simple to encode: lots of flat areas (more than 95% is solid gray plus a very large frame counter).
I care only about speed, with one additional limitation: this is 3840x2160 at 50 fps, and compatibility with HW decoders is a plus (that's why I would like to use 'uhd-bd=1', but this implies Main 10@L5.1@High Tier).

level=5.1:no-high-tier=0:crf=20:vbv-bufsize=40000:vbv-maxrate=40000:keyint=2:min_keyint=1:bframes=0:b-adapt=0:no-b-intra=1:rd=1:no-psy-rd=1:rc-lookahead=0:rdoq-level=1:merange=4:subme=0:ctu=32:max-tu-size=16:no-sao=1:fast-intra=1:no-deblock=1:no-weightp=1:no-weightb=1:no-scenecut=1:no-cutree=1:frame-threads=0:uhd-bd=1:no-open-gop=1:repeat-headers=1:no_psnr=1:no_ssim=1:colorprim=bt709:transfer=bt709:colormatrix=bt709


I would prefer to keep 'keyint=1'; however, x265 then enables the Intra profile, which may introduce incompatibility.

Thank you in advance for all suggestions. What else can be disabled to speed up encoding?
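
For anyone following along, here is a minimal Python sketch of why 3840x2160 at 50 fps ends up at Level 5.1. The limit values are quoted from memory of the HEVC level tables and should be checked against the spec before relying on them. The names (`LEVELS`, `fits_level`) are made up for this illustration, and the check ignores finer points such as picture size rounding.

```python
# Level limits as recalled from the HEVC level tables (assumption: verify
# against the H.265 spec). Bitrates are in kbit/s for Main/Main 10 profiles.
LEVELS = {
    "5.0": {"max_luma_ps": 8_912_896, "max_luma_sr": 267_386_880,
            "max_kbps_main": 25_000, "max_kbps_high": 100_000},
    "5.1": {"max_luma_ps": 8_912_896, "max_luma_sr": 534_773_760,
            "max_kbps_main": 40_000, "max_kbps_high": 160_000},
}


def fits_level(width, height, fps, kbps, level, high_tier):
    """Return True if picture size, sample rate and bitrate fit the level."""
    limits = LEVELS[level]
    luma_ps = width * height   # luma samples per picture
    luma_sr = luma_ps * fps    # luma samples per second
    max_kbps = limits["max_kbps_high"] if high_tier else limits["max_kbps_main"]
    return (luma_ps <= limits["max_luma_ps"]
            and luma_sr <= limits["max_luma_sr"]
            and kbps <= max_kbps)


if __name__ == "__main__":
    # 3840 * 2160 * 50 = 414,720,000 luma samples per second.
    print(fits_level(3840, 2160, 50, 40_000, "5.0", high_tier=False))  # False
    print(fits_level(3840, 2160, 50, 40_000, "5.1", high_tier=False))  # True
```

Under these assumed limits, the sample rate alone rules out Level 5.0, while 40 Mbps sits exactly at the Main tier ceiling of Level 5.1.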

burfadel
16th August 2016, 09:22
If you speed up encoding, efficiency considerably drops. Maybe you should consider x264 instead? If you reduce the settings too far you completely defeat the purpose of using x265, and your settings are already there. You are probably slowing things down with such a low keyint. Also it seems kinda strange encoding to that resolution if you're only going to 'ruin' it with such low encode settings.

pandy
16th August 2016, 10:36
If you speed up encoding, efficiency considerably drops. Maybe you should consider x264 instead? If you reduce the settings too far you completely defeat the purpose of using x265, and your settings are already there. You are probably slowing things down with such a low keyint. Also it seems kinda strange encoding to that resolution if you're only going to 'ruin' it with such low encode settings.

I already have x264 settings, and they already run at real-time speed. The problem is x265. This is for functional testing, not for video quality, so there is plenty of bitrate. I would like to use qp, as it is the fastest way to encode, but x265 complains that qp is not acceptable when a level is specified (it can't guarantee the bitrate). In my case qp/crf is/will be set to a point where the bitrate will not go over the level limit...

The purpose of this exercise is simple (and it is not about efficiency): force a HW decoder to decode and display H.265 with a particular GOP structure (GOP=1, intra only).
I can encode in real time with QSV or NVENC, but my goal is to use the CPU. With x264 (libx264) this is feasible. Any chance for x265 (libx265)?

benwaggoner
17th August 2016, 14:30
I already have x264 settings, and they already run at real-time speed. The problem is x265. This is for functional testing, not for video quality, so there is plenty of bitrate. I would like to use qp, as it is the fastest way to encode, but x265 complains that qp is not acceptable when a level is specified (it can't guarantee the bitrate). In my case qp/crf is/will be set to a point where the bitrate will not go over the level limit...
x264 will always be faster in its fastest mode, since it can use CAVLC instead of CABAC.

To get around the QP issue, maybe try "--allow-non-conformance"? Or specify vbv-maxrate/vbv-bufsize? Although that might increase encoder complexity a bit.

The purpose of this exercise is simple (and it is not about efficiency): force a HW decoder to decode and display H.265 with a particular GOP structure (GOP=1, intra only).
I can encode in real time with QSV or NVENC, but my goal is to use the CPU. With x264 (libx264) this is feasible. Any chance for x265 (libx265)?
Intra-only should definitely be faster, but you wind up with bigger file sizes and hence more CABAC. The higher the QP, the faster the encode. Since HEVC supports intraframe prediction, reducing motion search size could help. Lookahead-threads may be too high for this many pixels/second (try 5?). Maybe "--rd 1" since ultrafast has a default of 2? 8-bit is going to be faster than 10-bit.

This certainly should be doable on a very fast dual-socket Xeon with AVX2. Precise tuning will depend on your hardware.

pandy
23rd August 2016, 10:46
Thx Ben. At some point I'm limited to being strict on conformance (the intention is to use this stream with real video HW). I know the requirements look quite strange to most forum readers, but they are justified by the particular functionality I'm trying to achieve, so there is sense in this nonsense ;) .
Anyway, I was able to more than double the encoding speed (from 6 fps to over 13 fps), but that is still not close enough to real time (at least not on the Dell T5500 I have for such experiments).
Strangely, my findings are somewhat opposite to most scientific sources. The typical claim is that increasing the CTU size reduces decoding load (i.e. increases decoder speed) but increases encoding load. I'm interested in increasing decoder load and reducing encoder load, yet libx265 acts in the contrary way: increasing the CTU to the maximum of 64 improved encoding speed by at least 30 - 50%.
It looks like x265 does not behave exactly as those papers predict. I've also noticed some strange crashes for particular combinations; for now I blame a memory issue in my T5500 (only 6 GB installed).
An additional conclusion is that libx265 will not support QP mode together with a bitrate limit, whereas x264 offers such combinations without any issue (i.e. it increases QP when the bitrate violates the level limit).
An additional observation is that for intra-only (keyint=1) libx265 switches to a special Intra profile (so keyint=1 is recognized and triggers a different profile than expected). This forced me to use keyint=2, i.e. I and P frames, which is not desired from a functionality perspective but is still acceptable for latency.
Activating BD UHD compliance (on the assumption that it will improve overall HW decoding compatibility, same as on x264) leads to High Tier and 10-bit encoding; it is not possible to encode in Main Tier and 8 bit at the same time. I'm not sure whether this is really a BD UHD requirement, as I lack access to the BD UHD spec.

To conclude: it will be extremely difficult to achieve real-time H.265 encoding with libx265 on an average workstation... My observations show that even for low-complexity encoding x265 is not close to the well-optimized x264, and I don't expect any significant speed improvements in the near future...
It seems that, for today, HW H.265 encoding, even at low quality, is still the only way to achieve real time.

Below are the fastest settings I was able to find. I hope someone finds them useful; I appreciate any feedback, and improvements are welcome.

x265params="level=5.1:high-tier=1:uhd-bd=1:crf=26:vbv-bufsize=24000:vbv-maxrate=48000:vbv-init=0.75:keyint=2:min_keyint=1:bframes=0:b-adapt=0:b-intra=0:wpp=1:no-pmode=0:no-pme=0:rd=2:psy-rd=0:rc-lookahead=1:no-rd-refine=1:rdoq-level=0:rdpenalty=0:me=dia:merange=16:subme=0:ctu=64:max-tu-size=32:min-cu-size=32:no-sao=1:fast-intra=1:no-deblock=1:no-weightp=1:no-weightb=1:no-scenecut=1:no-strong-intra-smoothing=1:no-cutree=1:frame-threads=4:open-gop=0:repeat-headers=1:info=1:hrd=1:aud=1:no-psnr=1:no-ssim=1:sar=1:videoformat=component:range=limited:colorprim=bt709:transfer=bt709:colormatrix=bt709:no-signhide=1:intra-refresh=0:log-level=none"
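
To make it easier to see what changed between the first attempt and these settings, here is a small Python sketch that parses two x265-params strings and lists the keys whose values differ. Only an excerpt of each string from the posts is included, and the helper names (`parse_params`, `diff_params`) are hypothetical.

```python
def parse_params(params):
    """Parse 'key=value:key=value' into a dict; bare keys get an empty value."""
    parsed = {}
    for item in params.strip('"').split(":"):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Normalise underscores so min_keyint and min-keyint compare equal.
        parsed[key.replace("_", "-")] = value
    return parsed


def diff_params(old, new):
    """Return {key: (old_value, new_value)} for keys that differ or are one-sided."""
    a, b = parse_params(old), parse_params(new)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


# Excerpts of the two settings strings posted in this thread.
FIRST = ("crf=20:keyint=2:min_keyint=1:rd=1:rc-lookahead=0:rdoq-level=1:"
         "merange=4:ctu=32:max-tu-size=16:frame-threads=0")
FASTEST = ("crf=26:keyint=2:min_keyint=1:rd=2:rc-lookahead=1:rdoq-level=0:"
           "merange=16:ctu=64:max-tu-size=32:frame-threads=4")

if __name__ == "__main__":
    for key, (before, after) in diff_params(FIRST, FASTEST).items():
        print(f"{key}: {before} -> {after}")
```

Running it on the excerpts lists the eight keys that differ (crf, ctu, frame-threads, max-tu-size, merange, rc-lookahead, rd, rdoq-level) and skips keyint and min_keyint, which are identical in both.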

benwaggoner
23rd August 2016, 21:45
Yeah, it is extremely difficult on an average workstation. It takes quite high-end gear.

Are you turning off wpp? That disables VBV controls today.

For UHD sizes, there is a whole lot to parallelize, so the bigger CTU probably doesn't matter.

CRF+VBV does work.

--allow-non-conformance + Main Profile might let you do Main and keyint=1. That said, I think any Main decoder should decode Main Intra just fine AFAIK.

Can you share your complete best command line?



pandy
24th August 2016, 16:07
Yeah, it is extremely difficult on an average workstation. It takes quite high-end gear.


Well, 12 physical cores, even on a somewhat outdated machine, should make a visible difference...


Are you turning off wpp? That disables VBV controls today.


It is explicitly enabled 'wpp=1'


CRF+VBV does work.


More or less it works, but... I had a small hope that QP would be even faster than CRF due to the lack of "thinking" on the encoder side. However, x265 gave this surprising message about being unable to use qp and level at the same time (well, I see no problem with checking the bitrate and temporarily increasing qp when the bitrate violates the level limit, but I'm not an x265 developer).
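
The idea in that parenthesis can be sketched outside the encoder. This is not how x265 or x264 implement rate control. It is a toy estimate that assumes the common rule of thumb that bitrate roughly halves for every +6 QP, and the function name and the QP cap of 51 are assumptions made for this illustration.

```python
import math


def min_qp_for_limit(measured_kbps, measured_qp, limit_kbps, max_qp=51):
    """Estimate the lowest QP expected to keep the bitrate under limit_kbps.

    Assumes bitrate halves for each +6 QP (a rule of thumb, not a guarantee).
    """
    if measured_kbps <= limit_kbps:
        return measured_qp  # already within the limit, nothing to change
    # Number of QP steps needed to scale the bitrate down to the limit.
    steps = 6 * math.log2(measured_kbps / limit_kbps)
    return min(max_qp, measured_qp + math.ceil(steps))


if __name__ == "__main__":
    # A test encode at QP 20 produced 80 Mbps; the limit is 40 Mbps.
    print(min_qp_for_limit(80_000, 20, 40_000))  # 26
    # Already under the limit: QP stays where it is.
    print(min_qp_for_limit(30_000, 20, 40_000))  # 20
```

With content as simple as a gray frame plus a counter, the real relation between QP and bitrate may be far from this rule, so the estimate would need checking against an actual test encode.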


--allow-non-conformance + Main Profile might let you do Main and keyint=1. That said, I think any Main decoder should decode Main Intra just fine AFAIK.


And my goal was full conformance, so I can't take the risk of explicitly setting a non-conformant encoding mode.
I've found Main Intra listed only in some H.265 documents, so I decided not to take the risk; adding P frames is an acceptable compromise.


Can you share your complete best command line?


Not a problem:


@set x265params="level=5.1:high-tier=1:uhd-bd=1:crf=%crfval%:vbv-bufsize=24000:vbv-maxrate=48000:vbv-init=0.75:keyint=2:min_keyint=1:bframes=0:b-adapt=0:b-intra=0:wpp=1:no-pmode=0:no-pme=0:rd=2:psy-rd=0:rc-lookahead=1:no-rd-refine=1:rdoq-level=0:rdpenalty=0:me=dia:merange=16:subme=0:ctu=64:max-tu-size=32:min-cu-size=32:no-sao=1:fast-intra=1:no-deblock=1:no-weightp=1:no-weightb=1:no-scenecut=1:no-strong-intra-smoothing=1:no-cutree=1:frame-threads=4:open-gop=0:repeat-headers=1:info=1:hrd=1:aud=1:no-psnr=1:no-ssim=1:sar=1:videoformat=component:range=limited:colorprim=bt709:transfer=bt709:colormatrix=bt709:no-signhide=1:intra-refresh=0:log-level=none"
@ffmpeg.exe -re -threads %cput% -y -hide_banner -loglevel 8 -stats -f lavfi -i %video% -intra -flags:v +cgop+low_delay -preset:v ultrafast -c:v libx265 -x265-params %x265params% -an -dn -sn -f mpegts %mpegts% -y udp://232.0.35.3:5000?pkt_size=1316
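
For readers who prefer to drive the same command from a script, here is a hedged Python sketch that assembles the argument list from the batch line above. Passing the x265 parameter string as a single list element avoids shell quoting problems. The flags are copied from the post as-is and have not been checked against current ffmpeg builds. The contents of %mpegts% are not shown in the post, so they are passed in as an opaque list. The function name, the `testsrc2` source and the thread count are hypothetical stand-ins.

```python
import shlex


def build_ffmpeg_args(x265_params, lavfi_source, mpegts_args, udp_url, threads):
    """Assemble the ffmpeg argument list used in the batch file above."""
    return [
        "ffmpeg", "-re", "-threads", str(threads), "-y", "-hide_banner",
        "-loglevel", "8", "-stats",
        "-f", "lavfi", "-i", lavfi_source,
        "-intra", "-flags:v", "+cgop+low_delay",
        "-preset:v", "ultrafast", "-c:v", "libx265",
        "-x265-params", x265_params,  # one argv element, no extra quoting
        "-an", "-dn", "-sn",
        "-f", "mpegts", *mpegts_args,  # whatever %mpegts% expands to
        "-y", udp_url,
    ]


if __name__ == "__main__":
    args = build_ffmpeg_args(
        x265_params="level=5.1:high-tier=1:uhd-bd=1:crf=26:keyint=2",  # excerpt
        lavfi_source="testsrc2=size=3840x2160:rate=50",  # hypothetical source
        mpegts_args=[],                                   # unknown in the post
        udp_url="udp://232.0.35.3:5000?pkt_size=1316",
        threads=12,                                       # hypothetical value
    )
    # Print a copy-pasteable command line; pass args to subprocess.run to execute.
    print(shlex.join(args))
```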


After all this I need to remap ultrafast to explicit x265 options, as perhaps I've missed something (without the ultrafast preset, encoding is 30 - 40% slower). Perhaps I need to review the x265 source code to find the option mapping.

I will try to return to this topic in a few weeks (I need to change the memory modules; there seems to be some issue, as the crashes are hard to explain).