21st February 2019, 21:32 | #6741 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
x265 has a TON of features where it can take input from a first pass and then refine it. Some of those don't require the stream be made with x265, and a few work with H.264 sources IIRC. |
|
22nd February 2019, 09:57 | #6742 | Link |
Herr
Join Date: Apr 2009
Location: North Europe
Posts: 556
|
I just wanted to say that I did a little x265 speed-test, one compile vs another,
x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem) vs x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit].
I encoded a 44-second cartoon animation, 00096.m2ts, with these settings:
x265.exe --crf 18 --preset veryslow --output-depth 10 --rdoq-level 0 --psy-rdoq 0 --aq-mode 1 --aq-strength 0.4 --qcomp 0.65 --bframes 16 --rc-lookahead 48 --ref 6 --min-keyint 24 --keyint 240 --frame-threads 1 --colormatrix bt709 --deblock -2:-2 --no-sao --psy-rd 0.4 --tskip --tskip-fast --tu-inter 4 --tu-intra 4 --frames 1066
x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem)
Duration: 00:53:41
x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit]
Duration: 00:53:32
Not a big difference in speed, considering I have an Intel Core i5-5200U CPU (I thought that the ICC 1900 compile would be much faster).
EDIT: By "much faster", I meant much faster than this encode turned out to be, something like 10% faster than the non-ICC compile.
Last edited by Forteen88; 25th February 2019 at 11:17. Reason: clarification |
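To put the two durations above into perspective, here is a minimal sketch that converts them to seconds and reports the relative gain. The helper names `parse_hms` and `percent_faster` are made up for this illustration; only the two duration strings come from the post.

```python
def parse_hms(text: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_faster(baseline: str, candidate: str) -> float:
    """Return how much shorter `candidate` is than `baseline`, in percent."""
    base = parse_hms(baseline)
    cand = parse_hms(candidate)
    return (base - cand) / base * 100.0


# Durations reported in the post above.
msystem_avx2 = "00:53:41"  # vs2017 AVX2 build
icc_svt = "00:53:32"       # ICC 1900 / MSVC 1916 multilib build

gain = percent_faster(msystem_avx2, icc_svt)
print(f"ICC build finished {gain:.2f}% sooner")
```

The difference is 9 seconds out of 3221, so well under one percent, which is far from the roughly 10% the poster had hoped for. A single run per build is also within normal run-to-run noise.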
22nd February 2019, 15:29 | #6744 | Link | ||
Registered User
Join Date: Sep 2018
Posts: 12
|
Quote:
assembly needs to be disabled for x86 high bit, it does not compile when enabled. Quote:
But I see differences in the x86 builds. Honestly, though, not many people will use those anyway.
Last edited by poller; 22nd February 2019 at 15:32. |
||
22nd February 2019, 17:51 | #6745 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,781
|
One more build to compare, with two variants:
x265 3.0_Au+7-cb3e172a5f51 MABS compiled with media-autobuild_suite only (EXE only, no DLL)
x265 3.0_Au+7-cb3e172a5f51 compiled with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells |
22nd February 2019, 22:05 | #6746 | Link |
Registered User
Join Date: Sep 2018
Posts: 12
|
nice, some small test:
x265_3.0_RC+14-46b84ff665fd
20.5 seconds
Code:
cpuid=1049583 / frame-threads=3 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / 
no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00
20.5 seconds
Code:
cpuid=1049583 / frame-threads=3 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / 
no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
23.3 seconds
Code:
cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao 
/ ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
22.6 seconds
Code:
cpuid=1049583 / frame-threads=3 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / 
no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00
The MABS build has an additional setting (numa-pools=8), but that did not affect performance. This was tested on an i7-3770K. |
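Spotting the one or two settings that differ between such long info strings is tedious by eye. The following is a small sketch that splits two strings on " / " and lists what appears in only one of them. The function names are invented for this example, and the sample strings are shortened excerpts of the dumps above, not the full output.

```python
from typing import List, Tuple


def parse_info(info: str) -> List[str]:
    """Split an x265 info string of the form 'a / b=1 / c' into its items."""
    return [item.strip() for item in info.split(" / ") if item.strip()]


def diff_settings(first: str, second: str) -> Tuple[List[str], List[str]]:
    """Return (items only in first, items only in second), each sorted."""
    set_first = set(parse_info(first))
    set_second = set(parse_info(second))
    return sorted(set_first - set_second), sorted(set_second - set_first)


# Shortened excerpts of two of the dumps above (most items omitted).
build_a = "cpuid=1049583 / frame-threads=3 / wpp / no-hevc-aq / qp-adaptation-range=1.00"
build_b = ("cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / no-hevc-aq / "
           "no-svt / qp-adaptation-range=1.00")

only_a, only_b = diff_settings(build_a, build_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

Pasting the complete strings from two builds into `build_a` and `build_b` works the same way. Because the comparison uses sets, it ignores the order in which the encoder prints its options.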
22nd February 2019, 23:51 | #6747 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,781
|
What you may not find here are default GNU C/C++ compiler options.
Please note that MABS scripts may set up some specific CFLAGS and CXXFLAGS (e.g. O2 or O3?). The interactive MinGW consoles should not ... so GCC / G++ defaults may apply. Except for the 32-bit build where I explicitly set CXXFLAGS with pretty generic options suitable for 32-bit code on any AMD64 capable CPU, minimally (see above). I have no clue what I may do "right". |
23rd February 2019, 01:17 | #6750 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Since everyone was concerned about x64 platforms and nobody used x86, I tested it on a real x86 platform running Windows Server 2003 x86 with PAE and 16 GB of RAM.
The CPU is an old, dusty Intel Xeon 4c/8th running at 2.60GHz with instruction sets up to SSE4.2:
4/N.A) - x265 3.0_Au+7 - MABS compiled by LigH with media-autobuild_suite only (EXE only, no DLL)
It didn't even start. It refused to start due to missing kernel calls: GetNumaNodeProcessorMaskEx, InitializeConditionVariable, SetThreadGroupAffinity, SleepConditionVariableCS, WakeAllConditionVariable
No luck on Windows Server 2003, so it won't run on XP and its derivatives either.
3) - x265 3.0_Au+7 - compiled by LigH with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells
3.7fps/3.9fps
2) - x265 3.0_Au+7 - compiled with GCC9 (Preview) target SSE4.2
4.2fps/4.3fps
1) - x265 3.0_Au+7 - compiled with GCC8 target SSE4.2
4.7fps/4.8fps
Very basic, low-complexity command line:
x265.exe --y4m - --dither --preset medium --level 5.0 --tune fastdecode --no-high-tier --ref 2 --rc-lookahead 3 -b 2 --profile main10 --bitrate 25000 --deblock -4:-4 --min-luma 64 --max-luma 940 --chromaloc 2 --range limited --videoformat component --colorprim bt709 --transfer bt709 --colormatrix bt709 --overscan show --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3 --vbv-maxrate 25000 --vbv-bufsize 25000 --asm=sse4.2 --wpp -o "\\VBOXSVR\Share_Windows_Linux\raw_video.hevc"
Lossless 16bit SD (UHD SDR downscaled) footage.
Anyway, I don't think the comparison is fair, because LigH targeted pentium4, which means only SSE2 is supported. In other words, I'm comparing SSE4.2 vs SSE2, and it's pretty clear that SSE4.2 has an advantage over SSE2.
As to GCC9, it seems that they changed something in the way -mtune behaves, or maybe they changed something else; either way, it produces an SSE4.2 build that is slower than the GCC8 SSE4.2 one.
It would be interesting to find out how ICC targeting SSE4.2 behaves on old Intel x86 systems (if Intel Parallel Studio can produce a Windows Server 2003 compatible binary).
Last edited by FranceBB; 27th February 2019 at 05:36. |
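The ranking above can be expressed as relative speed against the slowest working build. This is only a sketch: the post does not say what the two fps figures per build stand for, so averaging them is an assumption made here, and the dictionary labels are shorthand rather than official build names.

```python
from typing import Dict, Tuple

# fps pairs as reported in the post; the meaning of the two values is not stated.
results: Dict[str, Tuple[float, float]] = {
    "LigH custom scripts (pentium4 target)": (3.7, 3.9),
    "GCC9 preview, SSE4.2": (4.2, 4.3),
    "GCC8, SSE4.2": (4.7, 4.8),
}


def mean_fps(pair: Tuple[float, float]) -> float:
    """Average the two reported figures (an assumption, see the note above)."""
    return sum(pair) / len(pair)


baseline = mean_fps(results["LigH custom scripts (pentium4 target)"])
relative = {name: mean_fps(pair) / baseline for name, pair in results.items()}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```

Under that averaging assumption the GCC8 SSE4.2 build comes out about a quarter faster than the pentium4-targeted one, with the GCC9 preview build in between.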
24th February 2019, 11:09 | #6751 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,731
|
Has anyone else noticed how the bitrate shown during the encoding phase and the final bitrate sometimes differ from each other quite a lot? Yesterday I happened to be watching a 70000-frame encode finish, and during the last frames the average bitrate was ~6800 kbps. When the encode finished, the final bitrate was suddenly over 7100 kbps.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
24th February 2019, 12:42 | #6752 | Link |
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
|
@FranceBB I made an x86 test binary targeting SSE4.2 (8-bit only).
You can test whether it works or not. I also compiled ffmpeg with libvmaf, and you can test that as well.
__________________
Monochrome Anomaly
Last edited by Wolfberry; 27th February 2019 at 09:48. |
25th February 2019, 04:28 | #6753 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
Dependency Walker didn't find any issue with the installer, but when I try to run it, it says that it's not a valid x86 application. Are you sure that you targeted SSE4.2? Out of curiosity, can you try one build with SSE4.1 and one without any assembly optimisation? The CPU is fully capable of handling SSE4.2, so I don't understand. The OS is Windows Server 2003 x86 with PAE enabled and 16GB of RAM. |
|
25th February 2019, 06:12 | #6754 | Link |
...?
Join Date: Nov 2005
Location: Florida
Posts: 1,420
|
It's not weird when you consider that Windows Server 2003 is subject to the same problem that Windows XP faces: the error is almost definitely coming from x265 expecting modern Windows APIs (read: NUMA). The 'not a valid x86 application' error doesn't occur when the assembly is wrong (in that case it would crash and throw a SIGILL), it occurs when you try to run 64-bit programs on 32-bit OSes (which in this case it's not; the executable and dll are standard 32-bit PE32) or other similar OS incompatibilities.
https://bitbucket.org/multicoreware/...eLists.txt-443 |
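One way to check the "32-bit PE32" point, and to look for a possible OS incompatibility, is to read the executable's PE header directly. The sketch below reports the target machine, the image format and the minimum subsystem version recorded in the header. The offsets follow the published PE/COFF layout. The helper name `pe_summary` is made up here, and whether the subsystem version is what trips up this particular build on Windows Server 2003 is an assumption that has not been verified.

```python
import struct

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}
FORMAT_NAMES = {0x10B: "PE32", 0x20B: "PE32+"}


def pe_summary(data: bytes) -> dict:
    """Extract machine type, image format and subsystem version from PE header bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    # Optional header follows the 4-byte signature and the 20-byte COFF header.
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, optional)
    # Major/MinorSubsystemVersion sit at offsets 48 and 50 in PE32 and PE32+ alike.
    major, minor = struct.unpack_from("<HH", data, optional + 48)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "format": FORMAT_NAMES.get(magic, hex(magic)),
        "subsystem_version": (major, minor),
    }


# Hypothetical usage; "x265.exe" is a placeholder path:
#   with open("x265.exe", "rb") as handle:
#       print(pe_summary(handle.read(4096)))
```

Windows Server 2003 is NT 5.2, so a reported subsystem version of 6.0 or higher would be one candidate explanation for a loader refusal that has nothing to do with SSE levels. Missing API imports, as mentioned in the earlier post, are a separate failure mode.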
26th February 2019, 09:31 | #6755 | Link |
Registered User
Join Date: Feb 2007
Posts: 128
|
Hm.
Could it be that zones are broken in AU+7? StaxRip crashes for me with zones defined. AU+3 still worked. (I'm using the VS 2019 Preview 3 AVX2 AU+7 build from http://msystem.waw.pl/x265/) |
26th February 2019, 11:20 | #6756 | Link | |
Registered User
Join Date: Feb 2015
Posts: 326
|
Quote:
The proposed fix is below (it is only a technical change; I don't fully understand the x265_copy_params function):
Code:
diff -r cb3e172a5f51 source/common/param.cpp
--- a/source/common/param.cpp Tue Feb 19 20:20:35 2019 +0530
+++ b/source/common/param.cpp Tue Feb 26 16:16:16 2019 +0100
@@ -2240,16 +2240,7 @@
     dst->rc.zoneCount = src->rc.zoneCount;
     dst->rc.zonefileCount = src->rc.zonefileCount;
-    if (src->rc.zones)
-    {
-        dst->rc.zones->startFrame = src->rc.zones->startFrame;
-        dst->rc.zones->endFrame = src->rc.zones->endFrame;
-        dst->rc.zones->bForceQp = src->rc.zones->bForceQp;
-        dst->rc.zones->qp = src->rc.zones->qp;
-        dst->rc.zones->bitrateFactor = src->rc.zones->bitrateFactor;
-    }
-    else
-        dst->rc.zones = NULL;
+    dst->rc.zones = src->rc.zones;
     if (src->rc.lambdaFileName)
         dst->rc.lambdaFileName = strdup(src->rc.lambdaFileName);
     else
         dst->rc.lambdaFileName = NULL;
Last edited by Ma; 26th February 2019 at 16:28. |
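The patch replaces a field-by-field write through dst->rc.zones with a plain pointer copy, so source and destination end up sharing one zone array. The Python analogy below sketches what that sharing means compared with a deep copy. It is not x265 code: the class and function names are invented for illustration, and the reading that the original crash came from writing through an unallocated destination pointer is an assumption the thread does not confirm.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Zone:
    start_frame: int
    end_frame: int
    bitrate_factor: float


@dataclass
class Params:
    zones: Optional[List[Zone]] = None


def copy_params_shared(dst: Params, src: Params) -> None:
    """Analogue of the patched line: dst simply points at src's zones."""
    dst.zones = src.zones


def copy_params_deep(dst: Params, src: Params) -> None:
    """Alternative: dst gets its own independent copy of the zones."""
    dst.zones = copy.deepcopy(src.zones)


src = Params(zones=[Zone(1, 100, 1.3)])
shared, independent = Params(), Params()
copy_params_shared(shared, src)
copy_params_deep(independent, src)

# A later change to the source is visible through the shared copy only.
src.zones[0].bitrate_factor = 2.0
print(shared.zones[0].bitrate_factor)       # follows src
print(independent.zones[0].bitrate_factor)  # keeps the old value
```

Sharing is fine as long as the source outlives the copy and nobody frees or edits the zones behind its back. In C++ terms that is an ownership question the one-line patch leaves open.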
|
26th February 2019, 13:48 | #6757 | Link |
Registered User
Join Date: Feb 2007
Posts: 128
|
Thx Ma... I'll try your version out as soon as I'm back from work (in 6 hours).
AU+7 crashed for me on two videos when I tried it out this morning, just by setting --zones 1,100,b=1.3. No more crashes after I replaced the exe with an older AU+3 version. |
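For readers unfamiliar with the option: a --zones value is a list of zones separated by "/", each written as start frame, end frame and either q=<integer> (forced QP) or b=<float> (bitrate multiplier). The sketch below parses that syntax so the value from the post can be inspected. `ZoneSpec` and `parse_zones` are names invented for this example, not part of x265.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ZoneSpec:
    start_frame: int
    end_frame: int
    qp: Optional[int] = None                 # set when the zone uses q=<int>
    bitrate_factor: Optional[float] = None   # set when the zone uses b=<float>


def parse_zones(spec: str) -> List[ZoneSpec]:
    """Parse a string such as '1,100,b=1.3' or '0,10,q=20/11,20,b=0.5'."""
    zones = []
    for chunk in spec.split("/"):
        start, end, option = chunk.split(",")
        key, _, value = option.partition("=")
        if key == "q":
            zones.append(ZoneSpec(int(start), int(end), qp=int(value)))
        elif key == "b":
            zones.append(ZoneSpec(int(start), int(end), bitrate_factor=float(value)))
        else:
            raise ValueError(f"unknown zone option: {option!r}")
    return zones


print(parse_zones("1,100,b=1.3"))
```

The value from the post therefore describes a single zone covering frames 1 to 100 with the bitrate multiplied by 1.3, which is about as simple as a zone definition gets.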
26th February 2019, 18:08 | #6759 | Link |
Registered User
Join Date: Feb 2015
Posts: 326
|
In x265 version 3.0_Au+5 there is a new function, x265_copy_params. I don't know what problem it solves, but it certainly creates new ones (zones, for example).
My question is: if we remove the x265_copy_params function and go back to the old code (memcpy), would anything go wrong? I've attached a patch that reverts the changes made with the x265_copy_params function (maybe instead of patching this function it is better to remove it). |
26th February 2019, 18:25 | #6760 | Link | |
Registered User
Join Date: Jan 2007
Posts: 729
|
Quote:
|
|