Old 17th August 2017, 18:27   #5541  |  Link
Balthazar2k4
Registered User
 
Join Date: Mar 2009
Location: Here, There, & Everywhere
Posts: 269
Quote:
Originally Posted by Atak_Snajpera View Post
Ok smart ass so explain us why zen architecture sucks so much in x265...
http://www.linleygroup.com/mpr/article.php?id=11666
I am running a 1950X with a 3.9 GHz OC across all 16 cores and, frankly, I am impressed. I have to run two encodes simultaneously to saturate the system; using the medium preset with a CRF of 19 on 1080p material, I am seeing ~25 fps on both encodes. Will the 16C Intel counterpart beat the 1950X? Most likely. That said, the Intel part is $700 more, so I would expect it to be superior.

This is my first AMD system in 15+ years and I can say unequivocally that I am very happy with it.
Old 18th August 2017, 15:15   #5542  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
x265 2.5+11-d58761d8db4a

supports some new SMPTE-ST/RP/EG colorimetry options and a new split RD skip command* (documented only in full help):

Code:
   --[no-]splitrd-skip           Enable skipping split RD analysis when sum of split CU rdCost larger than none split CU rdCost for Intra CU. Default disabled

   --colorprim <string>          Specify color primaries from undef, bt709, bt470m, bt470bg, smpte170m,
                                 smpte240m, film, bt2020, smpte-st-428, smpte-rp-431, smpte-eg-432. Default undef

   --colormatrix <string>        Specify color matrix setting from undef, bt709, fcc, bt470bg, smpte170m,
                                 smpte240m, GBR, YCgCo, bt2020nc, bt2020c, smpte-st-2085, chroma-nc, chroma-c, ictcp. Default undef
* If I understood the patch comment in the mailing list correctly, it should speed up intra split cost calculation a little while possibly preserving identical output.
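
The early exit described in the help text can be illustrated with a minimal sketch. This is not x265's code: the function name, the `cost_of_sub_cu` callback and the cost values are hypothetical, and it assumes RD costs are non-negative. Under that assumption, once the running sum of the split costs exceeds the unsplit cost, the split cannot win any more, so skipping the remaining sub-CUs should not change the decision.

```python
from typing import Callable, Sequence, Tuple


def split_cost_with_early_exit(
    sub_cus: Sequence[object],
    cost_of_sub_cu: Callable[[object], float],  # hypothetical RD cost callback
    no_split_cost: float,
) -> Tuple[float, int]:
    """Sum sub-CU RD costs, stopping once the sum exceeds the unsplit cost.

    Returns (accumulated_cost, number_of_sub_cus_evaluated).
    """
    total = 0.0
    evaluated = 0
    for cu in sub_cus:
        total += cost_of_sub_cu(cu)
        evaluated += 1
        if total > no_split_cost:
            # The split is already more expensive than not splitting,
            # so the remaining sub-CUs do not need to be analysed.
            break
    return total, evaluated


if __name__ == "__main__":
    # Made-up costs for the four sub-CUs of one intra CU.
    costs = [400.0, 350.0, 300.0, 250.0]
    print(split_cost_with_early_exit(costs, lambda c: c, no_split_cost=700.0))
    print(split_cost_with_early_exit(costs, lambda c: c, no_split_cost=2000.0))
```

In the first call the loop stops after two of the four sub-CUs; in the second call all four are evaluated because the split stays cheaper.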
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 19th August 2017, 03:55   #5543  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
Yes, splitrd-skip looks interesting; I wouldn't be surprised if it is enabled by default in the future. I guess that comes down to user reports, or maybe they're waiting on the possibility of it being extended to inter CUs?
Old 20th August 2017, 03:12   #5544  |  Link
Stephen R. Savage
Registered User
 
Stephen R. Savage's Avatar
 
Join Date: Nov 2009
Posts: 327
Quote:
Originally Posted by NikosD View Post
In case you didn't see it, I put a in my first sentence just to be polite with your tremendous ignorance regarding CPU architectures and ego (those two usually come together)

But now, after your reply, I can't be polite anymore.

Your comments made me laugh like no tomorrow regarding FMACs and x265, so keep on posting your thoughts after reading CPU architecture articles you don't understand.

It's so funny!

Thank you!
If you were smart instead of merely a smart-ass, you would know that the FPU (and FMAC) are also responsible for executing integer SIMD instructions. Likewise, you would know that Intel can retire dual 256-bit (512-bit in Skylake) multiply-accumulate of 16-bit integers using the exact same execution ports as floating-point FMA. In fact, why do you think the upcoming Cannonlake will have 52-bit integer FMA instructions (hint)?
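
For readers unfamiliar with the term, here is a scalar sketch of what a "multiply-accumulate of 16-bit integers" computes, using the lane semantics of PMADDWD/VPMADDWD as one example of such an instruction. Picking that instruction is my assumption, not something the post names, and the sketch says nothing about execution ports or throughput.

```python
from typing import List


def pmaddwd(a: List[int], b: List[int]) -> List[int]:
    """Emulate PMADDWD lane semantics on lists of signed 16-bit integers.

    Each adjacent pair of 16-bit products is summed into one signed
    32-bit result, so the output has half as many elements as the input.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same, even length")
    for x in (*a, *b):
        if not -32768 <= x <= 32767:
            raise ValueError("elements must fit in a signed 16-bit integer")

    out = []
    for i in range(0, len(a), 2):
        total = a[i] * b[i] + a[i + 1] * b[i + 1]
        # Wrap to signed 32 bits; this only matters when all four inputs
        # of a pair are -32768.
        total = (total + 2**31) % 2**32 - 2**31
        out.append(total)
    return out


if __name__ == "__main__":
    print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))
```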
Old 20th August 2017, 05:52   #5545  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by Stephen R. Savage View Post
If you were smart instead of merely a smart-ass, you would know that the FPU (and FMAC) are also responsible for executing integer SIMD instructions. Likewise, you would know that Intel can retire dual 256-bit (512-bit in Skylake) multiply-accumulate of 16-bit integers using the exact same execution ports as floating-point FMA. In fact, why do you think the upcoming Cannonlake will have 52-bit integer FMA instructions (hint)?
Oh my, oh my (!)

What a smart ass.

What a dumb ass.

What an asshole.

Port 0 and 1 can dispatch SIMD integer and FMA for floating point, but not all hardware capable of SIMD integer can do SIMD floating point too.

For example port 5 can do SIMD integer but not FMA for floating point in Haswell/Broadwell/Skylake/Kabylake architecture.

Integer FMA is something very new to Intel's CPU architecture and part of AVX-512 instruction set only.

It's called AVX512-IFMA and has 52 bit precision.
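
As a rough illustration of what "52 bit precision" means here, the sketch below emulates one 64-bit lane of the two AVX512-IFMA operations (VPMADD52LUQ and VPMADD52HUQ) with plain Python integers. The function names are mine, and this models only the arithmetic, not the vector width or any hardware detail.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc: int, b: int, c: int) -> int:
    """One lane of VPMADD52LUQ: add the low 52 bits of the product.

    Only the low 52 bits of b and c are multiplied; the 64-bit
    accumulator wraps around on overflow.
    """
    product = (b & MASK52) * (c & MASK52)  # up to 104 bits wide
    return (acc + (product & MASK52)) & MASK64


def madd52hi(acc: int, b: int, c: int) -> int:
    """One lane of VPMADD52HUQ: add the high 52 bits of the product."""
    product = (b & MASK52) * (c & MASK52)
    return (acc + (product >> 52)) & MASK64


if __name__ == "__main__":
    print(madd52lo(10, 3, 4))
    print(madd52hi(0, 1 << 51, 1 << 51))
```

The pair is meant to be used together: the low and high halves of each 104-bit product go into separate accumulators, which is useful for big-integer arithmetic.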

Haswell/Broadwell/Skylake/Kabylake do not support AVX-512 and don't support integer FMA of course.

Skylake-X (Skylake-SP core) added a 512-bit FMA unit on Port 5 (10 cores and above), but for floating point only: it supports a limited part of the huge AVX-512 family of instruction set variants, and AVX512-IFMA is not among them.

There are 12 levels of AVX-512 actually.

Cannonlake will be the first mainstream CPU of Intel to support integer FMA.

So, again.

What a smart ass, a dumb ass and an asshole.

You should be banned from doom9 forever.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 21st August 2017, 06:00   #5546  |  Link
littlepox
Registered User
 
Join Date: Nov 2012
Posts: 218
Quote:
Originally Posted by LigH View Post
* If I understood the patch comment in the mailing list correctly, it should speed up intra split cost calculation a little while possibly preserving identical output.
If that is the case, we will probably see this option removed very soon, with the skip built into the code unconditionally.
Old 21st August 2017, 06:10   #5547  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
Maybe after a similar feature is available for inter-coding, too. This hope was already expressed in the patch discussion of the intra version.
Old 21st August 2017, 15:29   #5548  |  Link
Barough
Registered User
 
Barough's Avatar
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.5+12-fcd9154fa4e2 (GCC 7.2.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)

x265 [info]: HEVC encoder version 2.5+12-fcd9154fa4e2
x265 [info]: build info [Windows][GCC 7.2.0][32 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2


Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 22nd August 2017, 07:52   #5549  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
Side note:

The German c't Magazine reports the Ryzen Threadripper (1950X) as the fastest desktop CPU overall at the moment (including advantages of the TR4 platform for connecting PCIe SSDs and graphics cards, plus USB 3.1).

Not so important for x265, specifically. Back to topic.
Old 22nd August 2017, 13:26   #5550  |  Link
zub35
Registered User
 
Join Date: Oct 2016
Posts: 56
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265...r-cuda-encoder
Old 22nd August 2017, 13:32   #5551  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
Interesting... I wonder whether it is technically still as powerful as x265 (that is, whether GPGPU algorithms can be as complex as CPU algorithms), and also whether it is still allowed to be named something-with-x265. Just curious...
Old 22nd August 2017, 15:07   #5552  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
Quote:
Originally Posted by zub35 View Post
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265...r-cuda-encoder
OpenCL would make more sense; CUDA is proprietary and NVIDIA-only.
Old 22nd August 2017, 17:44   #5553  |  Link
JohnLai
Registered User
 
Join Date: Mar 2008
Posts: 448
Quote:
Originally Posted by zub35 View Post
GPU Fork - x265 HEVC OpenCL or CUDA Encoder

gcc>=7 , -O2 -fopenacc

https://bitbucket.org/vovagubin/x265...r-cuda-encoder

Hmmm......process that could be offloaded.....

Motion search + estimation? All of the big three GPU vendors have some sort of motion search/estimation code available.

Maybe lookahead (didn't work well in x264), but somehow nvidia is able to implement it using CUDA cores in conjunction with its proprietary NVENC. The code ain't available from nvidia....hmmm....but being able to do it fast.....

Maybe SAO? Some form of low complexity SAO like what nvidia used in Pascal?

Or offloading the adaptive quantization calculation like Nvidia did?


*Gotta hand it to Nvidia's software engineers.....they are leading.......in GPGPU usage for video acceleration....with CUDA....
Old 22nd August 2017, 18:12   #5554  |  Link
easyfab
Registered User
 
Join Date: Jan 2002
Posts: 332
Or using FEI from Intel, if it comes to HEVC.
I don't know whether it would be interesting to combine with x264/x265.

from https://github.com/01org/intel-vaapi-driver/issues/228

"The main highlight of FEI is the possibility to split the encoding process into two phases, first is ENC and the second is PAK. ENC is the operation which performs all motion vector calculation and prediction. PAK is doing all transformations and entropy coding. Without having FEI, the whole ENC+PAK is a black box to middleware, but with FEI user can extract the output of ENC and feed PAK with a custom enhanced motion vectors and macroblock prediction modes."
Old 22nd August 2017, 23:12   #5555  |  Link
x265_Project
Guest
 
Posts: n/a
Quote:
Originally Posted by LigH View Post
Interesting... I wonder ... whether it is still allowed to be named something-with-x265.
No. Anyone can fork a GPL software project, but they can't copy a trademark.
Old 22nd August 2017, 23:16   #5556  |  Link
x265_Project
Guest
 
Posts: n/a
Quote:
Originally Posted by easyfab View Post
Or using FEI from Intel, if it comes to HEVC.
I don't know whether it would be interesting to combine with x264/x265.

from https://github.com/01org/intel-vaapi-driver/issues/228

"The main highlight of FEI is the possibility to split the encoding process into two phases, first is ENC and the second is PAK. ENC is the operation which performs all motion vector calculation and prediction. PAK is doing all transformations and entropy coding. Without having FEI, the whole ENC+PAK is a black box to middleware, but with FEI user can extract the output of ENC and feed PAK with a custom enhanced motion vectors and macroblock prediction modes."
Intel's Flexible Encoder Interface isn't the right way to go. They offer lower-level OpenCL libraries that we've looked at to do these functions. Keep in mind that if we replace a whole section of code with a hardware encoder's functionality, we end up with a very different thing.
Old 22nd August 2017, 23:26   #5557  |  Link
x265_Project
Guest
 
Posts: n/a
Quote:
Originally Posted by JohnLai View Post
Hmmm......process that could be offloaded.....

Motion search + estimation? All of the big three GPU vendors have some sort of motion search/estimation code available.

Maybe lookahead (didn't work well in x264), but somehow nvidia is able to implement it using CUDA cores in conjunction with its proprietary NVENC. The code ain't available from nvidia....hmmm....but being able to do it fast.....

Maybe SAO? Some form of low complexity SAO like what nvidia used in Pascal?

Or offloading the adaptive quantization calculation like Nvidia did?


*Gotta hand it to Nvidia's software engineers.....they are leading.......in GPGPU usage for video acceleration....with CUDA....
The challenge is speed. If you offload small chunks of work to a GPU, the CPU won't have to do that work, so it can effectively speed up. But if you don't get the result of those tasks back from the GPU before they're needed, you won't accelerate.
GPUs are very good at work that can be highly parallelized, and not good at work that has serial dependencies. Video encoding has many serial dependencies. The block you're encoding right now makes reference to neighboring blocks, or blocks in other frames, all of which must be completely finished encoding before you can efficiently encode the current block.
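
To make the serial-dependency point concrete, here is a small sketch that schedules the blocks of a single frame under a wavefront-style rule: a block can start only after its left neighbour and the top-right neighbour in the row above have finished. That rule is the one used by HEVC wavefront parallel processing; the function names are hypothetical, and dependencies on other frames are ignored, so real encoders are more constrained than this.

```python
from collections import Counter
from typing import List, Tuple


def wavefront_schedule(rows: int, cols: int) -> List[List[int]]:
    """Earliest time step at which each block can be processed."""
    step = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(step[r][c - 1])  # left neighbour
            if r > 0:
                # Top-right neighbour (clamped at the right frame edge).
                deps.append(step[r - 1][min(c + 1, cols - 1)])
            step[r][c] = 1 + max(deps) if deps else 0
    return step


def summarize(rows: int, cols: int) -> Tuple[int, int, int]:
    """Return (total blocks, time steps needed, peak blocks in parallel)."""
    schedule = wavefront_schedule(rows, cols)
    per_step = Counter(s for row in schedule for s in row)
    return rows * cols, max(per_step) + 1, max(per_step.values())


if __name__ == "__main__":
    # 1920x816 with 64x64 CTUs gives 30 columns and 13 rows.
    blocks, steps, peak = summarize(13, 30)
    print(f"{blocks} blocks, {steps} steps, at most {peak} in parallel")
```

Even with unlimited execution units, the number of blocks that can run at the same time in this model is bounded by the number of block rows, which is far below what a GPU needs to be kept busy.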
Old 23rd August 2017, 06:11   #5558  |  Link
x265_Project
Guest
 
Posts: n/a
Be sure to vote for x265 in the Streaming Media Reader's Choice poll... http://www.streamingmedia.com/Reader...2017/Vote.aspx
Old 23rd August 2017, 08:56   #5559  |  Link
Ma
Registered User
 
Join Date: Feb 2015
Posts: 326
I ran a 10-bit "true placebo" test at 2500 kb/s with the Sintel movie (from the 8-bit 4K PNG source).

Command line:
Code:
f:\speed\Sintel>ffmpeg -framerate 24 -start_number 00000001 -i %08d.png -pix_fmt yuv420p16 -vf "scale=1920:-4:flags=bicubic+accurate_rnd+full_chroma_int+full_chroma_inp:param0=-0.5:param1=0.25,setsar=1" -v warning -strict -1 -f yuv4mpegpipe -   | x265 -D10 --bitrate 2500 -I480 --psnr --ssim -p9 --no-psy-rd --multi-pass-opt-distortion --rc-lookahead 120 --bframes 12 --ref 6 --subme 7 -F1 --y4m - w1.hevc --pass 1
[yuv4mpegpipe @ 00000000005603a0] Warning: generating non standard YUV stream. Mjpegtools will not work.
y4m  [info]: 1920x816 fps 24/1 i420p16 sar 1:1 unknown frame count
raw  [info]: output file: w1.hevc
x265 [info]: HEVC encoder version 2.5+11-d58761d8db4a
x265 [info]: build info [Windows][MSVC 1911][64 bit] 10bit+8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 1 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 4 inter / 4 intra
x265 [info]: ME / range / subpel / merge         : star / 92 / 7 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 480 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 120 / 12 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
x265 [info]: References / ref-limit  cu / depth  : 6 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : ABR-2500 kbps / 0.60
x265 [info]: tools: rect amp rd=6 rdoq=2 psy-rdoq=1.00 tskip signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing deblock sao stats-write
x265 [info]: frame I:    187, Avg QP:16.78  kb/s: 16314.61  PSNR Mean: Y:51.527 U:54.999 V:54.462  SSIM Mean: 0.994805 (22.844dB)
x265 [info]: frame P:   5475, Avg QP:18.42  kb/s: 6150.87   PSNR Mean: Y:50.783 U:54.985 V:54.451  SSIM Mean: 0.993361 (21.779dB)
x265 [info]: frame B:  15650, Avg QP:23.87  kb/s: 992.31    PSNR Mean: Y:50.160 U:56.112 V:55.537  SSIM Mean: 0.992487 (21.242dB)
x265 [info]: Weighted P-Frames: Y:10.6% UV:7.3%
x265 [info]: Weighted B-Frames: Y:8.8% UV:5.7%
x265 [info]: consecutive B-frames: 17.7% 9.1% 8.2% 42.3% 6.2% 9.2% 3.2% 2.5% 0.4% 0.2% 0.2% 0.2% 0.6%

encoded 21312 frames in 135210.11s (0.16 fps), 2451.98 kb/s, Avg QP:22.41, Global PSNR: 51.632, SSIM Mean Y: 0.9927318 (21.386 dB)

f:\speed\Sintel>ffmpeg -framerate 24 -start_number 00000001 -i %08d.png -pix_fmt yuv420p16 -vf "scale=1920:-4:flags=bicubic+accurate_rnd+full_chroma_int+full_chroma_inp:param0=-0.5:param1=0.25,setsar=1" -v warning -strict -1 -f yuv4mpegpipe -   | x265 -D10 --bitrate 2500 -I480 --psnr --ssim -p9 --no-psy-rd --multi-pass-opt-distortion --rc-lookahead 120 --bframes 12 --ref 6 --subme 7 -F1 --y4m - w2.hevc --pass 2
[yuv4mpegpipe @ 00000000004403a0] Warning: generating non standard YUV stream. Mjpegtools will not work.
y4m  [info]: 1920x816 fps 24/1 i420p16 sar 1:1 unknown frame count
raw  [info]: output file: w2.hevc
x265 [info]: HEVC encoder version 2.5+11-d58761d8db4a
x265 [info]: build info [Windows][MSVC 1911][64 bit] 10bit+8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 1 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 4 inter / 4 intra
x265 [info]: ME / range / subpel / merge         : star / 92 / 7 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 480 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 120 / 12 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
x265 [info]: References / ref-limit  cu / depth  : 6 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : ABR-2500 kbps / 0.60
x265 [info]: tools: rect amp rd=6 rdoq=2 psy-rdoq=1.00 tskip signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing deblock sao stats-read
x265 [info]: frame I:    187, Avg QP:15.44  kb/s: 18757.12  PSNR Mean: Y:53.007 U:56.245 V:55.748  SSIM Mean: 0.996481 (24.535dB)
x265 [info]: frame P:   5475, Avg QP:19.54  kb/s: 6299.14   PSNR Mean: Y:50.557 U:55.238 V:54.725  SSIM Mean: 0.994567 (22.650dB)
x265 [info]: frame B:  15650, Avg QP:25.57  kb/s: 977.83    PSNR Mean: Y:49.518 U:56.082 V:55.533  SSIM Mean: 0.993497 (21.869dB)
x265 [info]: Weighted P-Frames: Y:3.3% UV:2.1%
x265 [info]: Weighted B-Frames: Y:1.0% UV:0.5%
x265 [info]: consecutive B-frames: 17.7% 9.1% 8.2% 42.3% 6.2% 9.2% 3.2% 2.5% 0.4% 0.2% 0.2% 0.2% 0.6%

encoded 21312 frames in 124205.99s (0.17 fps), 2500.86 kb/s, Avg QP:23.93, Global PSNR: 51.261, SSIM Mean Y: 0.9937978 (22.075 dB)
The result is watchable but encoding speed is a bit too slow.
Result movie -- www.msystem.waw.pl/x265/sintel2500.mkv
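
To put "a bit too slow" into numbers, the following sketch parses the two `encoded ... frames in ...s` summary lines from the log above and derives the total wall-clock time, the slowdown relative to real time and the approximate stream size. The helper names are hypothetical, and it assumes 24 fps playback (as reported by the y4m reader in the log) and that kb/s means 1000 bits per second.

```python
import re
from typing import Dict, List

SUMMARY = re.compile(
    r"encoded (?P<frames>\d+) frames in (?P<seconds>[\d.]+)s "
    r"\((?P<fps>[\d.]+) fps\), (?P<kbps>[\d.]+) kb/s"
)

# Summary lines copied from the two passes in the log above.
PASS1 = ("encoded 21312 frames in 135210.11s (0.16 fps), 2451.98 kb/s, "
         "Avg QP:22.41, Global PSNR: 51.632, SSIM Mean Y: 0.9927318 (21.386 dB)")
PASS2 = ("encoded 21312 frames in 124205.99s (0.17 fps), 2500.86 kb/s, "
         "Avg QP:23.93, Global PSNR: 51.261, SSIM Mean Y: 0.9937978 (22.075 dB)")


def parse_summary(line: str) -> Dict[str, float]:
    """Extract frame count, elapsed seconds and bitrate from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not an x265 summary line")
    return {
        "frames": int(match["frames"]),
        "seconds": float(match["seconds"]),
        "kbps": float(match["kbps"]),
    }


def report(lines: List[str], source_fps: float = 24.0) -> Dict[str, float]:
    """Combine all passes; size and duration come from the last pass."""
    runs = [parse_summary(line) for line in lines]
    total_seconds = sum(run["seconds"] for run in runs)
    last = runs[-1]
    duration = last["frames"] / source_fps  # playback time in seconds
    return {
        "total_hours": total_seconds / 3600,
        "slowdown_vs_realtime": total_seconds / duration,
        "final_size_mb": last["kbps"] * 1000 / 8 * duration / 1e6,
    }


if __name__ == "__main__":
    for key, value in report([PASS1, PASS2]).items():
        print(f"{key}: {value:.2f}")
```

Both passes together come to roughly 72 hours for a clip that plays for under 15 minutes.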
Old 23rd August 2017, 13:58   #5560  |  Link
Sagittaire
Testeur de codecs
 
Sagittaire's Avatar
 
Join Date: May 2003
Location: France
Posts: 2,484
Quote:
Originally Posted by Atak_Snajpera View Post
well, with 4K encoding with this command line:

Quote:
ffmpeg\ffmpeg.exe -i Sample\Exodus_UHD_HDR_Exodus_draft.mp4 -an -f rawvideo - | x265\x265.exe --input-res 3840x2160 --fps 23.976 - -o Output\x265_2160p.265 --input-depth 10 --output-depth 10 --crf 24 --preset medium --tune grain
Code:
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|     CPU             |   x264  |   x265  |   LAVC  |   auto  |   MMX2  |    SSE  |   SSE2  |   SSE3  |   SSE4  |    AVX  |   AVX2  |    All  |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|  Core i9-7900X      |  32.97  |   6.24  |   157   |   4.70  |   1.57  |   1.58  |   2.50  |   2.74  |   4.07  |   3.98  |   4.82  |    N/A  |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Threadripper 1950X  |  37.59  |   6.50  |   136   |   4.65  |   2.02  |   2.00  |   2.95  |   3.10  |   3.89  |   4.10  |   4.26  |    N/A  |
|---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
With the Core i9-7900X @ 4.5 GHz and the 1950X @ stock.

At stock, the Core i9-7900X will be at 27.97 fps for x264 and 5.29 fps for x265.

As I always say, massive instance encoding is a real problem for Ryzen.

Anyway, the i9-7960X 16C/32T or i9-7980XE 18C/36T will certainly give much better results than the 1950X for x264 and x265 encoding. But that's not the case for the i9-7900X 10C/20T, even with a massive OC to 4.5 GHz.
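
For a quick side-by-side reading of these figures, the sketch below computes the 1950X's lead over the i9-7900X from the x264 and x265 columns, both against the overclocked results in the table and against the stock figures quoted in the text. It assumes the table values are frames per second, like the stock figures; the dictionary layout and function name are mine.

```python
from typing import Dict

# x264 and x265 columns of the table above, plus the stock figures from the text.
I9_7900X_OC = {"x264": 32.97, "x265": 6.24}     # @ 4.5 GHz
I9_7900X_STOCK = {"x264": 27.97, "x265": 5.29}  # stock figures from the text
TR_1950X_STOCK = {"x264": 37.59, "x265": 6.50}


def relative(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Per-encoder ratio a / b (values above 1.0 mean a is faster)."""
    return {name: a[name] / b[name] for name in a}


if __name__ == "__main__":
    print("1950X vs 7900X @ 4.5 GHz:", relative(TR_1950X_STOCK, I9_7900X_OC))
    print("1950X vs 7900X @ stock:  ", relative(TR_1950X_STOCK, I9_7900X_STOCK))
    print("7900X stock vs OC:       ", relative(I9_7900X_STOCK, I9_7900X_OC))
```

On these numbers the 1950X leads the overclocked 7900X by about 14% in x264 and 4% in x265, and the stock 7900X figures are about 85% of the overclocked ones for both encoders.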
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 23rd August 2017 at 14:23.