Old 12th January 2023, 21:29   #141  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Posts: 343
Quote:
Originally Posted by FranceBB View Post
m6i.4xlarge 16c/16th AVX-512 CPU + OpenCL GPU
1h 03m
Which GPU is available in an m6i.4xlarge instance ?
Old 12th January 2023, 23:16   #142  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Quote:
Originally Posted by rwill View Post
Which GPU is available in an m6i.4xlarge instance ?
Since everything you run through AWS is virtualized (unless you specifically require bare metal), it's hard to know, but I believe they're running an NVIDIA Tesla M6 8GB GDDR5.
Old 17th January 2023, 18:52   #143  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by FranceBB View Post
I actually tested again but this time on Amazon AWS machines.

m6i.4xlarge 16c/16th AVX-512 CPU Only
1h 24m

m6i.4xlarge 16c/16th AVX-512 CPU + OpenCL GPU
1h 03m
...
adding --opencl saved around 20 min compared to not adding it.
So... yeah, it still makes sense to have it on.
Yeah, that follows the "about a 20% speedup" rule of thumb.

That said, a c6a.4xlarge (AMD EPYC 7R13) may well offer the same performance software-only at a lower per-hour instance cost.
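
As a quick sanity check on the numbers quoted above (an editorial sketch, not part of the original post), the two timings can be compared in two ways: as wall-clock time saved and as throughput gained. The Python below uses only the 1h 24m and 1h 03m figures from the quote; the variable names are illustrative.

```python
# Timings quoted above for the same encode on an m6i.4xlarge.
cpu_only_min = 1 * 60 + 24       # 1h 24m, CPU only
with_opencl_min = 1 * 60 + 3     # 1h 03m, CPU + --opencl

saved_min = cpu_only_min - with_opencl_min

# Fraction of the original wall-clock time that was saved.
time_reduction = saved_min / cpu_only_min

# Extra work done per unit of time (frames per minute scale the same way).
throughput_gain = cpu_only_min / with_opencl_min - 1

print(f"saved {saved_min} min")                             # saved 21 min
print(f"wall-clock time reduced by {time_reduction:.0%}")   # 25%
print(f"throughput up by {throughput_gain:.0%}")            # 33%
```

The 21 minutes saved are 25% of the CPU-only wall-clock time, which is the same thing as roughly 33% more throughput, so the percentage a rule of thumb refers to depends on which of the two is being measured.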
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 10th February 2023, 11:36   #144  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Quote:
Originally Posted by LoRd_MuldeR View Post
Probably there simply are no "high bit-depth" functions that would benefit from AVX-512 instructions, or at least nobody cared enough about AVX-512 to figure it out
According to an old Intel whitepaper from the early days of x265, at the start of the 201x years - https://www.intel.com/content/dam/de...265-avx512.pdf

MPEG codecs at more than 8 bits benefit more from AVX512 because they compute on sources, results and intermediate values that are 2x wider (16-bit source and output with 32-bit intermediates) compared with 8-bit codecs (8-bit source and output, 16-bit intermediates possible). The median performance boost from AVX512 over AVX2 is about 40% in the 'processing kernels' of the MPEG encoder.


But 10-bit x264 apparently never attracted major usage, so no developer resources went into either wider AVX512 computing within the existing x264 architecture (a SIMD 'workunit' size of 512 bytes max, the AVX2 register file) or the much more complex design of a separate branch with a 4x larger 'workunit' size for the 2048-byte register file of the AVX512 x64 environment. And that AVX512 branch would only have run faster on the AVX512 hardware that was rare among end users in previous years. And no rich investor put up a grant to support an AVX512 redesign of x264 (for possible commercial use).

When a company starts a project on AVX512 hardware, it can invest in professional programmers to design a software solution for the target platform and use all the benefits of the AVX512 environment: splitting the problem into 'large workunit' chunks of up to 2048 bytes for the best processing performance on AVX512, and aligning workunit addresses to the 64-byte cache line size for the fastest transfer to and from the AVX512 register file and dispatch ports. Freeware open-source developers working on multiplatform code typically write C reference solutions, rely on compiler vectorization where possible, and may limit the 'workunit size' of an algorithm to suit the smaller platforms more widely used by end users, because that works faster on chips with a small register file (fewer reloads from cache). So in practice, when open-source developers design and profile algorithms for best performance on cheap old platforms with a small register file (256 or 512 bytes for SSE2 and AVX2 in 64-bit mode), they actually optimize the algorithm for small-register-file platforms only, and the resources of AVX512 platforms, expensive in the past, are left underused. Full optimization for AVX512 includes using the 2x wider execution ports (and some faster instructions), the 2x wider data word transfers, and the 4x larger register file, which increases the processed 'workunit' size. And changing the 'workunit' size may need a significant redesign of the software (like processing 4 blocks in a single pass instead of 1 block with AVX2, and so on).
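
To make the register file sizes mentioned above concrete, here is a small Python sketch of the arithmetic (an editorial illustration). It assumes 64-bit mode (16 xmm/ymm registers, 32 zmm registers) and that samples above 8 bits are stored in 16-bit words; the class and helper names are made up for this example and have nothing to do with x264's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdRegisterFile:
    name: str
    registers: int     # architectural vector registers in 64-bit mode
    width_bytes: int   # bytes per register

    @property
    def total_bytes(self) -> int:
        return self.registers * self.width_bytes

    def samples_per_register(self, bytes_per_sample: int) -> int:
        return self.width_bytes // bytes_per_sample


SSE2 = SimdRegisterFile("SSE2 (xmm0-15)", 16, 16)
AVX2 = SimdRegisterFile("AVX2 (ymm0-15)", 16, 32)
AVX512 = SimdRegisterFile("AVX-512 (zmm0-31)", 32, 64)

CACHE_LINE_BYTES = 64  # one zmm register is exactly one cache line


def is_cache_line_aligned(address: int) -> bool:
    return address % CACHE_LINE_BYTES == 0


for rf in (SSE2, AVX2, AVX512):
    # 8-bit video uses 1 byte per sample, 10-bit video is held in 2 bytes.
    print(rf.name, rf.total_bytes, "bytes,",
          rf.samples_per_register(1), "8-bit samples or",
          rf.samples_per_register(2), "10-bit samples per register")

print("AVX-512 / AVX2 register file ratio:",
      AVX512.total_bytes // AVX2.total_bytes)
```

The totals come out as 256, 512 and 2048 bytes, which is where the "4x larger" figure comes from: twice the register count times twice the register width.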

AVX512 programs are significantly more complex to design and debug. Without an AVX512 chip it is possible to develop using the Intel SDE software emulator, but it cannot provide correct profiling of performance.
Maybe with the progress of AI like ChatGPT or others we will see some progress in using new compute platforms for old tasks like x264, because the resources of open-source freeware programmers seem to be dying out fast with the ending of the current civilization.

Quote:
Originally Posted by FranceBB View Post
m6i.4xlarge 16c/16th AVX-512 CPU Only

x264.exe "Z:\AVS Script.avs" --preset medium --profile high --level 4.1 --ref 4 --deblock -1:-1 --crf 25 --keyint 50 --aud --overscan show --range tv --opencl --colormatrix bt709 --transfer bt709 --colorprim bt709 --videoformat component --nal-hrd vbr --vbv-maxrate 25000 --vbv-bufsize 25000 --output "I:\temp\raw_video.h264"
Do the x264 builds of 2022..2023 already include AVX512 in their auto-detection of the execution platform's capabilities? Or does the user still need to force it with the --asm avx512 (or maybe --asm=avx512) command line option?

Quote:
Originally Posted by LigH View Post

I can't use ICL, I don't have any experience with it.
The Intel C compiler can use multi-file interprocedural optimization (it does not always work and may require fixing the sources). It can visibly help the performance of a complex program with many small processing functions (and many C source files). However, as the complexity of the program increases, the probability of a successful whole-program multi-file IPO decreases. But you can still try to enable it for separate projects of the solution. Sometimes you need several compile runs before one of them finally ends with a successful multi-file IPO.

It is also possibly the best compiler for making AVX512-targeted builds (and Intel builds individually optimized for many Intel chip families).

Last edited by DTL; 10th February 2023 at 12:24.
Old 10th February 2023, 17:51   #145  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Quote:
Originally Posted by DTL View Post
But 10-bit x264 apparently never attracted major usage, so no developer resources went into either wider AVX512 computing within the existing x264 architecture (a SIMD 'workunit' size of 512 bytes max, the AVX2 register file) or the much more complex design of a separate branch with a 4x larger 'workunit' size for the 2048-byte register file of the AVX512 x64 environment.
Yeah, x264 is almost always used in the 8bit flavor for distribution, however professionally AVC Intra and XAVC Intra are two very common use cases for both FULL HD and UHD workflows (like Intra Class 300 and 480) and there AVX512 would definitely speed things up a lot. It's a shame that no one invested into it, though.

Quote:
Originally Posted by DTL View Post
no rich investor put up a grant to support an AVX512 redesign of x264 (for possible commercial use).
We're already supporting Multicoreware for x265 (and the future x266) by being part of their partner program, but unfortunately they're not the ones behind x264...
I'm sure that there are other companies using x264 to encode either AVC Intra files or XAVC files using the Sony and Panasonic profiles for linear playout beside us, so if they could chip in and support the development, that would be greatly appreciated.


Quote:
Originally Posted by DTL View Post
Do the x264 builds of 2022..2023 already include AVX512 in their auto-detection of the execution platform's capabilities? Or does the user still need to force it with the --asm avx512 (or maybe --asm=avx512) command line option?
Ironically, x264 uses AVX-512 automatically once it detects a compatible CPU to run on; x265, however, still needs --asm=avx512 to make use of it.
Why? Dunno.
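
For anyone who wants to check what the host itself reports before looking at encoder behaviour, a minimal Python sketch follows (an editorial illustration). It assumes a Linux system, where /proc/cpuinfo has a "flags" line and AVX-512 Foundation shows up as avx512f; the function names are illustrative, and this is not how x264 or x265 do it, since the encoders perform their own runtime detection.

```python
from pathlib import Path


def cpu_flags_from_cpuinfo(text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or non-Linux input)


def has_avx512(flags) -> bool:
    # avx512f (Foundation) is the baseline subset every AVX-512 CPU reports.
    return "avx512f" in flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_flags_from_cpuinfo(cpuinfo.read_text())
        print("AVX-512 reported by the OS:", has_avx512(flags))
    else:
        print("No /proc/cpuinfo here (not Linux?), cannot check this way.")
```

On a virtualized instance this reports what the hypervisor exposes to the guest, which is what the encoder will see as well.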
Old 10th February 2023, 18:57   #146  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
"professionally AVC Intra and XAVC Intra are two very common use cases for both FULL HD and UHD workflows (like Intra Class 300 and 480) and there AVX512 would definitely speed things up a lot"

Intra may already be very simple and very fast, so the total workflow may benefit too little if someone makes intra faster. The slowest parts are likely in inter-frame encoding, where motion search is required. Also, as far as I can see, 10-bit x264 builds for AVC-Intra are almost impossible to find today. Maybe they exist somewhere in special builds of ffmpeg.

The 8-bit 'classic' x264 throws an error with YV12 Avisynth input -
Code:
at -I 1 --avcintra-class=100
x264 [error]:  8-bit AVC-Intra is not widely compatible
x264 [error]: 10-bit x264 is required to encode AVC-Intra
x264 [error]: x264_encoder_open failed
With a simple I-frames-only simulation using -I 1 it runs at about 12 fps vs 3 fps for IPB encoding, so 'intra-only' encoding is about 4x faster.

"I'm sure that there are other companies using x264 to encode either AVC Intra files or XAVC files using the Sony and Panasonic profiles for linear playout beside us, so if they could chip in and support the development, that would be greatly appreciated."

If a commercial company uses freeware for production, it may be too poor to pay even its existing, poorly paid workers, and have no funds to pay professional programmers to make open-source software better. If you have some funds, you could try posting a contract offer on software developer job sites with an exact task to pay for, like making an x264 build run 2x faster on a defined AVX512 chip with your required arguments and provided sample footage. Maybe such jobs exist and have already been solved, but the results were not offered back for free as an open-source commit to the standard project.

Also, current business models are about making money for business owners, not about reducing the workload of hired workers for the same pay. So if a worker wants to encode faster and spend less working time, it is up to the worker alone to read the Intel docs for the chip and make the software run faster. The business owner will not pay for it.
Also, as typical employment contracts state, all enhancements a worker makes to software become the property of the business owner. So under such contracts it may be illegal to share any enhancements to x264 made on time paid for by the business owner.

Last edited by DTL; 10th February 2023 at 19:43.
Old 11th February 2023, 00:22   #147  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Quote:
Originally Posted by DTL View Post
Also, as far as I can see, 10-bit x264 builds for AVC-Intra are almost impossible to find today.
Just use any of the JPSDR builds and you're good to go.

Quote:
Originally Posted by DTL View Post
The 8-bit 'classic' x264 throws an error with YV12 Avisynth input -
Code:
at -I 1 --avcintra-class=100
x264 [error]:  8-bit AVC-Intra is not widely compatible
x264 [error]: 10-bit x264 is required to encode AVC-Intra
x264 [error]: x264_encoder_open failed
Yeah, because the Panasonic and Sony specs are 10-bit only, so the preset won't allow it; however, you can do it manually without the preset if you really want to.
Keep in mind that the command line switch is a bit like the Blu-ray switch: it will make life easier for you, but you can also use the options and do it manually. https://forum.doom9.org/showthread.php?t=182715

Quote:
Originally Posted by DTL View Post
With a simple I-frames-only simulation using -I 1 it runs at about 12 fps vs 3 fps for IPB encoding, so 'intra-only' encoding is about 4x faster.
Well, there are also the "Long GOP" versions of AVC as per the Sony specifications, and those use a specific closed GOP and definitely make use of P and B frames.

About the last point, it's not true: this forum is the demonstration, as we have people contributing to open source projects from all over the world and working at several different companies. Our Ben here works at Amazon, for instance, and they contributed to x265 with the NEON assembly optimization for ARM among other things; Steinar works at NRK and contributed the Sony and Panasonic flavours we talked about in x264; Kieran works at Open Broadcast Systems and made x262, etc. (I could go on and on and on), so companies do contribute to open source encoders.
Old 11th February 2023, 14:42   #148  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Quote:
Originally Posted by FranceBB View Post
I'm sure that there are other companies using x264 to encode either AVC Intra files or XAVC files using the Sony and Panasonic profiles for linear playout beside us, so if they could chip in and support the development, that would be greatly appreciated.
For I-frames-only encoding you may check this build - https://github.com/DTL2020/x264/releases/tag/Rel_130223 . It has some _x9_8x8 and _x9_4x4 SAD/SATD comparisons in intra analysis disabled (temporarily, for a performance check). On an old E7500 CPU it runs about 34% faster but produces about 10% more bitrate (with the fixed-bitrate AVC-Intra modes it may instead only degrade quality to some degree, depending on frame complexity).
Later I hope to move this function call to multi-block SIMD processing with AVX2/AVX512 to keep full quality.

Updated: fixed the build to include processing of the 4:2:2 and 10-bit formats required for AVC-Intra, with the _x9 macroblock comparisons skipped. It looks like these functions still do not exist at all as a single-call _x9 routine for > 4:2:0 and/or > 8-bit processing, which always runs the longer loop of checking each predictor separately.

On an i5-11600 CPU this build, with simulated AVC-Intra 100 settings for a FullHD frame (from https://forum.doom9.org/showthread.p...82#post1940382 post), runs at about 22.9 fps (the jpsdr build marked 'winthreads' runs at about 20 fps).

As more profiling shows for I-frames-only, high-bitrate encoding with CAVLC-only compression:
It looks like x264 is not optimized at all for such production. AVC-Intra classes above 50 do not allow CABAC (which has some asm optimizations), so x264 can only run with CAVLC compression. CAVLC has only a C implementation with almost no asm optimizations. And it is not about math computing but mostly about shuffling fairly small, random-length byte streams, so SIMD units are unlikely to help a lot here. It is mostly a memory-bound task, at least at first look at the CAVLC compressor.
It also shows much more clearly an x264 trait: lower performance at lower compression ratios. If you remove the hard fixed bitrate from AVC-Intra and let x264 run with CRF rate control and VBR, it runs significantly faster. With IBP encoding this is also visible, but much less so. So for I-frames only at high fixed bitrates, performance looks limited by the rate control logic that keeps the required very high bitrate stable. If you disable CBR and allow a low CRF, the analysis runs several times faster (about 80 fps vs 20 fps).
Maybe for professional usage you can find another MJPEG-like intra-frame encoder with a much better implementation of fixed CBR output.

Last edited by DTL; 13th February 2023 at 09:24.
Old 20th February 2024, 15:35   #149  |  Link
blob2500
Registered User
 
Join Date: Sep 2007
Location: Italy
Posts: 24
Latest 'official' x64 build (r3179): 3MB only?

https://artifacts.videolan.org/x264/release-win64/

Last edited by blob2500; 20th February 2024 at 15:54.
Old 20th February 2024, 16:23   #150  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,750
It most probably does not contain all the encoders specialized for the various internal bit depths (8+10 bit). The higher bit depths in particular require a lot more effort and code to handle more than 1 byte per video component in assembler.

New MABS compile: x264 0.164.3179 12426f5
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 21st February 2024 at 23:30.
Old 20th February 2024, 18:34   #151  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Any release notes?
Old 20th February 2024, 18:58   #152  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
Quote:
Originally Posted by benwaggoner View Post
Any release notes?
https://code.videolan.org/videolan/x...commits/master
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.
Old 21st February 2024, 00:36   #153  |  Link
blob2500
Registered User
 
Join Date: Sep 2007
Location: Italy
Posts: 24
I see now that there is no lavf support in the win64 version. There is in the win32 version.

Code:
x264 core:164 r3179 12426f5
Syntax: x264 [options] -o outfile infile

Infile can be raw (in which case resolution is required),
  or YUV4MPEG (*.y4m),
  or Avisynth if compiled with support (yes).
  or libav* formats if compiled with lavf support (no) or ffms support (no).

Last edited by blob2500; 21st February 2024 at 00:42.
Old 21st February 2024, 21:35   #154  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,750
From which source? My MABS build supports both LAVF and FFMS in both bitnesses, also L-SMASH MP4 output.
Old 22nd February 2024, 03:06   #155  |  Link
blob2500
Registered User
 
Join Date: Sep 2007
Location: Italy
Posts: 24
I was still referring to the latest 'official' build (r3179, 3MB).

https://artifacts.videolan.org/x264/release-win64/

Up to r3173 (win64) there was support for lavf:

Code:
x264 core:164 r3173 4815cca
Syntax: x264 [options] -o outfile infile

Infile can be raw (in which case resolution is required),
  or YUV4MPEG (*.y4m),
  or Avisynth if compiled with support (yes).
  or libav* formats if compiled with lavf support (yes) or ffms support (no).
--

Thanks for your builds with full lavf+ffms support. I've been using them for a long time.

Last edited by blob2500; 22nd February 2024 at 03:12.
Old 22nd February 2024, 19:40   #156  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,308
You can also check my builds. I'm doing the builds right now; they should be finished before the weekend.
__________________
My github.
Old 23rd February 2024, 11:35   #157  |  Link
blob2500
Registered User
 
Join Date: Sep 2007
Location: Italy
Posts: 24
Thank you.
But do you know why there is no longer support for lavf in the latest version of Videolan's x264 (win64 build)? Is it an error?

Last edited by blob2500; 23rd February 2024 at 11:38.
Old 23rd February 2024, 18:55   #158  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,308
No idea...
Old 23rd February 2024, 21:25   #159  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,750
You would have to ask those people who are responsible for building these binaries. Not us, though...