Old 3rd September 2017, 19:01   #1  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
x265 encoding on arm64

Hi,
I'm on aarch64/arm64/armv8l and trying to figure out how to transcode video (mpeg1, mpeg2, avc) into h265 with opus audio.
FFmpeg's built-in h265 encoder for debian squeezy does not currently support multicore encoding (surprisingly, h264 does). Furthermore, in the current version, ffmpeg's opus encoder also appears broken.
So I'd need to use ffmpeg to decode the video and feed it directly to the x265 encoder, then decode the audio and create an opus file with opusenc, and finally mux those two streams.
Can someone advise me how to do this?

I'd have grabbed Handbrake a long time ago, but it's also not available for arm64 yet.

Thanks
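
One way to sketch the pipeline described in this post, assuming the standalone x265 and opusenc command-line tools are installed, plus mkvmerge (MKVToolNix) for muxing. The function name and the intermediate file names are made up for illustration, and the preset and bitrate are only placeholders:

```shell
# Sketch only: decode with ffmpeg, encode with the standalone tools, then mux.
# transcode_piped, video.hevc and audio.opus are illustrative names.
transcode_piped() {
    src=$1
    dst=$2

    # Video: ffmpeg decodes to YUV4MPEG2 on stdout, x265 reads it from stdin.
    ffmpeg -nostdin -i "$src" -an -pix_fmt yuv420p -f yuv4mpegpipe - |
        x265 --y4m --input - --preset slower --output video.hevc

    # Audio: ffmpeg decodes to WAV on stdout, opusenc reads it from stdin.
    ffmpeg -nostdin -i "$src" -vn -f wav - |
        opusenc --bitrate 64 - audio.opus

    # Mux both streams into one Matroska file.
    mkvmerge -o "$dst" video.hevc audio.opus
}
```

It would be called as `transcode_piped input.mpg output.mkv`. Keeping the three stages separate makes it easy to swap any one tool if the distribution's build of it turns out to be broken.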
Old 3rd September 2017, 23:03   #2  |  Link
birdie
Artem S. Tashkinov
 
Join Date: Dec 2006
Posts: 248
You don't want to do that. Seriously.
Old 3rd September 2017, 23:33   #3  |  Link
RanmaCanada
Registered User
 
Join Date: May 2009
Posts: 299
Do you have years to wait? Seriously, you do not want to do x265 on arm.
Old 4th September 2017, 06:51   #4  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
Quote:
Originally Posted by birdie View Post
You don't want to do that. Seriously.
Quote:
Originally Posted by RanmaCanada View Post
Do you have years to wait? Seriously, you do not want to do x265 on arm.
Putting aside for a moment the fact that I asked for something else, care to explain why?

Last edited by wqcr; 4th September 2017 at 06:59.
Old 4th September 2017, 12:28   #5  |  Link
Ely
Registered User
 
Join Date: Dec 2014
Posts: 40
x265 on arm64, even with NEON, is slower by several orders of magnitude than x86.

Your best bet is to have a SoC with a HEVC hardware encoder and use that.

To actually answer your question: you need to build FFmpeg linked with libx265 and libopus. This way, you can encode streams with commands roughly like this:

Code:
ffmpeg -i <input> -c:v libx265 -c:a libopus <output>
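
Before running this, it may help to check that the local ffmpeg build actually lists both encoders, and then to add explicit rate settings. A sketch, assuming the preset and bitrates reported later in this thread (slower, 400 kbit/s video, 64 kbit/s Opus); the function names are illustrative:

```shell
# Succeeds only if this ffmpeg build lists both libx265 and libopus as encoders.
has_hevc_opus() {
    encoders=$(ffmpeg -hide_banner -encoders 2>/dev/null)
    printf '%s\n' "$encoders" | grep -q libx265 &&
        printf '%s\n' "$encoders" | grep -q libopus
}

# One-step transcode to HEVC video and Opus audio.
# $1 = input file, $2 = output file (a Matroska .mkv is a safe choice).
encode_hevc_opus() {
    ffmpeg -i "$1" -c:v libx265 -preset slower -b:v 400k \
        -c:a libopus -b:a 64k "$2"
}
```

Typical use would be `has_hevc_opus && encode_hevc_opus input.mpg output.mkv`.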
Old 4th September 2017, 13:12   #6  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
I don't know, really....
x265 on arm64 is running 0.35-0.45fps on 'slower' preset for 352x272 MPEG-2 video

When encoding the same file on i7-3820QM downclocked to 2.2GHz I get 3.4fps

But this is most likely due to the fact that x265 on arm64 is not multithreaded:




With as efficient multithreading as on x86-64, the performance would be on-par, or much closer, certainly not magnitudes slower.
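
As a quick sanity check on "orders of magnitude", the ratio of the two measurements quoted in this post can be worked out directly. A small sketch using awk; the function name is made up:

```shell
# Print how many times faster one fps figure is than another, to one decimal.
speed_ratio() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

speed_ratio 3.4 0.45   # i7 vs. the best arm64 figure
speed_ratio 3.4 0.35   # i7 vs. the worst arm64 figure
```

That comes out at roughly 7.6x to 9.7x, so a little under one order of magnitude for these particular settings.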

Last edited by wqcr; 4th September 2017 at 13:16.
Old 5th September 2017, 00:27   #7  |  Link
birdie
Artem S. Tashkinov
 
Join Date: Dec 2006
Posts: 248
It will be magnitudes slower because mobile SoCs are just not suitable for such intensive workloads. I'm not even sure your SoC is not already throttling when using a single thread.
Old 5th September 2017, 06:03   #8  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
Quote:
Originally Posted by birdie View Post
It will be magnitudes slower because mobile SoCs are just not suitable for such intensive workloads. I'm not even sure your SoC is not already throttling when using a single thread.
There are a lot of assumptions in that statement.
Have you actually tried it before assuming all ARM SoCs are just too weak to handle this?

I did. Not only does the SoC not throttle after days and days of 100% load (all 8 cores), its performance is also equivalent to an i5-3340M with HT at 3.2GHz (Geekbench3 and opus decoding speed).
It doesn't have any difficulty with x264, even multithreaded. By all means, it's not only suitable for x265 encoding, it should even be preferred due to its vastly superior power efficiency, several magnitudes better than even the latest generation of Kaby Lake.
The only drawback comes from the current implementation of x265, which on arm64 supposedly neither uses NEON nor is multithreaded.

Still, with manual parallelization, 2.5+fps is possible, which isn't that far from the above result measured on the i7 quad.
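
The post does not say how the manual parallelization was done. One possible approach, sketched here purely as an assumption, is to cut the source into segments without re-encoding, run one encoder per segment in the background, and join the results afterwards. The function names, segment length and file patterns are hypothetical:

```shell
# Split the input into roughly 5-minute pieces at keyframes, without re-encoding.
split_input() {
    ffmpeg -nostdin -i "$1" -c copy -f segment -segment_time 300 \
        -reset_timestamps 1 seg_%03d.mkv
}

# Encode every segment given as an argument in parallel, one ffmpeg each.
# Pass no more segments at a time than there are CPU cores.
encode_parallel() {
    for seg in "$@"; do
        ffmpeg -nostdin -i "$seg" -c:v libx265 -preset slower -b:v 400k \
            -c:a libopus -b:a 64k "enc_$seg" &
    done
    wait   # block until all background encoders have finished
}

# Join the encoded segments with ffmpeg's concat demuxer.
join_segments() {
    for f in enc_seg_*.mkv; do printf "file '%s'\n" "$f"; done > list.txt
    ffmpeg -nostdin -f concat -safe 0 -i list.txt -c copy "$1"
}
```

The trade-off is that rate control runs per segment, so quality may vary slightly across the joins compared with a single encode, and audio encoded per segment can show small discontinuities at the boundaries.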

Last edited by wqcr; 5th September 2017 at 06:09.
Old 5th September 2017, 09:06   #9  |  Link
littlepox
Registered User
 
Join Date: Nov 2012
Posts: 218
Given x265 is still at its feature expanding & quality enhancing stage, it makes little sense to burn resources and optimize it for arm devices.
Old 9th September 2017, 08:04   #10  |  Link
x265_Project
Guest
 
Posts: n/a
Quote:
Originally Posted by wqcr View Post
There are a lot of assumptions in that statement.
Have you actually tried it before assuming all ARM SoCs are just too weak to handle this?

I did. Not only does the SoC not throttle after days and days of 100% load (all 8 cores), its performance is also equivalent to an i5-3340M with HT at 3.2GHz (Geekbench3 and opus decoding speed).
It doesn't have any difficulty with x264, even multithreaded. By all means, it's not only suitable for x265 encoding, it should even be preferred due to its vastly superior power efficiency, several magnitudes better than even the latest generation of Kaby Lake.
The only drawback comes from the current implementation of x265, which on arm64 supposedly neither uses NEON nor is multithreaded.

Still, with manual parallelization, 2.5+fps is possible, which isn't that far from the above result measured on the i7 quad.
Not multithreaded? x265 is cross-platform C/C++ code, and it can be compiled for several different microprocessor architectures (x86, ARM, PowerPC). x265 is always multi-threaded. That's in the software architecture (the thread pools feature, frame parallelism, Wavefront Parallel Processing, etc.). It can't possibly run single-threaded unless you explicitly make it do that with the --pools option.

We have some limited ARM Neon optimization (x265\source\common\arm), but this is not anywhere near as complete as our x86 SIMD optimization. We've had discussions with various people at various times about doing a full optimization effort, but as of today this hasn't bubbled up to the top of the priority list for our customers or our strategic hardware partners. Of course, x265 is open source, and contributions are always welcomed.
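
To check or control threading from the ffmpeg side, x265 options can be passed through -x265-params. A sketch, assuming a Linux system where nproc is available; the function name is illustrative. As far as I know, x265 reports the thread pool it created in its [info] log lines at startup, which is a quick way to see whether more than one thread is really in use:

```shell
# Encode video only, with an explicit x265 thread pool size.
# $1 = input, $2 = output, $3 = thread count (defaults to the core count).
encode_with_pools() {
    threads=${3:-$(nproc)}
    ffmpeg -i "$1" -c:v libx265 -preset slower \
        -x265-params "pools=$threads" -an "$2"
}
```

For example, `encode_with_pools input.mkv output.mkv 8` asks for an 8-thread pool. According to the x265 documentation, passing none instead of a number should disable the worker pool, which would be the explicit single-pool-off case mentioned above.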
Old 10th September 2017, 23:29   #11  |  Link
mandarinka
Registered User
 
Join Date: Jan 2007
Posts: 729
Quote:
Originally Posted by wqcr View Post
I don't know, really....
x265 on arm64 is running 0.35-0.45fps on 'slower' preset for 352x272 MPEG-2 video
What SoC/CPU are you using? "arm64" could mean anything from an awfully broad spectrum of slow to reasonably fast chips. A Cortex-A53 is quite different from a higher-end out-of-order core, or even the architectures Apple has implemented. Clocks matter, the number of cores too, etc.
Old 30th September 2017, 11:32   #12  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
Encode finished in just under 78 hours - 7 h264 clips, 41mins long each at 352x272 - encoder preset "slower", bitrate based 400kbps, 1pass, audio 64k opus
I'm quite satisfied with the result. CPU throttled a little with all its cores loaded, but only 10-20% under extreme conditions (35C ambient).
2-pass encoding seems to be broken though: even with the correct params, the log files were not created.
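
For what it's worth, when encoding through ffmpeg's libx265 wrapper, two-pass mode is usually selected via -x265-params rather than ffmpeg's generic -pass option, which might explain a missing stats file. A sketch under that assumption, reusing the 400k/64k settings from this post; the function name is illustrative:

```shell
# Two-pass libx265 encode: pass 1 only writes the stats file, pass 2 encodes.
# $1 = input file, $2 = output file.
encode_two_pass() {
    ffmpeg -y -i "$1" -c:v libx265 -preset slower -b:v 400k \
        -x265-params pass=1 -an -f null /dev/null &&
    ffmpeg -i "$1" -c:v libx265 -preset slower -b:v 400k \
        -x265-params pass=2 -c:a libopus -b:a 64k "$2"
}
```

After the first pass, a stats file (x265_2pass.log by default, as far as I know) should appear in the working directory; if it does not, the second pass has nothing to read.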

If you really want to know, this is the system used for encoding:
Redmi Note 4 Global version
CPU - Snapdragon 625, A53 octa-core at 2.02GHz
RAM - 4GB LPDDR3-1600
Setup - Rooted AOSGP X 2.11 on Android 7.1.1, Debian Stretch running through chroot, using hotspot mode in conjunction with sshd and vnc server to operate the machine remotely.
Typical power consumption - 3.7W on full load

Last edited by wqcr; 30th September 2017 at 11:44.
Old 2nd October 2017, 06:15   #13  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,956
Neat! Thanks for sharing your experience
Old 16th October 2017, 14:14   #14  |  Link
wqcr
Registered User
 
Join Date: Sep 2007
Posts: 24
Another encode finished, this time 2-pass PAL source.
Preset slower, fast 1-pass, v-bitrate 400, a-bitrate 64 (opus)
7 clips (each 7 minutes long) were finished in 22 hours.

This time I used Slimrom, which by default disables any thermal throttling, so the CPU was at 2016 MHz all the time. Tcase was just under 74C, and the phone's outer case was no more than 48C. Power consumption jumped to 4W, so a beefier 5V/2A supply had to be used.

Still, I'm again very satisfied with the result, even though x265 hasn't used any arm64 optimizations.
Performance is more or less directly comparable to a similarly clocked C2Q, except that the C2Q would draw 12 times the power of this little SoC.
Old 17th October 2017, 15:01   #15  |  Link
Motenai Yoda
Registered User
 
Join Date: Jan 2010
Posts: 709
7 clips x 7 min each = 49 min.
If those are PAL, that is usually 720x576 at 25fps, so
about 385kPx/s. That looks a bit too slow to me for a 2.0GHz C2Q; also, it is pointless to compare the efficiency of a 10-year-old CPU with a new one.

My rpi3 reaches 200kPx/s (1.2GHz, armv6 + neon; maybe I have to compile it with more appropriate flags).
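
The 385kPx/s figure follows from the numbers given in the previous post (49 minutes of footage, 22 hours of encoding), assuming full PAL resolution and frame rate. A small sketch of the arithmetic using awk; the function name is made up:

```shell
# Pixel throughput in kPx/s:
# width * height * fps * clip_seconds / encode_seconds / 1000
kpx_per_s() {
    awk -v w="$1" -v h="$2" -v fps="$3" -v clip_s="$4" -v enc_s="$5" \
        'BEGIN { printf "%.0f\n", w * h * fps * clip_s / enc_s / 1000 }'
}

# 49 min of 720x576 at 25 fps, encoded in 22 hours
kpx_per_s 720 576 25 $((49 * 60)) $((22 * 3600))
```

This prints 385. Note that the 22 hours covered a two-pass job, so the figure describes the whole job and not the speed of a single pass.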
__________________
powered by Google Translator

Last edited by Motenai Yoda; 17th October 2017 at 15:09.
Old 12th November 2017, 19:03   #16  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,047
Would be nice if someone with one of these Qualcomm 2400 ARM chips could do some x264 and x265 encode tests: https://blog.cloudflare.com/arm-takes-wing/
Old 30th December 2020, 15:50   #17  |  Link
ReinerSchweinlin
Registered User
 
Join Date: Oct 2001
Posts: 414
Since I stumbled across some mentions of x265 on the M1 around the web over the past few days, I wanted to see if there are any in-depth benchmarks of the M1 Handbrake/ffmpeg HEVC variants.

Found some interesting stuff:

https://www.reddit.com/r/hardware/co..._i_benched_it/

https://www.youtube.com/watch?v=iGVK...ature=youtu.be

Does anyone have some more comparisons?

Looking at the very low power a Mac Mini draws, and given the price, it looks like the M1 chip could be a very interesting option for x265 HEVC encoding if "the fastest around" is not needed, just solid performance for desktop/hobby use cases. Thinking of powering this with solar.
Old 30th December 2020, 18:08   #18  |  Link
nakTT
Registered User
 
Join Date: Dec 2008
Posts: 407
Quote:
Originally Posted by wqcr View Post
Hi,
I'm on aarch64/arm64/armv8l and trying to figure out how to transcode video (mpeg1, mpeg2, avc) into h265 with opus audio.
FFmpeg's built-in h265 encoder for debian squeezy does not currently support multicore encoding (surprisingly, h264 does). Furthermore, in the current version, ffmpeg's opus encoder also appears broken.
So I'd need to use ffmpeg to decode the video and feed it directly to the x265 encoder, then decode the audio and create an opus file with opusenc, and finally mux those two streams.
Can someone advise me how to do this?

I'd have grabbed Handbrake a long time ago, but it's also not available for arm64 yet.

Thanks
Tried it on my Raspberry Pi 4B (in my case with a 32-bit OS, armhf) using Handbrake 1.2.2, and I can tell you that it's awfully slow, even for a core (ARM Cortex-A72) that is supposedly competitive with at least Intel Atom and the like in many other workloads. I think it's mainly down to the lack of optimization compared with what x86 CPUs get. What do you think?

Last edited by nakTT; 30th December 2020 at 18:21.
Old 30th December 2020, 22:16   #19  |  Link
RanmaCanada
Registered User
 
Join Date: May 2009
Posts: 299
ARM is too slow. This is only an option if time means nothing to you. You can get better results with a 35watt APU from AMD. Heck the current laptop lineup from AMD destroys this and they literally sip power.
Old 31st December 2020, 00:14   #20  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,956
Maybe too slow for you, but ARM is rapidly becoming more and more prevalent as ARM chips can occupy an interesting quadrant on the power / speed curve. I imagine with thorough assembly optimization a modern ARM server CPU could outperform a modern x86_64 CPU in terms of efficiency.

If this wasn't the case we probably wouldn't see AWS, Apple, and Microsoft all investing in their own ARM silicon.

Granted, HEVC compression is a very specific use case