Old 1st August 2015, 02:15   #1  |  Link
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,420
Performance on ARM?

There are a lot of these ARM single-board computers floating around, but I find that the performance reports often gloss over the details I'd most like to see numbers for.

So those of you who have access to such boards - the RPi2, ODROID C1 or XU3/XU4, any number of Beagles, etc. - what kind of framerates can x264 get while encoding with the common presets and common resolutions like 1080p? How optimized are the ARM-centric SIMD opts, in other words (probably NEON and maybe Thumb; AArch64 would be nice, but most of these boards still use ARMv7 chips)?

What is the CPU decoding like with libavcodec? The RPi2 has an hwaccel now - what kind of performance can be expected with libavcodec in that case? Since these things surely can't do 10bit on GPU, the CPU performance in those cases matters more, even if you can actually do 1080p30@8bit in real time with the GPU. And if I'd resort to transcoding to 8bit so the GPU can be used, that's still a CPU task on decoding and encoding.

I'd really like to hold off until ARMv8-A/AArch64 becomes the norm for these things, but I have a feeling I'd be waiting a while. Feel free to provide numbers for those if you do have access to one, though.
Old 3rd August 2015, 22:10   #2  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
I doubt anyone would do 10-bit in software when they could do 8-bit in hardware.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

Old 4th August 2015, 03:25   #3  |  Link
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,420
I mostly mentioned 10-bit in the context of converting it down to 8-bit, and possibly of doing encoding. The likelihood I'd try playing 10-bit on one of those things is pretty small, unless it was 480p or something - although because I have no clue what the CPU power is like, even 480p@10-bit may not be doable. I mean, the computer I spend most of my time on right now has a Coppermine in it - so my threshold on speed improvements is set pretty low.
Old 4th August 2015, 09:52   #4  |  Link
foxyshadis
Angel of Night
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
Then you're in luck; each core of the Pi2 is about as powerful as an 800 MHz PIII. Fortunately, there are four of them, plus the various hardware accelerations, so it'll still be much faster; since x264 is almost perfectly parallel, you can assume a ~4x speedup at the same settings. They also have a hardware encoder, which is very fast indeed but, as always, a bit lacking in quality: better for intermediates than finals. It does support OpenCL, but it might be tough to find anything that makes use of it; I'm not sure if it's powerful enough to run KNLMeansCL, for instance, but it's worth a shot.

I don't know much about any of the others you listed. The NUC and the Acer Revos are even more powerful micro-PCs, but they're also ten times the price of the Pi.
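
The ~4x figure above assumes the encode scales perfectly across the four cores. As a rough sanity check, here is a minimal Python sketch of Amdahl's law showing how the estimate shrinks if part of the work stays serial. The parallel fractions are illustrative guesses, not measurements of x264, and the function name is made up for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of the work scales with core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative parallel fractions only; nothing here was measured on real hardware.
for fraction in (1.00, 0.95, 0.90):
    speedup = amdahl_speedup(fraction, 4)
    print(f"parallel fraction {fraction:.2f}: {speedup:.2f}x on 4 cores")
```

With a fully parallel workload this gives exactly 4x; at 95% and 90% parallel it drops to roughly 3.5x and 3.1x, so "~4x" is best read as an upper bound.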
Old 4th August 2015, 20:57   #5  |  Link
vivan
/人 ◕ ‿‿ ◕ 人\
 
Join Date: May 2011
Location: Russia
Posts: 643
On a Snapdragon 801 at max clock (2.5 GHz, but it gets throttled hard after a few minutes of 100% load, so watching 10-bit video is actually painful despite the great numbers) I get around 3.5 (10-bit anime OP) to 5 (8-bit real-life) fps encoding 1080p video (using software decoding) with -preset veryfast -crf 18 (using some ffmpeg app from the store).
With just 10-bit decoding, 1080p is almost OK (24 fps on a medium OP) and 720p is almost perfect (30 fps on very heavy/grainy anime, 50+ on medium).

The RPi2 should be 2-3 times slower, so 1-2 fps? Also, 1080p with slower presets will require more RAM; 2 GB might not be enough.
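
The RPi2 guess above is plain division, which a short Python sketch makes explicit. The only inputs are the 3.5 to 5 fps Snapdragon 801 figures and the 2-3x slowdown from this post; the slowdown itself is an estimate, not a measurement, and the function name is made up for this example.

```python
def scaled_fps_range(measured_low, measured_high, slowdown_low, slowdown_high):
    """Scale a measured fps range by an assumed slowdown range.

    The pessimistic bound pairs the lowest fps with the largest slowdown,
    the optimistic bound pairs the highest fps with the smallest slowdown.
    """
    if min(measured_low, measured_high, slowdown_low, slowdown_high) <= 0:
        raise ValueError("all inputs must be positive")
    return measured_low / slowdown_high, measured_high / slowdown_low


# 3.5 to 5 fps on the Snapdragon 801, assumed to be 2x to 3x faster than the RPi2.
low, high = scaled_fps_range(3.5, 5.0, 2.0, 3.0)
print(f"estimated RPi2 range: {low:.1f} to {high:.1f} fps")
```

That works out to roughly 1.2 to 2.5 fps, in line with the 1-2 fps guess.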
Old 5th August 2015, 17:30   #6  |  Link
mandarinka
Registered User
 
Join Date: Jan 2007
Posts: 729
Can Cortex-A7 really catch up to Pentium 3? A7 is dual-issue in-order...
Old 5th August 2015, 17:50   #7  |  Link
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,420
Quote:
Originally Posted by vivan View Post
On a Snapdragon 801 at max clock (2.5 GHz, but it gets throttled hard after a few minutes of 100% load, so watching 10-bit video is actually painful despite the great numbers) I get around 3.5 (10-bit anime OP) to 5 (8-bit real-life) fps encoding 1080p video (using software decoding) with -preset veryfast -crf 18 (using some ffmpeg app from the store).
With just 10-bit decoding, 1080p is almost OK (24 fps on a medium OP) and 720p is almost perfect (30 fps on very heavy/grainy anime, 50+ on medium).

The RPi2 should be 2-3 times slower, so 1-2 fps? Also, 1080p with slower presets will require more RAM; 2 GB might not be enough.
I'm leaning toward the ODROID XU4, which uses the Exynos 5422 Octa. As I just learned, both the 5422 and the 801 were used in two different models of the Galaxy S5, and I found a blog post that compares the two directly:

http://www.techentice.com/galaxy-s5-...422-octa-core/

Now, whether this actually translates into it also beating the 801 on a non-smartphone-based platform, I don't know. At least the ODROID board has support for eMMC 5.0, so the read and write speeds would be higher (possibly significantly higher).


I'd simply have to rely on Kodi for media playback rather than mpv, since there is no accel in FFmpeg for MFC as of yet.
Old 6th August 2015, 02:51   #8  |  Link
foxyshadis
Angel of Night
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
Quote:
Originally Posted by mandarinka View Post
Can Cortex-A7 really catch up to Pentium 3? A7 is dual-issue in-order...
Probably depends on having a good compiler, and gcc 5.2 and llvm are quite good at ARM now; they try to avoid pipeline stalls with better scheduling. The A7 has double the L1 & L2 caches, double the registers, and its NEON support is much more powerful than the P3's MMX support -- as SSE isn't very useful for AVC en/decoding -- so despite being hamstrung by dual-issue in-order processing, the RPi2 can reasonably pretend to be 4 P3s, if the code's optimized. (Compare that with the 3-issue OOO design used from the PPro through the P3.)

Last edited by foxyshadis; 6th August 2015 at 03:24.
Old 6th August 2015, 17:27   #9  |  Link
mandarinka
Registered User
 
Join Date: Jan 2007
Posts: 729
Hmm, I think I have a motherboard + Coppermine (180nm) Pentium III 666 under a table and another machine that should even have a bootable Windows XP (a laptop).

I guess that if somebody can run a benchmark on a Cortex-A7 (ideally with ffmpeg for decoding, or x264 for encoding), we could compare...
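
One way such a comparison could be scripted is sketched below in Python. It builds an ffmpeg decode-only command (decode and discard the output, with `-benchmark` for timing) and an x264 encode command using the `veryfast` / CRF 18 settings mentioned earlier in the thread, then pulls the fps out of x264's summary line. The file names and helper names are hypothetical, and the parser assumes a summary of the form `encoded N frames, X fps, Y kb/s`; nothing here has been run on the boards discussed.

```python
import os
import re
import subprocess

# Assumed shape of x264's final summary line, e.g.
# "encoded 240 frames, 4.85 fps, 6123.40 kb/s"
X264_SUMMARY = re.compile(r"encoded (\d+) frames, ([0-9.]+) fps")


def decode_benchmark_cmd(path, threads=4):
    """ffmpeg command that decodes `path`, discards the frames, and prints timing."""
    return [
        "ffmpeg", "-hide_banner", "-benchmark",
        "-threads", str(threads),  # placed before -i, so it applies to the decoder
        "-i", path,
        "-f", "null", "-",
    ]


def x264_benchmark_cmd(path, preset="veryfast", crf=18):
    """x264 command that encodes `path` and throws the bitstream away."""
    return ["x264", "--preset", preset, "--crf", str(crf), "-o", os.devnull, path]


def parse_x264_summary(text):
    """Return (frames, fps) from x264's summary line, or None if it is absent."""
    match = X264_SUMMARY.search(text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def run_x264_benchmark(path):
    """Run the encode and return (frames, fps); x264 logs its summary to stderr."""
    result = subprocess.run(
        x264_benchmark_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_x264_summary(result.stderr)
```

Running `run_x264_benchmark("sample_1080p.y4m")` (a hypothetical clip) on both the Cortex-A7 board and the Pentium III with the same source and the same x264 build would give directly comparable fps figures; the command from `decode_benchmark_cmd` can be passed to `subprocess.run` the same way for the decoding side.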
Old 7th August 2015, 16:41   #10  |  Link
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,420
For the record, the precise Coppermine CPU I have is a 1 GHz Coppermine-128 Celeron. It's one of the last ones before Tualatin came out, but since it's a Celeron the caches are easily half or less what they'd be on a Pentium.