x265 HEVC Encoder [Archive] - Page 13

EncodedMango

5th April 2014, 03:59

So x265 has reached 0.9? The builds on chromashift seem to indicate so.

x265_Project

5th April 2014, 04:09

x265 0.9 is a regularly scheduled bug-fix release. Many bugs have been fixed since the 0.8 tag, primarily in rate control and 10-bit encodes. A race hazard on POSIX systems was fixed, and several non-determinism problems were resolved.

= API Changes =
* the stride of x265_picture is now in units of bytes, not pixels
* VUI configurables were moved into a param.vui sub-struct
* unimplemented VUI options removed
* bRepeatHeaders option added (inserts VPS+SPS+PPS each keyframe)
* fast-decode tune option added
* x265_encoder_headers() returns NAL byte count on success
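
To illustrate the first API change (stride now counted in bytes rather than pixels), here is a minimal sketch. It assumes tightly packed rows with no padding and assumes that samples deeper than 8 bits are stored in 16-bit words; the helper names are hypothetical and not part of the x265 API.

```python
def bytes_per_sample(bit_depth: int) -> int:
    """Assumption: samples deeper than 8 bits occupy a 16-bit word."""
    return 1 if bit_depth <= 8 else 2


def stride_in_bytes(width_pixels: int, bit_depth: int) -> int:
    """Stride of one tightly packed row (no padding), in bytes."""
    return width_pixels * bytes_per_sample(bit_depth)


# For 8-bit input the byte stride equals the pixel width;
# for 10-bit input it is twice the pixel width.
luma_stride_8bit = stride_in_bytes(1920, 8)
luma_stride_10bit = stride_in_bytes(1920, 10)
print(luma_stride_8bit, luma_stride_10bit)
```

Code that previously passed the pixel width as the stride would therefore only need changing for inputs deeper than 8 bits, under the assumptions above.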

= Command Line Interface Changes =
* --dither option to improve quality of pixel downshifts
* --cpuid replaced with x264 compatible --asm option
* --crf-max <float> added
* improved --help documentation, plus new online documentation
* **experimental** --interlaceMode <prog|tff|bff>
* **experimental** --weightb

= New Features =
* experimental support for interlaced content (field coding)
* experimental weightb support

We now have online documentation for this release http://x265.readthedocs.org/en/0.9/
plus http://x265.readthedocs.org/en/default/ and http://x265.readthedocs.org/en/stable for the two development branches.

See the online manual for full documentation of CLI (and API) options.

Our focus for the near future remains on visual quality and rate control improvements.

uneedme

6th April 2014, 04:41

The new VapourSynth is written in Python, so it might run on the GAE platform, using Google CPU time and putting cloud computing into practice.

Is there any build of x265 with Python that could host the calculations publicly online (for example, many people contributing their GAE accounts to join up and build one or several massively multithreaded pools that run apps with input and output ports)?

I know that a proxy tool, "goagent", works properly in that mode.

I am just a tech layman.

Or a hybrid working mode (partly local, partly online)?

Am I dreaming?

LigH

6th April 2014, 09:29

Dreaming? ... Probably, yes: x265 is mostly assembler based, because many routines are so computationally intensive that optimizing them at the level of CPU-dependent instructions, and parallelizing them with rather tight synchronization, is necessary to encode at a satisfying speed.

It may be possible to write another HEVC encoder in Python (similar to writing an AVC encoder for CUDA or QuickSync). But that will certainly not be "x265". Furthermore, porting algorithms to specific execution environments may have its own complexity limits, and because interpreted or JIT-compiled languages (like Python) are never as fast as well-optimized assembler code, it will probably take much longer than x265 with the same number of threads and at the same encoding complexity.

In addition, the more threads, the less optimal the encoding may become, depending on the algorithm (either each thread looks for removable redundancies only in its own part of the frame, or sharing with other threads, if that is possible at all, will make the threads wait for each other a lot more). And finally, if you let a "Cloud" execute several threads of one algorithm in possibly very distant places, sharing information between them will cost a lot of time, while I could imagine that x265 would even try to make use of CPU-internal caches, if possible.

uneedme

6th April 2014, 15:23

Thanks for your analysis.

assembler based...... that could be a problem...

extremely computing intense... that is why I suggest using cloud computing

CPU dependent instructions....
GAE allocates some free CPU resources (10 units per account) and some storage space to each account. With 1000 accounts deployed, that would be 10000 units of free usage. We could design the uploaded program to manage itself: the given instructions spread across units within the Google servers, and the job migrates itself onward once it has used up the current unit's CPU resources. Unless the CPU time of all units is exhausted, it could keep processing to the end.

Almost all units could be called or acknowledged simultaneously, so the unoptimized code and the long call overhead could be neglected (try to imagine some 10000 units working in parallel).

the more threads, the less optimal the encoding may become.........
Our concern is saving the very long processing time, not saving on the utility bill or improving processing efficiency. Such massive computation on one set of CPU units leaves the program pending and drags it down to less than 10 fps output, while there are actually so many "Google" server resources (or public cloud providers, or GAE-like platforms) running idle.

The real problems are how to recruit as many GAE accounts as possible, and how to revise the x265 program to suit distributed computing.

LigH

6th April 2014, 16:20

extremely computing intense... that is why using cloud computing

Well, massive parallelization only helps in certain cases: when the results of each calculation are mainly independent. But in HEVC, all results may depend on many other partial results across several frames within one GOP, so all calculations probably need to share data with each other. This makes a distributed solution not very helpful, because the workers can't easily and, even more importantly, quickly exchange much data.
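
The limit described here can be made concrete with Amdahl's law, a general rule of thumb that is not specific to x265. The sketch below assumes a hypothetical workload in which 90% of the work is independent and 10% is serial (for example, waiting on shared data); that 90% figure is an illustrative assumption, not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical: 90% of the encoder's work is independent, 10% is serial.
for workers in (8, 100, 10000):
    print(workers, round(amdahl_speedup(0.90, workers), 2))
```

Under that assumption the speedup never reaches 10x, no matter how many of the 10000 units are used, and this ignores the extra cost of moving data between distant machines.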

Almost all units could be called or acknowledged simultaneously, so the unoptimized code and the long call overhead could be neglected (try to imagine some 10000 units working in parallel).

Not per-se helpful for HEVC, because:

the more threads, the less optimal the encoding may become

due to either a too limited scope of motion estimation, or a lot of overhead to share partial results among threads.

The exception is when the encoding of a long movie gets partitioned into a distributed encoding of many parts of the movie. But that first requires an optimal calculation of the partitioning points. In theory, AVC was already able to use data from very similar scenes before and after a brief inserted cut (not sure if it was measurably successful in practice).
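
A minimal sketch of the partitioning idea, assuming the scene-cut frame numbers are already known from some earlier analysis pass. The function name, the `min_len` parameter and the sample numbers are hypothetical and only illustrate how independent segments could be derived; this is not how x265 itself works.

```python
def split_at_cuts(total_frames, scene_cuts, min_len):
    """Return (start, end) frame ranges, end exclusive, split at scene cuts.

    A cut is skipped if it would leave a segment shorter than min_len
    frames on either side.
    """
    segments = []
    start = 0
    for cut in sorted(scene_cuts):
        if cut - start >= min_len and total_frames - cut >= min_len:
            segments.append((start, cut))
            start = cut
    segments.append((start, total_frames))
    return segments


# Hypothetical clip: 1000 frames with detected cuts at these frame numbers.
segments = split_at_cuts(1000, [100, 120, 600, 990], min_len=50)
print(segments)  # [(0, 100), (100, 600), (600, 1000)]
```

Each range could then be encoded on a separate machine and the resulting streams concatenated, at the cost of losing references across the segment boundaries.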

uneedme

6th April 2014, 16:53

So: set up segmented fragments, then collect and string together the segment results...

I wished my idea could be inspiring. And the good thing is that it turned out to be not entirely pointless. lol

Thanks anyway.

LigH

6th April 2014, 17:31

:o Well, sometimes I may sound pessimistic; I'll leave the inspiration to the real active developers (e.g. from Multicoreware). I'm merely a tester. :)

jackoneill

6th April 2014, 18:12

The new VapourSynth is written in Python, so it might run on the GAE platform, using Google CPU time and putting cloud computing into practice.

"The new Vapoursynth" is not written in Python. The core is written in C++, the included filters are written in C, C++, and some x86 assembly.

uneedme

7th April 2014, 07:49

"The new Vapoursynth" is not written in Python. The core is written in C++, the included filters are written in C, C++, and some x86 assembly.

:o

To be precise: Python-adapted, or with Python bindings...

D3C0D3R

8th April 2014, 12:39

Strange, indeed. Can you provide a few more details?

Never mind. I had turned off WIN_XP support on a Win8 machine. If it is turned on (the default), it's OK.
Now that the reprogrammed Condition Variables (based on x264 code) are default because they are reportedly faster than the native kernel functions
Just curious: since the project is C++, why didn't they use the C++11 standard synchronization primitives wrapped in a macro, and then change that when the C backport is ready for prime time?

Daemon404

8th April 2014, 19:04

Just curious: since the project is C++, why didn't they use the C++11 standard synchronization primitives wrapped in a macro, and then change that when the C backport is ready for prime time?

Because the compilers they target do not all support C++11.

D3C0D3R

8th April 2014, 21:43

Which of the compilers they target don't support condition variables?

LigH

9th April 2014, 08:48

The support of Condition Variables is not directly related to the support of the C++11 language standard.

As far as I learned yesterday (I don't have much experience of my own with C and C++), modern C++ language variants (2011, 2014, 2017) tend to support "meta programming", leaving the decision about the specific implementation of features to the specific platform the compiler targets. This may be convenient for some programmers trying to write portable code, but demands a lot from the specific compiler; you know, someone has to program the compiler first to make it able to adapt its strategy according to the C++11(ff.) standard. And it is doubtful that the result will have optimal performance in every case.

It is certainly possible for a compiler without support for C++11 "primitives" to still support Condition Variables, by explicitly programming them.

The decision to prefer kernel functions of Windows Vista+ in Windows builds broke compatibility with Windows XP (yes, a deprecated platform, but there are many people who keep it alive, maybe due to specific hardware and driver availability reasons, maybe due to some level of poverty ... don't judge them). The x264 project already used a workaround provided in its source before; this was adapted for the x265 project to avoid this specific kernel dependency. I could imagine that C++11 compliant compilers wouldn't care about XP compatibility anymore, either...

D3C0D3R

9th April 2014, 12:18

Both of the latest compilers, GCC 4.8 and VC12, support condition variables.

I could imagine that C++11 compliant compilers wouldn't care about XP compatibility anymore, either...
Microsoft surely will do so. Only yesterday they officially ended support for XP. GCC is more tolerant of old platforms in this respect.

D3C0D3R

9th April 2014, 12:26

Recently I ran into another problem on a 32-bit system. I ran x265 with different presets;
the result is the same familiar pop-up window: Instruction "0x004e63f0" want address "0xffffffff" for "read".

I tried --log=4, but it shows nothing valuable; it fails before it can encode any frame.
x265 [info]: build info [Windows][GCC 4.8.2][32 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64

OK. Then I used --no-asm. Works perfectly.
Then, by trial and error, I determined that this failure is caused by --asm SSSE3.
I don't know how to turn on debug info to provide additional information.

Edit: more precisely, --asm 0x014003F works and --asm 0x014007F fails. So it's SSSE3 for sure.
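
The reasoning behind comparing the two masks can be checked with a little bit arithmetic. This sketch only isolates the bit that differs between the two values quoted above; reading that bit as the SSSE3 flag is the poster's conclusion, not something the code verifies.

```python
WORKING_MASK = 0x014003F  # value reported to encode without crashing
FAILING_MASK = 0x014007F  # value reported to crash

# XOR leaves only the bits that differ between the two masks.
difference = WORKING_MASK ^ FAILING_MASK

# A power of two has exactly one bit set, so n & (n - 1) == 0.
is_single_bit = difference != 0 and difference & (difference - 1) == 0

print(hex(difference), is_single_bit)  # 0x40 True
```

Because exactly one bit differs, the crash can be attributed to whatever code path that single capability flag enables.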

LigH

9th April 2014, 18:30

Now if we also knew the processor model and the exact x265 build (version+patch-githash), developers would know which patch probably introduced this issue, and if it is already known or still to be fixed...

A pity you cut the log at the wrong line.

D3C0D3R

9th April 2014, 18:46

Now if we also knew the processor model
I don't think it's important, because the --asm parameter determines the branch that will be executed. Pentium E5200
the exact x265 build (version+patch-githash)
changeset: 6682:bdca492dc1d7
A pity you cut the log at the wrong line.
Uncut:
x265_6682_asm --crf 20 --y4m --preset=medium --asm=0x014007F --ssim --log=4 --psnr --csv 1_20.csv --output 1_20.265 -
y4m [info]: 512x384 fps 25/1 i420 unknown frame count
x265 [info]: HEVC encoder version +-
x265 [info]: build info [Windows][GCC 4.8.2][32 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x265 [info]: WPP streams / pool / frames : 6 / 2 / 1
x265 [info]: Main profile, Level-2.1 (Main tier)
x265 [info]: CU size : 64
x265 [info]: Max RQT depth inter / intra : 1 / 1
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut : 25 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 0 / 3
x265 [info]: Rate Control / AQ-Strength / CUTree : CRF-20.0 / 1.0 / 1
x265 [info]: tools: rect amp rd=3 lft sao-lcu sign-hide

EDIT: Or should I rather send this directly to the developers' mailing list?

LigH

9th April 2014, 18:52

x265 [info]: HEVC encoder version +-

Hmm, all the builds I've made so far with GCC printed their version number here. I could imagine that your x265.rc was not updated correctly.
__

P.S.:

Yes, the developer mailing list will be a good place to bring it quickly to attention. Forums are less frequently read.

The CPU type may be interesting because this issue seems to happen only on a few CPU families which do support SSSE3 (e.g. AMD Athlon/Phenom X# do not yet, or are excluded due to a slow implementation), but not yet SSE4 (Core i# and AMD FX+ will already).

D3C0D3R

9th April 2014, 18:59

Hmm, all the builds I've made so far with GCC printed their version number here. I could imagine that your x265.rc was not updated correctly.
You're right.
./make-Makefiles.sh generates this:

FILEVERSION ,,0,
PRODUCTVERSION ,,0,

and I manually forced numbers in there, because otherwise it fails:
x265/build/msys/x265.rc:4: syntax error
I have had this trouble since 6105:0fcc87d05d10, IIRC.

LigH

9th April 2014, 19:41

Probably also an issue to investigate...

When was your last fresh clone? I'll try one right now.
__

Version info added in all required places:

#include <winresrc.h>

VS_VERSION_INFO VERSIONINFO
FILEVERSION 0,9,0,25
PRODUCTVERSION 0,9,0,25
# ...
BLOCK "StringFileInfo"
BEGIN
BLOCK "04090000"
BEGIN
VALUE "FileDescription", "HEVC video encoder"
VALUE "FileVersion", "0.9+25-bd987db26d5d"
VALUE "InternalName", "x265"
VALUE "LegalCopyright", "Multicoreware: GPLv2 or commercial"
VALUE "OriginalFilename", "libx265.dll"
VALUE "ProductName", "x265"
VALUE "ProductVersion", "0.9+25-bd987db26d5d"
END
END
BLOCK "VarFileInfo"
BEGIN
VALUE "Translation", 0x409, 1200
END
END
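
For illustration, the relation between the version string and the numeric fields in the resource file above can be sketched as follows. The mapping (major, minor, a fixed 0, then the number of commits since the tag) is inferred only from the one example shown here, 0.9+25-bd987db26d5d giving 0,9,0,25; the function is hypothetical and is not the build system's actual code.

```python
import re


def rc_version_fields(version: str) -> str:
    """Turn a version string like '0.9+25-bd987db26d5d' into '0,9,0,25'."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\+(\d+)-([0-9a-f]+))?", version)
    if match is None:
        # An empty or malformed string is rejected here instead of being
        # written out as blank fields such as ',,0,'.
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, commits, _githash = match.groups()
    return f"{major},{minor},0,{commits or 0}"


print("FILEVERSION", rc_version_fields("0.9+25-bd987db26d5d"))
```

An empty version string, as in the broken build described earlier, would raise an error in this sketch rather than produce a resource file that fails to compile.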

foxyshadis

10th April 2014, 01:56

The ASM crashes used to happen from time to time when an SSE4 instruction was included in the SSE2 branch. Hopefully Min Chen can take a look at it and see if something like that's the case here.

LigH

10th April 2014, 07:52

EDIT: Or should I rather send this directly to the developers' mailing list?

You may have missed my later edit in my reply; so I sent a brief report on your behalf today.

raine

11th April 2014, 08:53

I just have a few quick questions regarding x265, I hope it's OK to ask them here.

Given the recent developments (hevc and opus), I'm planning to batch convert a huge film archive (6TB), mostly not-even-HD content encoded as H.264, using medium or slow preset via ffmpeg, with the hopes of shrinking the archive. I don't know how important psy-rd, trellis, intra-refresh etc. are.

Is it mature enough? (for an ordinary eye, for a home archive. I'm not Netflix or Hollywood so my requirements aren't that high :) )
Are there important improvements waiting in the queue? Should I start already or is it worth waiting?
What parameters do you suggest for encoding? I'm planning to go with -preset slow only, without using -x265-params, but if it turns out to be too slow, I can switch to medium with -x265-params me=star. (as far as I know, there are no -tune film or psy-rd based tunings yet).

I made a small experiment with a 1280x720 H264 video from my archive, using the medium preset with crf=22. The resulting file was less than half the size of the original file with no visible (to me; a keen eye or PSNR/etc. may differ) reduction in quality. So it seems it's definitely worth the trouble. Even with a recent Haswell i7 processor, the video encoding (I used -c:a copy) was a bit slow (~14 FPS), though this is expected. I hope it will get faster in time though.

I will experiment with slow and slower and make a detailed post next time.
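
A batch conversion along these lines could be scripted roughly as below. This is a sketch under assumptions: the file names are placeholders, the function is hypothetical, and the CRF value is passed through -x265-params rather than a dedicated option, on the assumption that this is accepted by the ffmpeg build in use. It only builds the argument list and does not run ffmpeg.

```python
def build_ffmpeg_args(src, dst, preset="slow", crf=22, extra_params=None):
    """Build an ffmpeg argument list for an x265 re-encode with audio copied."""
    params = {"crf": str(crf)}
    params.update(extra_params or {})
    # x265 parameters are passed as key=value pairs separated by colons.
    x265_params = ":".join(f"{key}={value}" for key, value in params.items())
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265", "-preset", preset,
        "-x265-params", x265_params,
        "-c:a", "copy",
        dst,
    ]


# Placeholder file names; the fallback variant uses medium with me=star.
args = build_ffmpeg_args("input.mkv", "output.mkv",
                         preset="medium", crf=22,
                         extra_params={"me": "star"})
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` once the settings have been verified on a few sample files.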

LigH

11th April 2014, 08:58

Try to estimate the energy costs caused by the required encoding time and decide if it is worth paying your energy supplier for weeks or months of converting.

The faster the preset you choose, the less efficient the further compression will be. And it will certainly be lossy as well. As far as I remember, there is still some structure flattening. But that is the cost of efficient redundancy removal: sometimes it is too optimistic in deciding what may be unimportant.

raine

11th April 2014, 09:24

Thanks for the quick response! The energy cost is OK: I have a low-TDP Haswell, and I didn't even hear any fan noise during encoding. I also expect the loss, it's lossy -> lossy after all; as long as I don't notice it, it's OK. Maybe this translates to a somewhat lower CRF; I used 22.

LigH

11th April 2014, 09:39

@ D3C0D3R:

Can you provide a call stack of the crash to the developer mailing list? That would help narrow down the cause; the developers don't have the same hardware to check it.

But don't ask me how to do that... Let's hope that Windows will display a verbose crash dialog where you can extract details from. Crash address + your specific EXE may be a minimal anchor.

Well, maybe it can also be provoked with a better CPU by masking SSE4 out.

D3C0D3R

11th April 2014, 19:40

You may have missed my later edit in my reply; so I sent a brief report on your behalf today.

Thank you! I had a look at x265-devel yesterday, but they tried to fix something unrelated, because --no-asm works.

I will try to retrieve a stack trace, or some debug info.
It's obvious that the bug-report instruction here, http://www.videolan.org/developers/x265.html (./configure --enable-debug), was simply copy-pasted from x264 and should be updated.

I suppose CMAKE_BUILD_TYPE, CMAKE_CXX_FLAGS must be set to Debug.

D3C0D3R

11th April 2014, 21:57

OK. I compiled x265 with CMAKE_BUILD_TYPE = Debug
and ran this:
gdb --args x265_debug --crf 20 --input ffmpeg_g.exe --input-depth 8 --input-res 320x240 --fps 20 --preset=medium --asm=0x014007F --ssim --log=4 --output 1_1.265

Then I typed "run" in the gdb console:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2916.0xf28]
0x00533950 in x265_patial_butterfly_inverse_internal_pass2_ssse3 ()
(gdb) bt
#0 0x00533950 in x265_patial_butterfly_inverse_internal_pass2_ssse3 ()
#1 0x00533b6e in x265_idct8_ssse3 ()
#2 0x0441f339 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) disass $pc-32,$pc+32
Dump of assembler code from 0x533930 to 0x533970:
0x00533930 <x265_patial_butterfly_inverse_internal_pass1_ssse3+368>: jb 0x533916 <x265_patial_butterfly_inverse_internal_pass1_ssse3+342>
0x00533932 <x265_patial_butterfly_inverse_internal_pass1_ssse3+370>: pop %es
0x00533933 <x265_patial_butterfly_inverse_internal_pass1_ssse3+371>: psubd %xmm5,%xmm1
0x00533937 <x265_patial_butterfly_inverse_internal_pass1_ssse3+375>: psrad $0x7,%xmm1
0x0053393c <x265_patial_butterfly_inverse_internal_pass1_ssse3+380>: packssdw %xmm1,%xmm4
0x00533940 <x265_patial_butterfly_inverse_internal_pass1_ssse3+384>: movq %xmm4,0x30(%edi)
0x00533945 <x265_patial_butterfly_inverse_internal_pass1_ssse3+389>: movhps %xmm4,0x40(%edi)
0x00533949 <x265_patial_butterfly_inverse_internal_pass1_ssse3+393>: ret
0x0053394a <x265_patial_butterfly_inverse_internal_pass1_ssse3+394>: nopw 0x0(%eax,%eax,1)
=> 0x00533950 <x265_patial_butterfly_inverse_internal_pass2_ssse3+0>: movdqa (%edi),%xmm0
0x00533954 <x265_patial_butterfly_inverse_internal_pass2_ssse3+4>: movdqa %xmm0,%xmm4
0x00533958 <x265_patial_butterfly_inverse_internal_pass2_ssse3+8>: pshufb 0x6a7950,%xmm4
0x00533961 <x265_patial_butterfly_inverse_internal_pass2_ssse3+17>: pmaddwd 0x6a7960,%xmm4
0x00533969 <x265_patial_butterfly_inverse_internal_pass2_ssse3+25>: phsubd %xmm4,%xmm5
0x0053396e <x265_patial_butterfly_inverse_internal_pass2_ssse3+30>: pshufd $0x4e,%xmm4,%xmm4
End of assembler dump.

eax 0xc55b90 12934032
ecx 0x163d900 23320832
edx 0x10 16
ebx 0x30 48
esp 0x10bf8c8 0x10bf8c8
ebp 0x623990 0x623990 <_ZSt16__convert_from_vRKPiPciPKcz+6437264>
esi 0x623970 6437232
edi 0x10bf8cc 17561804
eip 0x4e63f0 0x4e63f0 <x265_patial_butterfly_inverse_internal_pass2_ssse3>
eflags 0x10216 [ PF AF IF RF ]
cs 0x1b 27
ss 0x23 35
ds 0x23 35
es 0x23 35
fs 0x3b 59
gs 0x0 0

xmm0 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x31, 0xf5, 0xff, 0xff,
0x4c, 0x8, 0x0, 0x0, 0xb1, 0x3, 0x0, 0x0, 0xd7, 0x6, 0x0, 0x0}, v8_int16 = {0xf531, 0xffff, 0x84c, 0x0,
0x3b1, 0x0, 0x6d7, 0x0}, v4_int32 = {0xfffff531, 0x84c, 0x3b1, 0x6d7}, v2_int64 = {0x84cfffff531,
0x6d7000003b1}, uint128 = 0x000006d7000003b10000084cfffff531}
xmm1 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x8000000000000000}, v16_int8 = {0xd1, 0x4,
0x0, 0x0, 0xea, 0x0, 0x0, 0x0, 0x22, 0x4, 0x0, 0x0, 0xe5, 0xf6, 0xff, 0xff}, v8_int16 = {0x4d1, 0x0, 0xea,
0x0, 0x422, 0x0, 0xf6e5, 0xffff}, v4_int32 = {0x4d1, 0xea, 0x422, 0xfffff6e5}, v2_int64 = {0xea000004d1,
0xfffff6e500000422}, uint128 = 0xfffff6e500000422000000ea000004d1}
xmm2 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x8000000000000000, 0x8000000000000000},
v16_int8 = {0x80, 0x4, 0x80, 0x4, 0x80, 0xfb, 0x40, 0xf9, 0xc0, 0xfd, 0x0, 0x0, 0x0, 0x0, 0x80, 0xfb},
v8_int16 = {0x480, 0x480, 0xfb80, 0xf940, 0xfdc0, 0x0, 0x0, 0xfb80}, v4_int32 = {0x4800480, 0xf940fb80,
0xfdc0, 0xfb800000}, v2_int64 = {0xf940fb8004800480, 0xfb8000000000fdc0},
uint128 = 0xfb8000000000fdc0f940fb8004800480}
xmm3 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x8000000000000000, 0x0}, v16_int8 = {0xa7, 0xf8,
0xff, 0xff, 0xaf, 0xff, 0xff, 0xff, 0x74, 0xfe, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xf8a7, 0xffff,
0xffaf, 0xffff, 0xfe74, 0xffff, 0x0, 0x0}, v4_int32 = {0xfffff8a7, 0xffffffaf, 0xfffffe74, 0x0}, v2_int64 = {
0xffffffaffffff8a7, 0xfffffe74}, uint128 = 0x00000000fffffe74ffffffaffffff8a7}
xmm4 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x8000000000000000, 0x8000000000000000},
v16_int8 = {0xdb, 0x6, 0x41, 0x4, 0x39, 0xfc, 0x1, 0xf9, 0xd1, 0x4, 0xea, 0x0, 0x22, 0x4, 0xe5, 0xf6},
v8_int16 = {0x6db, 0x441, 0xfc39, 0xf901, 0x4d1, 0xea, 0x422, 0xf6e5}, v4_int32 = {0x44106db, 0xf901fc39,
0xea04d1, 0xf6e50422}, v2_int64 = {0xf901fc39044106db, 0xf6e5042200ea04d1},
uint128 = 0xf6e5042200ea04d1f901fc39044106db}
xmm5 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x80, 0x82, 0x0, 0x0, 0xc0,
0xd5, 0x0, 0x0, 0xc0, 0x5, 0xfe, 0xff, 0x0, 0x87, 0x0, 0x0}, v8_int16 = {0x8280, 0x0, 0xd5c0, 0x0, 0x5c0,
0xfffe, 0x8700, 0x0}, v4_int32 = {0x8280, 0xd5c0, 0xfffe05c0, 0x8700}, v2_int64 = {0xd5c000008280,
0x8700fffe05c0}, uint128 = 0x00008700fffe05c00000d5c000008280}
xmm6 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0, 0x8, 0x0, 0x0, 0x0,
0x8, 0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x8, 0x0, 0x0}, v8_int16 = {0x800, 0x0, 0x800, 0x0, 0x800, 0x0,
0x800, 0x0}, v4_int32 = {0x800, 0x800, 0x800, 0x800}, v2_int64 = {0x80000000800, 0x80000000800},
uint128 = 0x00000800000008000000080000000800}
xmm7 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x8000000000000000, 0x8000000000000000},
v16_int8 = {0x5e, 0xfe, 0x62, 0xf5, 0x27, 0xfc, 0x7e, 0xff, 0xf7, 0xf5, 0xf3, 0x9, 0x0, 0xf6, 0x9d, 0xfe},
v8_int16 = {0xfe5e, 0xf562, 0xfc27, 0xff7e, 0xf5f7, 0x9f3, 0xf600, 0xfe9d}, v4_int32 = {0xf562fe5e,
0xff7efc27, 0x9f3f5f7, 0xfe9df600}, v2_int64 = {0xff7efc27f562fe5e, 0xfe9df60009f3f5f7},
uint128 = 0xfe9df60009f3f5f7ff7efc27f562fe5e}

LigH

11th April 2014, 22:17

I hope this function name already helps narrowing the search. If not, the developers may need your ffmpeg_g.exe together with this result.

D3C0D3R

12th April 2014, 13:57

the developers may need your ffmpeg_g.exe together with this result.

Post updated. The failure is _completely_ independent of the YUV input.

I looked for some publicly available builds. One that I found, from x265.ru (8-bit 0.9+29), ran under 32-bit XP. It fails in exactly the same way, at exactly the same movdqa; only the function names are unavailable in the debugger, because the exe has no debug info.

Sorry, for now it's more convenient for me to reply here instead of on the mailing list.

LigH

12th April 2014, 14:01

They are already on the right track. IIUC, there is an instruction here that may lead to a crash on SSSE3-only CPUs when the stack is not aligned. Gosh, how much do you need to know about it to discover that?!
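
The alignment explanation can be checked against the debugger output posted earlier: the faulting instruction is movdqa (%edi),%xmm0, and movdqa requires its memory operand to be 16-byte aligned. The sketch below tests the edi value from the register dump. Note that the eip in that dump differs from the address in the backtrace, so the dump may come from a different run; treat this as an illustration, not a confirmed diagnosis.

```python
EDI = 0x10BF8CC   # edi value copied from the posted register dump
ALIGNMENT = 16    # movdqa needs a 16-byte aligned memory operand

offset = EDI % ALIGNMENT
is_aligned = offset == 0

print(f"edi = {EDI:#x}, offset within 16-byte block = {offset}, aligned = {is_aligned}")
```

An address that is not a multiple of 16, as here, is consistent with the stack-alignment theory: the buffer pointed to by edi sits on the stack at an offset that the aligned load cannot accept.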

smegolas

12th April 2014, 18:36

I just have a few quick questions regarding x265, I hope it's OK to ask them here.

Given the recent developments (hevc and opus), I'm planning to batch convert a huge film archive (6TB), mostly not-even-HD content encoded as H.264, using medium or slow preset via ffmpeg, with the hopes of shrinking the archive. I don't know how important psy-rd, trellis, intra-refresh etc. are.

Is it mature enough? (for an ordinary eye, for a home archive. I'm not Netflix or Hollywood so my requirements aren't that high :) )
Are there important improvements waiting in the queue? Should I start already or is it worth waiting?
What parameters do you suggest for encoding? I'm planning to go with -preset slow only, without using -x265-params, but if it turns out to be too slow, I can switch to medium with -x265-params me=star. (as far as I know, there are no -tune film or psy-rd based tunings yet).

I made a small experiment with a 1280x720 H264 video from my archive, using the medium preset with crf=22. The resulting file was less than half the size of the original file with no visible (to me; a keen eye or PSNR/etc. may differ) reduction in quality. So it seems it's definitely worth the trouble. Even with a recent Haswell i7 processor, the video encoding (I used -c:a copy) was a bit slow (~14 FPS), though this is expected. I hope it will get faster in time though.

I will experiment with slow and slower and make a detailed post next time.

No, this would not be a good idea.

x265 is not mature enough yet. It is still in beta. Everybody complains about blurring with live-action content. Witness how much x264 has improved since its first year. It will be the same with x265.

There are no hardware decoders, so we don't know what the 'correct' settings are to ensure widespread compatibility (e.g. the HEVC equivalent of H.264 HP@L4.1).

6TB isn't really a huge collection. Pretty soon it will fit on one disk. If you have run out of space just buy more storage, it would be cheaper than your electricity bill for encoding that many movies.

x265_Project

13th April 2014, 23:43

We were at the NAB show in Las Vegas last week, giving demonstrations to many of the companies attending, showing x265 encodes of popular video sequences side by side with x264 encodes (the gold standard for quality today). Attendees were blown away by the quality of x265. We demonstrated 2 streams played back in sync, showing the middle 50% of two 4K clips on a 4K monitor. Here is a photo of our demo... http://x265.org/NAB2014_CrowdRun4K.jpg

Sample frame grabs from Crowd_Run4K50 @ 4 Mbps are here... http://x265.org/CrowdRun4K_4Mbps.7z
You can repeat these tests yourself, using the SVT clips from http://media.xiph.org/video/derf/ - Crowd Run, Old Town Cross, Ducks Take Off, In To Tree.... take your pick.

H.265 provides for much higher video coding accuracy. The thing you will notice is that x265 encodes don't have all of the temporal artifacts of H.264. x265 is typically winning comparisons against x264 even at half the bit rate. There were cases where x265 was preferred even at 1/4 the bit rate (Crowd Run 4K - x265 veryslow 4 Mbps vs x264 veryslow @ 16 Mbps), due to the lack of temporal artifacts and macroblocking in x265.

Blue_MiSfit

14th April 2014, 05:16

I wish I had stopped by your booth :)

This is an impressive demonstration, but I think x265 is still too slow for any real use - especially for 4k. I haven't looked at alternatives yet (the glacial broadcast industry is still a long ways away from HEVC), but is there hope for significant speedup or are we already to the point where further ASM optimizations could give only 10-20% more?

To amend the above, I have looked at Elemental Live. Their HEVC encoder is supposedly real-time for 4kp30 using a single server, and 4kp60 using two servers strapped together. This is of course using scads of very powerful and expensive GPUs, but would the same thing theoretically be possible with today's technology using CPU-only encoders like x265? Let's pretend we have a beastly quad-socket fire breathing 48 core monster at our disposal.. :devil:

Thanks,
Derek

x265_Project

14th April 2014, 06:14

We had a private suite. We'll look at getting a booth next year.

It's true that you don't get these significant improvements in video compression efficiency for free. HEVC requires substantially higher compute resources for a given performance level. x265 can scale performance nicely on multiple socket machines, and we will continue to optimize for our studio / broadcast customers that use these systems.

A number of current and pending x265 licensees are building real-time encoders. One was demonstrating real-time 4K encoding based on x265, using multiple processors (dividing the incoming stream into segments). It looked great!

Procrastinating

15th April 2014, 07:57

Is the 1.0 release intended to be the one which is "fit for public consumption", or will it be progression as usual?

Kurtnoise

15th April 2014, 12:22

https://bitbucket.org/multicoreware/x265/wiki/RoadMap
https://bitbucket.org/multicoreware/x265/wiki/TODO

raine

15th April 2014, 15:20

No, this would not be a good idea.

x265 is not mature enough yet. It is still in beta. Everybody complains about blurring with live action content. Witness how much x264 improved since the first year. It will be the same with x265.

There are no hardware decoders so we don't know what the 'correct' settings are to ensure widespread compatibility (e.g. the HEVC equivalent of H.264 HP@L4.1).

6TB isn't really a huge collection. Pretty soon it will fit on one disk. If you have run out of space just buy more storage, it would be cheaper than your electricity bill for encoding that many movies.
Thanks so much! I think I will wait until the stable 1.0 release then (assuming all items in Quality/Efficiency TODOs are cleared).
However, I haven't seen any blurring so far (using x265 0.9, ffmpeg 2.2.1) with crf from 22 to 26. Is this an issue with the older versions?

As I mentioned above, I have a low-TDP Haswell CPU. The power supply itself is 80W (comparable to an old-fashioned bulb), so there is really no issue regarding the electricity usage. When I say Haswell, people seem to think of a big ATX machine with 800W power supply :)

And BTW, on average, I found that the x265 version is 40-60% of the size of the original file. For HD content, the encoding rate is around 10fps whereas for low-definition videos (360p or even less) it is around 40fps. I couldn't see any visual loss using crf 22-24, and it is very difficult for me to tell them apart with crf=26.

In my case, buying more storage is not a problem, but having external harddisks floating around is; I'm trying to minimize the number of harddisks, while still having space to add more movies. And yes, I'm planning to buy a 6TB harddisk :)

mandarinka

15th April 2014, 15:36

I don't think recompressing video collections was ever a good idea.
You will kill a great deal of the content, and what is really the gain? /are you planning to do this every five years when a new scheme or improved encoders are available?/

BTW, about that "1.0". It isn't going to be a particularly significant, complete, "stable" (or whatever you want to call it) milestone. It looks like it will be more of a normal, scheduled release like the ones before it.

See on the mailing list (https://mailman.videolan.org/pipermail/x265-devel/2014-April/004128.html).

x265_Project

16th April 2014, 00:54

BTW, about that "1.0". It isn't going to be a particularly significant, complete, "stable" (or whatever you want to call it) milestone. It looks like it will be more of a normal, scheduled release like the ones before it.
See on the mailing list (https://mailman.videolan.org/pipermail/x265-devel/2014-April/004128.html).
Well, I'm not going to argue with Steve, but I should point out that x265 has matured a great deal since we started the project, and it's already shipping with a number of commercial solutions.

The 1.0 milestone is well timed. We did map out these milestones early on with a plan to reach 1.0 with a fairly feature-complete and fully usable implementation, and we're on track for that.

phate89

16th April 2014, 03:01

Thanks so much! I think I will wait until the stable 1.0 release then (assuming all items in Quality/Efficiency TODOs are cleared).
However, I haven't seen any blurring so far (using x265 0.9, ffmpeg 2.2.1) with crf from 22 to 26. Is this an issue with the older versions?

As I mentioned above, I have a low-TDP Haswell CPU. The power supply itself is 80W (comparable to an old-fashioned bulb), so there is really no issue regarding the electricity usage. When I say Haswell, people seem to think of a big ATX machine with 800W power supply :)

And BTW, on average, I found that the x265 version is 40-60% of the size of the original file. For HD content, the encoding rate is around 10fps whereas for low-definition videos (360p or even less) it is around 40fps. I couldn't see any visual loss using crf 22-24, and it is very difficult for me to tell them apart with crf=26.

In my case, buying more storage is not a problem, but having external harddisks floating around is; I'm trying to minimize the number of harddisks, while still having space to add more movies. And yes, I'm planning to buy a 6TB harddisk :)
The energy needed for that amount of transcoding is still a problem: your CPU may draw less power, but it will take more time. It's actually possible that a power-hungry CPU will use less energy overall, because it could have a better ratio of computing power to electrical power:
http://www.cpubenchmark.net/power_performance.html
and if you have to encode enough files, the energy will probably cost more than a new disk.
Btw, even if cost isn't a problem and you still want to do it to gain the advantage of one disk fewer, knowing that a good part of the quality will go away and that you won't be able to do it again because the loss would be too great (two low-bitrate compressions give terrible results even with the best encoder in the world), it's still advisable to wait at least until x265 has all the features of x264.
Because if the quality is OK for you now, next year you will get the same quality with 5/10/20/25% less bitrate (there's still room for a lot of improvement, especially with psy-rdo) while consuming 5/10/20/25% less power (because it will be faster and more optimized)...
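
As a rough illustration of the energy argument in this post: energy is power multiplied by time, so a slower low-power CPU can end up using more energy than a faster, hungrier one. The sketch below is only a back-of-the-envelope calculation. The workload size, frame rates and wattages are made-up assumptions, not measurements from this thread, and the function name is hypothetical.

```python
def encode_energy_kwh(total_frames: int, fps: float, watts: float) -> float:
    """Energy in kWh needed to encode total_frames at a sustained fps
    while the machine draws a constant number of watts."""
    hours = total_frames / fps / 3600.0
    return watts * hours / 1000.0


# Hypothetical workload: 1000 hours of 24 fps video.
frames = 1000 * 3600 * 24

# Hypothetical machines (assumed figures, not benchmarks):
low_power = encode_energy_kwh(frames, fps=10.0, watts=45.0)   # slow, low-TDP CPU
high_power = encode_energy_kwh(frames, fps=22.0, watts=84.0)  # fast, higher-TDP CPU

print(f"low-power CPU:  {low_power:.1f} kWh")
print(f"high-power CPU: {high_power:.1f} kWh")
```

With these assumed numbers the higher-wattage machine finishes in less than half the time and uses less energy in total; with different fps-per-watt figures the result could just as easily go the other way.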

raine

16th April 2014, 04:31

I don't think recompressing video collections was ever a good idea.
You will kill a great deal of the content, and what is really the gain? /are you planning to do this every five years when a new scheme or improved encoders are available?/

BTW, about that "1.0". It isn't going to be a particularly significant, complete, "stable" (or whatever you want to call it) milestone. It looks like it will be more of a normal, scheduled release like the ones before it.

See on the mailing list (https://mailman.videolan.org/pipermail/x265-devel/2014-April/004128.html).

Wow, thanks for the friendly reply.

The gain is the reduction in file size to 40-60% of the original (in certain non-trivial cases I got down to 25%, such as 2.5 hrs of 640x480 DivX converted via x265 @crf=26; this appears to be the case with old WMV and DivX 360p-480p videos), and actually, everything I have can be easily found somewhere on the net. Yes, I'm familiar with the downsides of lossy -> lossy transcoding. My point is, I'm apparently not killing a great deal of content. If the extra loss is not visible to my eyes, I'm OK with that.

As a matter of fact, this is the first time I'm going to do this. Among other things, I have ancient low-definition MPEG videos (created long before x264 even existed) that are totally wasting space.

And no, I probably won't transcode them again with x269.

The energy needed for that amount of transcoding is still a problem: your CPU may draw less power, but it will take more time. It's actually possible that a power-hungry CPU will use less energy overall, because it could have a better ratio of computing power to electrical power:
http://www.cpubenchmark.net/power_performance.html
and if you have to encode enough files, the energy will probably cost more than a new disk.
Btw, even if cost isn't a problem and you still want to do it to gain the advantage of one disk fewer, knowing that a good part of the quality will go away and that you won't be able to do it again because the loss would be too great (two low-bitrate compressions give terrible results even with the best encoder in the world), it's still advisable to wait at least until x265 has all the features of x264.
Because if the quality is OK for you now, next year you will get the same quality with 5/10/20/25% less bitrate (there's still room for a lot of improvement, especially with psy-rdo) while consuming 5/10/20/25% less power (because it will be faster and more optimized)...
Luckily, operational speed per watt is not a linear function, and more importantly, the encoding FPS numbers I get actually look good (I know, they would look much better with a 4770K, but I'm OK with being 10% slower at half the TDP).

And thanks a lot, this is the kind of stuff I wanted to know about. x264 feature-parity might be too much, but I will wait at least until we have psy-rdo.

fumoffu

16th April 2014, 15:54

If the source is good quality and sharp the blurring usually is very tolerable and not that noticeable. The problem is much more visible if you have source encoded for example with not the best h264 encoder (pretty much all encoders except x264 including hardware solutions) - now the things that were not very detailed are becoming completely flat.
Also I'm surprised that you get good results at such low resolutions - I would think that x265 would be better than x264 mainly in HD resolution since from what I understand the main improvements are bigger CU and better motion search. Have you tried re-encoding those movies with x264 using good-high settings? Maybe the results would be comparable?

benwaggoner

16th April 2014, 17:46

If the source is good quality and sharp the blurring usually is very tolerable and not that noticeable. The problem is much more visible if you have source encoded for example with not the best h264 encoder (pretty much all encoders except x264 including hardware solutions) - now the things that were not very detailed are becoming completely flat.
Also I'm surprised that you get good results at such low resolutions - I would think that x265 would be better than x264 mainly in HD resolution since from what I understand the main improvements are bigger CU and better motion search. Have you tried re-encoding those movies with x264 using good-high settings? Maybe the results would be comparable?
HEVC has a bigger differential advantage over H.264 at high frame sizes (particularly >1080p), but there's lots of good stuff in there that will help significantly at every frame size.

I wouldn't be surprised if an x265 as well-tuned as x264 is today might only need 40% of the bitrate at 4K resolutions and 50% at lower resolutions to hit the same subjective quality.
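
To put numbers on that guess, here is a minimal sketch that turns the two speculative ratios from this post (40% at 4K, 50% below that) into a target bitrate. The ratios are a forecast for a future, well-tuned x265, not a measured result, and the function name and the 2160-line threshold for "4K" are assumptions made only for illustration.

```python
def estimated_x265_kbps(x264_kbps: float, frame_height: int) -> float:
    """Bitrate a well-tuned x265 might need for the same subjective
    quality, using the speculative ratios quoted in the post above."""
    # Assumption: treat anything with 2160 or more lines as "4K".
    ratio = 0.40 if frame_height >= 2160 else 0.50
    return x264_kbps * ratio


print(estimated_x265_kbps(50000, 2160))  # hypothetical 50 Mbps 4K x264 encode
print(estimated_x265_kbps(8000, 1080))   # hypothetical 8 Mbps 1080p x264 encode
```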

LigH

16th April 2014, 19:32

Oh, Ben, you just cut the cake I put into the oven today!

I was not yet able to encode 4K video because it takes so much time (and a Phenom-II X4 is the best I can spend here, and x265 does not even try to use SSE3 instructions on it)... but some 1080p test samples already tell an interesting tale:

I encoded some 10 s clips from Derf's Xiph media archive (crowd_run; ducks_take_off; in_to_tree) with x265 --crf 30 --preset veryslow and also with x264 in 2-pass mode to get

* the same bitrate
* the double bitrate
* the triple bitrate
* and even the quadruple bitrate in one case

and would like to invite (https://www.mediafire.com/folder/6lfp2jlygogwa/HEVC) you all to compare and rate.

When I am able to use a better CPU, I may even try the 2160p versions...

sKRUVEN

16th April 2014, 20:43

Oh, Ben, you just cut the cake I put into the oven today!

I was not yet able to encode 4K video because it takes so much time (and a Phenom-II X4 is the best I can spend here, and x265 does not even try to use SSE3 instructions on it)... but some 1080p test samples already tell an interesting tale:

I encoded some 10 s clips from Derf's Xiph media archive (crowd_run; ducks_take_off; in_to_tree) with x265 --crf 30 --preset veryslow and also with x264 in 2-pass mode to get

* the same bitrate
* the double bitrate
* the triple bitrate
* and even the quadruple bitrate in one case

and would like to invite (https://www.mediafire.com/folder/6lfp2jlygogwa/HEVC) you all to compare and rate.

When I am able to use a better CPU, I may even try the 2160p versions...
I'm currently doing the same thing but in 2160p (all five sequences from SVT_MultiFormat_2160p50, but slowed down to 25p). I've got a 4770K, so I can use AVX2 and the speed is not that bad. But I ran into some problems that maybe someone here can sort out. I'm trying to do some visual comparisons between x264 and x265, and for some reason x265 (crf22 ~48Mbps slow preset) is looking worse than x264 (vbr2pass 50Mbps slow preset).

I use avs4x265 (with the 0.9+53 8bpp build from x265.ru), then remux the result with mp4box:
avs4x265.exe --crf 22 --preset slow -o output.hevc input.avs
mp4box.exe -add output.hevc#trackID=1:fps=25.0 -new output.mp4

I use MPC-BE, LAV Filters and madVR for playback (all up to date) and use the screen grab command (alt-i) to export frames for comparison. Does anyone know why the x265 version is less detailed? The playback is not smooth either. Does anyone know a better decoder for 4K x265 material, or is it not possible yet for 4K?

benwaggoner

16th April 2014, 21:21

But I run into some problems that maybe someone here can sort out. I'm trying to do some visual comparisons between x264 and x265 and for some reason x265 (crf22 ~48Mbps slow preset) is looking worse than x264 (vbr2pass 50Mbps slow preset)?
You're not doing any perceptual tuning in your setting, so it'll default to tuning for PSNR, while x264 will default to tuning for Rate Factor.

At a minimum, you should use --tune ssim. --aq-mode 2 and --rdpenalty 1 may help as well. I'm not sure how frame size and content-dependent they are.
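
For anyone wanting to try these options on top of the command line posted earlier, here is a small sketch that assembles the argument list. It only builds and prints the command and does not run the encoder. The helper name and its parameters are hypothetical; the executable name and flags are taken from the posts in this thread, and whether a given x265 build accepts them or behaves as described is not verified here.

```python
def build_x265_args(source, output, crf=22, preset="slow",
                    tune=None, aq_mode=None, rdpenalty=None):
    """Assemble an avs4x265 command line as a list of arguments.
    Optional flags are only added when a value is given."""
    args = ["avs4x265.exe", "--crf", str(crf), "--preset", preset]
    if tune is not None:
        args += ["--tune", tune]
    if aq_mode is not None:
        args += ["--aq-mode", str(aq_mode)]
    if rdpenalty is not None:
        args += ["--rdpenalty", str(rdpenalty)]
    args += ["-o", output, source]
    return args


baseline = build_x265_args("input.avs", "output.hevc")
tuned = build_x265_args("input.avs", "output_tuned.hevc",
                        tune="ssim", aq_mode=2, rdpenalty=1)

print(" ".join(baseline))
print(" ".join(tuned))
```

Writing the tuned encode to a separate output file makes it easier to compare file sizes and frames against the baseline.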

sKRUVEN

16th April 2014, 22:09

You're not doing any perceptual tuning in your setting, so it'll default to tuning for PSNR, while x264 will default to tuning for Rate Factor.

At a minimum, you should use --tune ssim. --aq-mode 2 and --rdpenalty 1 may help as well. I'm not sure how frame size and content-dependent they are.
Oh, that might be it. The source contains a lot of film grain and the x265 version looks denoised, so there is less grain and detail compared to the x264 one. I will try those tweaks and see if they help to retain the detail.

Edit: That did nothing. The encode with --tune ssim and --aq-mode 2 looks exactly the same as the old one. I don't even know whether the options took effect or whether ssim is already the default, since the new file has exactly the same file size.

Atak_Snajpera

16th April 2014, 22:34

--tune film?