View Full Version : x264 development


martinfrombern
26th September 2009, 12:29
ETA, for weighted p-frames?

Chengbin
26th September 2009, 14:20
ETA, for weighted p-frames?

Be patient.

It is probably already done, just at the stage of intense testing.

burfadel
26th September 2009, 14:29
Weighted B-frames work just fine with MB-tree right now.

Oops my mistake :)

martinfrombern
26th September 2009, 15:44
It is probably already done, just at the stage of intense testing.

My intention is not to rush anybody, I'm just wondering about an estimate of when it could be released.
Of course it's better to test it before going public :)

G_M_C
26th September 2009, 15:57
Be patient.

It is probably already done, just at the stage of intense testing.

Seeing all the posts about it, it's also at the stage of intense expectations ;)

kemuri-_9
26th September 2009, 16:17
afaik, the GSoC student who did the weightp project has become very busy with other matters and as such there is no set time for it to get committed to the repository

Sagekilla
26th September 2009, 18:03
afaik, the GSoC student who did the weightp project has become very busy with other matters and as such there is no set time for it to get committed to the repository

That aside, isn't Dark more or less taking over all those students who dropped their projects anyway? IIRC, the student who was working on MBTree dropped it, so Dark took it over.

Chengbin
26th September 2009, 20:47
That aside, isn't Dark more or less taking over all those students who dropped their projects anyway? IIRC, the student who was working on MBTree dropped it, so Dark took it over.

Well, when they drop it, they don't get paid.

The student working on weight-p will still finish it, even though SoC is over. After all he IS getting $4500 to do this project, he should finish it.

moviefan
27th September 2009, 00:19
Is there at least any progress with the nal-hrd patch? It's a shame that Blu-ray compliance is not possible with the current revision... :(

G_M_C
1st October 2009, 08:08
It's now 7 days since the last commit on the GIT shortlog. I'm getting an anxious feeling Dark Shikari, Akupenguin and others are preparing for a big surprise and/or big bump of the build number.

DS, can you lift a bit of the veil without spoiling the surprise ... just a teaser of what's to come so to speak ?

;)

Dark Shikari
1st October 2009, 09:19
It's now 7 days since the last commit on the GIT shortlog. I'm getting an anxious feeling Dark Shikari, Akupenguin and others are preparing for a big surprise and/or big bump of the build number.

DS, can you lift a bit of the veil without spoiling the surprise ... just a teaser of what's to come so to speak ?

;)

Well, at the moment, I'm getting a whole lot less crappy at Scarlet Weather Rhapsody (http://en.wikipedia.org/wiki/Scarlet_Weather_Rhapsody). I'm also finally getting around to watching Akagi (http://en.wikipedia.org/wiki/Akagi_(manga)). I played a pretty awesome game of Civilization 4: Beyond the Sword (http://en.wikipedia.org/wiki/Civilization_IV:_Beyond_the_Sword) where I got a Space Race victory in ~1850. I also upgraded my Touhou merchandise collection with one of these (http://i36.tinypic.com/v3288y.jpg). And I finally ordered a new laptop (http://www.dell.com/us/en/home/notebooks/laptop-studio-xps-16/pd.aspx?refid=laptop-studio-xps-16&s=dhs&cs=19) now that the Core i7 mobiles came out.

Wait, you mean x264? There were some intra-prediction-related optimizations that didn't pan out. And a media streaming company that wishes to remain anonymous is sponsoring some low-latency streaming optimizations for x264. But nothing important... :p

juGGaKNot
1st October 2009, 10:34
But nothing important.

cool, finally working on 4:4:4!

G_M_C
1st October 2009, 15:05
Well, at the moment, I'm getting a whole lot less crappy at Scarlet Weather Rhapsody (http://en.wikipedia.org/wiki/Scarlet_Weather_Rhapsody). I'm also finally getting around to watching Akagi (http://en.wikipedia.org/wiki/Akagi_(manga)). I played a pretty awesome game of Civilization 4: Beyond the Sword (http://en.wikipedia.org/wiki/Civilization_IV:_Beyond_the_Sword) where I got a Space Race victory in ~1850. I also upgraded my Touhou merchandise collection with one of these (http://i36.tinypic.com/v3288y.jpg). And I finally ordered a new laptop (http://www.dell.com/us/en/home/notebooks/laptop-studio-xps-16/pd.aspx?refid=laptop-studio-xps-16&s=dhs&cs=19) now that the Core i7 mobiles came out.

Wait, you mean x264? There were some intra-prediction-related optimizations that didn't pan out. And a media streaming company that wishes to remain anonymous is sponsoring some low-latency streaming optimizations for x264. But nothing important... :p

Looks like a week well spent :cool:

iwod
2nd October 2009, 09:58
@ DS
What do you think of the New Nvidia Chip? Any chance X264 getting something out of it?

LigH
2nd October 2009, 10:10
The advantage of x264 (from the quality PoV) is just that it calculates the compressed content itself, instead of leaving that completely to an external engine that it cannot control in as much detail as necessary.

If software uses a GPU's internal compressor for AVC video, it won't be x264 anymore. And if it only uses marginal basic functions of video conversion, then it won't gain much speed -- the slowest part of AVC compression is the intelligence to decide what kind of coding, and how much, to use on which part of the video, and that can't easily be delegated to the outside.

At the very least, it would be a challenge to port motion prediction and quantization adaptation - at the complexity of the x264 routines - to GPU shaders... :D

benwaggoner
2nd October 2009, 19:09
If software uses a GPU's internal compressor for AVC video, it won't be x264 anymore. And if it only uses marginal basic functions of video conversion, then it won't gain much speed -- the slowest part of AVC compression is the intelligence to decide what kind of coding, and how much, to use on which part of the video, and that can't easily be delegated to the outside.
Yep, doing anything with refinement loops where the entire frame needs to be analyzed is pretty tricky to parallelize on a GPU. I imagine a trellis implementation would require slicing, for example, as would adaptive quantization.

Initially, I think GPUs will be most valuable for "good enough, faster." Particularly as we look at technologies like Smooth Streaming where 8-12 different versions of the same content need to be encoded live simultaneously, ideally in a single rack-mount box.

The biggest potential for quality gains in GPU, as I said before, is really in source decode and preprocessing, where things are much more determinant. ATSC/DVB stream reencoding is just exploding as a market, so a good deblocking MPEG-2 decoder would be very valuable, as would be high-quality resizing, noise reduction and deinterlacing/IVTC.

For those kinds of tasks, a GPU could do AVISynth like quality with real-time HD perf a lot easier than they could do x264 like quality, since the algorithmic mix is MUCH more tilted towards parallelizable DSP functions.

So, for those who want to see GPUs speed up x264 workflows, getting AVISynth and your favorite filters using the GPU is probably much lower hanging fruit. After all, all the CPU power saved from that goes straight to x264, and there's plenty of times where decode or preprocessing is the long pole anyway, particularly on 8-way machines.

Dark Shikari
2nd October 2009, 19:14
Yep, doing anything with refinement loops where the entire frame needs to be analyzed is pretty tricky to parallelize on a GPU. I imagine a trellis implementation would require slicing, for example, as would adaptive quantization.

Trellis is impossible on a GPU. You'd need to have each thread warp do a single trellis, which would be absurd.

AQ is trivial.

Blue_MiSfit
2nd October 2009, 19:27
So, for those who want to see GPUs speed up x264 workflows, getting AVISynth and your favorite filters using the GPU is probably much lower hanging fruit. After all, all the CPU power saved from that goes straight to x264, and there's plenty of times where decode or preprocessing is the long pole anyway, particularly on 8-way machines.


fft3dgpu ;) - this sucker came out way back in '05 if memory serves. Now all we need is a good way to pass all this along via remote desktop in a way that doesn't break everything. Am I wrong in thinking that Windows 7 can do this?

If I could get MDegrain2 "for free" on the GPU, I'd be disgustingly happy.

But yes, hear hear - lots of the time on my 8 core machines, the decode / preprocess is the bottleneck - especially when using DGDecode or QuickTime. Single threaded decoding sucks :( The solution for now is to run as many encodes in parallel until I've saturated the CPUs or I/O.

~MiSfit

benwaggoner
3rd October 2009, 01:56
fft3dgpu ;) - this sucker came out way back in '05 if memory serves. Now all we need is a good way to pass all this along via remote desktop in a way that doesn't break everything. Am I wrong in thinking that Windows 7 can do this?
Well, DirectX 11 certainly allows pixel shaders to pass frames back for further processing if that's what you mean. There's nothing architecturally challenging about doing this, but it's a whole lot of code that needs to get written.

If I could get MDegrain2 "for free" on the GPU, I'd be disgustingly happy.


But yes, hear hear - lots of the time on my 8 core machines, the decode / preprocess is the bottleneck - especially when using DGDecode or QuickTime. Single threaded decoding sucks :( The solution for now is to run as many encodes in parallel until I've saturated the CPUs or I/O.
Exactly. Only speeding up the bottlenecks actually speeds up the end-to-end process. Windows 7 is doing some pretty amazing stuff with decode in both software and hardware, including a good bob deinterlace; maybe we need a MediaFoundationSource for AVISynth :).

The dream would be able to have bitstream to final YV12 all happen on GPU, leaving the CPU wide open for compression.

I could imagine some preanalysis stuff being done on GPU that could help x264. Delivering a good luma histogram for determining fade correction, for example. That should be trivially parallelizable and wouldn't need any feedback from the encoder.
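
A minimal CPU-side sketch of such a luma histogram, assuming an 8-bit luma plane stored row by row with a stride. The name `luma_histogram` and its parameters are made up for illustration; this is not x264 or GPU API code. Each pixel lands in exactly one bin regardless of its neighbours, which is why the work splits cleanly: every thread can fill a private histogram over its own rows and the bins are summed at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count how often each 8-bit luma value occurs.
 * plane  : pointer to the top-left luma sample
 * stride : distance in bytes between the starts of consecutive rows
 * hist   : 256 output bins, overwritten by this call */
static void luma_histogram(const uint8_t *plane, int width, int height,
                           ptrdiff_t stride, uint32_t hist[256])
{
    memset(hist, 0, 256 * sizeof(hist[0]));
    for (int y = 0; y < height; y++) {
        const uint8_t *row = plane + (ptrdiff_t)y * stride;
        /* Only the first `width` bytes of a row are picture data;
         * anything up to `stride` is padding and is skipped. */
        for (int x = 0; x < width; x++)
            hist[row[x]]++;
    }
}
```

Comparing the histograms of consecutive frames (for example, whether the whole distribution shifts toward black) is one plausible input to a fade detector, but how an encoder would actually use it is left open here.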

There has been some interesting research on reusing source motion vectors as a hint for a transcode. It didn't help quality in the end, but was useful for improving speed, as it basically served as an initial coarse motion search to be refined by the real encoder.

Blue_MiSfit
3rd October 2009, 02:19
Well, DirectX 11 certainly allows pixel shaders to pass frames back for further processing if that's what you mean. There's nothing architecturally challenging about doing this, but it's a whole lot of code that needs to get written


I mean running anything that uses DirectX (or any hardware acceleration, actually) across remote desktop without totally breaking things. If you run fft3dgpu on a remote server, and RDP into it, things die :(

~MiSfit

SeeManRun
3rd October 2009, 15:20
Hi Guys,
Trying to figure something out. I use the profile x264 HQ Slowest for my encodes, and I can barely tell the diff between the encode and a blu-ray movie, which is what I am going for. The problem is it takes a very long time to perform the encode, about 10 fps on first pass and 4 fps on second pass with quad core machine (q6600). I have just upgraded from x264 1183 to 1259, and now the second pass runs at nearly 12 fps, which is awesome. But I am curious as to what has been changed to give it such a speed boost and am wondering if the quality has been sacrificed.

Does someone know of the change that gave this crazy speed boost?

RunningSkittle
3rd October 2009, 19:40
@ SeeManRun: http://git.videolan.org/?p=x264.git;a=shortlog;h=496d79dfcb90066f0254e07d593471f2c885a153

me7
3rd October 2009, 23:29
What happened to www.x264.nl ?

LoRd_MuldeR
3rd October 2009, 23:32
What happened to www.x264.nl ?

http://forum.doom9.org/showthread.php?t=149949

http://forum.doom9.org/showthread.php?t=149962

:search:

SeeManRun
6th October 2009, 14:15
@ SeeManRun: http://git.videolan.org/?p=x264.git;a=shortlog;h=496d79dfcb90066f0254e07d593471f2c885a153

Thanks, that is perfect. I see a couple CL's that could be this speed increase, but the performance has been dramatic, so whichever one, good work to the x264 devs!

nm
6th October 2009, 14:25
Thanks, that is perfect. I see a couple CL's that could be this speed increase, but the performance has been dramatic, so whichever one, good work to the x264 devs!
Your 3x speed increase is probably not caused by improvements in x264 but because something else changed at your end or your old x264 build was broken.

dstln
6th October 2009, 15:18
I'm not sure what x264-hq-slowest is, but judging things on your cpu and fps, it sounds to me like it was more something on the order of having tesa on previously and umh on now.

DarkZell666
6th October 2009, 15:26
I'm not sure what x264-hq-slowest is, but judging things on your cpu and fps, it sounds to me like it was more something on the order of having tesa on previously and umh on now.

It's an MeGUI preset, and AFAIK they've been converted for other GUI's lying around here, so it could be any of them ...
x264's CLI options were heavily modified on 07/07/09, so it could be that the preset's options conflict with x264's new defaults, if they weren't updated (or something along those lines, get the idea ?).

@SeeManRun, would you mind posting the x264 command line generated by MeGUI or <insert_your_GUI_here> with that preset so we can check the settings ?

juGGaKNot
12th October 2009, 12:10
Constrained intra prediction support, enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.

Required for what ?

LoRd_MuldeR
12th October 2009, 12:11
Constrained intra prediction support, enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.
Required for what ?

http://en.wikipedia.org/wiki/Scalable_Video_Coding

VFR maniac
12th October 2009, 17:28
Bug report.

Rev1283 crashes without B-frames.

rack04
12th October 2009, 17:31
Bug report.

Rev1283 crashes without B-frames.

That would probably be the reason for my crash. Reported here (http://forum.doom9.org/showthread.php?p=1333951#post1333951).

pcordes
16th October 2009, 03:39
I was looking over the git commits after updating my x264 repo (some people follow sports. I follow open source development for entertainment :).

I noticed something that could be improved on in commit 60c630c01dd4cb125a2bcc08da88873ce3a41dbc
(Optimize exp2fix8
Slightly faster and more accurate rounding.)

int i = foo + 512.5;
isn't as good as the C99 rounding functions which can be implemented more efficiently with one of the newer SSE float->int instructions.
int i = lrint(foo + 512);
may generate better code (with -ffast-math, so lrint is inlined). Although I haven't checked what the asm looks like in either case...

see the comments on
http://assemblyrequired.crashworks.org/2009/01/12/why-you-should-never-cast-floats-to-ints/

I tend to use lrint() for all my float->int conversions these days. On projects where I'm limited by MSVC compat requirements, I use some ugly code that prob. doesn't actually generate very good code with MSVC. But at least I can still write lrint(f) instead of (int)(f+0.5).


#ifdef _MSC_VER
// Windows math.h doesn't include C99 standard rounding functions :(
// workaround from https://svn.boost.org/trac/boost/ticket/2513
#if _MSC_VER < 1400
extern "C" { const double rint(double); }
#else
static inline double rint(double x) {
    _asm FLD [x] ;     // push x onto the x87 stack
    _asm FRNDINT ;     // round ST(0) using the current rounding mode
    //_asm RET ;       // result stays in ST(0), where a double is returned on x86
}
#endif
static inline long lrint(double x){ return static_cast<long>(rint(x)); }
#endif


FYI, C99 has other useful rounding functions (although I don't know which of them are fast):
http://www.gnu.org/software/libc/manual/html_node/Rounding-Functions.html
http://linux.die.net/man/3/lrint

straying a little further off topic, see also the references at the bottom of
http://www.digitalmars.com/d/2.0/d-floating-point.html
for general floating-point goodness.

akupenguin
16th October 2009, 10:26
lrintf compiles to cvtss2si. (int) compiles to cvttss2si. No difference in speed or precision.
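
For reference, the two instructions differ only in how they round: `cvtss2si` uses the current rounding mode, `cvttss2si` always truncates. A sketch using the standard SSE intrinsics that map to them (`_mm_cvtss_si32` and `_mm_cvttss_si32`); the wrapper names and the `HAVE_SSE_INTRINSICS` macro are invented for this example, and the fallback branch is only there so the code still builds on non-x86 targets.

```c
#include <math.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#else
#define HAVE_SSE_INTRINSICS 0
#endif

/* cvtss2si: convert using the current rounding mode
 * (round-to-nearest, ties-to-even by default). */
static int float_to_int_round(float f)
{
#if HAVE_SSE_INTRINSICS
    return _mm_cvtss_si32(_mm_set_ss(f));
#else
    return (int)lrintf(f);   /* portable fallback with the same rounding */
#endif
}

/* cvttss2si: convert with truncation toward zero, like a C cast. */
static int float_to_int_truncate(float f)
{
#if HAVE_SSE_INTRINSICS
    return _mm_cvttss_si32(_mm_set_ss(f));
#else
    return (int)f;
#endif
}
```

So 2.9f becomes 3 with the rounding conversion and 2 with the truncating one, while a tie such as 2.5f rounds to 2 under the default mode.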

Dark Shikari
16th October 2009, 11:09
On GCC 3.4.5, lrintf compiles to:

29a: f3 0f 11 44 24 1c movss [esp+0x1c],xmm0
2a0: d9 44 24 1c fld dword[esp+0x1c]
2a4: db 5c 24 3c fistp dword[esp+0x3c]

Oh dear. :p

me7
18th October 2009, 17:20
What about weightp? Has it been put on hold or are you still waiting for the SoC student?
Don't mean to rush anyone, just asking for an update.

LoRd_MuldeR
18th October 2009, 17:25
What about weightp? Has it been put on hold or are you still waiting for the SoC student?
Don't mean to rush anyone, just asking for an update.

Status page:
http://wiki.videolan.org/SoC_2009/Weighted_P-frame_Prediction

Git Repository:
http://repo.or.cz/w/x264/x264-p-frames.git?a=shortlog

Looks like the project has already made some good progress. But there isn't much development going on lately, probably because SoC '09 has ended long ago (August 17th).

me7
18th October 2009, 18:09
Looks like the project has already made some good progress. But there isn't much development going on lately, probably because SoC '09 has ended long ago (August 17th).

Exactly, so does this mean that the feature has been put on hold or are the x264 devs waiting for the student to finish it?

Chengbin
18th October 2009, 18:30
Exactly, so does this mean that the feature has been put on hold or are the x264 devs waiting for the student to finish it?

The student is finishing it.

Unfortunately university students are extremely busy. It will take a while.

I'm hoping it'll be done within a month. That's when I'll be re-encoding my huge collection of videos, DVDs, and some Blu-rays.

CpT
18th October 2009, 22:30
Possible bug report.

I have to disable B-Frame Pyramid even if mbtree is disabled or I get this error
Assertion failed: cost >= 0, file encoder/slicetype.c, line 1035

Tested with 1292 from here http://x264.nl/
and 1292 techouse > http://techouse.project357.com/

Running windows xp32
Gui's used, MeGui and sx264. Both give the same error.

I also tested all versions starting at 1286 through 1292. All give the same error. 1281 works.

Dark Shikari
18th October 2009, 22:30
Possible bug report.

I have to disable B-Frame Pyramid even if mbtree is disabled or I get this error
Assertion failed: cost >= 0, file encoder/slicetype.c, line 1035

Tested with 1292 from here http://x264.nl/
and 1292 techouse > http://techouse.project357.com/

Running windows xp32
Gui's used, MeGui and sx264. Both give the same error.

I also tested all versions starting at 1286 through 1292. All give the same error. 1281 works.

Known issue, fixed locally already.

CpT
18th October 2009, 22:33
fast reply...

Thanks for the heads up ;)

jpsdr
19th October 2009, 09:30
Hello.
Currently, B-Frame Pyramid must be disabled if I use mbtree, from what I've understood.
Is something planned to make the two compatible ?
If yes, is it scheduled for 2009 or not before 2010 ?
Just asking to know how long the wait could be, not to push anything.

CpT
19th October 2009, 11:49
Hello.
Currently, B-Frame Pyramid must be disabled if I use mbtree, from what I've understood.

Re-read my post ;)

nurbs
19th October 2009, 12:03
@jpsdr:
x264 | branch: master | Lamont Alston <wewk584 at gmail.com> | Mon Oct 12 23:32:16 2009 -0700| [e2659dbdc0aed2d2cd4f6538faddf370e7740ada] | committer: Jason Garrett-Glaser

Make B-pyramid spec-compliant
The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
Now it is.
Two modes are now available:
1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
2) normal b-pyramid, which is like the old mode except fully compliant.
This patch also adds MMCO support (necessary for compliant pyramid in some cases).
MB-tree still doesn't support b-pyramid (but will soon).
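
The strict-mode rule quoted above ("no P-frames can reference B-frames") can be written down as a simple check. This is only an illustrative sketch: the `frame_desc` type, the `frame_type` enum and the function name are hypothetical, and x264's real reference-list and MMCO handling is considerably more involved.

```c
#include <stddef.h>

typedef enum { FRAME_I, FRAME_P, FRAME_B } frame_type;

/* Hypothetical description of one coded frame. */
typedef struct {
    frame_type type;
    const int *refs;    /* indices (into the same array) of referenced frames */
    size_t     n_refs;
} frame_desc;

/* Returns 1 if no P-frame references a B-frame, 0 otherwise.
 * B-frames referencing other B-frames are allowed; that is what
 * makes it a pyramid in the first place. */
static int obeys_strict_pyramid_rule(const frame_desc *frames, size_t n_frames)
{
    for (size_t i = 0; i < n_frames; i++) {
        if (frames[i].type != FRAME_P)
            continue;
        for (size_t r = 0; r < frames[i].n_refs; r++) {
            int ref = frames[i].refs[r];
            /* Out-of-range indices are ignored rather than treated as errors. */
            if (ref >= 0 && (size_t)ref < n_frames && frames[ref].type == FRAME_B)
                return 0;
        }
    }
    return 1;
}
```

Under this reading, "normal" mode would be allowed to fail the check, which is where the compression advantage over strict mode would come from.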

G_M_C
19th October 2009, 12:16
The student is finishing it.

Unfortunately university students are extremely busy. It will take a while.

I'm hoping it'll be done within a month. That's when I'll be re-encoding my huge collection of videos, DVDs, and some Blu-rays.

You sure ? Because this is about the same answer the student posted himself, about a month ago.

I'm just hoping we get it as a christmas present :)

Chengbin
19th October 2009, 12:32
I couldn't help but notice x264 r1301 is a lot smaller, at 961KB, than revisions before, which were 1100KB.

juGGaKNot
19th October 2009, 12:59
--b-pyramid strict: Strictly hierarchical pyramid
--b-pyramid normal: Non-strict (not Blu-ray compatible)

1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
2) normal b-pyramid, which is like the old mode except fully compliant.

So no more DXVA problems when using --b-pyramid normal + --b-pyramid strict for blu-ray right ?

What about hrd ? when will it be committed ?

When will the presets be updated to include --b-pyramid normal ? after mb-tree fix ?

LoRd_MuldeR
19th October 2009, 23:18
Improve CRF initial QP selection, fix get_qscale bug
If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP.
if( !h->param.rc.i_lookahead || h->param.i_keyint_max == 1 || h->param.rc.f_qcompress == 1 )
    h->param.rc.b_mb_tree = 0;

The first implies MB-Tree uses qcomp=1 (or do I read it wrong?) while the second shows that qcomp=1 will disable MB-Tree. I'm confused :confused:

MasterNobody
20th October 2009, 00:16
LoRd_MuldeR
The real qcomp is "rc->qcompress" not the "h->param.rc.f_qcompress". So look at this:

if( h->param.rc.b_mb_tree )
{
    h->param.rc.f_pb_factor = 1;
    rc->qcompress = 1;
}
else
    rc->qcompress = h->param.rc.f_qcompress;
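
Putting the two quoted snippets together may help with the confusion. The sketch below models only the fields named in this thread; the struct and function names are invented, and the assumption that parameter validation runs before ratecontrol setup is mine, not something stated in the posts.

```c
/* Simplified stand-ins for the x264 structures quoted above. */
typedef struct {
    int   i_lookahead;
    int   i_keyint_max;
    int   b_mb_tree;
    float f_qcompress;   /* qcomp as requested by the user */
    float f_pb_factor;
} sketch_params;

typedef struct {
    float qcompress;     /* qcomp actually used by ratecontrol */
} sketch_ratecontrol;

/* Step 1 (parameter validation): a user qcomp of 1 turns MB-tree off. */
static void sketch_validate(sketch_params *p)
{
    if (!p->i_lookahead || p->i_keyint_max == 1 || p->f_qcompress == 1)
        p->b_mb_tree = 0;
}

/* Step 2 (ratecontrol setup): if MB-tree is still on, the internal
 * qcomp is forced to 1; otherwise the user's value is used as-is. */
static void sketch_ratecontrol_init(sketch_params *p, sketch_ratecontrol *rc)
{
    if (p->b_mb_tree) {
        p->f_pb_factor = 1;
        rc->qcompress = 1;
    } else {
        rc->qcompress = p->f_qcompress;
    }
}
```

Read this way, the two statements do not contradict each other: a user-supplied qcomp of 1 disables MB-tree, while an enabled MB-tree overrides the internal qcomp to 1 whatever the user asked for.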