x264 Development Progress Thread
Dark Shikari
3rd February 2008, 23:36
I figure we should have a thread like this so that people can see the general status of x264 development.
http://i36.tinypic.com/11qqpsn.png
Updated October 19th, 2008.
If you are working on any patch or improvement, or have a specific idea for an improvement that should be put on here in any form (or anything that I'm missing), say so and I'll add it in.
Key
Development Progress
None: Nothing's been done. It's just an idea.
Algorithm: A good algorithm has been devised and thought through--it's ready to be coded. There may already be coding in progress, but so far it doesn't work.
Patch: A working patch is ready that implements the algorithm fully. It may not be optimal at all, but it works.
Ongoing: Development consists of many various improvements or is a very large project, and is ongoing.
Testing/Improvement Progress
None: Nothing's been tested.
Low: It's been tested somewhat, but there's definitely lots of room for improvement.
Medium: It's been tested a good bit, but there's no real guarantee it's optimal.
High: It's been tested loads; the odds of a considerable improvement at this point are pretty low.
Importance
Low: A minor change that won't have much effect alone. Often includes small code optimizations, etc.
Medium: Directly and noticeably useful in some fashion--not just a small incremental change.
High: Useful, and important. Would considerably improve x264 in relation to its competitors.
Very High: Extremely important--would, singlehandedly, make x264 vastly better.
Difficulty
Low: Easy. Consists of implementing a very simple algorithm or a small code optimization. Requires very little knowledge of x264 to implement.
Medium: Not that much harder to code, but requires considerably more knowledge of x264 and/or ability to design, implement, and test various algorithms.
High: Quite difficult. Requires in-depth knowledge of x264 and the ability to implement more complex algorithms.
Very High: Extremely difficult. Requires very in-depth knowledge of x264 and understanding of the H.264 standard.
Sagekilla
4th February 2008, 00:07
Very nice Dark Shikari.. Any possibility of wrangling up links to the patches?
lexor
4th February 2008, 00:15
Aww how cute, the table colours match your avatar ;)
On a more prosaic note, you do realize that this thread will simply degenerate into meaningless arguments about options/defaults and general "I want X, you not giving me X fast enough, give me X faster!"
On a personal note, I find the chart to be most informative, since beforehand x264 development seemed like one large black box. Thanks for posting this.
bob0r
4th February 2008, 00:18
Features:
- AQ(--aq-strength 0.5 --aq-sensitivity 13): just awesome, reduces blocking and blurring.
- ME-prepass(--me-prepass): Once I update the patch, it's a nice feature for the more insane encoders. Runs a hexagon search prepass on all motion search predictors to improve motion search effectiveness.
- nal hrd (--nal-hrd): Blu-ray/HD DVD compatibility
- RCRD(--rcrd-lambda, --rcrd-window): rate-distortion optimized ratecontrol: lower quants on frames that have a greater influence on future frames, higher quants on frames that have less influence
- 2pass vbv: 2pass_vbv makes VBV work in 2pass ratecontrol; without it, VBV is mostly ignored. It also happens to give quite a bit better quality than 1pass CBR
- gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
- Better B-Frame decision: Not optimal.
--
- faster-dia: <not recommended, Dark Shikari>
- thread pool: <not making it faster on my system, pengvado>
--
- improved intra refine: General improvement for faster intra refine on subme7 *added*
- ESA SATD hadamard(--fpel-cmp satd, now --me tesa): A highly optimized exhaustive motion search that uses SATD on the best candidates for even more accurate motion searching. *added*
--
Fixes:
- Cabac fix: fixes a single-line incorrectness in CABAC encoding *fixed*
- .mkv minor memory leak fix, Haali can surely fix *fixed*
- memory leak fix: fix huge frames memory leak in x264_encoder_close function (especially with multithreading)
- 32x32samples_crash.diff - fix x264 crash with 32x32 samples and adaptive B-frames, pengvado: not the right fix
Bugmaster:
Here are all the patches which I wrote/modified. Description:
32x32samples_crash.diff - fix x264 crash with 32x32 samples and adaptive B-frames
colorspace_patch.diff - my changes in colorspace conversion (more precision, BT.601/BT.709 and TV/PC scale support)
cosmetic.diff - cosmetic change in macroblock.c for synchronization of declaration and definition of x264_macroblock_probe_skip function
debug-defines.diff - add definition of _DEBUG and NDEBUG in configure script (improve speed when compiling in MinGW)
fix_stats_file_work_for_cli.diff - more accurate calculation of i_frame, removes workaround for VFW Nth-pass ratecontrol
fix_stats_file_work_for_vfw.diff - more accurate calculation of i_frame, more correct workaround for VFW Nth-pass ratecontrol
frames_memoryleak.diff - fix huge frames memory leak in x264_encoder_close function (especially with multithreading)
multithreading_Nth_pass_ratecontrol.diff - fix incorrect Nth-pass ratecontrol work with multithreading
thread-pool.diff - modified thread pool patch (must be applied last)
url: http://mailman.videolan.org/pipermail/x264-devel/2008-January/003921.html
Most information comes from Dark Shikari; I just thought I'd add the 3rd party patches as well.
Sagekilla
4th February 2008, 00:23
IMO I think RCRD is possibly the most insane option up there, I don't think enabling any one setting is slower than that.. Correct me if I'm wrong though, I haven't personally tested tesa yet to see how slow it runs.
<Obligatory whine> want faster encoding, you're not giving me it fast enough, give me it faster! </Whine>
Terranigma
4th February 2008, 00:24
have a specific idea for an improvement that should be put on here in any form (or anything that I'm missing), say so and I'll add it in.
OK, since you put it that way, I'd say 9-14 bit encoding. When Avisynth 2.6.0 is released, it should allow processing up to 16 bits (although we'd need filters that can process at 16 bits). The point of my post is for you to add it as an option. =P
Dark Shikari
4th February 2008, 00:29
IMO I think RCRD is possibly the most insane option up there, I don't think enabling any one setting is slower than that.. Correct me if I'm wrong though, I haven't personally tested tesa yet to see how slow it runs.
RCRD speed cost is independent of settings chosen, so it's actually non-insane if you use a low window size. But it's still pretty buggy. The main reason it's up there is that RCRD has the potential to give insight into ways to improve 2pass ratecontrol.
OK, since you put it that way, I'd say 9-14 bit encoding. When Avisynth 2.6.0 is released, it should allow processing up to 16 bits (although we'd need filters that can process at 16 bits). The point of my post is for you to add it as an option. =P
Such a change might require a new "difficulty" option: Guru. Because it would make almost all the functions in x264 completely obsolete :p
Sagekilla
4th February 2008, 00:29
On a side note, why isn't the faster dia suggested? Is it because of incompatibilities? I still see some use in a faster dia since, IIRC, dia is used in selection of best candidates before a slower ME pass is used.
Dark Shikari
4th February 2008, 00:45
On a side note, why isn't the faster dia suggested? Is it because of incompatibilities? I still see some use in a faster dia since, IIRC, dia is used in selection of best candidates before a slower ME pass is used.
'cause it's not actually faster.
Sagekilla
4th February 2008, 00:50
'cause its not actually faster.
That explains things a bit.. Thanks for the explanation.
Adub
4th February 2008, 01:02
Small bug:
RCRD is "RDRC" on the table.
Oh, and thank you very much for this thread. It gives a very good outline of what still needs to be done. Very useful. Also, I don't know everything there is to know about x264, but is there anything on that table about psy optimizations?
CruNcher
4th February 2008, 01:09
lookahead lookahead i want oops @ Lexor :D
IgorC
4th February 2008, 01:27
Nice to see progress on x264.
What about explicit weighted prediction of P frames in x264?
What is the actual situation around implicit (and explicit, if it really exists) weighted prediction of B-frames in x264?
For speed benchmarks, it may be useful to compare against the latest MC encoder. It has a very fast and high quality Fast RDO (fast intra/inter, MV and ref search).
Dark Shikari
4th February 2008, 01:37
Nice to see progress on x264.
What about explicit weighted prediction of P frames in x264?
Pengvado implemented it a while ago, but didn't put it in SVN because it didn't seem to be very useful.
What is the actual situation around implicit (and explicit, if it really exists) weighted prediction of B-frames in x264?
The problem is we need a good heuristic for it.
Terranigma
4th February 2008, 01:47
Pengvado implemented it a while ago, but didn't put it in SVN because it didn't seem to be very useful.
Is there a patch for this?
CruNcher
4th February 2008, 01:48
Mainconcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same Avisynth script).
Dark Shikari
4th February 2008, 01:51
Is there a patch for this?
Yes, but it's very very old. Linkage (http://akuvian.org/src/x264/x264_wpredp.0.diff).
Mainconcept's Reference H.264 Encoder has absolutely no AQ at all
Correct.
IgorC
4th February 2008, 01:54
Mainconcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same Avisynth script).
So?
It's all about fast RDO. Not state-of-the-art new AQ and other slow stuff.
It's pretty old, but things haven't changed much in the fast encoding area: http://forum.doom9.org/showthread.php?t=105763
foxyshadis
4th February 2008, 02:17
On a more prosaic note, you do realize that this thread will simply degenerate into meaningless arguments about options/defaults and general "I want X, you not giving me X fast enough, give me X faster!"
These posts will be struck and/or deleted. Good show with the cool chart.
Dark Shikari
4th February 2008, 03:06
Update: QNS is now apparently working, though extremely slow (due to being extremely, and intentionally, unoptimized). Patch (http://pastebin.com/fe7e1189).
I'll update the image later.
Edit: Patch updated again. Rare 4x4 artifacting still not fixed.
CruNcher
4th February 2008, 09:25
IgorC, especially for low-bitrate scenarios the new AQ is important for mixed types of sources, and Mainconcept's encoder would have problems holding the quality constant in many scenes. But that's also the difference between Mainconcept and Ateme (and now, to a degree, x264): they have been designed for broadcasting.
Mainconcept's own encoder wasn't designed for that. It is the most efficient and fastest HD DVD and Blu-ray encoder, but in broadcasting it would currently lose to its competition, and x264 in this regard is, with Dark Shikari's AQ, very well balanced for both scenarios :), though Ateme is still some generations ahead of x264 and Mainconcept/Elecard in R&D, especially in everything that has to do with low-bitrate psychovisual optimizations (those are essential for broadcasting).
CruNcher
4th February 2008, 09:33
Dark Shikari QNS should make the picture not so blurry anymore right (more like the input)?
Dark Shikari
4th February 2008, 09:35
Dark Shikari QNS should make the picture not so blurry anymore right (more like the input)?
QNS has an... interesting... effect. It should concentrate quantization error in areas where one would least notice it. It's a type of quantization psy optimization, and is especially effective at lower bitrates.
Try it and see, though right now the 4x4 artifacting somewhat obscures its real effect.
If I had to guess what Elecard's three "quant modes" are, I would guess Deadzone, Trellis, and QNS, respectively.
froggy1
4th February 2008, 10:50
Will there be different QNS strengths available? Like in ffmpeg/lavc? (eg, qns=1, qns=2 and qns=3)
akupenguin
4th February 2008, 11:59
The qns parameter in lavc isn't strength, it's speed.
IgorC
4th February 2008, 14:13
IgorC, especially for low-bitrate scenarios the new AQ is important for mixed types of sources, and Mainconcept's encoder would have problems holding the quality constant in many scenes. But that's also the difference between Mainconcept and Ateme (and now, to a degree, x264): they have been designed for broadcasting.
Mainconcept's own encoder wasn't designed for that. It is the most efficient and fastest HD DVD and Blu-ray encoder, but in broadcasting it would currently lose to its competition, and x264 in this regard is, with Dark Shikari's AQ, very well balanced for both scenarios :), though Ateme is still some generations ahead of x264 and Mainconcept/Elecard in R&D, especially in everything that has to do with low-bitrate psychovisual optimizations (those are essential for broadcasting).
I totally agree that the new VAQ is good and that x264 in high quality mode (pretty slow settings) is better than MC.
But in medium or fast modes MC is better for the same speed.
To match the speed of MC, x264 has to lower subme from 6/7 to 5. And that's where x264 is worse than MC. And AQ won't help. Not everyone wants +0.5-1 SSIM at the cost of a 2-3x (or even more) speed loss.
Try for yourself the latest MC or Elecard with all the fast options enabled and compare it to x264 at the same speed with AQ and other bleeding edge new stuff.
Why not optimize x264 for fast speeds? That has nothing to do with AQ and the other recent improvements by Pengvado and Dark Shikari. That's why MC is a good encoder to compare against.
We are talking about two different directions of development which in conjunction will be very good. Speed and Quality.
Just think: fast encoder and very high quality.
CruNcher
4th February 2008, 14:29
I totally agree that the new VAQ is good and that x264 in high quality mode (pretty slow settings) is better than MC.
But in medium or fast modes MC is better for the same speed.
To match the speed of MC, x264 has to lower subme from 6/7 to 5. And that's where x264 is worse than MC. And AQ won't help. Not everyone wants +0.5-1 SSIM at the cost of a 2-3x (or even more) speed loss.
Try for yourself the latest MC or Elecard with all the fast options enabled and compare it to x264 at the same speed with AQ and other bleeding edge new stuff.
Why not optimize x264 for fast speeds? That has nothing to do with AQ and the other recent improvements by Pengvado and Dark Shikari. That's why MC is a good encoder to compare against.
We are talking about two different directions of development which in conjunction will be very good. Speed and Quality.
Just think: fast encoder and very high quality.
That's exactly what I did, and the visual result was clear: MC Ref 1.1 lost. x264 was more efficient with the new AQ in a CBR broadcast scenario, and it was not that much slower :) Most of the speed in Mainconcept Reference comes from the fast inter/intra decision. x264 is about 12% slower (with subme2), in the 0.x range, compared to Mainconcept's fastest encoder setting, but the visual result was clearly on x264's side for the target bitrate of 3 Mbit at 1440x1080. Elecard Converter Studio, on the other hand, has AQ, but it took all my memory (I only have 1 GB) and so was ultra slow to encode with, because it swapped virtually everything to the pagefile.
IgorC
4th February 2008, 14:36
Maybe it's valid for 1pass CBR. But I get different results when comparing medium/fast settings in 2-pass mode.
We are still talking about two different things.
CruNcher
4th February 2008, 14:44
Yes, and that's why I say x264 is very well balanced and improves steadily in both directions rather than shifting in only one. And you should know that features and stability come first, and then come the speed optimizations :)
Sagittaire
4th February 2008, 16:32
Mainconcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same Avisynth script).
The Mainconcept/Elecard core codec has an AQ implementation. AQ is simply not activated in the Mainconcept Reference H.264 Encoder GUI. You can set lumi/contrast/complexity AQ with the Mainconcept/Elecard encoder (with the Sonic Scenarist GUI, for example).
Dark Shikari
4th February 2008, 16:47
QNS is now working (http://pastebin.com/f692027ee). A bit faster than before, but still drastically slower than it should be. Next steps are to optimize speed and tweak the parameters.
Edit: Diff updated to fix bug with artifacting at high QPs.
CruNcher
4th February 2008, 17:16
The Mainconcept/Elecard core codec has an AQ implementation. AQ is simply not activated in the Mainconcept Reference H.264 Encoder GUI. You can set lumi/contrast/complexity AQ with the Mainconcept/Elecard encoder (with the Sonic Scenarist GUI, for example).
Very user friendly indeed: you pay 490/1790 € for no AQ option. The cheapest Nero Recode has that, and it costs a lot less than that :rolleyes:
Adub
4th February 2008, 18:39
Small bug:
RCRD is "RDRC" on the table.
I don't know if you saw this Dark Shikari, so I am just posting it again.
Dark Shikari
4th February 2008, 18:40
I don't know if you saw this Dark Shikari, so I am just posting it again.
The two seem to be interchangeable.
RCRD doesn't make sense, RDRC makes more sense, but it's called RCRD... I don't really care either way.
akupenguin
4th February 2008, 19:07
In English, adjectives precede their noun. Otoh, my variable/function naming system is big-endian, which usually means nouns come first. That is all.
Adub
4th February 2008, 20:20
Okay, I just wasn't sure if you knew or not.
professor_desty_nova
2nd April 2008, 10:11
Maybe the table should be changed now that VAQ1 is in non-patched builds...
Dark Shikari
2nd April 2008, 10:22
Maybe the table should be changed now that VAQ1 is in non-patched builds...
I'll update it tomorrow.
Sagittaire
2nd April 2008, 10:57
Here are some development ideas if you want x264 to become a professional tool for high-bitrate encoding, like VC1 from CinevisionPSE or H264 from Mainconcept.
- Black level normalisation: use 3D filtering to denoise/degrain dark areas [16-40]. It's a psy filtering pre-process (using it with dark AQ is a good psy approach)
- Dark area improvement: use AQ with dark masking to improve dark area quantisation (using it with dark filtering is a good psy approach)
CruNcher
2nd April 2008, 18:46
Sagittaire under which viewing conditions (calibration) do you rate X264 Visual subjective Quality in Dark areas?
DeathTheSheep
4th April 2008, 00:08
Quantizer noise shaping? I've been holding my breath for something like this to hit the world of x264. What other amazing patches are you pulling out, DS?! (Yes, I read the table...).
QNS--I'm going to test this. And you know exactly what I'm going to test it on first. ;)
Okay...here's the result with r798: ([edit] problem fixed)
encoder/rdo.c:544: error: 'struct <anonymous>' has no member named 'block'
encoder/rdo.c:558: error: 'struct <anonymous>' has no member named 'block'
Result: this pertains to two of your new lines:
h->zigzagf.scan_4x4( h->dct.block[idx].luma4x4, dct );
and
block_residual_write_cabac(h,&cabac_tmp,i_ctxBlockCat,idx,h->dct.block[idx].luma4x4,16 - (i_ctxBlockCat == DCT_LUMA_AC));
Any idea what to replace to get it working with git798+?
Dark Shikari
4th April 2008, 00:21
A few revisions ago, pengvado merged the DCT block struct so that there was no more "luma4x4" and "residualac"; instead, they were both given the same variable name.
As a result, h->dct.block[idx].luma4x4 is now h->dct.luma4x4[idx].
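A simplified sketch of the kind of layout change described above. The type names `dct_old_t` and `dct_new_t`, the array sizes, and the two helper functions are illustrative assumptions, not x264's actual definitions; only the two access expressions are taken from the post.

```c
#include <stdint.h>

/* Hypothetical "before" layout: one struct per block, with separately
 * named arrays for the two kinds of coefficients. */
typedef struct {
    struct {
        int16_t residual_ac[15];
        int16_t luma4x4[16];
    } block[16];
} dct_old_t;

/* Hypothetical "after" layout: a single array of coefficient blocks,
 * indexed by block number first. */
typedef struct {
    int16_t luma4x4[16][16];
} dct_new_t;

/* Old-style access, as used by the patch: dct.block[idx].luma4x4 */
static int16_t *coeffs_old(dct_old_t *d, int idx)
{
    return d->block[idx].luma4x4;
}

/* New-style access after the merge: dct.luma4x4[idx] */
static int16_t *coeffs_new(dct_new_t *d, int idx)
{
    return d->luma4x4[idx];
}
```

Porting a patch across such a change is then mostly a mechanical rewrite of each access expression.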
DeathTheSheep
4th April 2008, 01:30
Ah, awesome. Thanks, I'll give 'er a go now. :)
[edit]Oh, forgot to ask: is the latest patch you posted usable with -A p4x4 and such (since you mentioned some sort of artifacting)? And, most importantly, under what conditions is it activated?
[edit2]: Darn, this thing needs trellis! With mobile/low-bitrate for mobile devices (what I encode for), it's a no-go. Is there a way to implement this in baseline (meaning without CABAC dependence)? Or, less realistically, to somehow hack together some form of non-CABAC dependent trellis? (Though they're intertwined right now, since trellis "optimizes" the CABAC stuff or whatnot).
Dark Shikari
4th April 2008, 01:36
Oh, forgot to ask: is the latest patch you posted usable with -A p4x4 and such (since you mentioned some sort of artifacting)? And, most importantly, under what conditions is it activated?
[edit2]: Darn, this thing needs trellis! With mobile/low-bitrate for mobile devices (what I encode for), it's a no-go. Is there a way to implement this in baseline (meaning without CABAC dependence)? Or, less realistically, to somehow hack together some form of non-CABAC dependent trellis? (Though they're intertwined right now, since trellis "optimizes" the CABAC stuff or whatnot).
The artifacting was fixed, assuming the latest patch was the one you used.
It does not actually need trellis. If you want to remove trellis, here's what you have to do:
1. Instead of running trellis before running QNS, have it run normal quant.
2. Replace the CABAC bit-cost functions with the CAVLC bit-cost functions in QNS.
For 2), you'd probably be best off just making an IF statement, like in RD_COST_MB, e.g. if(CABAC) {do cabac} else {do cavlc}.
QNS, unlike trellis, is not dependent on the entropy encoding method. Note that any fast implementation of QNS, however, would be dependent.
DeathTheSheep
4th April 2008, 02:40
Derr, I was hoping you'd do that. But I guess nobody likes CAVLC, despite it being the most compatible and (bummer me horez) the most widely used, unfortunately. Or so the saying goes, with all those baseline encoders/profiles out on the market, and the devices to go with 'em.
block_residual_write_cavlc( x264_t *h, bs_t *s, int i_idx, int16_t *l, int i_count )
Whole different argument set than CABAC (obviously, it stands to reason). I'm unfamiliar with the arguments here, so what would I even use for the analogous bs_t argument for CAVLC's residual write function? h->out.bs? Obviously not h->cavlc... :p And as for dct_luma_8x8 (the i_ctxBlockCat term), it just vanishes since it's unused for CAVLC anyway, eh?
Oh gawsh.
Dark Shikari
4th April 2008, 02:49
Derr, I was hoping you'd do that. But I guess nobody likes CAVLC, despite it being the most compatible and (bummer me horez) the most widely used, unfortunately. Or so the saying goes, with all those baseline encoders/profiles out on the market, and the devices to go with 'em.
block_residual_write_cavlc( x264_t *h, bs_t *s, int i_idx, int16_t *l, int i_count )
Whole different argument set than CABAC (obviously, it stands to reason). I'm unfamiliar with the arguments here, so what would I even use for the analogous bs_t argument for CAVLC's residual write function? h->out.bs? Obviously not h->cavlc... :p And as for dct_luma_8x8 (the i_ctxBlockCat term), it just vanishes since it's unused for CAVLC anyway, eh?
Oh gawsh.
block_residual_write_cavlc( x264_t *h, bs_t *s, int i_idx, int16_t *l, int i_count )
block_residual_write_cabac( x264_t *h, x264_cabac_t *cb, int i_ctxBlockCat, int i_idx, int16_t *l, int i_count )
Looks pretty similar to me ;)
Just remove the i_ctxBlockCat and pass the bs_t instead of cabac_t. See rd_cost_mb for specifics on this. (Make sure to pass a temporary copy, of course, e.g. bs_t bs_tmp = h->out.bs; )
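A minimal sketch of the "temporary copy" idea: copy the writer state by value, write into the copy, and read back the bit count, so the real state is never modified. `toy_bs_t`, `toy_write_bits`, and `toy_trial_cost` are made-up stand-ins for illustration and are not x264's `bs_t` API.

```c
/* Toy stand-in for a bitstream writer state (NOT x264's bs_t). */
typedef struct {
    int i_bits_encoded;   /* running count of bits written so far */
} toy_bs_t;

/* Pretend to write `count` bits: only the counter is updated. */
static void toy_write_bits(toy_bs_t *s, int count)
{
    s->i_bits_encoded += count;
}

/* Measure what a trial write would cost without touching the caller's
 * real state: work on a by-value copy and throw it away afterwards. */
static int toy_trial_cost(const toy_bs_t *real, int count)
{
    toy_bs_t tmp = *real;               /* temporary copy */
    int before = tmp.i_bits_encoded;
    toy_write_bits(&tmp, count);
    return tmp.i_bits_encoded - before;
}
```

Because `tmp` lives on the stack of the cost function, repeated trials with different candidates all start from the same unmodified state.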
DeathTheSheep
4th April 2008, 02:52
That's what I thought, good.
But it's precisely the handling of bits_encoded that troubles me. Seems so much easier to use cabac's cute wittle integwated function. *shrug*
PS: I say that since I don't know how to use either, yet I'm trying to go from one to the other. XD
...and don't tell me it's as easy as replacing f8_bits_encoded with i_bits_encoded...
Wee, you can't put declarations inside an if statement. And now I have bits_c and bits_v for cabac bits and cavlc bits. Only one of which will be used. And then an if statement updating one or the other. For some strange reason, I get the feeling that this is turning out to be harder than I'd initially thought. :p
And how much are weighted error4 and weighted error8 the same? Looks like the same darn function almost. I just copy-pasted the majority of it over. Not the for loop of course...LOL @ 4 loop.
Dark Shikari
4th April 2008, 03:23
That's what I thought, good.
But it's precisely the handling of bits_encoded that troubles me. Seems so much easier to use cabac's cute wittle integwated function. *shrug*
PS: I say that since I don't know how to use either, yet I'm trying to go from one to the other. XD
...and don't tell me it's as easy as replacing f8_bits_encoded with i_bits_encoded...
Wee, you can't put declarations inside an if statement. And now I have bits_c and bits_v for cabac bits and cavlc bits. Only one of which will be used. And then an if statement updating one or the other. For some strange reason, I get the feeling that this is turning out to be harder than I'd initially thought. :p
And how much are weighted error4 and weighted error8 the same? Looks like the same darn function almost. I just copy-pasted the majority of it over. Not the for loop of course...LOL @ 4 loop.
Yes, they're basically the same function with a few things changed (for the different size).
Why not just use the same variable for bits--declare it outside the if statement, and assign it one value if it's CABAC, and another if it's CAVLC?
To compensate for CABAC's bit cost being 256 times higher precision, you can multiply CAVLC's by 256.
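The suggestion above can be sketched as follows: one `bits` variable declared outside the branch, with the CAVLC count scaled by 256 so both paths report the same units. The function name `unified_bit_cost` and its parameters are hypothetical; the only facts taken from the thread are the single shared variable and the factor of 256.

```c
#include <stdint.h>

/* Return a bit cost in 1/256-bit units regardless of entropy coder.
 *   cabac_f8_bits: CABAC estimate, assumed already in 1/256-bit units
 *   cavlc_bits:    CAVLC count, assumed to be in whole bits */
static uint64_t unified_bit_cost(int b_cabac, uint64_t cabac_f8_bits,
                                 uint64_t cavlc_bits)
{
    uint64_t bits;                  /* declared once, outside the if */
    if (b_cabac)
        bits = cabac_f8_bits;
    else
        bits = cavlc_bits * 256;    /* bring whole bits up to the same scale */
    return bits;
}
```

With the scale unified at this point, any later multiplication by lambda and rounding can stay identical for both coders.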
DeathTheSheep
4th April 2008, 03:50
By that you mean
bits *= lambda2_tab[i_qp]*256; ?
And what do I do with this?
block_residual_write_cavlc(h,&cavlc_tmp,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
i_ctxBlockCat isn't passed to CAVLC directly or at all in weighted_error_8, but it still plays a role here?
Well, as things stand now, it crashes upon encode, with or without cabac. :/
[edit]Er, to put things in perspective (and to check if I'm going in the right direction at all with this):
uint64_t weighted_error4( x264_t *h, int16_t dct[4][4], uint16_t inv_variance[4][4], int i_qp, int idx, int i_quant_cat, int i_ctxBlockCat )
{
    uint8_t idct[4*FDEC_STRIDE];
    int16_t ndct[4][4];
    int i,j;
    uint8_t *decpix = h->mb.pic.p_fdec[0] + 4*block_idx_x[idx] + 4*FDEC_STRIDE*block_idx_y[idx];
    h->mc.copy[PIXEL_4x4]( idct, FDEC_STRIDE, decpix, FDEC_STRIDE, 4 );
    memcpy( ndct, dct, sizeof(ndct) );
    h->zigzagf.scan_4x4( h->dct.luma4x4[idx], dct );
    h->quantf.dequant_4x4( ndct, h->dequant4_mf[CQM_4PY], i_qp );
    // The exe crashes at the line above (bold in the original post) with "illegal operation"
    if( i_ctxBlockCat == DCT_LUMA_AC )
        ndct[0][0] = dct[0][0];
    h->dctf.add4x4_idct( idct, ndct );
    uint64_t error = 0;
    uint8_t *pix = h->mb.pic.p_fenc[0] + 4*block_idx_x[idx] + 4*FENC_STRIDE*block_idx_y[idx];
    for( i = 0; i < 4; i++ )
        for( j = 0; j < 4; j++ )
        {
            int pix_error = idct[i+j*FDEC_STRIDE] - pix[i+j*FENC_STRIDE];
            error += ((pix_error*pix_error)*inv_variance[j][i]);
        }
    x264_cabac_t cabac_tmp = h->cabac;
    bs_t cavlc_tmp = h->out.bs;
    if( h->param.b_cabac )
    {
        block_residual_write_cabac(h,&cabac_tmp,i_ctxBlockCat,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
    }
    else
    {
        block_residual_write_cavlc(h,&cavlc_tmp,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
    }
    error *= 38;
    uint64_t bits;
    if( h->param.b_cabac )
    {
        bits = cabac_tmp.f8_bits_encoded;
        bits *= lambda2_tab[i_quant_cat][i_qp];
    }
    else
    {
        bits = cavlc_tmp.i_bits_encoded;
        bits *= lambda2_tab[i_quant_cat][i_qp]*256;
    }
    error += (bits + 8) >> 4;
    return error;
}
Dark Shikari
4th April 2008, 04:30
Try aligning ndct.
E.g. DECLARE_ALIGNED_16(int16_t ndct[4][4]);
If --no-asm fixes the crash, that's probably the problem.
This is because I added an SSE2 version of dequant, which requires alignment; when I wrote QNS, dequant didn't require alignment.
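A small sketch of what "aligning" a buffer means here, using the standard C11 `_Alignas` specifier rather than x264's own `DECLARE_ALIGNED_16` macro (whose definition is not shown in this thread). The helper names are hypothetical.

```c
#include <stdint.h>

/* True if the pointer sits on a 16-byte boundary, which is what aligned
 * SSE2 loads and stores require. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Declare a 4x4 coefficient block with an explicit 16-byte alignment
 * request and report whether the compiler honoured it. */
static int aligned_block_ok(void)
{
    _Alignas(16) int16_t ndct[4][4] = {{0}};
    return is_aligned_16(ndct);
}
```

Without the alignment request, a local `int16_t` array is only guaranteed 2-byte alignment, so whether aligned SIMD code crashes on it depends on where the compiler happens to place it on the stack.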
DeathTheSheep
4th April 2008, 05:25
...omgholybullfrog. All that time and it was alignment.
Mkay, it's encoding. Slow and steady. Now with 100% more asm. ;)
[edit]I sure hope that 256 multiplier is in the right place. Is it?
Dark Shikari
4th April 2008, 05:54
...omgholybullfrog. All that time and it was alignment.
Mkay, it's encoding. Slow and steady. Now with 100% more asm. ;)
[edit]I sure hope that 256 multiplier is in the right place. Is it?
Yes it is.
DeathTheSheep
4th April 2008, 06:05
Sweet!! 1.51fps, exactly 50% slower than without! I couldn't have done it without you. Obviously. A nice birthday present indeed. :)
Now I should just make it optional, and then release it as part of my diff package (0.46, me-pre2, etc)...?
Hm, reminiscent of RCRD, the slowdown of 50% applies to an insane encoding, but it produces a much more significant relative speed impact when non-insane options are used. 2.5fps with fast options (>95% slowdown) vs 1.51fps (50% slowdown). Intriguing.
You said there was more hope for speedup? I hope you meant *before* it's limited to trellis/CABAC-only. :)
Dark Shikari
4th April 2008, 06:31
Sweet!! 1.51fps, exactly 50% slower than without!
You better not try QNS 2 (--trellis 2) then ;)
Blue_MiSfit
7th April 2008, 22:40
I just want to say thank you to everyone for their hard work developing x264. The open source video encoding world has been _literally_ transformed in the last few years because of your efforts.
We've got a codec that dominates the (very expensive) competition, and doesn't cost a thing.
It never ceases to amaze me, especially when "big" things happen, like dark_shikari's AQ.
Thank you all!!
~MiSfit
bob0r
8th April 2008, 00:16
Except for your signature, i agree! :D
And who knows what Google's summer of code will bring us.
But to be completely honest, i love x264 the way it is now, just perfect for my setup :)
Rodger
8th April 2008, 22:29
I want a 64bit Version!!! :p
Still no chance to see a 64bit compiled build in the near future?:confused:
Saw in another thread there would be problems with 64bit?!
burfadel
23rd April 2008, 05:06
I've got an enquiry about --merange, something which I've noticed which I'm not so sure about!
I did some tests with merange set to 16 and 32, with some strange results. It's probably coincidence, but every encode with merange set to 32 was faster, although the difference was mostly less than 0.1 fps, so that's within a tolerable error level.
The main point is the output was quite literally bit identical for several clips I tried. I'm not talking about just filesize; I'm referring to using the fc command from the command prompt (that is, no differences encountered). This was using the standard rev 826 build from www.x264.nl
Is this normal for everything to be bit identical like that? How come merange of 32 seems fractionally faster? (As I said, this could be just coincidence since it's slight.)
DeathTheSheep
23rd April 2008, 05:10
If you're using something like dia or hex, which are range-independent, it doesn't matter what you set merange to, there will be no difference. If you use umh or esa you will find that upping it does indeed slow you down.
Dark Shikari
23rd April 2008, 05:36
If you're using something like dia or hex, which are range-independent, it doesn't matter what you set merange to, there will be no difference. If you use umh or esa you will find that upping it does indeed slow you down.
merange is used for DIA/HEX; it's just clipped to [4,16].
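A minimal sketch of the clipping described for DIA/HEX, assuming it is a plain clamp to the range [4,16] given in the post. The function names are hypothetical and this is not x264's actual code.

```c
/* Clamp v into the closed interval [lo, hi]. */
static int clip_range(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Effective search range for the dia/hex searches under the assumption
 * above: whatever --merange says, the result lies in [4,16]. */
static int effective_merange_dia_hex(int merange)
{
    return clip_range(merange, 4, 16);
}
```

Under this assumption, merange 16 and merange 32 both map to 16, which would explain bit-identical output between the two settings when dia or hex is in use.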
DeathTheSheep
23rd April 2008, 05:38
Yup, so raising it to, say, 32+ wouldn't be at all different from just leaving it alone, as in burfadel's case (if I read correctly). Though why you'd want to lower it to 4 in the first place...is another story (unless you really want the speed?). :)
Oh, isn't the merange adaptive? So if UMH thinks something's just out of reach it will nab at it anyway, correct?
akupenguin
23rd April 2008, 05:55
Oh, isn't the merange adaptive? So if UMH thinks something's just out of reach it will nab at it anyway, correct?
UMH's range is adaptive, but it adapts proportionally, i.e. multiplies merange by some heuristic. "just out of reach" is ascribing way too much intelligence. It just takes a wild guess at the typical mv size before searching.
I want a 64bit Version!!!
Still no chance to see a 64bit compiled build in the near future?
Whenever you implement it. Unless you fancy Linux or OSX; those can use the 64bit version right now.
burfadel
23rd April 2008, 06:26
ah ok! that makes sense then :)
Rodger
23rd April 2008, 09:55
Whenever you implement it. Unless you fancy Linux or OSX; those can use the 64bit version right now.
I don't get that...I thought x264 is free from platform restrictions.
Available for any platform. That was the main intention to get rid of vfw etc.
What I understand now from your post is that this thinking is somehow wrong.
There is a 64bit version available for any system but Windows?
Any acknowledgements from this 64bit version? Is it faster?
If I knew how to help you guys I would have years ago ;)
By the way "DeathTheSheep" does a rockin' job! Last time I checked the vfw-version was competitive with the cli-version.
So it will take some more time to implement the 64bit code, huh?
hmmpf :(
Dark Shikari
23rd April 2008, 10:04
I don't get that...I thought x264 is free from platform restrictions.
Available for any platform. That was the main intention to get rid of vfw etc.
What I understand now from your post is that this thinking is somehow wrong.
There is a 64bit version available for any system but Windows?
Any acknowledgements from this 64bit version? Is it faster?
Pengvado routinely breaks Windows 64-bit builds when hacking the assembly, since Windows has a slightly different calling convention from Linux on 64-bit, IIRC.
64-bit is about 10-15% faster, I believe.
akupenguin
23rd April 2008, 10:41
I don't get that...I thought x264 is free from platform restrictions.
Assembly is inherently platform specific. Windows chose to do everything differently, so it gets to be not supported. I don't code for OSX either, but it's much less different, and thus doesn't require constant maintenance.
btw, does anyone know what exactly would happen if we omitted the official stack frame and unwind metadata, and just treated it like linux except with different register allocation? Stack unwinding only pertains to exceptions, so that stuff shouldn't even be used unless the program has already crashed.
Rodger
23rd April 2008, 20:23
Thanks for the information! Didn't know the code-handling was that different.
64-bit is about 10-15% faster, I believe.
GOOD GOD!:eek:
I was hoping for a bit more than 5% and now it's more like 10-15%.
Now THAT IS good news.
Makes me wanna have a 64bit-version even more ;)
So if you guys need a beta-tester for a 64bit build...just drop me a line.
Avenger007
9th May 2008, 21:38
Any word on when another patching spree will be unleashed? :)
Also, any rough estimates on speed and quality improvements we can expect to see in the near future, like weeks from now and then a few months from now?
Final question, should I encode with the stable 839 version or wait until after the patching spree?
:thanks: to all x264 developers!
Dark Shikari
9th May 2008, 22:32
Any word on when another patching spree will be unleashed? :)
Also, any rough estimates on speed and quality improvements we can expect to see in the near future, like weeks from now and then a few months from now?
Final question, should I encode with the stable 839 version or wait until after the patching spree?
:thanks: to all x264 developers!
Encode with the stable 839 version; there's some speed improvements in the pipe, but all major stuff is saved for the summer.
I've been busy the past 2-3 days writing Photon, my own custom codec (mostly for the purpose of learning how to do so).
plonk420
15th May 2008, 13:22
horrible horrible question to ask a developer, i know, but any ideas as to ETA for an x264-like binary (or binary-building-ready code)? this sounds really really exciting for whatever reason... i wish i knew enough to code and/or build myself, but i only know enough to fix PHP scripts that won't load on my site... :(
What are you talking about? Just compile the binary yourself from the source code. Or am I misunderstanding your definition of binary.
plonk420
16th May 2008, 00:40
i pretty much don't know how to compile. my several attempts have ended in massive failure :/
(i'm an encoder enthusiast and wannabe graphic designer, not a coder!)
jeffy
16th May 2008, 00:47
i pretty much don't know how to compile. my several attempts have ended in massive failure :/
(i'm an encoder enthusiast and wannabe graphic designer, not a coder!)
What platform are you trying to compile the binary on and why, if you just need the x264 binary, don't you download it?
plonk420
16th May 2008, 01:31
i haven't tried compiling this and don't really have a desire to. i've tried compiling a program or two of DVD Jon's and something else i can't remember on x86, none of which succeeded. and i'm not asking for x264, i'm selfishly asking for Photon. i guess i'll shut up now and try my best to be patient :S
Dark Shikari
16th May 2008, 01:42
i'm not asking for x264, i'm selfishly asking for Photon.
Wait, what? You want to use a totally custom intra-only highly inefficient codec with no asm whatsoever, whose latest version exists only on my computer? :p
plonk420
16th May 2008, 03:12
it's the idea of new technology that fascinates me, good sir! hell.. the only thing i encode in anymore is MPEG-2 and AVC :P so a happy medium between the two sounds like an interesting prospect :O
Dark Shikari
16th May 2008, 03:18
it's the idea of new technology that fascinates me, good sir! hell.. the only thing i encode in anymore is MPEG-2 and AVC :P so a happy medium between the two sounds like an interesting prospect :O
MPEG-2 is still more efficient than Photon, given that Photon is intra-only and doesn't use custom VLCs ;)
Inventive Software
16th May 2008, 03:18
Wait, what? You want to use a totally custom intra-only highly inefficient codec with no asm whatsoever, whose latest version exists only on my computer? :p
Consider yourself popular! :D
squid_80
16th May 2008, 04:10
btw, does anyone know what exactly would happen if we omitted the official stack frame and unwind metadata, and just treated it like linux except with different register allocation? Stack unwinding only pertains to exceptions, so that stuff shouldn't even be used unless the program has already crashed.
If there's no unwind metadata and an exception is thrown (i.e. a crash, unless you're crazy enough to code throwing exceptions in asm) the app simply *poof* vanishes. No crash dialog, no offer to report it to MS. Not really a big deal with x264.exe, but when libx264 is used inside something else (ffdshow for example) it's not very polite to bring down the host app (which may be able to catch properly thrown exceptions) when you make a mistake.
(One other difference I remember between the linux/windows 64-bit code; on windows it is NEVER safe to write beyond the stack pointer.)
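To make the "different register allocation" and the stack-pointer remark concrete, here is a small lookup sketch. The register orders and the 128-byte red zone come from the published System V AMD64 and Microsoft x64 calling conventions; the names ARG_REGS, RED_ZONE_BYTES and arg_location are invented for this illustration and have nothing to do with x264's own asm macros.

```python
# Integer/pointer argument registers, in order, for each 64-bit ABI.
ARG_REGS = {
    "sysv": ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),  # Linux, OS X
    "win64": ("rcx", "rdx", "r8", "r9"),               # Windows x64
}

# Bytes below the stack pointer that a leaf function may use freely.
# Windows has none, which is why writing beyond the stack pointer is unsafe.
RED_ZONE_BYTES = {"sysv": 128, "win64": 0}


def arg_location(abi, index):
    """Where the index-th (0-based) integer argument is passed."""
    regs = ARG_REGS[abi]
    return regs[index] if index < len(regs) else "stack"


for i in range(6):
    print(i, arg_location("sysv", i), arg_location("win64", i))
```

The table shows why the same hand-written function body cannot simply be reused: the first argument arrives in rdi on one ABI and rcx on the other, and the fifth and sixth arguments are on the stack on Windows.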
plonk420
16th May 2008, 04:58
MPEG-2 is still more efficient than Photon, given that Photon is intra-only and doesn't use custom VLCs ;)
at least it's something NEW to try out..! >_>
akupenguin
16th May 2008, 05:02
If there's no unwind metadata and an exception is thrown (i.e. a crash, unless you're crazy enough to code throwing exceptions in asm) the app simply *poof* vanishes
Sounds fine to me. Now all we need is a way to mix win64 and linux64 abis in the same lib, and we can get away with no new asm.
Kurtnoise
19th June 2008, 22:04
DS, can we have an update to the table in your 1st post with the latest info/news?
I'm particularly interested by the GPGPU challenge...;)
professor_desty_nova
21st August 2008, 09:43
Can we have an update of the table with the latest info?;)
And how is progress going in the "Google Summer of Code Projects"?
Ranguvar
21st August 2008, 16:53
I've recently switched to XP x64, 'twould be awesome if x264 worked in 64-bit for the Windows platform as well :)
cyberbeing
18th September 2008, 21:39
Since that other thread got locked, I guess I should post this here since it never got answered.
On that note, something that may be interesting would be fully tweakable presets for rate-control to better suit the types of frames where you want quality to be and have the ability to give things different weights of importance (dark areas, light areas, medium areas, detailed areas, flat areas, gradients, fades, fast moving scenes, slow moving scenes etc). I think AQ and PsyRD does some of this in a generalized fashion, but a streamlined and specialized weight based rate control, based on different frame characteristics would be interesting. It would give a level of control beyond what we have today.
How feasible would it be to implement something like this? Anybody interested in trying to make a patch?
It would definitely have its uses if it doesn't slow down encoding too much, but ease of use would be another factor. For that reason the current rate control should still always be kept as the default if this gets added.
Sagekilla
18th October 2008, 06:55
Hey Dark Shikari, is there any hope of any progress on the RCRD patch, or even an improvement on the existing RC through usage of RCRD concepts?
Dark Shikari
18th October 2008, 08:02
Hey Dark Shikari, is there any hope of any progress on the RCRD patch, or even an improvement on the existing RC through usage of RCRD concepts?
Well, I've managed to port it up to the most recent revisions ;)
The main problem with using RCRD concepts to improve regular ratecontrol is that it's rather difficult to calculate the optimal lambda to use--there's really no obvious way of knowing.
MBtree would achieve a good portion of what RCRD does, and on a macroblock level, without the lambda issues or the speed penalty.
Sagekilla
18th October 2008, 08:24
Ah, you mean this: Macroblock Trees GSOC (http://wiki.videolan.org/SoC_2008/x264/Macroblock_tree). Nifty, this always sounded like something that could make a -big- difference in video output, since temporally important frames/MBs could be given a quality boost compared to frames/MBs that show up maybe only once.
Sagekilla
18th October 2008, 19:59
Hrm, I read your dev blog and you say you have nehalem specific optimizations: Is it too much to give away to say that x264 would run vastly better on a quad core nehalem vs quad core penryn at the same clock speeds?
Edit: Sorry about the double post, I'm rather excited about these recent changes ;)
nurbs
18th October 2008, 20:07
Nehalem would be faster even without further optimizations: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3326&p=6
Dark Shikari
20th October 2008, 04:31
Original post updated on request.
video_magic
25th October 2008, 05:33
My archiving of VHS -> x264 has been very pleasing so far, at bitrates of around 1200 for 720x576 25fps captures.
As I am capturing using a BT878a based card I have many captures in YUY2, I was wondering about YUY2 support in x264 as I read here:
http://wiki.videolan.org/SoC_x264#4:4:4_and_4:2:2_color_support
It would save any further (lossy?) conversion from my captures, and I don't know how much encoding-time it might save, because of the converttoyv12 in my avs?
Anyway, thanks, I appreciate everything about this amazingly good codec. I suppose version r1000 is being held off for numerous improvements...
akupenguin
25th October 2008, 06:03
Save? Converttoyv12 throws away information, and therefore makes the encoding process faster as it doesn't have to encode that information. You could ask for 4:2:2 if you think your VHS actually contain 360x576 pixels worth of real chroma (hah) and think keeping that is worth the cost of 33% more pixels.
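The 33% figure can be checked with a few lines of arithmetic. This sketch counts samples per frame for a 720x576 capture in 4:2:0 (YV12) and 4:2:2 (YUY2); the function and variable names are just for illustration.

```python
def samples_per_frame(width, height, chroma_format):
    """Total luma + chroma samples in one frame."""
    # (horizontal, vertical) chroma subsampling divisors
    divisors = {"4:2:0": (2, 2), "4:2:2": (2, 1)}
    h_div, v_div = divisors[chroma_format]
    luma = width * height
    # Two chroma planes (U and V), each subsampled by the divisors.
    chroma = 2 * (width // h_div) * (height // v_div)
    return luma + chroma


yv12 = samples_per_frame(720, 576, "4:2:0")  # 414720 luma + 207360 chroma
yuy2 = samples_per_frame(720, 576, "4:2:2")  # 414720 luma + 414720 chroma
extra = yuy2 / yv12 - 1
print(yv12, yuy2, round(extra * 100, 1))
```

Each 4:2:2 chroma plane is 360x576 instead of 360x288, so the total goes from 1.5 to 2 times the luma sample count, which is one third more data to encode.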
video_magic
25th October 2008, 06:27
....Converttoyv12 throws away information, and therefore makes the encoding process faster as it doesn't have to encode that information.....
Ah ok, thanks for that information! :thanks:
So converting to YV12, from YUY2 is 'visually lossless' or effectively so (negligible loss) to any reasonable person?
Sagekilla
25th October 2008, 23:22
Well, YUY2 vs YV12 has some minor loss in color resolution, but unless you're doing chroma keying (even then there's tricks to mask artifacts) 99% of all people will not notice a difference.
The thing akupenguin was trying to get at is that a 720x576 cap from VHS has nowhere near that much chroma resolution. VHS captures have so little chroma resolution that the difference between YUY2 and YV12 is negligible.
professor_desty_nova
8th August 2009, 12:00
I figure we should have a thread like this so that people can see the general status of x264 development.
http://i36.tinypic.com/11qqpsn.png
Updated October 19th, 2008.
I think this needs an update :D