x264 Development Progress Thread [Archive]

Dark Shikari

3rd February 2008, 23:36

I figure we should have a thread like this so that people can see the general status of x264 development.

http://i36.tinypic.com/11qqpsn.png
Updated October 19th, 2008.

If you are working on any patch or improvement, or have a specific idea for an improvement that should be put on here in any form (or anything that I'm missing), say so and I'll add it in.

Key
Development Progress
None: Nothing's been done. It's just an idea.
Algorithm: A good algorithm has been devised and thought through; it's ready to be coded. There may already be coding in progress, but so far it doesn't work.
Patch: A working patch is ready that implements the algorithm fully. It may not be optimal at all, but it works.
Ongoing: Development consists of many various improvements or is a very large project, and is ongoing.
Testing/Improvement Progress
None: Nothing's been tested.
Low: It's been tested somewhat, but there's definitely lots of room for improvement.
Medium: It's been tested a good bit, but there's no real guarantee it's optimal.
High: It's been tested loads; the odds of a considerable improvement at this point are pretty low.
Importance
Low: A minor change that won't have much effect alone. Often includes small code optimizations, etc.
Medium: Directly and noticeably useful in some fashion; not just a small incremental change.
High: Useful, and important. Would considerably improve x264 in relation to its competitors.
Very High: Extremely important; would, single-handedly, make x264 vastly better.
Difficulty
Low: Easy. Consists of implementing a very simple algorithm or a small code optimization. Requires very little knowledge of x264 to implement.
Medium: Not that much harder to code, but requires considerably more knowledge of x264 and/or ability to design, implement, and test various algorithms.
High: Quite difficult. Requires in-depth knowledge of x264 and the ability to implement more complex algorithms.
Very High: Extremely difficult. Requires very in-depth knowledge of x264 and understanding of the H.264 standard.

Sagekilla

4th February 2008, 00:07

Very nice, Dark Shikari. Any possibility of wrangling up links to the patches?

lexor

4th February 2008, 00:15

Aww how cute, the table colours match your avatar ;)

On a more prosaic note, you do realize that this thread will simply degenerate into meaningless arguments about options/defaults and general "I want X, you not giving me X fast enough, give me X faster!"

On a personal note, I find the chart to be most informative, since beforehand x264 development seemed like one large black box. Thanks for posting this.

bob0r

4th February 2008, 00:18

Features:
- AQ(--aq-strength 0.5 --aq-sensitivity 13): just awesome, reduces blocking and blurring.
- ME-prepass(--me-prepass): Once I update the patch, it's a nice feature for the more insane encoders. Runs a hexagon search prepass on all motion search predictors to improve motion search effectiveness.
- nal hrd (--nal-hrd): Blu-ray/HD DVD compatibility
- RCRD(--rcrd-lambda, --rcrd-window): rate-distortion optimized ratecontrol; lower quants on frames that have a greater influence on future frames, higher quants on frames that have less influence
- 2pass vbv: 2pass_vbv makes VBV work in 2-pass ratecontrol; without it, VBV is mostly ignored. It also happens to give quite a bit better quality than 1-pass CBR
- gaussian cplxblur: gives a tiny improvement in 2pass ratecontrol
- Better B-Frame decision: Not optimal.
--
- faster-dia: <not recommended, Dark Shikari>
- thread pool: <not making it faster on my system, pengvado>
--
- improved intra refine: General improvement for faster intra refine on subme7 *added*
- ESA SATD hadamard(--fpel-cmp satd, now --me tesa): A highly optimized exhaustive motion search that uses SATD on the best candidates for even more accurate motion search. *added*
--
Fixes:
- Cabac fix: fixes a single-line incorrectness in CABAC encoding *fixed*
- .mkv minor memory leak fix, Haali can surely fix *fixed*
- memory leak fix: fix huge frames memory leak in x264_encoder_close function (especially with multithreading)
- 32x32samples_crash.diff - fix x264 crash with 32x32 samples and adaptive B-frames, pengvado: not the right fix
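The "gaussian cplxblur" item is about smoothing per-frame complexity estimates across neighbouring frames before 2-pass bit allocation. Below is a minimal sketch of that kind of smoothing, not the patch itself: it swaps in a binomial kernel (a common integer-weight approximation of a Gaussian) so it needs no libm, and the function names and radius parameter `r` are hypothetical.

```c
#include <stddef.h>

/* Weight for offset d in a binomial kernel of radius r, i.e. C(2r, r+d).
 * As r grows this approaches a Gaussian with variance r/2. */
static double binomial_weight(int r, int d)
{
    int n = 2 * r, k = r + d;
    double c = 1.0;
    for (int i = 1; i <= k; i++)
        c = c * (n - k + i) / i; /* builds C(n, k) one factor at a time */
    return c;
}

/* Hypothetical sketch: blur per-frame complexity estimates over +-r
 * neighbouring frames. Weights are renormalised at the sequence edges,
 * so a constant input stays constant. */
void binomial_cplxblur(const double *cplx, double *blurred, size_t n, int r)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0, wsum = 0.0;
        for (int d = -r; d <= r; d++) {
            long j = (long)i + d;
            if (j < 0 || j >= (long)n)
                continue; /* frame outside the sequence */
            double w = binomial_weight(r, d);
            sum += w * cplx[j];
            wsum += w;
        }
        blurred[i] = sum / wsum; /* d = 0 is always in range, so wsum > 0 */
    }
}
```

Renormalising at the edges is a design choice: without it, the first and last frames would look artificially simple to the ratecontrol.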

Bugmaster:
Here are all the patches I have written/modified. Descriptions:
32x32samples_crash.diff - fix x264 crash with 32x32 samples and adaptive B-frames
colorspace_patch.diff - my changes in colorspace conversion (more precision, BT.601/BT.709 and TV/PC scale support)
cosmetic.diff - cosmetic change in macroblock.c for synchronization of declaration and definition of x264_macroblock_probe_skip function
debug-defines.diff - add definition of _DEBUG and NDEBUG in configure script (improve speed when compiling in MinGW)
fix_stats_file_work_for_cli.diff - more accurate calculation of i_frame, removes workaround for VFW Nth-pass ratecontrol
fix_stats_file_work_for_vfw.diff - more accurate calculation of i_frame, more correct workaround for VFW Nth-pass ratecontrol
frames_memoryleak.diff - fix huge frames memory leak in x264_encoder_close function (especially with multithreading)
multithreading_Nth_pass_ratecontrol.diff - fix incorrect Nth-pass ratecontrol work with multithreading
thread-pool.diff - modified thread pool patch (must be applied last)
url: http://mailman.videolan.org/pipermail/x264-devel/2008-January/003921.html

Most of this information comes from Dark Shikari; I just thought I'd add the third-party patches as well.

Sagekilla

4th February 2008, 00:23

IMO, RCRD is possibly the most insane option up there; I don't think enabling any other single setting is slower than that. Correct me if I'm wrong, though; I haven't personally tested tesa yet to see how slow it runs.

<Obligatory whine> want faster encoding, you're not giving me it fast enough, give me it faster! </Whine>

Terranigma

4th February 2008, 00:24

have a specific idea for an improvement that should be put on here in any form (or anything that I'm missing), say so and I'll add it in.

OK, since you put it that way, I'd say 9-14 bit encoding. When AviSynth 2.6.0 is released, it should allow processing at up to 16 bits (although we'd need filters that can process at 16 bits). The point of my post is for you to add it as an option. =P

Dark Shikari

4th February 2008, 00:29

IMO, RCRD is possibly the most insane option up there; I don't think enabling any other single setting is slower than that. Correct me if I'm wrong, though; I haven't personally tested tesa yet to see how slow it runs.

RCRD's speed cost is independent of the settings chosen, so it's actually not insane if you use a low window size. But it's still pretty buggy. The main reason it's up there is that RCRD has the potential to give insight into ways to improve 2-pass ratecontrol.

OK, since you put it that way, I'd say 9-14 bit encoding. When AviSynth 2.6.0 is released, it should allow processing at up to 16 bits (although we'd need filters that can process at 16 bits). The point of my post is for you to add it as an option. =P

Such a change might require a new "difficulty" level: Guru. Because it would make almost all the functions in x264 completely obsolete :p

Sagekilla

4th February 2008, 00:29

On a side note, why isn't the faster dia suggested? Is it because of incompatibilities? I still see some use in a faster dia since, IIRC, dia is used in selection of best candidates before a slower ME pass is used.

Dark Shikari

4th February 2008, 00:45

On a side note, why isn't the faster dia suggested? Is it because of incompatibilities? I still see some use in a faster dia since, IIRC, dia is used in selection of best candidates before a slower ME pass is used.

'cause it's not actually faster.

Sagekilla

4th February 2008, 00:50

'cause it's not actually faster.

That explains things a bit. Thanks for the explanation.

Adub

4th February 2008, 01:02

Small bug:

RCRD is "RDRC" on the table.

Oh, and thank you very much for this thread. It gives a very good outline of what still needs to be done. Very useful. Also, I don't know everything there is to know about x264, but is there anything on that table about psy optimizations?

CruNcher

4th February 2008, 01:09

Lookahead, lookahead, I want! Oops @ lexor :D

IgorC

4th February 2008, 01:27

Nice to see progress on x264.

What about explicit weighted prediction of P frames in x264?
What is the current situation with implicit (and explicit, if it really exists?) weighted prediction of B-frames in x264?

For speed benchmarks, it may be useful to compare against the latest MC (MainConcept) encoder. It has a very fast, high-quality Fast RDO (fast intra/inter, MV and ref search).

Dark Shikari

4th February 2008, 01:37

Nice to see progress on x264.

What about explicit weighted prediction of P frames in x264?

Pengvado implemented it a while ago, but didn't put it in SVN because it didn't seem to be very useful.

What is the current situation with implicit (and explicit, if it really exists?) weighted prediction of B-frames in x264?

The problem is we need a good heuristic for it.
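For reference, the decoder side of explicit weighted prediction from a single reference list is a per-sample scale, round and offset. The sketch below follows the single-list formula from the H.264 weighted sample prediction process for 8-bit samples; the function names are hypothetical, and it says nothing about how an encoder should choose the weight and offset, which is the heuristic problem mentioned above.

```c
#include <stdint.h>

/* Clip to the 8-bit sample range (Clip1 in the spec, for 8-bit video). */
static uint8_t clip1_8bit(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Explicit weighted prediction of one 8-bit sample from a single list:
 * scale by weight w with log2_denom fractional bits, round, add offset o.
 * Note: w may be negative; >> on a negative int is implementation-defined
 * in C (arithmetic shift on common compilers). */
uint8_t weighted_pred_sample(uint8_t pred, int w, int o, int log2_denom)
{
    if (log2_denom >= 1)
        return clip1_8bit(((pred * w + (1 << (log2_denom - 1))) >> log2_denom) + o);
    return clip1_8bit(pred * w + o);
}
```

With log2_denom = 5, a weight of 32 means 1.0 and 16 means 0.5, so a fade can be expressed as a weight below 1.0 plus an offset.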

Terranigma

4th February 2008, 01:47

Pengvado implemented it a while ago, but didn't put it in SVN because it didn't seem to be very useful.

Is there a patch for this?

CruNcher

4th February 2008, 01:48

MainConcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same AviSynth script).

Dark Shikari

4th February 2008, 01:51

Is there a patch for this?

Yes, but it's very, very old. Linkage (http://akuvian.org/src/x264/x264_wpredp.0.diff).

MainConcept's Reference H.264 Encoder has absolutely no AQ at all

Correct.

IgorC

4th February 2008, 01:54

MainConcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same AviSynth script).

So?

It's all about fast RDO, not state-of-the-art new AQ and other slow stuff.

It's pretty old, but not much has changed in the fast-encoding area: http://forum.doom9.org/showthread.php?t=105763

foxyshadis

4th February 2008, 02:17

On a more prosaic note, you do realize that this thread will simply degenerate into meaningless arguments about options/defaults and general "I want X, you not giving me X fast enough, give me X faster!"

These posts will be struck and/or deleted. Good show with the cool chart.

Dark Shikari

4th February 2008, 03:06

Update: QNS is now apparently working, though extremely slow (due to being extremely, and intentionally, unoptimized). Patch (http://pastebin.com/fe7e1189).

I'll update the image later.

Edit: Patch updated again. Rare 4x4 artifacting still not fixed.

CruNcher

4th February 2008, 09:25

IgorC, especially for low-bitrate scenarios the new AQ is important for mixed types of sources; MainConcept's encoder would have problems holding the quality constant in many scenes. But that's also the difference between MainConcept and Ateme (and now, to a degree, x264): those have been designed for broadcasting.
MainConcept's own encoder wasn't designed for that; it is the most efficient and fastest HD DVD and Blu-ray encoder, but in broadcasting it would currently lose against its competition. In this regard x264, with Dark Shikari's AQ, is very well balanced for both scenarios :), though Ateme is still some generations ahead of x264 and MainConcept/Elecard in R&D, especially in everything that has to do with low-bitrate psychovisual optimizations (those are essential for broadcasting).

CruNcher

4th February 2008, 09:33

Dark Shikari, QNS should make the picture less blurry (more like the input), right?

Dark Shikari

4th February 2008, 09:35

Dark Shikari, QNS should make the picture less blurry (more like the input), right?

QNS has an... interesting... effect. It should concentrate quantization error in the areas where one would least notice it. It's a type of quantization psy optimization, and is especially effective at lower bitrates.

Try it and see, though right now the 4x4 artifacting somewhat obscures its real effect.

If I had to guess what Elecard's three "quant modes" are, I would guess Deadzone, Trellis, and QNS, respectively.
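Of the three modes guessed at above, deadzone quantization is the simplest. A minimal sketch of a deadzone scalar quantizer, assuming a plain integer step and rounding offset; this is a generic illustration, not Elecard's or x264's code (x264's quant uses per-coefficient multiply-and-shift factors).

```c
#include <stdlib.h>

/* Hypothetical deadzone scalar quantizer. With offset = qstep/2 this is
 * plain round-to-nearest; a smaller offset (e.g. a third or a sixth of
 * the step) widens the zero bin and biases levels toward smaller
 * magnitudes, which saves bits on small coefficients. */
int deadzone_quant(int coef, int qstep, int offset)
{
    int level = (abs(coef) + offset) / qstep;
    return coef < 0 ? -level : level; /* restore the sign */
}
```

Trellis and QNS go further: instead of a fixed rounding rule, they compare several candidate levels per coefficient using a rate-distortion cost.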

microchip8

4th February 2008, 10:50

Will there be different QNS strengths available, like in ffmpeg/lavc (e.g. qns=1, qns=2 and qns=3)?

akupenguin

4th February 2008, 11:59

The qns parameter in lavc isn't strength, it's speed.

IgorC

4th February 2008, 14:13

IgorC, especially for low-bitrate scenarios the new AQ is important for mixed types of sources; MainConcept's encoder would have problems holding the quality constant in many scenes. But that's also the difference between MainConcept and Ateme (and now, to a degree, x264): those have been designed for broadcasting.
MainConcept's own encoder wasn't designed for that; it is the most efficient and fastest HD DVD and Blu-ray encoder, but in broadcasting it would currently lose against its competition. In this regard x264, with Dark Shikari's AQ, is very well balanced for both scenarios :), though Ateme is still some generations ahead of x264 and MainConcept/Elecard in R&D, especially in everything that has to do with low-bitrate psychovisual optimizations (those are essential for broadcasting).

I totally agree that the new VAQ is good and that x264 in high-quality mode (pretty slow settings) is better than MC.

But in medium or fast modes MC is better at the same speed.
To match MC's speed, x264 has to lower subme from 6/7 to 5, and that's where x264 is worse than MC. AQ won't help there. Not everyone wants +0.5-1 SSIM at the cost of a 2-3x (or even larger) speed loss.

Try the latest MC or Elecard yourself with all the fast options enabled, and compare it to x264 at the same speed with AQ and the other bleeding-edge new stuff.

Why not optimize x264 at fast settings? That has nothing to do with AQ and the other recent improvements by Pengvado and Dark Shikari. That's why MC is a good point of comparison.
We are talking about two different directions of development which, in conjunction, would be very good: speed and quality.
Just think: a fast encoder with very high quality.

CruNcher

4th February 2008, 14:29

That's exactly what I did, and the visual result was clear: MC Ref 1.1 lost. x264 with the new AQ was more efficient in a CBR broadcast scenario, and it was not that much slower :). Most of the speed in MainConcept Reference comes from the fast inter/intra decision. x264 (with subme 2) is about 12% slower in the 0.x range compared to MainConcept's fastest encoder setting, but the visual result was clearly on x264's side at the target bitrate of 3 Mbit/s for 1440x1080. Elecard Converter Studio, on the other hand, has AQ, but it took all my memory (I only have 1 GB) and so was ultra slow to encode with, because it swapped virtually everything to the pagefile.

IgorC

4th February 2008, 14:36

Maybe that's valid for 1-pass CBR, but I have different results comparing medium/fast settings in 2-pass mode.
We are still talking about two different things.

CruNcher

4th February 2008, 14:44

Yes, and that's why I say x264 is very well balanced: it improves steadily in both directions rather than shifting in only one. And you should know that first come the features and stability, and then come the speed optimizations :)

Sagittaire

4th February 2008, 16:32

MainConcept's Reference H.264 Encoder has absolutely no AQ at all, or am I missing something? And Elecard's Converter Studio eats a huge amount of memory for encoding (almost unbelievable: it wants double x264's memory for the same AviSynth script).

The MainConcept/Elecard core codec has an AQ implementation. AQ is simply not activated in the MainConcept Reference H.264 Encoder GUI. You can set luma/contrast/complexity AQ with the MainConcept/Elecard encoder (with the Sonic Scenarist GUI, for example).

Dark Shikari

4th February 2008, 16:47

QNS is now working (http://pastebin.com/f692027ee). A bit faster than before, but still drastically slower than it should be. Next steps are to optimize speed and tweak the parameters.

Edit: Diff updated to fix bug with artifacting at high QPs.

CruNcher

4th February 2008, 17:16

The MainConcept/Elecard core codec has an AQ implementation. AQ is simply not activated in the MainConcept Reference H.264 Encoder GUI. You can set luma/contrast/complexity AQ with the MainConcept/Elecard encoder (with the Sonic Scenarist GUI, for example).

Very user-friendly indeed: you pay 490/1790 € for no AQ option, while even the cheapest Nero Recode has that, and it costs a lot less :rolleyes:

Adub

4th February 2008, 18:39

Small bug:

RCRD is "RDRC" on the table.

I don't know if you saw this Dark Shikari, so I am just posting it again.

Dark Shikari

4th February 2008, 18:40

I don't know if you saw this Dark Shikari, so I am just posting it again.

The two seem to be interchangeable.

RCRD doesn't make sense and RDRC makes more sense, but it's called RCRD... I don't really care either way.

akupenguin

4th February 2008, 19:07

In English, adjectives precede their noun. OTOH, my variable/function naming system is big-endian, which usually means nouns come first. That is all.

Adub

4th February 2008, 20:20

Okay, I just wasn't sure if you knew or not.

professor_desty_nova

2nd April 2008, 09:11

Maybe the table should be changed now that VAQ1 is in non-patched builds...

Dark Shikari

2nd April 2008, 09:22

Maybe the table should be changed now that VAQ1 is in non-patched builds...

I'll update it tomorrow.

Sagittaire

2nd April 2008, 09:57

Here is some development to consider if you want x264 to become a professional tool for high-bitrate encoding, like VC-1 from CineVision PSE or H.264 from MainConcept.

- Black level normalisation: use 3D filtering to denoise/degrain dark areas [16-40]. It's a psy filtering pre-process (using it together with dark AQ is a good psy approach).

- Dark area improvement: use AQ with dark masking to improve quantisation in dark areas (using it together with dark filtering is a good psy approach).
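A minimal sketch of the "dark masking" idea, assuming the simplest possible form: compute a block's mean luma and apply a negative QP offset when it falls inside the dark range (16-40 as suggested above). The function name, the `strength` parameter and the hard on/off threshold are all hypothetical; a real AQ would more likely blend the offset smoothly.

```c
#include <stdint.h>

/* Hypothetical dark-masking AQ: return a negative QP offset (more bits)
 * for blocks whose mean luma lies in [dark_lo, dark_hi], else 0. */
int dark_aq_qp_offset(const uint8_t *pix, int stride, int w, int h,
                      int dark_lo, int dark_hi, int strength)
{
    long sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += pix[y * stride + x];
    int mean = (int)(sum / ((long)w * h)); /* block mean luma */
    return (mean >= dark_lo && mean <= dark_hi) ? -strength : 0;
}
```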

CruNcher

2nd April 2008, 17:46

Sagittaire, under which viewing conditions (calibration) do you rate x264's subjective visual quality in dark areas?

DeathTheSheep

3rd April 2008, 23:08

Quantizer noise shaping? I've been holding my breath for something like this to hit the world of x264. What other amazing patches are you pulling out, DS?! (Yes, I read the table...).

QNS--I'm going to test this. And you know exactly what I'm going to test it on first. ;)

Okay...here's the result with r798: ([edit] problem fixed)

encoder/rdo.c:544: error: 'struct <anonymous>' has no member named 'block'
encoder/rdo.c:558: error: 'struct <anonymous>' has no member named 'block'
Result: this pertains to two of your new lines:

h->zigzagf.scan_4x4( h->dct.block[idx].luma4x4, dct );
and
block_residual_write_cabac(h,&cabac_tmp,i_ctxBlockCat,idx,h->dct.block[idx].luma4x4,16 - (i_ctxBlockCat == DCT_LUMA_AC));

Any idea what to replace to get it working with git798+?

Dark Shikari

3rd April 2008, 23:21

A few revisions ago, pengvado merged the DCT block struct so that there was no more "luma4x4" and "residualac"; instead, they were both given the same variable name.

As a result, h->dct.block[idx].luma4x4 is now h->dct.luma4x4[idx].
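The struct change amounts to replacing an array of per-block structs with one 2-D coefficient array. A simplified, hypothetical before/after (the real x264 structs have more members, and `residual_ac` is only a guess at the "residualac" member mentioned above):

```c
#include <stdint.h>

/* Before: one struct per 4x4 block, with separate arrays for full
 * blocks and AC-only blocks. */
typedef struct {
    struct {
        int16_t luma4x4[16];
        int16_t residual_ac[15];
    } block[16];
} dct_old_t;

/* After: a single array indexed [block][coefficient], used for both. */
typedef struct {
    int16_t luma4x4[16][16];
} dct_new_t;

/* Old access pattern: h->dct.block[idx].luma4x4 */
int16_t *old_block(dct_old_t *d, int idx) { return d->block[idx].luma4x4; }

/* New access pattern: h->dct.luma4x4[idx] */
int16_t *new_block(dct_new_t *d, int idx) { return d->luma4x4[idx]; }
```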

DeathTheSheep

4th April 2008, 00:30

Ah, awesome. Thanks, I'll give 'er a go now. :)

[edit]Oh, forgot to ask: is the latest patch you posted usable with -A p4x4 and such (since you mentioned some sort of artifacting)? And, most importantly, under what conditions is it activated?

[edit2]: Darn, this thing needs trellis! With mobile/low-bitrate for mobile devices (what I encode for), it's a no-go. Is there a way to implement this in baseline (meaning without CABAC dependence)? Or, less realistically, to somehow hack together some form of non-CABAC dependent trellis? (Though they're intertwined right now, since trellis "optimizes" the CABAC stuff or whatnot).

Dark Shikari

4th April 2008, 00:36

The artifacting was fixed, assuming the latest patch was the one you used.

It does not actually need trellis. If you want to remove trellis, here's what you have to do:

1. Instead of running trellis before running QNS, have it run normal quant.

2. Replace the CABAC bit-cost functions with the CAVLC bit-cost functions in QNS.

For 2), you'd probably be best off just making an IF statement, like in RD_COST_MB, e.g. if(CABAC) {do cabac} else {do cavlc}.

QNS, unlike trellis, is not dependent on the entropy encoding method. Note that any fast implementation of QNS, however, would be dependent.
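To see why QNS itself is independent of the entropy coder, here is a toy greedy noise-shaping loop in which the coder only enters through a bit-cost callback: try nudging each quantized level by +1/-1, keep any change that lowers weighted distortion plus lambda times bits, and stop when a full pass changes nothing. Everything here is a hypothetical sketch: a 4-point Walsh-Hadamard transform stands in for H.264's 4x4 transform, `toy_bits` is a made-up cost model rather than CABAC or CAVLC, and `weight` plays the role of `inv_variance` in the patch (a larger weight means errors there are more visible).

```c
#define QNS_N 4

/* Bit-cost callback: the only place the entropy coder enters. */
typedef double (*bitcost_fn)(const int *levels, int n);

/* Made-up cost model: each nonzero level costs 1 + 2*|level| bits. */
double toy_bits(const int *levels, int n)
{
    double bits = 0.0;
    for (int i = 0; i < n; i++)
        if (levels[i])
            bits += 1 + 2 * (levels[i] < 0 ? -levels[i] : levels[i]);
    return bits;
}

/* Inverse 4-point Walsh-Hadamard transform of the dequantized levels. */
static void reconstruct(const int *levels, double qstep, double *out)
{
    static const int H[QNS_N][QNS_N] = {
        {1, 1, 1, 1}, {1, 1, -1, -1}, {1, -1, -1, 1}, {1, -1, 1, -1}};
    for (int i = 0; i < QNS_N; i++) {
        out[i] = 0.0;
        for (int k = 0; k < QNS_N; k++)
            out[i] += H[k][i] * levels[k] * qstep;
        out[i] /= QNS_N; /* H * H^T = 4I, so the inverse is H^T / 4 */
    }
}

/* Rate-distortion cost: weighted squared error plus lambda * bits. */
double qns_cost(const int *levels, const double *orig, const double *weight,
                double qstep, double lambda, bitcost_fn bits)
{
    double rec[QNS_N], dist = 0.0;
    reconstruct(levels, qstep, rec);
    for (int i = 0; i < QNS_N; i++) {
        double e = orig[i] - rec[i];
        dist += weight[i] * e * e;
    }
    return dist + lambda * bits(levels, QNS_N);
}

/* Greedy refinement; returns the final cost. Every accepted change
 * strictly lowers the cost, so the loop terminates. */
double qns_refine(int *levels, const double *orig, const double *weight,
                  double qstep, double lambda, bitcost_fn bits)
{
    double best = qns_cost(levels, orig, weight, qstep, lambda, bits);
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int k = 0; k < QNS_N; k++) {
            for (int delta = -1; delta <= 1; delta += 2) {
                levels[k] += delta;
                double c = qns_cost(levels, orig, weight, qstep, lambda, bits);
                if (c < best) {
                    best = c;
                    changed = 1;
                } else {
                    levels[k] -= delta; /* revert a change that didn't help */
                }
            }
        }
    }
    return best;
}
```

Using CAVLC or CABAC costs instead is just a different callback; as noted above, only a faster implementation would need to know more about the coder.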

DeathTheSheep

4th April 2008, 01:40

Derr, I was hoping you'd do that. But I guess nobody likes CAVLC, despite it being the most compatible and (bummer me horez) the most widely used, unfortunately. Or so the saying goes, with all those baseline encoders/profiles out on the market, and the devices to go with 'em.

block_residual_write_cavlc( x264_t *h, bs_t *s, int i_idx, int16_t *l, int i_count )
Whole different argument set than CABAC (obviously, it stands to reason). I'm unfamiliar with the arguments here, so what would I even use for the analogous bs_t argument for CAVLC's residual write function? h->out.bs? Obviously not h->cavlc... :p And as for dct_luma_8x8 (the i_ctxBlockCat term), it just vanishes since it's unused for CAVLC anyway, eh?

Oh gawsh.

Dark Shikari

4th April 2008, 01:49

block_residual_write_cavlc( x264_t *h, bs_t *s, int i_idx, int16_t *l, int i_count )

block_residual_write_cabac( x264_t *h, x264_cabac_t *cb, int i_ctxBlockCat, int i_idx, int16_t *l, int i_count )

Looks pretty similar to me ;)

Just remove the i_ctxBlockCat and pass the bs_t instead of cabac_t. See rd_cost_mb for specifics on this. (Make sure to pass a temporary copy, of course, e.g. bs_t bs_tmp = h->out.bs; )
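Writing a residual advances the coder's state, so the bit count has to be measured on a throwaway copy. A minimal sketch of that pattern with a hypothetical coder struct; `write_block`'s cost model is made up, and zeroing the copy's counter so it reports only this block's bits is an assumption about what the caller wants (rd_cost_mb shows how x264 itself handles it).

```c
#include <stdint.h>

/* Hypothetical stand-in for an entropy coder state such as bs_t. */
typedef struct {
    uint8_t buf[64];
    int     pos;
    int     bits_encoded; /* running bit count, like i_bits_encoded */
} coder_state_t;

/* Made-up writer: pretend a nonzero coefficient costs 5 bits, a zero 1. */
static void write_block(coder_state_t *s, const int16_t *coefs, int n)
{
    for (int i = 0; i < n; i++)
        s->bits_encoded += coefs[i] ? 5 : 1;
}

/* Measure a block's cost without disturbing the real coder state:
 * copy the struct by value, reset its counter, write into the copy only. */
int measure_bits(const coder_state_t *real, const int16_t *coefs, int n)
{
    coder_state_t tmp = *real; /* like: bs_t bs_tmp = h->out.bs; */
    tmp.bits_encoded = 0;      /* assumption: count only this block */
    write_block(&tmp, coefs, n);
    return tmp.bits_encoded;
}
```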

DeathTheSheep

4th April 2008, 01:52

That's what I thought, good.

But it's precisely the handling of bits_encoded that troubles me. Seems so much easier to use cabac's cute wittle integwated function. *shrug*

PS: I say that since I don't know how to use either, yet I'm trying to go from one to the other. XD

...and don't tell me it's as easy as replacing f8_bits_encoded with i_bits_encoded...

Wee, you can't put declarations inside an if statement. And now I have bits_c and bits_v for cabac bits and cavlc bits. Only one of which will be used. And then an if statement updating one or the other. For some strange reason, I get the feeling that this is turning out to be harder than I'd initially thought. :p

And how much are weighted_error4 and weighted_error8 the same? Looks like almost the same darn function. I just copy-pasted the majority of it over. Not the for loop of course...LOL @ 4 loop.

Dark Shikari

4th April 2008, 02:23

Yes, they're basically the same function, with a few things changed for the different block size.

Why not just use the same variable for bits? Declare it outside the if statement, and assign it one value if it's CABAC and another if it's CAVLC.

To compensate for CABAC's bit cost being 256 times higher precision, you can multiply CAVLC's by 256.
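Put together, the suggestion above (one `bits` variable declared outside the branch, with the CAVLC count scaled by 256) looks roughly like this. A hypothetical helper, assuming CABAC's `f8_bits_encoded` carries 8 fractional bits so that one whole bit equals 256 units:

```c
#include <stdint.h>

/* Hypothetical helper: lambda-weighted bit cost in 1/256-bit units.
 * f8_bits:  CABAC count with 8 fractional bits (1 bit == 256 units).
 * int_bits: CAVLC count in whole bits, scaled by 256 to match. */
uint64_t rd_bits(int use_cabac, uint32_t f8_bits, uint32_t int_bits,
                 uint32_t lambda2)
{
    uint64_t bits; /* declared once, outside the branches */
    if (use_cabac)
        bits = f8_bits;
    else
        bits = (uint64_t)int_bits * 256;
    return bits * lambda2;
}
```

One CAVLC bit and 256 CABAC f8 units then produce the same cost, so the rest of the RD comparison needs no changes.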

DeathTheSheep

4th April 2008, 02:50

By that you mean
bits *= lambda2_tab[i_qp]*256; ?

And what do I do with this?
block_residual_write_cavlc(h,&cavlc_tmp,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
i_ctxBlockCat isn't passed to CAVLC directly or at all in weighted_error_8, but it still plays a role here?

Well, as things stand now, it crashes upon encode, with or without cabac. :/

[edit] Er, to put things in perspective (and to check if I'm going in the right direction at all with this):

uint64_t weighted_error4( x264_t *h, int16_t dct[4][4], uint16_t inv_variance[4][4], int i_qp, int idx, int i_quant_cat, int i_ctxBlockCat )
{
    uint8_t idct[4*FDEC_STRIDE];
    int16_t ndct[4][4];
    int i,j;
    uint8_t *decpix = h->mb.pic.p_fdec[0] + 4*block_idx_x[idx] + 4*FDEC_STRIDE*block_idx_y[idx];
    h->mc.copy[PIXEL_4x4]( idct, FDEC_STRIDE, decpix, FDEC_STRIDE, 4 );
    memcpy( ndct, dct, sizeof(ndct) );
    h->zigzagf.scan_4x4( h->dct.luma4x4[idx], dct );
    h->quantf.dequant_4x4( ndct, h->dequant4_mf[CQM_4PY], i_qp );
    // The exe crashes at the dequant_4x4 line above with "illegal operation"
    if( i_ctxBlockCat == DCT_LUMA_AC )
        ndct[0][0] = dct[0][0];
    h->dctf.add4x4_idct( idct, ndct );
    uint64_t error = 0;
    uint8_t *pix = h->mb.pic.p_fenc[0] + 4*block_idx_x[idx] + 4*FENC_STRIDE*block_idx_y[idx];
    for( i = 0; i < 4; i++ )
        for( j = 0; j < 4; j++ )
        {
            int pix_error = idct[i+j*FDEC_STRIDE] - pix[i+j*FENC_STRIDE];
            error += ((pix_error*pix_error)*inv_variance[j][i]);
        }
    x264_cabac_t cabac_tmp = h->cabac;
    bs_t cavlc_tmp = h->out.bs;
    if( h->param.b_cabac )
    {
        block_residual_write_cabac(h,&cabac_tmp,i_ctxBlockCat,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
    }
    else
    {
        block_residual_write_cavlc(h,&cavlc_tmp,idx,h->dct.luma4x4[idx],16 - (i_ctxBlockCat == DCT_LUMA_AC));
    }
    error *= 38;
    uint64_t bits;
    if( h->param.b_cabac )
    {
        bits = cabac_tmp.f8_bits_encoded;
        bits *= lambda2_tab[i_quant_cat][i_qp];
    }
    else
    {
        bits = cavlc_tmp.i_bits_encoded;
        bits *= lambda2_tab[i_quant_cat][i_qp]*256;
    }
    error += (bits + 8) >> 4;
    return error;
}

Dark Shikari

4th April 2008, 03:30

Try aligning ndct.

E.g. DECLARE_ALIGNED_16(int16_t ndct[4][4]);

If --no-asm fixes the crash, that's probably the problem.

This is because I added an SSE2 version of dequant, which requires alignment; when I wrote QNS, dequant didn't require alignment.
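The fix works because SSE2 loads such as movdqa fault on addresses that aren't 16-byte aligned, and a plain local array is only guaranteed its element type's alignment. A portable C11 sketch of the same idea uses `_Alignas(16)`; this is an illustration, not x264 code, and `copy_to_aligned` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is 16-byte aligned, as aligned SSE2 loads require. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Copy a 4x4 block into a 16-byte-aligned local, the same effect as
 * DECLARE_ALIGNED_16(int16_t ndct[4][4]), and report whether the copy
 * is aligned and holds the data. */
int copy_to_aligned(int16_t dct[4][4])
{
    _Alignas(16) int16_t ndct[4][4];
    memcpy(ndct, dct, sizeof(ndct));
    return is_aligned16(ndct) && ndct[0][0] == dct[0][0];
}
```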