What is XviD's "kludge" edition? [Archive] - Page 2

spyder

23rd May 2003, 06:24

Ok, this may be totally absurd but...

Would it be useful to use wavelets to reduce the complexity of dark portions of an image, much as lumimasking would but maybe not so problematic? My idea is to take a 2D wavelet transform of the entire frame (maybe in multiple levels), discard the high frequencies, return to the original frame resolution through the inverse transform, and then replace the portions that are too dark or too bright with the lower-complexity ones. This may not help much with DCT though...

I was also thinking of doing the same for blurry backgrounds. A lot of times the background could easily be reduced to 1/2 resolution and still look decent when resized to full res. This of course requires some way of determining image complexity, and I don't believe there is a good way to do this...at least not for most sources ;)

I want to play with wavelet transforms in avisynth again...maybe i can get some free time soon. :) But first I have to finish my Matroska writer API.
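
A rough sketch of the idea above, in C. It assumes a single-level 2D Haar wavelet, for which discarding every high-frequency subband and inverting the transform leaves each 2x2 block filled with its mean. The function name `soften_extremes`, the `dark`/`bright` thresholds and the plain 8-bit luma plane are assumptions for illustration; nothing here comes from XviD or Kludge.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace too-dark or too-bright pixels with the mean of their 2x2 block.
 * src and dst are separate w*h 8-bit luma planes; w and h are assumed even.
 * dark and bright are hypothetical thresholds chosen by the caller. */
void soften_extremes(const uint8_t *src, uint8_t *dst, int w, int h,
                     uint8_t dark, uint8_t bright)
{
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            size_t i00 = (size_t)y * (size_t)w + (size_t)x;
            size_t idx[4] = { i00, i00 + 1, i00 + (size_t)w, i00 + (size_t)w + 1 };
            /* Haar low-pass reconstruction: rounded mean of the block. */
            int mean = (src[idx[0]] + src[idx[1]] +
                        src[idx[2]] + src[idx[3]] + 2) / 4;
            for (int k = 0; k < 4; k++) {
                uint8_t p = src[idx[k]];
                /* Only pixels outside [dark, bright] lose their detail. */
                dst[idx[k]] = (p < dark || p > bright) ? (uint8_t)mean : p;
            }
        }
    }
}
```

Pixels inside the threshold range are copied unchanged, so only the very dark and very bright areas are simplified. A multi-level or 5/3 transform would need a real wavelet implementation instead of the block mean.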

Kyo

23rd May 2003, 09:32

Can We even call this a codec??? LOL!

Great Work! Nice Idea! Cool Bugs!

slavickas

23rd May 2003, 10:36

Originally posted by Kyo
Can We even call this a codec??? LOL!

Great Work! Nice Idea! Cool Bugs!

and only the codecs are vfw :p

bergi

23rd May 2003, 11:02

maven uses a 5/3 integer wavelet transform, but his code isn't designed for a video codec. Perhaps only the transform and the Rice coding are usable, since quant selection is too slow for video. I would use quant like this:

for (i = 0; i < max_level; i++) {
    ...
    qu = (qu + 1) / 2;
}

and for P-frames use a Hadamard transform. I've made some tests and it works really great, but my tests were made without halfpel, so there are many jumping blocks, and I also think the bitrate will decrease with halfpel motion vectors.
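
A small C sketch of what the quantizer loop above produces, one value per wavelet level. The function name `level_quants` and the choice to store the value before halving are assumptions; the post elides the rest of the loop body.

```c
/* Fill quant[0..max_level-1] with one quantizer per wavelet level,
 * halving (rounding up) at every step as in the loop above.
 * Which level receives the coarsest value is an assumption here. */
void level_quants(int qu, int max_level, int *quant)
{
    for (int i = 0; i < max_level; i++) {
        quant[i] = qu;
        qu = (qu + 1) / 2;   /* e.g. 16 -> 8 -> 4 -> 2 -> 1 -> 1 */
    }
}
```

Because of the `+ 1`, a positive quantizer never drops to zero however many levels are used.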

mf

23rd May 2003, 12:47

Originally posted by spyder
Ok, this may be totally absurd but...

Would it be useful to use wavelets to reduce the complexity of dark portions of an image, much as lumimasking would but maybe not so problematic? My idea is to take a 2D wavelet transform of the entire frame (maybe in multiple levels), discard the high frequencies, return to the original frame resolution through the inverse transform, and then replace the portions that are too dark or too bright with the lower-complexity ones. This may not help much with DCT though...

I was also thinking of doing the same for blurry backgrounds. A lot of times the background could easily be reduced to 1/2 resolution and still look decent when resized to full res. This of course requires some way of determining image complexity, and I don't believe there is a good way to do this...at least not for most sources ;)

I want to play with wavelet transforms in avisynth again...maybe i can get some free time soon. :) But first I have to finish my Matroska writer API.

Nice ideas. About the blurry stuff, wanna write an RRV decision for us? RRV can be set on a macroblock basis.

MfA

23rd May 2003, 14:29

At low bitrates hadamard tends to block, everywhere.
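
For reference, a 4-point Hadamard transform needs only additions and subtractions. A minimal C sketch follows; the function name `hadamard4` is made up here and this is not bergi's code.

```c
/* In-place 4-point Hadamard (Walsh-Hadamard) transform, unnormalized.
 * Applying it twice returns the input scaled by 4, so the inverse is
 * the same routine followed by a division by 4. */
void hadamard4(int x[4])
{
    int a0 = x[0] + x[1];
    int a1 = x[0] - x[1];
    int a2 = x[2] + x[3];
    int a3 = x[2] - x[3];
    x[0] = a0 + a2;   /* x0 + x1 + x2 + x3 */
    x[1] = a1 + a3;   /* x0 - x1 + x2 - x3 */
    x[2] = a0 - a2;   /* x0 + x1 - x2 - x3 */
    x[3] = a1 - a3;   /* x0 - x1 - x2 + x3 */
}
```

A 2D block transform would apply this to every row and then to every column of the block.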

spyder

23rd May 2003, 16:51

Originally posted by mf
Nice ideas. About the blurry stuff, wanna write an RRV decision for us? RRV can be set on a macroblock-basis.

Heh, an RRV decision...by me?!?! LMAO...

At least we know this is possible now ;) Maybe one day when I get time to read more about how some of this stuff works I can hack on Kludge a bit...but don't expect much ;)

PS: I did some tests last night and reducing to quarter res is horrid...but 1/2 res is difficult to tell from the original...only on the blurry parts of course :P

gino25

23rd May 2003, 17:40

When will there be a new version of kludge?

I really think that this will be the greatest codec

23rd May 2003, 18:40

Here is a draft for a new GUI for kludge:
http://mf.onthanet.com/IDD_BLOAT.png

:D

Sigmatador

23rd May 2003, 20:06

ROFL :D

gino25

24th May 2003, 06:20

wonderful

Animaniac

24th May 2003, 09:28

Ahh, the world would be a much better place if all developers followed this example.

Wait, Microsoft already follows it too seriously.

sysKin

24th May 2003, 17:28

Originally posted by gino25
When will there be a new version of kludge?

I really think that this will be the greatest codec

OK, it's high time I wrote something in this thread ;)

1. A statement. Although I am the only kludge developer, I am not a kludge developer. :confused: Kludge is mf's creation. He just took some of my experimental code and put it on a webpage.

2. I might start working (really working) on kludge when I have time. It will be based on xvid dev-api-4 so I'll have to re-start the whole thing. I don't know when it will happen.

3. While coding VHQ for bframes, I discovered some new mpeg-4 craziness. Really, the mpeg4 people were creating VLCs at random, or without thinking, or based on their hopes and wishes, but surely not based on any statistical experiments. Anyway, after discovering this, I'm more determined to start coding kludge.
But VHQ for bframes will come first.

Radek

24th May 2003, 20:22

Don't start binding things to me now; you called your experiments "Kludge" too, I didn't have to do any binary hacking for that :D. So it's yours too. You don't need to say "it's not mine, it's mf's", cause we can say or do anything we want anyway. When you feel like stopping the whole thing, I can just close the project and pend it for deletion on sf, no difficulties there. We don't guarantee anything, so that's the beauty. No need to go make it my responsibility now, you coded it :p.

Oh and a note to everyone: while XviD is an educational project, it produces usable code. We just test how we can improve on MPEG-4 by not sticking to specs. That creates broken things, and things in general you want to avoid. So XviD is for general use, and Kludge is only for experimental use. Please don't publish things with it, because you will only create nightmares for yourself and others. This is educational and experimental ONLY. Plus it's fun for us :D.

gldblade

24th May 2003, 21:30

So, about how long until the "Be better than everyone else" and "Rape DXN" features are ready? :D

24th May 2003, 22:01

/me rapes gldblade

Sigmatador

24th May 2003, 22:13

on the paper, wavelet keyframe and h.264 delta frame --> monsterful!:D

gldblade

25th May 2003, 02:33

Ow.

Latexxx

25th May 2003, 17:19

I have a very weird idea!

Why not create integrated I/P-frames? The frames would sit in the GOP like I-frames, but they would only hold information telling how to form the frame by adding/removing some stuff from existing I-frames somewhere in the movie. It would be ideal for "low motion" movies where there are, for example, two cameras aimed at a person and the angle changes from camera A to camera B and then back to camera A. In cases like this it would be ideal to take an existing frame from camera A and build the new frame using information from that.

Ps. I hope that the text above isn't too hard to read and understand.

spyder

25th May 2003, 17:41

Hmmm, this sounds very similar to a regular P-frame...maybe only reversed...I don't know that it would work.

25th May 2003, 17:44

Originally posted by Latexxx
Ps. I hope that the text above isn't too hard to read and understand.

From what I understand of it you want to have a P-frame reference an earlier coded I-frame?

Sigmatador

25th May 2003, 18:01

too complicated to estimate, I think.

well, not too complicated, but too long or something else. and the decoding will be strange too ^^

Acaila

25th May 2003, 18:21

Actually P-frames referencing older I-frames is already in a standard somewhere. Just can't remember if it was H.26L or MPEG-4 or something else entirely.

Atamido

26th May 2003, 05:34

This kind of far-away referencing is supported in Matroska because you use a timecode for the referenced frame instead of just saying to use the previous or following frame.

However, playback of this kind of referencing would be a huge pain because of the buffering/seeking that would have to take place.

shlezman

26th May 2003, 06:44

Long term reference frames are a part of H.264 and are designed for exactly what Lat3x describes. There's another nice feature called weighted prediction which allows weighting of two reference blocks from two sources.
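
A simplified C sketch of what weighting two reference blocks can look like. The fixed-point scale of 1/64 and the function name `weighted_bipred` are assumptions, and this is not the normative H.264 formula, which also carries offsets and a signalled weight denominator.

```c
#include <stdint.h>

/* Blend two n-pixel reference blocks with integer weights w0 and w1,
 * expressed in units of 1/64 (w0 + w1 == 64 gives a plain weighted mean).
 * Illustration only; not the normative H.264 weighted prediction. */
void weighted_bipred(const uint8_t *ref0, const uint8_t *ref1, uint8_t *pred,
                     int n, int w0, int w1)
{
    for (int i = 0; i < n; i++) {
        int s = ref0[i] * w0 + ref1[i] * w1 + 32;   /* +32 rounds the /64 */
        int v = (s < 0) ? 0 : (s >> 6);
        if (v > 255)
            v = 255;                                 /* clamp to 8 bits */
        pred[i] = (uint8_t)v;
    }
}
```

With `w0 = w1 = 32` this is ordinary bidirectional averaging; unequal weights are what make fades and cross-fades cheaper to predict.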

26th May 2003, 10:05

Originally posted by Pamel
This kind of far-away referencing is supported in Matroska because you use a timecode for the referenced frame instead of just saying to use the previous or following frame.

However, playback of this kind of referencing would be a huge pain because of the buffering/seeking that would have to take place.

Please go elsewhere with matroska pimping.
We know that matroska can do everything that AVI can't.

No need to remind us. When interested WE will come to YOU.

Thank you for your understanding.

Sigmatador

26th May 2003, 10:31

well, calm down ^^ there's no war ^^
imho, OK, mkv is a bit young, but we can't deny it is promising ^^ and personally I'm waiting for the day mkv is fully implemented.

btw, I still think faraway referencing is not a good idea. H.264 specs are already slow and efficient ^^ and adding wavelet intraframe coding is already a monsterful idea ^^

Atamido

26th May 2003, 16:52

I was not trying to pimp anything, I was just making the point that if you wanted to attempt this kind of faraway referencing, then there is a container that can support it.

But it looks like Sigmatador agrees with me that it's probably not a good idea to do this right now.

TheXung

27th May 2003, 05:13

I must say, it sure does seem like the matroska members do pimp their product a lot.

This thread is about discussing video compression science, and on every other page there is a blurb about how matroska would be the best container for its features.

Atamido

27th May 2003, 07:14

I don't know about "best". I was just trying to point out a possibility when a feature that could be added couldn't be supported by something like AVI.

There is a lot of Matroska pimping that goes on, but I don't think of myself as a pimper. I only mention it if there is a specific reason to. Honestly just trying to be helpful, but I don't need to be.

DaveEL

27th May 2003, 07:59

Originally posted by Pamel
I don't know about "best". I was just trying to point out a possibility when a feature that could be added couldn't be supported by something like AVI.

Seeing as your argument against putting b-frames in an avi with mpeg-4 is that it's not a real mpeg-4 stream, it does not apply to kludge: it doesn't need to comply with any standard, so packing several frames together like in an avi would be perfectly fine. As long as you make sure that, by the time you decode a frame, all the frames it references have been passed to the decoder, the container doesn't matter at all.

Of course that is assuming you don't reference a frame outside the next and previous i-frames. In that case you need to either:
a) not mark all the i-frames as keyframes in the avi until all referenced frames are contained between the two keyframes, or
b) repeat the needed frames within each set of i-frames in which they are referenced.

The real problem is the VCM assuming one frame in, one frame out, and yes, we all know UCI is coming, but it's not here yet. And besides, the way we work around this limitation in the VCM is well understood by now anyway.

DaveEL
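
Option a) above can be sketched in C as follows. The frame description (`is_intra`, `min_ref`) and the function name are assumptions for illustration and do not correspond to any AVI or VCM API: an I-frame is flagged as a container keyframe only if no frame from that point on reaches back to something before it.

```c
#include <stdbool.h>

/* Frames are listed in coding order.  is_intra[i] is true for I-frames;
 * min_ref[i] is the lowest coding-order index frame i depends on
 * (i itself for an I-frame).  keyframe[i] is set only for I-frames
 * that no frame at or after position i references across. */
void mark_container_keyframes(const bool *is_intra, const int *min_ref,
                              int n, bool *keyframe)
{
    int reach = n;   /* lowest index referenced by frames i..n-1 */
    for (int i = n - 1; i >= 0; i--) {
        if (min_ref[i] < reach)
            reach = min_ref[i];
        keyframe[i] = is_intra[i] && reach >= i;
    }
}
```

With frames I P I P I P, where the second P-frame references the very first I-frame, only the first and the last I-frame end up flagged: seeking to the middle one would leave that P-frame without its reference.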

spyder

27th May 2003, 18:00

I hope that UCI is available for my 25th birthday but one can only dream...

It seems development has stopped.

PS: I was born in 1985 ;)

Atamido

28th May 2003, 00:35

Originally posted by DaveEL
Seeing as your argument against putting b-frames in an avi with mpeg-4 is that it's not a real mpeg-4 stream, it does not apply to kludge: it doesn't need to comply with any standard, so packing several frames together like in an avi would be perfectly fine. As long as you make sure that, by the time you decode a frame, all the frames it references have been passed to the decoder, the container doesn't matter at all.

Sorry, I get caught up sometimes in trying to have things done the right way. Of course, in this case, the codec is ruleless by nature, and so breaking old-school rules is not really an issue (there is no right/wrong way anymore). This technique would certainly work fine for encoding to a file, and for playback.

The problems show up when trying to edit the file using "Direct Stream copy". You could edit out a minute of video, but keep 90% of the data. But, this would probably be a rare instance.

As for UCI, I don't hold out much hope in it. I think the main developer killed it before it even got off of the ground. Of course I am not one to doubt ChristianHJW's powers to keep a project going.

bergi

28th May 2003, 08:31

Here (http://www.bergos.org/files/transforms_dct.zip) is a little test program (C source code) for block transforms, plus a program (C source code) for calculating integer values for an integer DCT, if someone wants to test some transforms. The program opens a BMP file, changes the colorspace to YUV (xvid c), blocks the image, transforms the blocks, quantizes, writes test.dat, inverse transforms, deblocks, changes the colorspace to RGB24 and writes a BMP. Compress the test.dat file with zip to get the file size of the coded image. The program is GPLed.

Some ideas on a P-frame referencing an earlier I-frame:

in the encode_i_frame function:

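if (key_frame_interval == max_key_frame_interval) {
    /* maximum interval reached: drop the stored frames, emit a real keyframe */
    clear_all_buffer();
    sub_encode_i_frame();
    tell_container_frame_is_key_frame();
} else {
    /* keep this frame as a possible reference for later frames */
    store_reference_frame_in_next_buffer();
    if (check_frame_is_delta_frame_of_in_buffer_frame()) {
        /* code it as a P-frame against the matching buffered frame */
        sub_encode_p_frame(ref_frame = buffer_frame);
    }
    tell_container_frame_is_delta_frame();
}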
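
One possible way to fill in the check from the pseudocode above, sketched in C. Using the sum of absolute differences (SAD) against every buffered frame, the `max_sad` threshold and the names `frame_sad` and `pick_reference` are all assumptions; the post does not say how the check would work.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two n-pixel luma planes. */
static unsigned long frame_sad(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned long sad = 0;
    for (size_t i = 0; i < n; i++)
        sad += (unsigned long)abs((int)a[i] - (int)b[i]);
    return sad;
}

/* Return the index of the buffered frame closest to cur, or -1 when no
 * buffered frame is within max_sad (then a plain I-frame is the safe choice). */
int pick_reference(const uint8_t *cur, const uint8_t *const *cache,
                   int cache_len, size_t n, unsigned long max_sad)
{
    int best = -1;
    unsigned long best_sad = max_sad;
    for (int k = 0; k < cache_len; k++) {
        unsigned long sad = frame_sad(cur, cache[k], n);
        if (sad <= best_sad) {
            best_sad = sad;
            best = k;
        }
    }
    return best;
}
```

A real encoder would more likely compare motion-compensated cost than raw SAD, but the raw version is enough to catch a cut back to a camera angle seen a few seconds earlier.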

sysKin

29th May 2003, 10:46

Originally posted by bergi
Here (http://www.bergos.org/files/transforms_dct.zip) is a little test program (C source code) for block transforms, plus a program (C source code) for calculating integer values for an integer DCT, if someone wants to test some transforms. The program opens a BMP file, changes the colorspace to YUV (xvid c), blocks the image, transforms the blocks, quantizes, writes test.dat, inverse transforms, deblocks, changes the colorspace to RGB24 and writes a BMP. Compress the test.dat file with zip to get the file size of the coded image. The program is GPLed.

Thanks bergi, I'll take a long look when I'm back to Kludge (+/- when xvid 1.0 is out). By the way, I'm very happy that your transform can work on different sizes, I think I'll try a shape-adaptive way of doing stuff :)

Some ideas on a P-frame referencing an earlier I-frame:

in the encode_i_frame function:

if (key_frame_interval == max_key_frame_interval) {
    /* maximum interval reached: drop the stored frames, emit a real keyframe */
    clear_all_buffer();
    sub_encode_i_frame();
    tell_container_frame_is_key_frame();
} else {
    /* keep this frame as a possible reference for later frames */
    store_reference_frame_in_next_buffer();
    if (check_frame_is_delta_frame_of_in_buffer_frame()) {
        /* code it as a P-frame against the matching buffered frame */
        sub_encode_p_frame(ref_frame = buffer_frame);
    }
    tell_container_frame_is_delta_frame();
}

So you mean to only use one previous I-frame? That's all possible, we could even use more (if it wasn't for seeking problems in all decoders I can think of - even matroska won't directly help). It shouldn't even be slow :)

However, my first 'important' Kludge improvement is extending B-frames - I want almost all frames to be B-frames, with (ideally) I-frame beginning a scene and P-frame ending it (of course this is theory, but one P-frame every 25 frames looks ok when in low motion). By the way, it will be impossible to decode without big buffer (which will break sound synchronization) and I don't care :P

Regards,
Radek

duartix

29th May 2003, 12:37

Originally posted by sysKin
By the way, it will be impossible to decode without big buffer (which will break sound synchronization) and I don't care :P

Cool, new bugs are already being planned. :D :D :D

Selur

29th May 2003, 15:25

:devil: It's not a bug, it's a feature: :devil:

No more disadvantage for deaf people, now everybody needs the subs. *gig*

gino25

31st May 2003, 05:45

Originally posted by rududu author
It's he !

Has rududu replied?

rududu author

31st May 2003, 07:31

Originally posted by mf

Might you be interested in coding an integer wavelet algorithm in C for kludge? :)

Well, for the moment I can't work a lot on rududu, so working on another project seems difficult to me.

Originally posted by sysKin

However, my first 'important' Kludge improvement is extending B-frames - I want almost all frames to be B-frames, with (ideally) I-frame beginning a scene and P-frame ending it (of course this is theory, but one P-frame every 25 frames looks ok when in low motion).

Are you sure it will be an improvement ? Look at this doc : http://ise.stanford.edu/class/ee392j/projects/jiang_report.pdf

Nicolas

Acaila

31st May 2003, 08:29

But what can you expect with a testcase of 61 frames? That's like 2-2.5 secs. Besides, the author used quantizers 10,15,20,25 which is hardly representative for what we do on this board.

I can imagine that the B-frame overhead can start becoming significant if you only use a very small video, but most normal sized videos will have both low and high motion scenes where B-frame compression can really work its magic.

Not saying I agree with the statement that almost every frame should be a B-frame ;), but just that the experiment in the doc doesn't seem very credible to me...

rududu author

31st May 2003, 09:31

Originally posted by Acaila
But what can you expect with a testcase of 61 frames? That's like 2-2.5 secs. Besides, the author used quantizers 10,15,20,25 which is hardly representative for what we do on this board.

I can imagine that the B-frame overhead can start becoming significant if you only use a very small video, but most normal sized videos will have both low and high motion scenes where B-frame compression can really work its magic.

Not saying I agree with the statement that almost every frame should be a B-frame ;), but just that the experiment in the doc doesn't seem very credible to me...

Why do you want a longer sequence ? To be sure of the statistics ? There are already different video sequences tested. Why do you think the results will change with a longer sequence?

What do you imagine as overhead for a B frame ? One bit to say this is not a P frame it's a B frame ? Not really significant ... And video compression is not a magic process !

You don't know the quantizer used; in the test it's a normalized quantizer parameter. The bit rate used is more interesting: 65 kbit/s with a 176x144 frame will give you 788 kbit/s for a 640x480 video (with QP = 10).

If this experiment doesn't seem credible to you, show me an experiment where 10 consecutive B frames give you a better result than [(I)PBPBPB...] or [(I)PBBPBBPBB...]

Nicolas
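
The 65 kbit/s to 788 kbit/s figure quoted above follows from scaling the bitrate by the ratio of pixel counts. A short C sketch, assuming bits per pixel and frame rate stay constant (the function name `scale_bitrate` is made up here):

```c
/* Scale a bitrate from a w0 x h0 frame to a w1 x h1 frame by pixel count,
 * assuming the same bits per pixel and the same frame rate. */
double scale_bitrate(double kbps, int w0, int h0, int w1, int h1)
{
    return kbps * ((double)w1 * (double)h1) / ((double)w0 * (double)h0);
}
```

`scale_bitrate(65.0, 176, 144, 640, 480)` gives 65 * 307200 / 25344, roughly 787.9 kbit/s, which rounds to the 788 kbit/s in the post.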

Acaila

31st May 2003, 09:50

Originally posted by rududu author
Why do you want a longer sequence ? To be sure of the statistics ? There are already different video sequences tested. Why do you think the results will change with a longer sequence?

It's been proven that the effect of B-frames is different on high and low motion scenes, so any test on the optimal consecutive number, quantizer ratio or whatever should be performed on a video sequence that contains both types of scenes in sufficient length. Something that 61 frames can definitely not do.

Originally posted by rududu author
What do you imagine as overhead for a B frame ? One bit to say this is not a P frame it's a B frame ? Not really significant ...

I don't know what the overhead exactly is, but there is one. Ask the XviD developers. In my opinion in a sequence of 61 frames you're mostly looking at the effect of that overhead. But I could be wrong of course.

Originally posted by rududu author
And video compression is not a magic process !

I can't believe you took that literally :). It was just an expression.

Originally posted by rududu author
If this experiment doesn't seem credible to you, show me an experiment where 10 consecutive B frames give you a better result than [(I)PBPBPB...] or [(I)PBBPBBPBB...]

I didn't say I believed that a high number of consecutive B-frames is a good thing, just that the test in that doc was just too small to determine what really is optimal in a realistic video sequence.

sysKin

31st May 2003, 11:33

Let me make that clear: I'm NOT talking about mpeg4 bframes!

I want the 'last' frame to be forward-predicted (I might call it F-Frame) because it has to be predicted one-way. All other frames can be predicted two-ways, and that's what I'm planning to do :)

To be clear, this is my coding order:
1. frame 1 as I-frame
2. frame 25 as F-frame
3. frame 12 as B-frame, referencing from 1 and 25
4. frame 6 as B-frame, referencing from 1 and 12
5. frame 3 (B all the time), referencing from 1 and 6
6. frame 2, referencing from 1 and 3
7. frame 4, referencing from 3 and 6
8. frame 5.......... and so on

No other codec I know uses that coding order. And there are reasons not to do that - you have to decode 5 frames before you can show frame 2. It will require decoding ahead (or 100GHz cpu), and will make many things impossible (such as: watching the movie :D :scared: :scared: :rolleyes: )
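
The coding order listed above can be generated by recursive bisection. The C sketch below assumes a plain floor midpoint and a "midpoint first, then left half, then right half" traversal; the function names are invented for illustration and this is not Kludge code.

```c
/* Emit the frames strictly between lo and hi: the midpoint first (predicted
 * from lo and hi), then everything left of it, then everything right of it. */
static int code_between(int lo, int hi, int *order, int pos)
{
    if (hi - lo < 2)
        return pos;                  /* nothing left in between */
    int mid = (lo + hi) / 2;
    order[pos++] = mid;
    pos = code_between(lo, mid, order, pos);
    return code_between(mid, hi, order, pos);
}

/* Fill order[] with the coding order for display frames first..last and
 * return how many frames were written (order needs last - first + 1 slots). */
int coding_order(int first, int last, int *order)
{
    int pos = 0;
    order[pos++] = first;            /* I-frame */
    if (last > first) {
        order[pos++] = last;         /* forward-predicted "F-frame" */
        pos = code_between(first, last, order, pos);
    }
    return pos;
}
```

For frames 1 to 6 this yields 1, 6, 3, 2, 4, 5, matching the tail of the list above. For frames 1 to 25 the plain midpoint picks frame 13 where the post says 12, and frame 2 is the sixth frame coded, so five others have to be decoded before it can be shown.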

Anyways, don't worry, I just like to invent stupid bugs. This coding order is not the only bug I'm planning.

Radek

TheXung

31st May 2003, 17:17

Longer B-frame sequences become more feasible with improvements to motion estimation.

MfA

1st June 2003, 17:58

B-frames code the same stuff multiple times.

gino25

27th September 2003, 08:58

Any news about this promising codec?

27th September 2003, 21:00

Originally posted by gino25
Any news about this promising codec?

It's dormant until api4 is "done". Then there'll be something clean to build on.

sh0dan

28th September 2003, 09:37

Originally posted by sysKin
[...]
To be clear, this is my coding order:
1. frame 1 as I-frame
2. frame 25 as F-frame
3. frame 12 as B-frame, referencing from 1 and 25
[...]

Very interesting, but wouldn't it require an ME-only first pass to determine the best encoding strategy for a sequence?
It could be interesting to develop a tool that uses an "ME-hint" file to decide on a good strategy for B-frame distance, to avoid too large a B-frame distance when there is much motion.

An idea I've been curious about is "I-frames" referencing older I-frames. This could be efficient in many real-life situations, where scenes are shot with two or three different cameras. If a scene change could reference a slightly older I-frame instead of being an I-frame itself, it might be possible to reduce the number of I-frames.
This would of course require caching of I-frames and give slightly larger seek times.