Videocodec on wavelet base (ala JPEG2000)? [Archive]

Xanatos

17th February 2004, 23:01

Hello @ all,

Is there a video codec out there that uses wavelet compression instead of the discrete cosine transform for the I-frames, and that also uses delta frames?

I've tested the PICVideo Wavelet2000 codec:
http://www.pegasusimaging.com/picvideowavelet.htm
It has very good video quality at twice the DivX video bitrate, but it uses only I-frames!

A wavelet-based codec with delta frames should knock DivX/XviD/MPEG-4 off the throne of video codecs.

greetings
Xanatos

Koepi

17th February 2004, 23:10

If you knew how wrong you are.

Eduardo Tagle implemented an experimental wavelet codec; you can find it on his webpage: http://sf.net/projects/btwincap/

You'll see very quickly that your assumption is wrong. Use delta frames and the quality isn't acceptable at all. Without them it rocks, though the bitrate is quite high (I sometimes use it for TV captures, even though it's very hungry for CPU and memory bandwidth).

Btw., that codec is marked experimental for a good reason; it's not recommended for daily use. Unless you know how to modify the code (which is there too), you won't get support for it from anyone.

Regards
Koepi

Tommy Carrot

18th February 2004, 00:14

The Rududu codec (http://rududu.ifrance.com) is the most advanced wavelet codec I know of, and it has wavelet-based delta frames too. Overall it's slightly less efficient than the best MPEG-4 codecs, but it also has a few advantages (no blocks/ringing, better noise tolerance).

Xanatos

18th February 2004, 00:20

@Koepi:

The codec didn't work for me at all :-(

I think the problem is just that the code is defective.

I've tested JPEG2000 with still pictures in Photoshop.
It is both subjectively and objectively better.
So even if you use the same delta algorithm, it should result in smaller video.

Warning: the page is 5 MB!
http://neroon.gmxhome.de/
All pictures are converted to PNG.
The modified ones were modified with "Auto-Tonwertkorrektur" (German; the English Photoshop term is "Auto Levels").

greetings
Xanatos

Xanatos

18th February 2004, 00:41

@Tommy Carrot:

This codec is awesome.
I think MPEG-4 is dead, wavelets rule.
I hope the project is not truly dead :-(

It would be great if this codec went open source ...

greetings
Xanatos

justin

21st February 2004, 12:30

The Rududu team says the wavelet transform is not as efficient for video as for pictures, but I'd still like to see the difference between the same material encoded with an FFT (fast Fourier transform) based codec and with a WT (wavelet transform) based one.

pieter1976

21st February 2004, 17:27

Have a look at this one:

http://www.vsofts.com/codec/wlt_demo.html

I think it uses more than just I-frames.

MfA

21st February 2004, 18:33

The problem with video coding is that the error frames are nothing like natural images ... and the better you do motion compensation the less like natural images they become.

Consider for a moment the size of the transform H.264 uses for the DFD (displaced frame difference): a 4x4 transform. Let's assume that there is negligible correlation between pixels beyond 4x4 ... that means anything beyond 2 steps of the Haar transform, and even a single step of the wavelet transform with most other bases, will have basis functions that extend beyond the range of correlation.
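
To put numbers on that argument, here is a small sketch. It assumes the usual result that cascading a filter with `taps` coefficients over `levels` dyadic steps gives an equivalent filter of length (taps - 1) * (2^levels - 1) + 1. The 4-pixel correlation range is the assumption from the post; the filter lengths are the low-pass analysis filters of Haar and the common 5/3 and 9/7 wavelets.

```python
def support(taps: int, levels: int) -> int:
    """Length in pixels of the equivalent filter after `levels` dyadic steps."""
    return (taps - 1) * (2 ** levels - 1) + 1


CORRELATION_RANGE = 4  # assumption from the post: nothing useful beyond 4 pixels

for name, taps in [("Haar", 2), ("5/3 low-pass", 5), ("9/7 low-pass", 9)]:
    for levels in (1, 2, 3):
        width = support(taps, levels)
        verdict = "fits" if width <= CORRELATION_RANGE else "too wide"
        print(f"{name:13s} level {levels}: {width:3d} px ({verdict})")
```

Under these assumptions only Haar at one or two levels stays within the 4-pixel range; every longer filter is already wider than that after a single step.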

sysKin

22nd February 2004, 04:12

Right.

The advantage of wavelets in picture coding comes from the fact that the wavelet transform works on the picture as a whole, rather than splitting it into individual blocks as if those blocks were independent.

However, a delta-frame picture, after block-based motion compensation, *is* a set of independent blocks. In this case a blocky transform is exactly what you need.

Rududu doesn't use normal block-based motion compensation, but no one has found a better MC than block-based (yet), despite many efforts. When they do, wavelets can dominate the world.

Radek

PS. Don't go too far with the JPEG -> JPEG2000 comparison. JPEG is very simplistic and takes (almost) no advantage of correlation between blocks. Intra coding in MPEG-4 is much more efficient, and it's better again in H.264. Blocks will live on.

bot

22nd February 2004, 04:57

Excuse my noobness, but with the kind of compression jpeg2000 and other wavelet functions can perform on whole pictures, why bother with blocks and differential frames at all?

I have a feeling the answer is simple.

RadicalEd

22nd February 2004, 05:56

This is making me wonder what a 3D DCT would look like instead of MC o.o

Probably really weird :|

virus

22nd February 2004, 11:27

Originally posted by sysKin
PS. don't go too far with jpeg->jpeg2000 comparison. Jpeg is very simplistic, doesn't take advantage of any (almost) correlation between blocks.

Comparing DCT & DWT using JPEG & JPEG 2000 as examples is really unfair, since they use an entirely different entropy coder (fixed Huffman for JPEG Baseline, context-based arithmetic for JPEG 2000).
As a matter of fact, these two transforms give very similar results (and in terms of R/D they're asymptotically equivalent). So we are not really going to gain much by changing the transform... (this applies to Hadamard, too).

BTW JPEG uses an 8x8 DCT mainly because image statistics are strongly localized; switching to larger DCTs would be worse... There's little "correlation between blocks" in a still picture apart from the similarity of the DC coefficients, which JPEG does exploit (but there is correlation between different DWT levels, which JPEG 2000 does not exploit... this is the only way to improve on it AFAIK).

@RadicalEd:
No one has proposed a 3D DCT, I think, but there's the 3D DWT; all the schemes proposed for video coding still use block-based MC AFAIK.

virus

MfA

22nd February 2004, 21:22

Actually there's a bunch of papers on using a 3D DCT. As a low-complexity image sequence coder it does all right, just like non-MC 3D wavelet coders. But as I often say to GLDM, to little effect, you won't get state-of-the-art video compression without explicit motion coding.

virus

23rd February 2004, 00:16

Originally posted by MfA
Actually there's a bunch of papers on using a 3D DCT. As a low-complexity image sequence coder it does all right, just like non-MC 3D wavelet coders.
Do you mean a 3D DCT with or without MC? I've never seen a practical scheme working with an MC'ed DCT... do you know of any working code that uses this? Papers are one thing, practical implementations are a whole different thing.
And yeah, obviously, no-MC = crap. We don't even need to care about non-MC'ed schemes ;)

MfA

23rd February 2004, 01:01

Without MC, just like RadicalEd asked about ...

No code, but then code is rarely released with papers ... let alone for papers on video compression.

RadicalEd

23rd February 2004, 01:47

Well, a 3D DCT with time as the z axis couldn't use MC, because there's no frames to predict motion for.. just a grid of DCT cubes. I was doing some tests last night, and it doesn't look like a time axis would be any harder to handle than regular width/height axes. Granted, DCT wouldn't save as many bits as MC, but it certainly would be interesting to play around with. Macrocubes.. tasty.
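
For anyone who wants to play with the idea, here is a minimal sketch of a separable 3D DCT over a small cube, with time as the third axis. It uses a plain orthonormal DCT-II and is in no way optimised; the 4x4x4 cube size and the test pattern are made up for illustration. A static scene (the same frame repeated) shows the useful property: all the energy lands in the temporal-frequency-0 plane.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(v))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_3d(cube):
    """Separable DCT of cube[t][y][x]: along x, then y, then t."""
    T, H, W = len(cube), len(cube[0]), len(cube[0][0])
    c = [[dct_1d(row) for row in frame] for frame in cube]  # x axis
    for t in range(T):                                       # y axis
        for x in range(W):
            col = dct_1d([c[t][y][x] for y in range(H)])
            for y in range(H):
                c[t][y][x] = col[y]
    for y in range(H):                                       # t axis
        for x in range(W):
            line = dct_1d([c[t][y][x] for t in range(T)])
            for t in range(T):
                c[t][y][x] = line[t]
    return c


# A static scene: the same 4x4 frame repeated 4 times (illustrative data).
frame = [[(3 * x + 5 * y) % 7 for x in range(4)] for y in range(4)]
cube = [[row[:] for row in frame] for _ in range(4)]
coeffs = dct_3d(cube)

energy_t0 = sum(v * v for row in coeffs[0] for v in row)
energy_rest = sum(v * v for plane in coeffs[1:] for row in plane for v in row)
print("energy in temporal DC plane:", energy_t0)
print("energy in all other planes: ", energy_rest)
```

With real motion the energy spreads into the higher temporal planes, which is where the comparison with MC gets interesting.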

Anyway, there must be some way to exploit the benefits of wavelet intraframe compression over multiple frames without using block based MC. RLE on a set of difference frames sounds like a start, though only practical for low motion (but then, what isn't).
Ah well, I'm talking out my ass for the moment. I need to do more research on the methods that have been proposed/tested.
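
The "RLE on difference frames" idea can be sketched in a few lines. This is only a toy under stated assumptions: frames are flattened to 1-D lists of integers, and the sample values are invented. It does show why the approach only pays off for low motion, since every changed pixel breaks a run of zeros.

```python
def frame_delta(prev, cur):
    """Per-pixel difference between two frames of equal size."""
    return [c - p for p, c in zip(prev, cur)]


def rle(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def unrle(runs):
    """Inverse of rle()."""
    return [v for v, n in runs for _ in range(n)]


# Illustrative data: a bright 2-pixel object moves one pixel to the right.
prev = [10, 10, 10, 50, 50, 10, 10, 10, 10, 10, 10, 10]
cur = [10, 10, 10, 10, 50, 50, 10, 10, 10, 10, 10, 10]

delta = frame_delta(prev, cur)
runs = rle(delta)
print(runs)

# The decoder adds the decoded delta back onto the previous frame.
decoded = [p + d for p, d in zip(prev, unrle(runs))]
print(decoded == cur)
```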

General Lee D. Mented

23rd February 2004, 04:52

Originally posted by RadicalEd
Well, a 3D DCT with time as the z axis couldn't use MC, because there's no frames to predict motion for.. just a grid of DCT cubes. I was doing some tests last night, and it doesn't look like a time axis would be any harder to handle than regular width/height axes. Granted, DCT wouldn't save as many bits as MC, but it certainly would be interesting to play around with. Macrocubes.. tasty.

Anyway, there must be some way to exploit the benefits of wavelet intraframe compression over multiple frames without using block based MC. RLE on a set of difference frames sounds like a start, though only practical for low motion (but then, what isn't).
Ah well, I'm talking out my ass for the moment. I need to do more research on the methods that have been proposed/tested.

Actually you can still search for motion by comparing the cubes to each other under translation, i.e. is cube A the same as cube B but shifted 2 frames later, 4 pixels up and 6.25 pixels left, etc. The problem is that your N^2 motion search algorithm becomes an N^3 motion search algorithm, and that's going to be kind of slow.
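
A rough count of what that means for an exhaustive search, assuming integer displacements only (fractional positions like the 6.25 pixels above would multiply both numbers further). The search radii are hypothetical values picked for illustration.

```python
def candidates_2d(radius: int) -> int:
    """Positions tested by a full search over +/-radius pixels in x and y."""
    return (2 * radius + 1) ** 2


def candidates_3d(radius: int, frame_radius: int) -> int:
    """Same search, additionally shifted over +/-frame_radius frames in time."""
    return candidates_2d(radius) * (2 * frame_radius + 1)


radius = 16  # hypothetical search range
print("2D full search:", candidates_2d(radius), "positions per block")
print("3D full search:", candidates_3d(radius, radius), "positions per cube")
```

Each 3D candidate also compares a whole cube instead of a single block, so the real cost ratio is worse than the candidate count alone suggests.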

There are other things that can be done, but MfA doesn't believe me and I'm still stuck on bugs that I'm not getting much help with.

pieter1976

2nd March 2004, 20:04

I think Wavelets are very similar to interpolation.

I wonder if better compression ratios could be achieved using better interpolation.

http://audio.rightmark.org/lukin/graphics/resampling.htm

This is dynamic interpolation using edge enhancement.

MfA

2nd March 2004, 22:06

Wavelets are a lot of things, you could fit such methods into nonlinear/adaptive wavelets.

d'Oursse

2nd March 2004, 22:20

Originally posted by MfA
Wavelets are a lot of things
Mathematically, "wavelets" form an orthonormal basis of L^2(R) that satisfies specific properties.

MfA

2nd March 2004, 23:15

Meh, okay ... the wavelet transform can be a lot of things then (transforms built on the second-generation wavelet framework with non-linear/adaptive predict/update steps are wavelet transforms in common lingo). I sense most mathematicians don't particularly like the whole second-generation wavelet thing ... but hey, you don't get to define what words mean all by yourself :)

From an engineering point of view the lifting scheme was sheer brilliance ... it actually does reduce the main problem of designing a good wavelet transform to designing a good interpolator. So there is a connection IMO.
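
Here is a minimal sketch of that connection, using one level of an integer 5/3-style lifting step. The predict step is nothing more than a linear interpolator for the odd samples, and swapping in a better interpolator gives a different wavelet. Function names and the edge handling (mirroring at the borders) are choices made for this example.

```python
def lift_forward(x):
    """One lifting level on an even-length list of ints: returns (approx, detail)."""
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: estimate each odd sample by interpolating its two even
    # neighbours; the detail signal is what the interpolator got wrong.
    detail = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: nudge the even samples using the neighbouring details so the
    # coarse signal stays close to a local average.
    approx = [even[i] + (detail[max(i - 1, 0)] + detail[i] + 2) // 4 for i in range(n)]
    return approx, detail


def lift_inverse(approx, detail):
    """Undo lift_forward() by running the same steps backwards."""
    n = len(detail)
    even = [approx[i] - (detail[max(i - 1, 0)] + detail[i] + 2) // 4 for i in range(n)]
    odd = [detail[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


ramp = [0, 2, 4, 6, 8, 10, 12, 14]
approx, detail = lift_forward(ramp)
print("approx:", approx)
print("detail:", detail)  # zero wherever linear interpolation is exact
print("lossless:", lift_inverse(approx, detail) == ramp)
```

Because the inverse simply replays the same predict and update expressions with the signs flipped, reconstruction is exact no matter how the interpolator is chosen, which is what makes adaptive or nonlinear predictors possible.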

d'Oursse

2nd March 2004, 23:28

Originally posted by MfA

From an engineering point of view the lifting scheme was sheer brilliance.
From a mathematician's point of view (that of one of my professors, who is a mathematician) too :)

edit: MfA, please look at neuron's forum; I have asked a question there, maybe you will be able to help me.

General Lee D. Mented

3rd March 2004, 00:11

Originally posted by MfA
Wavelets are a lot of things, you could fit such methods into nonlinear/adaptive wavelets.

Heh, except that making an adaptive wavelet transform is probably going to really hurt performance by several orders of magnitude, unless you have a way to "cheat" and can infer things quickly from the attributes of the data. Otherwise selecting what basis function to use by trial and error is going to take forever.

There are also multibasis wavelet transforms, where you transform, find the significant coefficients for the given basis, then inverse transform, subtract from the original data, and then run a transform with a different basis function on the remainder, etc. In theory this method could produce very good compression if you had the right functions for the right data. Think of a music sample where you have the complex wave patterns of each instrument: you transform by each individually and can deduce the component instruments and their frequency and amplitude at a given point in time. It's like wav->midi conversion and in theory could produce similar ratios. Voice would be more difficult but not impossible. The key would be getting the right basis functions, which would probably mean some method of scanning the data for predictable wave patterns. Again, this would probably take an eternity right now without some kind of breakthrough in hardware or algorithms.

Still, provided Moore's law lasts long enough, there are still places to go in the future.

MfA

3rd March 2004, 01:05

Not slower than a BWT though :p

The transform you mention sounds more like matching pursuit than wavelets. The closest thing which comes to mind in the wavelet realm is the fast wavelet packet transform.
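
For reference, a bare-bones matching pursuit looks like the sketch below: pick the dictionary atom most correlated with the residual, subtract its contribution, repeat. The dictionary here (a few unit-norm sines plus two spikes) and the test signal are invented for illustration; a real coder would use a far larger dictionary and quantise the coefficients.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, steps):
    """Greedy decomposition: returns [(atom_index, coefficient)] and the residual."""
    residual = list(signal)
    picks = []
    for _ in range(steps):
        # Correlate the residual with every (unit-norm) atom.
        scores = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        picks.append((best, coeff))
        # Remove the chosen atom's contribution and continue on what is left.
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
    return picks, residual


N = 32
# Hypothetical dictionary: four sines (atoms 0-3) and two spikes (atoms 4-5).
atoms = [normalise([math.sin(2 * math.pi * f * n / N) for n in range(N)])
         for f in (1, 2, 3, 5)]
atoms += [[1.0 if n == p else 0.0 for n in range(N)] for p in (4, 20)]

# Test signal: one sine plus one spike.
signal = [3.0 * atoms[1][n] + 2.0 * atoms[4][n] for n in range(N)]

picks, residual = matching_pursuit(signal, atoms, steps=2)
print("chosen atoms:", [i for i, _ in picks])
print("residual energy / signal energy:",
      dot(residual, residual) / dot(signal, signal))
```

Since the sine and the spike are not orthogonal, the recovered coefficients are not exactly 3 and 2, and a small residual remains after two steps; that is normal for a greedy pursuit over a redundant dictionary.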

Actually, computing least-squares optimal filters for each step in a wavelet transform can probably be done in real time using the lifting framework. The update/predict steps can be optimized using Levinson-Durbin (only with separable row/column filtering though; it won't work with a quincunx-based transform), which is pretty fast.
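
For readers who haven't met it, this is the generic Levinson-Durbin recursion for a causal linear predictor, given autocorrelation values r[0..order]. It costs O(order^2) instead of the O(order^3) of a general solver. How exactly it would be wired into a lifting predict step is not spelled out above, so this sketch only shows the recursion itself; the example autocorrelation is made up.

```python
def autocorr(x, max_lag):
    """Unnormalised autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, error) with x[n] ~ sum(coeffs[k-1] * x[n-k], k=1..order)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the part of r[i] not yet explained.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Illustrative autocorrelation of a first-order process with correlation 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print("predictor coefficients:", coeffs)
print("prediction error power:", err)
```

For this input the second coefficient comes out as zero, as expected for a first-order process: the previous sample already carries all the predictable information.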

This is part of one of the ideas I have shelved until the day I stop being lazy: I wanted to optimize the predict/update steps for each 16x16 block at each step in the dyadic wavelet transform for lossless coding.

General Lee D. Mented

3rd March 2004, 02:18

Originally posted by MfA
Not slower than a BWT though :p

Assuming most transforms are computable in Nlog(N) time like FFT, then when we apply multiple transforms we need to multiply by K, which is the number of basis functions. BWT is N^2, so if K > log(N) then K*Nlog(N) > N^2. Therefore it could be slower than a BWT in many cases.

One of BWT's major downsides is that its compute time is fairly symmetric on the decompress side. I may consider a slower compress-side algorithm that decompresses faster yet compresses better. I haven't really gotten to the point of testing entropy coders yet though; I'm still having some bugs with media players not receiving the right colorspace codes.

vinouz

4th March 2004, 10:07

Nope. BWT is only the cheaper one if K > N/log(N). With K = log(N) you get log(N)*N*log(N) = N*log(N)^2, which is still well below N^2.
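
A quick numeric check, taking the cost models from the posts above at face value (K transforms at N*log(N) each versus N^2 for the BWT; both are the posters' assumptions, not measured figures, and the base-2 logarithm is chosen here for convenience):

```python
import math


def multi_transform_cost(n: int, k: int) -> float:
    """Assumed cost of K transforms at N*log2(N) operations each."""
    return k * n * math.log2(n)


def bwt_cost(n: int) -> float:
    """Assumed N^2 cost model for the BWT."""
    return float(n) ** 2


def breakeven_k(n: int) -> float:
    """Number of transforms at which both cost models are equal."""
    return n / math.log2(n)


for n in (1024, 65536, 1_000_000):
    k = round(math.log2(n))  # the K = log(N) case from the earlier post
    print(f"N={n:>9}: K=log2(N)={k:2d} costs {multi_transform_cost(n, k):.3g}, "
          f"BWT costs {bwt_cost(n):.3g}, break-even K = {breakeven_k(n):.1f}")
```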

For audio, I had an idea when I was around 15 (in '93): take the FFT (or DCT), find its biggest value, and fit around it a shape of harmonics that covers as much as possible (largest area) without exceeding any coefficient. Then code that shape, subtract it, and do the same with the remainder until a threshold (remaining area / initial area, or number of harmonic sets coded) is reached. This would be like "take the N most important sounds and code them". A set of harmonics would be coded by its center, its decay law (how to define it? Bell shape, triangle, Dirac-like... and with which parameters?), and its amplitude.
I also wanted to code the amplitude of each set of harmonics exponentially, relative to the maximum level of the entire transform (or maybe relative to the level of the last coded set of harmonics). (Maybe this last detail is not very useful if CABAC is used as well.)

Anyone interested ?

Also, what about the article that proposed coding texture separately from the rest, to regenerate it later? It seemed to work great with JPEG2000. Can't it be added to a (DCT- or wavelet-based) codec? What would it cost? I think it could work wonders with blurry codecs, such as RV...

MfA

4th March 2004, 11:26

It is not so much coding the texture ... as coding the texture statistics. If you just coded the texture itself you wouldn't gain anything ... the gain comes from only synthesizing something which looks like the original texture, but in a PSNR sense it will actually do nothing at best and get worse if you are unlucky. I agree, it could be nice ... getting it to work well with motion compensation would be tricky though.
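
The PSNR point is easy to demonstrate with a toy model. Assume a "texture" is just Gaussian noise with a given mean and spread (a crude stand-in chosen for this example, not the model from any paper). A resynthesised texture with identical statistics then scores worse against the original than simply throwing the texture away and keeping the flat mean.

```python
import math
import random


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 10 * math.log10(peak * peak / mse)


def texture(seed, n=4096, mean=128.0, sigma=20.0):
    """Toy texture: Gaussian noise with fixed statistics, varied only by seed."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]


original = texture(seed=1)
synthesised = texture(seed=2)     # same statistics, different samples
flat = [128.0] * len(original)    # texture removed entirely (blurry codec)

print("PSNR of synthesised texture:", round(psnr(original, synthesised), 2), "dB")
print("PSNR of flat mean:          ", round(psnr(original, flat), 2), "dB")
```

In this model the flat version wins by about 3 dB in expectation, because two independent noise fields differ with twice the variance of one, even though the synthesised version is the one that looks like the original.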

As for the first idea, I don't quite see why you would expect harmonic relations to be of much relevance to images. For audio sure, for images I wouldn't expect it. Especially not with the small block transforms necessary to get good compression (transforms without localization in space suck for image compression).

vinouz

4th March 2004, 14:30

I was talking about coding the texture statistics, too. I don't remember the paper, but it has been mentioned in the forums here earlier.
I'm not talking about PSNR here. It's precisely for this kind of thing that PSNR is, to my taste, not that relevant. (But I admit that, processing-wise, it's a good tool.)
I don't remember how they described the texture information, but I'm not at all sure it was tied to the color values in the image. So if it's independent, the texture description may have nothing to do with motion estimation.

As for my second part, it had nothing to do with images; I just threw it in as an aside. Sorry for not starting a separate thread in audio encoding. It was an example of a coding approach different from traditional quantization. I think it's still worth considering.

Vincent.

MfA

4th March 2004, 15:02

It's called WITCH (http://dewww.epfl.ch/~nadenau/Research/Pub_12.htm), and I probably was the one who mentioned it :)

I'm not concerned about motion estimation, but motion compensation ... I doubt just synthesizing a new independent texture each frame would look very good; it would pop. On the other hand, coded motion isn't always real motion, so the statistics can change.

Nothing insurmountable, just something to keep in mind.

vinouz

4th March 2004, 19:25

Yes, that was it and that was you :o).
I understand the concern here. You think that since two consecutive images with generated textures won't have them generated coherently from one frame to the next, it would look like noise rather than texture (a kind of irregular film grain).
The idea that comes to mind here, apart from checking whether it looks good anyway (which would be great), is to give the texture its own motion compensation. If the texture generator (random process? iterative process?) has a seed, the spatial evolution of that seed could be warped according to the motion vectors. Maybe the motion vectors could be reused when a block has the same texture properties as the majority of the surface it is predicted from, with a new seed started when the identified texture differs from that majority.