Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Doom9's Forum > Capturing and Editing Video > Avisynth Usage

12th August 2002, 10:36 #201
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
Quote:
Originally posted by bb
@vlad59:
I would like to test the Convolution3D with standard and full-1 matrix at the same time; this way it's easier to compare.
BTW: What do you say about the possibility of optimizing the algorithm for a full-1 matrix (in fact you wouldn't need a matrix at all in this special case)?

bb
I don't fully understand. What I'll do is provide a new Convolution3D with an extra parameter to choose between the two matrices.
I hope that's what you want.

Optimizing for the full-1 matrix is not easy; let's take an example:
Code:
Thresholded matrices are:
10 11 11     13 15 5     10 11 11     
10 11 11     13 15 5     10 11 11     
10 11 11     13 15 5     10 11 11     

To find the new convolved value, I have to add all the matrix values:
Sum = 10 + 11 + 11 + 13 + 15 + 5 + ...

And then divide by the number of matrix values, i.e. 9+9+9 = 27:

new value = sum / 27
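A scalar C sketch of that full-1 average, assuming the three 3x3 blocks in the code above are the previous, current and next frame, read left to right. It only illustrates the arithmetic; the per-pixel `/ 27` is exactly the division that has no MMX opcode. `Cube` and `average27` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* v[t][y][x]: t = 0 previous, 1 current, 2 next frame. */
typedef struct { uint8_t v[3][3][3]; } Cube;

/* Full-1 matrix: all 27 thresholded samples count once. */
static int average27(const Cube *c)
{
    int sum = 0;
    for (int t = 0; t < 3; t++)
        for (int y = 0; y < 3; y++)
            for (int x = 0; x < 3; x++)
                sum += c->v[t][y][x];
    return sum / 27;   /* the slow integer division; truncates */
}
```

With the example values the sum is 291, so `average27` returns 10 (truncated from 10.78).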

That division is the problem:
 - there is no MMX opcode to perform a division (unless I missed something)
 - and a plain scalar division would be way too slow.

That's why I first tried a weight matrix like this:
1 1 1    1 1 1    1 1 1  
1 2 1    2 2 2    1 2 1
1 1 1    1 1 1    1 1 1

With this I can use a right shift by 5 to divide by 32 (the weights add up to 10 + 12 + 10 = 32).
But with this matrix the compressibility test is worse than with the standard matrix.
So if anybody has an idea...
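A scalar sketch of the weighted variant, assuming the same previous/current/next layout. Because the weights total 32, the division becomes a right shift by 5; adding 16 (half of 32) first rounds instead of truncating. `Cube`, `W` and `weighted_avg32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* v[t][y][x]: t = 0 previous, 1 current, 2 next frame. */
typedef struct { uint8_t v[3][3][3]; } Cube;

/* The weight matrix above, one 3x3 slice per frame. */
static const int W[3][3][3] = {
    {{1, 1, 1}, {1, 2, 1}, {1, 1, 1}},   /* previous frame: weights sum to 10 */
    {{1, 1, 1}, {2, 2, 2}, {1, 1, 1}},   /* current frame:  weights sum to 12 */
    {{1, 1, 1}, {1, 2, 1}, {1, 1, 1}},   /* next frame:     weights sum to 10 */
};

/* Weighted average: total weight 32, so divide with >> 5 after a +16 rounding bias. */
static int weighted_avg32(const Cube *c)
{
    int sum = 0;
    for (int t = 0; t < 3; t++)
        for (int y = 0; y < 3; y++)
            for (int x = 0; x < 3; x++)
                sum += W[t][y][x] * c->v[t][y][x];
    return (sum + 16) >> 5;
}
```

For the example values the weighted sum is 346, giving (346 + 16) >> 5 = 11.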
I didn't have time to think about it this weekend; Thursday is a day off in France, so you can expect a release in 2 or 3 days max.
__________________
Vlad59
Convolution3D for avisynth 2.0X : http://www.hellninjacommando.com/con3d
Convolution3D for avisynth 2.5 : http://www.hellninjacommando.com/con3d/beta
12th August 2002, 11:22 #202
dividee
Registered User
Join Date: Oct 2001
Location: Brussels
Posts: 358
If you don't have division... use multiplication.
Basic idea:
If you accumulate the sums as words in an MMX register, load another MMX register with four copies of 65536/27. The usual way in C code would be to multiply by that value and then >> 16. In MMX code you can use PMULHW (or PMULHUW with ISSE), which keeps only the high 16 bits of each product, so you don't even have to do the shift.
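A scalar C model of that idea, assuming the sums are non-negative and fit in signed 16-bit words (27 × 255 = 6885 does). `pmulhw_lane` mimics what PMULHW does to one word, and the four-iteration loop stands in for a single MMX instruction; both names are hypothetical and this is not Convolution3D's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* One PMULHW lane: signed 16 x 16 -> 32-bit multiply, keep the high 16 bits. */
static int16_t pmulhw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);   /* products are non-negative in this use */
}

/* "Divide" four word sums by 27 at once by multiplying with 65536/27. */
static void div27_x4(const int16_t sum[4], int16_t out[4])
{
    const int16_t recip = 65536 / 27;   /* 2427, truncated from 2427.26 */
    for (int i = 0; i < 4; i++) {
        assert(sum[i] >= 0 && sum[i] <= 27 * 255);
        out[i] = pmulhw_lane(sum[i], recip);
    }
}
```

Because 65536/27 truncates to 2427, exact multiples come out one too low (270 gives 9, 6885 gives 254), which is why the rounding step matters.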

Take care of rounding (if you don't want Acaila to complain about a green tint):
in C code you could do:
(sum * 65536/27 + 32768) >> 16
but that doesn't work with PMULHW (3DNow has a nice PMULHRW, though).
The technique I used in MMX code can be described in C code as:
(((sum<<1) + sum) * 32768/27) >> 16

It becomes more difficult to do the "divisions" in parallel if the divisor can vary for each word.
In TemporalSoften(2), I had independent divisors for each word that could vary from 1 to 16, so I constructed a big lookup table with 16^4 QWORD entries. I packed four divisors into a WORD to use as an index into the table, and so was able to do 4 "divisions" in parallel.
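dividee doesn't say what each table entry holds, so this is only one plausible layout, sketched in C: each QWORD entry holds four 16-bit reciprocals 32768/d, and the index packs four 4-bit fields storing d - 1. Each lane computes ((2*sum + d) * (32768/d)) >> 16 as an unsigned high-word multiply (PMULHUW-style), so that 32768/1 still fits in a word. `Recip4`, `pack_divisors`, `build_table` and `div4` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One table entry = one MMX QWORD: four 16-bit reciprocals. */
typedef struct { uint16_t r[4]; } Recip4;

/* Pack four divisors (1..16) into a 16-bit index, 4 bits per lane, stored as d - 1. */
static unsigned pack_divisors(const int d[4])
{
    unsigned idx = 0;
    for (int i = 0; i < 4; i++) {
        assert(d[i] >= 1 && d[i] <= 16);
        idx |= (unsigned)(d[i] - 1) << (4 * i);
    }
    return idx;
}

/* Build all 16^4 = 65536 entries (512 KB); lane i of entry idx holds 32768 / d_i. */
static Recip4 *build_table(void)
{
    Recip4 *table = malloc(65536 * sizeof *table);
    if (table == NULL)
        return NULL;
    for (unsigned idx = 0; idx < 65536; idx++)
        for (int i = 0; i < 4; i++)
            table[idx].r[i] = (uint16_t)(32768u / (((idx >> (4 * i)) & 15u) + 1u));
    return table;
}

/* Four "divisions" from one lookup. The "+ d" bias rounds instead of truncating;
   since 32768/d is itself truncated, a result can still be one too low in rare cases. */
static void div4(const Recip4 *table, const uint16_t sum[4], const int d[4], uint16_t out[4])
{
    const Recip4 *e = &table[pack_divisors(d)];
    for (int i = 0; i < 4; i++) {
        uint32_t biased = 2u * sum[i] + (uint32_t)d[i];
        assert(biased <= 0xFFFFu);   /* must still fit in a word */
        out[i] = (uint16_t)((biased * e->r[i]) >> 16);
    }
}
```

With divisors {1, 3, 7, 16} and sums {255, 600, 700, 4080}, one lookup yields {255, 200, 100, 255}.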
__________________
dividee

Last edited by dividee; 12th August 2002 at 11:24.
12th August 2002, 13:10 #203
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
@dividee

That's what I call a clear explanation, thanks a lot.

I'll have a look at the TemporalSoften code to be sure I have understood.

Thanks again; now I'm obliged to optimize Convolution3D better.
12th August 2002, 14:01 #204
bb
Moderator
Join Date: Oct 2001
Location: Germany
Posts: 2,665
@vlad59:
Just another idea: what about using a 2x2x2 or a 4x4x2 matrix?

bb
12th August 2002, 15:30 #205
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
@bb

In theory it's possible, but with those matrices there is no center pixel. I think that's annoying.
I thought about a 5x5x5 matrix, but it would be slower, and I first want a stable Convolution3D before changing the basics too much.
But if you explain to me what would be cool about this, I may change my mind. You usually have great ideas.

I also thought of not using the previous and next frames when there is too much difference between the previous, current and next pixel (only luma will be checked). It should help a lot in fade-in and fade-out scenes.
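A scalar sketch of that idea, assuming a per-pixel test on the center luma only and a fallback to the 9 current-frame samples; `Cube`, `gated_avg` and `luma_gate` are hypothetical, and Convolution3D may end up doing this differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* v[t][y][x]: t = 0 previous, 1 current, 2 next frame; the center pixel is v[1][1][1]. */
typedef struct { uint8_t v[3][3][3]; } Cube;

/* If the previous or next center luma is more than luma_gate away from the
   current one (fade or cut), skip the temporal samples and average only the
   current 3x3 block. The result is rounded to nearest. */
static int gated_avg(const Cube *c, int luma_gate)
{
    const int center = c->v[1][1][1];
    const int use_temporal = abs(c->v[0][1][1] - center) <= luma_gate &&
                             abs(c->v[2][1][1] - center) <= luma_gate;
    const int first = use_temporal ? 0 : 1, last = use_temporal ? 2 : 1;
    int sum = 0, n = 0;
    for (int t = first; t <= last; t++)
        for (int y = 0; y < 3; y++)
            for (int x = 0; x < 3; x++) {
                sum += c->v[t][y][x];
                n++;
            }
    return (2 * sum + n) / (2 * n);
}
```

With a darker previous frame (50) and a current/next frame at 100, a gate of 20 keeps the output at 100, while a gate of 60 lets the fade pull it down to 83.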
12th August 2002, 16:46 #206
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Vlad -

As a small point on performance, you can usually expand the matrix horizontally more cheaply than vertically. That's because the extra data will be coming from the same cache lines. So something like a 3x5 will probably be faster than a 4x4 by more than you might expect.
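A small model of Tom's point, assuming 64-byte cache lines, an 8-bit plane with a pitch of 768 bytes measured from a line boundary, and windows described as width x height. It only counts the distinct lines a window touches and ignores prefetching and cache levels; `cache_lines_touched` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count distinct cache lines touched by a w-wide, h-tall window of 8-bit pixels
   whose top-left corner is (x, y). The assert guarantees that two rows of the
   window can never fall into the same cache line. */
static int cache_lines_touched(int x, int y, int w, int h, int pitch, int line_size)
{
    assert(pitch >= line_size + w - 1);
    int lines = 0;
    for (int r = 0; r < h; r++) {
        size_t row = (size_t)(y + r) * (size_t)pitch;
        size_t first = (row + (size_t)x) / (size_t)line_size;
        size_t last = (row + (size_t)x + (size_t)w - 1) / (size_t)line_size;
        lines += (int)(last - first + 1);
    }
    return lines;
}
```

A 5-wide x 3-tall window touches 3 lines, a 4x4 touches 4 and a 3-wide x 5-tall touches 5; sliding the window a few bytes to the right usually stays within the same lines.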

- Tom
12th August 2002, 19:25 #207
bb
Moderator
Join Date: Oct 2001
Location: Germany
Posts: 2,665
@trbarry:
You are probably the master of filter optimisation. What do you think about my proposal of interleaving the frames before filtering, then de-interleaving again? That is, you store the 1st line of the 1st frame, then the 1st line of the 2nd frame, then the 1st line of the 3rd frame, then the 2nd line of the 1st frame, etc., like this:
1111111111111111 (first line)
2222222222222222 (first line)
3333333333333333 (first line)
1111111111111111 (second line)
2222222222222222 (second line)
3333333333333333 (second line)
...

This way the pixels you have to touch during filtering would be close together in memory, which - I think - would improve cache hits.
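A sketch of bb's line-interleaved layout, assuming 8-bit planes and a packed destination with `width` bytes per row. Row y of frame f goes to interleaved row 3*y + f, so the 9 rows a 3x3x3 window needs for output row y are the consecutive interleaved rows 3*(y-1) through 3*(y+1)+2. `interleave3` and `deinterleave3` are hypothetical; the extra copies are the overhead to weigh against the better locality.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Row y of frame f (f = 0, 1, 2) is stored at interleaved row 3*y + f. */
static void interleave3(const uint8_t *src[3], int width, int height, int pitch, uint8_t *dst)
{
    for (int y = 0; y < height; y++)
        for (int f = 0; f < 3; f++)
            memcpy(dst + (size_t)(3 * y + f) * (size_t)width,
                   src[f] + (size_t)y * (size_t)pitch, (size_t)width);
}

/* Inverse: copy each interleaved row back to its frame. */
static void deinterleave3(const uint8_t *src, int width, int height, uint8_t *dst[3], int pitch)
{
    for (int y = 0; y < height; y++)
        for (int f = 0; f < 3; f++)
            memcpy(dst[f] + (size_t)y * (size_t)pitch,
                   src + (size_t)(3 * y + f) * (size_t)width, (size_t)width);
}
```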

Non-cubic matrices will have an impact on the picture, as far as I can see: you could get the effect that they smooth more horizontally than vertically. I also thought about dropping the edge pixels, because they have the biggest distance to the center pixel. The perfect 3D matrix would probably be a sphere, not a cube... What do you think?
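One way to approximate a "sphere" inside the 3x3x3 cube, sketched under the assumption that distance is measured in whole pixel/frame steps: keep only the offsets with dx² + dy² + dt² ≤ r2. `sphere_mask` is a hypothetical name.

```c
#include <assert.h>

/* Fill a 3x3x3 include-mask keeping offsets with dx^2 + dy^2 + dt^2 <= r2.
   r2 = 3 keeps the whole cube (27 taps), r2 = 2 drops the 8 corners (19 taps),
   r2 = 1 keeps the center and its 6 face neighbours (7 taps).
   Returns the number of taps, i.e. the divisor a full-1 average would need. */
static int sphere_mask(int r2, unsigned char mask[3][3][3])
{
    int taps = 0;
    for (int dt = -1; dt <= 1; dt++)
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                const int keep = dx * dx + dy * dy + dt * dt <= r2;
                mask[dt + 1][dy + 1][dx + 1] = (unsigned char)keep;
                taps += keep;
            }
    return taps;
}
```

Note that 19 and 7 are not powers of two either, so such masks would need the same reciprocal-multiplication trick as 27.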

@vlad59:
You're right, not having a center pixel is annoying, I thought of that problem. I had the idea of something like a running total; the value of a pixel would be updated more than once while the algorithm is "passing by". I still have to think that over, but I can post my idea if you find it useful.
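bb's exact scheme isn't spelled out, so this shows only one common reading of a running total: a sliding-window sum that is updated incrementally as the filter passes by, here in one dimension for a 3-tap horizontal window. `running_box3` is a hypothetical name; extending it to 3x3x3 would mean keeping running column sums as well.

```c
#include <assert.h>
#include <stdint.h>

/* out[x] = p[x-1] + p[x] + p[x+1] for 1 <= x <= n-2; out[0] and out[n-1] are untouched.
   Each step adds the pixel entering the window and subtracts the one leaving it. */
static void running_box3(const uint8_t *p, int n, int *out)
{
    if (n < 3)
        return;
    int sum = p[0] + p[1] + p[2];
    out[1] = sum;
    for (int x = 2; x <= n - 2; x++) {
        sum += p[x + 1] - p[x - 2];   /* slide the window one pixel right */
        out[x] = sum;
    }
}
```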

There's a VirtualDub filter using a 5x5x2 matrix, but I was a little disappointed with that one...

bb
12th August 2002, 20:13 #208
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
bb -

Dunno, you'd have to try it. But in order to do this you would have to copy the data 2 extra times and I'm not sure whether that would make up for the (hopefully) better arrangement. And the gain would be processor dependent since the different machines have different cache sizes and stuff.

- Tom
12th August 2002, 21:33 #209
dividee
Registered User
Join Date: Oct 2001
Location: Brussels
Posts: 358
Why 2 extra times? Or do you count read and write as 2 times?
TemporalSoften (and TemporalSmoother) use an idea similar to the one bb described (I didn't invent it; it's the technique Avery used in Temporal Smoother for VirtualDub), except that these filters interleave the clips per pixel instead of per line.
So you have, in bb's notation (center = frame 2):
123123123...

Since these filters only operate on the temporal axis, this makes the inner loop work totally linearly. As long as the clip is read in sequential order, all you have to do for the next frame is replace the oldest pixels with the newest ones:
423423423 (for frame 3)
and then
453453453 (for frame 4)
The pixel replacement is also done in the inner loop.
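A scalar sketch of the pixel-interleaved circular buffer, assuming one row of 8-bit luma and a plain rounded 3-frame average; `DEPTH`, `load_frame` and `filter_step` are hypothetical, and TemporalSoften's real inner loop (thresholds, MMX, scene-change handling) is more involved.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 3   /* previous, current and next frame */

/* buf[x * DEPTH + k] holds pixel x of one of the last DEPTH frames, so the
   samples the filter needs sit side by side. Frame n (1-based, as in the post)
   always lives in slot (n - 1) % DEPTH. */
static void load_frame(uint8_t *buf, int width, const uint8_t *frame, int n)
{
    const int slot = (n - 1) % DEPTH;
    for (int x = 0; x < width; x++)
        buf[x * DEPTH + slot] = frame[x];
}

/* Filter frame `center`: in the same inner loop, frame center+1 overwrites the
   oldest slot, then the three adjacent samples are averaged (DEPTH == 3). */
static void filter_step(uint8_t *buf, int width, const uint8_t *next_frame, int center, uint8_t *out)
{
    const int slot = center % DEPTH;   /* slot of frame center+1 */
    for (int x = 0; x < width; x++) {
        uint8_t *px = buf + x * DEPTH;
        px[slot] = next_frame[x];
        out[x] = (uint8_t)((px[0] + px[1] + px[2] + 1) / 3);
    }
}
```

Loading frames 1 and 2, then calling `filter_step` for centers 2 and 3, leaves each pixel's samples in the order 4 2 3, matching the 423423423 pattern above.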

The arithmetic of such a circular buffer is sometimes a bit complicated (especially when you try to add full scene-change detection to the mix), but I think it's worth it.
12th August 2002, 23:39 #210
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Maybe I didn't think it through well enough. I was thinking that if you wanted to work with the data in a different format that you would also have to copy/reformat the results back at some point, but that's not necessarily true I guess.

But again, I guess you would have to try it and see if what you gain makes up for the up-front overhead.

When I wrote Greedy/HM (DScaler GreedyHMA version) I reformatted the data into what I thought was a clever arrangement that got around some then-current DScaler limitations, but in hindsight I'm not sure it was worth it. And I'm almost certain that if I did the work of pulling that stuff out for the Avisynth version, it would go faster. But maybe I just did something silly when I designed the data structure I used.

I guess the way I think about it now is that it is not as important to keep everything close together in memory as it is to minimize the number of cache lines that have to be repeatedly filled on each pass through the filter. So if you hit say, 6 or 8 cache lines (of 32, 64, or 128 bytes each), but then move 8 bytes to the right and it is still the same 6 or 8 cache lines, then you sort of get it for free. But when you move to the next line of the screen then it is probably going to be a different cache line or at least a different (slower) level of memory cache.

There, I have thoroughly confused myself. What did I just say?

- Tom
13th August 2002, 19:50 #211
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
@dividee
you made a typo when you explained how to divide:
instead of:
(((sum<<1) + sum) * 32768/27) >> 16
you should have written:
(((sum<<1) + 1) * 32768/27) >> 16

I just coded it, it works without any problem, thanks again.

@all
Tomorrow you'll have the new matrix (full 1).
13th August 2002, 20:39 #212
dividee
Registered User
Join Date: Oct 2001
Location: Brussels
Posts: 358
Indeed I made a typo, but your correction is also wrong; it should be
(((sum<<1)+27)*(32768/27))>>16 (or: ((sum+(27/2))*(65536/27))>>16)
which is equivalent to the previously given
(sum * 65536/27 + 32768) >> 16
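A quick C check of the variants from this exchange, using the integer constant 32768/27 = 1213 that a 16-bit multiplier would actually hold; all function names are hypothetical. Doubling the sum lets the whole divisor (27) act as the "+ 0.5" bias before the multiply: (2*sum + 27) / 54 = sum/27 + 1/2. One thing such a check shows is that, because 1213 is truncated, the corrected form can come out one below the exactly rounded result for some sums (959 is the smallest), which may or may not matter in practice.

```c
#include <assert.h>

/* Reference: sum / 27 rounded to nearest, exact integer arithmetic. */
static int div27_exact(int sum) { return (2 * sum + 27) / 54; }

/* dividee's corrected form: bias first, then a truncating high-word multiply. */
static int div27_fixed(int sum) { return (((sum << 1) + 27) * (32768 / 27)) >> 16; }

/* The first misprint: 3*sum instead of 2*sum + 27, roughly sum / 18. */
static int div27_typo(int sum) { return (((sum << 1) + sum) * (32768 / 27)) >> 16; }

/* The first correction attempt: a bias of 1/54 instead of 1/2, so it truncates. */
static int div27_plus1(int sum) { return (((sum << 1) + 1) * (32768 / 27)) >> 16; }
```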
Last edited by dividee; 13th August 2002 at 20:42.
13th August 2002, 20:45 #213
Koepi
Moderator
Join Date: Oct 2001
Location: Germany
Posts: 4,454
walk like an egyptian...
13th August 2002, 21:22 #214
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
@dividee

Of course you're right dividee, I really regret all the time I spent sleeping instead of listening to my Maths teacher.
Sorry to be such a pain in ....

@Koepi
I love the way you walk

@all

Thanks to dividee, you can now download the new version of Convolution3D (beta 1) with sources (not commented at all; I'll do that in 2 days).

You now have the full-1 weight matrix. The compressibility is better with this matrix.

No speed improvements for now; I'll need a day or two to understand the discussion between Tom and dividee.

EDIT: Removed old attachment.
Last edited by vlad59; 25th August 2002 at 13:39.
13th August 2002, 21:32 #215
Koepi
Moderator
Join Date: Oct 2001
Location: Germany
Posts: 4,454
vlad:

wanna walk that way as well?

then we could change to Run-DMC feat. Aerosmith - walk this way... and nod our heads like Will Smith.

Maybe that helps with understanding all those formulas.

I'm glad that I finally found out that there's a simple shift operator in C/C++; guess what I had to code around that in the xcdbackupcreator :-/

Dancing,

Koepi

NP: Velvet Acid Christ - Alien Surfaces
14th August 2002, 02:10 #216
baz00ie
Registered User
Join Date: Dec 2001
Posts: 36
Thanks for your work, Vlad59.
I've noticed an increase in quality and speed with the latest release.

take care
baz
14th August 2002, 09:47 #217
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
@baz00ie

First, thanks for testing and using my filter.
An increase in speed?? I did nothing yesterday to speed up Convolution3D. I'll run benchmarks tonight.
17th August 2002, 22:18 #218
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
beta 2

Here is the beta 2 of Convolution3D.

The main changes are the new temporal thresholds (thanks Tom for the idea), which should allow better compressibility and less ghosting.
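The thread doesn't spell out how the thresholds are applied, so the following is only a sketch under an assumption: a neighbour that differs from the center pixel by more than its threshold is replaced by the center value before averaging, with one threshold for the current frame (spatial) and a separate one for the previous and next frames (temporal). `Cube`, `thresholded_avg27`, `spatial_thr` and `temporal_thr` are hypothetical names, not Convolution3D's actual parameters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* v[t][y][x]: t = 0 previous, 1 current, 2 next frame; the center pixel is v[1][1][1]. */
typedef struct { uint8_t v[3][3][3]; } Cube;

/* Thresholded full-1 average: samples further than their threshold from the
   center are replaced by the center value, so outliers cannot pull the result. */
static int thresholded_avg27(const Cube *c, int spatial_thr, int temporal_thr)
{
    const int center = c->v[1][1][1];
    int sum = 0;
    for (int t = 0; t < 3; t++) {
        const int thr = (t == 1) ? spatial_thr : temporal_thr;
        for (int y = 0; y < 3; y++)
            for (int x = 0; x < 3; x++) {
                const int p = c->v[t][y][x];
                sum += (abs(p - center) <= thr) ? p : center;
            }
    }
    return (2 * sum + 27) / 54;   /* sum / 27, rounded to nearest */
}
```

With a flat block of 100 and a single 200 in the next frame, a temporal threshold of 8 rejects the outlier (result 100), whereas a threshold of 255 lets it through (result 104).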

This version shouldn't be any faster than beta 1.

I changed some parts of the internal engine, so if you see any difference, post here.

Thanks in advance for testing.

I ran some tests:
On my torture test (an old, noisy, badly mastered anime):
Convolution3d (1, 12, 20, 8, 8, 0)
and
TemporalSmoother (4)
have roughly the same compressibility test results (51.8 for C3D and 50.8 for TS),
but TemporalSmoother produces some ghosting and handles fade-in scenes very badly (C3D was also bad but somewhat better than TS; Tom's STMedianFilter was the best for this scene).
On still scenes TS is way better but loses some details.

I'll make new tests with MAM tomorrow.
18th August 2002, 01:07 #219
trbarry
Registered User
Join Date: Oct 2001
Location: Gainesville FL USA
Posts: 2,092
Vlad -

I just took a peek at your code. Looks like you've done a pretty good job optimizing in MMX.

I even learned something new from it. I hadn't realized the pinsrw and pextrw instructions could use general purpose register operands. Cool.

- Tom
18th August 2002, 07:49 #220
vlad59
Vlad, the Buffy slayer
Join Date: Oct 2001
Location: France
Posts: 445
Quote:
Originally posted by trbarry
Vlad -

I just took a peek at your code. Looks like you've done a pretty good job optimizing in MMX.

I even learned something new from it. I hadn't realized the pinsrw and pextrw instructions could use general purpose register operands. Cool.

- Tom
Thanks Tom. I read a lot of code (mainly from you and dividee) to understand asm better and learn new tricks. I think I learned something. But I'm still not satisfied with my code; there is still a lot to do (especially adding comments).

Yep, I used pinsrw and pextrw to compute each luma value individually. It costs a lot of time, and I still don't know if it's really useful.

All times are GMT +1.