Old 21st January 2007, 23:43   #1  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
FlipHorizontal, Vertical, Turn180 alternative codes.

At first I just wanted to see whether these routines performed better when working in place, but over time I ended up
developing an almost fully compatible replacement for them, with MMX, ISSE and SSE2 code paths in some of them.
As always, I don't have enough time to finish it, so I decided to release it the way it is. Suggestions,
bug reports, etc. are welcome.

Here they are:

Rotate180() is an alternative version of the internal turn180().
The objective of this version was to test the performance of working over the source (in place)
when possible.
The speedup can reach up to 3x, depending on source size, colorspace and machine.
For the YUY2 colorspace on MMX machines I have adapted code from Dscaler 4.18,
Copyright (c) 2002 Rob Muller. FLT_Mirror.asm,v 1.1
The SSE2 and ISSE versions are my own.

FHorizontal() is an alternative version of the internal fliphorizontal().
The objective of this version was to test the performance of working over the source (in place)
when possible.
The speedup can reach up to 3x, depending on source size, colorspace and machine.
For the YUY2 colorspace on MMX machines I have adapted code from Dscaler 4.18,
Copyright (c) 2002 Rob Muller. FLT_Mirror.asm,v 1.1
The SSE2 and ISSE versions are my own.
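
To illustrate what an in-place horizontal flip has to do for YUY2, here is a minimal plain C++ sketch. It is not the Dscaler-derived MMX code or the plugin's SSE2/ISSE code; the function names and the raw pointer/pitch interface are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Mirror one YUY2 row in place. Each 4-byte macropixel is [Y0 U Y1 V] and
// covers two pixels, so a mirror must reverse the macropixel order AND swap
// the two luma samples inside every macropixel.
void flip_row_yuy2_inplace(std::uint8_t* row, std::size_t width_pixels) {
    assert(width_pixels % 2 == 0);           // YUY2 rows hold whole macropixels
    const std::size_t n = width_pixels / 2;  // number of macropixels
    for (std::size_t i = 0; i < n / 2; ++i) {
        std::uint8_t* a = row + 4 * i;
        std::uint8_t* b = row + 4 * (n - 1 - i);
        std::swap(a[0], b[2]);  // left Y0 <-> right Y1
        std::swap(a[2], b[0]);  // left Y1 <-> right Y0
        std::swap(a[1], b[1]);  // U
        std::swap(a[3], b[3]);  // V
    }
    if (n % 2 == 1) {           // the middle macropixel only swaps its lumas
        std::uint8_t* m = row + 4 * (n / 2);
        std::swap(m[0], m[2]);
    }
}

// Mirror a whole YUY2 frame in place; pitch is the byte distance between rows.
void flip_horizontal_yuy2_inplace(std::uint8_t* base, std::size_t width_pixels,
                                  std::size_t height, std::size_t pitch) {
    for (std::size_t y = 0; y < height; ++y)
        flip_row_yuy2_inplace(base + y * pitch, width_pixels);
}
```

Because every byte is read and written exactly once and no second buffer is touched, this is the kind of routine where working in place can pay off.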

FVertical() is an alternative version of the internal flipvertical().
The objective of this version was to test the importance of working over the source (in place)
instead of using BitBlt.
The speedup can reach up to 3x, depending on source size, colorspace and machine.
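
As a rough illustration of the in-place idea for a vertical flip, the sketch below swaps row y with row height-1-y inside the same buffer. It is plain C++ rather than the plugin's assembler, and the function name and parameters are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Flip one plane vertically in place by swapping row y with row height-1-y.
// row_bytes is the visible row size; pitch is the byte distance between rows.
void flip_vertical_inplace(std::uint8_t* base, std::size_t row_bytes,
                           std::size_t height, std::size_t pitch) {
    assert(row_bytes <= pitch);
    for (std::size_t y = 0; y < height / 2; ++y) {
        std::uint8_t* top = base + y * pitch;
        std::uint8_t* bottom = base + (height - 1 - y) * pitch;
        std::swap_ranges(top, top + row_bytes, bottom);  // padding is untouched
    }
}
```

With an odd height the middle row stays where it is, so only height/2 row swaps are needed.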

For RGB24, FHorizontal and Rotate180 do not have truly optimized code (just a quick version),
apart from working in place, so you should expect only a small performance increase.

I hope you find this useful.

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z


ARDA

Last edited by ARDA; 11th June 2007 at 13:11. Reason: update version
Old 23rd January 2007, 07:15   #2  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Arda, a very neat set of alternate filters.

In-place updating, where it can be achieved, can indeed be very fast. However, one must be mindful of the overhead if the source frame is not in fact writable. In this case a call to MakeWritable will copy the frame to a new VFB, so you must add the time of a full frame blit to the improved time of doing an in-place update.

Most simple MMX filters can approach the speed of BitBlt (an SSE2 version might be even faster). Various filters inside AviSynth, like GreyScale(), use the IsWritable call and choose between a very fast in-place algorithm and a transcription algorithm that runs at BitBlt-like speed. When correctly coded, a transcription algorithm will always be faster than a BitBlt plus the fastest in-place algorithm.
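
The choice described here can be sketched without the AviSynth headers. In the sketch below, `Frame`, `is_writable()`, `op()` and `apply_filter()` are hypothetical stand-ins (a frame counts as writable while its handle is the only owner of the buffer); they are not the real PVideoFrame API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a frame handle: the buffer is "writable" only
// while this handle is its sole owner, loosely mirroring IsWritable().
struct Frame {
    std::shared_ptr<std::vector<std::uint8_t>> buf;
    bool is_writable() const { return buf.use_count() == 1; }
};

// Example per-pixel operation (inversion), standing in for any simple filter.
inline std::uint8_t op(std::uint8_t v) { return static_cast<std::uint8_t>(255 - v); }

// Apply op while touching memory as little as possible:
//  - writable source: one read-modify-write pass in place, no copy;
//  - shared source:   one read pass plus one write pass into a new buffer,
//                     rather than a full copy followed by an in-place pass.
Frame apply_filter(Frame src) {
    if (src.is_writable()) {
        for (std::uint8_t& v : *src.buf) v = op(v);
        return src;
    }
    Frame dst{std::make_shared<std::vector<std::uint8_t>>(src.buf->size())};
    for (std::size_t i = 0; i < src.buf->size(); ++i) (*dst.buf)[i] = op((*src.buf)[i]);
    return dst;
}
```

A caller that hands over its only reference with `apply_filter(std::move(f))` gets the in-place branch; a caller that keeps a copy gets the transcription branch, and its own frame is left unchanged.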

In simple linear scripts, in-place algorithms will mostly win hands down.

In complicated scripts with multiple cross dependencies and temporal demands, the added overhead of MakeWritable will surely negate any advantage.

Note! For temporal filters that wish to use an in-place implementation, it is extremely important that the PVideoFrame chosen to be overwritten is one that will not be needed in subsequent GetFrame calls.
Old 24th January 2007, 00:48   #3  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@IanB
Quote:
Originally Posted by IanB
Arda, a very neat set of alternate filters.
Thanks for your words!

Quote:
Originally Posted by IanB
In-place updating, where it can be achieved, can indeed be very fast.
Quote:
Originally Posted by ARDA
The objective of this version was to test the performance of working over the source (in place)
when possible.
Quote:
Originally Posted by IanB
In simple linear scripts, in-place algorithms will mostly win hands down.
Yes, that was the environment in which I tested.

Quote:
Originally Posted by IanB
In complicated scripts with multiple cross dependencies and temporal demands, the added overhead of MakeWritable
will surely negate any advantage.
Quote:
Originally Posted by IanB
if the source frame is not in fact writable. In this case a call to MakeWritable will copy
the frame to a new VFB, so you must add the time of a full frame blit to the improved time of doing an in-place update.
Since a BitBlt is done whenever the source is not writable, I guess you are suggesting
something like this:
Code:
 // pseudocode
 if (src->IsWritable()) {
     // Yes: read the source, apply the algorithm and write in place (source frame).
 } else {
     // No: create a new frame, read the source, apply the algorithm and write
     // to the destination frame (transcription algorithm).
 }
Quote:
Originally Posted by IanB
Various filters inside AviSynth, like GreyScale(), use the IsWritable call
and choose between a very fast in-place algorithm and a transcription algorithm that runs at BitBlt-like speed.
When correctly coded, a transcription algorithm will always be faster than a BitBlt plus the fastest in-place algorithm.

Speaking of GreyScale, here is some work of mine where I added a kind of memset (not compatible), with SSE2, ISSE and MMX versions:
http://forum.doom9.org/showthread.php?t=121182

I'll try to run some tests that branch on IsWritable. You know I love numbers as proof of
concept. Remember this thread: http://forum.doom9.org/showthread.php?t=121065
I'll post results soon.

Quote:
Originally Posted by IanB
Note! For temporal filters that wish to use an in-place implementation, it is extremely important that the PVideoFrame
chosen to be overwritten is one that will not be needed in subsequent GetFrame calls.
My apologies in advance, but if you have some time, could you elaborate on this a little more?

@moderator: please move this to Avisynth Development if you consider it appropriate.

Thanks ARDA

Last edited by ARDA; 24th January 2007 at 01:27.
Old 25th January 2007, 04:57   #4  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
@ARDA,

GreyScale is nice and simple to analyse but is not a good practical target to optimise; even in C++ code it is fast, so your return on investment is small. E.g. if GreyScale takes 5 ms per frame (200 fps) and you optimise it to be 10 times faster, 0.5 ms per frame (2000 fps), but it is part of a filter chain that takes 100 ms per frame (10 fps), then the improvement is only 4.5 ms, giving 95.5 ms per frame (10.47 fps). Spend your time optimizing the filter that takes 50+ ms per frame!
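
The arithmetic in this example can be checked with a couple of helper functions. This is only a sketch; the function names are made up for the illustration.

```cpp
#include <cassert>
#include <cmath>

// Frame time of the whole chain after one filter in it is made `speedup`
// times faster. Only that filter's share of the time shrinks.
double chain_ms_after(double chain_ms, double filter_ms, double speedup) {
    assert(speedup > 0.0 && filter_ms <= chain_ms);
    return chain_ms - filter_ms + filter_ms / speedup;
}

// Convert a per-frame time in milliseconds to frames per second.
double fps(double ms_per_frame) { return 1000.0 / ms_per_frame; }
```

`chain_ms_after(100.0, 5.0, 10.0)` gives 95.5 ms, i.e. about 10.47 fps. By comparison, making a 50 ms filter in the same chain only twice as fast gives `chain_ms_after(100.0, 50.0, 2.0)` = 75 ms, about 13.3 fps.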

For a GreyScale implemented using MakeWritable, the two outcomes are:

1. Blit the luma plane and both chroma planes, then memset both chroma planes.
or
2. Keep the luma plane and memset both chroma planes.

A better implementation would use IsWritable, so the two outcomes become:

1. Blit just the luma plane, then memset both chroma planes.
or
2. Keep the luma plane and memset both chroma planes.

The saving is the blits of the two chroma planes in case 1.
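
A minimal sketch of the IsWritable variant for planar YUV follows. `Planes`, `FramePtr` and `greyscale()` are hypothetical stand-ins, with "writable" modelled as being the only owner of the frame; this is not AviSynth's actual GreyScale code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical planar frame; a handle is "writable" while it is the only
// owner, loosely mirroring IsWritable().
struct Planes {
    std::vector<std::uint8_t> y, u, v;
};
using FramePtr = std::shared_ptr<Planes>;

// Greyscale for planar YUV: luma is kept, chroma is set to neutral (128).
FramePtr greyscale(FramePtr src) {
    if (src.use_count() == 1) {
        // Case 2: keep the luma plane, fill both chroma planes in place.
        std::fill(src->u.begin(), src->u.end(), std::uint8_t{128});
        std::fill(src->v.begin(), src->v.end(), std::uint8_t{128});
        return src;
    }
    // Case 1: blit only the luma plane; chroma is generated, never copied.
    auto dst = std::make_shared<Planes>();
    dst->y = src->y;
    dst->u.assign(src->u.size(), std::uint8_t{128});
    dst->v.assign(src->v.size(), std::uint8_t{128});
    return dst;
}
```

In the shared case the source chroma planes are never read at all, which is where the saving over copy-then-modify comes from.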


For temporal filters you need to consider the cache.

If you overwrite in place a PVideoFrame that you will need next time, then it will no longer be valid in the cache and will need to be re-rendered.

If you choose to overwrite it in place when it is needed for the very last time, then you do not care whether it is still in the cache.

There are other complications, such as the Reverse filter and the cache protecting frames.
Old 25th January 2007, 19:05   #5  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@IanB

My answer to your last post is in the memset thread: http://forum.doom9.org/showthread.php?t=121182
I will soon update this plugin following your suggestions.

Thanks ARDA
Old 3rd February 2007, 01:57   #6  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by IanB
In simple linear scripts, in-place algorithms will mostly win hands down.
Quote:
Originally Posted by IanB
if the source frame is not in fact writable. In this case a call to MakeWritable will copy
the frame to a new VFB, so you must add the time of a full frame blit to the improved time of doing an in-place update.
Changelog v 1.1:

Added:
Two branches: the filters now either work in place
or create a New Frame.

New ISSE and SSE2 codes for FHorizontal under YV12.

A lot of small changes.


In the new SSE2 code for FHorizontal I have added some comments for a friend who
is always asking me for them. I'll keep updating the comments.

I couldn't get much gain from this code (ISSE and SSE2) for FHorizontal compared with
plain assembler code using the bswap instruction on my Pentium 4 and AMD Turion37.
Suggestions are welcome.
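
For readers unfamiliar with the bswap approach, here is a rough C++ equivalent for one row of 8-bit samples (e.g. a YV12 plane). It is a sketch, not the plugin's assembler; `bswap32` and `flip_row_bytes_inplace` are names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Portable 32-bit byte reversal; compilers typically turn this pattern
// into a single bswap instruction on x86.
inline std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Mirror one row of 8-bit samples in place. Works inward from both ends
// four bytes at a time, then finishes the remaining middle bytes one by one.
void flip_row_bytes_inplace(std::uint8_t* row, std::size_t width) {
    std::size_t lo = 0, hi = width;
    while (hi - lo >= 8) {
        std::uint32_t a, b;
        std::memcpy(&a, row + lo, 4);       // unaligned-safe loads
        std::memcpy(&b, row + hi - 4, 4);
        a = bswap32(a);
        b = bswap32(b);
        std::memcpy(row + lo, &b, 4);       // reversed right block goes left
        std::memcpy(row + hi - 4, &a, 4);   // reversed left block goes right
        lo += 4;
        hi -= 4;
    }
    while (hi - lo >= 2) {
        std::swap(row[lo], row[hi - 1]);
        ++lo;
        --hi;
    }
}
```

Such a loop is mostly limited by memory traffic rather than by the byte shuffling itself, which may be one reason wider SIMD versions show little gain over it.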

Warning! Under development, but usable.

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z

Thanks ARDA

Last edited by ARDA; 11th June 2007 at 13:16. Reason: update version
Old 8th February 2007, 10:21   #7  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Changelog v 1.2:
Added:

FHorizontal now works either in place or by creating a New Frame; when writing to
a New Frame it chooses whether or not to use Non-Temporal (WC) store instructions.
The choice depends on the frame size and the L2 cache size.
It is mainly useful when frames are bigger than the L2 cache.
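
A sketch of this kind of decision is shown below. The threshold in `use_nontemporal()` is an assumption for illustration (the plugin's actual rule is not stated in this post), and `copy_row()` is a hypothetical helper that falls back to memcpy when SSE2 is unavailable or the destination is not 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

// Assumed heuristic: bypass the cache only when source plus destination
// cannot both stay resident in L2. The plugin's real threshold may differ.
bool use_nontemporal(std::size_t frame_bytes, std::size_t l2_bytes) {
    return l2_bytes != 0 && 2 * frame_bytes > l2_bytes;
}

// Copy `bytes` from src to dst. With `streaming` set, 16-byte blocks are
// written with non-temporal stores so the output does not evict useful data.
void copy_row(std::uint8_t* dst, const std::uint8_t* src, std::size_t bytes,
              bool streaming) {
#ifdef HAVE_SSE2_SKETCH
    // Streaming stores need a 16-byte aligned destination.
    if (streaming && reinterpret_cast<std::uintptr_t>(dst) % 16 == 0) {
        std::size_t i = 0;
        for (; i + 16 <= bytes; i += 16) {
            __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
            _mm_stream_si128(reinterpret_cast<__m128i*>(dst + i), v);
        }
        _mm_sfence();                              // order the streamed writes
        std::memcpy(dst + i, src + i, bytes - i);  // copy the tail normally
        return;
    }
#endif
    (void)streaming;
    std::memcpy(dst, src, bytes);
}
```

For example, with a 512 KB L2 cache a 720x576 YUY2 frame (829,440 bytes) would take the non-temporal path under this heuristic, while a 100 KB frame would not.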


Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z

Thanks ARDA

Last edited by ARDA; 11th June 2007 at 13:15. Reason: update version
Old 9th February 2007, 15:34   #8  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Changelog v 1.3:
Added:

New ISSE and SSE2 codes for Rotate180 under YV12

Rotate180 now works either in place or by creating a New Frame; when writing to
a New Frame it chooses whether or not to use Non-Temporal (WC) store instructions.
The choice depends on the frame size and the L2 cache size.
It is mainly useful when frames are bigger than the L2 cache.

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z

Thanks ARDA

Last edited by ARDA; 31st October 2007 at 20:49. Reason: update version
Old 31st March 2007, 00:25   #9  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Changelog v 1.4:

Added:

FVertical now works either in place or by creating a New Frame; when writing to
a New Frame it chooses whether or not to use Non-Temporal (WC) store instructions.
The choice depends on the frame size and the L2 cache size.
It is mainly useful when frames are bigger than the L2 cache.

BitBlt is used when the frame is bigger than the L2 cache.
An SSE2 version of BitBlt is needed.
Does anyone dare?
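
For reference, the new-frame branch of a vertical flip amounts to a BitBlt-style row copy with the destination rows taken in reverse order. The sketch below uses plain memcpy per row; it is not an SSE2 BitBlt, and `blit_flipped` is a name invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// BitBlt-style plane copy that writes the rows in reverse order, so the
// destination receives a vertically flipped copy of the source.
// Pitches are the byte distances between consecutive rows.
void blit_flipped(std::uint8_t* dst, std::size_t dst_pitch,
                  const std::uint8_t* src, std::size_t src_pitch,
                  std::size_t row_bytes, std::size_t height) {
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + (height - 1 - y) * dst_pitch,
                    src + y * src_pitch, row_bytes);
    }
}
```

An SSE2 version would replace the per-row memcpy with aligned 16-byte loads and stores, which is where the choice between normal and non-temporal stores would come in.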

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z

Thanks ARDA

Last edited by ARDA; 31st October 2007 at 20:49. Reason: update version
Old 10th May 2007, 22:21   #10  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Version 1.5 changelog

Removed:
The Camel CPU Identifying Tool, Copyright (C) 2002, Iain Chesworth, is no longer included.

Added:
New L2 cache size detection code, updated for some newer machines.
Error detection has been added to test this new code.
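
One possible way to query the L2 size is sketched below, assuming GCC or Clang on x86 (`<cpuid.h>`). It reads extended CPUID leaf 0x80000006 and returns 0 when nothing can be detected. This is not necessarily how the plugin does it, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_SKETCH 1
#endif

// Returns the L2 cache size in bytes, or 0 when it cannot be determined.
// Uses extended CPUID leaf 0x80000006, where ECX bits 31..16 give the L2
// size in KB on processors that implement the leaf.
std::size_t l2_cache_bytes() {
#ifdef HAVE_CPUID_SKETCH
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0x80000006u, &eax, &ebx, &ecx, &edx)) {
        return static_cast<std::size_t>((ecx >> 16) & 0xFFFFu) * 1024u;
    }
#endif
    return 0;  // unknown
}

// Simple error handling: use a caller-supplied default when detection fails.
std::size_t l2_cache_bytes_or(std::size_t fallback) {
    const std::size_t detected = l2_cache_bytes();
    return detected != 0 ? detected : fallback;
}
```

A caller could use something like `l2_cache_bytes_or(256 * 1024)` so that a failed detection still yields a usable, conservative value.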

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z

Last edited by ARDA; 31st October 2007 at 20:53.
Old 11th June 2007, 13:17   #11  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Version 1.5.1 changelog


Added:
Updated L2 cache size detection for some new machines.
A lot of minor changes.

Version 1.5.1
Source Code and dll: http://www.iespana.es/Ardaversions/ROTATES_151.7z


Thanks ARDA