Old 25th August 2015, 16:09   #1  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
FlipVertical and New BitBlt

This is an update and a separate development of FlipVertical; at least one part of it
can work in place, and it also uses AVX instructions on newer machines.
To avoid a name conflict you must call it as:

Fvertical()

If you have my old flips.dll in your AviSynth plugin folder, you should replace it with
this new Vertical.dll

But actually this update was an excuse to develop a new bitblt, something I started many
years ago and that has been sleeping on my disks, moving from one machine to another, for at least
the last eight years. As I never manage to finish this project, I am releasing it now the
way it is.
This new bitblt has only been tested in this plugin and in others for my own use. It has never been tested
deeply as a substitute for the internal one, only with a few tests, so for now I cannot guarantee
full compatibility or that it is free of bugs. It makes use of SSE2, SSSE3 and AVX instructions, depending
on the machine on which it is running.
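
As a rough illustration of the two ideas above (a flip done as a copy through a bitblt-style routine, and a flip done in place), here is a minimal Python sketch. It is not the plugin's assembler code; the function names, the byte-offset arguments and the tiny 3-row frame are all assumptions made for the example. The copy variant walks the source rows with a negative pitch; the in-place variant swaps rows from the outside in, so it never needs a second frame buffer.

```python
def bitblt(dst, dst_pitch, src, src_offset, src_pitch, row_size, height):
    """Copy `height` rows of `row_size` bytes each.

    Pitches are distances in bytes between row starts; src_pitch may be
    negative, which makes the copy read the source rows bottom-up.
    """
    for y in range(height):
        s = src_offset + y * src_pitch
        d = y * dst_pitch
        dst[d:d + row_size] = src[s:s + row_size]


def flip_vertical_copy(src, pitch, row_size, height):
    """Flip into a new buffer: start at the last source row, step backwards."""
    dst = bytearray(pitch * height)
    bitblt(dst, pitch, src, (height - 1) * pitch, -pitch, row_size, height)
    return dst


def flip_vertical_in_place(buf, pitch, row_size, height):
    """Flip without a second frame: swap row pairs from the outside in."""
    top, bottom = 0, height - 1
    while top < bottom:
        t, b = top * pitch, bottom * pitch
        saved = bytes(buf[t:t + row_size])          # one row of scratch space
        buf[t:t + row_size] = buf[b:b + row_size]
        buf[b:b + row_size] = saved
        top += 1
        bottom -= 1


# Hypothetical frame: 3 rows, pitch 4, 3 visible bytes per row ("_" is padding).
frame = bytearray(b"abc_def_ghi_")
flipped = flip_vertical_copy(frame, pitch=4, row_size=3, height=3)
flip_vertical_in_place(frame, pitch=4, row_size=3, height=3)
print(bytes(flipped), bytes(frame))
```

The in-place variant only touches half as many row pairs and writes into memory it has just read, which is one plausible reason an in-place flip can beat a copy when the frame is writable.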

This project includes four files from Agner Fog's libraries (cachesize32.asm, cputype32.asm,
instrset32.asm and unalignedfaster32.asm) and some slightly modified subroutines from memcpy32.asm.
You can find them at http://www.agner.org/optimize/asmlib.zip
All of Agner Fog's original sources are also included in this file.

Version 1.0 Fvertcal.7z
Version 1.0 Fvertcal.zip

Version 1.01 Fvertcal.7z

Version 1.002 Fvertcal.dll

Version 1.003 Fvertcal.dll

Version 1.004 Fvertcal.dll

Version 1005 Fvertical.dll


I hope this can be useful
ARDA

Last edited by ARDA; 15th September 2015 at 01:46. Reason: update version
Old 25th August 2015, 16:32   #2  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quick test on my i5-2500K (Sandy Bridge):

Script:
Code:
blankclip(length = 1000, width = 5000, height = 3000, color=$005B8B).killaudio().assumefps(50, 1)
#flipvertical()
#fvertical()
Result with "flipvertical()"
Code:
[Runtime info]
Frames processed:               1000 (0 - 999)
FPS (min | max | average):      124.8 | 148.5 | 141.8
Memory usage (phys | virt):     121 | 120 MB
Thread count:                   1
CPU usage (average):            24%
Time (elapsed):                 00:00:07.050
Result with "fvertical()"
Code:
[Runtime info]
Frames processed:               1000 (0 - 999)
FPS (min | max | average):      121.3 | 141.3 | 135.9
Memory usage (phys | virt):     121 | 120 MB
Thread count:                   1
CPU usage (average):            23%
Time (elapsed):                 00:00:07.360
This is on XP, so no AVX optimizations used.
Old 25th August 2015, 16:42   #3  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
If you have any doubt about the performance of any filter, I propose the following script

MPEG2Source("your source")
# or any source you like; always use the same one to get slightly more accurate benchmarks,
# and always test the same frames each time. 9000 frames is a good quantity for this script.

#TemporalSoften(4,8,8,15,2) # use this line to force a non-writable source and test whether or not
# a new video frame is created by your filter. It is just an example.

AvsTimer(frames=1000, name="ANYONE",type=3, frequency=x?, total=false, quiet=true)# use your cpu frequency

# Put here your filter to benchmark

#flipvertical()
#fvertical()

AvsTimer(frames=1500 ,name="ANYONE",type=3, frequency=x?, difference=1, total=false)# use your cpu frequency

Open the script in VirtualDub, set direct stream copy, and set an initial frame and an end frame.
Open DebugView (google it) and set a highlight filter in DebugView, in this example *ANYONE*
Go back to VirtualDub and run the video analysis pass. You will see the results in the DebugView window every 1500 frames.

If anyone knows a more accurate benchmarking method and wants to propose it, please post
here to discuss it.

I hope this can be useful
ARDA
Old 25th August 2015, 16:46   #4  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@Groucho2004

The variation that your benchmark shows is too small, and your test is also measuring BlankClip and the effect it has on memory.
Please try the method I propose and tell me what results you get.

Thanks ARDA
Old 25th August 2015, 16:51   #5  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by ARDA View Post
If you have any doubt about the performance of any filter, I propose the following script

I measured it with AVSMeter. Its timer is very accurate (particularly considering the timer peculiarities with multi-core CPUs) and I can't see the advantage in using AVSTimer for such a simple script.

I used "blankclip" instead of a "real" source because it's extremely fast and does not add any overhead.
Old 25th August 2015, 16:59   #6  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by ARDA View Post
@Groucho2004

The variation that your benchmark shows is too small, and your test is also measuring BlankClip and the effect it has on memory.
Please try the method I propose and tell me what results you get.

Thanks ARDA
How about you post some results?
Old 25th August 2015, 17:27   #7  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
On my Haswell laptop (Intel family 6, model 45h).
Tests done with the method described above,
with a clip size of 5000 x 3000 (Y8).
I will be doing more tests, but please give me some time.

with avstimer Fvertical()
VirtualDub.exe [91497] ANYONE = 302 fps
VirtualDub.exe [92997] ANYONE = 308 fps
VirtualDub.exe [94497] ANYONE = 309 fps
VirtualDub.exe [95997] ANYONE = 306 fps
VirtualDub.exe [97497] ANYONE = 289 fps
VirtualDub.exe [98997] ANYONE = 305 fps

with avstimer Flipvertical()
VirtualDub.exe [91499] ANYONE = 263 fps
VirtualDub.exe [92999] ANYONE = 268 fps
VirtualDub.exe [94499] ANYONE = 270 fps
VirtualDub.exe [95999] ANYONE = 270 fps
VirtualDub.exe [97499] ANYONE = 270 fps
VirtualDub.exe [98999] ANYONE = 268 fps

Thanks ARDA

Last edited by ARDA; 25th August 2015 at 18:05.
Old 25th August 2015, 18:04   #8  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by ARDA View Post
@Groucho2004

The variation that your benchmark shows is too small, and your test is also measuring BlankClip and the effect it has on memory.
Please try the method I propose and tell me what results you get.

Thanks ARDA
I get the same results with your method but only when I use the default frequency of my CPU (3300 MHz). I actually have it overclocked to 4000 MHz.
AvsTimer obviously uses the rdtsc instruction to measure time based on the CPU time stamp but this is highly unreliable, especially with modern CPUs.
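
To make the frequency dependence concrete: a tick-based timer converts a tick count into seconds by dividing by the frequency it was given, so a wrong frequency scales every fps figure by the same factor. The sketch below is only an illustration in Python, not AvsTimer's code. The tick count is invented, and it assumes (this is not confirmed anywhere in the thread) that the time stamp counter keeps ticking at the nominal 3300 MHz even when the core is overclocked to 4000 MHz.

```python
def fps_from_ticks(frames, ticks, assumed_mhz):
    """Frame rate a tick-based timer would report for a given assumed frequency."""
    seconds = ticks / (assumed_mhz * 1e6)
    return frames / seconds


# Hypothetical run: 1500 frames that really took 5 seconds,
# counted by a time stamp counter ticking at 3300 MHz.
ticks = 5 * 3300 * 10**6

fps_nominal = fps_from_ticks(1500, ticks, 3300)    # frequency matches the counter
fps_overclock = fps_from_ticks(1500, ticks, 4000)  # frequency of the overclocked core

print(round(fps_nominal, 1), round(fps_overclock, 1))
print(round(fps_overclock / fps_nominal, 3))       # same ratio as 4000 / 3300
```

Under that assumption, entering the overclocked frequency inflates every reading by about 21%, which would explain why only the default frequency reproduces the AVSMeter numbers.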
Old 25th August 2015, 23:05   #9  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
AvsTimer obviously uses the rdtsc instruction to measure time based on the CPU time stamp
but this is highly unreliable, especially with modern CPUs.
Extracted from avstimer docs:
Quote:
The default method type=2 is based on the windows functions GetThreadTimes,
which measures the time the filter chain spends within the thread, by which is
it is called. This is the most accurate method as long as no other threads are
used by the filter chain. Currently I know of no example which violates this
assumption. However, as hyperthreading becomes more and more fashionable times
may change. If one uses the option type=1, then time is measured with GetProcessTimes.
Then the time of the process, which runs the Avisynth script, is measured,
while the process is executing the filter chain. However, some threads of this
process, which have nothing to do with the filters, but run parallel may
artificially reduce the frame rate. If type=0 or type=3 is used, then instead of
process or thread time the absolute time is taken. Then also other processes may
influence the frame rates to the downside. If type=0 then the function
QueryPerformanceCounter is used, if type=3 then the cpu instruction RDTSC is used.
type=3 has certainly the smallest overhead, but during initialisation 1 second is
necessary to determine the cpu frequency and on some notebooks the cpu frequency
may change over time obliterating all the results. To avoid the delay one may
also specify the cpu frequency with the frequency option. This must happen with
the first type=3 timer or any other timer before the first type=3 timer otherwise
the frequency option is ignored. Thus AvsTimer(type=3, frequency= 1300) defines
a type= 3 timer for a cpu with 1300 MHZ. One can get the exact cpu frequency by
running a type= 3 timer without the frequency option. Then the exact cpu frequency
is displayed in the debugview window. Timers, which are paired together by the
difference option, should be of the same type, otherwise the results do not make
sense and may even yield negative frame rates. Timers of type!= 2 have an
additional advantage. With such timers the frame rate of the entire process
running the script - we call this the total frame rate - can be measured and
is displayed by default.
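
For readers who want to see the difference between these kinds of clocks without AvsTimer, the following Python sketch measures one workload with three standard-library clocks. The mapping is an analogy for illustration only: time.perf_counter is a wall clock (comparable to type=0), time.process_time counts CPU time of the whole process (comparable to type=1) and time.thread_time counts CPU time of the calling thread (comparable to type=2). The workload function is hypothetical and has nothing to do with the plugin.

```python
import time


def measure(work, repeats):
    """Time `repeats` calls of `work` with a wall, a process and a thread clock."""
    clocks = {
        "wall": time.perf_counter,     # absolute elapsed time
        "process": time.process_time,  # CPU time of all threads in the process
        "thread": time.thread_time,    # CPU time of the calling thread only
    }
    start = {name: clock() for name, clock in clocks.items()}
    for _ in range(repeats):
        work()
    return {name: clock() - start[name] for name, clock in clocks.items()}


def flip_rows():
    """Stand-in workload: reverse the row order of a small fake frame."""
    rows = [bytes(64)] * 256
    return rows[::-1]


elapsed = measure(flip_rows, 2000)
for name, seconds in elapsed.items():
    print(name, round(seconds, 4))
```

On an otherwise idle machine the three figures should be close; they drift apart when other threads or processes compete for the CPU, which is the effect the documentation describes.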

Yes, in the method I proposed, type=3 defines that RDTSC will be used; if you
prefer QueryPerformanceCounter, use type=0. You will get different results,
but almost certainly with the same distance between them.
Anyway, for more accurate tests we would have to set real-time priority and set
a thread affinity in the source plugin code, but that is not real life, and I think
it wouldn't run under Windows XP (not sure).
There is a lot of academic discussion all over the net about which is
the most accurate method to benchmark code; I wouldn't like this thread to become
about that subject.
I did not do the benchmarks with the example in RGB32 because it would take too much time,
and at this resolution (5000x3000) the new bitblt is also applied with non-temporal stores.
On my machine and with a Y8 clip, here are the results:
Code:
Source= 5000x3000 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
fvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	AvsTimer 0.8.1
VirtualDub.exe	AvsTimer 0.8.1
VirtualDub.exe	[91499] ANYONE = 306 fps
VirtualDub.exe	[92999] ANYONE = 305 fps
VirtualDub.exe	[94499] ANYONE = 305 fps
VirtualDub.exe	[95999] ANYONE = 305 fps
VirtualDub.exe	[97499] ANYONE = 310 fps
VirtualDub.exe	[98999] ANYONE = 310 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91499] ANYONE = 305 fps
VirtualDub.exe	[92999] ANYONE = 317 fps
VirtualDub.exe	[94499] ANYONE = 311 fps
VirtualDub.exe	[95999] ANYONE = 323 fps
VirtualDub.exe	[97499] ANYONE = 331 fps
VirtualDub.exe	[98999] ANYONE = 313 fps

Use type=3 RDTSC
VirtualDub.exe	[91497] ANYONE = 302 fps
VirtualDub.exe	[92997] ANYONE = 308 fps
VirtualDub.exe	[94497] ANYONE = 309 fps
VirtualDub.exe	[95997] ANYONE = 306 fps
VirtualDub.exe	[97497] ANYONE = 289 fps
VirtualDub.exe	[98997] ANYONE = 305 fps
 

************************************************************************************
Code:
Source= 5000x3000 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
flipvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91499] ANYONE = 262 fps
VirtualDub.exe	[92999] ANYONE = 267 fps
VirtualDub.exe	[94499] ANYONE = 267 fps
VirtualDub.exe	[95999] ANYONE = 266 fps
VirtualDub.exe	[97499] ANYONE = 267 fps
VirtualDub.exe	[98999] ANYONE = 267 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91499] ANYONE = 261 fps
VirtualDub.exe	[92999] ANYONE = 275 fps
VirtualDub.exe	[94499] ANYONE = 274 fps
VirtualDub.exe	[95999] ANYONE = 268 fps
VirtualDub.exe	[97499] ANYONE = 264 fps
VirtualDub.exe	[98999] ANYONE = 284 fps

Use type=3 RDTSC
VirtualDub.exe	[91499] ANYONE = 263 fps
VirtualDub.exe	[92999] ANYONE = 268 fps
VirtualDub.exe	[94499] ANYONE = 270 fps
VirtualDub.exe	[95999] ANYONE = 270 fps
VirtualDub.exe	[97499] ANYONE = 270 fps
VirtualDub.exe	[98999] ANYONE = 268 fps

All the tests above show an increase in performance of around 15% for flip vertical (roughly 13% to 17%, depending on the timer type).
Under the conditions of a 5000*3000 Y8 clip, the plugin uses the new bitblt
with non-temporal stores, at least on my machine, in which the largest cache
is a 3 MB L3.
Under other conditions the difference can reach 30% or more; maybe in a few days,
if I have time, I will publish more tests.
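
The gain for the 5000x3000 tests can be checked by averaging the six runs listed for each timer type. The short Python sketch below does only that arithmetic; the fps values are copied from the logs above, while the helper name and the choice of a plain mean are assumptions made for the example.

```python
from statistics import mean

# (fvertical runs, flipvertical runs) in fps, per timer type, from the logs above.
runs = {
    "type=0": ([306, 305, 305, 305, 310, 310], [262, 267, 267, 266, 267, 267]),
    "type=2": ([305, 317, 311, 323, 331, 313], [261, 275, 274, 268, 264, 284]),
    "type=3": ([302, 308, 309, 306, 289, 305], [263, 268, 270, 270, 270, 268]),
}


def gain_percent(new_fps, old_fps):
    """Relative speed-up of the mean new fps over the mean old fps, in percent."""
    return (mean(new_fps) / mean(old_fps) - 1.0) * 100.0


gains = {name: gain_percent(new, old) for name, (new, old) in runs.items()}
for name, gain in gains.items():
    print(name, round(gain, 1))
```

The three timer types agree to within a few percentage points, which supports the point that the choice of timer changes the absolute numbers more than the distance between the two filters.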

Thanks ARDA
Old 26th August 2015, 08:44   #10  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,308
The link for asmlib.zip is not working, at least for me.
Old 26th August 2015, 09:40   #11  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Try again please
Old 26th August 2015, 11:46   #12  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,308
It's working now.

EDIT :
Very interesting this asmlib !!!!
I didn't know about it, time to update all my projects...


Last edited by jpsdr; 26th August 2015 at 12:46.
Old 26th August 2015, 16:53   #13  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by jpsdr
Very interesting this asmlib !!!!
I didn't know about it, time to update all my projects...
Happy you found something useful.
If your projects include something related to the new bitblt or to memcpy in AviSynth, I
encourage you to include my new source and improve it if you find something wrong. In any case, it
is a good idea to read all the manuals on Agner Fog's page, mainly the one on assembly optimization.

Thanks ARDA
Old 26th August 2015, 18:29   #14  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,308
Quote:
Originally Posted by ARDA View Post
If your projects include something relative to new bitblt or memcpy in avisynth I
encourage you to include my new source, and improve it if you find something wrong
The big problem is that I'm absolutely not a yasm guy...
The other problem is that your code seems to be 32-bit only (because I've tried to take a look to find this new bitblt...).
So, unless there is something I'm better able to understand, I think for now I'll stay with just asmlib and the already built libraries.
Old 26th August 2015, 22:11   #15  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by jpsdr
The big problem is that I'm absolutely not a yasm guy...
The other problem is that your code seems to be 32-bit only (because I've tried to take a look to find
this new bitblt...).
So, unless there is something I'm better able to understand, I think for now I'll stay with just
asmlib and the already built libraries.

Yes, this whole project is 32-bit only; the new bitblt is all in the BitBlt_SSE2_avs.asm file,
which is in the zip (see first post).
If you want a bitblt for 64 bits, don't expect it soon from my side.
The day we have a 64-bit AviSynth that is stable, reliable and faster than the 32-bit one,
maybe I will think about it.
If I remember correctly, the assembler code in the asmlib project is all
in yasm/nasm syntax. It would be a good idea to start looking at it without fear
if you intend to take advantage of it. What a wonderful world open source is!

Thanks ARDA
Old 26th August 2015, 23:52   #16  |  Link
Reel.Deel
Registered User
 
Join Date: Mar 2012
Location: Texas
Posts: 1,664
Quote:
Originally Posted by ARDA View Post
The day we have an avisynth for 64bits stable, faster than 32 bits and reliable maybe I will think about it.
Sounds a lot like AviSynth+ r1576 .
Old 27th August 2015, 13:12   #17  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Code:

Source= 1920x1080 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
flipvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91498] ANYONE = 2163 fps
VirtualDub.exe	[92998] ANYONE = 2263 fps
VirtualDub.exe	[94498] ANYONE = 2283 fps
VirtualDub.exe	[95998] ANYONE = 2298 fps
VirtualDub.exe	[97498] ANYONE = 2301 fps
VirtualDub.exe	[98998] ANYONE = 2305 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91498] ANYONE = 2595 fps
VirtualDub.exe	[92998] ANYONE = 2667 fps
VirtualDub.exe	[94498] ANYONE = 2400 fps
VirtualDub.exe	[95998] ANYONE = 2233 fps
VirtualDub.exe	[97498] ANYONE = 2909 fps
VirtualDub.exe	[98998] ANYONE = 2182 fps

Use type=3 RDTSC
VirtualDub.exe	[91499] ANYONE = 2047 fps
VirtualDub.exe	[92999] ANYONE = 2272 fps
VirtualDub.exe	[94499] ANYONE = 2267 fps
VirtualDub.exe	[95999] ANYONE = 2259 fps
VirtualDub.exe	[97499] ANYONE = 2265 fps
VirtualDub.exe	[98999] ANYONE = 2237 fps

Code:

Source= 1920x1080 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
fvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91498] ANYONE = 4825 fps
VirtualDub.exe	[92998] ANYONE = 4801 fps
VirtualDub.exe	[94498] ANYONE = 4804 fps
VirtualDub.exe	[95998] ANYONE = 4819 fps
VirtualDub.exe	[97498] ANYONE = 4870 fps
VirtualDub.exe	[98998] ANYONE = 4853 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91498] ANYONE = 6000 fps
VirtualDub.exe	[92998] ANYONE = 5333 fps
VirtualDub.exe	[94498] ANYONE = 5053 fps
VirtualDub.exe	[95998] ANYONE = 3840 fps
VirtualDub.exe	[97498] ANYONE = 5647 fps
VirtualDub.exe	[98998] ANYONE = 6400 fps

Use type=3 RDTSC
VirtualDub.exe	[91498] ANYONE = 4830 fps
VirtualDub.exe	[92998] ANYONE = 4823 fps
VirtualDub.exe	[94498] ANYONE = 4817 fps
VirtualDub.exe	[95998] ANYONE = 4829 fps
VirtualDub.exe	[97498] ANYONE = 4851 fps
VirtualDub.exe	[98998] ANYONE = 4843 fps

These tests were done to try the different methods of measuring performance under AvsTimer.
They show that fvertical is around 100% faster. In fact, we should say that the new in-place fvertical
is faster than the internal AviSynth bitblt, so it is not a fair comparison, but
this was one of the objectives of this plugin: to get better performance for flip vertical
when possible.
More tests soon.

Thanks ARDA
Old 27th August 2015, 20:42   #18  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Code:
Source= 720x576 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
fvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91498] ANYONE = 30549 fps
VirtualDub.exe	[92998] ANYONE = 30837 fps
VirtualDub.exe	[94498] ANYONE = 30766 fps
VirtualDub.exe	[95998] ANYONE = 30768 fps
VirtualDub.exe	[97498] ANYONE = 30690 fps
VirtualDub.exe	[98998] ANYONE = 30673 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91498] ANYONE = 24000 fps
VirtualDub.exe	[92998] ANYONE = 24000 fps
VirtualDub.exe	[94498] ANYONE = 24000 fps
VirtualDub.exe	[95998] ANYONE = 16000 fps
VirtualDub.exe	[97498] ANYONE = 32000 fps
VirtualDub.exe	[98998] ANYONE = 48000 fps

Use type=3 RDTSC
VirtualDub.exe	[91498] ANYONE = 30725 fps
VirtualDub.exe	[92998] ANYONE = 30835 fps
VirtualDub.exe	[94498] ANYONE = 30936 fps
VirtualDub.exe	[95998] ANYONE = 30952 fps
VirtualDub.exe	[97498] ANYONE = 30488 fps
VirtualDub.exe	[98998] ANYONE = 30806 fps

Code:
Source= 720x576 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
flipvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91497] ANYONE = 9094 fps
VirtualDub.exe	[92997] ANYONE = 12647 fps
VirtualDub.exe	[94497] ANYONE = 12826 fps
VirtualDub.exe	[95997] ANYONE = 13012 fps
VirtualDub.exe	[97497] ANYONE = 12892 fps
VirtualDub.exe	[98997] ANYONE = 12935 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91498] ANYONE = 16000 fps
VirtualDub.exe	[92998] ANYONE = 8727 fps
VirtualDub.exe	[94498] ANYONE = 24000 fps
VirtualDub.exe	[95998] ANYONE = 8727 fps
VirtualDub.exe	[97498] ANYONE = 12000 fps
VirtualDub.exe	[98998] ANYONE = 10667 fps

Use type=3 RDTSC
VirtualDub.exe	[91498] ANYONE = 12403 fps
VirtualDub.exe	[92998] ANYONE = 13077 fps
VirtualDub.exe	[94498] ANYONE = 12927 fps
VirtualDub.exe	[95998] ANYONE = 12930 fps
VirtualDub.exe	[97498] ANYONE = 12985 fps
VirtualDub.exe	[98998] ANYONE = 12993 fps

This resolution shows a performance increase of more than 100%, under almost the same
conditions as the previous tests.
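
A side note on the type=2 figures above: they fall on a few round values (16000, 24000, 32000, 48000), which is what one would expect if the measured thread time only advances in coarse steps. The Python sketch below checks that reading. It assumes a step of 15.625 ms, a common default for the Windows clock tick; that value is an assumption and is not stated in the AvsTimer documentation quoted in this thread.

```python
FRAMES = 1500     # frames per AvsTimer report in the scripts above
TICK = 0.015625   # assumed clock step in seconds (15.625 ms)

# type=2 readings taken from the 720x576 logs above.
reported_fps = [24000, 16000, 32000, 48000, 8727, 12000, 10667]


def ticks_for(fps):
    """Number of assumed clock steps that would produce this fps reading."""
    return FRAMES / fps / TICK


for fps in reported_fps:
    steps = ticks_for(fps)
    print(fps, round(steps, 3), round(steps))
```

Every reading lands almost exactly on a whole number of steps, so at this resolution a type=2 measurement spans only a handful of clock steps, and the type=0 and type=3 figures are probably the more useful ones.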
Old 27th August 2015, 21:17   #19  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
OK, I tested it properly now. In order to measure the very short time for each call of *vertical() I just call it several times. "colorbars()" is extremely fast and does not influence the results.

Here's the script:
Code:
colorbars(width = 1920, height = 1080, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 4999)
#test_flipvertical()
#test_fvertical()

function test_flipvertical(clip c)
{
  last = c
  flipvertical().flipvertical().flipvertical().flipvertical().flipvertical()
  flipvertical().flipvertical().flipvertical().flipvertical().flipvertical()
  return last
}

function test_fvertical(clip c)
{
  last = c
  fvertical().fvertical().fvertical().fvertical().fvertical()
  fvertical().fvertical().fvertical().fvertical().fvertical()
  return last
}
flipvertical:
Code:
Frames processed:               5000 (0 - 4999)
FPS (min | max | average):      195.1 | 201.1 | 199.1
Memory usage (phys | virt):     15 | 14 MB
Thread count:                   1
CPU usage (average):            25%
Time (elapsed):                 00:00:25.109
fvertical:
Code:
Frames processed:               5000 (0 - 4999)
FPS (min | max | average):      687.9 | 736.4 | 730.7
Memory usage (phys | virt):     12 | 12 MB
Thread count:                   1
CPU usage (average):            25%
Time (elapsed):                 00:00:06.843
Very impressive!
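
Since each test function chains ten calls, the per-call cost follows directly from the average fps. A small Python sketch of that arithmetic, using only the AVSMeter averages above (the helper name is made up for the example, and the colorbars() source is assumed to cost nothing, as stated):

```python
CALLS_PER_FRAME = 10  # ten chained flips in each test function


def ms_per_call(avg_fps, calls=CALLS_PER_FRAME):
    """Average time of one flip call in milliseconds."""
    return 1000.0 / (avg_fps * calls)


flip_ms = ms_per_call(199.1)    # flipvertical average fps
fvert_ms = ms_per_call(730.7)   # fvertical average fps
speedup = 730.7 / 199.1

print(round(flip_ms, 3), round(fvert_ms, 3), round(speedup, 2))
```

That works out to roughly half a millisecond per flipvertical call against well under a fifth of a millisecond per fvertical call on a 1920x1080 YV12 frame.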
Old 27th August 2015, 22:46   #20  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
now benchmark it against a bitblt that simply uses a memcpy from a modern runtime instead

I suspect the only reason that this is possible to optimize is that Avisynth's ~optimized~ bitblt is an ancient piece of garbage written for P4s and ancient Athlons, which doesn't really produce great results on modern CPUs. Agner Fog's memcpy implementation was - by his own benchmarks - only barely faster than Microsoft's back in 2008. Replacing Avisynth's bitblt with a wrapper around memcpy and compiling with a modern runtime (haha, who am I kidding, this is Avisynth) would probably speed it up a lot.

Last edited by TheFluff; 27th August 2015 at 22:57.