25th August 2015, 23:05   #9
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
AvsTimer obviously uses the RDTSC instruction to measure time based on the CPU time-stamp counter,
but this is highly unreliable, especially with modern CPUs.
Extracted from the AvsTimer docs:
Quote:
The default method type=2 is based on the Windows function GetThreadTimes,
which measures the time the filter chain spends within the thread by which
it is called. This is the most accurate method as long as no other threads are
used by the filter chain. Currently I know of no example which violates this
assumption. However, as hyperthreading becomes more and more fashionable, times
may change. If one uses the option type=1, then time is measured with GetProcessTimes.
Then the time of the process which runs the Avisynth script is measured
while the process is executing the filter chain. However, some threads of this
process which have nothing to do with the filters, but run in parallel, may
artificially reduce the frame rate. If type=0 or type=3 is used, then instead of
process or thread time the absolute time is taken. Then other processes may also
influence the frame rates to the downside. If type=0, the function
QueryPerformanceCounter is used; if type=3, the CPU instruction RDTSC is used.
type=3 certainly has the smallest overhead, but during initialisation 1 second is
necessary to determine the CPU frequency, and on some notebooks the CPU frequency
may change over time, obliterating all the results. To avoid the delay one may
also specify the CPU frequency with the frequency option. This must happen with
the first type=3 timer or any other timer before the first type=3 timer; otherwise
the frequency option is ignored. Thus AvsTimer(type=3, frequency=1300) defines
a type=3 timer for a CPU with 1300 MHz. One can get the exact CPU frequency by
running a type=3 timer without the frequency option; the exact CPU frequency
is then displayed in the DebugView window. Timers which are paired together by the
difference option should be of the same type, otherwise the results do not make
sense and may even yield negative frame rates. Timers of type!=2 have an
additional advantage: with such timers the frame rate of the entire process
running the script (we call this the total frame rate) can be measured, and it
is displayed by default.
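The distinction the docs draw between time spent in the calling thread (type=2, GetThreadTimes) and absolute time (type=0, QueryPerformanceCounter) can be illustrated with a small C sketch. It is a POSIX analogue under stated assumptions, not AvsTimer's code: CLOCK_MONOTONIC stands in for QueryPerformanceCounter, CLOCK_THREAD_CPUTIME_ID stands in for GetThreadTimes, and the helper names (wall_seconds, thread_cpu_seconds, fps, idle_ms) are hypothetical.

```c
/* POSIX analogue (assumption, not the Windows APIs AvsTimer uses):
 * CLOCK_MONOTONIC         ~ QueryPerformanceCounter (type=0, absolute time)
 * CLOCK_THREAD_CPUTIME_ID ~ GetThreadTimes          (type=2, thread time) */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read the given clock and return its value in seconds. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Absolute (wall-clock) time: keeps running while this thread waits
 * or while other processes use the CPU. */
double wall_seconds(void) { return clock_seconds(CLOCK_MONOTONIC); }

/* CPU time consumed by the calling thread only. */
double thread_cpu_seconds(void) { return clock_seconds(CLOCK_THREAD_CPUTIME_ID); }

/* Frame rate from a frame count and the elapsed time in seconds. */
double fps(long frames, double seconds)
{
    assert(seconds > 0.0);
    return (double)frames / seconds;
}

/* Hypothetical helper: block the thread for `ms` milliseconds
 * without consuming CPU time. */
void idle_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&req, NULL);
}
```

Calling idle_ms(200) advances wall_seconds() by about 0.2 s while thread_cpu_seconds() barely moves, which is why a thread-time timer ignores time the thread spends waiting or preempted, and an absolute-time timer does not.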

Yes. In the method I proposed, type=3 defines that RDTSC will be used; if you
prefer QueryPerformanceCounter, use type=0. You will get different results,
but almost certainly with the same gap between them.
Anyway, for more accurate tests we would have to set real-time priority and
a thread affinity in the source plugin code, but that is not real life, and I think
it wouldn't run under Windows XP (not sure).
There is a lot of academic discussion all over the net about which is
the most accurate method to benchmark code; I wouldn't like this thread to turn
into a discussion of that subject.
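With type=3 a raw RDTSC tick count only becomes a time once the tick rate is known, which is why the quoted docs mention the one-second calibration and the frequency option (given in MHz). A hypothetical C sketch of that arithmetic, not AvsTimer's actual code: seconds = ticks / (MHz x 10^6), plus a calibration that compares a TSC delta with CLOCK_MONOTONIC across a sleep. The __rdtsc intrinsic is specific to x86 with GCC/Clang, so that part is guarded; all function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* seconds = ticks / (MHz * 1e6); e.g. 1.3e9 ticks at 1300 MHz is 1 s. */
double ticks_to_seconds(uint64_t ticks, double mhz)
{
    assert(mhz > 0.0);
    return (double)ticks / (mhz * 1e6);
}

/* MHz = ticks / (seconds * 1e6), from a calibration interval. */
double estimate_mhz(uint64_t tick_delta, double seconds)
{
    assert(seconds > 0.0);
    return (double)tick_delta / (seconds * 1e6);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>   /* __rdtsc (GCC/Clang, x86 only) */

/* Read the TSC before and after sleeping `ms` milliseconds of wall-clock
 * time and derive the tick rate, similar in spirit to the one-second
 * calibration the docs describe. Only meaningful if the TSC ticks at a
 * constant rate during the measurement. */
double calibrate_tsc_mhz(long ms)
{
    struct timespec t0, t1, req = { ms / 1000, (ms % 1000) * 1000000L };
    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);
    uint64_t c1 = __rdtsc();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return estimate_mhz(c1 - c0, secs);
}
#endif
```

calibrate_tsc_mhz(1000) mirrors the one-second delay mentioned in the docs. On CPUs whose TSC rate follows the core clock (the notebook case the docs warn about) the estimate is not stable; many newer x86 CPUs advertise an invariant TSC that ticks at a constant rate, which avoids that particular problem but not interference from other processes.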
I did not run the benchmarks with the RGB32 example because it would take too much time,
and at this resolution (5000x3000) the new bitblt is also applied with non-temporal stores.
On my machine, with a Y8 clip, here are the results:
Code:
Source= 5000x3000 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
fvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	AvsTimer 0.8.1
VirtualDub.exe	AvsTimer 0.8.1
VirtualDub.exe	[91499] ANYONE = 306 fps
VirtualDub.exe	[92999] ANYONE = 305 fps
VirtualDub.exe	[94499] ANYONE = 305 fps
VirtualDub.exe	[95999] ANYONE = 305 fps
VirtualDub.exe	[97499] ANYONE = 310 fps
VirtualDub.exe	[98999] ANYONE = 310 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91499] ANYONE = 305 fps
VirtualDub.exe	[92999] ANYONE = 317 fps
VirtualDub.exe	[94499] ANYONE = 311 fps
VirtualDub.exe	[95999] ANYONE = 323 fps
VirtualDub.exe	[97499] ANYONE = 331 fps
VirtualDub.exe	[98999] ANYONE = 313 fps

Use type=3 RDTSC
VirtualDub.exe	[91497] ANYONE = 302 fps
VirtualDub.exe	[92997] ANYONE = 308 fps
VirtualDub.exe	[94497] ANYONE = 309 fps
VirtualDub.exe	[95997] ANYONE = 306 fps
VirtualDub.exe	[97497] ANYONE = 289 fps
VirtualDub.exe	[98997] ANYONE = 305 fps
 

************************************************************************************
Code:
Source= 5000x3000 (Y8)
AvsTimer(frames=1000, name="ANYONE",type=0, frequency=1700, total=false, quiet=true)
flipvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=0, frequency=1700, difference=1, total=false)

Use type=0 QueryPerformanceCounter
VirtualDub.exe	[91499] ANYONE = 262 fps
VirtualDub.exe	[92999] ANYONE = 267 fps
VirtualDub.exe	[94499] ANYONE = 267 fps
VirtualDub.exe	[95999] ANYONE = 266 fps
VirtualDub.exe	[97499] ANYONE = 267 fps
VirtualDub.exe	[98999] ANYONE = 267 fps

Use type=2 GetThreadTimes
VirtualDub.exe	[91499] ANYONE = 261 fps
VirtualDub.exe	[92999] ANYONE = 275 fps
VirtualDub.exe	[94499] ANYONE = 274 fps
VirtualDub.exe	[95999] ANYONE = 268 fps
VirtualDub.exe	[97499] ANYONE = 264 fps
VirtualDub.exe	[98999] ANYONE = 284 fps

Use type=3 RDTSC
VirtualDub.exe	[91499] ANYONE = 263 fps
VirtualDub.exe	[92999] ANYONE = 268 fps
VirtualDub.exe	[94499] ANYONE = 270 fps
VirtualDub.exe	[95999] ANYONE = 270 fps
VirtualDub.exe	[97499] ANYONE = 270 fps
VirtualDub.exe	[98999] ANYONE = 268 fps
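The two sets of logs can be summarised by averaging the six readings of each run and comparing fvertical() against flipvertical() for the same timer type. A small C sketch of that arithmetic; the fps values are copied from the logs above, and the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n readings. */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum / (double)n;
}

/* Speedup of run a over run b in percent: (mean(a) / mean(b) - 1) * 100. */
double speedup_percent(const double *a, const double *b, size_t n)
{
    return (mean(a, n) / mean(b, n) - 1.0) * 100.0;
}

/* fps readings copied from the logs above: fvertical() vs flipvertical(). */
static const double fv_type0[6]   = {306, 305, 305, 305, 310, 310};
static const double flip_type0[6] = {262, 267, 267, 266, 267, 267};
static const double fv_type2[6]   = {305, 317, 311, 323, 331, 313};
static const double flip_type2[6] = {261, 275, 274, 268, 264, 284};
static const double fv_type3[6]   = {302, 308, 309, 306, 289, 305};
static const double flip_type3[6] = {263, 268, 270, 270, 270, 268};
```

This gives about 15.4% for type=0, 16.9% for type=2 and 13.1% for type=3. The type=2 readings scatter the most, so the mean is a more useful summary than any single line.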

All of the above tests show the plugin's fvertical() running around 15% faster than
flipvertical() (roughly 13% to 17% depending on the timer type, averaging the six
readings of each run). With a 5000x3000 Y8 clip the plugin uses the new bitblt
with non-temporal stores, at least on my machine, where the largest cache
is a 3 MB L3.
Under other conditions the difference can reach 30% or more; if I have time,
I will publish more tests in a few days.
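Why non-temporal stores can pay off here: a 5000x3000 Y8 frame is 5000 x 3000 x 1 byte = 15 MB, several times a 3 MB L3, so writing it through the cache mostly evicts data that will be needed again. The following is a hypothetical C sketch, not the plugin's new bitblt: it flips a Y8 plane by copying source row height-1-y to destination row y and, where SSE2 is available and the destination row is 16-byte aligned, writes whole 16-byte chunks with the _mm_stream_si128 non-temporal store intrinsic, followed by _mm_sfence. The function names, pitch handling and alignment fallback are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Copy n bytes of one row. With SSE2, use_nt != 0 and a 16-byte aligned
 * dst, whole 16-byte chunks bypass the cache via streaming stores;
 * otherwise (and for the tail bytes) plain memcpy is used. */
static void copy_row(uint8_t *dst, const uint8_t *src, size_t n, int use_nt)
{
#if defined(__SSE2__)
    if (use_nt && ((uintptr_t)dst & 15u) == 0) {
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
            _mm_stream_si128((__m128i *)(dst + i), v);
        }
        memcpy(dst + i, src + i, n - i);
        return;
    }
#endif
    (void)use_nt;
    memcpy(dst, src, n);
}

/* Vertical flip of a Y8 (one byte per pixel) plane:
 * dst row y = src row (height - 1 - y). Pitches are in bytes. */
void flip_vertical_y8(uint8_t *dst, ptrdiff_t dst_pitch,
                      const uint8_t *src, ptrdiff_t src_pitch,
                      int width, int height, int use_nt)
{
    assert(width >= 0 && height >= 0);
    assert(dst_pitch >= width && src_pitch >= width);
    for (int y = 0; y < height; ++y)
        copy_row(dst + y * dst_pitch,
                 src + (ptrdiff_t)(height - 1 - y) * src_pitch,
                 (size_t)width, use_nt);
#if defined(__SSE2__)
    if (use_nt)
        _mm_sfence();  /* order the streaming stores before later stores */
#endif
}
```

Whether streaming stores actually win depends on the frame size relative to the cache; for frames that fit comfortably in cache, ordinary stores (use_nt = 0) are usually the better choice.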

Thanks ARDA