Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Capturing and Editing Video > Avisynth Development

Reply
 
Thread Tools Search this Thread Display Modes
Old 21st June 2003, 23:53   #1  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
AvsTimer

AvsTimer is a small filter with virtually no overhead, which allows one to measure the performance of plugins or groups of plugins. For download and more information see www.AvsTimer.de.tf
I have uploaded version 0.2 to the above web site. Two new options have been added. With the type option the timer function can be chosen (type=0 : absolute times are measured, type=1 : only the time of the process running the filter is counted, type=2(the default) : only the time of the thread running the filter is counted). After the documentation was updated I have added also type=3, which measures also the absolute time but with the cpu instruction RDTSC instead of the OS function QueryPerformanceCounter. With accumulate=true, the frame rates are averaged over all frames processed so far and not only over the last 2000 frames (or whatever has been specified with the frames= option). Here I follow a suggestion of Kurtnoise13.
I have uploaded version 0.3 to the above web site. The difference option has been expanded, the output format has been simplified and the documentation has been rewritten.
version 0.4 has been released (13.09.03). The Shutdown filter has been added to the plugin.
Version 0.4.1 released (21.02.04).
Version 0.5 released (07.03.04) (see posting below for a change log)
Version 0.6 released (25.07.04) (see posting below for a change log)

Last edited by kassandro; 25th July 2004 at 13:06.
kassandro is offline   Reply With Quote
Old 22nd June 2003, 00:35   #2  |  Link
sh0dan
Retired AviSynth Dev ;)
 
sh0dan's Avatar
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Nice one!

Is there a reason you are not using QueryPerformanceTimer (as Kronos)?

It should be the most precise way of getting timings.

You could place the nametag in the constructor Timer(PClip child) : GenericVideoFilter(child) { <<<here!>>> } instead of the filter query function.
__________________
Regards, sh0dan // VoxPod
sh0dan is offline   Reply With Quote
Old 22nd June 2003, 21:02   #3  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
I will replace the simple clock() function by GetThreadTimes. It has the accuracy of QueryPerformanceTimer but only counts the time spent within a thread. I was wrong in blaming Microsoft for not having such a function. It was only hard to find.
kassandro is offline   Reply With Quote
Old 23rd June 2003, 01:00   #4  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
Quote:
Originally posted by kassandro
I will replace the simple clock() function by GetThreadTimes. It has the accuracy of QueryPerformanceTimer but only counts the time spent within a thread. I was wrong in blaming Microsoft for not having such a function. It was only hard to find.
Indeed, it's more accurate. I'll rewrite Kronos to use it.
Kurosu is offline   Reply With Quote
Old 25th June 2003, 02:19   #5  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
I didnīt know that a similar utility already exists. Kronos was not mentioned on avisynth.org. Though it is available on the WarpEnterprises web site, there was no description.
kassandro is offline   Reply With Quote
Old 25th June 2003, 07:30   #6  |  Link
WarpEnterprises
C64
 
WarpEnterprises's Avatar
 
Join Date: Apr 2002
Location: Austria
Posts: 830
http://forum.doom9.org/showthread.php?threadid=52617

Would be great if you could couple your efforts.
Thanks!
WarpEnterprises is offline   Reply With Quote
Old 25th June 2003, 09:15   #7  |  Link
Kurtnoise
Swallowed in the Sea
 
Kurtnoise's Avatar
 
Join Date: Oct 2002
Location: Aix-en-Provence, France
Posts: 5,191
This is great !!! Thanks....

Just one more suggestion : Could you add an option to calculate (or just display.. ) the average of "Total" numbers ??? It's to know the average FPS for my processings.
Kurtnoise is offline   Reply With Quote
Old 25th June 2003, 10:29   #8  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
I think that indeed merging our efforts rather than going each one on his own plugin would be smart. For reference and testing (I'm still waiting for a reply from Sh0dan on that matter), you can check the latest version of Kronos (GPL source code included), which doesn't use threads. The main difference I see with Avstimer is that it does two measures (at the starttimer() and stoptimer() points): it's more accurate and more meaningfull for my own purpose. I'll stop commenting on Kronos as it'll be hijacking Kassandro's thread and work.

@Kassandro
I managed to use GetThreadTimes, but whatever thread priority I set, the results seems a bit strange (they are close but far from constant - like 10% variation on a fixed amount of processing). Calculations aren't the matter (I haven't checked rounding), I guess, as they are done on quadwords.
On a side note, your filter has/will have a real advantage over mine: it doesn't run several threads at a time
That causes crashes whenever a filter chain requests more frames at starttimer() point than it it outputs at stoptimer() point with the version of Kronos using threads. It is probably a bug of mine, but as far as I can tell, I don't see any problem within the code and compile options (multithreaded DLL).

Last edited by Kurosu; 26th June 2003 at 01:08.
Kurosu is offline   Reply With Quote
Old 26th June 2003, 01:45   #9  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
@Kassandro
Please find the source code of Kronos using threads here.

<side note>
I was stupidly using an undebugged MMX version of DGBob (could have used decomb or a temporal smoother for that issue), that crashed on a MMX instruction in this version. Kronos was fine from the start.
</side note>

However, another issue appears: all measures for one frame are very close to multiples of 10ms on my system. So the code may still have a rounding issue. Or it's something else, and I don't know how it'll affect real measure. (other people interested are free to look this code).

Last edited by Kurosu; 27th June 2003 at 08:06.
Kurosu is offline   Reply With Quote
Old 27th June 2003, 04:41   #10  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
@Kurosu
The link to the new Kronos seems to be dead, but the link to the "old" Kronos is ok. Thanks!
I havenīt tested the "real" accuracy of any time function. Theoretically even the simple clock() function should have an accuracy of 1 ms. Using the RDTSC instruction hardwired in the cpu one could measure time with the accuracy of cpu cycles. On the hand -like the UNIX times sytem call - one could measure time by the number of timer interrupts. This is quite natural because the scheduler of the operating system is driven by the timer interrupt. Probably Windows has 100 timer interrupts per seconds, which results in an accuracy of 10 ms.
Being a mathematician by education (not by profession) I do not care much about the accuracy of a timer function as long as it is unbiased (equally wrong on the up and the down side). If one averages over a sufficiently large number of frames, the inaccuracy then simply disappears. Even with a very crude timer function with an accuracy of a 1 s one could measure a 100 fps filter accurately if one averages over, say, 10000 frames, though this seems to be strange because for most frames a duration of 0 s and for only a very few 1 s will be measured in this example.
It is much more important that the timer function excludes the time spent for other processes or threads, because this error cannot be averaged out. Thatīs why GetThreadTimes()is appropriate. It seems to retrieve its information directly from the scheduler. Hence the overhead of this function should be small. Because GetProcessTimes() uses an identical system interface I will allow also the optional usage of this function in the new version. This is useful if a filter uses several threads, which would be quite weird though.
I agree that one should merge the Kronos and Avstimer and users should decide about the features.
kassandro is offline   Reply With Quote
Old 27th June 2003, 11:14   #11  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
Quote:
Originally posted by kassandro
The link to the new Kronos seems to be dead, but the link to the "old" Kronos is ok. Thanks!
I've updated the link for the newer (forgot a level of folder in the path) code.

Quote:
Theoretically even the simple clock() function should have an accuracy of 1 ms. Using the RDTSC instruction hardwired in the cpu one could
measure time with the accuracy of cpu cycles. On the hand -like the
For some filters and some computers, that still sounds like an accuracy to take care of; even more if some optimizations you do should only bring you less. Of course, the averaging you describe later compensates for that.

Regarding RDTSC, yes, it is as effective as QueryPerformanceTimer, but the accuracy sounds now useless because it's like 10E-3 (1ĩs against >1ns for a CPU cycle) times less than what's needed. In the end, we would be far better off with QueryPerformanceTimer, weren't the multitasking causing some trouble.

Quote:
operating system is driven by the timer interrupt. Probably Windows has 100 timer interrupts per seconds, which results in an accuracy of 10 ms.
Then it's even worse (not considering the point further, but you'll see why later), as it could introduce a higher error (and maybe bias) compared to a task that runs for 98% of the CPU time.

Quote:
Being a mathematician by education (not by profession) I do not care much about the accuracy of a timer function as long as it is unbiased (equally wrong on the up and the down side). If one averages over a sufficiently large number of frames, the inaccuracy then simply disappears.
The accuracy is indeed inversely proportionnal to the number of frames .

Anyway, reading you (I have no knowledge about how an OS and/or Windows' familly do their scheduling), I now see a main drawback: wouldn't the times reported by GetThreadTimes always be rounded up? As it's based on scheduling, it sounds logical (erm...) that if a rounding due to that scheduling is done, it is due to the fact the thread end is checked (and execution time in Kernel/User mode) at every block of the above 10ms. (ie check 10...10*n ms after thread start).

Another concern is that the instantaneous result can get a bit to strange. I'm still wondering why I get *negative* values sometimes - the source code you see needs to be changed, as the start call should measure thread execution time, but the full time the system spent on getting the frame.

Quote:
Even with a very crude timer function with an accuracy of a 1 s one could measure a 100 fps filter accurately if one averages over, say, 10000 frames, though this seems to be strange because for most frames a duration of 0 s and for only a very few 1 s will be measured in this example.
That's even more true with such filters as dup, that may process up to 20 frames in one request, and 0 in the following ones (just returning an already stored pointer).

In that regard, Kronos already prints onscreen the average, moreover it's the result it reports to the user through the logs and messagebox.

Quote:
spent for other processes or threads, because this error cannot be averaged out. Thatīs why GetThreadTimes()is appropriate. It seems to
Indeed, even if the bias is always positive and its majorant has a direct relation to the CPU time used as seen in the task manager, it stays unknown. It was still reasonable to use it if the system was mostly busy at doing the process. In fact, even QueryPerformanceTimer kept a lot (if not all) of its accuracy with other applications in the background (yet their priority was purposedly set to idle/lowest).

Quote:
This is useful if a filter uses several threads, which would be quite weird though.
Although one must carefully consider what he needs to measure (see my point about call in StartTimer.

Quote:
I agree that one should merge the Kronos and Avstimer and users should decide about the features.
Well, that one is "us". Fortunately, both source code is available. I think we can keep the log and messagebox functions from Kronos. Unfortunately, I find more usefull (as said previously) to have a StartTimer() and StopTimer() around the block to profile, although it seemed to introduce some problems for me. On the other hand, I find extremely usefull to have named timers. But seeing how Kronos is built, I have yet to see (global parameters -> tab structure and an index in that tab for each instance of the start/stop classes, I guess) how to adapt it for several timers running at a time.

I think that at that point, it's up to the users to say what they want to keep as "features" from both filters.
Kurosu is offline   Reply With Quote
Old 2nd July 2003, 12:15   #12  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
For type=2 timers, this is the default, the first reported total frame rate is often false. If accumulate=true is used then this error propagates also to subsequent numbers. The before and difference frame rates are not affected. In the upcoming version 0.3 there will simply be no more total frame rates for type=2 timers. The difference option will become much more flexible. Because also the output format is simplified, a major overhaul for the documentation is necessary. The new version should be available by next sunday.
kassandro is offline   Reply With Quote
Old 2nd July 2003, 15:07   #13  |  Link
sh0dan
Retired AviSynth Dev ;)
 
sh0dan's Avatar
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Having any timer filter measuring only selected filters is a must, IMO.

The "best" timing filter (the only in my dreams should be able to deliver more advanced statistics.

A good feature would be to keep all frame-times stored, and at a point (not on every frame!) be able to gather statistics. Things I'd like to know:

- Average time.
- Minimum time.
- Maximum time.
- Average difference from the average time (variation). A percentage here would also be nice.
- Average of the 10% of the slowest times.
- Average of the 10% fastest times.
- Average of the center 50% times.

The last 3 are quite interesting to me as a developer.
__________________
Regards, sh0dan // VoxPod
sh0dan is offline   Reply With Quote
Old 2nd July 2003, 15:36   #14  |  Link
Wilbert
Moderator
 
Join Date: Nov 2001
Location: Netherlands
Posts: 6,364
I guess that when the "average time" increases (with increasing framenumber), it implies a memory leak of a filter/plugin? Or is it more difficult to recognize memory leaks?
Wilbert is offline   Reply With Quote
Old 2nd July 2003, 16:20   #15  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
I believe most measures not based on an average are very prone to error, too much to my taste; I believe Kassandros agrees with me. For instance, the very first frame can have a very different processing time; worst example being dup (I guess that's why Sh0dan requested such values to be available).

Concerning the processing time increasing, except the above case (where initial frame can take much less time to be generated), I guess there are very few reasons it should happen (memory fragmentation, increasing stored data, like, precisely a dynamic history record .

@Sh0dan / Kassandro
Why don't just emit all the frames' time to a plain text file, like Compare() does? You can then use any software (Excel? MatLab? Some open source software?) to output whatever measure you'd need. Although the measures Sh0dan proposed seem sufficiently generic, I don't think it's the job of the filter to produce all of those particular results, as anyone could have very particular need. Deviation [ = sqrt(avg(tē)-avg(t)ē) ] is however a typical value that's worth adding.

[edit]
Now that I think of it, why not set a variable with the last execution time? This way, you can write your own formulae to display whatever you'd like.

[edit3]
I was getting OOT.

Last edited by Kurosu; 2nd July 2003 at 21:15.
Kurosu is offline   Reply With Quote
Old 2nd July 2003, 23:18   #16  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
I agree with Kurosu that it doesnīt make much sense to collect information about single frames. The numbers, which sh0dan would like to have, would reflect more the characteristics of the operating system and how it measures time than the characteristics of the filter. There are essentially two methods to measure time, the crude one - preferred by operating systems and probably used by GetProcessTimes and GetThreadTimes - based on counting timer interrupts and the accurate one based on the RDTSC instruction probably used by QueryPerformanceCounter. With only relatively small overhead an operating system could deliver fairly accurate versions of GetProcessTimes and GetThreadTimes excluding even time spent for various hardware interrupts and not only task switches as they do currently, but they simply donīt care. Thus one can measure only absolute time for single frames using the RDTSC instruction. But with this method one has no control over task switches, hardware interrupts etc. resulting in a much higher standard deviation, especially for fast filters, than the filter would have if time could be correctly measured. For some filters like dup, which must process several frames simultaneously, measuring single frames even doesnīt make sense from the get go. Because perturbation caused by task switches or hardware interrupts is more of a periodic rather than a random nature, one should be able reconstruct the "real" standard deviation even from the fluctuation of averages. However, I cannot do this with a quick shot.
kassandro is offline   Reply With Quote
Old 3rd July 2003, 13:05   #17  |  Link
sh0dan
Retired AviSynth Dev ;)
 
sh0dan's Avatar
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
The reason I would like the "limited range" averages is to rule out excessive peaks. When timing small and very intensive routines, a task switch IS important, because it flushes the cache, and make filter perform considerably slower.

Of course cache flushes occur in real life, but these cases "doesn't count" when optimizing inner loops. However if the top 10% is consistently slower than the center 50% execution time without any obvious reason it might be worth a look.

Furthermore nonlinear access and first frames are often slower - so if they were in the top 25% - these cases would also be taken out of the equation.

The reason I would avoid the lower 25% is for cache hits not to count in the average.
__________________
Regards, sh0dan // VoxPod
sh0dan is offline   Reply With Quote
Old 4th July 2003, 10:15   #18  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
@sh0dan
I suggest to add the option logrames="file name" to version 0.4 - not the upcoming version 0.3. If this option is used, then instead of creating output for debugview, a binary file with name "file name" is created, which contains a sequence of __int64 numbers. The first number is the number of ticks per seconds of the timer. Subsequently the number of ticks for each frame is stored in the binary file. Thus for a 90 minute PAL movie the binary file will be not much larger than 1 mb. While such a raw data file contains maximal information, it hardly canīt be used directly by any other program to make the data visible. Thus I hopefully will write also a small utility, which together with a variable printf format string, can creat input files for Open Office or the Open Office clone of the Evil Empire.
kassandro is offline   Reply With Quote
Old 7th July 2003, 15:20   #19  |  Link
sh0dan
Retired AviSynth Dev ;)
 
sh0dan's Avatar
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
@Kurosu: I cannot seem to download Kronos anymore???

Seems like each time I need it, it's gone.
__________________
Regards, sh0dan // VoxPod
sh0dan is offline   Reply With Quote
Old 7th July 2003, 17:41   #20  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 432
[edit]Page updated[/edit]

Compared to the examples' page, here are the additions:
- varname option sets (it must exist prior to stoptimer call, as I've described it in the env->SetVar() thread) the variable to which the last execution time will be written (so that you can play around with it, and make your own calculation)
- the standard deviation is also computed, but I think it's off by one digit (ie 20.5 -> 2.05)

I don't remember if it uses a performance timer or threads, as I didn't really like the results with thread, and I hadn't any time to check for accuracy under load and several other filters running.

In the end, I believe AVSTimer provides the most precise benchmarking, while Kronos allows an easy-to-handle and quick mean for comparison.

Last edited by Kurosu; 7th July 2003 at 19:00.
Kurosu is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 11:24.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.