Old 2nd September 2015, 15:34   #41  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Quote:
Originally Posted by ARDA View Post
I don't have VapourSynth installed so I cannot give you feedback in this case
Read the post. It's an Avisynth plugin. Benchmarks welcome.
Old 2nd September 2015, 16:53   #42  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by TheFluff
Read the post. It's an Avisynth plugin. Benchmarks welcome.
Sorry, I read it too quickly. I will run some benchmarks and publish them here, but I do not know whether Myrsloik has updated BitBlt_SSE2_avs.asm. Let us wait
until he can confirm. I will also ask Myrsloik whether he can release the code, so we can see how he did the integration, because it is a little tricky.

@TheFluff
Maybe you can also post some benchmark results; they will be welcome.

Thanks ARDA
Old 2nd September 2015, 17:12   #43  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,554
Quote:
Originally Posted by ARDA View Post
Sorry, I read it too quickly. I will run some benchmarks and publish them here, but I do not know whether Myrsloik has updated BitBlt_SSE2_avs.asm. Let us wait
until he can confirm. I will also ask Myrsloik whether he can release the code, so we can see how he did the integration, because it is a little tricky.

@TheFluff
Maybe you can also post some benchmark results; they will be welcome.

Thanks ARDA
I didn't touch any of your asm. It's all original code from me.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 2nd September 2015, 17:29   #44  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
I didn't touch any of your asm. It's all original code from me
I was asking whether you updated the asm code from the link in this post; it was released specially because of you
http://forum.doom9.org/showthread.ph...39#post1736739
Besides that, if you take a look at fvertical.cpp you will see that it needs some extra calls beforehand, like:

Code:
Pass_sizeframe((dstframesize)+(dstframesizeU)+(dstframesizeU));
I will publish a full explanation of the architecture of this new bitblt and how to use it.
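
For readers following along: the argument in the call above appears to be the total byte size of the destination frame, that is, the luma plane plus two chroma planes (the snippet adds dstframesizeU twice, presumably because U and V have the same size in YV12). A minimal sketch of that arithmetic, assuming each plane occupies pitch times height bytes. `plane_bytes` and `yv12_frame_bytes` are hypothetical helpers for illustration, not part of fvertical:

```c
#include <stddef.h>

/* Hypothetical helper: bytes occupied by one plane, row padding included. */
static size_t plane_bytes(size_t pitch, size_t height) {
    return pitch * height;
}

/* Hypothetical helper: total bytes of a YV12 frame (Y + U + V).
   Chroma planes have half the luma height; U and V are assumed to share one pitch. */
static size_t yv12_frame_bytes(size_t pitch_y, size_t pitch_uv, size_t height) {
    size_t y  = plane_bytes(pitch_y, height);
    size_t uv = plane_bytes(pitch_uv, height / 2);
    return y + uv + uv;   /* same shape as dstframesize + dstframesizeU + dstframesizeU */
}
```

With unpadded pitches, a 720x576 YV12 frame comes to 414720 + 2 x 103680 = 622080 bytes. Real pitches are usually padded for alignment, so actual sizes would be somewhat larger.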

Thanks ARDA
Old 2nd September 2015, 17:53   #45  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@all developers

This version of bitblt-memcpy is still under heavy development, mainly because the bypass values
for each CPU must be set manually; I have never been able to find a general formula. I would
ask you to wait, and to help me find the most accurate value possible for your CPUs by running some tests
as Groucho2004 has done. I am releasing fvertical.dll version 1.003 here, which I think is quite
close to final (please benchmark with it); after that I will release the source code again.

@all developers and users
Version 1.003 Fvertical.dll

Thanks ARDA

Last edited by ARDA; 3rd September 2015 at 07:59.
Old 2nd September 2015, 18:47   #46  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
To discuss bitblt-memcpy specifically, please follow this new thread

http://forum.doom9.org/showthread.ph...45#post1736845

Everything related to fvertical stays here.
Thanks ARDA

Last edited by ARDA; 2nd September 2015 at 19:00.
Old 2nd September 2015, 19:25   #47  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Quote:
Originally Posted by ARDA View Post
I was asking whether you updated the asm code from the link in this post; it was released specially because of you
http://forum.doom9.org/showthread.ph...39#post1736739
Besides that, if you take a look at fvertical.cpp you will see that it needs some extra calls beforehand, like:

Code:
Pass_sizeframe((dstframesize)+(dstframesizeU)+(dstframesizeU));
I will publish a full explanation of the architecture of this new bitblt and how to use it.

Thanks ARDA
I guess we have to explain the joke since it apparently went completely over your head. Myrsloik hasn't looked at your code at all; he wrote his own flipvertical implementation using VapourSynth's bitblt, which looks like this:
Code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline void vs_bitblt(void *dstp, int dst_stride, const void *srcp, int src_stride, size_t row_size, size_t height) {
    if (height) {
        if (src_stride == dst_stride && src_stride == (int)row_size) {
            // Both buffers are tightly packed: copy the whole plane in one call.
            memcpy(dstp, srcp, row_size * height);
        } else {
            // Strides differ or include padding: copy one row at a time.
            const uint8_t *srcp8 = (const uint8_t *)srcp;
            uint8_t *dstp8 = (uint8_t *)dstp;
            for (size_t i = 0; i < height; i++) {
                memcpy(dstp8, srcp8, row_size);
                srcp8 += src_stride;
                dstp8 += dst_stride;
            }
        }
    }
}
It's probably faster than Avisynth 2.6's implementation and insignificantly slower than yours. If you were sane, you would have benchmarked that, but I don't think you have.
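
A minimal sketch of how a vertical flip can be built on a row-wise copy like the one above: point the source at its last row and walk it with a negative stride. `blit_rows` and `flip_plane` are hypothetical names used only for illustration; this is not the FlipVFaster source, which is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Row-by-row copy; strides are in bytes and may be negative. */
static void blit_rows(uint8_t *dstp, ptrdiff_t dst_stride,
                      const uint8_t *srcp, ptrdiff_t src_stride,
                      size_t row_size, size_t height) {
    for (size_t i = 0; i < height; i++) {
        /* Index from the base pointers so no pointer ever leaves its buffer. */
        memcpy(dstp + (ptrdiff_t)i * dst_stride,
               srcp + (ptrdiff_t)i * src_stride,
               row_size);
    }
}

/* Flip one plane vertically: start reading at the last source row and step upward. */
static void flip_plane(uint8_t *dstp, ptrdiff_t dst_stride,
                       const uint8_t *srcp, ptrdiff_t src_stride,
                       size_t row_size, size_t height) {
    if (height == 0) return;
    const uint8_t *last_row = srcp + (ptrdiff_t)(height - 1) * src_stride;
    blit_rows(dstp, dst_stride, last_row, -src_stride, row_size, height);
}
```

For a planar format such as YV12 the same call would be repeated per plane, with the chroma planes using their own (subsampled) row size and height.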
Old 2nd September 2015, 21:49   #48  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,316
Quote:
Originally Posted by TheFluff View Post
I guess we have to explain the joke since... It's probably faster than Avisynth 2.6's implementation...
You've copied what I've done in nnedi...
Code:
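inline void PlanarFrame::BitBlt(uint8_t *dstp,int dst_pitch,const uint8_t *srcp,int src_pitch,int row_size,int height)
{
	if ((height==0) || (row_size==0)) return;

	// A single row only needs row_size bytes; copying src_pitch bytes could overrun dstp when dst_pitch<src_pitch.
	if (height==1) A_memcpy(dstp,srcp,(size_t)row_size);
	// Tightly packed on both sides: one copy covers every row.
	else if ((dst_pitch==src_pitch) && (src_pitch==row_size)) A_memcpy(dstp,srcp,(size_t)src_pitch*(size_t)height);
	else
	{
		// Otherwise copy row by row, stepping each pointer by its own pitch.
		for (int y=0; y<height; y++)
		{
			A_memcpy(dstp,srcp,row_size);
			dstp+=dst_pitch;
			srcp+=src_pitch;
		}
	}
}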
But you missed the case of height=1.
.... Ok, I'm going out...
Old 3rd September 2015, 05:10   #49  |  Link
jmac698
Registered User
 
Join Date: Jan 2006
Posts: 1,867
fvertical:
Quote:
Frames processed: 5000 (0 - 4999)
FPS (min | max | average): 98.98 | 159.3 | 145.9
Memory usage (phys | virt): 16 | 12 MB
Thread count: 1
CPU usage (average): 49%

Time (elapsed): 00:00:34.272
flipvertical:
Quote:
Frames processed: 5000 (0 - 4999)
FPS (min | max | average): 18.97 | 49.62 | 45.19
Memory usage (phys | virt): 19 | 15 MB
Thread count: 1
CPU usage (average): 49%

Time (elapsed): 00:01:50.644
3.23x as fast (about 223% faster), very impressive!
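
For reference, the comparison can be checked from the average FPS figures quoted above (145.9 for fvertical, 45.19 for flipvertical). A small sketch; `speed_ratio` and `percent_faster` are hypothetical helper names:

```c
/* Ratio of two throughput figures: how many times as fast the first one is. */
static double speed_ratio(double fast_fps, double slow_fps) {
    return fast_fps / slow_fps;
}

/* The same comparison expressed as "percent faster". */
static double percent_faster(double fast_fps, double slow_fps) {
    return (speed_ratio(fast_fps, slow_fps) - 1.0) * 100.0;
}
```

With 145.9 and 45.19 the ratio is about 3.23, which corresponds to roughly 223% faster. The elapsed times (34.272 s against 110.644 s) give the same ratio.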

t7300@2GHz
fam 6 model f step b
mmx,sse,sse2,sse3,ssse3
l1 2x32k l2 4M
2c2t

benchmark as in
http://forum.doom9.org/showthread.ph...48#post1736148
Old 3rd September 2015, 05:24   #50  |  Link
jmac698
Registered User
 
Join Date: Jan 2006
Posts: 1,867
AviSynth+ 0.1 (r1779, MT, i386) (0.1.0.0)
Quote:
Frames processed: 5000 (0 - 4999)
FPS (min | max | average): 15.85 | 49.96 | 45.00
Memory usage (phys | virt): 23 | 23 MB
Thread count: 3
CPU usage (average): 49%

Time (elapsed): 00:01:51.121
same speed as AviSynth 2.60, build:Mar 31 2015 [16:38:54] (2.6.0.6)
Old 3rd September 2015, 05:40   #51  |  Link
jmac698
Registered User
 
Join Date: Jan 2006
Posts: 1,867
flipvfaster, avisynth+ (and all were win8.1 x64)
Quote:
Frames processed: 5000 (0 - 4999)
FPS (min | max | average): 18.00 | 48.78 | 40.75
Memory usage (phys | virt): 22 | 23 MB
Thread count: 3
CPU usage (average): 49%

Time (elapsed): 00:02:02.707
so, I guess TheFluff is wrong, and this new bitblt is far superior to anything existing

also I'm getting a bug in flipvfaster, the V channel is always 0.

Last edited by jmac698; 3rd September 2015 at 05:45.
Old 3rd September 2015, 09:28   #52  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@jmac698
Thanks for your contribution. It is very important, mainly after this little storm that
has distracted me from the task we have here, which for now is to set the cache bypass values for
each architecture and configuration. But we must keep in mind that these tests
sometimes measure fvertical code paths that work in place against the internal flipvertical, which in
all cases uses bitblt-memcpy, in most cases with backward block copy techniques
finished with non-temporal stores. Why all that? Because the steady-Sh0dan bitblt-memcpy was written
at a time when most CPUs had an L2 cache of 256 KB or 512 KB at most, and the bypass values were fixed for those
largest cache sizes, which made non-temporal stores the best option almost always. The hardware
predictors were also not as good as in today's machines, and by copying backward, bitblt avoided
loading the destination buffer into the cache, along with the corresponding penalties from continuous
cache-line flushing.
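
To make the idea of a bypass value concrete, here is a minimal sketch, assuming SSE2 is available: copies smaller than a threshold go through memcpy and stay in the cache, larger ones use non-temporal (streaming) stores that bypass it. This is not the BitBlt_SSE2_avs.asm code; `copy_with_bypass` and `bypass_bytes` are hypothetical names, the sketch copies forward rather than backward, and the right threshold for a given CPU is exactly what the benchmarks in this thread are meant to find.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Copy n bytes. Below bypass_bytes (a hypothetical, per-CPU tuned threshold) use memcpy;
   at or above it, use streaming stores when SSE2 exists and dst is 16-byte aligned. */
static void copy_with_bypass(void *dst, const void *src, size_t n, size_t bypass_bytes) {
#ifdef HAVE_SSE2
    if (n >= bypass_bytes && ((uintptr_t)dst % 16) == 0) {
        uint8_t *d = (uint8_t *)dst;
        const uint8_t *s = (const uint8_t *)src;
        size_t blocks = n / 16;
        for (size_t i = 0; i < blocks; i++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + 16 * i));
            _mm_stream_si128((__m128i *)(d + 16 * i), v);  /* non-temporal store */
        }
        _mm_sfence();                                      /* order the streamed stores */
        memcpy(d + 16 * blocks, s + 16 * blocks, n % 16);  /* remaining tail bytes */
        return;
    }
#endif
    (void)bypass_bytes;   /* unused when the streaming path is compiled out */
    memcpy(dst, src, n);
}
```

A caller would pass something related to the largest cache size as `bypass_bytes`; whether that should be the full cache size, half of it, or another fraction is an open tuning question here.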
For all these reasons I ask you for a little more effort with the following test. You can use this
script and post your results back in it. The frame sizes chosen are just examples;
you can try any others if you want, but I will probably ask you for two or three
more tests after analyzing your results.

Code:

# Insert here
# Vendor   ,Family    ,Model    ,instruction set    ,Largest cache

#test320x256yv12
#colorbars(width = 320, height = 256, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test448x320yv12
#colorbars(width = 448, height = 320, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test640x480yv12
#colorbars(width = 640, height = 480, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test720x576yv12
#colorbars(width = 720, height = 576, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test1392x992yv12
#colorbars(width = 1392, height = 992, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test1392x1024yv12
#colorbars(width = 1392, height = 1024, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test1920x1080yv12
#colorbars(width = 1920, height = 1080, pixel_type = "yv12").killaudio().assumefps(25, 1).trim(0, 99999)
#flipvertical() #..... fps
#fvertical()    #..... fps

#test5000x3000RGB32
#colorbars(width = 5000, height = 3000, pixel_type = "rgb32").killaudio().assumefps(25, 1).trim(0, 9999)
#flipvertical() #..... fps
#fvertical()    #..... fps


Thanks in advance ARDA

Last edited by ARDA; 3rd September 2015 at 09:31. Reason: typo
Old 3rd September 2015, 13:12   #53  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,554
Did I really manage to put a bug in that simple code?

Here's an updated FlipVFaster. Don't get scared when you see the speed. Any previous speed tests done on subsampled planar formats were wrong in the previous version. So only HolyWu's tests show the objective truth.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 3rd September 2015, 14:22   #54  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by Holywu
I simply pick 720p, 1080p and 4K resolution for common use cases. As FlipVFaster doesn't process
V plane correctly, I use a Y8 clip for benchmark.
Code:

#ColorBars(width=1280, height=720, pixel_type="YV12").ConvertToY8().KillAudio().AssumeFPS(25, 1).Trim(0, 99999)
#FlipVertical()              # your cpu 10693 fps   #my cpu 6456
#FVertical()                 # your cpu  15782 fps  #my cpu 9108
#FlipVFaster(newbitblt=true) # your cpu  18281 fps  #my cpu 8645

#ColorBars(width=1920, height=1080, pixel_type="YV12").ConvertToY8().KillAudio().AssumeFPS(25, 1).Trim(0, 99999)
#FlipVertical()               # your cpu  4624 fps   #my cpu 2726
#FVertical()                  # your cpu  7384 fps   #my cpu 3484
#FlipVFaster(newbitblt=true)  # your cpu  8619 fps   #my cpu 3107

#ColorBars(width=3840, height=2160, pixel_type="YV12").ConvertToY8().KillAudio().AssumeFPS(25, 1).Trim(0, 49999)
#FlipVertical()              #  your cpu 768.9 fps   #my cpu 520.4
#FVertical()                 #  your cpu 727.6 fps   #my cpu 597.4
#FlipVFaster(newbitblt=true) #  your cpu 681.9 fps   #my cpu 536.3

Your results surprise me and are not fully consistent, which is why I will
prepare a special release that takes your machine into account; if you agree, I will
send it to you by PM. It seems to me that this is the same problem I had with Groucho2004's CPU,
around the cache bypass value.
In the meantime, I ask you to please run the tests again
as described in the next post.

Thanks ARDA
Old 3rd September 2015, 14:24   #55  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
The post at http://forum.doom9.org/showthread.ph...23#post1735823 proposes
a method for benchmarking a filter without the influence of other parts of the script.

Here is the script I use:
Code:

MPEG2Source("xxxxxx.d2v")#720,576  or any other form of real clip source
crop(8,72,700,432,align=false)
ConvertToY8()
#LanczosResize(1920,1080)
#LanczosResize(1280,720)
#LanczosResize(3840,2160)
AvsTimer(frames=1000, name="ANYONE",type=3, frequency=1700, total=false, quiet=true)
#fvertical()
#FlipVFaster(newbitblt=true)
#flipvertical()
AvsTimer(frames=1500 ,name="ANYONE",type=3, frequency=1700, difference=1, total=false)


And here are the results on my CPU with the same sizes:
Code:

1280,720
VirtualDub.exe	[91471] FlipVFaster = 4591 fps
VirtualDub.exe	[94471] FlipVFaster = 5761 fps
VirtualDub.exe	[95971] FlipVFaster = 5858 fps
VirtualDub.exe	[97471] FlipVFaster = 5887 fps
VirtualDub.exe	[98971] FlipVFaster = 5837 fps

VirtualDub.exe	[91498] fvertical = 12431 fps
VirtualDub.exe	[92998] fvertical = 12927 fps
VirtualDub.exe	[94498] fvertical = 12818 fps
VirtualDub.exe	[95998] fvertical = 12800 fps
VirtualDub.exe	[97498] fvertical = 12823 fps
VirtualDub.exe	[98998] fvertical = 12749 fps

VirtualDub.exe	[91498] flipvertical = 5546 fps
VirtualDub.exe	[92998] flipvertical = 5691 fps
VirtualDub.exe	[94498] flipvertical = 5751 fps
VirtualDub.exe	[95998] flipvertical = 5756 fps
VirtualDub.exe	[97498] flipvertical = 5693 fps
VirtualDub.exe	[98998] flipvertical = 5672 fps

1920,1080
VirtualDub.exe	[91498] FlipVFaster = 2188 fps
VirtualDub.exe	[92998] FlipVFaster = 2241 fps
VirtualDub.exe	[94498] FlipVFaster = 2229 fps
VirtualDub.exe	[95998] FlipVFaster = 2232 fps
VirtualDub.exe	[97498] FlipVFaster = 2258 fps
VirtualDub.exe	[98998] FlipVFaster = 2237 fps

VirtualDub.exe	[91498] fvertical = 4702 fps
VirtualDub.exe	[92998] fvertical = 4778 fps
VirtualDub.exe	[94498] fvertical = 4740 fps
VirtualDub.exe	[95998] fvertical = 4765 fps
VirtualDub.exe	[97498] fvertical = 4849 fps
VirtualDub.exe	[98998] fvertical = 4783 fps

VirtualDub.exe	[91498] flipvertical = 2213 fps
VirtualDub.exe	[92998] flipvertical = 2298 fps
VirtualDub.exe	[94498] flipvertical = 2283 fps
VirtualDub.exe	[95998] flipvertical = 2286 fps
VirtualDub.exe	[97498] flipvertical = 2298 fps
VirtualDub.exe	[98998] flipvertical = 2254 fps

3840,2160
VirtualDub.exe	[91499] FlipVFaster = 498 fps
VirtualDub.exe	[92999] FlipVFaster = 511 fps
VirtualDub.exe	[94499] FlipVFaster = 512 fps
VirtualDub.exe	[95999] FlipVFaster = 513 fps
VirtualDub.exe	[97499] FlipVFaster = 512 fps
VirtualDub.exe	[98999] FlipVFaster = 508 fps

VirtualDub.exe	[91499] fvertical = 552 fps
VirtualDub.exe	[92999] fvertical = 558 fps
VirtualDub.exe	[94499] fvertical = 568 fps
VirtualDub.exe	[95999] fvertical = 568 fps
VirtualDub.exe	[97499] fvertical = 567 fps
VirtualDub.exe	[98999] fvertical = 568 fps

VirtualDub.exe	[91499] flipvertical = 468 fps
VirtualDub.exe	[92999] flipvertical = 478 fps
VirtualDub.exe	[94499] flipvertical = 480 fps
VirtualDub.exe	[95999] flipvertical = 478 fps
VirtualDub.exe	[97499] flipvertical = 479 fps
VirtualDub.exe	[98999] flipvertical = 478 fps


As you can see, the results differ a lot from one kind of benchmark to another.
I don't want to open a new argument about benchmark methods, only to confirm whether you
also get different results on your CPU.
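
When comparing runs like the ones listed above, one simple way to summarise the repeated samples is the median, which is less sensitive to a slow first interval than the mean. A small sketch under that assumption; `median_fps` is a hypothetical helper and not part of AvsTimer:

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (n > 0). Sorts the array in place. */
static double median_fps(double *samples, size_t n) {
    qsort(samples, n, sizeof samples[0], cmp_double);
    return (n % 2) ? samples[n / 2]
                   : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

Applied to the six 1280x720 samples above, this gives a median of 12809 fps for fvertical and 5692 fps for flipvertical, a ratio of about 2.25.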

Thanks ARDA
Old 3rd September 2015, 14:42   #56  |  Link
Reel.Deel
Registered User
 
Join Date: Mar 2012
Location: Texas
Posts: 1,664
Here's my results using AviSynth+ r1576, FVertical v1.003, and the updated FlipVFaster:

Code:
ColorBars(width=1280, height=720, pixel_type="YV12").KillAudio().AssumeFPS(25, 1).Trim(0, 99999)
#FlipVertical()              # 4601 fps
#FVertical()                 # 9359 fps
#FlipVFaster(newbitblt=true) # 10160 fps
Code:
ColorBars(width=1920, height=1080, pixel_type="YV12").KillAudio().AssumeFPS(25, 1).Trim(0, 99999)
#FlipVertical()              # 1938 fps
#FVertical()                 # 4305 fps
#FlipVFaster(newbitblt=true) # 4756 fps
Code:
ColorBars(width=3840, height=2160, pixel_type="YV12").KillAudio().AssumeFPS(25, 1).Trim(0, 49999)
#FlipVertical()              # 468.3 fps
#FVertical()                 # 627.3 fps
#FlipVFaster(newbitblt=true) # 577.3 fps

Old 3rd September 2015, 15:38   #57  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by Reel.Deel
Here's my results using AviSynth+ r1576, FVertical v1.003, and the updated FlipVFaster
It seems to be the same situation as with HolyWu's CPU; luckily I now have data to fix the cache bypass values. In the meantime, if you have time, could you please run the benchmarks I asked HolyWu for in the post before yours: http://forum.doom9.org/showthread.ph...73#post1736973


thanks in advance ARDA

Last edited by ARDA; 3rd September 2015 at 16:25.
Old 3rd September 2015, 16:18   #58  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@all

As I mentioned before, this bitblt-memcpy is still under heavy development. Until I can fix all the values in the TableByPass (at the end of BitBlt_SSE2_avs), benchmark values must be considered provisional, a reference for analysis only. That is why I invite more users to join us as volunteers so that we can finish this quickly.

Thanks ARDA
Old 3rd September 2015, 17:17   #59  |  Link
jmac698
Registered User
 
Join Date: Jan 2006
Posts: 1,867
I had trouble getting avstimer to work. Here's how to do it:
http://www.avstimer.de.tf/ doesn't load for me. I got a version from
http://www.avisynth.nl/users/warpenterprises/
which needs
http://www.avsrecursion.de.tf/
which needs
http://www.dll-files.com/dllindex/dl....shtml?msvcr71

(the latter two need to be put in C:\Windows\SysWOW64 for windows 8.1 x64)
otherwise, you get this error:
Platform return code 126: The specified module could not be found

Then I tried to open the benchmark script in virtualdub, but got "avi import error: (Unknown) (80040154)"
I had to reinstall avisynth. I think the problem was that uninstalling avisynth+ didn't return my system to using avisynth.

Now I'll try to get some benchmarks.
Old 3rd September 2015, 17:22   #60  |  Link
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
@jmac698

Sorry you've had so much trouble. If you continue having problems with avstimer, I think
I have a statically compiled version that needs nothing else to run.

Thanks a lot for your effort, waiting for results ARDA
All times are GMT +1.