SSE2 fixed - first tests
CruNcher
30th March 2003, 11:33
Thanks to Gomez and Radek, SSE2 finally works without crashing, so I started testing the SSE2 optimizations in the current XviD unstable. At the moment there is as good as no speed difference, at least on my P4 1.8 (Willamette core). The price for the 0.5x fps speed gain is a minimally larger file size. I have not conducted 2-pass tests yet.
100frames / = NOSSE2 = 2,29 MB (2.411.878 bytes)
100frames / = SSE2 = 2,30 MB (2.413.260 bytes)
to continue...
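For scale, the size difference between the two 100-frame encodes above can be worked out directly. This is only a minimal Python sketch: the helper name `parse_bytes` is made up for illustration, and it assumes the dots in the byte counts are thousands separators.

```python
def parse_bytes(text: str) -> int:
    """Turn a byte count like '2.411.878' (dots as thousands separators) into an int."""
    return int(text.replace(".", ""))

# Byte counts copied from the two 100-frame encodes in this post.
nosse2 = parse_bytes("2.411.878")
sse2 = parse_bytes("2.413.260")

diff = sse2 - nosse2                 # extra bytes in the SSE2 encode
percent = 100.0 * diff / nosse2      # same difference as a percentage
print(f"SSE2 encode is {diff} bytes larger ({percent:.3f}%)")
```

So the SSE2 encode is 1382 bytes larger, which is well under a tenth of a percent.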
sysKin
30th March 2003, 13:54
I also just fixed RefDivX's lumimasking to work with SSE2. By the way, there were some related bugs which could make it work incorrectly. If you liked the new lumimasking (available in Koepi's builds), be warned: it might become better now ;)
Radek
bilu
30th March 2003, 15:36
@Radek
Is RefDivX's lumimasking available to other builds now? :rolleyes:
Bilu
CruNcher
31st March 2003, 16:25
This was a 100-frame test, 640x480 PAL, 25 fps, with
Lumimasking, VHQ Mode Decision, SoftMatrix, chroma ME, chroma optimizer and
B-frames 3/150/50.
OK, round 2: idct_sse2 and fdct_sse2
---------------------------------------------------------------------------
nosse2-simpleidct-fdct_mmx.avi = 5.533 fps = 2,41 MB (2.538.046 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.1977 2.9721 22.9059
Mean Deviation: -2.1609 +0.0828 +11.2064
PSNR: 16.8244 36.0688 43.0388
---------------------------------------------------------------------------
sse2-simpleidct-fdct_sse2.avi = 5.750 fps = 2,42 MB (2.540.196 bytes)
(currently used) and most i think also idct_sse2
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.2018 2.9743 22.9094
Mean Deviation: -2.1819 +0.0743 +11.2069
PSNR: 16.8240 36.0666 43.0299
---------------------------------------------------------------------------
nosse2-simpleidct-fdct_mmx.avi = 5.512 fps = 2,41 MB (2.527.902 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.1977 2.9721 22.9059
Mean Deviation: -2.1609 +0.0828 +11.2064
PSNR: 16.8244 36.0688 43.0388
---------------------------------------------------------------------------
sse2-simpleidct-fdct_mmx.avi = 5.555 fps = 2,41 MB (2.527.902 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.1977 2.9721 22.9059
Mean Deviation: -2.1609 +0.0828 +11.2064
PSNR: 16.8244 36.0688 43.0388
---------------------------------------------------------------------------
round 3 Experimental SSE2
sad16_sse2
dev16_sse2
---------------------------------------------------------------------------
sse2-simpleidct-fdct_mmx.avi = 5.275 fps = 2,40 MB (2.526.604 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.3695 2.9948 22.9486
Mean Deviation: -2.0926 +0.0819 +11.0636
PSNR: 16.8161 36.0038 41.6404
---------------------------------------------------------------------------
RefDivX Lumimasking
---------------------------------------------------------------------------
nosse2-simpleidct-fdct_mmx.avi = 4.368 fps = 3,23 MB (3.388.334 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.1868 2.7837 22.9078
Mean Deviation: -2.1631 +0.0837 +11.2176
PSNR: 16.8245 36.7148 43.0831
---------------------------------------------------------------------------
---------------------------------------------------------------------------
sse2-simpleidct-fdct_mmx.avi = 4.254 fps = 3,23 MB (3.388.334 bytes)
---------------------------------------------------------------------------
Minimum Average Maximum
Mean Absolute Deviation: 1.1868 2.7837 22.9078
Mean Deviation: -2.1631 +0.0837 +11.2176
PSNR: 16.8245 36.7148 43.0831
---------------------------------------------------------------------------
As you can see from the list, the best thing for P4 users is to deactivate fdct_sse2 and idct_sse2, and not to use Experimental SSE2.
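To put the fps figures from the list side by side, here is a rough Python sketch that expresses each SSE2 run as a percentage change against a nosse2 run. The fps values are copied from the result blocks above; which nosse2 run serves as the baseline for each SSE2 run is an assumption of this sketch, and `relative_change` is just an illustrative helper name.

```python
def relative_change(baseline_fps: float, candidate_fps: float) -> float:
    """Percentage change of candidate relative to baseline (positive = faster)."""
    return 100.0 * (candidate_fps - baseline_fps) / baseline_fps

# (description, nosse2 fps, SSE2 fps), copied from the result blocks above.
# Pairing each SSE2 run with a nosse2 baseline is an assumption of this sketch.
runs = [
    ("SSE2 build, fdct_sse2", 5.533, 5.750),
    ("SSE2 build, fdct_mmx", 5.512, 5.555),
    ("experimental sad16/dev16", 5.512, 5.275),
    ("RefDivX lumimasking", 4.368, 4.254),
]

changes = {name: relative_change(base, cand) for name, base, cand in runs}
for name, pct in changes.items():
    print(f"{name:28s} {pct:+.2f}%")
```

The two nosse2 baselines (5.533 and 5.512 fps) already differ by about 0.4%, so differences below roughly 1% probably should not be over-interpreted on a 100-frame clip.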
Thanks again to Radek for help with the PSNR measurements.
PS: Yeah, I know the PSNR is really low, but this is because Trbarry's SoftMatrix was used for testing purposes (compressibility); it shouldn't harm the end result (I hope) :)
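For readers unfamiliar with the PSNR column, here is a small Python sketch of the usual definition for 8-bit samples, PSNR = 10 * log10(255^2 / MSE). The measuring tool used for the numbers above is not named in the thread, so whether it computes exactly this (for example per frame, or on luma only) is an assumption; the sample values below are made up and are not from the test clip.

```python
import math

def psnr(reference, encoded, peak=255):
    """PSNR in dB between two equal-length sequences of 8-bit sample values."""
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have no error
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy data (made up): every sample is off by 4 levels, so MSE = 16.
ref = [100, 120, 140, 160]
enc = [104, 116, 144, 156]
print(round(psnr(ref, enc), 2))
```

Under this definition, an average PSNR of about 36 dB corresponds to an RMS error of roughly 4 levels out of 255.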
CruNcher
31st March 2003, 20:32
CBR 2-2 1000 kbps, B-frames 5/150/50, same options as in the previous post
3,22 MB (3.386.852 bytes) = PSNR 16.8244 <- Koepi's IC7 (nosse2) (SimpleIdct_mmx2) (fdct?) (refdivxlumi)
2,41 MB (2.531.218 bytes) = PSNR 16.8246 <- Nic's IC7 (SSE2) (SimpleIdct_?) (fdct?) (standardlumi)
2,41 MB (2.527.902 bytes) = PSNR 16.8244 <- CruNcher's IC7 (SSE2) (SimpleIdct_mmx2) (fdct_mmx) (standardlumi)
2,43 MB (2.552.832 bytes) = PSNR 16.8246 <- CruNcher's MS (SSE2) (SimpleIdct_mmx2) (fdct_mmx) (standardlumi)
Quant 2
Same options as in the previous post, except no B-frames
3,80 MB (3.991.602 bytes) = PSNR 34.7405 = 4.033 fps <- Koepi's IC7 (nosse2) (SimpleIdct_mmx2) (fdct?) (refdivxlumi)
3,22 MB (3.382.314 bytes) = PSNR 34.7477 = 5.333 fps <- Nic's IC7 (SSE2) (SimpleIdct_?) (fdct_sse2) (standardlumi)
3,22 MB (3.377.246 bytes) = PSNR 34.7670 = 5.121 fps <- CruNcher's IC7 (SSE2) (SimpleIdct_mmx2) (fdct_mmx) (standardlumi)
3,26 MB (3.424.306 bytes) = PSNR 34.4381 = 4.904 fps <- CruNcher's MS (SSE2) (SimpleIdct_mmx2) (fdct_mmx) (standardlumi)
3,80 MB (3.991.674 bytes) = PSNR 34.7405 = 4.084 fps <- CruNcher's IC7 (SSE2) (SimpleIdct_mmx2) (fdct_mmx) (refdivxlumi)
every build is unique :P