Help x264 development: run this test!
Dark Shikari
17th November 2007, 06:41
1. Download this program (http://senduit.com/e602ae). Extract the zipfile; the necessary Cygwin DLL is included.
2. Run it from the command line: bench_align.exe > output.txt
(this redirects the output to a file).
3. Post the output.
Akupenguin: So we have determined that no code changes are needed for Penryn. AMD still doesn't have cacheline split issues, but Phenom does benefit from SSE2, so some new cpu detection is needed there. Please, no more bench_align results until a new cpu architecture comes out :)
kumi
17th November 2007, 06:49
AMD Sempron 2400+
nop: 110
movq8
469 673 673 673 673 674 674 674
446 515 515 516 515 516 516 515
446 516 517 516 517 516 516 516
445 540 540 539 540 539 539 538
452 514 515 514 514 515 515 515
445 514 514 515 515 515 514 515
445 515 515 514 515 515 515 515
445 514 515 514 515 515 515 514
460 674 673 673 674 674 674 674
446 515 515 516 515 516 516 515
446 517 516 516 517 517 516 517
445 538 539 539 539 539 539 539
452 514 514 514 514 515 515 515
446 515 514 514 514 515 514 515
446 514 514 515 514 514 514 514
446 514 515 514 514 514 514 515
468 674 673 674 673 673 673 674
445 517 516 516 515 515 516 515
movq16
680 874 874 874 874 874 874 874
685 933 933 932 933 933 933 933
690 869 870 869 869 869 869 869
687 870 870 870 870 870 870 870
691 863 863 863 863 863 863 863
685 869 869 869 869 869 869 869
687 932 931 931 932 931 931 931
683 927 927 927 927 927 927 927
680 874 874 874 874 874 874 874
681 933 933 933 933 932 932 932
692 869 869 869 869 869 869 869
688 870 870 870 870 870 870 870
691 863 863 863 863 863 863 863
685 869 869 869 869 869 869 869
688 931 932 932 931 931 932 932
684 927 927 927 927 927 927 927
680 874 874 874 874 874 874 874
687 933 933 932 933 933 932 933
Dark Shikari
17th November 2007, 07:21
Awesome, I got our Pentium 4 Prescott results. Apparently the instruction lddqu does work properly on the Prescott, which will allow the speed boost to be implemented for Prescotts also, not just Core2/Core/P-M.
Wishbringer
17th November 2007, 07:22
...NOT...
Core 2s...
Athlon 64...
Does it mean only the dualcore C2D or C2Q also? Athlon64 singlecore or X2 too?
Dark Shikari
17th November 2007, 07:23
Does it mean only the dualcore C2D or C2Q also? Athlon64 singlecore or X2 too?
All variants, core number doesn't matter.
Right now we basically only need the P3, as it's the only remaining SSE2-supporting CPU we have yet to test.
St Devious
17th November 2007, 07:31
Intel Pentium M Centrino 1.5 GHz
http://i10.tinypic.com/6slb1uu.jpg
nop: 440
movq8
519 517 518 516 516 516 518 516
517 21197 20896 20764 20845 20874 20908 20770
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 2414 2416 2412 2414 2413 2416 2416
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 515 515 515 515 515 515 515
515 2420 2414 2418 2411 2414 2415 2413
movq16
869 21090 21016 21035 21128 21193 21203 21181
869 21017 21070 20957 21046 21024 20958 20941
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 2713 2713 2713 2707 2713 2711 2710
869 2624 2620 2625 2621 2622 2621 2625
870 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 869 869 869 869 869 869 869
869 2711 2710 2712 2712 2711 2714 2708
869 2621 2617 2623 2622 2617 2624 2616
movdqu
1037 20435 20476 20337 20338 20320 20323 20342
1037 20363 20364 20358 20333 20246 20337 20448
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1696 1037 1037 1037 1037 1037
1037 2336 2344 2346 2342 2341 2342 2346
1037 2271 2268 2268 2272 2267 2270 2268
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 1037 1037 1037 1037 1037 1037 1037
1037 2345 2340 2342 2343 2348 2344 2345
1037 2263 2267 2265 2273 2264 2265 2268
Inventive Software
17th November 2007, 07:42
AMD Turion64 X2 @ 1.6 GHz:
nop: 109
movq8
461 672 672 673 672 672 672 672
439 541 541 541 542 541 541 541
443 497 496 496 496 496 496 499
438 498 498 498 498 498 498 498
448 498 497 497 497 497 497 497
435 496 496 496 496 496 496 496
439 496 496 496 496 496 496 496
435 496 496 496 496 496 496 496
460 673 673 672 672 672 672 672
439 542 541 541 541 541 541 541
443 496 498 499 499 499 499 499
438 498 498 498 498 496 496 496
435 497 497 497 497 497 497 497
435 496 496 496 496 496 496 496
435 496 498 498 498 498 498 498
438 498 498 498 498 498 498 498
459 672 672 672 672 672 673 672
439 541 541 541 541 541 542 542
movq16
690 836 836 836 836 837 838 836
711 826 826 826 826 826 826 826
695 837 837 838 837 837 837 837
687 836 836 836 836 837 837 837
697 838 838 838 838 838 838 838
697 838 838 838 837 839 837 837
701 837 837 837 836 836 836 836
692 846 846 849 848 846 846 846
687 836 836 836 836 836 836 836
711 826 826 826 826 826 826 826
695 837 837 837 837 837 838 837
699 836 836 836 836 836 836 836
694 837 837 838 837 837 837 837
699 836 837 836 836 836 836 836
699 836 836 836 836 836 836 836
692 846 846 846 846 846 846 846
690 836 836 836 837 836 836 836
711 826 826 826 826 826 826 826
movdqu
694 981 981 981 981 981 981 989
698 953 953 953 953 953 953 953
698 930 930 931 930 930 930 930
660 937 937 937 937 937 937 937
692 937 937 937 937 937 938 937
662 937 937 937 938 937 937 937
662 937 937 937 937 937 937 937
660 938 938 938 938 938 938 938
694 981 981 981 981 981 981 981
660 953 953 953 953 953 953 953
699 930 930 930 930 930 931 930
698 937 937 937 937 938 937 937
698 937 937 937 937 937 937 937
660 937 937 937 937 937 937 937
660 937 937 937 937 937 937 937
698 969 969 970 969 969 969 969
708 1020 1020 1020 1020 1020 1020 1020
698 953 953 953 953 954 953 953
lddqu
694 981 981 981 981 981 981 981
661 953 953 953 953 953 954 953
698 930 930 930 931 930 930 930
660 937 937 937 937 937 937 937
660 938 937 937 937 937 937 938
662 937 937 937 937 938 937 937
662 937 937 939 937 937 937 937
660 941 944 938 938 938 938 940
705 1020 1020 1020 1020 990 1003 1020
698 953 953 953 953 953 953 953
699 930 930 930 930 930 931 930
698 937 937 937 937 937 937 937
698 937 937 938 937 937 937 937
698 939 937 937 937 937 937 937
698 937 937 937 937 938 937 937
698 969 969 969 969 969 969 969
708 1020 1020 1020 1020 1020 1016 981
660 953 953 953 953 953 953 953
bkman
17th November 2007, 07:47
Do you care about Celeron (Northwood) results?
Dark Shikari
17th November 2007, 07:48
Do you care about Celeron (Northwood) results?
It might be interesting, I haven't seen a Celeron result yet.
The main thing this is testing is the behavior at cache line and page boundaries.
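To make "cache line and page boundaries" concrete, here is a minimal C sketch of the check involved. It is not taken from bench_align; the function name and the two size constants are assumptions for illustration (64-byte lines and 4096-byte pages are common on x86, but the line size differs between the CPUs in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration only; the real line size depends on the CPU. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u

/* Returns 1 if a load of `size` bytes starting at `addr` straddles a
 * multiple of `boundary` (which must be a power of two), else 0. */
int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    /* Round the first and last byte addresses down to the boundary. */
    uintptr_t first = addr & ~((uintptr_t)boundary - 1);
    uintptr_t last = (addr + size - 1) & ~((uintptr_t)boundary - 1);
    return first != last;
}
```

For example, an 8-byte load at offset 60 covers bytes 60..67 and so splits a 64-byte line, while the same load at offset 56 does not. A 16-byte load at offset 4090 splits a page as well as a line, which would be the more expensive case if the large outliers in the tables are page splits.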
bkman
17th November 2007, 07:49
Here you go then:
nop: 841
movq8
702 700 700 700 700 700 700 700
700 13580 13510 13566 13563 13519 13578 13567
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 3538 3538 3538 3538 3538 3538 3538
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 700 700 700 700 700 700 700
700 3544 3544 3544 3557 3558 3544 3544
movq16
1340 14645 14624 14624 14626 14624 14624 14624
1340 14400 14402 14400 14400 14423 14430 14400
1340 1340 1340 1340 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 1342 1340 1340 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 1340 1340 1342 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 3872 3872 3872 3883 3903 3872 3873
1340 3892 3891 3788 3788 3851 3892 3892
1340 1340 1340 1340 1340 1340 1340 1340
1341 1344 1344 1344 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1340 3872 3872 3872 3872 3872 3872 3872
1340 3788 3788 3788 3788 3788 3788 3788
movdqu
920 13662 13661 13660 13684 13693 13698 13660
920 13672 13710 13695 13672 13673 13827 13973
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 922
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 3724 3725 3724 3724 3724 3724 3724
920 3644 3644 3762 3725 3644 3644 3644
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 920 920 920
920 920 920 920 920 922 920 920
920 3724 3724 3724 3724 3724 3724 3724
920 3644 3644 3644 3644 3644 3644 3681
Inventive Software
17th November 2007, 08:32
Oh, I got a Celeron (Coppermine) I can run this on too, give me a couple minutes. :)
EDIT: Results are in. Celeron 800 MHz, 128 KB cache, 100 MHz FSB. SSE only.
nop: 330
movq8
754 752 752 752 752 752 752 752
752 15898 15912 15889 15925 15966 15914 15961
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2564 2564 2564 2564 2564 2564 2564
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2564 2564 2564 2564 2564 2564 2563
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2564 2564 2564 2564 2564 2564 2564
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2564 2564 2564 2564 2564 2564 2564
movq16
1299 16393 16349 16353 16340 16340 16314 16376
1299 16032 16003 15989 15966 15986 15995 15957
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1299
1299 2916 2916 2916 2916 2916 2916 2916
1299 2881 2881 2881 2881 2881 2881 2881
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1299
1299 2916 2916 2916 2916 2916 2916 2916
1299 2881 2881 2881 2881 2881 2881 2881
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1299
1299 2916 2916 2916 2916 2916 2916 2916
1299 2881 2881 2881 2881 2881 2881 2881
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1299
1299 2916 2916 2916 2916 2916 2916 2920
1299 2881 2881 2881 2881 2881 2881 2881
alinhan
17th November 2007, 08:55
Mobile Pentium 4, 2.66 GHz:
nop: 842
movq8
703 701 701 701 701 701 701 701
701 13697 13616 13620 13582 13611 13576 13586
701 701 702 701 702 701 701 701
701 701 702 701 701 701 701 701
701 701 701 701 701 701 701 701
701 701 702 701 701 701 701 701
701 701 701 701 701 701 701 701
701 701 701 701 701 701 701 701
701 701 702 701 701 702 702 701
701 3539 3538 3539 3538 3538 3539 3538
701 701 701 701 701 701 701 701
701 701 701 701 701 701 701 701
701 701 702 701 701 701 701 701
701 702 701 701 701 701 701 701
701 701 701 701 701 702 701 701
702 701 701 701 701 701 702 701
701 701 701 702 701 701 701 701
701 3538 3539 3538 3535 3538 3538 3538
movq16
1341 14680 14723 14693 14703 14676 17309 14754
1340 14467 14497 14474 14472 14554 14436 14483
1341 1340 1340 1342 1340 1341 1340 1342
1340 1341 1340 1341 1340 1340 1341 1340
1341 1340 1341 1340 1341 1340 1340 1340
1341 1340 1340 1340 1341 1340 1340 1340
1340 1340 1340 1340 1340 1340 1340 1340
1342 1340 1340 1340 1342 1340 1340 1341
1340 3891 3873 3881 3873 3874 3879 3873
1340 3795 3790 3800 3790 3790 3802 3790
1340 1340 1340 1340 1340 1341 1341 1340
1340 1340 1340 1340 1340 1340 1342 1341
1341 1340 1340 1340 1341 1341 1340 1340
1341 1341 1340 1341 1341 1340 1340 1340
1340 1340 1340 1340 1340 1340 1341 1340
1340 1340 1340 1340 1341 1340 1340 1341
1340 3890 3873 3882 3886 3898 3873 3897
1340 3790 3806 3855 3790 3790 3798 3797
movdqu
921 13914 13893 13728 13724 13743 13741 13746
921 13794 13876 13822 13723 13859 13803 13747
921 921 921 921 921 921 921 921
921 921 921 921 920 921 921 921
921 921 921 921 922 921 921 921
921 921 921 921 921 921 921 921
921 921 921 921 921 921 921 921
921 920 921 921 921 921 921 921
921 3724 3724 3724 3724 3724 4130 3724
921 3647 3646 3646 3650 3646 3646 3646
922 921 921 921 921 921 921 921
921 921 921 921 921 921 921 921
921 921 921 921 921 921 921 933
921 921 921 921 921 923 923 922
922 922 922 921 923 921 922 920
921 921 921 921 921 921 921 921
921 3724 3724 3724 3724 3724 3724 3724
921 3646 3646 3646 3658 3650 3646 3652
celtic_druid
17th November 2007, 09:16
What about VIA C7's? The C7 supports SSE2/3.
TEB
17th November 2007, 10:43
Intel Mobile Core Duo T7300 2 GHz
nop: 1649
movq8
994 990 990 990 990 990 990 990
992 99609 99044 99584 99432 99244 99496 99438
991 990 992 990 990 991 990 991
990 990 990 991 990 990 990 992
991 990 990 990 990 991 990 993
992 990 990 990 990 990 990 990
990 990 990 991 990 991 990 990
990 990 990 990 990 990 990 991
992 992 990 990 990 990 990 990
990 6897 6968 6933 6963 7006 6918 6879
990 990 990 990 990 990 991 990
990 990 990 990 990 990 990 990
990 990 990 981 990 990 993 990
990 990 990 991 990 990 991 990
990 991 990 992 990 990 990 990
990 990 990 992 991 990 990 990
991 991 990 990 990 991 990 990
990 6880 6886 6879 6885 6896 6937 6971
movq16
1788 99454 99321 100024 99892 99812 99697 99389
1781 99247 99668 99556 99518 99463 99478 99733
1790 1789 1788 1788 1788 1788 1788 1788
1788 1791 1788 1788 1788 1788 1788 1788
1788 1788 1788 1788 1788 1789 1788 1788
1788 1791 1788 1788 1788 1787 1788 1790
1788 1790 1785 1788 1789 1788 1788 1790
1790 1790 1789 1788 1788 1788 1788 1788
1788 6886 6895 6883 6891 6980 6974 6918
1791 7054 7010 6938 6954 6939 6948 6946
1795 1780 1795 1784 1782 1795 1795 1795
1791 1786 1775 1785 1796 1788 1788 1791
1788 1789 1788 1788 1788 1788 1788 1788
1788 1788 1788 1788 1788 1788 1788 1788
1788 1789 1788 1788 1788 1788 1788 1788
1788 1785 1786 1785 1785 1788 1789 1788
1788 7000 6887 6892 6883 6887 6893 6916
1788 6949 6939 7044 6961 6942 6938 6960
movdqu
1424 97036 97732 97214 97761 97074 97414 97087
1423 97468 97797 97900 97692 97607 97505 97435
1423 1423 1423 1423 1424 1423 1423 1423
1423 1423 1423 1423 1423 1423 1424 1424
1423 1423 1423 1423 1423 1423 1424 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 1423 1423 1424 1415 1425 1425 1421
1425 6047 6125 6119 6046 6046 6055 6047
1419 6055 6061 6073 6066 6056 6053 6141
1424 1421 1423 1424 1423 1423 1423 1423
1423 1422 1423 1423 1423 1424 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1422 1425 1420 1427 1421 1420 1425 1423
1424 1420 1423 1425 1424 1423 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 6062 6052 6048 6046 6057 6053 6057
1423 6122 6105 6056 6054 6063 6057 6139
lddqu
1423 97293 97649 97428 97598 97617 97441 97329
1423 97688 98614 98610 98246 98664 97656 99034
1426 1417 1422 1425 1423 1422 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 1423 1423 1423 1423 1425 1423 1423
1423 1423 1423 1423 1424 1423 1423 1423
1423 1423 1423 1423 1424 1423 1425 1423
1423 6118 6092 6119 6074 6136 6192 6133
1424 6120 6123 6212 6150 6143 6144 6127
1421 1422 1423 1423 1423 1423 1423 1422
1423 1423 1426 1423 1423 1423 1423 1424
1423 1423 1423 1423 1423 1423 1423 1423
1423 1425 1423 1423 1423 1423 1423 1423
1423 1425 1423 1423 1423 1423 1423 1423
1423 1423 1423 1423 1423 1423 1423 1423
1423 6206 6114 6106 6165 6136 6104 6121
1424 6156 6142 6181 6142 6174 6127 6105
palignr
1456 2395 2396 2395 2401 2395 2399 2395
1455 2395 2396 2401 2396 2395 2395 2395
1455 1455 1455 1455 1455 1455 1456 1458
1456 1455 1455 1455 1455 1455 1456 1455
1455 1455 1456 1455 1455 1455 1456 1455
1455 1455 1457 1455 1455 1455 1455 1517
1455 1455 1456 1455 1457 1455 1455 1455
1456 1455 1455 1455 1456 1458 1455 1455
1455 2396 2395 2396 2395 2397 2395 2395
1457 2395 2395 2396 2395 2395 2398 2395
1455 1456 1455 1455 1455 1455 1455 1455
1457 1455 1455 1455 1455 1456 1457 1455
1455 1456 1455 1455 1455 1457 1455 1455
1455 1456 1455 1455 1455 1455 1455 1455
1455 1456 1457 1455 1455 1455 1455 1455
1455 1456 1457 1455 1455 1455 1455 1455
1455 2399 2395 2399 2395 2395 2399 2395
1457 2396 2395 2396 2395 2395 2396 2379
GmorG McRoth
17th November 2007, 10:59
Name: Intel Pentium III EB
Codename: Coppermine
Stock frequency: 800 MHz
nop: 330
movq8
753 753 752 752 752 752 752 752
752 16004 15953 15949 15983 16037 15950 15995
752 752 752 752 752 753 752 753
752 752 752 753 752 752 752 752
753 752 752 752 752 752 752 752
753 2565 2564 2564 2564 2563 2566 2564
752 753 752 753 752 752 752 753
752 753 752 753 752 752 752 752
752 752 752 752 753 752 752 752
753 2563 2564 2566 2563 2566 2563 2565
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 753
752 752 753 752 753 752 752 752
753 2564 2564 2563 2563 2564 2564 2564
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 753
752 2563 2563 2567 2564 2567 2563 2564
movq16
1301 16737 16429 16401 16391 16386 16374 16456
1299 16001 15984 16026 16039 15999 15988 16007
1299 1300 1300 1299 1299 1300 1301 1299
1299 1299 1299 1301 1301 1299 1299 1299
1299 2920 2917 2915 2917 2916 2916 2916
1299 2883 2880 2883 2882 2880 2884 2881
1299 1299 1299 1300 1299 1301 1300 1299
1299 1300 1300 1299 1300 1299 1299 1299
1299 2917 2916 2916 2917 2915 2917 2916
1300 2881 2884 2883 2880 2880 2881 2883
1299 1299 1299 1300 1299 1299 1299 1299
1299 1299 1301 1300 1299 1299 1299 1300
1299 2916 2915 2917 2917 2916 2916 2919
1299 2880 2881 2880 2881 2881 2881 2880
1299 1299 1300 1300 1299 1299 1299 1300
1300 1299 1300 1299 1299 1299 1299 1300
1300 2916 2916 2917 2916 2915 2917 2917
1299 2882 2880 2880 2883 2881 2881 2883
akupenguin
17th November 2007, 13:21
Updated benchmark program: bench_align_v28.7z (http://akuvian.org/src/x264/bench_align_v28.7z)
Slightly improved palignr, and added mmx and sse2 functions. Please run the new version if you have an intel cpu without sse3.
@TEB: is that a Core, or a Core2? Because I'm interested in Core results, but wiki says that Core doesn't have SSSE3.
clsid
17th November 2007, 13:31
Link gives a 404.
clsid
17th November 2007, 13:34
(results from old bench)
AMD Athlon Thunderbird 1.33 GHz
nop: 110
movq8
469 673 674 673 674 674 674 674
445 516 516 516 515 516 516 515
446 517 516 516 516 517 516 516
445 539 539 538 538 538 538 538
448 514 514 514 514 515 515 515
446 514 514 514 515 514 514 514
445 515 514 515 515 515 514 514
446 514 515 515 515 515 514 514
457 674 673 673 673 673 673 673
445 516 515 516 515 516 515 516
446 516 517 517 517 516 516 517
445 539 538 538 538 540 539 540
451 514 515 515 515 514 514 514
445 514 515 515 514 514 515 514
446 515 514 514 515 514 514 514
446 515 514 515 515 515 515 515
471 674 674 673 673 673 674 674
446 516 516 516 515 516 515 515
movq16
678 874 874 874 874 874 874 874
689 933 932 933 932 933 932 932
686 869 869 869 869 869 869 869
686 870 870 870 870 870 870 870
691 863 863 863 863 863 863 863
682 869 869 869 869 869 869 869
686 931 932 932 931 931 932 932
684 927 927 927 927 927 927 927
682 874 874 874 874 874 874 874
681 932 932 932 932 933 932 933
685 869 869 869 869 869 869 869
691 870 870 870 870 870 870 870
691 863 863 863 866 863 863 863
692 869 869 869 869 869 869 869
685 931 932 932 931 932 932 932
682 927 927 927 927 927 927 927
678 874 874 874 874 874 874 874
690 932 933 932 933 933 932 932
GmorG McRoth
17th November 2007, 13:51
Updated benchmark program: bench_align_v28.7z (http://akuvian.org/src/x264/bench_align_v28.7z)
Slightly improved palignr, and added mmx and sse2 functions. Please run the new version if you have an intel cpu without sse3.
Name: Intel Pentium III EB
Codename: Coppermine
Stock frequency: 800 MHz
nop: 330
movq8: avg 12102
760 760 759 759 759 759 759 759
760 16008 15984 16040 15973 15949 15989 15978
759 759 759 759 759 759 759 759
759 759 760 759 759 759 759 759
759 760 759 760 759 760 759 760
759 2573 2575 2564 2564 2565 2568 2564
759 759 759 760 759 759 759 760
759 760 759 759 759 760 759 759
759 759 759 759 760 759 759 759
760 2565 2565 2563 2564 2565 2564 2563
759 759 759 759 760 759 760 759
759 759 759 759 759 759 760 759
759 760 759 759 759 759 759 759
759 2565 2563 2564 2566 2564 2564 2565
759 760 759 759 759 759 759 760
759 760 759 759 759 759 759 759
759 759 760 759 759 759 759 759
760 2563 2563 2565 2564 2564 2566 2564
psllq8: avg 10651
783 781 781 782 781 781 781 781
781 1264 1263 1263 1263 1263 1263 1264
781 781 781 781 782 781 782 781
781 782 781 781 781 781 781 781
781 781 781 782 781 782 781 781
781 2594 2592 2592 2595 2594 2593 2591
782 781 781 782 781 781 781 781
781 781 781 782 781 781 781 782
781 781 781 781 781 781 781 781
781 1265 1264 1263 1263 1264 1263 1263
781 781 781 781 781 781 781 781
781 781 781 782 781 782 781 781
781 781 781 781 781 782 781 782
781 2592 2594 2593 2592 2592 2593 2593
781 781 781 781 781 781 781 782
781 781 781 781 781 782 781 781
781 782 781 781 781 781 781 781
781 1264 1263 1264 1263 1263 1263 1263
movq16: avg 20686
1284 16389 16389 16382 16402 16415 16386 16391
1284 16017 15992 15987 16024 15981 15980 16004
1284 1286 1285 1285 1284 1284 1285 1285
1284 1286 1285 1284 1284 1284 1286 1284
1285 2931 2930 2931 2930 2933 2930 2930
1284 2865 2863 2863 2863 2868 2863 2863
1285 1284 1286 1285 1284 1284 1285 1284
1286 1284 1284 1284 1285 1284 1284 1285
1284 2930 2934 2930 2930 2931 2931 2930
1284 2863 2863 2863 2863 2865 2863 2863
1286 1285 1284 1284 1284 1285 1284 1285
1285 1284 1284 1284 1285 1284 1286 1285
1284 2929 2932 2930 2930 2933 2930 2929
1285 2865 2865 2862 2867 2865 2862 2864
1284 1285 1285 1284 1284 1284 1286 1285
1284 1285 1284 1284 1284 1286 1285 1284
1285 2933 2932 2931 2930 2930 2930 2929
1285 2865 2864 2863 2863 2864 2862 2865
Inventive Software
17th November 2007, 13:59
Intel Celeron 800 MHz (Coppermine core, MMX, SSE):
nop: 330
movq8: avg 12096
760 759 759 759 759 759 759 759
759 15943 15958 15932 15949 15949 15936 15939
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 2564 2564 2564 2564 2564 2564 2564
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 2564 2564 2564 2564 2564 2564 2564
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 2564 2564 2564 2564 2564 2564 2564
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 759 759 759 759 759 759 759
759 2564 2564 2564 2564 2564 2564 2564
psllq8: avg 10648
783 781 781 781 781 781 781 781
781 1263 1263 1263 1263 1263 1263 1263
782 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 2592 2592 2592 2592 2592 2592 2592
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 1263 1263 1263 1263 1263 1263 1263
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 781 781 781 781 781 782 781
781 2592 2592 2592 2592 2592 2592 2592
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 781 781 781 781 781 781 781
781 1263 1263 1263 1263 1263 1263 1263
movq16: avg 20681
1285 16365 16368 16367 16354 16362 16370 16433
1284 16037 16000 16029 16002 16024 15999 16015
1284 1284 1284 1284 1284 1284 1284 1284
1284 1284 1284 1284 1284 1284 1284 1284
1284 2930 2930 2930 2930 2930 2930 2930
1284 2863 2863 2863 2863 2863 2863 2862
1284 1284 1284 1284 1284 1285 1284 1284
1284 1284 1284 1284 1284 1284 1284 1284
1284 2930 2935 2938 2936 2932 2938 2930
1284 2863 2862 2863 2863 2871 2862 2863
1284 1284 1284 1284 1284 1284 1284 1284
1284 1284 1284 1284 1284 1284 1284 1284
1285 2930 2930 2934 2930 2930 2930 2930
1285 2863 2863 2862 2863 2862 2863 2863
1284 1284 1284 1284 1284 1284 1284 1284
1284 1284 1284 1284 1284 1284 1284 1284
1284 2930 2930 2930 2930 2930 2930 2930
1284 2862 2863 2863 2863 2863 2863 2863
AMD Turion X2 @ 1.6 GHz (MMX, SSE, SSE2, SSE3, both 3DNow extensions):
nop: 99
movq8: avg 5371
459 667 667 669 669 669 669 669
440 543 545 544 544 543 542 542
445 497 498 497 498 497 515 515
475 517 516 516 516 516 516 515
473 517 515 498 517 518 518 515
440 497 497 512 515 515 516 516
473 534 534 534 535 534 534 534
466 516 516 516 516 516 516 516
481 667 667 667 668 669 669 669
443 544 544 543 543 543 544 544
445 499 515 515 515 515 515 515
465 500 499 499 512 515 515 515
458 499 498 499 498 498 498 517
468 516 516 516 500 497 498 498
436 534 534 534 534 523 534 516
436 499 500 516 516 516 516 516
481 667 667 669 669 667 667 667
463 557 551 544 544 549 557 557
psllq8: avg 5783
461 669 668 669 670 669 668 670
466 774 778 776 769 749 752 752
442 525 523 537 533 523 523 534
467 548 536 536 541 532 523 524
446 538 548 553 541 541 538 552
470 536 536 554 551 540 540 543
471 542 551 542 552 545 541 541
434 543 556 557 554 554 555 557
469 670 670 669 670 670 670 669
435 752 752 750 751 752 751 752
439 523 535 529 530 536 537 528
434 529 526 523 523 532 554 536
465 548 541 552 537 537 548 547
472 536 548 551 535 523 532 537
434 542 546 544 542 549 551 543
462 554 555 557 553 553 553 558
470 669 669 669 670 669 669 670
462 775 776 775 776 776 777 755
movq16: avg 8373
700 837 837 837 837 837 837 838
712 829 842 842 838 841 842 842
720 852 854 863 838 838 838 838
691 839 839 839 839 840 840 840
696 839 839 839 839 839 840 839
694 848 848 848 857 860 849 849
692 842 853 839 837 837 837 837
693 837 837 837 837 837 844 853
720 851 851 851 851 851 851 852
733 842 842 842 842 842 842 842
718 852 853 854 852 852 852 852
719 851 851 851 852 851 851 851
719 852 852 852 852 853 838 838
700 847 847 847 848 847 847 857
714 852 852 851 837 837 837 837
687 838 838 837 837 838 839 837
695 837 837 845 851 851 851 849
712 827 842 842 842 842 834 827
movdqu: avg 9548
712 1028 1025 1007 1000 1008 1020 1028
721 972 978 978 979 980 979 980
705 965 972 978 979 974 981 980
711 979 979 978 980 979 980 979
708 982 982 982 981 981 980 981
707 964 978 979 963 980 980 979
706 985 967 965 966 965 965 966
703 993 993 999 992 991 992 991
733 1011 1000 1000 1000 1000 1000 1001
698 960 978 979 980 958 958 968
707 980 979 978 973 971 980 980
710 969 958 958 957 957 958 958
680 961 959 959 960 959 960 976
705 980 978 965 958 958 958 958
683 973 987 987 987 987 987 988
706 991 991 991 979 971 971 971
710 1001 1004 1000 1001 1030 1030 1030
719 980 980 980 979 978 978 979
lddqu: avg 9552
711 1008 1008 1019 1017 1006 1013 1006
701 979 977 978 974 958 957 958
680 957 957 958 958 957 958 958
703 978 979 978 964 957 968 979
710 980 982 973 974 981 980 964
679 958 974 979 978 979 961 960
706 986 986 987 988 972 988 987
710 991 972 986 992 992 986 971
710 1008 1024 1023 1027 1011 1000 1022
721 978 979 962 957 957 957 957
683 958 958 973 979 991 992 958
679 958 958 959 970 979 979 980
705 983 981 980 982 983 981 981
706 980 975 958 958 958 958 957
690 988 988 988 986 986 988 988
710 991 991 991 992 991 991 992
723 1008 1008 1008 1006 1029 1025 1026
720 979 978 980 969 966 978 978
pslldq: avg 11237
705 1720 1720 1717 1719 1719 1739 1723
719 1740 1726 1730 1738 1732 1715 1738
720 993 995 981 996 990 999 982
693 976 990 990 990 990 990 991
719 998 996 992 996 1001 999 992
719 996 990 990 996 999 982 973
692 978 978 978 981 982 984 984
693 980 975 970 970 983 984 979
701 1726 1721 1724 1738 1741 1748 1743
693 1719 1720 1735 1722 1742 1745 1718
695 994 974 984 999 999 999 996
703 990 993 996 996 990 990 990
719 980 977 975 978 992 989 986
693 974 971 971 971 971 974 976
692 978 978 980 980 984 990 982
693 970 984 993 1006 998 988 988
723 1722 1730 1746 1737 1728 1724 1727
693 1741 1741 1734 1741 1733 1724 1725
EDIT: I don't see how this is gonna help development.... can Dark Shikari or akupenguin elaborate a bit more as to why we're guinea pigs here? ;)
canTsTop
17th November 2007, 14:01
AMD Athlon XP 1700+ Thoroughbred
nop: 110
movq8: avg 5384
464 667 668 667 667 667 667 668
445 514 514 514 514 515 515 515
446 515 514 514 514 514 514 514
445 544 543 532 532 532 532 532
454 515 514 514 515 515 515 515
445 514 514 514 514 514 514 515
446 535 535 534 535 534 534 534
445 516 517 517 516 516 516 516
457 668 667 667 667 667 667 667
446 515 515 515 515 515 514 514
445 514 514 514 514 515 515 515
446 532 539 539 532 532 532 533
448 514 515 515 515 515 514 514
445 514 514 514 515 515 515 515
445 534 534 534 534 534 534 535
442 517 517 517 517 516 516 516
457 668 667 667 667 667 667 667
446 514 514 514 515 515 515 515
psllq8: avg 5807
463 662 662 662 662 662 662 662
451 838 837 837 837 837 837 837
445 515 515 515 515 515 515 517
451 532 532 532 534 539 537 536
448 537 537 533 535 532 532 532
451 517 517 518 518 518 518 518
447 526 526 526 526 526 528 528
451 564 564 564 565 564 564 564
473 662 662 662 662 662 662 662
451 838 837 837 837 837 837 837
445 515 516 518 518 518 518 516
451 539 539 539 539 533 532 537
448 537 536 532 532 532 536 537
445 515 515 515 515 518 516 515
445 526 526 526 526 526 526 527
451 564 564 564 564 565 565 564
465 662 662 662 662 662 662 662
451 837 837 837 837 837 837 838
movq16: avg 8773
687 869 869 869 869 869 869 869
682 931 931 931 931 931 931 931
682 869 869 869 869 869 869 869
687 871 871 871 871 871 871 871
690 863 863 863 863 863 863 863
692 870 870 870 870 870 870 870
678 937 937 937 937 936 937 937
682 927 927 927 927 927 927 927
682 869 869 869 869 869 869 869
689 932 932 932 931 931 931 931
682 869 869 869 869 869 869 869
682 871 871 871 871 871 871 871
690 863 863 863 863 863 863 863
686 870 870 870 870 870 870 870
684 936 936 937 937 937 936 936
685 927 927 927 927 927 927 927
682 869 869 869 869 869 869 869
690 932 932 932 931 931 931 931
celtic_druid
17th November 2007, 14:31
VIA C7 @ 1.5 GHz under Xubuntu; code from the above link with patch applied and linked against current SVN.
nop: 220
movq8
1022 1020 1020 1020 1020 1020 1022 1020
1020 2547 2548 2542 2564 2546 2546 2546
1023 1020 1020 1020 1022 1020 1020 1020
1020 1021 1021 1022 1020 1022 1020 1022
1020 1020 1022 1020 1020 1020 1022 1020
1020 2544 2546 2549 2546 2550 2547 2547
1020 1020 1020 1021 1022 1020 1022 1020
1022 1020 1020 1020 1020 1020 1052 1020
1020 1020 1020 1020 1020 1020 1020 1020
1020 2548 2542 2548 2543 2543 2542 2543
1022 1020 1020 1020 1022 1022 1020 1020
1022 1020 1020 1020 1021 1020 1020 1020
1020 1020 1020 1020 1020 1020 1022 1020
1020 2555 2546 2546 2547 2548 2545 2549
1020 1020 1020 1020 1022 1020 1021 1020
1020 1020 1020 1020 1020 1020 1021 1020
1020 1020 1022 1020 1020 1020 1020 1020
1020 2545 2544 2544 2547 2549 2544 2549
movq16
1662 3282 3262 3265 3267 3274 3263 3267
1660 3508 3509 3510 3515 3507 3505 3511
1660 1660 1665 1662 1662 1663 1660 1660
1662 1660 1660 1662 1663 1663 1662 1660
1663 3269 3271 3268 3269 3269 3268 3270
1660 3512 3511 3512 3511 3503 3512 3506
1662 1660 1667 1662 1660 1663 1662 1663
1662 1662 1664 1660 1662 1660 1662 1660
1662 3271 3266 3275 3269 3274 3271 3269
1660 3511 3511 3506 3509 3514 3501 3511
1660 1664 1660 1662 1660 1662 1664 1662
1660 1663 1662 1662 1666 1662 1660 1664
1660 3272 3266 3277 3270 3267 3271 3267
1668 3502 3506 3508 3507 3510 3515 3505
1662 1663 1664 1660 1668 1663 1662 1660
1662 1661 1660 1663 1662 1662 1662 1664
1660 3269 3267 3271 3270 3271 3271 3264
1662 3507 3510 3505 3515 3515 3510 3512
movdqu
1617 3196 3199 3193 3191 3202 3193 3197
1609 3200 3205 3199 3199 3196 3201 3201
1609 1609 1611 1611 1611 1611 1609 1611
1613 1609 1612 1613 1612 1609 1613 1613
1612 3199 3194 3190 3194 3199 3195 3196
1611 3198 3195 3199 3197 3203 3196 3197
1609 1609 1609 1616 1611 1609 1614 1611
1609 1611 1613 1609 1609 1611 1611 1611
1611 3195 3199 3199 3199 3199 3195 3195
1611 3198 3195 3195 3199 3197 3206 3201
1613 1609 1612 1609 1609 1609 1613 1611
1609 1609 1613 1611 1614 1611 1616 1609
1609 3199 3197 3193 3193 3197 3200 3192
1611 3194 3198 3196 3196 3203 3201 3194
1611 1611 1609 1613 1611 1609 1614 1610
1612 1609 1611 1609 1609 1609 1611 1611
1609 3198 3193 3197 3196 3192 3195 3198
1613 3195 3204 3191 3196 3193 3199 3196
lddqu
1612 3196 3199 3194 3197 3197 3200 3194
1613 3201 3200 3199 3195 3196 3203 3196
1609 1615 1609 1609 1611 1609 1611 1611
1609 1611 1611 1609 1615 1611 1609 1614
1611 3199 3191 3200 3198 3189 3196 3198
1611 3194 3194 3200 3195 3203 3204 3199
1613 1609 1613 1611 1612 1611 1610 1611
1609 1611 1609 1609 1611 1610 1614 1614
1611 3193 3195 3196 3201 3193 3200 3196
1613 3200 3195 3196 3194 3202 3195 3199
1612 1610 1611 1612 1617 1609 1611 1611
1611 1613 1612 1612 1611 1611 1612 1611
1612 3196 3197 3204 3200 3198 3199 3191
1609 3196 3194 3193 3203 3198 3198 3200
1611 1609 1618 1609 1610 1609 1611 1609
1613 1613 1612 1613 1611 1611 1609 1611
1609 3195 3201 3199 3193 3197 3198 3195
1611 3197 3196 3197 3198 3193 3196 3194
akupenguin
17th November 2007, 14:42
I don't see how this is gonna help development.... can Dark Shikari or akupenguin elaborate a bit more as to why we're guinea pigs here?
There are certain situations where memory accesses are slow. The total cost of these situations is about the same as the normal computation load of SAD. That is, motion estimation spends up to half its time doing pathological memory loads rather than computing anything useful. There are several ways to avoid the pathological cases, but they have their own costs. The point of this exercise is to find out which versions of the code are useful on which CPUs.
The results so far: AMD CPUs have no pathological cases; they're just fine with x264-svn. Pentium 4D/4E and Core1 can eliminate the pathological cases using SSE3 (wow, it's good for something after all?). Core2 doesn't benefit from the SSE3 approach, and has to use a different workaround. Pre-SSE3 CPUs have to use yet another workaround.
There's also a generic workaround for all cpus, but it requires extra memory, so even if it makes SAD faster it doesn't necessarily make the whole encode faster.
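To make the "pathological" case concrete, here is a rough sketch in Python. It assumes a plain model in which a load is slow exactly when its first and last byte land in different cachelines. The function names are illustrative only and do not come from x264 or bench_align.

```python
def crosses_cacheline(offset, load_bytes, cacheline=64):
    """Return True if a load of `load_bytes` bytes at `offset` touches two cachelines."""
    first_line = offset // cacheline
    last_line = (offset + load_bytes - 1) // cacheline
    return first_line != last_line


def count_split_offsets(load_bytes, cacheline=64):
    """Count the starting offsets within one cacheline that cause a split."""
    return sum(crosses_cacheline(o, load_bytes, cacheline) for o in range(cacheline))


if __name__ == "__main__":
    # Compare 8- and 16-byte loads on 32- and 64-byte cachelines.
    for width in (8, 16):
        for line in (32, 64):
            bad = count_split_offsets(width, line)
            print(f"{width}-byte load, {line}-byte cacheline: {bad}/{line} offsets split")
```

Under this model the share of bad offsets grows with the load width and shrinks with the cacheline size, which is one way to read why the benchmark tests several load widths on each CPU.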
bob0r
17th November 2007, 14:47
bench_align.exe: 69.3 KB (71,050 bytes) md5: 468d4829941c44f93ad0a356a4704327
Intel Core Duo D930 2x3.0GHz
http://x264.nl/bench_align/bench_align_run1.txt
http://x264.nl/bench_align/bench_align_run2.txt
http://x264.nl/bench_align/bench_align_run3.txt
http://x264.nl/bench_align/bench_align_run4.txt
http://x264.nl/bench_align/bench_align_run5.txt
shon3i
17th November 2007, 14:49
AMD Athlon64 2800+ @ 1.8GHz
nop: 79
movq8: avg 5341
460 666 666 666 666 666 666 666
437 543 543 543 544 555 555 555
475 517 517 517 505 498 498 499
437 499 499 509 517 517 517 514
444 508 519 519 519 519 508 501
456 517 505 498 498 498 498 498
445 516 517 534 534 534 534 534
464 518 518 518 503 507 518 518
482 666 666 666 666 666 666 666
437 544 555 555 555 555 555 548
453 498 498 498 498 498 498 512
460 517 516 498 498 498 511 517
476 519 519 519 519 517 501 501
437 513 517 517 517 517 517 517
460 523 516 519 534 534 534 534
464 516 501 518 502 498 499 499
458 666 666 666 666 666 666 666
460 555 555 543 543 543 550 555
psllq8: avg 5752
461 664 664 664 664 664 664 664
468 771 750 750 752 753 753 753
442 538 538 538 538 538 538 538
455 537 537 539 548 548 548 548
464 549 549 540 538 538 538 522
441 529 548 548 548 542 521 521
458 544 542 541 544 544 544 550
451 543 543 543 543 543 559 563
465 664 665 664 664 664 664 665
434 753 773 770 751 753 753 771
470 538 529 521 521 536 537 537
441 541 543 545 534 521 521 528
434 538 538 547 549 549 545 539
445 537 534 521 521 525 521 521
441 544 544 544 544 541 541 542
468 563 563 563 563 563 558 553
466 664 664 665 665 665 665 664
441 757 778 778 778 765 752 753
movq16: avg 8358
696 849 851 843 837 837 837 837
727 856 852 850 841 841 840 835
696 837 837 846 851 849 837 837
709 837 837 837 839 851 847 849
724 851 838 838 838 838 850 839
710 846 846 846 847 861 861 861
705 835 835 836 849 835 835 835
687 844 845 859 859 859 855 844
696 837 851 851 851 843 838 837
708 835 839 838 838 850 850 848
710 837 837 840 846 837 837 837
698 837 840 846 837 849 845 845
702 838 838 838 849 852 845 838
696 846 846 857 861 859 856 846
691 835 846 835 847 847 835 835
693 844 844 856 859 977 844 848
719 837 849 851 841 837 840 851
727 855 856 846 835 839 850 850
movdqu: avg 9945
782 1023 1030 1023 1029 1029 1050 1050
789 1004 1002 1001 1005 1003 1001 1018
805 1029 1018 1003 1002 1018 1027 1009
780 1003 1002 1024 1029 1029 1020 1002
780 1027 1004 1024 1006 1006 1008 1031
806 1006 1004 1004 1006 1027 1027 1027
777 1009 1002 1004 1008 1028 1023 1004
776 1019 1022 1040 1032 1018 1019 1019
781 1046 1029 1048 1049 1047 1034 1029
791 1027 1004 1001 1015 1029 1029 1029
776 1004 1029 1023 1002 1002 1002 1002
781 1003 1004 1009 1027 1028 1003 1003
781 1004 1004 1004 1004 1006 1006 1006
781 1004 1002 1002 1002 1004 1004 1004
776 1003 1002 1002 1002 1002 1009 1028
802 1032 1019 1019 1023 1027 1014 1019
786 1047 1036 1029 1028 1029 1029 1028
779 1005 1001 1002 1001 1003 1001 1006
pslldq: avg 11536
801 1719 1717 1723 1720 1722 1717 1723
792 1723 1727 1715 1728 1715 1729 1718
792 1019 1032 1030 1020 1023 1032 1007
792 1006 1009 1022 1012 1006 1008 1000
792 1002 1002 1002 1005 1034 1034 1031
815 1011 1008 1000 1005 1029 1029 1010
791 1009 1014 1032 1032 1028 1017 1032
818 1051 1051 1047 1019 1030 1012 1016
792 1723 1720 1734 1738 1730 1734 1717
792 1725 1720 1723 1724 1720 1724 1715
792 1001 1012 1014 1020 1010 1032 1024
792 1015 1007 1010 1000 1017 1032 1032
815 1019 1002 1007 1011 1025 1032 1009
792 1022 1029 1018 1000 1008 1007 1031
818 1021 1009 1006 1006 1009 1013 1017
793 1019 1019 1030 1020 1025 1024 1030
810 1735 1724 1730 1738 1738 1716 1728
816 1735 1731 1737 1724 1717 1715 1716
Jerry_Sm@rt
17th November 2007, 14:50
Athlon XP 2000+ @ 1.67GHz
nop: 110
movq8
458 674 674 674 673 673 674 674
446 516 516 516 515 516 516 516
446 517 517 517 517 517 517 517
446 540 540 539 538 538 538 538
448 514 515 515 515 515 514 514
445 514 514 514 514 514 514 514
446 515 515 515 515 515 515 515
446 514 515 515 514 514 514 514
457 673 673 673 674 673 673 673
445 515 516 516 516 516 516 516
446 517 516 516 516 516 516 516
446 540 540 540 540 540 540 540
454 515 515 515 515 515 515 515
445 514 514 514 514 514 514 514
445 515 514 514 514 514 515 515
446 515 514 514 514 515 515 515
473 673 674 674 674 674 674 674
446 516 516 516 516 516 516 516
movq16
678 874 874 874 874 874 874 874
681 932 933 933 933 933 933 933
682 869 869 869 869 869 869 869
684 870 870 870 870 870 870 870
690 863 863 863 863 863 863 863
692 869 869 869 869 870 869 869
688 931 931 931 932 931 932 932
682 927 927 927 927 927 927 927
682 874 874 874 874 874 874 874
688 933 933 933 933 933 932 932
692 869 869 869 869 869 869 869
682 870 870 870 870 870 870 870
692 863 863 863 863 863 863 863
682 869 869 869 869 869 869 869
682 932 932 932 932 932 931 931
685 927 927 927 927 927 927 927
683 874 874 874 874 874 874 874
681 933 933 933 933 932 932 932
akupenguin
17th November 2007, 15:01
OK, we have enough benchmarks now. If someone has a Penryn or a K10, those would be interesting, but if not, that just means we won't optimize for those CPUs yet :) Everything else is covered.
bond
17th November 2007, 15:17
Intel Pentium IIIE, 866 MHz (6.5 x 133)
Coppermine, CuMine, A80526
nop: 330
movq8
754 752 752 752 752 752 752 752
752 15931 15940 15928 15915 15930 15933 15932
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2568 2567 2567 2569 2564 2564 2571
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2564 2564 2564 2567 2567 2569 2567
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2567 2564 2567 2564 2567 2564 2567
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 752 752 752 752 752 752 752
752 2567 2567 2567 2565 2564 2570 2567
movq16
1299 16351 16346 16348 16401 16347 16476 16381
1299 15976 15989 15977 15997 15973 15969 15995
1299 1299 1299 1299 1299 1299 1299 1299
1299 1301 1299 1299 1299 1299 1299 1299
1299 2919 2919 2917 2922 2917 2919 2917
1299 2884 2885 2883 2881 2881 2884 2881
1299 1299 1299 1299 1301 1299 1301 1299
1299 1299 1299 1299 1299 1301 1299 1299
1299 2917 2922 2917 2921 2917 2924 2917
1299 2884 2885 2885 2884 2886 2881 2886
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1299
1299 2919 2919 2917 2917 2921 2919 2917
1299 2884 2886 2881 2881 2887 2881 2881
1299 1299 1299 1299 1299 1299 1299 1299
1299 1299 1299 1299 1299 1299 1299 1301
1299 2917 2922 2921 2919 2921 2919 2922
1299 2883 2881 2891 2885 2886 2881 2881
foxyshadis
18th November 2007, 03:28
bench_align.exe: 69.3 KB (71,050 bytes) md5: 468d4829941c44f93ad0a356a4704327
Intel Core Duo D930 2x3.0GHz
http://x264.nl/bench_align/bench_align_run1.txt
http://x264.nl/bench_align/bench_align_run2.txt
http://x264.nl/bench_align/bench_align_run3.txt
http://x264.nl/bench_align/bench_align_run4.txt
http://x264.nl/bench_align/bench_align_run5.txt
D930 would be Pentium D, but I do have a Core Duo I could put back in if you need it. I also emailed someone who can run Penryn benches, maybe they'll help.
Dark Shikari
18th November 2007, 03:30
D930 would be Pentium D, but I do have a Core Duo I could put back in if you need it. I also emailed someone who can run Penryn benches, maybe they'll help.
That'll be very helpful; Intel claimed to have improved the cacheline performance on Penryn, so it'll be nice to see by how much.
akupenguin
18th November 2007, 04:04
And now for the patch: x264_cachesplit.01.diff (http://akuvian.org/src/x264/x264_cachesplit.01.diff)
You all can do two things to verify the patch:
(*) Check x264's "using cpu capabilities" line. As per the results posted here, it should contain "Cache32" on a Pentium 3, Via C7, and any other CPU with a 32-byte cacheline (5 groups of large numbers per block in the original benchmark, as opposed to 3. Or run CPU-Z.). It should contain "Cache64" on other Intel CPUs. AMD CPUs shouldn't show any "Cache" entry. If anyone sees "Cache?", that warrants further investigation.
(*) Benchmark it relative to svn. Since it affects motion estimation, the difference should be most noticeable with umh, esa and/or large numbers of refs, but benchmarks with any settings are useful. The amount of benefit also depends on the video resolution: it helps the most if the frame width is already a multiple of a cacheline, whereas it has to pay a small penalty to pad other resolutions.
edit: I have been notified of some problems. Wait for an update before starting benchmarks.
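The "5 groups as opposed to 3" rule of thumb can be checked mechanically. The following is a minimal Python sketch, not part of the patch or of bench_align: it assumes a benchmark block has already been parsed into rows of integers, treats a row as slow when it contains a timing more than twice the block's median (an arbitrary threshold chosen for this example), and counts runs of consecutive slow rows. The function names and the synthetic test data are made up for illustration.

```python
from statistics import median


def count_slow_groups(rows, factor=2.0):
    """Count runs of consecutive rows that contain a timing far above the median."""
    baseline = median(v for row in rows for v in row)
    groups = 0
    previous_slow = False
    for row in rows:
        slow = max(row) > factor * baseline
        if slow and not previous_slow:
            groups += 1
        previous_slow = slow
    return groups


def guess_cacheline(rows):
    """Map the group count to the label described in the post; None if unclear."""
    return {5: "Cache32", 3: "Cache64"}.get(count_slow_groups(rows))


# Synthetic 18x8 block shaped like the Pentium III movq8 output posted earlier:
# every fourth row (starting with the second) is slow in columns 1..7.
p3_like = [[752] + ([2567] * 7 if r % 4 == 1 else [752] * 7) for r in range(18)]
# Same shape with a period of eight rows, as in the 64-byte-cacheline results.
core2_like = [[393] + ([2717] * 7 if r % 8 == 1 else [393] * 7) for r in range(18)]

if __name__ == "__main__":
    print(guess_cacheline(p3_like), guess_cacheline(core2_like))
```

A block with no outstanding rows gives no label at all under this heuristic, so it says nothing about CPUs that show only mild penalties.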
burfadel
18th November 2007, 05:12
Just wondering, I got results for:
movq8
movq16
movdqu
lddqu
palignr
In the Core 2 Duo tests run so far, has lddqu come up? Or are the Core 2 Duo results for lddqu just not suitable?
Dark Shikari
18th November 2007, 05:13
Just wondering, I got results for:
movq8
movq16
movdqu
lddqu
palignr
In the Core 2 Duo tests run so far, has lddqu come up? Or are the Core 2 Duo results for lddqu just not suitable?
The issue is that lddqu is supposed to solve the problem, but doesn't. That is, on Core 2s, it has the exact same cacheline issue that movdqu has. However, on P4Es, it works correctly. The workaround for the cacheline issue uses palignr on Core 2s and lddqu on P4Es.
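For readers who have not met palignr: it concatenates two 16-byte registers and extracts a 16-byte window at a byte offset given as an immediate. Below is a minimal Python emulation of rebuilding a misaligned 16-byte load from two aligned loads. It only illustrates the idea and is not x264's assembly; the helper names are made up.

```python
def aligned_load(buf, addr, width=16):
    """Load `width` bytes from an address that must be a multiple of `width`."""
    assert addr % width == 0, "aligned loads only"
    return buf[addr:addr + width]


def palignr(high, low, shift):
    """Emulate PALIGNR: join high:low, drop `shift` low-order bytes, keep 16 bytes."""
    return (low + high)[shift:shift + 16]


def unaligned_via_palignr(buf, addr):
    """Rebuild the 16 bytes at `addr` using only aligned loads plus a byte shift."""
    base = addr & ~15                      # round down to a 16-byte boundary
    low = aligned_load(buf, base)
    high = aligned_load(buf, base + 16)
    return palignr(high, low, addr - base)


if __name__ == "__main__":
    data = bytes(range(128))
    print(unaligned_via_palignr(data, 57) == data[57:73])
```

Because the shift is an immediate in the real instruction, actual code would presumably need a separate variant per shift value rather than the single function used here.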
burfadel
18th November 2007, 05:21
Ah ok! I missed that you updated the programme, there's now:
movq8
psllq8
palignr8
movq16
movdqu
lddqu
pslldq
palignr16
I guess the other instruction results don't affect the purpose of the test?
akupenguin
18th November 2007, 08:33
x264_cachesplit.02.diff (http://akuvian.org/src/x264/x264_cachesplit.02.diff)
Now with cacheline size detection fixed for older cpus.
TEB
18th November 2007, 23:42
Aku, my bad, it was a typo; it's an Intel Mobile Core 2 Duo T7300
MMX,SSE1,2,3,EM64T
AGDenton
19th November 2007, 17:38
akupenguin,
I'm getting "Invalid effective address" errors in the SAD16_CACHELINE_FUNC macro (line 500) from common/amd64/pixel-sse2.asm when PIC is enabled, with current yasm. I noticed it because I'm on x86_64-darwin, which makes PIC mandatory, but the same occurs on 64-bit Linux; just try
yasm -f elf64 -D__PIC__ -m amd64 -DPREFIX -DHAVE_SSE3 -Icommon/amd64 -o common/amd64/pixel-sse2.o common/amd64/pixel-sse2.asm
Could you look into it?
AG
akupenguin
19th November 2007, 19:36
x264_cachesplit.03.diff (http://akuvian.org/src/x264/x264_cachesplit.03.diff)
fixed pic.
AGDenton
19th November 2007, 21:10
I had to replace 'movd mm6, [sw_64]' with 'movd mm6, [sw_64 GLOBAL]' in SAD8_CACHELINE_FUNC to get past a "macho: sorry, cannot apply 32 bit absolute relocations in 64 bit mode" error (which is Darwin-specific)... Was that the right thing to do?
The rest compiled fine once I did that; however, running this build yields
x264 [info]: using cpu capabilities: MMX MMXEXT SSE SSE2 SSSE3
No mention of Cache64 or SSE3. I'm using Xeon 5160s (Core2, 3GHz). Is there anything I can do?
AG
akupenguin
19th November 2007, 21:56
what does cpuid.0.diff (http://akuvian.org/src/x264/cpuid.0.diff) print? (to be added on top of x264_cachesplit.03.diff)
AGDenton
19th November 2007, 22:28
Er... never mind... I mixed up x264 and ./x264. That'll teach me... Now everything's working fine:
720x258 | threads 6 | ref 16 | me umh:
r694: 19.1 fps
r694+cachesplit: 22.7 fps
That's quite a boost!
burfadel
19th November 2007, 23:02
With 16 reference frames, I guess the fast ref search would give a nice speed boost on top of that? I don't know whether the patches are compatible, though!
Dark Shikari
19th November 2007, 23:47
With 16 reference frames, I guess the fast ref search would give a nice speed boost on top of that? I don't know whether the patches are compatible, though!
Completely compatible; any "incompatibility" would simply be a mismatch of source line numbers that would be easy to correct.
This patch is only a change to the ASM code, really.
salehin
19th November 2007, 23:56
The file has expired - can you please upload it again, Mr.Hunter :D
burfadel
20th November 2007, 00:15
Completely compatible; any "incompatibility" would simply be a mismatch of source line numbers that would be easy to correct.
This patch is only a change to the ASM code, really.
That's good! A build with both patches (or when both patches are submitted to svn) should result in a very nice speed increase indeed!
bob0r
20th November 2007, 09:21
cache split available in x264 svn now:
------------------------------------------------------------------------
r696 | pengvado | 2007-11-20 07:07:17 +0100 (Tue, 20 Nov 2007) | 4 lines
avoid memory loads that span the border between two cachelines.
on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd are unaffected.
overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.
------------------------------------------------------------------------
r695 | pengvado | 2007-11-20 06:57:29 +0100 (Tue, 20 Nov 2007) | 2 lines
add cache info to cpu_detect. also print sse3.
Here is some nice info about it:
http://trac.videolan.org/x264/changeset/696
Core2 (Conroe) can load unaligned data just as quickly as aligned data...
unless the unaligned data spans the border between 2 cachelines, in which
case it's really slow. The exact numbers may differ, but all Intel cpus
have a large penalty for cacheline splits.
(8-byte alignment exactly half way between two cachelines is ok though.)
LDDQU was supposed to fix this, but it only works on Pentium 4.
So in the split case we load aligned data and explicitly perform the
alignment between registers. Like on archs that have only aligned loads,
except complicated by the fact that PALIGNR takes only an immediate, not
a variable alignment.
It is also possible to hoist the realignment to the macroblock level (keep
2 copies of the reference frame, offset by 32 bytes), but the extra memory
needed for that method makes it often slower.
sad 16x16 costs on Core2:
good offsets: 49 cycles (50/64 of all mvs)
cacheline split: 234 cycles (14/64 of all mvs. amortized: +40 cycles)
page split: 3600 cycles (14/4096 of all mvs. amortized: +11.5 cycles)
cache or page split with palignr: 57 cycles (amortized: +2 cycles)
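The amortized figures in the commit message can be approximately reproduced with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above. One assumption is made to match the page-split figure: page splits are treated as a subset of the cacheline splits, so only their cost beyond an ordinary cacheline split is added.

```python
# Figures quoted in the r696 commit message (cycles per 16x16 SAD on Core2).
GOOD = 49
CACHELINE_SPLIT = 234
PAGE_SPLIT = 3600
WITH_PALIGNR = 57

SPLIT_FRACTION = 14 / 64      # share of motion vectors hitting a cacheline split
PAGE_FRACTION = 14 / 4096     # share of motion vectors hitting a page split

# Extra cost of cacheline splits, averaged over all motion vectors.
split_extra = (CACHELINE_SPLIT - GOOD) * SPLIT_FRACTION
# Assumption: page splits are already counted among the cacheline splits,
# so only the cost beyond a cacheline split is added here.
page_extra = (PAGE_SPLIT - CACHELINE_SPLIT) * PAGE_FRACTION
# Extra cost when every split goes through the palignr path instead.
palignr_extra = (WITH_PALIGNR - GOOD) * SPLIT_FRACTION

before = GOOD + split_extra + page_extra
after = GOOD + palignr_extra
speedup = before / after

if __name__ == "__main__":
    print(f"split: +{split_extra:.1f}, page: +{page_extra:.1f}, palignr: +{palignr_extra:.2f}")
    print(f"average before: {before:.1f}, after: {after:.2f}, ratio: {speedup:.2f}")
```

Under that reading the three extras come out close to the quoted +40, +11.5 and +2 cycles, and the before/after ratio is close to the "average of 2x faster" claimed for x264_pixel_sad.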
wiak
21st November 2007, 02:42
File has expired :(
wiak
8th April 2008, 19:09
Phenom 9850 Black Edition
bench_align_v28
nop: 683
movq8: avg 4612
385 386 384 387 385 385 386 384
385 494 491 490 492 457 457 454
376 376 377 377 377 376 382 379
379 391 391 390 391 391 391 391
377 377 378 383 376 376 377 377
382 393 393 391 390 392 392 392
384 382 379 382 377 378 383 378
384 390 392 389 392 389 390 391
377 381 384 380 380 384 384 381
378 393 391 391 395 392 392 390
383 383 383 378 376 376 383 384
378 392 393 392 391 392 392 392
381 379 381 379 384 383 384 384
383 394 395 396 393 394 392 391
377 383 383 378 377 382 379 377
377 444 444 445 445 446 445 445
387 384 383 384 383 386 387 386
385 491 489 490 489 457 457 458
psllq8: avg 5043
399 397 398 397 385 385 385 386
385 746 750 747 747 749 748 746
392 390 387 387 389 392 392 391
387 398 397 398 398 404 403 402
392 386 386 387 394 388 387 387
390 397 397 397 397 403 403 402
392 390 388 392 392 392 392 391
387 424 421 417 420 408 409 412
391 388 393 391 387 393 392 387
392 746 746 747 746 746 749 744
391 393 388 391 390 390 390 389
392 398 397 398 398 404 404 406
391 393 393 391 387 387 392 386
387 398 398 397 398 407 407 406
391 389 387 387 387 389 388 387
389 460 461 460 460 462 463 465
385 387 386 386 386 386 386 385
393 746 744 745 744 745 744 745
movq16: avg 7182
622 788 788 787 789 786 788 791
617 640 643 640 641 643 639 641
633 637 639 635 639 640 637 637
633 638 641 640 640 642 637 638
635 635 640 636 637 640 638 636
635 638 638 640 637 639 637 637
636 644 644 645 643 646 643 643
637 637 639 638 638 639 639 638
634 640 636 635 639 639 638 638
631 638 638 639 637 639 640 638
630 636 636 637 638 636 642 637
631 641 638 640 641 637 638 639
631 640 639 639 641 637 636 637
634 641 637 638 639 638 639 639
633 706 704 707 705 705 702 705
638 641 643 641 640 641 641 642
623 788 786 787 786 788 788 786
621 640 640 641 640 641 641 641
movdqu: avg 5286
368 508 508 507 507 507 507 507
508 508 507 510 507 508 508 507
358 458 457 457 458 457 458 457
457 460 458 457 457 458 458 457
357 457 457 457 457 457 457 457
457 457 458 459 458 457 457 457
358 458 458 459 459 459 459 461
459 459 459 459 459 458 458 458
358 462 462 463 462 462 462 462
461 462 463 462 462 462 462 462
360 459 459 458 458 461 458 459
460 458 458 458 460 458 458 459
359 457 460 457 457 457 457 457
457 457 457 458 457 458 459 457
358 480 481 478 479 477 477 477
477 477 477 477 477 476 477 477
366 507 507 507 507 507 507 508
507 507 508 507 507 507 507 507
lddqu: avg 5286
366 512 507 507 507 507 507 507
507 507 507 508 507 506 507 506
359 458 458 457 457 457 458 457
458 458 458 458 458 458 457 458
357 458 458 457 457 457 457 457
458 457 457 458 458 458 459 458
358 459 459 458 458 458 459 458
459 459 459 458 459 459 458 458
358 462 462 462 462 462 462 464
462 462 462 462 462 462 462 462
361 458 458 459 461 458 460 458
458 459 459 459 459 458 459 458
358 458 457 457 457 457 458 458
458 457 457 458 458 457 457 457
358 477 479 477 477 477 477 476
477 477 477 478 477 477 477 477
365 507 507 507 507 506 507 508
507 508 507 507 508 509 507 507
pslldq: avg 6840
382 1130 1131 1130 1130 1130 1129 1292
544 1130 1130 1131 1133 1130 1133 1130
377 477 478 501 478 477 477 477
477 477 477 477 478 476 476 476
377 477 476 477 477 502 477 477
477 479 477 496 477 477 476 476
377 473 474 474 473 474 474 473
473 473 473 473 474 472 473 477
378 1130 1130 1131 1131 1133 1110 1106
470 1107 1107 1105 1108 1109 1106 1105
378 477 478 479 503 489 477 477
477 477 478 478 478 476 476 478
378 476 477 476 476 476 476 475
476 476 476 477 476 476 476 478
378 522 522 523 523 523 523 523
524 523 523 523 522 520 520 520
380 1108 1106 1105 1107 1111 1105 1105
545 1107 1106 1108 1105 1107 1106 1108
Dark Shikari
8th April 2008, 19:11
Phenom 9850 Black Edition
bench_align_v28
Not surprising.
Though this is a necro I guess, it would be useful to have a Penryn.
wiak
8th April 2008, 19:19
Not surprising.
Though this is a necro I guess, it would be useful to have a Penryn.
Translate into readable non-programming language, or do we need a Rosetta Stone? :p
Dark Shikari
8th April 2008, 19:22
Translate into readable non-programming language, or do we need a Rosetta Stone? :p
A Penryn is the new Intel Core 2...
You called your Athlon a Phenom, so I figured you'd know what a Penryn is... :p
wiak
8th April 2008, 19:25
A Penryn is the new Intel Core 2...
You called your Athlon a Phenom, so I figured you'd know what a Penryn is... :p
athlon where?
Rodger
8th April 2008, 20:57
So where is this benchmark....
45nm Core2Duo E8400 is on its way to me ;)
I could also provide E6600 results if needed.
lexor
8th April 2008, 21:28
A Penryn is the new Intel Core 2...
You called your Athlon a Phenom, so I figured you'd know what a Penryn is... :p
Phenom is Phenom, it's not a codename of Athlon64, so no he didn't. And in his further defense OP does ask for K10 tests (for which Phenom qualifies).
thetoof
9th April 2008, 07:18
It seems you don't have any tests on Quad-cores... I have an Intel® Core™2 Quad Processor Q6600/L2 Cache 8M/Clock Speed 2.40 GHz/Front Side Bus Speed 1066 MHz... Want me to run the test? If yes, please upload it again since the file has expired.
Dark Shikari
9th April 2008, 07:20
It seems you don't have any tests on Quad-cores... I have an Intel® Core™2 Quad Processor Q6600/L2 Cache 8M/Clock Speed 2.40 GHz/Front Side Bus Speed 1066 MHz... Want me to run the test? If yes, please upload it again since the file has expired.
We don't care about the number of cores; all we need now is a Penryn (any chip with SSE4).
Manao
9th April 2008, 07:40
And here's a Harpertown (which is 45nm, so it should count as a Penryn with a cache on steroids):
nop: 330
movq8: avg 7260
394 393 393 393 393 393 393 393
393 29393 29401 29399 29386 29402 29406 29402
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 2718 2720 2717 2717 2717 2727 2727
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 393 393 393 393 393 393 393
393 2717 2718 2717 2717 2717 2717 2722
psllq8: avg 4692
408 407 407 407 407 407 407 407
407 677 674 674 674 674 674 674
408 408 406 407 407 407 406 407
407 407 407 407 406 408 408 407
406 407 407 408 406 408 407 407
407 408 407 407 407 407 407 406
407 408 408 407 408 407 407 407
407 407 407 407 406 408 407 407
407 407 407 407 407 407 407 407
406 674 674 674 674 674 674 674
407 408 407 406 407 407 406 407
407 407 407 406 406 407 408 407
407 407 407 407 406 407 407 407
407 407 407 408 407 408 407 407
406 408 407 408 407 407 407 407
407 407 407 407 407 407 408 406
407 407 407 407 407 407 407 407
407 674 674 674 674 675 674 674
palignr8: avg 4639
409 409 410 410 407 407 407 407
407 615 615 615 615 615 615 615
407 406 407 406 407 408 407 406
407 407 407 407 408 408 407 407
407 408 407 407 407 407 407 407
407 407 407 408 406 407 408 407
408 408 407 407 408 407 407 407
408 407 407 407 407 407 408 407
407 407 407 407 407 407 407 407
407 615 623 628 628 628 628 628
408 407 407 406 407 407 406 407
407 407 407 408 407 408 407 407
407 407 407 407 407 407 407 407
408 407 407 406 407 408 407 408
407 406 407 407 406 407 407 408
408 407 407 407 408 406 408 407
407 408 407 407 407 407 407 407
407 622 628 628 628 628 628 628
movq16: avg 12934
715 29501 29495 29499 29495 29915 29516 29511
716 29462 29442 29454 29445 29447 29442 29824
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 716 715 715
715 715 715 715 715 715 715 715
715 2847 2815 2805 2804 2814 2836 2841
715 2789 2774 2783 2796 2795 2781 2782
715 715 715 715 715 715 715 715
715 715 715 715 713 715 715 715
715 715 715 715 716 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 716 715 715 715
715 2808 2820 2804 2803 2803 2804 2803
715 2746 2745 2745 2745 2745 2745 2745
movdqu: avg 10950
573 28604 28978 29437 29101 28640 28612 28623
565 28614 28633 29271 29407 28888 28635 28629
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 2374 2374 2374 2375 2374 2374 2374
573 2416 2416 2416 2417 2417 2416 2416
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 2374 2374 2374 2374 2374 2374 2374
573 2416 2416 2416 2416 2416 2416 2416
lddqu: avg 10945
578 28615 28618 28612 28624 28612 28620 28612
573 28620 28638 28628 28625 28617 28632 28614
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 2374 2374 2374 2374 2374 2374 2374
573 2437 2416 2439 2417 2416 2416 2416
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 573 573 573 573 573 573 573
573 2374 2374 2374 2374 2374 2374 2374
573 2416 2416 2416 2415 2417 2416 2417
pslldq: avg 6300
585 665 665 665 665 665 665 665
578 665 665 665 665 665 665 665
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 665 665 665 665 665 665 665
578 665 665 665 665 665 665 665
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 577 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 665 665 665 665 665 665 665
578 665 665 665 665 665 665 665
palignr16: avg 6191
578 615 615 615 615 615 615 615
578 615 615 615 615 615 615 615
576 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 615 615 615 615 615 615 615
578 615 615 615 615 615 615 615
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 577 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 615 615 615 615 615 615 615
578 615 615 615 615 615 615 615
Raere
10th April 2008, 03:47
And here's a Harpertown (which is 45nm, so it should count as a Penryn with a cache on steroids):
FYI: Harpertown is the Xeon chip, whereas Penryn is the Core 2 chip. Close enough, though.
Manao
10th April 2008, 05:57
Besides the cache size, are there other differences between Xeon and Core 2?
Gabriel_Bouvigne
10th April 2008, 09:39
Besides the cache size, are there other differences between Xeon and Core 2?
the price :)
Selur
10th April 2008, 11:11
File has expired...
I could run a test with my Q9450 :)
Manao
10th April 2008, 18:29
http://akuvian.org/src/x264/bench_align_v28.7z
Rodger
10th April 2008, 20:51
Here is INTEL Core2Duo E8400 (Wolfdale) incl. SSE4.1
http://www.bilder-space.de/thumb/QH7RMput5DAEBYY.jpg (http://www.bilder-space.de/show.php?file=QH7RMput5DAEBYY.jpg)
Selur
11th April 2008, 08:56
Q9450 delivers:
nop: 329
movq8: avg 7259
397 396 396 396 396 396 396 396
396 29562 29426 29413 29384 29461 29432 29383
392 395 394 391 395 395 394 391
395 391 395 391 395 391 395 392
391 394 391 393 395 391 395 391
391 392 395 391 395 391 394 395
391 391 395 391 392 396 391 392
395 391 395 391 395 391 395 391
395 392 395 393 395 395 394 395
391 2717 2718 2717 2717 2717 2717 2717
391 391 395 395 391 393 391 391
395 395 391 391 395 395 391 394
395 391 395 395 391 391 395 395
392 391 391 393 392 391 395 395
391 395 394 391 391 395 395 391
391 395 395 391 391 395 395 391
395 395 394 395 394 395 394 395
395 2717 2717 2718 2717 2717 2717 2718
psllq8: avg 4697
409 407 407 407 407 407 407 407
407 679 679 679 679 679 679 679
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 408
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 679 679 679 679 679 679 679
407 407 407 407 407 407 407 407
407 407 407 407 407 408 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 679 679 679 679 679 679 679
palignr8: avg 4641
410 409 409 409 409 409 409 409
409 596 624 638 638 638 637 638
407 407 407 407 407 407 407 407
407 407 407 408 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
408 634 638 638 638 638 638 638
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 407 407 407 407 407 407 407
407 618 618 618 618 618 618 618
movq16: avg 12902
715 29516 29513 29507 29504 29504 29503 29509
715 29489 29435 29473 29461 29520 29456 29437
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 717 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 2803 2803 2803 2803 2803 2803 2803
715 2745 2744 2744 2745 2744 2744 2746
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 715 715 715
715 715 715 715 715 716 715 715
715 715 715 715 715 715 715 715
715 2804 2803 2804 2803 2810 2814 2816
715 2751 2757 2757 2755 2744 2744 2744
movdqu: avg 10943
575 28618 28616 28641 28585 28616 28719 28616
569 28656 28613 28609 28635 28627 28628 28628
575 567 567 567 567 567 575 575
568 575 575 575 575 575 575 575
575 567 575 575 575 575 567 575
575 575 575 575 575 575 567 575
575 568 575 574 575 568 575 575
575 575 575 575 575 574 575 567
575 2373 2373 2374 2374 2373 2373 2373
575 2415 2417 2416 2418 2415 2415 2415
575 575 571 571 575 575 575 567
575 575 574 567 575 575 575 575
575 575 572 575 575 575 567 575
575 575 572 575 575 567 575 575
575 575 568 575 575 575 567 575
575 575 567 575 575 575 567 575
575 2375 2373 2374 2373 2375 2374 2373
575 2415 2416 2415 2416 2416 2415 2416
lddqu: avg 10911
566 28622 28596 28596 28598 28596 28604 28616
567 28660 28584 28582 28583 28582 28583 28582
567 567 559 564 567 567 559 567
567 567 559 567 567 567 567 567
559 567 567 567 567 561 567 559
567 559 567 567 567 559 567 566
567 567 567 560 567 567 567 559
567 567 567 559 567 567 567 567
569 2374 2373 2373 2376 2375 2374 2373
575 2415 2416 2415 2428 2416 2415 2416
575 575 567 575 567 575 575 575
567 575 575 575 567 575 575 575
567 575 575 567 575 575 575 567
575 575 575 575 575 575 567 575
575 575 567 575 575 575 567 575
575 575 567 575 575 575 567 575
575 2373 2374 2373 2373 2373 2373 2374
566 2423 2416 2416 2416 2416 2417 2417
pslldq: avg 6308
581 668 668 668 668 668 668 668
578 668 668 668 668 668 668 668
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 579
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 668 668 668 668 668 668 668
578 668 668 668 668 668 668 668
578 578 578 578 578 578 578 578
578 578 578 579 579 579 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 579 579
578 671 663 671 671 671 663 663
578 671 671 671 664 669 666 679
palignr16: avg 6191
578 615 615 615 615 615 615 615
578 615 615 615 615 615 615 615
578 578 578 578 578 579 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 615 615 615 615 615 615 615
578 615 616 615 615 615 615 615
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 579 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 578 578 578 578 578 578 578
578 615 615 615 615 615 615 615
578 615 615 615 615 615 615 615
Cu Selur
gswudi
11th April 2008, 09:15
AMD Athlon 64 X2 Dual 4400+
nop: 99
movq8: avg 5283
458 667 667 667 667 667 667 667
443 542 542 542 542 542 542 543
444 497 497 497 497 497 497 497
436 497 497 497 498 497 497 497
447 498 498 498 498 498 498 498
438 499 499 498 497 497 497 497
447 516 516 516 516 516 516 516
436 497 498 497 497 497 497 497
457 669 669 669 669 669 669 669
443 544 544 544 544 543 543 543
444 497 497 497 497 498 499 499
437 497 497 497 498 499 499 499
443 501 501 501 501 499 498 498
436 497 497 497 497 497 497 497
436 516 516 516 517 532 534 534
470 517 517 517 517 517 517 517
481 667 669 669 669 669 669 669
436 543 543 542 543 543 543 544
psllq8: avg 5711
467 669 669 669 669 669 669 669
436 752 749 750 749 749 749 749
442 537 537 537 537 537 537 537
434 537 537 537 537 537 524 523
436 524 524 524 524 524 534 538
434 537 535 523 523 523 523 523
445 528 528 528 530 541 541 534
436 547 547 547 543 543 543 543
451 670 670 670 670 669 669 669
436 750 749 749 749 749 750 749
436 523 523 523 523 523 523 523
436 523 536 537 537 537 537 537
434 538 538 538 538 538 538 538
435 523 531 537 537 537 537 537
434 541 541 541 541 541 541 541
435 547 547 547 547 547 547 547
465 669 669 669 669 669 669 669
436 749 749 749 749 749 749 749
movq16: avg 8415
700 837 837 837 837 837 837 837
713 827 828 827 827 827 827 827
696 838 838 838 838 838 838 838
691 839 839 839 839 840 840 840
698 839 839 839 839 839 839 840
698 848 848 848 849 849 849 849
689 838 846 854 853 853 853 853
713 852 852 852 852 852 852 852
724 851 851 851 852 851 851 851
732 842 842 842 842 842 842 842
717 852 852 852 852 852 852 852
724 851 851 851 851 851 852 851
724 852 852 852 852 852 852 852
724 862 862 862 862 862 862 862
714 853 853 853 853 853 853 853
713 852 852 852 852 853 852 852
724 851 851 851 851 851 851 851
732 842 842 842 842 842 842 842
movdqu: avg 9475
711 1000 1000 1000 1000 1000 1000 1000
698 957 958 957 957 957 957 957
683 957 957 957 957 957 957 957
683 958 958 957 957 957 957 957
682 960 959 960 960 960 960 960
679 958 958 958 958 958 958 958
679 966 966 966 967 985 989 989
706 991 991 991 991 991 991 991
733 1018 1017 1017 1017 1006 1000 1002
697 957 957 957 957 957 973 979
708 979 979 979 979 980 979 979
708 979 979 979 979 979 973 957
683 960 960 959 959 959 960 959
683 957 957 957 958 958 958 958
679 966 966 966 966 966 966 966
679 972 971 971 971 971 971 971
711 1008 1008 1008 1008 1008 1008 1008
696 959 958 966 979 979 979 979
lddqu: avg 9595
710 1004 1000 1000 1000 1000 1000 1000
698 957 973 979 979 979 979 979
708 979 979 979 979 979 979 979
706 979 979 979 979 979 979 979
706 981 981 981 981 981 981 981
706 979 979 980 979 979 979 979
706 989 989 989 980 966 975 989
706 991 991 991 991 991 991 991
733 1019 1019 1019 1019 1019 1019 1019
721 979 979 979 979 979 979 979
706 979 979 979 979 980 979 960
683 957 958 958 958 957 958 958
679 960 960 960 960 960 967 981
706 979 979 979 979 979 979 979
706 989 989 989 989 989 989 989
708 991 991 991 991 991 991 991
733 1019 1019 1019 1017 1017 1017 1017
721 979 980 979 979 979 979 979
pslldq: avg 11189
706 1724 1715 1724 1715 1724 1715 1724
693 1716 1715 1724 1715 1724 1715 1726
693 971 971 974 976 976 976 976
693 971 971 971 971 971 971 971
693 975 978 978 978 978 978 979
692 977 990 995 995 995 995 994
719 1007 1007 1007 1007 1007 1007 1007
719 994 993 996 1001 1001 981 970
703 1724 1724 1724 1724 1724 1723 1715
693 1715 1715 1715 1716 1724 1722 1715
692 976 977 977 976 976 976 976
692 976 976 976 976 976 976 976
692 978 978 978 978 978 978 978
692 976 976 976 976 976 976 976
692 978 978 978 978 978 978 978
692 984 984 984 984 984 984 984
701 1715 1715 1735 1730 1730 1730 1730
720 1730 1735 1735 1730 1733 1739 1739
akupenguin
11th April 2008, 18:54
So we have determined that no code changes are needed for Penryn. AMD still doesn't have cacheline split issues, but Phenom does benefit from SSE2, so some new cpu detection is needed there.
Please, no more bench_align results until a new cpu architecture comes out :)
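For anyone wondering what the "new cpu detection" might look like: a minimal sketch, assuming Phenom-class (K10) chips are recognised by the AMD vendor string plus CPUID family 10h. The function names are hypothetical and the sample signature values are only illustrative; this is not x264's actual detection code, just the standard family decoding applied to the EAX value from CPUID leaf 1.

```python
# Hypothetical helpers for illustration only; not taken from x264.

def cpuid_family(eax: int) -> int:
    """Decode the display family from the EAX value of CPUID leaf 1."""
    base = (eax >> 8) & 0xF      # bits 11:8, base family
    ext = (eax >> 20) & 0xFF     # bits 27:20, extended family
    # The extended family is only added when the base family is 0xF.
    return base + ext if base == 0xF else base


def is_amd_family_10h(vendor: str, eax: int) -> bool:
    """True for AMD parts reporting family 10h (assumed here to mean K10/Phenom)."""
    return vendor == "AuthenticAMD" and cpuid_family(eax) == 0x10


# Illustrative signatures (assumed values, not read from real hardware here).
samples = [
    ("AuthenticAMD", 0x00100F22),  # family 10h style signature
    ("AuthenticAMD", 0x00040F32),  # family 0Fh style signature (K8)
    ("GenuineIntel", 0x000006F6),  # family 6 style signature
]
for vendor, eax in samples:
    print(vendor, hex(cpuid_family(eax)), is_amd_family_10h(vendor, eax))
```

A check like this would only pick the code path; whether SSE2 is actually the faster choice on a given chip still has to come from benchmark numbers like the ones posted in this thread.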
Yoshiyuki Blade
12th April 2008, 23:07
Phenom is Phenom, it's not a codename of Athlon64, so no he didn't. And in his further defense OP does ask for K10 tests (for which Phenom qualifies).
Yep, the codenames for Phenom are Barcelona, Agena, Windsor, etc. (AMD tends to use city names for some reason). Athlon pretty much got bumped away, kinda like how Pentium did on the Intel side of things.
Penryn is basically the successor of the Core series (subdivided into Kentsfield for quads, Conroe for duals). The subdivisions of Penryn are Yorkfield (quad core) and Wolfdale (dual core). Damn these confusing codenames :D.
VempX
14th April 2008, 14:04
Core2Duo E6550 @ 490x7 3430MHz
nop: 660
movq8: avg 7875
409 407 407 408 408 408 408 406
393 39821 39840 39791 39958 39922 40100 39850
402 401 402 401 401 401 402 402
402 402 401 401 402 401 402 401
401 402 401 402 401 401 402 401
402 401 401 401 401 401 401 401
401 401 401 401 402 401 401 401
401 401 402 401 401 401 401 401
408 408 408 408 408 408 408 408
402 2730 2737 2726 2735 2727 2729 2732
401 401 401 401 401 401 401 401
401 401 401 401 401 401 401 401
402 401 401 402 401 401 401 401
401 401 401 401 401 401 402 401
401 401 401 401 401 401 401 401
402 402 401 402 402 402 402 402
408 408 408 407 408 407 407 407
402 2726 2749 2757 2754 2753 2763 2752
psllq8: avg 5094
417 415 415 415 415 415 415 415
414 680 680 683 683 681 680 684
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 415 414 414 414 414 414
414 414 414 414 414 414 414 414
414 415 414 414 414 414 414 415
414 414 414 414 414 414 414 414
415 415 414 414 414 414 414 414
414 681 683 681 684 682 681 681
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 683 683 682 682 685 684 685
palignr8: avg 5009
414 414 414 414 414 415 414 414
414 605 604 604 604 604 605 604
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
415 415 414 414 414 414 414 414
414 415 414 414 414 414 414 414
414 604 604 604 604 605 604 604
414 414 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 415 414 414 414 414 414 414
414 414 414 414 414 414 414 414
414 414 414 414 414 414 415 414
414 414 414 414 414 414 414 415
415 415 415 415 414 415 415 415
414 604 604 604 604 604 604 604
movq16: avg 13670
719 40473 40051 39972 40019 40426 40189 40017
717 39797 40033 39986 39891 40062 39928 39881
717 717 717 717 717 717 717 717
717 717 717 717 717 717 718 717
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 717
719 2823 2824 2844 2842 2855 2801 2805
717 2784 2773 2790 2771 2771 2768 2775
717 717 717 717 718 717 717 717
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 718
717 717 717 717 717 717 717 717
717 717 717 717 717 717 717 717
718 2833 2818 2823 2838 2856 2839 2840
717 2787 2807 2767 2770 2772 2781 2778
movdqu: avg 11707
570 39116 39085 39197 39089 39115 39038 39709
570 39207 39300 39179 39094 39134 39034 39439
570 570 570 570 570 570 570 570
570 571 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 2404 2378 2445 2434 2428 2415 2412
570 2454 2454 2466 2456 2460 2460 2461
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
571 570 570 570 570 570 571 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
571 2381 2409 2418 2420 2419 2415 2439
570 2480 2475 2478 2479 2473 2470 2433
lddqu: avg 11715
570 39327 39117 39236 39224 39075 39066 39146
570 39532 39225 39511 39316 39178 39194 39188
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 571 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 2419 2417 2417 2417 2389 2392 2427
570 2463 2458 2438 2430 2465 2460 2475
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 570 570 570 570 570 570 570
570 2436 2439 2439 2432 2434 2424 2425
570 2501 2473 2473 2474 2479 2489 2427
pslldq: avg 7049
600 850 825 825 830 827 825 827
585 847 825 825 825 882 809 808
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 586 585 585
585 859 827 829 828 840 813 828
585 849 822 825 825 842 825 825
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 586 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 853 825 825 825 853 828 825
585 849 825 825 825 826 827 831
palignr16: avg 6665
585 664 662 657 657 660 657 657
585 657 657 657 657 657 657 657
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 586
585 585 585 585 585 585 585 585
585 585 586 585 585 585 585 585
585 585 585 585 585 585 585 585
585 586 585 585 585 585 585 585
585 658 657 657 657 657 657 657
585 658 657 648 657 657 658 657
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 585 585 585 585 585 585 585
585 657 658 657 658 657 664 660
585 657 657 657 657 619 642 657
Inventive Software
14th April 2008, 17:08
So we have determined that no code changes are needed for Penryn. AMD still doesn't have cacheline split issues, but Phenom does benefit from SSE2, so some new cpu detection is needed there.
Please, no more bench_align results until a new cpu architecture comes out :)
:stupid: