Old 16th December 2007, 23:16   #81  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Code:
int foo(int abc) {
  int def;
  __asm {
    mov eax, abc  ;; abc is probably [ebp+8]
    mov def, eax  ;; def is probably [ebp-20]
  }
  return def;
}
Have you worked out how to get ASM listings from the compiler yet?

Always look at the ASM listing from the compiler; your __asm code must fit in with the code the compiler generates, i.e. the whole code must be consistent.


And the magic word you need to reference a code label is offset
Code:
add     ebx, offset jumper
To find these magic words look in the ASM listing, write some C that will use the concept you need and see how the compiler does it.
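
As a hedged illustration of that approach, here is a small C probe function (the name dispatch and its cases are made up for this example). Compiling it with an assembly listing enabled, for instance cl /FAs with MSVC or gcc -S with GCC, lets you read how the compiler references code labels. A dense switch like this is often, though not always, compiled to a jump table.

```c
/* Hypothetical probe function, not from the thread: compile it with an
   ASM listing and read how the compiler addresses the case labels. */
int dispatch(int op, int x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 2:  return x * 2;
    case 3:  return x / 2;
    case 4:  return -x;
    default: return x;   /* any other op leaves x unchanged */
    }
}
```

Whether a jump table actually appears depends on the compiler and optimization settings, so treat the listing, not this sketch, as the reference.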
Old 16th December 2007, 23:32   #82  |  Link
gioowe
Registered User
 
Join Date: Jun 2007
Posts: 95
Quote:
Originally Posted by redfordxx View Post
but this would be slow, wouldn't it?
IIRC, the AMD appendix lists a latency of 1 for most short jumps... but I assume that covers only the evaluation of the condition, and the jump itself then takes some time...

...same as memory reads... they announce latency 2 but they take much longer as sh0dan mentioned
The condition is calculated before. A jump only checks flags.

A correctly predicted jump or an unconditional jump has a latency of 2, assuming that the next instruction is in the L1 code cache. If it was mispredicted, the CPU has to flush all decoded instructions and start again. This takes about 42 cycles. The AMD processor (I don't care about Intel) predicts branches as follows: the first time a conditional branch is seen, it is assumed to be not taken. The second time (at the same address), it is assumed to behave the same as last time. All further times (at the same address), it is assumed to behave as it did the second time.
Old 17th December 2007, 19:38   #83  |  Link
foxyshadis
Angel of Night
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
Quote:
Originally Posted by IanB View Post
And the magic word you need to reference a code label is offset
Code:
add     ebx, offset jumper
To find these magic words look in the ASM listing, write some C that will use the concept you need and see how the compiler does it.
Thanks, I was scratching my head over that when I was trying to test and get the code actually working yesterday.
Old 17th December 2007, 20:32   #84  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Quote:
Originally Posted by IanB View Post
Have you worked out how to get ASM listings from the compiler yet?
Yeah, almost reading like a book;-)
Of course I am learning from that.

I also noticed one funny thing: the compiler translated my jnz to some other kind of conditional jump ;-)

Quote:
Always look at the ASM listing from the compiler, your __asm code must fit in with the code from the compiler. i.e. the whole code must be consistent.
...keeping my code consistent with the code from the compiler... this topic I'll probably leave to the horse for now... he has a bigger head than me ;-)
Old 17th December 2007, 20:41   #85  |  Link
Leak
ffdshow/AviSynth wrangler
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally Posted by redfordxx View Post
I also noticed one funny thing: the compiler translated my jnz to some other kind of conditional jump ;-)
Well, jne (jump if not equal) and jnz (jump if not zero) are the same instruction: both simply test the zero flag in the flags register and jump when it is clear. Maybe that's what happened?

np: Yello - Daily Disco (1980-1985: The New Mix In One Go)
__________________
now playing: [artist] - [track] ([album])
Old 17th December 2007, 21:02   #86  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
@Fizick

I changed the title...d'you like it better?

First I borrowed your Bytes2Float routine and changed it into Bytes2Words... Then I completely replaced it with scalar asm, which gave a significant speedup... I will add MMX and then post it... maybe it would be useful for you if it can be changed back to float...
Old 17th December 2007, 23:06   #87  |  Link
Fizick
AviSynth plugger
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
I also have an SSE-optimized bytes-to-float routine in Vaguedenoiser
(the code was written by Kurosu, though).
But it takes only a very small percentage of the time.

When (if) you make a fast MMX DCT for 16x16 (16x8, 8x16), I will try to add it to MVTools.
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.
Old 18th December 2007, 19:19   #88  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Hello,

here is a new area for me to explore:

Packed integer division on MMX registers... AFAIK it is not in the MMX set. Is there any common instruction set with division? Or is it only available in something AMD-specific, given that I have an AthlonXP? Where should I focus my learning efforts?
Old 18th December 2007, 19:57   #89  |  Link
sh0dan
Retired AviSynth Dev ;)
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
redfordxx: What is the C-equivalent of what you want to do?

MMX cannot do division, but you can multiply by the reciprocal instead, provided the divisor is constant.

For example:
Code:
y = x / 5;
==
y = x * (256 / 5) / 256
==
y = (x * 51) >> 8
Increase 256 to any power of two to get better precision.
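
A minimal C sketch of the precision point, under stated assumptions: the helper names div5_shift8 and div_by_const are invented for this example, and rounding the reciprocal up instead of truncating it is a variation on the formula above, not something taken from the post. With the truncated constant 51 and a shift of 8, small inputs such as 5 already come out one too low; a larger shift shrinks that error.

```c
#include <stdint.h>

/* The formula from the post: x / 5 approximated as (x * 51) >> 8.
   51 is 256 / 5 truncated, so the result can be one too low. */
static uint32_t div5_shift8(uint32_t x)
{
    return (x * 51u) >> 8;
}

/* Hypothetical generalisation: approximate x / d as (x * m) >> shift,
   with m = ceil(2^shift / d). Assumes d > 0 and shift < 32. */
static uint32_t div_by_const(uint32_t x, uint32_t d, unsigned shift)
{
    uint32_t m = ((1u << shift) + d - 1) / d;       /* reciprocal, rounded up */
    return (uint32_t)(((uint64_t)x * m) >> shift);  /* 64-bit product avoids overflow */
}
```

For d = 5 and shift = 16 the second helper matches x / 5 for every 8-bit x. In MMX code the multiplier would be a precomputed constant rather than being derived per call as it is here.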
__________________
Regards, sh0dan // VoxPod
Old 18th December 2007, 21:13   #90  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
OK, I'll keep it short and fast, because this hotel internet in Brussels keeps disconnecting me.
The mul-division is nice; I had heard of it but didn't know exactly what it was... will use it later... tnx
Old 18th December 2007, 21:16   #91  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
My case:
Code:
a=IDCT(Quantize(DCT(x)))
b=IDCT(Quantize(DCT(y)))
c=a/b

I have
x=[0,65535]
y=[0,255]
a,b scaled to signed DW
c=[0,255] unsigned saturated
x,y are values from video... so definitely not constant...

So I see two options:
1) do it scalar
2) use another instruction set? Is there no common packed integer division?

Last edited by redfordxx; 18th December 2007 at 21:18.
Old 19th December 2007, 04:00   #92  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Code:
  unsigned short Reciprocal[65536]; // 65536/i

c=(a*Reciprocal[b])>>16;
Look for the extract word/insert word instructions pinsrw/pextrw and pmulhw
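
A scalar C sketch of the reciprocal-table idea, with several assumptions labelled: the names init_reciprocal and div_sat_u8 are invented here, entry 1 is clamped to 65535 because 65536 does not fit in an unsigned short, and division by zero is treated as saturating to 255. Because the table entries are truncated, the result can be one lower than true integer division.

```c
#include <stdint.h>

static uint16_t reciprocal[65536];   /* reciprocal[i] is roughly 65536 / i */

/* Hypothetical setup routine: fill the table once before use. */
static void init_reciprocal(void)
{
    reciprocal[0] = 0;               /* unused: b == 0 is handled separately */
    reciprocal[1] = 0xFFFF;          /* assumption: clamp, 65536 does not fit */
    for (uint32_t i = 2; i < 65536; ++i)
        reciprocal[i] = (uint16_t)(65536u / i);
}

/* c = a / b approximated as (a * reciprocal[b]) >> 16, saturated to [0,255]. */
static uint8_t div_sat_u8(uint16_t a, uint16_t b)
{
    if (b == 0)
        return 255;                  /* assumption: saturate on divide by zero */
    uint32_t q = ((uint32_t)a * reciprocal[b]) >> 16;
    return (uint8_t)(q > 255 ? 255 : q);
}
```

In a packed version the table lookups would stay scalar (extract a word, load the entry, insert it), and only the multiply would operate on several words at once.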
Old 19th December 2007, 08:48   #93  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
So, IIUC, I should prepare a lookup table for all numbers [0,65535]...
Old 19th December 2007, 13:16   #94  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
In the general case, yes, a 65K table. But if you know your data you can pull some tricks to increase accuracy or reduce the table size.
Old 19th December 2007, 22:18   #95  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
It seems that all this insert/extract gymnastics may take so much time that it is better to do it in normal scalar asm...

Or maybe I will do this part as a scalar version for compatibility and then do a 3DNowPro version with normal division... however, I am not sure yet whether 3DNowPro can do something as nice as PMADDWD on xmm registers.
Old 19th December 2007, 23:42   #96  |  Link
Sulik
Registered User
 
Join Date: Jan 2002
Location: San Jose, CA
Posts: 216
The most efficient way to achieve this is probably to temporarily convert the data to floating point and use single-precision floating-point SSE to perform the final division on 4 values at a time.
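
A rough sketch of this suggestion using C intrinsics. It assumes SSE2 for the integer/float conversions, which goes beyond the plain SSE mentioned in the post and is not available on the AthlonXP discussed earlier in the thread. The function name div4_sse and the clamp to [0,255] (taken from the saturated output range stated earlier) are choices made for this example.

```c
#include <emmintrin.h>   /* SSE2 intrinsics (assumption: SSE2 is available) */
#include <stdint.h>

/* Hypothetical helper: c[i] = saturate_u8(a[i] / b[i]) for 4 values at once. */
static void div4_sse(const int32_t a[4], const int32_t b[4], uint8_t c[4])
{
    /* Convert both integer inputs to single-precision floats. */
    __m128 fa = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)a));
    __m128 fb = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)b));

    __m128 q = _mm_div_ps(fa, fb);                        /* one DIVPS, 4 quotients */

    /* Clamp to [0,255]; x/0 gives +inf, which clamps to 255. */
    q = _mm_min_ps(_mm_max_ps(q, _mm_setzero_ps()), _mm_set1_ps(255.0f));

    int32_t tmp[4];
    _mm_storeu_si128((__m128i *)tmp, _mm_cvttps_epi32(q)); /* truncate toward zero */
    for (int i = 0; i < 4; ++i)
        c[i] = (uint8_t)tmp[i];
}
```

The two conversions and the clamp are part of the cost of this approach, so any cycle comparison should count them along with the division itself.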
Old 20th December 2007, 02:41   #97  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Look at how many cycles all the division instructions take on all of the processors. In this many cycles you could almost rebuild the pyramids.

Do not do division unless there is no other way around the problem.

Post your existing code for this portion and I will see what can be done.
Old 20th December 2007, 03:09   #98  |  Link
Sulik
Registered User
 
Join Date: Jan 2002
Location: San Jose, CA
Posts: 216
Not true. You should be able to issue a DIVPS instruction operating on 4 values with only ~20 cycle latency on a Core2 duo.
This should end up faster than 4 scalar lookups and avoids trashing L1.
Old 20th December 2007, 03:37   #99  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
The code I am thinking about would be 4 streams of 2 fast instructions plus a pmulhuw, all up maybe 12 to 16 cycles on a Core2, and it will do almost as well on most other CPUs.

You also have to include the conversions to and from float to use DIVPS.
Old 23rd December 2007, 04:44   #100  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
OK, back to switch code and jumps:
jmp ecx is already working...
jg ecx does not...
is there any trick?