Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

Old 25th December 2010, 23:25   #41  |  Link
saint-francis
too much lurking
 
Join Date: Sep 2006
Location: Valhalla
Posts: 668
DS, have you been able to see results of x264 on the new SB? Is there going to be any gain? I'm not really sure what the point of this upcoming SB is.
Old 25th December 2010, 23:31   #42  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
Quote:
Originally Posted by saint-francis View Post
DS, have you been able to see results of x264 on the new SB? Is there going to be any gain? I'm not really sure what the point of this upcoming SB is.
AVX (Advanced Vector Extensions) should help x264, I guess - if new AVX-optimized assembly is written.

But AFAIK the first Sandy Bridge generation will only support AVX with 256-Bit registers, rather than the full 512-Bit. Still that's twice the size of the SSE registers.

Also you'll need Windows 7 with SP1 to be able to use AVX, or a recent Linux kernel.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 25th December 2010 at 23:34.
Old 26th December 2010, 00:52   #43  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,669
Quote:
Originally Posted by LoRd_MuldeR View Post
AVX (Advanced Vector Extensions) should help x264, I guess - if new AVX-optimized assembly is written.

But AFAIK the first Sandy Bridge generation will only support AVX with 256-Bit registers, rather than the full 512-Bit. Still that's twice the size of the SSE registers.

Also you'll need Windows 7 with SP1 to be able to use AVX, or a recent Linux kernel.
I think D.S. said AVX wasn't going to be useful for x264

Quote:
Quote:
What about 256bit AVX? Each module can process only one at time (AFAIK) and x264 will surely support these new instructions (I hope eheh)
Float-only, thus a useless pile of tripe.
http://doom10.org/index.php?topic=514.0
Old 26th December 2010, 01:07   #44  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
Didn't know that AVX is FP-only. That's a pity...
Old 26th December 2010, 05:08   #45  |  Link
deadrats
Banned
 
Join Date: Oct 2010
Posts: 119
Quote:
Originally Posted by poisondeathray View Post
I think D.S. said AVX wasn't going to be useful for x264.
if i had a dollar for every time his darkness has said something that didn't make any sense i would be a rich man by now.

he says avx is "Float-only, thus a useless pile of tripe" yet what he fails to mention is that he could convert the code to floating point, there's nothing that says it must be integer based.

as a very simple example if you have the following code snippet:

for ( a = 1; a < 100001; a++ )
    for ( b = 1; b < 100001; b++ )
    {
        ab = a * b;
    }

and use the following variable declaration:

int a, b, ab;

you cause the above to be executed on the alu (integer unit), if however you do this:

float a, b, ab;

it's executed using the floating point registers.

depending on the compiler you can even do something like this:

__m128i a, b, ab;

and perform a scalar calculation using the sse registers (there's a bit more code required than just that, but you get the idea).

yes, it would be a lot of work to rewrite the code to take advantage of the new avx registers and it's contingent on gcc supporting the required assembler instructions (he could always spend the dough and buy a copy of intel's compiler, though he would also need a copy of visual c++), but there's nothing inherently integer based about the code (other than that's the way he wants it) and there's nothing really standing in his way from changing it to take advantage of the sandy bridge's capabilities. (<--in all fairness, if he did do this he would need to maintain 2 versions of x264, one for cpu's that support avx and one for those that don't and he may not be willing to do that).
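A minimal sketch of the "bit more code" that a __m128i declaration needs, assuming an x86 compiler that ships the SSE2 intrinsics header emmintrin.h. The helper name mul8_i16 is made up for illustration and has nothing to do with x264's code. Note that the operation is packed rather than scalar: a single _mm_mullo_epi16 multiplies eight 16-bit lanes at once. Without SSE2 the function falls back to a plain loop.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128i, _mm_mullo_epi16, ... */
#endif

/* Multiply eight pairs of 16-bit integers, keeping the low 16 bits of each product.
   mul8_i16 is a hypothetical helper name used only for this illustration. */
static void mul8_i16( const int16_t *a, const int16_t *b, int16_t *ab )
{
    assert( a && b && ab );
#if defined(__SSE2__)
    __m128i va  = _mm_loadu_si128( (const __m128i *)a );   /* unaligned 128-bit load */
    __m128i vb  = _mm_loadu_si128( (const __m128i *)b );
    __m128i vab = _mm_mullo_epi16( va, vb );               /* pmullw: 8 multiplies at once */
    _mm_storeu_si128( (__m128i *)ab, vab );
#else
    /* Portable fallback with the same result for products that fit in 16 bits. */
    for( int i = 0; i < 8; i++ )
        ab[i] = (int16_t)( a[i] * b[i] );
#endif
}
```

Simply declaring the variables as __m128i and writing ab = a * b is not portable C; the loads, the multiply and the store each have to be spelled out as intrinsics (or written in assembly).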
Old 26th December 2010, 05:12   #46  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by deadrats View Post
he says avx is "Float-only, thus a useless pile of tripe" yet what he fails to mention is that he could convert the code to floating point, there's nothing that says it must be integer based.
Of course. You could convert it to floating point, and it would be at least 2 times slower, even with AVX, because you can only fit 8 floating point values in an AVX register, compared to 16 pixel values in an SSE register. It might be worthwhile if the AVX registers were 512-bit or 1024-bit, but at only 256-bit, it's not enough to justify a switch. There's also the additional problem of floating point instructions generally being slower than integer instructions, but even ignoring that, it's probably not worth it.

Anyways, since you're such a genius, write me a 16x16 SAD function that uses AVX and floating point input and runs in under 35 clock cycles (the speed of the SSE implementation on a Core i7). Here's the code for the C (with uint8_t converted to float):

Code:
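#include <math.h>   /* fabsf */

/* Sum of absolute differences over a 16x16 block.
   Strides are counted in floats, not bytes. */
static float sad_16x16( float *pix1, int stride_pix1, float *pix2, int stride_pix2 )
{
    float sum = 0;
    for( int y = 0; y < 16; y++ )
    {
        for( int x = 0; x < 16; x++ )
            sum += fabsf( pix1[x] - pix2[x] );   /* fabsf keeps the math in single precision */
        pix1 += stride_pix1;                     /* advance both blocks to the next row */
        pix2 += stride_pix2;
    }
    return sum;
}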
I'll make it easy on you; you can assume the inputs are aligned (they aren't in reality).
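For contrast, here is a rough sketch of the 8-bit integer version that the float code above was converted from, together with an SSE2 variant built on the psadbw instruction (the _mm_sad_epu8 intrinsic). This is not x264's actual implementation, and the function names are made up for illustration. It only shows why 16 pixels per 128-bit register is hard to beat: one instruction covers a whole 16-pixel row. The code assumes a compiler with emmintrin.h and falls back to the scalar loop otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>      /* abs */
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Scalar reference: SAD over a 16x16 block of 8-bit pixels. */
static int sad_16x16_ref( const uint8_t *pix1, int stride_pix1,
                          const uint8_t *pix2, int stride_pix2 )
{
    int sum = 0;
    assert( pix1 && pix2 );
    for( int y = 0; y < 16; y++ )
    {
        for( int x = 0; x < 16; x++ )
            sum += abs( pix1[x] - pix2[x] );
        pix1 += stride_pix1;
        pix2 += stride_pix2;
    }
    return sum;
}

/* SSE2 sketch: one psadbw per 16-pixel row, unaligned loads. */
static int sad_16x16_simd( const uint8_t *pix1, int stride_pix1,
                           const uint8_t *pix2, int stride_pix2 )
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for( int y = 0; y < 16; y++ )
    {
        __m128i a = _mm_loadu_si128( (const __m128i *)pix1 );
        __m128i b = _mm_loadu_si128( (const __m128i *)pix2 );
        /* psadbw produces two partial sums, one per 64-bit half */
        acc = _mm_add_epi64( acc, _mm_sad_epu8( a, b ) );
        pix1 += stride_pix1;
        pix2 += stride_pix2;
    }
    /* fold the high half's partial sum into the low half */
    acc = _mm_add_epi64( acc, _mm_srli_si128( acc, 8 ) );
    return _mm_cvtsi128_si32( acc );
#else
    return sad_16x16_ref( pix1, stride_pix1, pix2, stride_pix2 );
#endif
}
```

The largest possible result is 256 * 255 = 65280, so the sum fits comfortably in the low 32 bits that _mm_cvtsi128_si32 extracts.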

Last edited by Dark Shikari; 26th December 2010 at 05:34.
Old 26th December 2010, 16:41   #47  |  Link
Sharktooth
Mr. Sandman
 
Join Date: Sep 2003
Location: Haddonfield, IL
Posts: 11,768
pwned...
FP math is ALWAYS slower than INT math, unless you have a CPU with ridiculously big FP registers.
also FP math leads to precision problems over time... unless you use a ridiculously high FP precision...
that said, INT math is a way better solution.
Old 26th December 2010, 16:55   #48  |  Link
deadrats
Banned
 
Join Date: Oct 2010
Posts: 119
Quote:
Originally Posted by Dark Shikari View Post
Anyways, since you're such a genius, write me a 16x16 SAD function that uses AVX and floating point input and runs in under 35 clock cycles (the speed of the SSE implementation on a Core i7). Here's the code for the C (with uint8_t converted to float)
i just realized you're using 8 bit int's, all this time i thought you were using 32 bit int's, now the various objections you have raised make sense.

as for the homework assignment, i will gladly admit that i can't do it, but i will throw you a bone and use an excuse that you are fond of: company x hasn't sufficiently documented technology y, so it's their fault not mine.

simply replace x with intel and y with avx.

and here's some more excuses: i don't own an avx enabled cpu and i don't have a compiler that supports that instruction set (i don't think gcc supports it yet).

but i'm man enough to admit that like most people i don't know how to code with avx instructions...yet.
Old 26th December 2010, 17:03   #49  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by deadrats View Post
i just realized you're using 8 bit int's, all this time i thought you were using 32 bit int's, now the various objections you have raised make sense.
Exactly -- that's the real power of integer SIMD -- the fact that you can get more throughput by using smaller data types.

Pixels are only 8-bit, and transform intermediates (as well as DCT coefficients) are only 16-bit. x264 makes very minimal use of anything larger than 16-bit.

Ironically, with integer SIMD, the variety of instructions available for 32-bit is actually rather lacking. For 16-bit, for example, you have pmulhw, pmullw, pmulhrsw, and pmaddwd for multiplication, providing a pretty good variety of instructions. For 32-bit, you basically only have pmulld and pmuldq -- the former of which is slow and SSE4-only, and the latter of which only does two multiplies, hardly justifying SIMD at all.
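To make the 16-bit multiply variants concrete, here is a scalar model of what each instruction computes, written as an illustrative sketch rather than anything taken from x264. The model_* names are invented for this example. The shifts assume that >> on a negative int is an arithmetic shift, which holds for gcc, clang and MSVC but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* pmullw lane: low 16 bits of the signed 32-bit product. */
static int16_t model_pmullw( int16_t a, int16_t b )
{
    return (int16_t)(uint16_t)( (int32_t)a * b );
}

/* pmulhw lane: high 16 bits of the signed 32-bit product. */
static int16_t model_pmulhw( int16_t a, int16_t b )
{
    return (int16_t)( ( (int32_t)a * b ) >> 16 );
}

/* pmulhrsw lane (SSSE3): rounded high half, effectively a Q15 fixed-point multiply. */
static int16_t model_pmulhrsw( int16_t a, int16_t b )
{
    return (int16_t)( ( ( ( (int32_t)a * b ) >> 14 ) + 1 ) >> 1 );
}

/* pmaddwd: multiply eight 16-bit pairs, then add adjacent products into
   four 32-bit sums. The one wraparound case (all inputs -32768) is ignored here. */
static void model_pmaddwd( const int16_t a[8], const int16_t b[8], int32_t out[4] )
{
    assert( a && b && out );
    for( int i = 0; i < 4; i++ )
        out[i] = (int32_t)a[2*i] * b[2*i] + (int32_t)a[2*i+1] * b[2*i+1];
}
```

For example, model_pmulhrsw( 16384, 16384 ) gives 8192, which is 0.5 * 0.5 = 0.25 in Q15, and model_pmaddwd is the natural building block for dot products over 16-bit coefficients.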
Old 26th December 2010, 22:09   #50  |  Link
deadrats
Banned
 
Join Date: Oct 2010
Posts: 119
Quote:
Originally Posted by Dark Shikari View Post
Exactly -- that's the real power of integer SIMD -- the fact that you can get more throughput by using smaller data types.

Pixels are only 8-bit, and transform intermediates (as well as DCT coefficients) are only 16-bit. x264 makes very minimal use of anything larger than 16-bit.

Ironically, with integer SIMD, the variety of instructions available for 32-bit is actually rather lacking. For 16-bit, for example, you have pmulhw, pmullw, pmulhrsw, and pmaddwd for multiplication, providing a pretty good variety of instructions. For 32-bit, you basically only have pmulld and pmuldq -- the former of which is slow and SSE4-only, and the latter of which only does two multiplies, hardly justifying SIMD at all.
you know, it seems i owe you an apology, all this time i was reading various posts you made as well as some of the things you had written in your "diary of an x264 developer" and a lot of what you said struck me as absurd. now it all makes sense.

on the topic of gpu accelerated encoding, is one of the reasons you have claimed that for any given number of threads a cpu will be faster because with the cpu you can use 8 bit and 16 bit int's and with cuda (and its brethren) you have to use 32 bit int's minimum?

is it also safe to assume that bulldozer, with its 2 128 bit alu's per core, will be THE cpu to get for x264 encoding?

2 more quick questions: you recently signed a licensing agreement with pegasys and reading some of the press releases it seems that you created a parallel "commercial friendly" license under which you license x264 llc (that is the name of the commercial variant, is it not?). does this not violate the spirit, if not the letter, of the gpl?

i know many companies consider the gpl an "infectious" license, but doesn't the gpl explicitly forbid taking gpl'd code and making it closed source? does it not also require that any derivative work also be gpl'd?

by creating a parallel licensing scheme haven't you a) created a derivative that's not gpl'd, b) opened the door for companies to create derivatives that are not gpl'd, c) opened the door for companies to close source the x264 code they license from you, d) and perhaps most importantly opened the door for a company to make some simple changes and try and claim copyright to that, a claim that they could use to prevent you from making similar changes to the gpl'd version of x264?

lastly, i'm wondering what ide do you use during the development of x264, i'm assuming you use gcc to build the executables but do you use a front end like code blocks or dev-c++? also what, if any optimization options do you use? do you target any specific architecture, simply use -O3, a combination?

thanks.
Old 26th December 2010, 22:15   #51  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
Commercial usage is perfectly fine for GPL'd software. The license clearly says that you are allowed to use the software for any purpose, explicitly including commercial purposes.

Moreover commercial development/distribution and OpenSource are not necessarily contradictory. Just think about commercial Linux distributions, like RHEL.

Last but not least, the authors of x264 could decide to continue the development of their software under some ClosedSource license at any time, because they own the copyright.

However they certainly do not have to do this in order to be able to license their software commercially. And there's absolutely no indication of such a plan at this time.

(I think the "commercial" license of x264 is more related to patent issues and/or support contracts. Something that is important for companies who use x264 in their products)

Last edited by LoRd_MuldeR; 26th December 2010 at 22:27.
Old 26th December 2010, 22:46   #52  |  Link
J_Darnley
Registered User
 
Join Date: May 2006
Posts: 957
The commercial license for x264 just frees companies from releasing all code they link with x264 as GPL. Each company that wishes to do this must obtain a license. It explicitly does not cover the AVC patent license. All useful code changes will be committed as GPL into x264 therefore letting everyone use them.

tl;dr LURK MOAR
__________________
x264 log explained || x264 deblocking how-to
preset -> tune -> user set options -> fast first pass -> profile -> level
Doom10 - Of course it's better, it's one more.
Old 26th December 2010, 22:59   #53  |  Link
deadrats
Banned
 
Join Date: Oct 2010
Posts: 119
Quote:
Originally Posted by LoRd_MuldeR View Post
Last but not least, the authors of x264 could decide to continue the development of their software under some ClosedSource license at any time, because they own the copyright.
see i'm not entirely sure that they do. as i understand it, x264 was originally created by someone else and then the two lead developers took over and continued development. it's also my understanding that x264 was originally released under the gpl and the terms of the gpl dictate that any derivative work also be gpl'd.

lastly, my view of gpl'd software has always been that once it's gpl'd it's the same as being put into the public domain, copyright laws do not allow one to take something out of the public domain, not even whoever put it there in the first place.

http://www.gnu.org/licenses/gpl.html

i interpret the gpl to mean that you are not permitted to take a gpl'd product and release it under an alternate licensing scheme, not even if you're the person who gpl'd it in the first place.

i'm interested in hearing DS' take on this...
Old 26th December 2010, 23:04   #54  |  Link
kieranrk
Registered User
 
Join Date: Jun 2009
Location: London, United Kingdom
Posts: 707
Copyright owners can license code however they like. All the copyright holders were contacted and agreed to the commercial licence.
Old 26th December 2010, 23:54   #55  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,275
Quote:
Originally Posted by deadrats View Post
see i'm not entirely sure that they do. as i understand it, x264 was originally created by someone else and then the two lead developers took over and continued development. it's also my understanding that x264 was originally released under the gpl and the terms of the gpl dictate that any derivative work also be gpl'd.

lastly, my view of gpl'd software has always been that once it's gpl'd it's the same as being put into the public domain, copyright laws do not allow one to take something out of the public domain, not even whoever put it there in the first place.

http://www.gnu.org/licenses/gpl.html

i interpret the gpl to mean that you are not permitted to take a gpl'd product and release it under an alternate licensing scheme, not even if you're the person who gpl'd it in the first place.

i'm interested in hearing DS' take on this...
If all copyright owners, i.e. all people who contributed code (that is still used), agree, then the code can be released under a different license, of course. This still applies, even if the software was released under the GPL before. So the authors could decide to move to a ClosedSource license, if they wanted to. Still in that case all the code that had previously been released under GPL would remain under the GPL. So even if the "main" development would continue under a ClosedSource license, anybody could simply take the last GPL'd version of the software and continue using it as-is. You would even be free to start your own development (fork) from that version - under the restrictions of the GPL. And to make this clear again: Offering a software under a commercial license doesn't require or imply that the software is going to be ClosedSource. And currently there is absolutely no indication that the x264 developers have any plans for moving to a ClosedSource license. So what is the point of all this off-topic discussion please ???

Last edited by LoRd_MuldeR; 27th December 2010 at 00:02.
Old 27th December 2010, 02:48   #56  |  Link
b66pak
Registered User
 
Join Date: Aug 2008
Location: The Land Of Dracula (Romania - EU)
Posts: 934
when money was invented the platonic love was gone...so anything is possible...
_
__________________
if you ask a question and somebody give you the correct answer don't forget to leave a "thank you" note...
Visit The Land Of Dracula (Romania - EU)!
Old 27th December 2010, 03:03   #57  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by deadrats View Post
you know, it seems i owe you an apology, all this time i was reading various posts you made as well as some of the things you had written in your "diary of an x264 developer" and a lot of what you said struck me as absurd. now it all makes sense.
All is forgiven

Quote:
Originally Posted by deadrats View Post
on the topic of gpu accelerated encoding, is one of the reasons you have claimed that for any given number of threads a cpu will be faster because with the cpu you can use 8 bit and 16 bit int's and with cuda (and its brethren) you have to use 32 bit int's minimum?
This is one of the reasons that GPUs tend to have less advantage than one would otherwise expect. The real problems are more obnoxious than that; this is just a reason why the raw performance gain is lower.

Note some of this has changed recently; ATI added an instruction to do a SAD of 4 8-bit integers, for example.
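As a rough illustration of what a "SAD of 4 8-bit integers" operation computes, here is a scalar C model that treats each 32-bit word as four packed pixels. The thread does not name the ATI instruction or its exact operands, so the function names and the packing order below are assumptions made purely for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: SAD of the four 8-bit values packed into two 32-bit words. */
static uint32_t sad4_u8( uint32_t a, uint32_t b )
{
    uint32_t sum = 0;
    for( int i = 0; i < 4; i++ )
    {
        int pa = (int)( ( a >> ( 8 * i ) ) & 0xFF );   /* extract byte i of each word */
        int pb = (int)( ( b >> ( 8 * i ) ) & 0xFF );
        sum += (uint32_t)( pa > pb ? pa - pb : pb - pa );
    }
    return sum;
}

/* Accumulate the 4-pixel SAD over n packed words (4 * n pixels in total). */
static uint32_t sad_packed( const uint32_t *a, const uint32_t *b, int n )
{
    uint32_t sum = 0;
    assert( a && b && n >= 0 );
    for( int i = 0; i < n; i++ )
        sum += sad4_u8( a[i], b[i] );
    return sum;
}
```

With such an instruction a 32-bit lane does four pixels' worth of work, which narrows the gap described above; without it, each pixel would occupy a full 32-bit integer operation.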

Quote:
Originally Posted by deadrats View Post
is it also safe to assume that bulldozer, with its 2 128 bit alu's per core, will be THE cpu to get for x264 encoding?
"2 128-bit ALUs per core" doesn't seem to be a very big selling point to me, considering that the i7 had 3!

Quote:
Originally Posted by deadrats View Post
2 more quick questions: you recently signed a licensing agreement with pegasys and reading some of the press releases it seems that you created a parallel "commercial friendly" license under which you license x264 llc (that is the name of the commercial variant, is it not?). does this not violate the spirit, if not the letter, of the gpl?

i know many companies consider the gpl an "infectious" license, but doesn't the gpl explicitly forbid taking gpl'd code and making it closed source? does it not also require that any derivative work also be gpl'd?

by creating a parallel licensing scheme haven't you a) created a derivative that's not gpl'd, b) opened the door for companies to create derivatives that are not gpl'd, c) opened the door for companies to close source the x264 code they license from you, d) and perhaps most importantly opened the door for a company to make some simple changes and try and claim copyright to that, a claim that they could use to prevent you from making similar changes to the gpl'd version of x264?
1. The GPL is "infectious" in that if you link GPL software to any software, that software must become GPL.

2. Any copyright holder is free to release their work under any license. Releasing something as GPL does not mean you can't release it as something else too. Many popular software programs are available under multiple licenses: a popular example is Firefox, which I recall is triple-licensed. A popular example of commercially-licensed GPL software is MySQL.

3. Companies are required under our license to (if we ask) give all of their changes to x264 back to us. Furthermore, they sign over their rights to those changes -- we get co-ownership of them, allowing us to do whatever we want with them -- including release them as GPL along with the rest of x264. This means there won't be proprietary forks. I would not have gotten agreement from the other developers without this promise -- nor would I have supported the plan myself. This is why I consider it in the spirit of the GPL: it still ensures that all improvements make it back to the community, which is what the GPL is really all about.

Note there may be patches we don't release, but only because we don't consider them useful. If someone asks, we'll probably still be happy to go get it anyways. An example is a patch that adds UTF-16 path support for statsfiles, something I consider utterly useless.

Quote:
Originally Posted by deadrats View Post
lastly, i'm wondering what ide do you use during the development of x264, i'm assuming you use gcc to build the executables but do you use a front end like code blocks or dev-c++? also what, if any optimization options do you use? do you target any specific architecture, simply use -O3, a combination?

thanks.
Notepad++ is my "IDE". Optimization options are just the defaults x264 builds with.

Last edited by Dark Shikari; 27th December 2010 at 03:07.
Old 27th December 2010, 04:48   #58  |  Link
deadrats
Banned
 
Join Date: Oct 2010
Posts: 119
Quote:
Originally Posted by Dark Shikari View Post
"2 128-bit ALUs per core" doesn't seem to be a very big selling point to me, considering that the i7 had 3!
the core i7 has three 128 bit alu's?!? are you sure? i know the P4 had 3 alu's, 2 were double pumped, 1 ran at cpu clock speed and the single pumped one was the one that handled the boolean decisions, such as if/else, case/switch and the like, the double pumped alu's were strictly for math purposes.

i know that starting with the core 2 intel went to a single cycle sse engine and a 4 wide architecture but i thought the biggest difference that the core i7 brought, other than the cache improvements, was that it extended the core 2's ability to fuse 32 bit instructions and treat them as one to 64 bit instructions, i never heard anything about it having 3 128 bit alu's.

to hear amd say it, bulldozer's 128 bit alu's are something never before seen in a desktop cpu.

Quote:
Notepad++ is my "IDE". Optimization options are just the default x264 builds with.
notepad++, huh? that's pretty hard core, kind of like my old linux/unix instructors used to make us write our code in vi and point gcc to the saved .c file, lol.

this is going to sound like an amateur question but it's been a while since i built a project like x264 without using make on a linux system, how would i go about building x264 on a vista system just with gcc? i want to run a couple of experiments with various optimization options, just to see what kind of speed up, if any, is possible.

i'm also thinking of using c to pascal, c to fortran and c to basic translators to port the code over to the respective languages, so that i may see a) what it would look like in said languages and b) what the relative performance of a good pascal, fortran and basic compiler would be in relation to gcc.
Old 27th December 2010, 05:16   #59  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by deadrats View Post
the core i7 has three 128 bit alu's?!? are you sure? i know the P4 had 3 alu's, 2 were double pumped, 1 ran at cpu clock speed and the single pumped one was the one that handled the boolean decisions, such as if/else, case/switch and the like, the double pumped alu's were strictly for math purposes.

i know that starting with the core 2 intel went to a single cycle sse engine and a 4 wide architecture but i thought the biggest difference that the core i7 brought, other than the cache improvements, was that it extended the core 2's ability to fuse 32 bit instructions and treat them as one to 64 bit instructions, i never heard anything about it having 3 128 bit alu's.

to hear amd say it, bulldozer's 128 bit alu's are something never before seen in a desktop cpu.
The Core 2 and Core i7 have three arithmetic units: execution units 0, 1, and 5. On the Core i7, 0 and 5 can do SIMD add/sub/shuffle and 1 can do SIMD multiplies. All three can do bitmath and all three can do most scalar operations. I omitted some things they can do (float, etc) that I don't care about for simplicity. Check Agner's site for more details.

Quote:
Originally Posted by deadrats View Post
this is going to sound like an amateur question but it's been a while since i built a project like x264 without using make on a linux system, how would i go about building x264 on a vista system just with gcc? i want to run a couple of experiments with various optimization options, just to see what kind of speed up, if any, is possible.
Download Cygwin or MinGW, use make. You won't build x264 without a configure/make script combo.
Old 27th December 2010, 13:00   #60  |  Link
imcold
pencil artist
 
Join Date: Jan 2006
Posts: 202
Quote:
Originally Posted by deadrats View Post
i'm also thinking of using c to pascal, c to fortran and c to basic translators to port the code over to the respective languages, so that i may see a) what it would look like in said languages and b) what the relative performance of a good pascal, fortran and basic compiler would be in relation to gcc.
I doubt you'll get far with just a translator, but if you want to see what an encoder would "look like" in pascal, there is one written in it already: fevh264. Coding style is similar, but compared to x264 it's just a toy. As for relative performance, from my exp.: mpeg1-like decoder (with some assembly) translated from pascal to c was ~5-10% faster; gcc vs. Freepascal/fpc, 32bits. In case of x264, it would be worse (fpc can't inline assembly funcs etc.).
__________________
fevh264 - open-source baseline h.264 encoder

Tags
media engine, x.264
