H.264 & Hardware Solutions Beyond Just x86 for Greater Speed


AZCoder1
4th June 2005, 21:59
Has anyone considered porting the open source AVC to the Blackfin, which already runs a fairly nice Linux port ( http://blackfin.uclinux.org/ )? It would seem, to a newbie like me, a way to get some real-time performance from this codec, which seems a long way off on any of the current x86 or similar platforms that folks on this forum are discussing. There is some neat open source hardware with this Blackfin port as well, and if there were a decent h.264 codec to go with that...mmm...could be lots of fun. They seem to be able to hit 30 fps in the attached ref camera design.

AZCoder1
6th July 2005, 21:25
Guess this must be a hard question or a dumb one, not sure which.

:confused:

Latexxx
6th July 2005, 23:10
The problem with Blackfin and such is that if you just throw some generic code at it you don't see any speed boost, and in fact, I suspect that Blackfin isn't able to run generic video encoding code at acceptable speed. If you want to get some boost using this kind of dedicated hardware, you need to optimise your code by rewriting most of it using assembly and hardware-specific instructions. This leads us to the problem that nobody has the motivation to do this, as such devices aren't generally available and porting takes great amounts of time.

AZCoder1
16th July 2005, 07:02
Your points are very well taken. I am very new at this, so I am sure my questions are not all that well directed. I hear you saying that the current world of DSP/FPGA/ASIC speed-up simply involves too much one-time coding, and is not easily portable to platforms with wide availability, such as the x86 world.

Makes sense.

However, it would seem to me that all this talk of speeding up h.264 2-pass encodes to 13 fps by throwing dual water-cooled Xeons at it points to an architectural balance problem between hardware and software, i.e., too much of the problem is being solved using x86 logic, and more of it needs to be solved using ASIC / FPGA or DSP hardware better suited to the task. There must be a better way that does not involve a major redo of code or of the whole platform. We should not need a 400+ Watt power supply to crunch video. Guys are starting to do this in cell phones now in Japan, I am told.

The Open Source Cheap Coprocessor: Here are two thoughts - I know that it is quite possible to build a very low cost generic DSP coprocessor device, either as a capture agent, or even a USB 2.0 dongle, for well under $100. For people serious about real-time h.264 work, I suspect such hardware could have a wide interest, especially if it was open source hardware. But I do not know if such a "coprocessor" approach, where a small number of the most processor-intensive functions are moved away from the main CPU, would make sense, because I do not have a feel for how atomic the codec truly can be made. I can imagine some other cool apps that could benefit from a generic DSP module, such as software-defined radio, GPS, fingerprint recognition, etc. But video encoding is the one that might interest me enough to design such a board, if it made technical sense.

The hybrid processor that dines on native gcc code: Perhaps a second and even wilder approach would be to do a Linux port to the RICA system (http://www.spiralgateway.com/products/rica.html) which can provide dsp-like performance using just gcc compiler output (or so claims its maker, Spiral Gateway).

Comments on either of these two half baked notions?

bill_baroud
16th July 2005, 18:59
usb, even 2.0, is way too slow, think about the amount of data that you need to transfer to the "special" device.
Go buy a pci-express FPGA dev kit (2000-5000$ with a virtex II), learn vhdl and implement dct+me on it ;)
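
To make the "dct+me" suggestion concrete: motion estimation (me) is typically the kind of inner loop people move to an FPGA or DSP. Below is a minimal sketch of full-search block matching using the sum of absolute differences (SAD), written in plain Python for readability. It is an illustration of the general technique only, not the method used by any particular codec; the function names, the frame layout (lists of rows of luma values) and the search radius are assumptions made for this example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Try every displacement within +/- radius and return the best
    candidate as (cost, dx, dy), or None if no candidate fits in `ref`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidate blocks that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate displacement costs n*n subtractions per block, and the search repeats for every block of every frame, which is why this loop is a natural target for parallel hardware.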

AZCoder1
17th July 2005, 04:51
Bill, I am sure you are right about all this “need for speed”; I am certainly no hardware engineer. But one or two things continue to make me wonder.

For one, I have had some good luck with a real-time hardware-based MPEG4-2 encoder from Plextor, which hits 30+ fps at D1, and 120 at CIF, and it interfaces via USB 2.0. I am sure the WIS 7007 chip they use is a rather special ASIC, but somehow it would seem that a subsystem based on 480 Mbit USB 2.0 (or 1 Gbit Ethernet), two common and inexpensive standards, might be something cheap and easy to work with, and could still perhaps be mated with more generic FPGA or DSP devices that could accommodate code updates over time. PCIe is all well and good, but still a bit rare, I fear.

That said, if I can find a widely accepted hardware model, which could be a PCIe FPGA, for all I know, I would love to help scare up some funding to build a hardware widget that could be put out into the open source hardware world. Just need to have a design concept flexible enough to meet lots of needs, so that it can be built in enough volume to get wide adoption at a sub-$100 price point for a bunch of applications.

I have seen a paper or two that indicate general purpose DSPs might work well for h.264. Here is one example: http://www.da.isy.liu.se/pubs/diwu/diwu-ssocc2005.pdf

bill_baroud
18th July 2005, 09:25
well AFAIK, that plextor encoder takes analog as input and outputs a compressed mpeg stream ... in this case, usb 2.0 is fast enough to send the data to the computer, it would even let you stream compressed HD...
But if you think of your device as a "coprocessor", you need to send some data to it and retrieve some other data, which in the case of video compression is bandwidth consuming. Although a card on a PCI bus could already help... but don't think you can get something in low quantities for 100$ .....
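
A rough back-of-the-envelope check of the bandwidth point, as a sketch only. It assumes uncompressed YUV 4:2:0 video at 12 bits per pixel and D1 taken as 720x480, and it compares against nominal link rates (480 Mbit/s for USB 2.0, 1000 Mbit/s for Gigabit Ethernet, about 133 MB/s peak for 32-bit/33 MHz PCI); sustained real-world throughput is lower on all three. The `round_trips` parameter is a hypothetical knob for designs where frame data crosses the link more than once.

```python
# Nominal link rates in Mbit/s; real sustained throughput is lower.
LINKS_MBIT = {
    "USB 2.0": 480,
    "Gigabit Ethernet": 1000,
    "PCI 32-bit/33 MHz": 133 * 8,  # ~133 MB/s peak on a shared bus
}


def raw_video_mbit_per_s(width, height, fps, bits_per_pixel=12):
    """Bit rate of uncompressed video; 12 bits/pixel assumes YUV 4:2:0."""
    return width * height * bits_per_pixel * fps / 1e6


def link_utilisation(width, height, fps, round_trips=1):
    """Fraction of each nominal link rate consumed by raw video that
    crosses the link `round_trips` times (values above 1.0 do not fit)."""
    rate = raw_video_mbit_per_s(width, height, fps) * round_trips
    return {name: rate / cap for name, cap in LINKS_MBIT.items()}


if __name__ == "__main__":
    for label, (w, h) in {"D1": (720, 480), "1080p": (1920, 1080)}.items():
        print(label, round(raw_video_mbit_per_s(w, h, 30), 1), "Mbit/s raw")
        for name, frac in link_utilisation(w, h, 30).items():
            print(f"  {name}: {frac:.0%} of nominal rate")
```

Under these assumptions a single raw D1 stream at 30 fps is about 124 Mbit/s, while raw 1080p at 30 fps is about 746 Mbit/s, which already exceeds the nominal USB 2.0 rate before any data is sent back.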

AZCoder1
19th July 2005, 07:39
I am sure you are right, as I know so little about this stuff. But it would seem that with some pretty beefy TI DSP's down to $20, an interesting device could be built.

BTW, if you are not hot on USB 2.0, how about 1 Gbit Ethernet as a medium?

"TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing." (http://dspvillage.ti.com/docs/catalog/dspplatform/overview.jhtml?templateId=5154&path=templatedata/cm/dspovw/data/c6000_ovw)

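One way to put the quoted 8000 MIPS figure in perspective is to divide it by the pixel rate of the video. The sketch below assumes D1 means 720x480 at 30 fps and treats the 8000 MIPS as a peak number taken at face value; what is actually sustained on real encoding code would be lower, and the function name is made up for this example.

```python
def instructions_per_pixel(mips, width, height, fps):
    """Peak instruction budget available per pixel of input video."""
    pixels_per_second = width * height * fps
    return mips * 1e6 / pixels_per_second


# Quoted peak rate for the C64x, D1 resolution assumed to be 720x480.
budget = instructions_per_pixel(8000, 720, 480, 30)
print(f"about {budget:.1f} instructions per pixel at peak")
```

That works out to roughly 770 instructions per pixel at peak, which has to cover motion estimation, transform, quantisation and entropy coding combined.
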
bill_baroud
19th July 2005, 09:52
The problem is not the chip price (you could go for fpga too, that would be more general purpose, if you have the skills that is ;) ), the problem is to put it on a PCB .... you need at least a 4-layer design etc, and that won't cost you 20$ (which is a price per 10,000 units btw...) and you won't build it in your garage ;)

AZCoder1
19th July 2005, 16:57
Bill - Although you are right that you can spend a lot doing a pcb, these days the vast majority of that effort is design labor, not production cost.

I can go right now to pcbexpress, which provides free layout software, and get small boards for $17 in lots as low as 25 boards in 4 layers, and that is assuming a 3-day turn. They don't even charge for tooling. If I am willing to be patient and order in lots of 100, we can get that down even more, even with some bells and whistles. (http://www.pcbexpress.com/products/express4.php#pricing). And of course, if we are real patient and spin boards offshore, it goes way down.

Then assembly…if we keep things to machine pick and place, with a small component count, all the cost is in first-time machine setup, and that cost is in the $500 range for a board like this, perhaps. Actually, for placement, $5 or so per board in small runs (proto qty's) and $1 or less in large runs is a good average, excluding setup, in my experience with small boards. The passives these days in 0402 or less, tape and reel, are basically free. The other components that drive cost might be some board-based RAM, and any special connectors, and if we stick mainly to small chunks of SRAM and 1G Ethernet, those are not too costly. This is what makes me think we could hit $100 these days even on a small qty board, if one is not interested in profit, and did it as an open source thing, and got some help from the rest of the world on form and function.
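
The arithmetic above can be sketched as a small per-board cost model. The bare-board price ($17), setup cost ($500) and placement cost ($5 small run, $1 large run) are the figures from this post; the parts cost passed in the example is purely a placeholder assumption, and the function name is made up for illustration.

```python
def per_board_cost(qty, pcb, placement, setup, parts):
    """Cost of one assembled board when the one-time machine setup
    charge is spread evenly over a run of `qty` boards."""
    return pcb + placement + setup / qty + parts


if __name__ == "__main__":
    # Fabrication and assembly only, 25-board run, figures from the post.
    print(per_board_cost(25, pcb=17.0, placement=5.0, setup=500.0, parts=0.0))
    # Same run with a placeholder parts budget (an assumption, not a quote).
    print(per_board_cost(25, pcb=17.0, placement=5.0, setup=500.0, parts=50.0))
```

With these numbers the setup charge adds $20 per board on a 25-board run, so fabrication plus assembly comes to about $42 before any components are counted; whether the total stays under $100 depends entirely on the parts budget.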

The real cost is in design and debug of the board (most of which is really a software exercise these days) and then in getting enough coders to make use of it so that the exercise is meaningful. And it would seem to me if such a widget could 4x the speed of this x264 codec, it might generate some interest, but I really don't know what turns people on, hence the trial balloon.

bill_baroud
20th July 2005, 09:20
didn't know that website ... the reference i had is the prototype board they build where i work, and it looks damn expensive from what i see now.
But well yeah the problem is, out of the development cost, you still need an electronics engineer willing to work on a design like this... and i don't know any :/

AZCoder1
21st July 2005, 15:52
It is certainly possible to spend a lot of money doing boards. However, in the few I have done (though I am not an EE, I play one sometimes) there are some ways to dramatically lower cost (something a lot of EE's don't really want to talk about much). The easiest is just to get either open source reference designs or reference designs from the chip makers directly. And I do know a lot of EE's just now (some with time on their hands), and it would be good to get this group more involved in the open source community. Let me see if I can dig up some links and data to demonstrate these points better.

Here's one interesting example: http://sbc.twibright.com/. This guy has done a nice SBC with a Linux port, including an FPGA on the board. He has taken everything needed to build the board, schematic, gerbers, drills, netlist/bom, stencils, etc, and GPL'd 'em all. He even includes costs for most of the little passive components, which, if I am doing my currency conversion right, add up to about $9 per board (before FPGA and ARM cost). Anyway, this board building stuff can be done for reasonably low cost, especially if enough people who really need it on the firmware and application software side get interested. And this x264 codec is brilliant, but man, with a little hardware magic applied, it would really rock in real time.

Even some of the commercial players like Ateme seem to be using FPGA-type co-processing solutions, such as http://www.ateme.com/products/h264_5.php. Not sure if their model is good or not (the 3 way split) but it shows the possible application of co-processing technology to the problems associated with this difficult algorithm, if, as you say, a way can be found to make it very cost effective to acquire good hardware.

Here is one firm that is out to make it inexpensive to get into FPGA: XESS. Their basic board for Spartan II, with 50,000 gates and 8 MB of RAM, is $59, and that includes the dev tools, http://www.xess.com/prod027.php3. For $199, the same outfit can do 1M gates on a dev board, also with a full software dev environment and a 70+ pin header, out of which, in software, one could build some pretty slick i/o, I would bet.

tellman
17th August 2005, 18:10
have you looked at: http://www.opencores.org/projects.cgi/web/video_systems/overview