5th November 2006, 09:19 | #1
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
x264farm: distributed encoding
Well, after about a year of working on this, let's see how it works in the real world!
I have been making a parallel encoder for x264 which can be distributed across many computers in a network. I have not done any serious benchmarking for it, but it is faster than running it on 1 computer. It was designed to be extremely portable; the only things you need on most of the computers are the agent program (supplied) and x264. All the AVIsynth processing is done by a central computer. I have personally tested it running on a Windows x64 computer, a 64-bit Linux box, and an Intel Mac Mini. The idea is similar to tobias's ELDER, but designed from the ground up to be used on a network. It also supports fun stuff like resuming encodes (sometimes with a big penalty, but it's better than starting over). A few notes:
Eh... I guess that's all for the announcement. Have fun!
Readme
DOWNLOAD VERSION 1.15:
You also need the following:
CHANGELOG:
1.15 (2007-10-26): Added ad-hoc agent discovery (although it is disabled by default). Agents may be added while the encoding is running, even if they are not specified in the controller's config.xml file.
1.14 (2007-10-03): Fixed an inconsistency with running x264 when there are spaces in the path.
1.13 (2007-07-03): Added the --batchmult option to help with one agent checking out a big job at the end and making everything else wait for it.
1.12 (2007-05-30): Added a heartbeat thread to each agent when agent-based encoding starts. Agents will send a signal to the controller every 10 seconds. If the controller does not receive a signal within 30 seconds, that agent is given up for dead. This should greatly reduce the number of stalled encodes.
1.11 (2007-04-28): Fixed an overflow in the Matroska timecode calculation.
1.10 (2007-04-25): Implemented per-pass compression, so that the first and second passes have different compression priorities.
1.09 (2007-03-10): The agent now deletes temp files which are older than a specified amount of time (by default 1 week).
1.08 (2007-02-07): Redesigned the first pass. It should pick jobs more intelligently. The occasional problem with credits taking a very long time to encode is minimized.
1.07 (2007-01-05): Fixed an issue with the controller computer rejecting network connections after an hour or so.
1.06 (2006-12-25): 2nd-pass resuming works again (sorry, guys!)
1.05 (2006-12-12): Added optional compression for controller-based encodes. It is currently extremely slow, though, in order to check for validity.
1.04 (2006-11-24): Added agent-based encoding.
1.03-152 (2006-11-20): Fixed an error when the number of frames per second was an integer.
1.02 (2006-11-17): MAJOR rewrite that honestly shouldn't have needed to happen.
1.01 (2006-11-09): Added the --rerc option to change the re-ratecontrol frequency during the third pass.
1.00 (2006-11-04): Initial release. Rampant bugs and programmer hacks predominate.
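The 1.12 heartbeat rule (a ping every 10 seconds, an agent declared dead after 30 seconds of silence) comes down to a timestamp table on the controller side. The Python sketch below is only an illustration of that rule under my own assumptions; it is not x264farm's code, and the HeartbeatMonitor class and its method names are hypothetical. Only the 10 s / 30 s figures come from the changelog.

```python
import time

# Figures from the 1.12 changelog entry: agents ping every 10 seconds,
# and an agent silent for 30 seconds is given up for dead.
HEARTBEAT_TIMEOUT = 30.0


class HeartbeatMonitor:
    """Controller-side table of when each agent was last heard from (hypothetical)."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def beat(self, agent):
        """Record a heartbeat from `agent` (this also registers a new agent)."""
        self.last_seen[agent] = self.clock()

    def dead_agents(self):
        """Return agents whose last heartbeat is older than the timeout, sorted by name."""
        now = self.clock()
        return sorted(a for a, t in self.last_seen.items() if now - t > self.timeout)

    def drop(self, agent):
        """Forget an agent, e.g. after its job has been handed to someone else."""
        self.last_seen.pop(agent, None)


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(clock=lambda: fake_now[0])
    monitor.beat("desktop")
    monitor.beat("laptop")
    fake_now[0] = 25.0
    monitor.beat("desktop")  # the desktop keeps pinging; the laptop goes silent
    fake_now[0] = 35.0
    print(monitor.dead_agents())  # ['laptop']
```

Injecting the clock is just a convenience for demonstrating the timeout without waiting 30 real seconds.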
WHEN REPLYING: If you post really long chunks of text (like from out_dump.txt), use the [code] tag and it will make it scrollable. That will prevent the page from overflowing with a bunch of verbose garbage that x264farm likes to output.
__________________
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2
Last edited by omion; 29th October 2007 at 23:50.
5th November 2006, 15:26 | #3
Registered User
Join Date: Aug 2004
Posts: 211
Wow, thanks for your effort in distributed encoding!
I gave it a quick try and didn't get it working. The agent throws an error:
2006-11-05~15:12:13.01 Doing the following: x264.exe --fps 10000000/333667 --pass 1 --stats "C:\agent-win32\tmp\0 893 969666.txt" -o NUL - 720x480
2006-11-05~15:12:14.34 Agent croaked with exception "Sys_error(\"Broken pipe\")"; restarting
2006-11-05~15:12:14.35 DOING!
This is the controller config.xml:
<config>
  <temp>C:\controller-win32\tmp</temp>
  <agents>
    <agent name="local">
      <ip>192.168.0.1:50722</ip>
    </agent>
  </agents>
</config>
And the agent config.xml:
<config>
  <temp>C:\agent-win32\tmp</temp>
  <!-- <affinity>01</affinity>-->
  <port>50722</port>
  <x264>x264.exe</x264>
  <nice>00</nice>
</config>
I'm starting the controller with this command:
controller.exe -b 1500kbps --avs test.avs -o test.mkv
I'm using the latest x264 build, r598A. If you need more information, just tell me.
5th November 2006, 15:52 | #4
Retired, but still around
Join Date: Oct 2001
Location: Lone Star
Posts: 3,058
Sounds exciting. I suggest that you, or someone else, consider running through a hypothetical setup using 2 or 3 Windows XP PCs (PC-master, PC-slave1, PC-slave2) in a step 1, 2, 3 format.
Once I can get it working, perhaps I could do a how-to that the average Joe tool user like me could follow. My setup is 7 XP machines on a gigabit backbone, with the master being an OC'ed 4.5 GHz Intel D930, slave1 a hyper-threaded 3.2 GHz P4, slave2 an Athlon XP 3200, plus various others along those lines. I'm happy to help others if you can help me get going.
__________________
How to Optimize Bitrate for CCE multipass
5th November 2006, 18:01 | #5
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
I forgot to mention one thing: x264 compiled for Windows with gcc won't work currently. I made a build with MSVS that should solve your problem. There's nothing special about my build; it just uses straight SVN. You can get build 598 here
@crypto: Basically, that's how it works. For the first pass, the video is broken up into large segments, where the last frame of each segment is determined by a simple scene-detection algorithm. The second pass is split up by scene, which is already known since the scenes are in the stats file.
@DDogg: I made a test with my one computer, and x264farm was actually around 15% slower than not using it, due to all the overhead. Unfortunately, all of my computers are different, so I can't make a scalability test, but I have noticed that even adding my wimpy Core Solo Mac Mini makes it faster than just one computer. It would be nice if somebody did some tests with it. However, note that, by default, x264farm will re-encode the 5% of scenes that came the farthest from the ratecontrol prediction. Anybody interested in a comparison should turn that off with "--3ratio 0".
In terms of setting up, it is a bit difficult. Here's what I would do:
1. Copy the agent directory to any computer that you want doing the encoding.
2. On each computer, make as many copies of the config.xml file as you have cores (dual-core computers should have 2 different XML files, etc.).
3. Edit the config files to point to the right x264 executable; each one must have a different port.
4. Run the agent on each computer, once for each config file, with "agent --config <some_file.xml>".
5. Extract the controller directory to the main computer.
6. Edit the controller's config.xml file to point at all the agents you have running. You need to know each computer's IP address and the port that you assigned to each agent.
After that, the setup is done. Now you can start encoding.
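Steps 2, 3 and 6 are easy to get wrong by hand once a farm has many cores. The Python sketch below generates one agent config per core plus a matching controller config, using only the XML elements that appear in the configs posted in this thread (temp, port, x264, nice, agents/agent/ip). The file names (config-cpu1.xml, ...), agent names, per-agent temp directories and the port scheme (50722, 50723, ...) are assumptions for illustration, not anything x264farm is known to require.

```python
import xml.etree.ElementTree as ET


def agent_config(temp_dir, port, x264="x264.exe", nice="00"):
    """Build one agent config.xml as a string (element layout copied from this thread)."""
    root = ET.Element("config")
    ET.SubElement(root, "temp").text = temp_dir
    ET.SubElement(root, "port").text = str(port)
    ET.SubElement(root, "x264").text = x264
    ET.SubElement(root, "nice").text = nice
    return ET.tostring(root, encoding="unicode")


def controller_config(temp_dir, agents):
    """agents: list of (name, ip, port) tuples, one per running agent."""
    root = ET.Element("config")
    ET.SubElement(root, "temp").text = temp_dir
    agents_el = ET.SubElement(root, "agents")
    for name, ip, port in agents:
        agent_el = ET.SubElement(agents_el, "agent", name=name)
        ET.SubElement(agent_el, "ip").text = f"{ip}:{port}"
    return ET.tostring(root, encoding="unicode")


def plan(machines, base_port=50722, controller_temp="C:\\controller-win32\\tmp"):
    """machines: dict of IP -> number of cores.

    Returns ({(ip, filename): agent_xml}, controller_xml).
    """
    agent_files = {}
    entries = []
    for ip, cores in machines.items():
        for core in range(cores):
            port = base_port + core  # step 3: each agent on one machine needs its own port
            temp = f"C:\\agent-win32\\tmp-cpu{core + 1}"  # assumed: a separate temp dir per agent
            agent_files[(ip, f"config-cpu{core + 1}.xml")] = agent_config(temp, port)
            entries.append((f"{ip}-cpu{core + 1}", ip, port))
    return agent_files, controller_config(controller_temp, entries)


if __name__ == "__main__":
    files, controller_xml = plan({"192.168.0.1": 2, "192.168.0.2": 1})
    for (ip, filename), xml_text in files.items():
        print(ip, filename, xml_text)
    print(controller_xml)
```

Each agent file would then be copied to its machine and started as in step 4 (for example "agent --config config-cpu1.xml"), and the controller string saved as the controller's config.xml.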
The basic command line is:
controller --first "<first_pass_options>" --second "<second_pass_options>" -b <target_bitrate>kbps --avs <input_file.avs> -o <output_file.mkv>
Look at the HTML file included in the release for what you should and shouldn't include in --first and --second. Also note that the -b value needs to be in the form "-b 123kbps" or "-b 123%", where the second form indicates a percentage of the first-pass bitrate to use.
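The two accepted -b forms can be illustrated with a small parser. This sketch only mirrors the rule stated above; the parse_bitrate name, its error handling and the decision to accept integers only are assumptions, and the real controller may parse the option differently.

```python
import re


def parse_bitrate(arg, first_pass_kbps=None):
    """Turn a -b value of the form "123kbps" or "123%" into a target in kbps.

    The "%" form is relative to the first-pass bitrate, so it can only be
    resolved once that number is known (passed in as first_pass_kbps).
    """
    match = re.fullmatch(r"(\d+)(kbps|%)", arg.strip())
    if match is None:
        raise ValueError(f"expected '<n>kbps' or '<n>%', got {arg!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "kbps":
        return float(value)
    if first_pass_kbps is None:
        raise ValueError("a percentage target needs the first-pass bitrate")
    return first_pass_kbps * value / 100.0


if __name__ == "__main__":
    print(parse_bitrate("1500kbps"))                     # 1500.0
    print(parse_bitrate("80%", first_pass_kbps=5000.0))  # 4000.0
```

The percentage form is why that target cannot be fixed before the first pass has finished.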
Last edited by omion; 5th November 2006 at 19:01.
5th November 2006, 18:30 | #6
Registered User
Join Date: Aug 2004
Posts: 211
Many thanks for the x264 build, it's working now.
I tried encoding a file on the host and on my laptop with your framework, and it kind of worked. The laptop only got used about 20% of the encoding time, but it's slow as hell anyway. I sometimes get the error 'Agent croaked with exception "End_of_file"; restarting' on my laptop; that might be the reason it isn't used that much. And I can't start a second encode of the same file without deleting the tmp files manually. I will not go out and benchmark this tool, as I don't have powerful machines here. If you provide a 32-bit Linux build, I might try it at the university some time. But then I realise it still needs AviSynth, which I don't have there, so I would need to host it on a laptop...
*edit* I just tried to encode with my laptop as the only client, and I get the same error as before and it's not encoding the whole time; I would say the laptop is idle about 30% of the encoding time now.
Last edited by Disabled; 5th November 2006 at 18:33.
5th November 2006, 18:53 | #7
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
Now, when you say it's idle, does it do a job for 30% of the time and wait for the other 70%, or does it say it's doing something but it's only using 30% of the CPU?
5th November 2006, 20:19 | #8
Registered User
Join Date: Aug 2004
Posts: 211
I'm not sure if I read the (messy) log correctly, but I think after an idle phase the above-mentioned error is shown and then it continues to encode. I just started a second run to verify the results. I did two tests with both machines running an agent; the first time the laptop was used just once, and the second time I noticed four phases where the CPU was maxed out on the laptop.
*edit* Now it happened again. Both machines were idle for about 2 minutes, then the "End_of_file" error appeared, and after a few seconds it's encoding again...
Last edited by Disabled; 5th November 2006 at 20:21.
5th November 2006, 20:47 | #9
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
@omion
Great work, but this wouldn't work with 1-pass ABR encodes, would it?
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is too Late ! http://forum.doom9.org/showthread.php?t=168004
Last edited by CruNcher; 5th November 2006 at 21:14.
5th November 2006, 21:28 | #10
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
Yup. It's only 2-pass because that's all I use. It would be quite difficult to turn it back into a 1-pass encoder, as the two passes are set up quite differently.
5th November 2006, 21:39 | #11
Registered User
Join Date: Aug 2004
Posts: 211
In addition to that, I would say there was no traffic at all during that time... (doing a third run...)
*edit* The controller sends data to the agent. The laptop starts encoding and finishes ("Exited 0"). The agent then shows "lap recieving stats" for a little over a minute, and then continues with "lap done recieving stats", but then the agent shows the EOF exception and continues to encode... I guess I'll send you the whole log. I have the feeling the "recieving stats" message appears just once in the log file, while "done receiving stats" appears seven times, and I counted five idle times...
Encoding time: 20 mins, with about 5 mins idle.
Last edited by Disabled; 5th November 2006 at 22:02.
6th November 2006, 01:39 | #13
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
I initially made the program distribute everything, but that requires the input files to be available to all computers, as well as AVIsynth and all the filters.
Synchronizing all the computers, and especially making it work with non-Windows computers, proved to be just about impossible. It was far too much work for its usefulness, so I basically got rid of it as soon as I got it working.
6th November 2006, 01:51 | #14
Registered User
Join Date: Jan 2006
Posts: 101
6th November 2006, 05:17 | #15
Registered User
Join Date: Oct 2001
Posts: 169
I'm having the same problem, and I've replaced x264 with the correct version from this thread. Here is my output:
---------------------
D:\TEMP\agent-win32>agent --config config.xml
Started up the print mutex
Using config file ".\config.xml"
Temp dir: "D:\\TEMP\\cpu1"
Affinity: ""
Port: 50722
x264: "x264.exe"
nice: 10
Connecting...
Listening...
DOING!
Got a connection from 127.0.0.1:1287
Doing stuff
Doing the following: nice -n 10 x264.exe --bframes 3 --b-pyramid --direct auto --filter -2,-1 --subme 1 --analyse none --vbv-maxrate 25000 --me dia --merange 12 --progress --no-psnr --fps 24000/1001 --pass 1 --stats "D:\TEMP\cpu1\0 1298 6414bb.txt" -o NUL - 1280x608
x264 [warning]: VBV maxrate specified, but no bufsize.
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
Agent croaked with exception "Sys_error(\"Broken pipe\")"; restarting
DOING!
Got a connection from 127.0.0.1:1290
Doing stuff
Doing the following: nice -n 10 x264.exe --bframes 3 --b-pyramid --direct auto --filter -2,-1 --subme 1 --analyse none --vbv-maxrate 25000 --me dia --merange 12 --progress --no-psnr --fps 24000/1001 --pass 1 --stats "D:\TEMP\cpu1\0 1298 5a6db2.txt" -o NUL - 1280x608
x264 [warning]: VBV maxrate specified, but no bufsize.
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
Agent croaked with exception "Unix.Unix_error(An existing connection was forcibly closed by the remote host. , "recv", "")"; restarting
DOING!
---------------------
I don't think it's an x264 setting, as it looks as though it's starting. I'll try without the VBV maxrate and see what happens.
6th November 2006, 05:44 | #16
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
That sounds sort of like something went wrong on the controller side. Does the controller output say anything about an error or exception?
6th November 2006, 05:44 | #17
Registered User
Join Date: Oct 2001
Posts: 169
By removing the --vbv-maxrate stuff it got much further. I have two machines, one with dual CPUs and one with a dual-core CPU (the controller machine is also running two agents). It kept sending all frames to one machine, so I added "--batch 2500 --split 250", and now it sends data to the other machine, but my two local agents both die with:
DOING!
Got a connection from 127.0.0.1:1453
Doing stuff
Doing the following: nice -n 10 x264.exe --bframes 3 --b-pyramid --direct auto --filter -2,-1 --subme 1 --analyse none --me dia --merange 12 --progress --no-psnr --fps 24000/1001 --pass 1 --stats "D:\TEMP\cpu1\0 12720 31d72a.txt" -o NUL - 1280x608
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
Agent croaked with exception "Sys_error(\"Broken pipe\")"; restarting
So now I'm kinda stuck... any ideas?
Here is my avs:
LoadPlugin("D:\$VIDEO_WORK$\bin\AviSynth Plugins\DGDecode.dll")
LoadPlugin("D:\$VIDEO_WORK$\bin\AviSynth Plugins\TIVTC.dll")
MPEG2Source("Hero.d2v")
Crop(16,134,-16,-142)
TFM(d2v="Hero.d2v")
#TDecimate(cycleR=3)
TDecimate()
LanczosResize(1280,608)
Here is my controller startup line:
controller -b 4500kbps --first "--bframes 3 --b-pyramid --direct auto --filter -2,-1 --subme 1 --analyse none --me dia --merange 12 --progress --no-psnr" --second "--ref 3 --bframes 3 --b-pyramid --weightb --direct auto --filter -2,-1 --subme 6 --trellis 1 --analyse all --8x8dct --me umh --merange 12 --progress --no-psnr" --avs "D:\$VIDEO_WORK$\WORKING\Hero.avs" -o "D:\$VIDEO_WORK$\WORKING\Hero-farm.mkv" --batch 2500 --split 250 --preseek 5 --config config.xml
6th November 2006, 06:38 | #19
Registered User
Join Date: Nov 2003
Location: San Diego, CA
Posts: 325
So the local machine croaks, but the agents on the other machine work as expected? That is strange.
Try out this controller, and see if that helps. If it doesn't, see if the line
Code:
Local CPU 2 sent 4096 bytes
Code:
Local CPU 2 sent first pass options
> LOCAL CPU 2 LOCKED SPLITTER_LIST_MUTEX FROM EXCEPTION "Unix.Unix_error(_, \"send\", \"\")"
I must admit, I don't really know why it's dying, but I think I can figure it out given enough time (and patience!)
PS. You can delete the controller output now. I copied it to my computer, and I don't think anybody else will be able to comprehend it.
6th November 2006, 06:44 | #20
Registered User
Join Date: Oct 2001
Posts: 169
With the new controller, here's what I got:
Local CPU 1 sent zone string ""
Local CPU 1 sent range (0,2582)
Local CPU 1 sent first pass options
Local CPU 1 sent 4096 bytes
> LOCAL CPU 1 LOCKED SPLITTER_LIST_MUTEX FROM EXCEPTION Unix.Unix_error(2522890,recv,)
Local CPU 1 added (0,2582) back to the split list, since it caught an exception
> LOCAL CPU 1 UNLOCKED SPLITTER_LIST_MUTEX
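The last lines of that log suggest the controller keeps a mutex-protected list of frame ranges and puts a range back on it when the agent handling it fails. The Python sketch below is a guess at that pattern from the log text alone, not x264farm's implementation; SplitList, run_agent, the encode callback, the choice to stop a failing agent's thread, and the second frame range (2583, 5000) are all hypothetical.

```python
import threading
from collections import deque


class SplitList:
    """Shared list of (first_frame, last_frame) ranges still to be encoded (hypothetical)."""

    def __init__(self, ranges):
        self._ranges = deque(ranges)
        self._lock = threading.Lock()  # stands in for the SPLITTER_LIST_MUTEX in the log

    def take(self):
        """Check out the next range, or None if nothing is left."""
        with self._lock:
            return self._ranges.popleft() if self._ranges else None

    def put_back(self, rng):
        """Return a failed range to the front so it is retried first."""
        with self._lock:
            self._ranges.appendleft(rng)

    def __len__(self):
        with self._lock:
            return len(self._ranges)


def run_agent(name, splits, encode, log):
    """One controller thread per agent: encode ranges until none are left.

    On a socket-style error (OSError covers broken pipes and connection
    resets), the range goes back on the list and this agent's thread stops,
    so a healthy agent can pick the range up instead.
    """
    while (rng := splits.take()) is not None:
        try:
            encode(name, rng)
        except OSError:
            log.append(f"{name} added {rng} back to the split list, since it caught an exception")
            splits.put_back(rng)
            return


if __name__ == "__main__":
    done, log = [], []

    def flaky_encode(name, rng):
        if name == "Local CPU 1":
            raise BrokenPipeError("Broken pipe")  # simulate the failing local agent
        done.append((name, rng))

    splits = SplitList([(0, 2582), (2583, 5000)])
    run_agent("Local CPU 1", splits, flaky_encode, log)
    run_agent("Remote CPU 1", splits, flaky_encode, log)
    print(log)
    print(done)
```

Putting the failed range at the front of the list means it is retried before any untouched work, which keeps one bad agent from leaving a hole early in the encode.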