
View Full Version : 2 Pass for multi bitrate httpStreaming


majin
27th July 2011, 16:18
Hi!

I am working with conversion for httpStreaming flash. I use mencoder.

I'm wondering if it's possible to run the first pass only once and then reuse its stats file for the second pass at every bitrate. My bitrates range from 150 kbps to 2.5 Mbps, for a total of 7 conversions. The output videos also differ in resolution, not only in bitrate.

Standard 2-pass encoding almost doubles the encoding time, but on the other hand the quality is better than in 1-pass mode.

I've tried using the stats file from the highest bitrate, but the quality at 150 kbps is simply awful: the average bitrate I get is 33 kbps, against 139 kbps with 1-pass encoding. The other bitrates look good.

Any advice?

Thanks a lot!

nm
27th July 2011, 16:31
I'm wondering if it's possible to run the first pass only once and then reuse its stats file for the second pass at every bitrate.

It's better to update the stats file during each encode with --pass 3: http://doom10.org/index.php?topic=1787

majin
27th July 2011, 17:07
Thank you for the quick reply.

Very interesting read.

benwaggoner
4th August 2011, 00:43
The key thing about adaptive streaming is for all the bitrates to have aligned IDR frames for seamless bitrate switching.

In the VC-1 Adaptive Streaming SDK, we were able to do a single first pass and figure out the proper parameters for the parallel 2nd pass encodes, but it was a lot of work and code to tune it right. It paid off, though, implementing the following:

Adjustment of chunk size to align the IDRs starting chunks with cuts. This made bitrate switching less obvious and improved compression efficiency by avoiding lots of extra keyframes.
VBR encoding if needed, allowing whatever degree of variability in chunk size is desired (including none)
Dynamic frame resizing, where the encoded frame size would get reduced if required to avoid visible artifacts. This was adjusted per chunk based on the motion in the chunk. For example, a 720p chunk with a lot of horizontal motion might be compressed to 640x720, while one with a lot of vertical motion might be compressed to 1280x352. This had the added benefit of reducing the pixels needed to be decoded on frames that would otherwise be harder to decode in software.

I'd love to see something similar implemented for H.264. The stronger in-loop deblocking makes the dynamic resizing somewhat less valuable, but having a way to accurately align all bitrates with scene-detected chunk starts would be a huge help.

As it is, the fixed-IDR H.264 Smooth Streaming encoding generally doesn't look any better overall than VC-1 Smooth, and offers a lot faster software decoding. It'd be nice to combine the VC-1 Smooth features with the other advantages of H.264 High Profile!

nm
4th August 2011, 00:56
The key thing about adaptive streaming is for all the bitrates to have aligned IDR frames for seamless bitrate switching.

In the VC-1 Adaptive Streaming SDK, we were able to do a single first pass and figure out the proper parameters for the parallel 2nd pass encodes, but it was a lot of work and code to tune it right. It paid off, though, implementing the following:

Adjustment of chunk size to align the IDRs starting chunks with cuts. This made bitrate switching less obvious and improved compression efficiency by avoiding lots of extra keyframes.
VBR encoding if needed, allowing whatever degree of variability in chunk size is desired (including none)

The x264 pass 3 method does all of this except for completely parallel 2nd pass encodes.

Dark Shikari
4th August 2011, 07:43
The key thing about adaptive streaming is for all the bitrates to have aligned IDR frames for seamless bitrate switching.
Which x264 will do by default with no changes.

but having a way to accurately align all bitrates with scene-detected chunk starts would be a huge help.
It's called "encoding with x264". Try it someday!

As it is, the fixed-IDR H.264 Smooth Streaming encoding generally doesn't look any better overall than VC-1 Smooth
Well, besides having 2-3x better compression, but it's not like Microsoft would have any idea what "video compression" meant, since they haven't produced a single competitive encoder in the past 25 years.

benwaggoner
5th August 2011, 00:49
Which x264 will do by default with no changes.
I will give that --pass 3 mechanism a try!

That said, there was one subtle optimization that proved to be of some use in the Smooth SDK implementation: we could set a "target" chunk size as well as a minimum and maximum. So, for example, we could ask for a chunk duration of 2 seconds, but it could go down to 1 second or up to 4 seconds in order to get GOP alignment. But it'd be every 2 seconds mid-shot. This wound up providing a somewhat better mix of visual quality and rapid stream switching than specifying just min/max chunk size, since that wound up providing

(Don't take those numbers as blessed Best Practices. For example, I think an average of 3 seconds is generally superior for current H.264 adaptive delivery).
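
The target/min/max idea can be sketched in a few lines of Python. This is only an illustration of the policy described above, not the Smooth SDK's actual algorithm: the function name `plan_idr_frames`, its parameters, and the tie-breaking rules (skip a cut that falls too close to the previous chunk start, never leave less than the minimum before the next cut) are all assumptions. The final chunk at the end of the clip may come out shorter than the minimum.

```python
def plan_idr_frames(cuts, total_frames, fps, target_s=2.0, min_s=1.0, max_s=4.0):
    """Pick chunk-start (IDR) frames: on scene cuts where possible,
    and at the target cadence inside long shots."""
    target = round(target_s * fps)
    shortest = round(min_s * fps)
    longest = round(max_s * fps)
    # With this policy no chunk is longer than target + shortest - 1 frames,
    # so the max is only a sanity check on the chosen settings.
    assert target + shortest - 1 <= longest, "target + min must fit inside max"

    idrs = [0]
    shot_ends = [c for c in sorted(set(cuts)) if 0 < c < total_frames]
    for cut in shot_ends + [total_frames]:
        # Mid-shot: add chunk starts at the target cadence, but always
        # leave at least the minimum chunk length before the next cut.
        while cut - idrs[-1] >= target + shortest:
            idrs.append(idrs[-1] + target)
        # Start a chunk on the cut itself unless that would make the
        # previous chunk shorter than the minimum.
        if cut < total_frames and cut - idrs[-1] >= shortest:
            idrs.append(cut)
    return idrs


if __name__ == "__main__":
    # 300 frames at 24 fps with scene cuts at frames 30, 200 and 210.
    print(plan_idr_frames([30, 200, 210], total_frames=300, fps=24))
```

In the example the cut at frame 210 is not used as a chunk start because it falls only 10 frames after the one at frame 200.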

Well, besides having 2-3x better compression, but it's not like Microsoft would have any idea what "video compression" meant, since they haven't produced a single competitive encoder in the past 25 years.
Well, I suppose it all gets down to what the goals of the product are. I'd say, for their target markets, these were very competitive Microsoft encoder technologies:

PEP/CineVision PSE. It made most of the top-quality HD DVD and Blu-ray titles in the first couple waves. Great segment reencoding, ROI sub-frame compression tweaking, and awesome 10-bit to 8-bit conversion.
The IPTV VC-1 encoder implementation used in Inlet's Spinnaker, with Lookahead Rate Control and Dynamic Complexity.
Vista's Windows DVD Maker had a quite impressive consumer-targeted MPEG-2 encoder; great balance of encoding speed and quality for the target audience. (I haven't tried the Win 7 version)
Expression Encoder has been a very useful compression tool usable by developers and designers, not just video professionals. Its user interface, implicit high quality preprocessing, rich captioning and metadata support, and functionality make it much more productive than any other high-touch compression tool I've used.
Smooth Streaming Encoder SDK. Nothing available at its release could offer similar quality or decode complexity for adaptive streaming solutions. It still beats current x264 in some shots when using CBR bitrates by using dynamic resizing to avoid blocking artifacts.


I'm well aware that there have been plenty of other products available that overlapped with the functionality and time frame of the above and had their own unique advantages. But I think all of those were quite competitive encoders for their eras, and the best tools available at the time for some important scenarios.

FWIW when I teach compression classes these days, the tools I use to illustrate concepts and workflow are Expression Encoder, Carbon, and MeGUI.

Blue_MiSfit
5th August 2011, 06:54
I'd agree that as a pure encoder core, nothing touches x264 for nearly any application. This is a good thing, and is quite impressive. I've been staunchly pro for years, and remain so!

However, the fully built-out solutions that Ben mentions are definitely worth noting. Dynamic resizing, muxing, and DRM for the many adaptive streaming systems, segment re-encoding, closed captioning, and TS muxing features (to name just a few) are all quite important in certain spaces. When a company or professional individual is looking for a solution, sometimes a pure encoder (like x264) with supporting tools (like ffmpeg, avisynth, maybe some basic scripting etc) is by far the best solution. However, sometimes the case can be made for off the shelf products.

When I entered the industry I laughed at the shortcomings of some tools like Carbon Coder, Digital Rapids, Spinnaker etc... but as I've learned more and had more experience I've come to appreciate the fully developed solutions these products can offer. To be fair, support is often slow and atrocious, the severity of bugs can be astonishing, and the cost is typically astronomical. But it's not all bad. Deep automation and integration with other products you already own comes in handy!

I guess all I'm getting at here is that you always have options. For a simple transcoding or streaming project it's almost always a fantastic idea to cook up an x264 workflow. When integrating this with more complex projects where proper software development would be necessary, it's often preferable (and much faster) to deploy an existing solution.

Just my two cents, after being in the industry for a few years.

If some of the "big guys" would license x264, things would be a little different :devil:

Derek

Dark Shikari
5th August 2011, 07:03
If some of the "big guys" would license x264, things would be a little different :devil:
Oh how I'd love to show you our licensee list :p

Blue_MiSfit
5th August 2011, 07:13
That's really excellent news, especially considering I scream quite loudly at each company I possibly can to do just this.

*raises glass* Here's to hoping we see Carbon Coder and Digital Rapids with x264 in the near future!

kieranrk
5th August 2011, 18:54
PEP/CineVision PSE. It made most of the top-quality HD DVD and Blu-ray titles in the first couple waves. Great segment reencoding, ROI sub-frame compression tweaking, and awesome 10-bit to 8-bit conversion.


Seriously? The initial VC-1 discs were mediocre at best.

benwaggoner
5th August 2011, 20:30
Seriously? The initial VC-1 discs were mediocre at best.
They were better than the initial H.264 and MPEG-2 discs, though! And continued to get better during the life of HD DVD (once Blu-ray won, the team refocused on Smooth Streaming and Windows 7).

But that's getting pretty off topic :).

I've made my first script attempting to do multirate multirez x264 adaptive, following the above. This is where I am right now for the top two bitrates (and yes, the command line is longer than it needs to be; for debugging purposes I wanted to be sure exactly which parameters were being used). I'll then use this to compare net results with the VC-1 VBR Smooth SDK.

x264.exe --profile high --level 4.0 --tune film --pass 1 --bitrate 5000 --keyint 72 --min-keyint 24 --bframes 5 --b-adapt 2 --ref 4 --slices 1 --vbv-bufsize 25000 --vbv-maxrate 5000 --no-mbtree --merange 24 --me umh --direct auto --subme 11 --partitions all --trellis 2 --aud --nal-hrd vbr --thread-input --stats "MatchPoint-current.stats" -o "MatchPoint-1920x1080-5000.mp4" MatchPoint_1080p.avs
copy MatchPoint-current.stats MatchPoint-1920x1080-5000-1p.stats
x264.exe --profile high --level 4.0 --tune film --pass 3 --bitrate 5000 --keyint 72 --min-keyint 24 --bframes 5 --b-adapt 2 --ref 4 --slices 1 --vbv-bufsize 25000 --vbv-maxrate 5000 --no-mbtree --merange 24 --me umh --direct auto --subme 11 --partitions all --trellis 2 --aud --nal-hrd vbr --thread-input --stats "MatchPoint-current.stats" -o "MatchPoint-1920x1080-5000.mp4" MatchPoint_1080p.avs
copy MatchPoint-current.stats MatchPoint-1920x1080-5000-3p.stats
x264.exe --profile high --level 3.2 --tune film --pass 3 --bitrate 3000 --keyint 72 --min-keyint 24 --bframes 5 --b-adapt 2 --ref 4 --slices 1 --vbv-bufsize 15000 --vbv-maxrate 3000 --no-mbtree --merange 24 --me umh --direct auto --subme 11 --partitions all --trellis 2 --aud --nal-hrd vbr --video-filter resize:width=1280,height=720 --thread-input --stats "MatchPoint-current.stats" -o "MatchPoint-1280x720-3000.mp4" MatchPoint_1080p.avs
copy MatchPoint-current.stats MatchPoint-1280x720-3000-3p.stats

A few goals/notes

The streams target compatibility with both software and hardware decoders
I'm using a 1 sec min 3 sec max GOP as a compromise between the 2 sec average 4 sec max the VC-1 SDK uses. That should offer pretty equivalent bitrate switching functionality.
4 refs since that's the max for 1080p at High 4.0. Would this technique work using different slice counts for different streams?
Lack of MBTree is going to be a quality hit here. Is there any way to generate a new one via lookahead during the encoding pass while reusing the frame decisions from the initial pass? Seems theoretically possible even if not currently implemented.
I'm making a backup copy of the log file from each pass for later analysis.
Having AUD and HRD make muxing into the final fMP4 or transport stream easier.
Should I worry about that "[swscaler @ 018897e0] full chroma interpolation for destination format 'yuv420p' not yet implemented" message? Any workaround?
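
For reference, the GOP and VBV numbers in the command lines above work out as follows. This is a minimal Python sketch that assumes a 24 fps source; the thread never states the frame rate, and 24 is inferred from --keyint 72 being described as a 3 second maximum. The helper names are made up for illustration.

```python
def gop_flags(fps, min_seconds, max_seconds):
    """Translate GOP durations in seconds into frame counts
    for x264's --min-keyint and --keyint options."""
    return {
        "--min-keyint": round(fps * min_seconds),
        "--keyint": round(fps * max_seconds),
    }


def vbv_buffer_seconds(bufsize_kbit, maxrate_kbps):
    """How many seconds of video at the max rate fit in the VBV buffer."""
    return bufsize_kbit / maxrate_kbps


if __name__ == "__main__":
    # 1 second minimum, 3 second maximum GOP at an assumed 24 fps.
    print(gop_flags(fps=24, min_seconds=1, max_seconds=3))
    # --vbv-bufsize / --vbv-maxrate pairs from the two command lines.
    print(vbv_buffer_seconds(25000, 5000))
    print(vbv_buffer_seconds(15000, 3000))
```

Both renditions end up with the same buffer duration, which keeps the rate control constraints comparable across the ladder.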


Any thoughts or recommendations?

kieranrk
5th August 2011, 20:34
Having AUD and HRD make muxing into the final fMP4 or transport stream easier.

Any thoughts or recommendations?

HRD is mandatory in fact for compliant TS muxing. However, most of the iPhones and other devices that use smooth streaming don't bother with spec compliance - legal streams fail to play.

Use the presets etc...

Dark Shikari
5th August 2011, 20:50
Any thoughts or recommendations?

Use a preset, and if you're running multiple resolutions, you can't guarantee that your keyframes will match. That guarantee only applies with the same resolution.

You can get around this using a qpfile.

Also, use MB-tree and just run a different first pass for each resolution.
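
One way to check whether keyframes actually line up between renditions is to compare the IDR positions recorded in each stats file. The Python sketch below assumes each per-frame stats line starts with `in:<frame> out:<frame> type:<letter>` and that type `I` marks an IDR frame; treat that format as an assumption and verify it against the stats files your x264 build writes. The function names are hypothetical.

```python
import re

# Assumed layout of a per-frame stats line: "in:12 out:12 type:I ..."
FRAME_RE = re.compile(r"in:(\d+) out:\d+ type:(\w)")


def idr_frames(stats_lines):
    """Return the set of display-order frame numbers marked as IDR."""
    frames = set()
    for line in stats_lines:
        match = FRAME_RE.match(line)  # header lines such as "#options:" do not match
        if match and match.group(2) == "I":
            frames.add(int(match.group(1)))
    return frames


def misaligned_idrs(renditions):
    """renditions maps a name to that rendition's stats lines.
    Returns frames that are IDR in some renditions but not in all."""
    per_rendition = [idr_frames(lines) for lines in renditions.values()]
    in_all = set.intersection(*per_rendition)
    in_any = set.union(*per_rendition)
    return sorted(in_any - in_all)


if __name__ == "__main__":
    hd = ["#options: ...", "in:0 out:0 type:I q:18.00",
          "in:1 out:1 type:P q:20.00", "in:48 out:48 type:I q:18.00"]
    sd = ["#options: ...", "in:0 out:0 type:I q:19.00",
          "in:1 out:1 type:P q:21.00", "in:50 out:50 type:I q:19.00"]
    print(misaligned_idrs({"1080p": hd, "720p": sd}))
```

An empty result means every chunk boundary is shared by all renditions; anything else lists the frames where a switch would not be seamless.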

benwaggoner
5th August 2011, 21:26
Use a preset, and if you're running multiple resolutions, you can't guarantee that your keyframes will match. That guarantee only applies with the same resolution.
Oh! That's an important limitation for Smooth Streaming at least (Adobe's HTTP Dynamic Streaming doesn't promise seamless resolution switching, and Apple is unclear on this point. We know that the public Mac OS H.264 GPU decode API doesn't support arbitrary resolution changes).

But since we commonly target a huge range of bitrates (like 250-8000 Kbps in 8-10 increments), good adaptive streaming has a hard requirement for variable frame sizing.

Also, the GPU decoder limitations listed above are one of the reasons I'm still sweating software decoder performance for adaptive streaming. Silverlight 5 will have GPU decode for arbitrary resolution changes depending on OS, GPU, driver, etcetera. But for 2012, the majority of adaptive streaming on PCs will be using software decoding.

VC-1 with dynamic resizing provides a pretty flat peak decode complexity. H.264 can obviously be tuned quite a bit for efficiency/decodability sweet spots (like using CAVLC for higher bitrates and CABAC at lower bitrates), which will need to be done.

You can get around this using a qpfile.

Also, use MB-tree and just run a different first pass for each resolution.
How would that work in practice? First pass would use --pass 1 and the qpfile to generate a new .stats and mbtree, and then the second use --pass 3 refining the first pass .stats, using the mbtree, and ignoring the qpfile?

Could you give an example command line?

benwaggoner
5th August 2011, 21:29
HRD is mandatory in fact for compliant TS muxing. However, most of the iPhones and other devices that use smooth streaming don't bother with spec compliance - legal streams fail to play...
Apple's HTTP Live Streaming is probably the worst documented video technology I've ever seen. They just revised their documentation, and are still recommending 30 fps > 12 fps and 24 fps > 15 fps as viable decimation ratios! And it's impossible to tease out what's a hard requirement, versus just what they're doing for the time being.

Since Smooth Streaming targets over-the-top STB as well as Silverlight playback, we work hard to make sure that our streams are spec compatible so arbitrary decoders can play the concatenated elementary streams without any hiccups.

Dark Shikari
5th August 2011, 21:39
How would that work in practice? First pass would use --pass 1 and the qpfile to generate a new .stats and mbtree, and then the second use --pass 3 refining the first pass .stats, using the mbtree, and ignoring the qpfile?

Could you give an example command line?
You'd run a new --pass 1 for each resolution you wanted to do.

So, for example, if you wanted 3 different resolutions, each with 3 bitrates, you'd have to do 3 first passes and 9 second passes.
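
The bookkeeping for that layout can be sketched as a small Python generator of command lines. It only uses options that already appear in this thread (--pass, --bitrate, --stats, --video-filter resize, -o); the function name, the file naming scheme, and the choice of --pass 3 for the later passes are assumptions for illustration. Because --pass 3 rewrites the shared stats file, the passes for one resolution have to run one after another, in the order returned.

```python
def ladder_commands(source, ladder, prefix="out"):
    """ladder maps (width, height) to a list of bitrates in kbps.
    Returns one --pass 1 command per resolution followed by one
    --pass 3 command per bitrate at that resolution."""
    commands = []
    for (width, height), bitrates in ladder.items():
        stats = f"{prefix}-{width}x{height}.stats"
        common = ["x264",
                  "--video-filter", f"resize:width={width},height={height}",
                  "--stats", stats]
        ordered = sorted(bitrates, reverse=True)
        # First pass at the highest bitrate; its output is a scratch file.
        commands.append(common + ["--pass", "1", "--bitrate", str(ordered[0]),
                                  "-o", f"{prefix}-{width}x{height}-pass1.mp4", source])
        for kbps in ordered:
            commands.append(common + ["--pass", "3", "--bitrate", str(kbps),
                                      "-o", f"{prefix}-{width}x{height}-{kbps}.mp4", source])
    return commands


if __name__ == "__main__":
    example = {(1920, 1080): [5000, 4000, 3000],
               (1280, 720): [2500, 2000, 1500],
               (640, 360): [1000, 600, 300]}
    for command in ladder_commands("MatchPoint_1080p.avs", example, prefix="MatchPoint"):
        print(" ".join(command))
```

The bitrates in the example are placeholders, not a recommended ladder.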

benwaggoner
5th August 2011, 21:52
You'd run a new --pass 1 for each resolution you wanted to do.

So, for example, if you wanted 3 different resolutions, each with 3 bitrates, you'd have to do 3 first passes and 9 second passes.
That makes sense.

Is there any easier way to generate the QP file than having an external script parse the .stats file, and pick out the "I" frames, and then write a qpfile just defining those frames?

One nice thing about this approach is it would allow using the maximum chunk duration as the first pass GOP size, and then inserting extra IDR frames in the qpfile to have a shorter average cadence mid-shot...

That would require running a new first pass using the new qpfile, of course.
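
The external script described above would be short. Here is a hedged Python sketch: it assumes per-frame stats lines begin with `in:<frame> out:<frame> type:<letter>` with `I` meaning IDR, and that a qpfile line of the form `<frame> I` (no QP given) forces an IDR frame while leaving the quantizer to x264. Check both formats against your x264 build before relying on this. The `extra_idr` parameter is a hypothetical hook for the extra mid-shot keyframes mentioned above.

```python
import re

# Assumed layout of a per-frame stats line: "in:12 out:12 type:I ..."
STATS_RE = re.compile(r"in:(\d+) out:\d+ type:(\w)")


def qpfile_lines(stats_lines, extra_idr=()):
    """Build qpfile lines that force an IDR at every frame the first
    pass marked as IDR, plus any extra frame numbers supplied."""
    frames = set(extra_idr)
    for line in stats_lines:
        match = STATS_RE.match(line)
        if match and match.group(2) == "I":
            frames.add(int(match.group(1)))
    return [f"{number} I" for number in sorted(frames)]


if __name__ == "__main__":
    stats = ["#options: ...",
             "in:0 out:0 type:I q:18.00",
             "in:2 out:1 type:P q:20.00",
             "in:1 out:2 type:b q:22.00",
             "in:72 out:72 type:I q:18.00"]
    # Add one extra keyframe at frame 48 to shorten the mid-shot cadence.
    print("\n".join(qpfile_lines(stats, extra_idr=[48])))
```

The resulting text would be saved to a file and handed to the new first pass with --qpfile.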

Dark Shikari
5th August 2011, 21:54
That makes sense.

Is there any easier way to generate the QP file than having an external script parse the .stats file, and pick out the "I" frames, and then write a qpfile just defining those frames?

One nice thing about this approach is it would allow using the maximum chunk duration as the first pass GOP size, and then inserting extra IDR frames in the qpfile to have a shorter average cadence mid-shot...

That would require running a new first pass using the new qpfile, of course.
Hmm. This whole "adaptive streaming" idea is a reasonable reason to allow differing resolutions between passes. Let me see what I can do here.

Target Practice
5th August 2011, 22:41
PEP/CineVision PSE [has] awesome 10-bit to 8-bit conversion.
that it does.

Should I worry about that "[swscaler @ 018897e0] full chroma interpolation for destination format 'yuv420p' not yet implemented" message? Any workaround?
is the source 4:4:4? if so, try --output-csp i444. or, in the avisynth script, you could use ConvertToYV12

benwaggoner
6th August 2011, 00:32
that it does.
is the source 4:4:4? if so, try --output-csp i444. or, in the avisynth script, you could use ConvertToYV12
Ah, source was Cineform 4:2:2. ConvertToYV12 worked perfectly.

mp3dom
6th August 2011, 01:32
that it does.
Unless I'm doing something wrong, it does, but in a bad/worse way.

Test:
Color ramp in After Effects, project setup to 16bpc, exported to v210 (uncompressed 4:2:2 10bit, common format for HDCAM-SR master), dithered with PSE's pre-processing tools to uncompressed yv12, and lossless avc for x264 (latest rev. 2044):
http://thumbnails46.imagebam.com/14369/37a5a8143682098.jpg (http://www.imagebam.com/image/37a5a8143682098)

Dark Shikari
6th August 2011, 01:42
Patches welcome to try another dither method, or find the problem in the existing one. The existing one is just an implementation of error diffusion; perhaps ordered dither works better?

mp3dom
6th August 2011, 12:50
PSE uses serpentine Floyd Steinberg. Void and cluster dither (here, cfr. 2.7 (http://caca.zoy.org/wiki/libcaca/study/2#a2.7.Voidandclustermethod)) should be very good too.
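
For readers unfamiliar with the term, serpentine Floyd-Steinberg is ordinary Floyd-Steinberg error diffusion with the scan direction reversed on every other row. The Python sketch below is a textbook version for a single plane, not PSE's or x264's code. It assumes the 10-bit to 8-bit mapping is a plain divide by 4, and the function name is made up.

```python
import math


def dither_10_to_8(plane):
    """Serpentine Floyd-Steinberg dither of a 10-bit plane (rows of
    ints in 0..1023) down to 8 bits (rows of ints in 0..255)."""
    height, width = len(plane), len(plane[0])
    # Work in 8-bit units as floats so fractional error can be carried.
    work = [[value / 4.0 for value in row] for row in plane]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        # Even rows scan left to right, odd rows right to left.
        step = 1 if y % 2 == 0 else -1
        columns = range(width) if step == 1 else range(width - 1, -1, -1)
        for x in columns:
            old = work[y][x]
            new = min(255, max(0, math.floor(old + 0.5)))
            out[y][x] = new
            error = old - new
            ahead, behind = x + step, x - step
            # Standard 7/16, 3/16, 5/16, 1/16 weights, mirrored with the scan.
            if 0 <= ahead < width:
                work[y][ahead] += error * 7 / 16
            if y + 1 < height:
                if 0 <= behind < width:
                    work[y + 1][behind] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if 0 <= ahead < width:
                    work[y + 1][ahead] += error * 1 / 16
    return out


if __name__ == "__main__":
    # A flat 10-bit value of 514 sits halfway between 8-bit 128 and 129.
    flat = [[514] * 16 for _ in range(16)]
    for row in dither_10_to_8(flat)[:4]:
        print(row)
```

On a flat input that falls between two 8-bit codes, the output alternates between the two neighbouring codes so that the average stays close to the original level.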

Audionut
10th August 2011, 04:22
Let me see what I can do here.

4 days! Quick work :)

http://git.videolan.org/?p=x264.git;a=commit;h=0ba8a9c6973897ec35e1a5d241a71f4f5a4f81aa

Improve support for varying resolution between passes
Should give much better quality, but still doesn't support MB-tree yet.

Blue_MiSfit
10th August 2011, 05:02
This is awesome. I'll have to do some tests.... :)