Resolutions for adaptive web content [Archive]

Hyral

15th October 2013, 23:54

Hello,

I am doing research on adaptive web video content presentation (MPEG-DASH) and would highly appreciate feedback from this forum's experts. My goal at this moment is to define down-scaling resolution configurations in the most quality/bitrate-efficient manner possible so I can later fit them into bandwidth and bitrate recommendations. I will elaborate on my method and conclusions so we can check if I'm on the right track. I've been reading Doom9 since 2011, and I hope I have gathered enough to accomplish something useful.

1st of all, I define target displays. By checking the web, particularly this List of Displays By Pixel Density (http://en.wikipedia.org/wiki/List_of_displays_by_pixel_density), I have identified the most relevant displays as 1920x1080 (FHD), 1280x720 (HD), 960x640 (DVGA), 960x540 (qHD), 854x480 (FWVGA), 800x480 (WVGA), 640x480 (VGA), 432x240 (FWWVGA) and 320x240 (QVGA).

2nd, I set the premise that the steps must be reasonably spaced so as to minimize the number of versions, maximizing compatibility and allowing more bitrate versions for each resolution. As a starting guideline, HD has 44.4% of the number of pixels of FHD. So, we can already see that many of the initially targeted displays will have to accommodate slightly lower-resolution content. Also, I've seen recommendations to produce lower resolutions for mobile devices than their displays in order to avoid compatibility and performance issues.

3rd, I must take note that my targeted displays have different aspect ratios. As a preliminary guideline, the horizontal dimension will be the first reference, since it is the largest and must be accommodated in lesser displays. For instance, in order to display 1.77:1 video on a 320x240 display, my content will have to be at most 320x180.
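The fitting rule above can be sketched in a few lines of Python. This is only an illustration: the function name fit_to_display and the default 16:9 aspect ratio are my own assumptions, and the result is floored to whole pixels without any mod-N adjustment.

```python
import math
from fractions import Fraction


def fit_to_display(display_w, display_h, aspect=Fraction(16, 9)):
    """Largest frame of the given aspect ratio that fits inside the display.

    The width is tried first, as in the text; the height only becomes the
    limiting dimension when the display is wider than the content.
    """
    height = Fraction(display_w) / aspect
    if height <= display_h:
        return display_w, math.floor(height)
    width = Fraction(display_h) * aspect
    return math.floor(width), display_h


print(fit_to_display(320, 240))  # QVGA display
print(fit_to_display(960, 640))  # DVGA display
```

For the 320x240 display this gives 320x180, and for 960x640 it gives 960x540, matching the figures quoted in this post.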

4th, I identify what aspects of resolution can be most detrimental to coding efficiency. By searching Doom9 it seems enemy #1 would be letterboxing, which curiously abounds on Youtube, for instance. This raises the question: in a web content presentation system, is it worth the added hassle of cropping? This means that besides 1.77:1 and 1.33:1 content, one now has to deal with cinema 2.39:1, 1.85:1 and possibly more aspect ratios. The secondary sin would be not respecting mod-16. I have gathered that modern encoders such as x264 and XviD mitigate such losses, and in fact 1080p is not mod-16 and no one cares. Anyway, mod-16 remains an ideal goal during decision making, which should be approximated as closely as possible, while mod-4 should be absolutely mandatory. However, I believe the lower the resolution and bitrate, the higher the contribution of mod-16. That being so, at low resolutions I will recommend hitting mod-16 instead of hitting the exact aspect ratio. My threshold for this change of strategy will be resolutions that produce fewer than 350k pixels (720x480 is 345.6k). Also, since letterboxing is undesirable, if I have to lose strings of pixels I will recommend losing horizontal lines instead of vertical columns. I don't know whether it would be better to keep the full frame and compress it vertically instead; is this a common practice when the aspect ratio is slightly off? In that case it would probably be better to slightly compress horizontally, as this would result in the least distortion.
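A minimal Python sketch of the height rule just described, under my own assumptions: the names snap_down and choose_height are hypothetical, the 350k-pixel threshold is the one stated above, and only the height is adjusted (lines are dropped, columns are kept). The width is taken as given and is not checked for alignment.

```python
def snap_down(value, multiple):
    """Round value down to the nearest multiple."""
    return (value // multiple) * multiple


def choose_height(width, aspect_w=16, aspect_h=9, threshold=350_000):
    """Pick a height for the given width and aspect ratio.

    Below the pixel threshold the height is snapped down to mod-16,
    giving up a few lines of picture; at or above it, the exact height
    is kept as long as it is mod-4.
    """
    exact = width * aspect_h // aspect_w  # floor of the exact height
    if width * exact < threshold:
        return snap_down(exact, 16)
    return snap_down(exact, 4)


for w in (1280, 960, 640, 320):
    print(w, choose_height(w))
```

With these assumptions, 1280 and 960 keep their exact heights (720 and 540), while 640 and 320 drop to 352 and 176.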

5th, I gather references from Kaltura, Adobe, Youtube, Netflix and any other relevant systems.

6th, I set up a table for a single aspect ratio and begin exercising. For this I chose 1.77:1.

Resolution     Pixels       Notes
1920 x 1080    2,073,600    FHD; not mod-16; high-end
1280 x 720       921,600    HD; 44.4% of 1080p; PCs and tablets in general
1024 x 576       589,824    Kaltura iOS Wifi
 960 x 544       522,240    inexact (-4); Kaltura iOS Wifi
 960 x 540       518,400    qHD; not mod-8 (0.75), 56% of 720p, 25% of 1080p; many high-end phones by 2011, fits in DVGA
 856 x 480       410,880    inexact (-1.5), not mod-16, 44.6% of 720p
 854 x 480       409,920    FWVGA, NTSC; not mod-4, inexact (-0.4)
 848 x 480       407,040    Wide PAL; inexact (+3)
 768 x 432       331,776    64% of 540p; Kaltura Android 4G/Wifi
 640 x 360       230,400    not mod-16 (0.5), 44.4% of 540p
 640 x 352       225,280    inexact (+8), 43.5% of 540p, 59% of WVGA, 73% of VGA; Kaltura Desktop/iOS Wifi
 512 x 288       147,456    65.5% of 352p; Kaltura Android 3G
 480 x 270       129,600    not mod-4
 416 x 234        97,344    not mod-4; Kaltura iOS 3G
 416 x 232        96,512    inexact (+2)
 320 x 180        57,600    not mod-8 (0.25)
 320 x 176        56,320    inexact (+4), 73% of QVGA
 256 x 144        36,864    Kaltura Android 2G
 128 x 72          9,216    Kaltura Android 2G
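The pixel counts and mod-N notes in this table can be rechecked with a short script. This is a sketch, not part of my original spreadsheet: the helper name alignment is hypothetical, and it reports the largest of 16, 8, 4 or 2 that divides both the width and the height.

```python
def alignment(width, height):
    """Largest of 16, 8, 4, 2 dividing both dimensions (1 if none does)."""
    for m in (16, 8, 4, 2):
        if width % m == 0 and height % m == 0:
            return m
    return 1


candidates = [
    (1920, 1080), (1280, 720), (960, 540), (854, 480),
    (640, 360), (640, 352), (416, 232), (320, 176),
]

for w, h in candidates:
    print(f"{w}x{h}: {w * h:,} pixels, mod-{alignment(w, h)}")
```

Checking both dimensions matters: a resolution can be mod-16 in width while its height only reaches mod-8 or mod-4.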

FHD and HD are no-brainers as targeted resolutions for content, shown here only for comparison purposes.

At close to 44% of HD, we have many options. However, at this "tier", exact aspect ratio is the priority, so the best option is 960x540, which is 56% of HD but also fits into 960x640 and thus a broad range of mobile devices as well as netbooks. 854x480 is absolutely ruled out for not being even mod-4. 1024x576 is ruled out for being too close to 960x540. Let's call this one 540p.

Next, we need a resolution to fit into 640x480. 640x360 has the exact aspect ratio and is 44.4% of our 540p, but we've crossed into under-350k-pixel territory and want to get exact mod-16 if possible. 640x352 hits that mark with a slight loss of 8 horizontal lines. In keeping with the constraints of the lesser devices, this is 73% of VGA and should decode smoothly.

VGA and QVGA are too far apart: 307.2k pixels against 76.8k, a 4 to 1 ratio. It seems fair to have an intermediate resolution, which would be exactly 480x360. But, at 1.77:1, that would be 480x270, which is not even mod-4. Kaltura's suggested 512x288 has an exact aspect ratio, is mod-16 and has 65.5% of the pixels of 352p. Alternatively, 416x232 is not exact in aspect ratio and is only mod-8 in height (232 = 14.5 x 16), but it has 42.8% of the pixels of 352p, more in line with our other steps, and fits in more devices such as some from Nokia and possibly the iPod Nano. It is a close dispute but I'd stick with 416x232 for broader device support.

Finally, we get to 320x180. This is not even mod-8, which means lost bits. 320x176 loses a bit of aspect ratio but ensures that every bit counts. This is very important at such a low resolution. Incidentally, this is also 73% of the QVGA pixel count, so we already hit our limited mobile device discount goal and need not go any lower.

Recapping, we're left with:
1920x1080
1280x720
960x540
640x352
416x232
320x176
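To see how evenly these steps are spaced, here is a small Python sketch that computes each rung's pixel count as a fraction of the rung above it. The name step_ratios is just an illustrative helper.

```python
ladder = [
    (1920, 1080),
    (1280, 720),
    (960, 540),
    (640, 352),
    (416, 232),
    (320, 176),
]


def step_ratios(resolutions):
    """Pixel count of each rung divided by the pixel count of the rung above."""
    pixels = [w * h for w, h in resolutions]
    return [lower / upper for upper, lower in zip(pixels, pixels[1:])]


for (w, h), ratio in zip(ladder[1:], step_ratios(ladder)):
    print(f"{w}x{h}: {ratio:.1%} of the previous step")
```

Every step lands between roughly 43% and 58% of the one above it, so no two versions are nearly identical and no gap approaches the 4 to 1 jump between VGA and QVGA.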

So, I'm wondering if this methodology makes sense. I have tried exercises with 1.33:1 and 2.35:1 as well, more about that later perhaps.

Note that at this point I'm only looking into resolutions; I am aware of the differences in profiles, decoding complexity, GOP, framerates and so on.

benwaggoner

16th October 2013, 01:58

These seem generally pretty reasonable. Personally, I don't think it's that valuable to worry about getting 1:1 pixel mapping on a mobile device. A 4" phone screen will look identical playing good 720p and good 1080p encodes. Adaptive streaming really requires a decent quality scaling algorithm in the first place.

That also means square pixel isn't that big a deal either assuming you've got a player that can do the right thing. Plenty don't out of the box, however.

Lastly, H.264 is pretty darn bit efficient with non-mod16, so you don't need to sweat that too much from a bit efficiency perspective. It's not like MPEG-2, certainly! Getting mod16 can help software decoder performance a bit, if you need to be compatible with those.

As for aspect ratio, I say crop out when you can. That gives the player flexibility to match the aspect ratio of the screen. I hate seeing letterboxed 16:9 content in a 4:3 window playing on a 16:9 device, where it gets pillar-boxed as well and you have a huge black border around this little frame of video.

Hyral

16th October 2013, 19:54

Thank you for your feedback, Mr. Waggoner. =)

About 1080p/720p on phones, in my country, 4G has just started being commercialized and is quite expensive, and modern smartphones are also disproportionately expensive in our economy, so such devices and applications are not widely adopted by any measure. My own, for instance, is a simple Samsung Galaxy Mini with a 320x240 display and poor processing performance. On the other hand, hardware H.264 decoding IS common, even in such poor devices. Anyway, when we're working with bit rates as low as 200 kbit/s, even with H.264 efficiency, wouldn't it be worth the extra efficiency to force mod-16?

I feel such an adaptive streaming system should rely as little as possible on player capabilities. So it seems to me it would be preferable to render my scaled-down versions statically and directly adapted to the targeted devices. Is this not common practice?

I'm relieved you recommend cropping. I was afraid there would be undesirable complications, since a large system such as Youtube seems to ignore this completely.