Hyral
15th October 2013, 23:54
Hello,
I am doing research on adaptive web video content presentation (MPEG-DASH) and would highly appreciate feedback from this forum's experts. My goal at this moment is to define down-scaling resolution configurations in the most quality/bitrate-efficient manner possible, so I can later fit them into bandwidth and bitrate recommendations. I will describe my method and conclusions so we can check whether I'm on the right track. I've been reading Doom9 since 2011, and I hope I have learned enough to accomplish something useful.
1st of all, I define target displays. By checking the web, particularly this List of Displays By Pixel Density (http://en.wikipedia.org/wiki/List_of_displays_by_pixel_density), I have identified the most relevant displays as 1920x1080 (FHD), 1280x720 (HD), 960x640 (DVGA), 960x540 (qHD), 854x480 (FWVGA), 800x480 (WVGA), 640x480 (VGA), 432x240 (FWQVGA) and 320x240 (QVGA).
2nd, I set the premise that the steps must be reasonably spaced so as to minimize the number of versions, maximize compatibility and allow more bitrate versions per resolution. As a starting guideline, HD has 44.4% of the pixels of FHD. So we can already see that many of the initially targeted displays will have to accommodate slightly lower-resolution content. I've also seen recommendations to produce content for mobile devices at lower resolutions than their displays, in order to avoid compatibility and performance issues.
3rd, I must note that my target displays have different aspect ratios. As a preliminary guideline, the horizontal dimension will be the primary reference, since it is the larger one and must be accommodated on smaller displays. For instance, to show 1.77:1 video on a 320x240 display, my content will have to be at most 320x180.
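A minimal Python sketch of the width-first fitting rule. It assumes "1.77:1" means exactly 16:9 and falls back to the height only when the width-limited frame would overflow the display; `fit_to_display` is a hypothetical helper name, not part of any tool.

```python
from fractions import Fraction

ASPECT_16_9 = Fraction(16, 9)  # assumption: "1.77:1" is taken to mean exactly 16:9


def fit_to_display(disp_w: int, disp_h: int, aspect: Fraction = ASPECT_16_9):
    """Largest frame of `aspect` that fits in disp_w x disp_h, trying width first.

    Returns exact Fractions so that any non-integer result stays visible.
    """
    height = Fraction(disp_w) / aspect
    if height <= disp_h:
        # Width is the binding dimension: use the full display width.
        return Fraction(disp_w), height
    # Otherwise the height binds and the width shrinks to match the aspect ratio.
    return Fraction(disp_h) * aspect, Fraction(disp_h)


for w, h in [(320, 240), (640, 480), (960, 640), (432, 240)]:
    fw, fh = fit_to_display(w, h)
    print(f"{w}x{h} display -> {float(fw):g} x {float(fh):g}")
```

For a 432x240 display the fallback kicks in, because 432 pixels wide at 16:9 would need 243 lines; the result (about 426.67x240) is not an integer and would still have to be rounded to a valid mod-N size.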
4th, I identify which aspects of resolution are most detrimental to coding efficiency. From searching Doom9, enemy #1 seems to be letterboxing, which curiously abounds on YouTube, for instance. This raises the question: in a web content presentation system, is cropping worth the added hassle? It means that besides 1.77:1 and 1.33:1 content, one now has to deal with cinema 2.39:1, 1.85:1 and possibly other aspect ratios. The secondary sin would be not respecting mod-16. I have gathered that modern encoders such as x264 and XviD mitigate such losses; in fact, 1080p is not mod-16 and no one cares. Still, mod-16 remains an ideal goal during decision making and should be approximated as closely as possible, while mod-4 should be absolutely mandatory. However, I believe that the lower the resolution and bitrate, the larger the contribution of mod-16. That being so, at low resolutions I will recommend hitting mod-16 rather than the exact aspect ratio. The threshold for this change of strategy will be resolutions with fewer than 350k pixels (720x480 is 345.6k). Also, since letterboxing is undesirable, if I have to lose rows or columns of pixels I will recommend losing horizontal lines rather than vertical columns. I don't know whether it would be better to keep the full frame and squeeze it vertically instead; is that common practice when the aspect ratio is slightly off? In that case it would probably be better to squeeze slightly horizontally, as this would result in the least distortion.
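The mod-16 concern comes from H.264 coding the picture in 16x16 macroblocks: a 1080-line frame is actually coded as 1088 lines and cropped on output. A rough Python sketch of that arithmetic, assuming progressive frame coding with 16-line macroblocks; the padding share it reports is only a proxy for wasted area, not a measurement of wasted bits, and the helper names are illustrative.

```python
MB = 16  # H.264/AVC macroblock size in luma samples


def coded_size(n: int, block: int = MB) -> int:
    """Dimension rounded up to whole blocks, i.e. what the encoder codes before cropping."""
    return -(-n // block) * block


def last_block_fill(n: int, block: int = MB) -> float:
    """Fraction of the final block row/column that carries real picture (1.0 = fully used)."""
    rem = n % block
    return 1.0 if rem == 0 else rem / block


def padding_share(w: int, h: int) -> float:
    """Padded area as a share of the visible area: a crude proxy, not a bit-cost model."""
    return coded_size(w) * coded_size(h) / (w * h) - 1


for w, h in [(1920, 1080), (960, 540), (640, 360), (320, 180), (320, 176)]:
    print(f"{w}x{h}: coded as {coded_size(w)}x{coded_size(h)}, "
          f"last row {last_block_fill(h):.2f} filled, padding {padding_share(w, h):.1%}")
```

On these numbers the padded area is about 0.7% of the picture at 1080p and 540p but about 6.7% at 320x180, which fits the intuition that mod-16 matters more at low resolutions; how much of that turns into actual bit cost depends on the encoder.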
5th, I gather references from Kaltura, Adobe, YouTube, Netflix and any other relevant systems.
6th, I set up a table for a single aspect ratio and start working through it. For this I chose 1.77:1.
Resolution    Pixels      Notes
1920 x 1080   2,073,600   FHD; not mod-16; high-end
1280 x 720      921,600   HD; 44.4% of 1080p; PCs and tablets in general
1024 x 576      589,824   Kaltura iOS Wi-Fi
960 x 544       522,240   inexact (-4); Kaltura iOS Wi-Fi
960 x 540       518,400   qHD; not mod-8 (0.75), 56% of 720p, 25% of 1080p; many high-end phones by 2011, fits in DVGA
856 x 480       410,880   inexact (+1.5), not mod-16, 44.6% of 720p
854 x 480       409,920   FWVGA, NTSC; not mod-4, inexact (+0.4)
848 x 480       407,040   Wide PAL; inexact (-3)
768 x 432       331,776   64% of 540p; Kaltura Android 4G/Wi-Fi
640 x 360       230,400   not mod-16 (0.5), 44.4% of 540p
640 x 352       225,280   inexact (+8), 43.5% of 540p, 58.7% of WVGA, 73.3% of VGA; Kaltura Desktop/iOS Wi-Fi
512 x 288       147,456   65.5% of 352p; Kaltura Android 3G
480 x 270       129,600   not mod-4
416 x 234        97,344   not mod-4; Kaltura iOS 3G
416 x 232        96,512   inexact (+2), not mod-16 (0.5)
320 x 180        57,600   not mod-8 (0.25)
320 x 176        56,320   inexact (+4), 73% of QVGA
256 x 144        36,864   Kaltura Android 2G
128 x 72          9,216   Kaltura Android 2G
Legend: "inexact (n)" is the exact 16:9 height minus the actual height, in lines (positive = lines cropped, negative = lines added); a bracketed fraction after "not mod-8"/"not mod-16" is how much of the last 16-line macroblock row is actually filled.
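The derived columns in a table like this are easy to get wrong by hand, so it may help to generate them. A small Python sketch, assuming 1.77:1 means exactly 16:9 and using the sign convention "positive = lines short of the exact 16:9 height"; `describe` and `share` are hypothetical helper names.

```python
from fractions import Fraction

ASPECT = Fraction(16, 9)  # assumption: the table's 1.77:1 is exactly 16:9


def describe(w: int, h: int) -> dict:
    exact_h = Fraction(w) / ASPECT
    return {
        "pixels": w * h,
        # positive: lines short of the exact 16:9 height (cropped); negative: extra lines
        "inexact": float(exact_h - h),
        "mod4": w % 4 == 0 and h % 4 == 0,
        "mod16": w % 16 == 0 and h % 16 == 0,
    }


def share(a: tuple, b: tuple) -> float:
    """Pixel count of resolution a as a fraction of resolution b."""
    return (a[0] * a[1]) / (b[0] * b[1])


for w, h in [(960, 540), (856, 480), (854, 480), (848, 480),
             (640, 360), (640, 352), (512, 288), (416, 232), (320, 176)]:
    d = describe(w, h)
    print(f"{w}x{h}: {d['pixels']:,} px, inexact {d['inexact']:+.1f}, "
          f"mod-4 {d['mod4']}, mod-16 {d['mod16']}")

print(f"640x352 is {share((640, 352), (960, 540)):.1%} of 540p")
print(f"768x432 is {share((768, 432), (960, 540)):.1%} of 540p")
```

Run on these rows, it gives 225,280 pixels for 640x352 (43.5% of 540p), 64.0% for 768x432 against 540p, and reports 416x232 as mod-4 but not mod-16.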
FHD and HD are no-brainers as target resolutions for content, shown here only for comparison purposes.
At close to 44% of HD, we have many options. However, at this "tier" exact aspect ratio is the priority, so the best option is 960x540, which is 56% of HD but also fits into 960x640 and thus a broad range of mobile devices as well as netbooks. 854x480 is absolutely ruled out for not even being mod-4. 1024x576 is ruled out for being too close to 960x540. Let's call this one 540p.
Next, we need a resolution that fits into 640x480. 640x360 has the exact aspect ratio and is 44.4% of our 540p, but we've crossed into sub-350k-pixel territory and want exact mod-16 if possible. 640x352 hits that mark at the cost of 8 horizontal lines. In keeping with the constraints of lesser devices, it is 73% of VGA and should decode smoothly.
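The 640x360 to 640x352 step amounts to "round the 16:9 height down to the nearest multiple of 16". A Python sketch of that rule, with a hypothetical helper and 16:9 assumed; rounding down crops a few lines instead of adding letterbox bars.

```python
from fractions import Fraction


def snap_height(w: int, block: int = 16, aspect: Fraction = Fraction(16, 9)):
    """Round the exact height for width w DOWN to a multiple of `block`.

    Rounding down means cropping a few lines (no letterbox padding).
    Returns (snapped_height, lines_lost).
    """
    exact = Fraction(w) / aspect
    snapped = (exact // block) * block  # Fraction // int gives an int
    return int(snapped), float(exact - snapped)


for w, block in [(640, 16), (320, 16), (416, 16), (416, 8)]:
    h, lost = snap_height(w, block)
    print(f"width {w}, mod-{block}: {w}x{h}, {lost:g} lines lost")
```

The same rule turns 320x180 into 320x176. At width 416 it gives 416x224 (10 lines lost) for mod-16, while a block of 8 yields 416x232, losing only 2 lines.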
VGA and QVGA are too far apart: 307.2k pixels against 76.8k, a 4 to 1 ratio. It seems fair to have an intermediate resolution; the midpoint in each dimension would be 480x360. But at 1.77:1 that would be 480x270, which is not even mod-4. Kaltura's suggested 512x288 has the exact aspect ratio, is mod-16 and has 65.5% of the pixels of 352p. Alternatively, 416x232 is not exact in aspect ratio and its height is only mod-8 (232/16 = 14.5), but it has 42.8% of the pixels of 352p, more in line with our other steps, and fits more devices, such as some from Nokia and possibly the iPod Nano. It is a close call, but I'd stick with 416x232 for broader device support.
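To compare the two intermediate candidates on step spacing, one can compute each candidate's pixel ratio to the rung above (352p) and the ratio of the rung below (176p) to it. A short Python sketch; the function names are illustrative only.

```python
def pixels(res: tuple) -> int:
    w, h = res
    return w * h


def neighbour_ratios(cand: tuple, upper: tuple, lower: tuple):
    """Pixel ratio of candidate to the rung above, and of the rung below to the candidate."""
    return pixels(cand) / pixels(upper), pixels(lower) / pixels(cand)


UPPER, LOWER = (640, 352), (320, 176)  # rungs already chosen above and below
for cand in [(512, 288), (416, 232)]:
    down, next_down = neighbour_ratios(cand, UPPER, LOWER)
    print(f"{cand[0]}x{cand[1]}: {down:.1%} of 352p; 176p is {next_down:.1%} of it")
```

512x288 gives steps of about 65% then 38%, while 416x232 gives about 43% then 58%, closer to the ~44% spacing used higher up the ladder.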
Finally, we get to 320x180. This is not even mod-8, which means wasted bits. 320x176 gives up a little aspect-ratio accuracy but ensures that every bit counts, which is very important at such a low resolution. Incidentally, this is also 73% of the QVGA pixel count, so we have already met our goal of staying somewhat below the display resolution of limited mobile devices and need not go any lower.
Recapping, we're left with:
1920x1080
1280x720
960x540
640x352
416x232
320x176
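The recap can be checked mechanically against the rules set out above (mod-4 mandatory, mod-16 preferred below 350k pixels), along with the pixel ratio between consecutive rungs. A minimal Python sketch; the constant names and report format are illustrative assumptions.

```python
LADDER = [(1920, 1080), (1280, 720), (960, 540), (640, 352), (416, 232), (320, 176)]
MOD16_THRESHOLD = 350_000  # below this pixel count, mod-16 is preferred over exact aspect


def check_ladder(ladder):
    """Return (resolution, pixels, step_from_previous, issues) for each rung."""
    report = []
    prev = None
    for w, h in ladder:
        px = w * h
        issues = []
        if w % 4 or h % 4:
            issues.append("not mod-4 (mandatory)")
        if px < MOD16_THRESHOLD and (w % 16 or h % 16):
            issues.append("not mod-16 below threshold")
        step = px / prev if prev else None  # pixel ratio to the rung above
        report.append(((w, h), px, step, issues))
        prev = px
    return report


for (w, h), px, step, issues in check_ladder(LADDER):
    step_txt = f"{step:.1%} of previous" if step else "top rung"
    print(f"{w}x{h}: {px:,} px, {step_txt}, {', '.join(issues) or 'ok'}")
```

On this ladder the only flag is 416x232, whose height is a multiple of 8 but not of 16; the step ratios fall between roughly 43% and 58%.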
So, I'm wondering if this methodology makes sense. I have tried exercises with 1.33:1 and 2.35:1 as well, more about that later perhaps.
Note that at this point I'm only looking into resolutions; I am aware of the differences in profiles, decoding complexity, GOP, framerates and so on.