piping 10bit 444le to x265|x264 => artifacts in output [Archive] - Page 2


Jamaika

15th February 2017, 22:37

You need to use the lutyuv filter after the format filter, which instructs the scale filter to output the yuv444p10le format.

So for full range, copy and paste the -vf command exactly as written:

ffmpeg.exe -loglevel verbose -i original_vc3.mxf -an -f yuv4mpegpipe -vf scale=2400:1350:in_color_matrix=bt709:in_range=full:out_color_matrix=bt709:out_range=full,format=yuv444p10le,lutyuv=val:val:val -strict -1 112.yuv
I don't understand; I get only a pink picture throughout the movie.

Selur

15th February 2017, 22:58

ffmpeg.exe -loglevel verbose -i original_vc3.mxf -an -f yuv4mpegpipe -vf scale=2400:1350:in_color_matrix=bt709:in_range=full:out_color_matrix=bt709:out_range=full,format=yuv444p10le,lutyuv=val:val:val -strict -1 112.yuv
Either use '-f yuv4mpegpipe' and a file with a .y4m extension,
OR
use '-f rawvideo' and a file with a .yuv extension.
Mixing the two might cause confusion.
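The distinction matters because a YUV4MPEG2 stream starts with a text header that describes the video, while rawvideo is bare samples with no description at all. A minimal Python sketch of the difference, assuming the standard `YUV4MPEG2` stream signature; the helper names `looks_like_y4m` and `raw_frame_bytes` are made up for illustration:

```python
import io

Y4M_MAGIC = b"YUV4MPEG2"  # signature at the start of a YUV4MPEG2 stream


def looks_like_y4m(stream):
    """Return True if the binary stream starts with the Y4M signature."""
    return stream.read(len(Y4M_MAGIC)) == Y4M_MAGIC


def raw_frame_bytes(width, height, bit_depth):
    """Bytes per planar 4:4:4 frame: three full-size planes.

    Samples deeper than 8 bits occupy 2 bytes each (e.g. yuv444p10le).
    """
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return 3 * width * height * bytes_per_sample


# A Y4M stream describes itself; a raw stream has to be described
# on the command line of whatever program reads it.
y4m = io.BytesIO(b"YUV4MPEG2 W2400 H1350 F24:1\n")
raw = io.BytesIO(bytes(16))
print(looks_like_y4m(y4m), looks_like_y4m(raw))
print(raw_frame_bytes(2400, 1350, 10))
```

A reader expecting raw data will treat a Y4M header as picture samples, and a Y4M reader will reject raw data, which is why the container flag and the file extension should agree.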

16th February 2017, 00:32

For now I've prepared a version that fixes both problems: hangs and black dots. There is no new option (for now), but the code is optimized for speed when the input is clean (0.2s speed drop).

The archive contains 2 patches and a Win64 binary:
www.msystem.waw.pl/x265/test-selur2.7z

Selur

16th February 2017, 09:07

Thanks!
Using that version, I can confirm the 422 crashes are fixed.

regarding the black dots,..

10bit y4m -> 10bit output:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p10le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 10 --y4m --crf 18.00 --output "H:\Output\x265_10-10bit_444.265"
-> no artifacts
12bit y4m -> 10bit output:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p12le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 10 --y4m --crf 18.00 --output "H:\Output\x265_12-10bit_444.265"
-> no artifacts
16bit y4m -> 10bit output:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p16le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 10 --y4m --crf 18.00 --output "H:\Output\x265_16-10bit_444.265"
-> no artifacts
16bit y4m -> 12bit output:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p16le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 10 --y4m --crf 18.00 --output "H:\Output\x265_16-12bit_444.265"
12bit y4m -> 12bit output:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p12le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output "H:\Output\x265_12-12bit_444.265"
-> artifacts
12bit y4m -> 12bit output using lutyuv=val:val:val:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -vf scale,format=yuv444p12le,lutyuv=val:val:val -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output "H:\Output\x265_12-12bit_lutyuv_val_444.265"
-> artifacts
12bit y4m -> 12bit output using lutyuv=clipval:clipval:clipval:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -vf scale,format=yuv444p12le,lutyuv=clipval:clipval:clipval -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output "H:\Output\x265_12-12bit_lutyuv_clipval_444.265"
-> no artifacts
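For readers unfamiliar with the two lutyuv expressions used above: as far as the FFmpeg filter documentation describes them, `val` is the input value unchanged and `clipval` is the input value clipped to the minval-maxval range. A minimal Python sketch of that difference follows. The bounds are passed in as parameters, and the 12-bit limited-range luma bounds used in the demo are an assumption for illustration, not something stated in this thread:

```python
def val(sample):
    """lutyuv 'val': the input sample, passed through unchanged."""
    return sample


def clipval(sample, minval, maxval):
    """lutyuv 'clipval': the input sample clipped to [minval, maxval]."""
    return max(minval, min(sample, maxval))


# Assumed bounds, for illustration only: 12-bit limited-range luma,
# i.e. 16 << 4 .. 235 << 4.
MINVAL, MAXVAL = 16 << 4, 235 << 4

for sample in (0, 256, 2048, 3760, 4095):
    print(sample, val(sample), clipval(sample, MINVAL, MAXVAL))
```

With `val` every sample reaches the encoder as it is, so the filter only forces the pixel format; with `clipval` out-of-range samples are pulled back inside the bounds before encoding.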

problem seems nearly fixed.

Cu Selur

Jamaika

16th February 2017, 09:21

OK. There is no added value from val or clipval. :stupid:
I spent a little time testing the new codecs. For me this is a very interesting thread about some very complicated suggestions.
Firstly, adding lutyuv with yuv444p12le causes additional errors in the image, so it should not be done. Without lutyuv, the yuv444p12le output is the same as the original.
Secondly, is it better to feed yuv444p12le or yuv444p10le to x265 for yuv444p10le output?
The results are interesting: for full range yuv444p12le is better, for limited range yuv444p10le.
Thirdly, should I add lutyuv when taking a screenshot or not?
No, it should not be added.
Fourthly, I tested the options for rawvideo rgb24 in x264. In other cases, using rawvideo makes no sense. The problem is that the RGB and lavf functions are dead in x264, so I propose removing them so as not to mislead users.
ffmpeg.exe -i original_vc3.mxf -f rawvideo -an -vf scale=2400:1350:in_color_matrix=bt709:in_range=full:out_color_matrix=rgb:out_range=full,format=rgb24,lutyuv=val:val:val -strict -1 - |
x264.exe -v --demuxer raw --input-csp rgb --input-range pc --input-res 2400x1350 --output-csp rgb --threads 4 --preset veryslow --tune grain --crf 18 --fps 24.000 --keyint 48 --nal-hrd none
--colormatrix bt709 --range pc --output "rgb24_crf18b.h264" -

@Selur: I wish you success in explaining the conversion on the forum, for the program Hybrid and beyond. ;)
I'm not adding pictures, so as not to clutter the forum. Anyone can run the test.

Selur

16th February 2017, 09:50

"format=rgb24,lutyuv=val:val:val" -> looks like a bad idea, since lutyuv is meant for yuv color spaces not for RGB (lutrgb should be used then) :)

Jamaika

16th February 2017, 10:21

Maybe it was a provocation on my part.
My mistake regarding RGB. I didn't know that I can't add colormatrix and range. In x265 this is only informational.
ffmpeg.exe -i original_vc3.mxf -f rawvideo -an -vf scale=2400:1350:in_color_matrix=bt709:in_range=full:out_color_matrix=rgb:out_range=full,format=rgb24 - |
x264.exe -v --demuxer raw --input-csp rgb --input-res 2400x1350 --output-csp rgb --threads 4 --preset veryslow --tune grain --crf 18 --fps 24.000 --keyint 48 --nal-hrd none
--output "rgb24_crf18b.h264" -

Selur

16th February 2017, 10:38

If you use a rawvideo pipe use '-loglevel fatal' to avoid that unwanted output ends up inside the piped output which then will corrupt the video stream.
using:
ffmpeg -y -loglevel fatal -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -an -sn -vf scale=2400:1350:in_color_matrix=bt709:in_range=full:out_color_matrix=rgb:out_range=full,format=rgb24 -f rawvideo - |
x264 -v --demuxer raw --input-csp rgb --input-res 2400x1350 --output-csp rgb --preset ultrafast --tune grain --crf 18 --fps 24.000 --output "h:\Outputrgb24_crf18b.264" -
encoding works fine here.

nevcairiel

16th February 2017, 11:23

If you use a rawvideo pipe use '-loglevel fatal' to avoid that unwanted output ends up inside the piped output which then will corrupt the video stream.

That's not how that works: logging goes to stderr and the pipe uses stdout, so logging never goes into the pipe but just comes out on the console.
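The stream separation can be checked without ffmpeg. A minimal Python sketch, where a small child process stands in for ffmpeg (payload on stdout, log text on stderr); the payload and log strings are made up for illustration:

```python
import subprocess
import sys

# Child process standing in for ffmpeg: payload on stdout, log text on stderr.
child_code = (
    "import sys\n"
    "sys.stderr.write('log: encoding started\\n')\n"
    "sys.stdout.write('FRAMEDATA')\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,  # what the next program in a pipe would read
    stderr=subprocess.PIPE,  # captured separately here; normally shown on the console
)

print(result.stdout)  # only the payload
print(result.stderr)  # only the log line
```

A shell pipe (`|`) connects only stdout to the next program, so the log line never reaches the encoder unless stderr is explicitly redirected into stdout.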

Jamaika

16th February 2017, 11:27

That's not how that works: logging goes to stderr and the pipe uses stdout, so logging never goes into the pipe but just comes out on the console.
OK, loglevel fatal has an influence on lutyuv.
How do I use 'colormatrix' bt709 in x264? I get the colors inverted from RGB to BGR.

Selur

16th February 2017, 11:29

That's not how that works: logging goes to stderr and the pipe uses stdout, so logging never goes into the pipe but just comes out on the console.
You are right, that is how it should work. :)

16th February 2017, 20:23

12bit y4m -> 12bit output:
[...]
-> artifacts

Strange. The source file is 12-bit and ffmpeg decodes it properly, so there are no values above 4095. I can't reproduce this. In my test, the output from x265s2 (from the test-selur2.7z archive, with input validation) is the same as from x265s1 (from the test-selur.7z archive, without input validation), and both are OK.
f:\speed\2.3+2>ffmpeg -y -loglevel fatal -threads 8 -i "..\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p12le -f yuv4mpegpipe - | x265s2 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output w2.hevc
y4m [info]: 2400x1350 fps 24/1 i444p12 sar 1:1 unknown frame count
raw [info]: output file: w2.hevc
x265 [info]: HEVC encoder version 2.3+2-912dd749bdb5
x265 [info]: build info [Windows][GCC 6.3.0][64 bit] 12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 4:4:4 12 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 2 / wpp(43 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 250 / 0 / 5.00
x265 [info]: Cb/Cr QP Offset : 6 / 6
x265 [info]: Lookahead / bframes / badapt : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-18.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I: 2, Avg QP:19.81 kb/s: 99486.43
x265 [info]: frame P: 81, Avg QP:22.06 kb/s: 37460.62
x265 [info]: frame B: 241, Avg QP:25.33 kb/s: 4761.86
x265 [info]: consecutive B-frames: 2.4% 1.2% 0.0% 96.4%

encoded 324 frames in 89.95s (3.60 fps), 13521.27 kb/s, Avg QP:24.48

f:\speed\2.3+2>ffmpeg -y -loglevel fatal -threads 8 -i "..\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p12le -f yuv4mpegpipe - | x265s1 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output w1.hevc
y4m [info]: 2400x1350 fps 24/1 i444p12 sar 1:1 unknown frame count
raw [info]: output file: w1.hevc
x265 [info]: HEVC encoder version 2.3+2-912dd749bdb5
x265 [info]: build info [Windows][GCC 6.3.0][64 bit] 12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 4:4:4 12 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 2 / wpp(43 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 250 / 0 / 5.00
x265 [info]: Cb/Cr QP Offset : 6 / 6
x265 [info]: Lookahead / bframes / badapt : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-18.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I: 2, Avg QP:19.81 kb/s: 99486.43
x265 [info]: frame P: 81, Avg QP:22.06 kb/s: 37460.62
x265 [info]: frame B: 241, Avg QP:25.33 kb/s: 4761.86
x265 [info]: consecutive B-frames: 2.4% 1.2% 0.0% 96.4%

encoded 324 frames in 90.04s (3.60 fps), 13521.27 kb/s, Avg QP:24.48

Selur

16th February 2017, 21:08

Copied the x265.exe you sent me into the folder again and ran the call again:
ffmpeg -y -loglevel fatal -threads 8 -i "F:\TestClips&Co\vc-3\original_vc3.mxf" -map 0:0 -an -sn -vsync 0 -strict -1 -pix_fmt yuv444p12le -f yuv4mpegpipe - | x265 --preset ultrafast --input - --output-depth 12 --y4m --crf 18.00 --output "H:\Output\x265_12-12bit_444.265"
y4m [info]: 2400x1350 fps 24/1 i444p12 sar 1:1 unknown frame count
raw [info]: output file: H:\Output\x265_12-12bit_444.265
x265 [info]: HEVC encoder version 2.3+2-912dd749bdb5
x265 [info]: build info [Windows][GCC 6.3.0][64 bit] 12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main 4:4:4 12 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 8 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 3 / wpp(43 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 250 / 0 / 5.00
x265 [info]: Cb/Cr QP Offset : 6 / 6
x265 [info]: Lookahead / bframes / badapt : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-18.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I: 2, Avg QP:19.81 kb/s: 99486.43
x265 [info]: frame P: 81, Avg QP:22.06 kb/s: 37460.62
x265 [info]: frame B: 241, Avg QP:25.33 kb/s: 4761.86
x265 [info]: consecutive B-frames: 2.4% 1.2% 0.0% 96.4%

encoded 324 frames in 26.15s (12.39 fps), 13521.27 kb/s, Avg QP:24.48
output shows artifacts here (uploaded the output to my GoogleDrive (https://drive.google.com/drive/folders/0B_WxUS1XGCPAU1poaXdzNW10Qm8?usp=sharing))

16th February 2017, 21:21

output shows artifacts here (uploaded the output to my GoogleDrive (https://drive.google.com/drive/folders/0B_WxUS1XGCPAU1poaXdzNW10Qm8?usp=sharing))

The file is the same as mine (except for cpuid, frame-threads and numa-pools in the header). If I play this video, I don't see any black dots.

Selur

16th February 2017, 21:27

Okay, then the problem is related to my MPC-HC setup and the conversion from 12bit to 10bit. (since my display only shows 10bit)
Thanks! (will probably have to wait for the next MPC-HC nightly until that works)

sneaker_ger

16th February 2017, 21:44

The dots appear when LAV outputs Y410. They aren't there when it outputs Y416. (using madvr)

Selur

16th February 2017, 21:51

Nice! You are right, enabling Y416 in LAV seems to fix the issue (madvr reports Y416 then :))
Thanks!

nevcairiel

16th February 2017, 21:55

Never disable any of the output formats from LAV Video, let the renderer do a conversion to whatever your display wants, faster and safer that way. :)

Selur

16th February 2017, 21:59

I'm not aware that I disabled any of the outputs, but since I haven't really looked into those settings for quite some time, I might have just forgotten about it. ;)

11th March 2017, 21:07

I found the bug in ffmpeg. It is in the DITHER_COPY macro (or in the dither_scale or dithers data):
https://github.com/FFmpeg/FFmpeg/blob/master/libswscale/swscale_unscaled.c#L1487
in line(s):
dst = (src + dither) * scale >> shift;

In Selur's example we have a 12-bit source and a 10-bit destination, which gives us in DITHER_COPY:
scale = 2047
shift = 13

First question: how big can dither be without overflowing the 10-bit maximum of 1023 in the operation:
(4095 + dither) * 2047 >> 13
We have
(4095+3)*2047>>13=1023
(4095+4)*2047>>13=1024
so we can add at most 3 (but DITHER_COPY adds up to 15).

Second question: how big can scale be without overflowing in the operation:
(4095+15) * scale >> 13
We have
(4095+15)*2041>>13=1023
(4095+15)*2042>>13=1024
so if we want to add 15, we can multiply by at most 2041 (but DITHER_COPY multiplies by 2047).
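The two limits can be re-derived by brute force. A minimal Python sketch using only the numbers quoted in this post (scale 2047, shift 13, maximum dither 15); the function name `convert` is made up for illustration:

```python
SRC_MAX = (1 << 12) - 1  # 4095, largest 12-bit sample
DST_MAX = (1 << 10) - 1  # 1023, largest 10-bit sample
SCALE, SHIFT, DITHER = 2047, 13, 15  # values quoted in the post


def convert(sample, dither, scale, shift=SHIFT):
    """The DITHER_COPY expression: (src + dither) * scale >> shift."""
    return (sample + dither) * scale >> shift


# Largest dither that keeps the brightest sample within 10 bits at scale 2047.
max_dither = max(d for d in range(16) if convert(SRC_MAX, d, SCALE) <= DST_MAX)

# Largest scale that keeps the brightest sample within 10 bits at dither 15.
max_scale = max(s for s in range(1, 2048) if convert(SRC_MAX, DITHER, s) <= DST_MAX)

print(max_dither, max_scale)
print(convert(SRC_MAX, DITHER, SCALE))  # exceeds DST_MAX, i.e. overflows 10 bits
```

The last line shows the actual combination (dither 15, scale 2047) pushing the brightest sample past 1023, which is consistent with the out-of-range values seen in the 10-bit output.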
-------------------------------
A first attempt at a fix:
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index ba3d688..bd4e4f2 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1488,7 +1488,7 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[],
uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\
int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\
for (i = 0; i < height; i++) {\
- const uint8_t *dither= dithers[src_depth-9][i&7];\
+ const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\
for (j = 0; j < length-7; j+=8){\
dst[j+0] = dbswap((bswap(src[j+0]) + dither[0])*scale>>shift);\
dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\

12th March 2017, 17:24

I've checked all the numbers in the dither_scale table and most are OK. The wrong numbers are those for dithering from 16-bit to 15-bit (overflow in a signed multiply, which is undefined behavior; this conversion is currently impossible). [The numbers from 11-bit to 8-bit are OK; that was my mistake in copying the numbers.]

I decided to simplify the dither_scale table and write the numbers in a form that shows at a glance that they are all optimal (and that is easy to update if the dithers table changes).

nevcairiel or richardpl -- please review this patch:
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index ba3d688..8bc9ba6 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -110,22 +110,19 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={
{ 112, 16,104, 8,118, 22,110, 14,},
}};

-static const uint16_t dither_scale[15][16]={
-{ 2, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,},
-{ 2, 3, 7, 7, 13, 13, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,},
-{ 3, 3, 4, 15, 15, 29, 57, 57, 57, 113, 113, 113, 113, 113, 113, 113,},
-{ 3, 4, 4, 5, 31, 31, 61, 121, 241, 241, 241, 241, 481, 481, 481, 481,},
-{ 3, 4, 5, 5, 6, 63, 63, 125, 249, 497, 993, 993, 993, 993, 993, 1985,},
-{ 3, 5, 6, 6, 6, 7, 127, 127, 253, 505, 1009, 2017, 4033, 4033, 4033, 4033,},
-{ 3, 5, 6, 7, 7, 7, 8, 255, 255, 509, 1017, 2033, 4065, 8129,16257,16257,},
-{ 3, 5, 6, 8, 8, 8, 8, 9, 511, 511, 1021, 2041, 4081, 8161,16321,32641,},
-{ 3, 5, 7, 8, 9, 9, 9, 9, 10, 1023, 1023, 2045, 4089, 8177,16353,32705,},
-{ 3, 5, 7, 8, 10, 10, 10, 10, 10, 11, 2047, 2047, 4093, 8185,16369,32737,},
-{ 3, 5, 7, 8, 10, 11, 11, 11, 11, 11, 12, 4095, 4095, 8189,16377,32753,},
-{ 3, 5, 7, 9, 10, 12, 12, 12, 12, 12, 12, 13, 8191, 8191,16381,32761,},
-{ 3, 5, 7, 9, 10, 12, 13, 13, 13, 13, 13, 13, 14,16383,16383,32765,},
-{ 3, 5, 7, 9, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15,32767,32767,},
-{ 3, 5, 7, 9, 11, 12, 14, 15, 15, 15, 15, 15, 15, 15, 16,65535,},
+/* Numbers 1, 3, 7, 15, 31, 63, 63, 126 in table dither_scale are from
+ * maximum values in dithers[0], dithers[1], ..., dithers[7] table.
+ * If you change dithers table please update these numbers.
+ */
+static const uint16_t dither_scale[8][8]={
+{ ((1<<24)-1)/(511+1), ((1<<25)-1)/(1023+3), ((1<<26)-1)/(2047+7), ((1<<27)-1)/(4095+15), ((1<<28)-1)/(8191+31), ((1<<29)-1)/(16383+63), ((1<<30)-1)/(32767+63), ((1u<<31)-1)/(65535+126),},
+{ 0, ((1<<25)-1)/(1023+1), ((1<<26)-1)/(2047+3), ((1<<27)-1)/(4095+ 7), ((1<<28)-1)/(8191+15), ((1<<29)-1)/(16383+31), ((1<<30)-1)/(32767+63), ((1u<<31)-1)/(65535+ 63),},
+{ 0, 0, ((1<<26)-1)/(2047+1), ((1<<27)-1)/(4095+ 3), ((1<<28)-1)/(8191+ 7), ((1<<29)-1)/(16383+15), ((1<<30)-1)/(32767+31), ((1u<<31)-1)/(65535+ 63),},
+{ 0, 0, 0, ((1<<27)-1)/(4095+ 1), ((1<<28)-1)/(8191+ 3), ((1<<29)-1)/(16383+ 7), ((1<<30)-1)/(32767+15), ((1u<<31)-1)/(65535+ 31),},
+{ 0, 0, 0, 0, ((1<<28)-1)/(8191+ 1), ((1<<29)-1)/(16383+ 3), ((1<<30)-1)/(32767+ 7), ((1u<<31)-1)/(65535+ 15),},
+{ 0, 0, 0, 0, 0, ((1<<29)-1)/(16383+ 1), ((1<<30)-1)/(32767+ 3), ((1u<<31)-1)/(65535+ 7),},
+{ 0, 0, 0, 0, 0, 0, ((1<<30)-1)/(32767+ 1), ((1u<<31)-1)/(65535+ 3),},
+{ 0, 0, 0, 0, 0, 0, 0, ((1u<<31)-1)/(65535+ 1),},
};

@@ -1485,10 +1482,10 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[],
}

#define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\
- uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\
- int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\
+ uint16_t scale= dither_scale[dst_depth-8][src_depth-9];\
+ int shift= src_depth-dst_depth + 15;\
for (i = 0; i < height; i++) {\
- const uint8_t *dither= dithers[src_depth-9][i&7];\
+ const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\
for (j = 0; j < length-7; j+=8){\
dst[j+0] = dbswap((bswap(src[j+0]) + dither[0])*scale>>shift);\
dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\
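The claim that the rewritten table cannot overflow can be checked for every depth pair. A minimal Python sketch, assuming the maximum dither values listed in the patch comment (they are copied from the post, not re-derived from the ffmpeg source) and the patched `shift = src_depth - dst_depth + 15`; the function names are made up for illustration:

```python
# Maximum dither values per bit-depth difference (1..8), as listed in the
# patch comment; copied from the post, not re-derived from ffmpeg.
MAX_DITHER = [1, 3, 7, 15, 31, 63, 63, 126]


def scale_for(dst_depth, src_depth):
    """Scale entry as written in the patched table, for dst_depth < src_depth."""
    src_max = (1 << src_depth) - 1
    dither = MAX_DITHER[src_depth - dst_depth - 1]
    return ((1 << (src_depth + 15)) - 1) // (src_max + dither)


def worst_case(dst_depth, src_depth):
    """Largest output of (src + dither) * scale >> shift for this depth pair."""
    src_max = (1 << src_depth) - 1
    dither = MAX_DITHER[src_depth - dst_depth - 1]
    shift = src_depth - dst_depth + 15
    return (src_max + dither) * scale_for(dst_depth, src_depth) >> shift


for dst in range(8, 16):
    for src in range(dst + 1, 17):
        # The scale must fit in uint16_t and the output in dst bits.
        assert scale_for(dst, src) < (1 << 16)
        assert worst_case(dst, src) <= (1 << dst) - 1

print(scale_for(10, 12), worst_case(10, 12))
```

For the 12-bit to 10-bit case from this thread, the entry is `((1<<27)-1)/(4095+3)` and the brightest sample with the largest dither lands exactly on the 10-bit maximum.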

15th March 2017, 12:47

This DITHER_COPY approach in ffmpeg is mathematically wrong. It makes the image darker. After 200 iterations of converting Selur's sample movie from 12-bit to 10-bit, 10-bit to 8-bit and 8-bit back to 12-bit:
for /L %%i in (1 1 200) do (
ffmpeg2 -i w2.y4m -v warning -strict -1 -pix_fmt yuv444p10 w2-10-%%i.y4m
ffmpeg2 -i w2-10-%%i.y4m -v warning -strict -1 -pix_fmt yuv444p w2-8-%%i.y4m
ffmpeg2 -i w2-8-%%i.y4m -y -v warning -strict -1 -pix_fmt yuv444p12 w2.y4m
)
the final movie is darker and greener -- www.msystem.waw.pl/x265/w200.mkv
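One way such a drift can arise is shown below on a deliberately simplified toy model. This is not ffmpeg's actual code path: there is no dither, only a single 12-bit to 10-bit to 12-bit round trip, the scale 2047 and shift 13 are the values quoted earlier in the thread, and the plain left shift for the way back up is an assumption. It only illustrates how a truncating conversion with a scale slightly below the exact ratio loses a little level on every pass:

```python
SCALE, SHIFT = 2047, 13  # 12-bit to 10-bit values quoted earlier in the thread


def down_12_to_10(sample):
    """Truncating scale-and-shift with no dither (simplified)."""
    return sample * SCALE >> SHIFT


def up_10_to_12(sample):
    """Plain left shift back to 12 bits (assumed for this toy model)."""
    return sample << 2


def round_trips(sample, count):
    """Apply the 12 -> 10 -> 12 round trip `count` times."""
    for _ in range(count):
        sample = up_10_to_12(down_12_to_10(sample))
    return sample


start = 4092  # a 12-bit value that a lossless round trip would preserve
print(start, round_trips(start, 1), round_trips(start, 200))
```

In this model the value keeps decreasing with every iteration instead of settling, which is the kind of cumulative darkening a balanced conversion should avoid.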

17th March 2017, 00:37

I've sent a patch to ffmpeg-devel that is mathematically balanced (but 0.2s slower on Selur's whole movie) -- https://patchwork.ffmpeg.org/patch/2941/

The 200th iteration with the new patch looks like this -- www.msystem.waw.pl/x265/n200.mkv (it is the same as the first iteration, because the new patch does not change upshifted data, i.e. data with zeros in the bits to be removed).

Selur

17th March 2017, 05:53

Thanks! :D