1st November 2014, 01:45 | #1 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
Scaling, colorspace conversion, and dithering library
Some folks have expressed interest in having a program library to perform the common tasks of manipulating resolution, colorspace, and depth. The "z" library (name subject to change) provides a simple interface to perform those tasks. An open source VapourSynth plugin is provided, demonstrating the usage of the API.
Code:
NAME:
    z - z.lib for VapourSynth

SYNOPSIS:
    z.Format(clip clip, int "width", int "height", int "format",
             enum "matrix", enum "transfer", enum "primaries", enum "range", enum "chromaloc",
             enum "matrix_in", enum "transfer_in", enum "primaries_in", enum "range_in", enum "chromaloc_in",
             string "resample_filter", float "filter_param_a", float "filter_param_b",
             string "resample_filter_uv", float "filter_param_a_uv", float "filter_param_b_uv",
             string "dither_type")

    z.Subresize(clip clip, int width, int height,
                float "shift_w", float "shift_h", float "subwidth", float "subheight",
                string "resample_filter", float "filter_param_a", float "filter_param_b",
                string "dither_type")

DESCRIPTION:
    z.Format is a drop-in replacement for the built-in VapourSynth resize
    functions. It converts a clip of known or unknown format to another clip
    of known or unknown format, changing only the parameters specified by the
    user. z.Subresize provides advanced resampling capabilities intended for
    use by script writers.

    Arguments denoted as type "enum" may be specified by numerical index
    (see ITU-T H.265 Annex E.3) or by name. Enums specified by name have
    their argument name suffixed with "_s".

    clip: input clip
        The input may be of COMPAT color family (requires VS R28).

    width, height: output image dimensions

    format: output format preset id
        The output may be of COMPAT color family (requires VS R28).

    matrix, transfer, primaries: output colorspace specification
        If not provided, the corresponding attribute from the input clip will
        be selected, except for YCoCg and RGB color families, where the
        corresponding matrix is set by default.

    range: output pixel range
        For integer formats, this allows selection of the legal code values.
        Even when set, out of range values (BTB/WTW) may be generated. If the
        input format is of a different color family, the default range is
        studio/limited for YUV and full-range for RGB.

    chromaloc: output chroma location
        For subsampled formats, specifies the chroma location. If the input
        format is 4:4:4 or RGB and the output is subsampled, the default
        location is left-aligned, as per MPEG.

    matrix_in, transfer_in, primaries_in, range_in, chromaloc_in:
    input colorspace/format specification
        If the corresponding frame property is set to a value other than
        unspecified, the frame property is used instead of this parameter.
        Default values are set for certain color families.

    resample_filter, filter_param_a, filter_param_b:
    scaling method for RGB and Y-channel
        For the bicubic filter, filter_param_a/b represent the "b" and "c"
        parameters. For the lanczos filter, filter_param_a represents the
        number of taps.

    resample_filter_uv, filter_param_a_uv, filter_param_b_uv:
    scaling method for UV channels

    dither_type: dithering method
        Dithering is used only for conversions resulting in an integer format.

    shift_w, shift_h: offset of image top-left corner
        The top-left image corner is assumed to be at coordinate (0, 0) and
        the first sample centered at coordinate (0.5, 0.5). An offset may be
        applied to the assumed image origin to "shift" the image.

    subwidth, subheight: fractional dimensions of input image
        The input image is assumed to span from its origin a distance equal to
        its dimensions in pixels. An alternative image resolution may be
        specified.

The following tables list values of selected colorspace enumerations and
their abbreviated names. For all possible values, see ITU-T H.265.

Matrix coefficients (ITU-T H.265 Table E.5):
    rgb       Identity
              The identity matrix. Typically used for GBR (often referred to
              as RGB); however, may also be used for YZX (often referred to
              as XYZ)
    709       KR = 0.2126; KB = 0.0722
              ITU-R Rec. BT.709-5
    unspec    Unspecified
              Image characteristics are unknown or are determined by the
              application.
    470bg     KR = 0.299; KB = 0.114
              ITU-R Rec. BT.470-6 System B, G (historical)
              (functionally the same as the value 6 (170m))
    170m      KR = 0.299; KB = 0.114
              SMPTE 170M (2004)
              (functionally the same as the value 5 (470bg))
    ycgco     YCgCo
    2020ncl   KR = 0.2627; KB = 0.0593
              Rec. ITU-R BT.2020 non-constant luminance system
    2020cl    KR = 0.2627; KB = 0.0593
              Rec. ITU-R BT.2020 constant luminance system

Transfer characteristics (ITU-T H.265 Table E.4):
    709       V = a * Lc^0.45 - ( a - 1 )   for 1 >= Lc >= b
              V = 4.500 * Lc                for b > Lc >= 0
              Rec. ITU-R BT.709-5
              (functionally the same as the values 6 (601), 14 (2020_10) and
              15 (2020_12))
    unspec    Unspecified
              Image characteristics are unknown or are determined by the
              application.
    601       V = a * Lc^0.45 - ( a - 1 )   for 1 >= Lc >= b
              V = 4.500 * Lc                for b > Lc >= 0
              Rec. ITU-R BT.601-6 525 or 625
              (functionally the same as the values 1 (709), 14 (2020_10) and
              15 (2020_12))
    linear    V = Lc                        for all values of Lc
              Linear transfer characteristics
    2020_10   V = a * Lc^0.45 - ( a - 1 )   for 1 >= Lc >= b
              V = 4.500 * Lc                for b > Lc >= 0
              Rec. ITU-R BT.2020
              (functionally the same as the values 1 (709), 6 (601) and
              15 (2020_12))
    2020_12   V = a * Lc^0.45 - ( a - 1 )   for 1 >= Lc >= b
              V = 4.500 * Lc                for b > Lc >= 0
              Rec. ITU-R BT.2020
              (functionally the same as the values 1 (709), 6 (601) and
              14 (2020_10))

Color primaries (ITU-T H.265 Table E.3):
    709       primary     x        y
              green       0.300    0.600
              blue        0.150    0.060
              red         0.640    0.330
              white D65   0.3127   0.3290
              Rec. ITU-R BT.709-5
    unspec    Unspecified
              Image characteristics are unknown or are determined by the
              application.
    170m      primary     x        y
              green       0.310    0.595
              blue        0.155    0.070
              red         0.630    0.340
              white D65   0.3127   0.3290
              SMPTE 170M (2004)
              (functionally the same as the value 7 (240m))
    240m      primary     x        y
              green       0.310    0.595
              blue        0.155    0.070
              red         0.630    0.340
              white D65   0.3127   0.3290
              SMPTE 240M (1999)
              (functionally the same as the value 6 (170m))
    2020      primary     x        y
              green       0.170    0.797
              blue        0.131    0.046
              red         0.708    0.292
              white D65   0.3127   0.3290
              Rec. ITU-R BT.2020

Pixel range (ITU-T H.265 Eq E-4 to E-15):
    limited   Y  = Clip1Y( Round( ( 1 << ( BitDepthY - 8 ) ) * ( 219 * E'Y + 16 ) ) )
              Cb = Clip1C( Round( ( 1 << ( BitDepthC - 8 ) ) * ( 224 * E'PB + 128 ) ) )
              Cr = Clip1C( Round( ( 1 << ( BitDepthC - 8 ) ) * ( 224 * E'PR + 128 ) ) )
              R  = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) * ( 219 * E'R + 16 ) )
              G  = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) * ( 219 * E'G + 16 ) )
              B  = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) * ( 219 * E'B + 16 ) )
    full      Y  = Clip1Y( Round( ( ( 1 << BitDepthY ) - 1 ) * E'Y ) )
              Cb = Clip1C( Round( ( ( 1 << BitDepthC ) - 1 ) * E'PB + ( 1 << ( BitDepthC - 1 ) ) ) )
              Cr = Clip1C( Round( ( ( 1 << BitDepthC ) - 1 ) * E'PR + ( 1 << ( BitDepthC - 1 ) ) ) )
              R  = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'R )
              G  = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'G )
              B  = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'B )

Chroma location (ITU-T H.265 Figure E.1):
    left, center, top_left, top, bottom_left, bottom

The following scaling methods are available:
    point, bilinear, bicubic, spline16, spline36, lanczos

The following dithering methods are available:
    none, ordered, random, error_diffusion
Last edited by Stephen R. Savage; 7th December 2015 at 07:55. Reason: v2.0.2 |
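As a rough illustration of the "limited" and "full" pixel range equations listed above, here is a minimal pure-Python sketch. The function names (`clip1`, `round_half_up`, `quantize_luma`, `quantize_chroma`) are made up for this example and are not part of the z library's API; the code only transcribes the Y and Cb/Cr formulas as quoted from ITU-T H.265 and makes no claim about how the library implements them internally.

```python
import math


def clip1(value, bit_depth):
    """Clamp an integer code value to [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def round_half_up(x):
    """Round to nearest integer, halves away from zero (the H.265 Round())."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_luma(e_y, bit_depth=8, full_range=False):
    """Map normalized luma E'Y in [0, 1] to an integer code value."""
    if full_range:
        scaled = ((1 << bit_depth) - 1) * e_y
    else:
        scaled = (1 << (bit_depth - 8)) * (219 * e_y + 16)
    return clip1(round_half_up(scaled), bit_depth)


def quantize_chroma(e_pb, bit_depth=8, full_range=False):
    """Map normalized chroma E'PB or E'PR in [-0.5, 0.5] to a code value."""
    if full_range:
        scaled = ((1 << bit_depth) - 1) * e_pb + (1 << (bit_depth - 1))
    else:
        scaled = (1 << (bit_depth - 8)) * (224 * e_pb + 128)
    return clip1(round_half_up(scaled), bit_depth)


# 8-bit limited range: black is 16, white is 235, neutral chroma is 128.
print(quantize_luma(0.0), quantize_luma(1.0), quantize_chroma(0.0))
# 10-bit limited range scales the same levels by 4.
print(quantize_luma(0.0, bit_depth=10), quantize_luma(1.0, bit_depth=10))
```

Note that the clamp is to the full code range of the bit depth, not to 16..235, which is consistent with the remark above that BTB/WTW values may still be generated.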
2nd November 2014, 17:57 | #6 | Link | |
Registered User
Join Date: Mar 2005
Posts: 129
|
Quote:
An alternative filter to vf_scale might be accepted upstream, since it has a C API (although they might be all NIH and butthurt about it), but swapping the actual use in the ffmpeg cli fully is pretty much a "never going to ever be accepted" scenario. |
|
27th November 2014, 12:06 | #7 | Link |
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
|
Since no one posted such a thing yet, here are some speed comparisons.
CPU is a mobile Core 2 Duo T5470, 1.6 GHz, no hyper-threading. Due to a lack of AVX2, F16C, and FMA, all the tests use zimg's SSE2 paths. Input is 700×480 YUV420P8, h264, 1000 frames, decoded with ffms2. Command used: Code:
vspipe test.py /dev/null --end 999
zimg version is d2e712dc54fadf45a2c55169f5a49dd74e86d62e. fmtconv version is r8. swscale is from ffmpeg 2.4.3.
Note that swscale never processes more than one frame at a time, because it doesn't like multithreading (great library design). In the 2-thread tests, only the input frames may be fetched in parallel.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Code:
Upscaling by 2 using lanczos (700×480 -> 1400×960), 8 bit input:
1 thread:
    fmtconv: 31.88 fps
    zimg:    32.11 fps
    swscale: 28.93 fps
2 threads:
    fmtconv: 46.33 fps
    zimg:    45.19 fps
    swscale: 30.33 fps
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)  # or threads=1
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")

def resize_zimg(clip):
    src = clip
    # Go to 16 bits, resize, then dither back down to 8 bits.
    src = c.z.Depth(src, depth=16)
    src = c.z.Resize(src, width=2*src.width, height=2*src.height, filter="lanczos")
    src = c.z.Depth(src, depth=8, dither="ordered")
    return src

def resize_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, w=2*src.width, h=2*src.height, kernel="lanczos")
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src

def resize_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, width=2*src.width, height=2*src.height)
    return src

# Uncomment exactly one of these to pick the library under test.
src = resize_zimg(src)
#src = resize_swscale(src)
#src = resize_fmtconv(src)

src.set_output()
Code:
Upscaling by 2 using lanczos (700×480 -> 1400×960), 16 bit input:
1 thread:
    fmtconv: 40.66 fps
    zimg:    36.54 fps
    swscale: 22.89 fps
2 threads:
    fmtconv: 55.60 fps
    zimg:    50.99 fps
    swscale: 24.66 fps
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
# Convert to 16 bits up front so every resizer gets 16-bit input.
src = c.fmtc.bitdepth(src, bits=16)

def resize_zimg(clip):
    src = clip
    src = c.z.Resize(src, width=2*src.width, height=2*src.height, filter="lanczos")
    return src

def resize_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, w=2*src.width, h=2*src.height, kernel="lanczos")
    return src

def resize_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, width=2*src.width, height=2*src.height)
    return src

# Uncomment exactly one of these to pick the library under test.
src = resize_zimg(src)
#src = resize_swscale(src)
#src = resize_fmtconv(src)

src.set_output()
Code:
Conversion from YUV420P8 to RGB24:
1 thread:
    fmtconv: 60.58 fps
    zimg:    54.88 fps
    swscale: 59.05 fps
2 threads:
    fmtconv: 73.32 fps
    zimg:    60.79 fps
    swscale: 64.14 fps
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")

def test_zimg(clip):
    src = clip
    # Float, upsample chroma to 4:4:4, convert colorspace, dither to 8 bits.
    src = c.z.Depth(src, sample=1, depth=32)
    src = c.z.Resize(src, width=src.width, height=src.height, filter_uv="lanczos", subsample_w=0, subsample_h=0)
    src = c.z.Colorspace(src, 6, 6, 6, 0)
    src = c.z.Depth(src, sample=0, depth=8, dither="ordered")
    return src

def test_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.RGB24)
    return src

def test_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, kernel="lanczos", css="444")
    src = c.fmtc.matrix(src, mat="601", col_fam=vs.RGB)
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src

# Uncomment exactly one of these to pick the library under test.
src = test_zimg(src)
#src = test_swscale(src)
#src = test_fmtconv(src)

src.set_output()
Code:
Conversion from YUV420P10 to RGB24:
1 thread:
    fmtconv: 56.96 fps
    zimg:    53.05 fps
    swscale: 56.43 fps
2 threads:
    fmtconv: 70.60 fps
    zimg:    59.14 fps
    swscale: 60.84 fps
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
# Convert to 10 bits up front so every converter gets YUV420P10 input.
src = c.fmtc.bitdepth(src, bits=10)

def test_zimg(clip):
    src = clip
    src = c.z.Depth(src, sample=1, depth=32)
    src = c.z.Resize(src, width=src.width, height=src.height, filter_uv="lanczos", subsample_w=0, subsample_h=0)
    src = c.z.Colorspace(src, 6, 6, 6, 0)
    src = c.z.Depth(src, sample=0, depth=8, dither="ordered")
    return src

def test_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.RGB24)
    return src

def test_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, kernel="lanczos", css="444")
    src = c.fmtc.matrix(src, mat="601", col_fam=vs.RGB)
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src

# Uncomment exactly one of these to pick the library under test.
src = test_zimg(src)
#src = test_swscale(src)
#src = test_fmtconv(src)

src.set_output()
Code:
Bit depth conversion from 16 to 8 bits:
1 thread:
    No dithering:
        fmtconv: 127.38 fps
        zimg:    138.32 fps
    Ordered dithering:
        fmtconv: 126.02 fps
        zimg:    139.20 fps
    Floyd-Steinberg error diffusion:
        fmtconv: 99.35 fps
        zimg:    56.43 fps
2 threads:
    No dithering:
        fmtconv: 131.94 fps
        zimg:    134.10 fps
    Ordered dithering:
        fmtconv: 123.25 fps
        zimg:    128.98 fps
    Floyd-Steinberg error diffusion:
        fmtconv: 105.70 fps
        zimg:    69.97 fps
The VapourSynth filter doesn't have any parameters for it.
Code:
1 thread:
    swscale: 142.85 fps
2 threads:
    swscale: 142.04 fps
Script used:
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
# Convert to 16 bits up front so every converter gets 16-bit input.
src = c.fmtc.bitdepth(src, bits=16)

def bits_zimg(clip):
    src = clip
    src = c.z.Depth(src, depth=8, dither="none")  # or "ordered", or "error_diffusion"
    return src

def bits_fmtconv(clip):
    src = clip
    src = c.fmtc.bitdepth(src, bits=8, dmode=1)  # or 0 for ordered, or 6 for Floyd-Steinberg error diffusion
    return src

def bits_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.YUV420P8)
    return src

# Uncomment exactly one of these to pick the library under test.
src = bits_zimg(src)
#src = bits_fmtconv(src)
#src = bits_swscale(src)

src.set_output()
Code:
Bit depth conversion from 8 to 16 bits:
1 thread:
    fmtconv: 159.20 fps
    zimg:    145.33 fps
    swscale: 150.64 fps
2 threads:
    fmtconv: 148.23 fps
    zimg:    155.85 fps
    swscale: 161.81 fps
Code:
import vapoursynth as vs

c = vs.get_core(threads=2)
src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")

def bits_zimg(clip):
    src = clip
    src = c.z.Depth(src, depth=16)
    return src

def bits_fmtconv(clip):
    src = clip
    src = c.fmtc.bitdepth(src, bits=16)
    return src

def bits_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.YUV420P16)
    return src

# Uncomment exactly one of these to pick the library under test.
src = bits_zimg(src)
#src = bits_fmtconv(src)
#src = bits_swscale(src)

src.set_output()
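For readers unfamiliar with the "ordered" dither mode benchmarked in the 16-to-8-bit test above, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique using a textbook 4x4 Bayer matrix; the function name, the matrix size, and the plain `>> 8` scaling are assumptions made for this example and say nothing about how zimg or fmtconv actually implement their ordered dither.

```python
# Classic 4x4 Bayer index matrix (values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither_16_to_8(plane):
    """Reduce a 2-D list of 16-bit samples to 8 bits with a Bayer threshold.

    Each pixel gets a position-dependent offset in [8, 248] added before the
    low 8 bits are dropped, so the rounding decision varies in a fixed
    pattern instead of being the same everywhere.
    """
    out = []
    for y, row in enumerate(plane):
        out_row = []
        for x, sample in enumerate(row):
            threshold = BAYER_4X4[y % 4][x % 4] * 16 + 8
            out_row.append(min(255, (sample + threshold) >> 8))
        out.append(out_row)
    return out


# A flat 16-bit value halfway between 8-bit levels 128 and 129.
flat = [[128 * 256 + 128] * 4 for _ in range(4)]
for row in ordered_dither_16_to_8(flat):
    print(row)
```

On the flat test plane, half of the pixels come out as 128 and half as 129, so the average level is preserved even though no single 8-bit pixel can represent it. Because the pattern depends only on pixel position, each pixel can be processed independently, which is one plausible reason the ordered results above track the no-dithering results so closely.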
__________________
Buy me a "coffee" and/or hire me to write code! |
29th November 2014, 13:42 | #8 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,845
|
What is the pure decoding speed? Does decoding use CPU or GPU?
For such a test it's better to use an uncompressed source stored on a fast RAID or RAM disk. Thanks a lot for your time.
Last edited by kolak; 29th November 2014 at 13:45. |
29th November 2014, 18:11 | #10 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,845
|
In this case your results are affected by the decoding speed and it's possible that some may be 'incorrect'.
I'm not sure why many people use heavily compressed sources to measure the speed of filters. Quite often the decoding speed is lower than the filter speed, so you can't reliably compare the speed of different filters. It's best to use an uncompressed source or one encoded with a very fast codec.
Last edited by kolak; 29th November 2014 at 18:17. |
29th November 2014, 18:36 | #11 | Link |
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
|
It's a realistic scenario, at least for me. My sources are always compressed.
While the numbers aren't as high as they could be (with blankclip), all three plugins were tested with the same source file, so the differences between them should be correct.
30th November 2014, 01:20 | #13 | Link |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,588
|
Everything is done on the cpu so the decoding speed will scale as well. I say the test is correct.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet |
30th November 2014, 02:15 | #14 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
After much (?) work, I am pleased to announce the release of "zlib" v1.0 FINAL. The library and VapourSynth example are linked in the first post. Work is underway to improve the multithreaded scalability of the software, making it a better fit for playback scenarios.
Last edited by Stephen R. Savage; 30th November 2014 at 02:17. |
30th November 2014, 15:53 | #15 | Link | |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,845
|
Quote:
Filter A speed (from uncompressed source) = 200 fps
Filter B speed (from uncompressed source) = 100 fps
Decoding of compressed source = 100 fps
What will the processing speed be for filters A and B with the compressed source? Will filter A give faster processing, or will both cases give about the same speed? Even if A turns out faster, it won't be possible to tell that A is 2x faster than B. I understand that this was meant as a real-world scenario rather than a speed benchmark of each filter by itself. |
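The question above can be worked through with two idealized models. This is a back-of-the-envelope sketch, not a measurement: `serial_fps` assumes decoding and filtering take turns on one thread with no overlap, and `parallel_fps` assumes each stage has a core to itself so the slowest stage sets the pace. Both function names are invented for this example, and a real VapourSynth run will fall somewhere in between.

```python
def serial_fps(*stage_fps):
    """Throughput when every stage runs one after another on one thread.

    The time per frame is the sum of the per-stage times.
    """
    return 1.0 / sum(1.0 / fps for fps in stage_fps)


def parallel_fps(*stage_fps):
    """Throughput when every stage runs concurrently on its own core.

    The slowest stage is the bottleneck.
    """
    return min(stage_fps)


decode = 100.0  # fps, decoding the compressed source
filters = {"A": 200.0, "B": 100.0}  # fps, measured from an uncompressed source

for name, filt in filters.items():
    print(name,
          "serial: %.2f fps" % serial_fps(decode, filt),
          "parallel: %.2f fps" % parallel_fps(decode, filt))

ratio = serial_fps(decode, filters["A"]) / serial_fps(decode, filters["B"])
print("apparent A/B speed ratio (serial): %.2f" % ratio)
```

Under the serial model filter A ends up at roughly 67 fps and filter B at 50 fps, an apparent ratio of about 1.33 instead of the true 2. Under the fully parallel model both are capped at the 100 fps decode rate and look identical. In both cases the ordering of the filters is preserved or flattened but never reversed, while the size of the gap is understated.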
|
30th November 2014, 16:02 | #16 | Link | |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,845
|
Quote:
Could you add some kind of noise generator? I found that Floyd-Steinberg dithering with a tiny amount of noise gives very good results. The small amount of noise helps to cover the contouring left after dithering. Another thing is the Filter Lite algorithm, which is close to Floyd-Steinberg but apparently way faster.
Last edited by kolak; 30th November 2014 at 16:05. |
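To make the suggestion concrete, here is a small pure-Python sketch of Floyd-Steinberg error diffusion from 16 to 8 bits with an optional amount of uniform noise added before each rounding decision. It is a textbook version written for this thread, not the library's dither code: the function name, the `noise_amplitude` and `seed` parameters, and the choice of uniform noise are all assumptions made for illustration.

```python
import math
import random


def floyd_steinberg_16_to_8(plane, noise_amplitude=0.0, seed=0):
    """Dither a 2-D list of 16-bit samples down to 8 bits.

    noise_amplitude is given in 8-bit code values; 0.0 disables the noise.
    """
    rng = random.Random(seed)
    height, width = len(plane), len(plane[0])
    # Work in 8-bit units as floats so diffused error keeps its fraction.
    work = [[sample / 256.0 for sample in row] for row in plane]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            noisy = old + rng.uniform(-noise_amplitude, noise_amplitude)
            new = max(0, min(255, math.floor(noisy + 0.5)))
            out[y][x] = new
            # The error is measured against the noise-free value, so the
            # noise only perturbs the decision and is compensated later.
            err = old - new
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


flat = [[128 * 256 + 128] * 8 for _ in range(8)]
plain = floyd_steinberg_16_to_8(flat)
noisy = floyd_steinberg_16_to_8(flat, noise_amplitude=0.25, seed=1)
print(plain[0])
print(noisy[0])
```

Unlike ordered dithering, every pixel here depends on the error left behind by its already-processed neighbours, which makes the loop inherently sequential and is a plausible explanation for the lower error diffusion numbers in the benchmark earlier in the thread.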
|
7th December 2014, 15:49 | #19 | Link |
Registered User
Join Date: Apr 2014
Location: France
Posts: 33
|
Question of the day:
What is it with you guys and cats?
Answer of the day:
I tried the filter on my Odroid-U2 (unlike fmtconv, it doesn't throw SSE2 errors at me) for resizing/cropping a 1920x1080 Blu-ray to 1280x536. swscale is 2x as fast as zlib (8.79 fps against 4.36 fps). It's probably caused by the lack of NEON optimizations. |
1st April 2015, 14:25 | #20 | Link |
Registered User
Join Date: Feb 2002
Posts: 26
|
Downscaling crashing
Hi,
I am very new to VapourSynth. I'm just beginning to test it and its potential, in the hope of moving on from Avisynth. My scripts evaluate OK in VapourSynth Editor, but the preview and VirtualDub crash when I try to actually use the script. I'm just trying a simple downsize. Any divisor crashes, even dividing by 2 (with either the / or the // operator; with / the script does not even evaluate). Here is the relevant bit of my script: Code:
import vapoursynth as vs

core = vs.get_core()

def downscale_zimg(clip):
    clip = core.z.Depth(clip, depth=16)
    clip = core.z.Resize(clip, width=clip.width//2, height=clip.height//2, filter="bicubic")
    clip = core.z.Depth(clip, depth=8, dither="ordered")
    return clip

source = core.ffms2.Source(source='c:/temp/test.ts', fpsnum=25, fpsden=1)
source = downscale_zimg(source)
source.set_output()
Am I missing something or is this a bug? Using the native resize does not crash on downscale.
BR,
-J
Last edited by jammupatu; 1st April 2015 at 14:27. Reason: Added clarifications. |
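On the / versus // point: in Python 3, / is true division and always returns a float, while // keeps integers as integers, which is most likely why the script does not evaluate with / when an int is expected for width and height. The short sketch below shows the difference. The helper `half_size_mod2` is hypothetical and only illustrates rounding down to even dimensions, on the assumption that a 4:2:0 clip needs width and height divisible by 2; it is not offered as an explanation of the crash itself.

```python
width, height = 1920, 1080

half_true = width / 2    # true division: always a float in Python 3
half_floor = width // 2  # floor division: stays an int for int operands

print(type(half_true).__name__, half_true)    # float 960.0
print(type(half_floor).__name__, half_floor)  # int 960


def half_size_mod2(width, height):
    """Halve both dimensions and round each down to an even value."""
    return (width // 2) & ~1, (height // 2) & ~1


print(half_size_mod2(1920, 1080))
print(half_size_mod2(1366, 768))
```

For a 1366x768 source, plain halving would give an odd width of 683, which the helper rounds down to 682.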