
View Full Version : Scaling, colorspace conversion, and dithering library


Stephen R. Savage
1st November 2014, 01:45
Some folks have expressed interest in having a program library to perform the common tasks of manipulating resolution, colorspace, and depth. The "z" library (name subject to change) provides a simple interface to perform those tasks. An open-source VapourSynth plugin is provided, demonstrating the usage of the API.


NAME:
z - z.lib for VapourSynth

SYNOPSIS:
z.Format(clip clip,
int "width",
int "height",
int "format",
enum "matrix",
enum "transfer",
enum "primaries",
enum "range",
enum "chromaloc",
enum "matrix_in",
enum "transfer_in",
enum "primaries_in",
enum "range_in",
enum "chromaloc_in",
string "resample_filter",
float "filter_param_a",
float "filter_param_b",
string "resample_filter_uv",
float "filter_param_a_uv",
float "filter_param_b_uv",
string "dither_type")

z.Subresize(clip clip,
int width,
int height,
float "shift_w",
float "shift_h",
float "subwidth",
float "subheight",
string "resample_filter",
float "filter_param_a",
float "filter_param_b",
string "dither_type")

DESCRIPTION:
z.Format is a drop-in replacement for the built-in VapourSynth resize
functions. It converts a clip of known or unknown format to another clip
of known or unknown format, changing only the parameters specified by the
user. z.Subresize provides advanced resampling capabilities intended for
use by script writers.

Arguments denoted as type "enum" may be specified by numerical index
(see ITU-T H.265 Annex E.3) or by name. Enums specified by name have their
argument name suffixed with "_s".

clip: input clip
The input may be of COMPAT color family (requires VS R28).

width,
height: output image dimensions

format: output format preset id
The output may be of COMPAT color family (requires VS R28).

matrix,
transfer,
primaries: output colorspace specification
If not provided, the corresponding attribute from the input clip will
be selected, except for YCoCg and RGB color families, where the
corresponding matrix is set by default.

range: output pixel range
For integer formats, this allows selection of the legal code values.
Even when set, out of range values (BTB/WTW) may be generated. If the
input format is of a different color family, the default range is
studio/limited for YUV and full-range for RGB.

chromaloc: output chroma location
For subsampled formats, specifies the chroma location. If the input
format is 4:4:4 or RGB and the output is subsampled, the default
location is left-aligned, as per MPEG.

matrix_in,
transfer_in,
primaries_in,
range_in,
chromaloc_in: input colorspace/format specification
If the corresponding frame property is set to a value other than
unspecified, the frame property is used instead of this parameter.
Default values are set for certain color families.

resample_filter,
filter_param_a,
filter_param_b: scaling method for RGB and Y-channel
For the bicubic filter, filter_param_a/b represent the "b" and "c"
parameters. For the lanczos filter, filter_param_a represents the
number of taps.

resample_filter_uv,
filter_param_a_uv,
filter_param_b_uv: scaling method for UV channels

dither_type: dithering method
Dithering is used only for conversions resulting in an integer format.

shift_w,
shift_h: offset of image top-left corner
The top-left image corner is assumed to be at coordinate (0, 0) and
the first sample centered at coordinate (0.5, 0.5). An offset may be
applied to the assumed image origin to "shift" the image.

subwidth,
subheight: fractional dimensions of input image
The input image is assumed to span from its origin a distance equal to
its dimensions in pixels. An alternative image resolution may be
specified.
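The shift and subwidth/subheight parameters combine in the usual resampling coordinate mapping. The following sketch illustrates that convention (my illustration of the text above, not the library's actual code):

```python
def src_coord(dst_idx, dst_size, shift=0.0, sub_size=None):
    """Map the center of destination sample dst_idx back to source
    coordinates, using the (0.5, 0.5) sample-center convention above."""
    if sub_size is None:
        sub_size = dst_size  # same size: identity mapping
    return shift + (dst_idx + 0.5) * (sub_size / dst_size)

# Identity resize: the first sample stays centered at 0.5
print(src_coord(0, 4, sub_size=4.0))             # 0.5
# 2x upscale (4 -> 8 samples): first output sample maps to 0.25
print(src_coord(0, 8, sub_size=4.0))             # 0.25
# A half-pixel shift moves every sampling position right by 0.5
print(src_coord(0, 4, shift=0.5, sub_size=4.0))  # 1.0
```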

The following tables list values of selected colorspace enumerations and
their abbreviated names. For all possible values, see ITU-T H.265.
Matrix coefficients (ITU-T H.265 Table E.5):
rgb Identity
The identity matrix.
Typically used for GBR (often referred to as RGB);
however, may also be used for YZX (often referred to as
XYZ);
709 KR = 0.2126; KB = 0.0722
ITU-R Rec. BT.709-5
unspec Unspecified
Image characteristics are unknown or are determined by the
application.
470bg KR = 0.299; KB = 0.114
ITU-R Rec. BT.470-6 System B, G (historical)
(functionally the same as the value 6 (170m))
170m KR = 0.299; KB = 0.114
SMPTE 170M (2004)
(functionally the same as the value 5 (470bg))
ycgco YCgCo
2020ncl KR = 0.2627; KB = 0.0593
Rec. ITU-R BT.2020 non-constant luminance system
2020cl KR = 0.2627; KB = 0.0593
Rec. ITU-R BT.2020 constant luminance system
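The KR/KB values above fully determine the RGB-to-luma weighting, since the green coefficient is implied: KG = 1 - KR - KB. A small illustrative check (not library code):

```python
def luma(r, g, b, kr, kb):
    # Y' = KR*R' + KG*G' + KB*B', with KG = 1 - KR - KB
    kg = 1.0 - kr - kb
    return kr * r + kg * g + kb * b

# BT.709: white (1,1,1) maps to full luma; pure blue contributes only KB
print(luma(1.0, 1.0, 1.0, kr=0.2126, kb=0.0722))  # ~1.0
print(luma(0.0, 0.0, 1.0, kr=0.2126, kb=0.0722))  # 0.0722
```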

Transfer characteristics (ITU-T H.265 Table E.4):
709 V = a * Lc^0.45 - ( a - 1 ) for 1 >= Lc >= b
V = 4.500 * Lc for b > Lc >= 0
Rec. ITU-R BT.709-5
(functionally the same as the values 6 (601),
14 (2020_10) and 15 (2020_12))
unspec Unspecified
Image characteristics are unknown or are determined by the
application.
601 V = a * Lc^0.45 - ( a - 1 ) for 1 >= Lc >= b
V = 4.500 * Lc for b > Lc >= 0
Rec. ITU-R BT.601-6 525 or 625
(functionally the same as the values 1 (709),
14 (2020_10) and 15 (2020_12))
linear V = Lc for all values of Lc
Linear transfer characteristics
2020_10 V = a * Lc^0.45 - ( a - 1 ) for 1 >= Lc >= b
V = 4.500 * Lc for b > Lc >= 0
Rec. ITU-R BT.2020
(functionally the same as the values 1 (709),
6 (601) and 15 (2020_12))
2020_12 V = a * Lc^0.45 - ( a - 1 ) for 1 >= Lc >= b
V = 4.500 * Lc for b > Lc >= 0
Rec. ITU-R BT.2020
(functionally the same as the values 1 (709),
6 (601) and 14 (2020_10))
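In the formulas above, the constants are a ≈ 1.099 and b ≈ 0.018 (given in BT.709 itself; they are not reproduced in the table excerpt). A direct transcription of the two-piece curve:

```python
def oetf_709(lc, a=1.099, b=0.018):
    # V = a * Lc^0.45 - (a - 1)  for 1 >= Lc >= b
    # V = 4.500 * Lc             for b > Lc >= 0
    if lc >= b:
        return a * lc ** 0.45 - (a - 1.0)
    return 4.5 * lc

print(oetf_709(0.0))  # 0.0
print(oetf_709(1.0))  # ~1.0 (a - (a - 1) cancels by construction)
```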

Color primaries (ITU-T H.265 Table E.3):
709 primary x y
green 0.300 0.600
blue 0.150 0.060
red 0.640 0.330
white D65 0.3127 0.3290
Rec. ITU-R BT.709-5
unspec Unspecified
Image characteristics are unknown or are determined by the
application.
170m primary x y
green 0.310 0.595
blue 0.155 0.070
red 0.630 0.340
white D65 0.3127 0.3290
SMPTE 170M (2004)
(functionally the same as the value 7 (240m))
240m primary x y
green 0.310 0.595
blue 0.155 0.070
red 0.630 0.340
white D65 0.3127 0.3290
SMPTE 240M (1999)
(functionally the same as the value 6 (170m))
2020 primary x y
green 0.170 0.797
blue 0.131 0.046
red 0.708 0.292
white D65 0.3127 0.3290
Rec. ITU-R BT.2020

Pixel range (ITU-T H.265 Eq E-4 to E-15):
limited Y = Clip1Y( Round( ( 1 << ( BitDepthY - 8 ) ) *
( 219 * E'Y + 16 ) ) )
Cb = Clip1C( Round( ( 1 << ( BitDepthC - 8 ) ) *
( 224 * E'PB + 128 ) ) )
Cr = Clip1C( Round( ( 1 << ( BitDepthC - 8 ) ) *
( 224 * E'PR + 128 ) ) )

R = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) *
( 219 * E'R + 16 ) )
G = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) *
( 219 * E'G + 16 ) )
B = Clip1Y( ( 1 << ( BitDepthY - 8 ) ) *
( 219 * E'B + 16 ) )
full Y = Clip1Y( Round( ( ( 1 << BitDepthY ) - 1 ) * E'Y ) )
Cb = Clip1C( Round( ( ( 1 << BitDepthC ) - 1 ) * E'PB +
( 1 << ( BitDepthC - 1 ) ) ) )
Cr = Clip1C( Round( ( ( 1 << BitDepthC ) - 1 ) * E'PR +
( 1 << ( BitDepthC - 1 ) ) ) )

R = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'R )
G = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'G )
B = Clip1Y( ( ( 1 << BitDepthY ) - 1 ) * E'B )
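These range equations are easy to check numerically. A sketch for the luma case, with Round and the Clip1 clamp written out explicitly:

```python
def quantize_limited_y(e_y, bit_depth):
    # Limited: Y = Clip1Y(Round((1 << (BitDepth-8)) * (219*E'Y + 16)))
    v = round((1 << (bit_depth - 8)) * (219 * e_y + 16))
    return min(max(v, 0), (1 << bit_depth) - 1)

def quantize_full_y(e_y, bit_depth):
    # Full: Y = Clip1Y(Round(((1 << BitDepth) - 1) * E'Y))
    v = round(((1 << bit_depth) - 1) * e_y)
    return min(max(v, 0), (1 << bit_depth) - 1)

print(quantize_limited_y(0.0, 8), quantize_limited_y(1.0, 8))  # 16 235
print(quantize_limited_y(1.0, 10))                             # 940
print(quantize_full_y(1.0, 8), quantize_full_y(1.0, 10))       # 255 1023
```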

Chroma location (ITU-T H.265 Figure E.1):
left
center
top_left
top
bottom_left
bottom

The following scaling methods are available:
point, bilinear, bicubic, spline16, spline36, lanczos
The following dithering methods are available:
none, ordered, random, error_diffusion
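Error diffusion works by carrying each pixel's quantization error over to pixels that have not yet been processed. A minimal 1-D sketch of the idea for a 16-bit to 8-bit conversion (the real implementation diffuses error in 2-D with Floyd-Steinberg-style weights; this is only an illustration):

```python
def diffuse_row(row16):
    """Quantize 16-bit values to 8-bit codes, carrying the rounding
    error to the next sample (the code step is 65535/255 == 257)."""
    out, err = [], 0
    for v in row16:
        acc = v + err
        q = min(255, max(0, (acc + 128) // 257))  # round to nearest code
        err = acc - q * 257                       # signed leftover error
        out.append(q)
    return out

# A flat gray between two 8-bit codes alternates 127/128, keeping the
# average near the true value 32767/257 ~= 127.5 instead of flooring it:
print(diffuse_row([32767, 32767, 32767, 32767]))  # [127, 128, 127, 128]
```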


Release 2.0.2: Download link (https://github.com/sekrit-twc/zimg/releases/download/release-2.0.2/z-2.0.2.7z)

kolak
1st November 2014, 14:25
How is this different from fmtconv, which I have found very good?

kolak
1st November 2014, 16:51
Is the library fully free? Is it cross-platform?

What error diffusion method is implemented? Floyd-Steinberg?

kolak
2nd November 2014, 13:45
Can it be implemented in ffmpeg?
FFmpeg's current 'conversions' are not the best.

Stephen R. Savage
2nd November 2014, 17:34
Can it be implemented in ffmpeg?
FFmpeg's current 'conversions' are not the best.

The API header is included in the ZIP package, so surely someone more experienced with ffmpeg than I am could determine whether this is the case.

Daemon404
2nd November 2014, 17:57
The API header is included in the ZIP package, so surely someone more experienced with ffmpeg than I am could determine whether this is the case.

Would be pretty simple to hook up as a filter in libavfilter (which itself is pretty gross), and with the right filter negotiation it could perhaps take the place of vf_scale. There are a bazillion places where swscale is used, though, which would be slightly uglier.

An alternative filter to vf_scale might be accepted upstream, since it has a C API (although they might be all NIH and butthurt about it), but swapping the actual use in the ffmpeg cli fully is pretty much a "never going to ever be accepted" scenario.

jackoneill
27th November 2014, 12:06
Since no one posted such a thing yet, here are some speed comparisons.

CPU is a mobile Core 2 Duo T5470, 1.6 GHz, no hyper-threading.
Due to a lack of AVX2, F16C, and FMA, all the tests use zimg's SSE2 paths.

Input is 700×480 YUV420P8, h264, 1000 frames, decoded with ffms2.

Command used:

vspipe test.py /dev/null --end 999

with an additional "--requests 1" for the 1 thread tests.

zimg version is d2e712dc54fadf45a2c55169f5a49dd74e86d62e.
fmtconv version is r8.
swscale is from ffmpeg 2.4.3.

Note that swscale never processes more than one frame at a time, because
it doesn't like multithreading (great library design). Only the input
frames may be fetched in parallel in the 2-thread tests.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Upscaling by 2 using lanczos (700×480 -> 1400×960), 8 bit input:
1 thread:
fmtconv: 31.88 fps
zimg: 32.11 fps
swscale: 28.93 fps

2 threads:
fmtconv: 46.33 fps
zimg: 45.19 fps
swscale: 30.33 fps


Script used:


import vapoursynth as vs

c = vs.get_core(threads=2) # or threads=1

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")


def resize_zimg(clip):
    src = clip
    src = c.z.Depth(src, depth=16)
    src = c.z.Resize(src, width=2*src.width, height=2*src.height, filter="lanczos")
    src = c.z.Depth(src, depth=8, dither="ordered")
    return src

def resize_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, w=2*src.width, h=2*src.height, kernel="lanczos")
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src

def resize_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, width=2*src.width, height=2*src.height)
    return src


src = resize_zimg(src)
#src = resize_swscale(src)
#src = resize_fmtconv(src)

src.set_output()


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Upscaling by 2 using lanczos (700×480 -> 1400×960), 16 bit input:
1 thread:
fmtconv: 40.66 fps
zimg: 36.54 fps
swscale: 22.89 fps

2 threads:
fmtconv: 55.60 fps
zimg: 50.99 fps
swscale: 24.66 fps


Script used:


import vapoursynth as vs

c = vs.get_core(threads=2)

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
src = c.fmtc.bitdepth(src, bits=16)


def resize_zimg(clip):
    src = clip
    src = c.z.Resize(src, width=2*src.width, height=2*src.height, filter="lanczos")
    return src

def resize_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, w=2*src.width, h=2*src.height, kernel="lanczos")
    return src

def resize_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, width=2*src.width, height=2*src.height)
    return src


src = resize_zimg(src)
#src = resize_swscale(src)
#src = resize_fmtconv(src)

src.set_output()


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Conversion from YUV420P8 to RGB24:
1 thread:
fmtconv: 60.58 fps
zimg: 54.88 fps
swscale: 59.05 fps

2 threads:
fmtconv: 73.32 fps
zimg: 60.79 fps
swscale: 64.14 fps


Script used:


import vapoursynth as vs

c = vs.get_core(threads=2)

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")


def test_zimg(clip):
    src = clip
    src = c.z.Depth(src, sample=1, depth=32)
    src = c.z.Resize(src, width=src.width, height=src.height, filter_uv="lanczos", subsample_w=0, subsample_h=0)
    src = c.z.Colorspace(src, 6, 6, 6, 0)
    src = c.z.Depth(src, sample=0, depth=8, dither="ordered")
    return src

def test_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.RGB24)
    return src

def test_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, kernel="lanczos", css="444")
    src = c.fmtc.matrix(src, mat="601", col_fam=vs.RGB)
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src


src = test_zimg(src)
#src = test_swscale(src)
#src = test_fmtconv(src)

src.set_output()


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Conversion from YUV420P10 to RGB24:
1 thread:
fmtconv: 56.96 fps
zimg: 53.05 fps
swscale: 56.43 fps

2 threads:
fmtconv: 70.60 fps
zimg: 59.14 fps
swscale: 60.84 fps


Script used:


import vapoursynth as vs

c = vs.get_core(threads=2)

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
src = c.fmtc.bitdepth(src, bits=10)


def test_zimg(clip):
    src = clip
    src = c.z.Depth(src, sample=1, depth=32)
    src = c.z.Resize(src, width=src.width, height=src.height, filter_uv="lanczos", subsample_w=0, subsample_h=0)
    src = c.z.Colorspace(src, 6, 6, 6, 0)
    src = c.z.Depth(src, sample=0, depth=8, dither="ordered")
    return src

def test_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.RGB24)
    return src

def test_fmtconv(clip):
    src = clip
    src = c.fmtc.resample(src, kernel="lanczos", css="444")
    src = c.fmtc.matrix(src, mat="601", col_fam=vs.RGB)
    src = c.fmtc.bitdepth(src, bits=8, dmode=0)
    return src


src = test_zimg(src)
#src = test_swscale(src)
#src = test_fmtconv(src)

src.set_output()


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Bit depth conversion from 16 to 8 bits:
1 thread:
No dithering:
fmtconv: 127.38 fps
zimg: 138.32 fps

Ordered dithering:
fmtconv: 126.02 fps
zimg: 139.20 fps

Floyd-Steinberg error diffusion:
fmtconv: 99.35 fps
zimg: 56.43 fps

2 threads:
No dithering:
fmtconv: 131.94 fps
zimg: 134.10 fps

Ordered dithering:
fmtconv: 123.25 fps
zimg: 128.98 fps

Floyd-Steinberg error diffusion:
fmtconv: 105.70 fps
zimg: 69.97 fps


I have no clue what sort of dithering swscale uses, if any.
The VapourSynth filter doesn't have any parameters for it.


1 thread:
swscale: 142.85 fps

2 threads:
swscale: 142.04 fps


For these tests I used 2000 frames instead of 1000.

Script used:


import vapoursynth as vs

c = vs.get_core(threads=2)

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")
src = c.fmtc.bitdepth(src, bits=16)


def bits_zimg(clip):
    src = clip
    src = c.z.Depth(src, depth=8, dither="none")  # or "ordered", or "error_diffusion"
    return src

def bits_fmtconv(clip):
    src = clip
    src = c.fmtc.bitdepth(src, bits=8, dmode=1)  # or 0 for ordered, or 6 for Floyd-Steinberg error diffusion
    return src

def bits_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.YUV420P8)
    return src


src = bits_zimg(src)
#src = bits_fmtconv(src)
#src = bits_swscale(src)

src.set_output()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Bit depth conversion from 8 to 16 bits:
1 thread:
fmtconv: 159.20 fps
zimg: 145.33 fps
swscale: 150.64 fps

2 threads:
fmtconv: 148.23 fps
zimg: 155.85 fps
swscale: 161.81 fps


Script used:


import vapoursynth as vs

c = vs.get_core(threads=2)

src = c.ffms2.Source("700x480 YUV420P8 h264.mkv")


def bits_zimg(clip):
    src = clip
    src = c.z.Depth(src, depth=16)
    return src

def bits_fmtconv(clip):
    src = clip
    src = c.fmtc.bitdepth(src, bits=16)
    return src

def bits_swscale(clip):
    src = clip
    src = c.resize.Lanczos(src, format=vs.YUV420P16)
    return src


src = bits_zimg(src)
#src = bits_fmtconv(src)
#src = bits_swscale(src)

src.set_output()

kolak
29th November 2014, 13:42
What is the pure decoding speed? Does decoding use CPU or GPU?
For such a test it's better to use an uncompressed source stored on a fast RAID or a RAM disk.

Thanks a lot for your time.

jackoneill
29th November 2014, 14:31
What is the pure decoding speed? Does decoding use CPU or GPU?
For such a test it's better to use an uncompressed source stored on a fast RAID or a RAM disk.

Thanks a lot for your time.

Just decoding the file is 200 fps. It uses the CPU.

kolak
29th November 2014, 18:11
In this case your results are affected by the decoding speed, and it's possible that some may be 'incorrect'.

I'm not sure why many people use heavily compressed sources to measure the speed of some filters. Quite often the decoding speed is lower than the filter speed, so you can't reliably compare the speeds of different filters.
It's best to use an uncompressed source, or one encoded with a very fast codec.

jackoneill
29th November 2014, 18:36
It's a realistic scenario, at least for me. My sources are always compressed.

While the numbers aren't as high as they could be (with blankclip), all three plugins were tested with the same source file, so the differences between them should be correct.

kolak
29th November 2014, 20:53
They may be correct in this case, but you have to be careful.

If the decoding speed were 100 fps, then your results could suggest that all filters are equally fast, which may not be the case at all.

Myrsloik
30th November 2014, 01:20
They may be correct in this case, but you have to be careful.

If the decoding speed were 100 fps, then your results could suggest that all filters are equally fast, which may not be the case at all.

Everything is done on the CPU, so the decoding speed will scale as well. I say the test is correct.

Stephen R. Savage
30th November 2014, 02:15
After much (?) work, I am pleased to announce the release of "zlib" v1.0 FINAL. The library and VapourSynth example are linked in the first post (http://forum.doom9.org/showthread.php?p=1698352#post1698352). Work is underway to improve the multithreaded scalability of the software, making it a better fit for playback scenarios.

kolak
30th November 2014, 15:53
Everything is done on the CPU, so the decoding speed will scale as well. I say the test is correct.

Example:

Filter A speed (from an uncompressed source) = 200 fps
Filter B speed (from an uncompressed source) = 100 fps

Decoding of the compressed source = 100 fps

What will the processing speed be for filters A and B with the compressed source?
Will filter A give faster processing, or will both give about the same speed? Even if A is faster, it won't be possible to tell that A is 2x faster than B.

I understand that this was rather a real-world scenario and not a speed benchmark of each filter by itself.
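Assuming decoding and filtering happen serially for each frame, the per-frame times add up, so the throughputs combine harmonically. A sketch of the example above (hypothetical helper, not from any library):

```python
def pipeline_fps(decode_fps, filter_fps):
    # Per-frame time = decode time + filter time, so rates add as reciprocals
    return 1.0 / (1.0 / decode_fps + 1.0 / filter_fps)

# With 100 fps decoding, a 200 fps filter and a 100 fps filter no longer
# look 2x apart: about 66.7 fps vs 50 fps overall.
print(pipeline_fps(100, 200))
print(pipeline_fps(100, 100))
```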

kolak
30th November 2014, 16:02
After much (?) work, I am pleased to announce the release of "zlib" v1.0 FINAL. The library and VapourSynth example are linked in the first post (http://forum.doom9.org/showthread.php?p=1698352#post1698352). Work is underway to improve the multithreaded scalability of the software, making it a better fit for playback scenarios.

Great, thank you for your work.

Could you add a noise generator?
I found that Floyd-Steinberg dithering with a tiny amount of noise gives very good results. This small noise helps to cover the contouring left after dithering.
Another thing is the Filter Lite algorithm, which is close to Floyd-Steinberg but apparently way faster.

mandarinka
1st December 2014, 14:55
Wouldn't that be a step back (removal of error-diffusion dithering)?

kolak
1st December 2014, 19:41
No, the Floyd-Steinberg method is staying.
Filter Lite would be an additional option. It's very close to Floyd-Steinberg, but can be way faster. At least that's what I have read.

YamashitaRen
7th December 2014, 15:49
Question of the day:
What is it with you guys and cats? :p

Answer of the day:
I tried the filter on my Odroid-U2 (unlike fmtconv, it doesn't throw SSE2 errors at me) for resizing/cropping a 1920x1080 Blu-ray to 1280x536.
Swscale is 2x as fast as zlib (8.79 fps against 4.36 fps). It's probably caused by the lack of NEON optimizations.

jammupatu
1st April 2015, 14:25
Hi,

I am very new to VapourSynth. Just beginning to test it and its potential, in hopes of moving on from Avisynth.

I'm getting script evaluations OK using VapourSynth Editor, but the preview and VirtualDub crash when I try to actually use the script. I'm just trying a simple downsize. Any number will crash, even dividing by 2 (with the / or // operators; actually, with / the script does not evaluate). Here is my script bit:

import vapoursynth as vs
core = vs.get_core()


def downscale_zimg(clip):
    clip = core.z.Depth(clip, depth=16)
    clip = core.z.Resize(clip, width=clip.width//2, height=clip.height//2, filter="bicubic")
    clip = core.z.Depth(clip, depth=8, dither="ordered")
    return clip

source = core.ffms2.Source(source='c:/temp/test.ts', fpsnum=25, fpsden=1)
source = downscale_zimg(source)
source.set_output()

Upscaling works fine. I'm running Win 8.1.1 64-bit and all things 64-bit: VapourSynth, zimg, VirtualDub, the editor, Python 3.4.3. All components should be the latest: zimg 1.1, VS R26, etc.

Am I missing something or is this a bug? Using the native resize does not crash on downscale.

BR,

-J

jammupatu
14th April 2015, 18:31
:thanks: Works now!

BR,

-J

mawen1250
16th April 2015, 15:39
test script

src = core.std.BlankClip(width=120, height=120, format=vs.YUV444P8, color=[0,0,0])

dst = core.z.Depth(src, dither='ordered', depth=16, fullrange_in=True, fullrange_out=True)

dst = core.std.ShufflePlanes([dst, dst, dst], [0,1,2], vs.RGB)
dst = core.std.Expr(dst, 'x', vs.RGB48)
dst_stack = core.fmtc.nativetostack16(dst)

dst_stack.set_output()
I use full-range YUV444 as input, to test both luma and chroma.
The resulting YUV444Px is copied to RGB48 planes to avoid a YUV->RGB conversion.
Finally, the native 16-bit is converted to the stacked 16-bit format to better visualize the MSB/LSB in an RGB24 image.

ordered
http://i683.photobucket.com/albums/vv197/mawen1250/YUV%20000%20full_range%208in16out%20ditherordered_zpsqddlfrj8.png

none or random
http://i683.photobucket.com/albums/vv197/mawen1250/YUV%20000%20full_range%208in16out%20dithernonerandom_zps9nywwylx.png

error_diffusion
http://i683.photobucket.com/albums/vv197/mawen1250/YUV%20000%20full_range%208in16out%20dithererror_diffusion_zpsfi6wxq9x.png

1. There are underflow issues. For example, 0 in 8-bit chroma converted to 16-bit is -128; it should be clipped to 0, but the result is 65408.
2. There are also overflow issues with dithering enabled. For example, 65535 in 16-bit chroma converted to 15-bit is 32767.25; the resulting dithering pattern then mixes 32767 and 32768, exceeding the valid range [0,32767].
3. The ordered dither is always applied even if no range conversion is needed, and the dithering amplitude is always 1 in 8-bit scale no matter what the output depth is. I wonder if it is designed to be this way or not.
4. In the above examples, the right-most part is not affected, but in some other tests this doesn't happen. I'm not very sure under which conditions this happens.
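The underflow in point 1 is consistent with a full-range chroma expansion whose intermediate result goes slightly negative and then wraps around in an unsigned 16-bit store. A sketch of that arithmetic (my reconstruction of the symptom, not the library's actual code):

```python
def chroma_8_to_16_full(code8):
    # Full-range chroma normalized around the neutral value 128
    e_pb = (code8 - 128) / 255.0
    raw = round(65535 * e_pb + 32768)  # 8-bit code 0 lands at -128
    return raw & 0xFFFF                # unsigned wraparound if not clamped

print(chroma_8_to_16_full(0))    # 65408 (-128 wrapped), not the expected 0
print(chroma_8_to_16_full(128))  # 32768, neutral chroma
```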

mawen1250
17th April 2015, 17:58
Thanks for your reply!
This build fixes the magnitude of the ordered dither (to 0.5), which solved lots of problems. The underflow issues are solved, but the overflow issues for 9-15 bit still exist.

2. Not only full-range YUV can produce this kind of overflow issue, but also limited-range RGB/YUV with out-of-range values, such as reducing 65535 to 9-15 bit with dithering, or converting 65535 from limited-range 16-bit to full-range 9-15 bit. Thus, IMO this may lead to potential problems in practice, since we cannot guarantee the input image is perfect. Actually, converting to 9-15 bit is not often used except for final output, so I prefer safe output over performance or intermediate precision. Perhaps an additional option for clamping the result to the valid range could be added? It could also be used for limiting values to the legal range when fullrange_out=False.

3. My mistake; the ordered dither does affect only the least significant bit of the output. Maybe I was misled by the image with the underflow issues. In the previous build, when the output depth & range were the same as the input, the ordered dithering pattern was still applied to the image. After fixing the magnitude issue of the ordered dither, this is also solved. I suppose it would be faster to directly return the source frame pointer in this case, since the frame data is unchanged.

4. Yes, I got it.

mawen1250
18th April 2015, 21:08
I see. This will keep the filter clean and clear. On the other hand, users need to be aware of what they are doing and take more care of such risks under these special conditions.
Anyway, this is a great library and thanks for your efforts!

foxyshadis
25th April 2015, 07:52
The download link has an underscore when it should be a dash.

mawen1250
7th June 2015, 03:39
# YUV420P8 input
src = core.z.Depth(src, depth=16, fullrange_in=False)
src = core.z.Resize(src, src.width, src.height, filter_uv="bicubic", filter_param_a_uv=1/3, filter_param_b_uv=1/3, subsample_w=0, subsample_h=0)
src = core.z.Colorspace(src, matrix_in=1, transfer_in=1, primaries_in=1, matrix_out=0, transfer_out=1, primaries_out=1)
# RGB48 output
When I run this script, there's a filter error from z.Colorspace.
Error getting the frame number 27800:
unsupported pixel type

EDIT: Oops, I forgot z.Colorspace only accepts float input...

mawen1250
15th August 2015, 13:24
zimg-1.1.1 crashes VS for an unknown reason, after running a script for some time (about 70-100 s in my tests).

Windows 7 x64
VapourSynth R27 x64
threads=8

I just found that using BM3D built with MSVC14 also introduces the same problem (the 64-bit build crashes, the 32-bit doesn't in my test).
Considering zimg-1.1.1 is also built with MSVC14, could this be some problem related to VapourSynth and MSVC14?

feisty2
15th August 2015, 13:37
zimg-1.1.1 crashes VS for an unknown reason, after running a script for some time (about 70-100 s in my tests).

Windows 7 x64
VapourSynth R27 x64
threads=8

I just found that using BM3D built with MSVC14 also introduces the same problem (the 64-bit build crashes, the 32-bit doesn't in my test).
Considering zimg-1.1.1 is also built with MSVC14, could this be some problem related to VapourSynth and MSVC14?

http://forum.doom9.org/showthread.php?p=1727527#post1727527
like this?
80% sure it's a vs2015 issue

mawen1250
16th August 2015, 06:51
This build works fine.

mawen1250
8th October 2015, 16:26
Two issues with the 1.95 beta:
1. If the clip is first cropped with std.CropAbs/std.CropRel and the width is not mod-16, the right-most pixels are corrupted.
2. There are no arguments for shift_w, shift_h, subwidth, subheight.

Stephen R. Savage
8th October 2015, 18:14
Two issues with the 1.95 beta:
1. If the clip is first cropped with std.CropAbs/std.CropRel and the width is not mod-16, the right-most pixels are corrupted.
2. There are no arguments for shift_w, shift_h, subwidth, subheight.

1. I found an issue in the border handling of BYTE dither. It will be fixed in the next prerelease.
2. Is there a use case for this that's not covered by specifying chromaloc?

mawen1250
9th October 2015, 04:50
2. Is there a use case for this that's not covered by specifying chromaloc?
1. As a post-resampler for non-center-aligned resamplers such as nnedi/eedi; this is commonly needed in AA/scaling scripts using edi.
2. To do top-left-aligned resampling; for example, mv.Super and warp.AWarp may need it.
3. Any other time you want to do (sub-pixel) cropping/padding.

kolak
24th October 2015, 11:49
Looks like around 32 threads the 'engine' gets saturated in terms of threading.
How does the threading work? Some kind of slices? Does it mean higher resolutions will scale better?

Latest ffmpeg added zscale filter, which I think is great news!

kolak
24th October 2015, 22:50
I meant: performance doesn't scale linearly with cores. The provided graphs show a big per-core performance drop with higher core counts. Speed still rises, but we are wasting many cores and much CPU power to get, e.g., just a 20% speedup.

I think my question is: does the z library engine scale linearly with the number of cores? If we were able to deliver raw video data at unlimited speed and just measure z's performance, would the processing speed rise linearly with the core count?

kolak
25th October 2015, 09:37
If 48 threads give a 21x speedup then this is very good, but I can't see this on these graphs.

This information is enough. Sometimes adding hyperthreading to such a test can be a bit misleading. In this case we gain close to 20% from the 'fake' threads, which is a very good result.
I should look closer at the PC descriptions :)
Graphs should always have units :)
I think z will be a great addition to the existing libraries, especially in ffmpeg.

I would like to see a noise generator, and maybe the Sierra algorithm, which seems to be as good as Floyd-Steinberg but can be way faster.
Floyd-Steinberg plus a tiny amount of noise gives amazing results for high-quality video masters.

TheFluff
25th October 2015, 19:36
You need to quote every post he makes, otherwise he deletes them afterwards and it looks like you're talking to yourself.

mawen1250
27th October 2015, 17:30
I found that vszimg 2.0 will crash, for example, when resizing 1920x1080 Gray8 to 1080x1080 Gray8.

Blue_MiSfit
9th February 2016, 00:08
Is this library useful for converting sRGB stored in full-range YUV420 (for example in JPEG image sequences) into full-range BT.709?