Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
16th April 2015, 15:39 | #22 | Link |
Registered User
Join Date: Aug 2011
Posts: 103
|
Here are some issues I found.
Test script:
Code:
import vapoursynth as vs
core = vs.get_core()

# Black full-range YUV444P8 clip, converted to 16-bit with ordered dither
src = core.std.BlankClip(width=120, height=120, format=vs.YUV444P8, color=[0,0,0])
dst = core.z.Depth(src, dither='ordered', depth=16, fullrange_in=True, fullrange_out=True)
# Relabel the planes as RGB48 (no YUV->RGB conversion takes place)
dst = core.std.ShufflePlanes([dst, dst, dst], [0,1,2], vs.RGB)
dst = core.std.Expr(dst, 'x', vs.RGB48)
# Stack MSB/LSB so both are visible in an RGB24 image
dst_stack = core.fmtc.nativetostack16(dst)
dst_stack.set_output()

The resulting YUV444Px planes are copied into RGB48 planes to avoid a YUV->RGB conversion. Finally, the native 16-bit clip is converted to stacked 16-bit format to better visualize the MSB/LSB in an RGB24 image.

[Screenshots: ordered | none or random | error_diffusion]

1. There are underflow issues. For example, 0 in 8-bit chroma converted to 16-bit is -128; it should be clipped to 0, but the result is 65408.
2. There are also overflow issues with dither enabled. For example, 65535 in 16-bit chroma converted to 15-bit is 32767.25, so the resulting dithering pattern is a mix of 32767 and 32768, which exceeds the valid range [0,32767].
3. The ordered dither is always applied even if no range conversion is needed, and the dithering amplitude is always 1 in 8-bit scale no matter what the output depth is. I wonder whether this is by design.
4. In the above examples the rightmost part is not affected, but this is not the case in some other tests. I'm not sure under which conditions this happens.

Last edited by mawen1250; 16th April 2015 at 16:05. |
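The numbers in points 1 and 2 can be reproduced with a few lines of plain Python. This is only a sketch: it assumes the conventional full-range chroma mapping (subtract the mid-point, rescale by the ratio of peak values, add the new mid-point), and the function names are made up for illustration. It is not taken from the zimg source.

```python
def convert_full_range_chroma(value, depth_in, depth_out):
    """Rescale a full-range chroma sample; no rounding or clamping is applied."""
    mid_in, mid_out = 1 << (depth_in - 1), 1 << (depth_out - 1)
    peak_in, peak_out = (1 << depth_in) - 1, (1 << depth_out) - 1
    return (value - mid_in) * peak_out / peak_in + mid_out


def store_wrapping(value, word_bits=16):
    """Mimic writing a rounded result into an unsigned word WITHOUT clamping."""
    return round(value) & ((1 << word_bits) - 1)


def store_clamped(value, depth):
    """Round and clamp to the valid range [0, 2**depth - 1]."""
    peak = (1 << depth) - 1
    return max(0, min(peak, round(value)))


# Point 1: 8-bit chroma 0 -> 16-bit full range
underflow = convert_full_range_chroma(0, 8, 16)
print(underflow, store_wrapping(underflow), store_clamped(underflow, 16))

# Point 2: 16-bit chroma 65535 -> 15-bit full range
overflow = convert_full_range_chroma(65535, 16, 15)
print(overflow, store_clamped(overflow + 0.5, 15))
```

The wrapped value matches the 65408 seen in the test, which suggests (but does not prove) that the negative intermediate is stored without clamping.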
17th April 2015, 17:58 | #23 | Link |
Registered User
Join Date: Aug 2011
Posts: 103
|
Thanks for your reply!
This build fixes the magnitude of ordered dither (to 0.5), which solved lots of problems. The underflow issue is solved, but the overflow issue for 9-15 bit still exists.

2. Not only full-range YUV can produce this kind of overflow, but also limited-range RGB/YUV with out-of-range values, such as reducing 65535 to 9-15 bit with dithering, or converting 65535 from limited-range 16-bit to full-range 9-15 bit. Thus, IMO this may lead to problems in practice, since we cannot guarantee that the input image is perfect. Actually, converting to 9-15 bit is not used that often except for final output, so I prefer safe output over performance or intermediate precision. Perhaps an additional option for clamping the result to the valid range could be added? It could also be used to limit the values to limited range when fullrange_out=False.

3. My mistake, the ordered dither does affect only the least significant bit of the output; maybe I was misled by the image with underflow issues. In the previous build, when the output depth and range were the same as the input, the ordered dithering pattern was still applied to the image. Fixing the magnitude of the ordered dither solved this as well. I suppose it would be faster to directly return the src frame pointer in this case, since the frame data is always unchanged.

4. Yes, I got it. |
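To illustrate the clamping option suggested in point 2, here is a minimal sketch of ordered dithering with an amplitude of 0.5 LSB. The 2x2 Bayer matrix, the `clamp` parameter and the function names are assumptions chosen for the example; the real filter may use a different pattern and has no such option as far as this thread shows.

```python
import math

# 2x2 Bayer threshold matrix (illustrative; real implementations often use 8x8)
BAYER_2X2 = [[0, 2], [3, 1]]


def ordered_dither_offset(x, y):
    """Zero-centred offset within (-0.5, 0.5) of one output LSB."""
    return (BAYER_2X2[y % 2][x % 2] + 0.5) / 4 - 0.5


def quantize(value, x, y, depth, clamp=False):
    """Round a dithered value to an integer, optionally clamping to [0, 2**depth - 1]."""
    q = math.floor(value + ordered_dither_offset(x, y) + 0.5)
    if clamp:
        q = max(0, min((1 << depth) - 1, q))
    return q


# 32767.25 is the 15-bit value reported earlier in the thread for 16-bit chroma 65535.
value = 32767.25
unclamped = {quantize(value, x, y, 15) for y in range(2) for x in range(2)}
clamped = {quantize(value, x, y, 15, clamp=True) for y in range(2) for x in range(2)}
print(sorted(unclamped), sorted(clamped))
```

Without the clamp, the pattern mixes 32767 and 32768, so one of the two values is outside the 15-bit range. With the clamp, every sample stays at 32767, at the cost of one extra min/max per pixel.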
18th April 2015, 21:08 | #24 | Link |
Registered User
Join Date: Aug 2011
Posts: 103
|
I see. This will make the filter clean and clear. On the other hand, users need to be aware of what they are doing and take more care about the risks in these special conditions.
Anyway, this is a great library and thanks for your efforts! |
7th June 2015, 03:39 | #26 | Link | |
Registered User
Join Date: Aug 2011
Posts: 103
|
Code:
# YUV420P8 input
src = core.z.Depth(src, depth=16, fullrange_in=False)
src = core.z.Resize(src, src.width, src.height, filter_uv="bicubic", filter_param_a_uv=1/3, filter_param_b_uv=1/3, subsample_w=0, subsample_h=0)
src = core.z.Colorspace(src, matrix_in=1, transfer_in=1, primaries_in=1, matrix_out=0, transfer_out=1, primaries_out=1)
# RGB48 output
Quote:
Last edited by mawen1250; 7th June 2015 at 04:54. |
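For readers unfamiliar with the numeric codes in the Colorspace call: assuming they follow the usual H.273 numbering (matrix 1 = BT.709, matrix 0 = RGB), the last step is a BT.709 YCbCr to R'G'B' conversion with transfer and primaries left unchanged. The per-pixel arithmetic can be sketched in plain Python as below. The function name is hypothetical and the result is gamma-encoded R'G'B' normalised to [0, 1].

```python
KR, KB = 0.2126, 0.0722      # BT.709 luma coefficients
KG = 1.0 - KR - KB


def yuv709_limited_to_rgb(y, u, v, depth=8):
    """Convert one limited-range BT.709 YCbCr sample to normalised R'G'B'."""
    scale = 1 << (depth - 8)
    # Normalise code values: Y to [0, 1], Cb/Cr to [-0.5, 0.5]
    yf = (y - 16 * scale) / (219 * scale)
    cb = (u - 128 * scale) / (224 * scale)
    cr = (v - 128 * scale) / (224 * scale)
    r = yf + 2 * (1 - KR) * cr
    b = yf + 2 * (1 - KB) * cb
    g = (yf - KR * r - KB * b) / KG
    return r, g, b


print(yuv709_limited_to_rgb(16, 128, 128))               # limited-range black
print(yuv709_limited_to_rgb(235 * 256, 128 * 256, 128 * 256, depth=16))  # 16-bit white
```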
|
15th August 2015, 13:24 | #27 | Link |
Registered User
Join Date: Aug 2011
Posts: 103
|
zimg-1.1.1 crashes VS for an unknown reason after a script has been running for some time (about 70-100 s in my tests).
Windows 7 x64
VapourSynth R27 x64
threads=8
I just found that using BM3D built with MSVC14 also introduces the same problem (64-bit crashes, 32-bit doesn't in my test). Considering that zimg-1.1.1 is also built with MSVC14, could this be a problem related to VapourSynth and MSVC14?
Last edited by mawen1250; 15th August 2015 at 13:31. |
15th August 2015, 13:37 | #28 | Link | |
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
|
Quote:
like this? 80% sure it's a vs2015 issue |
|
8th October 2015, 16:26 | #30 | Link |
Registered User
Join Date: Aug 2011
Posts: 103
|
Two issues with 1.95 Beta:
1. If the clip is first cropped with std.CropAbs/std.CropRel and the width is not mod16, the rightmost pixels are corrupt.
2. There are no arguments for shift_w, shift_h, subwidth, subheight.
Last edited by mawen1250; 8th October 2015 at 17:20. |
8th October 2015, 18:14 | #31 | Link | |
Registered User
Join Date: Nov 2009
Posts: 327
|
Quote:
2. Is there a use case for this that's not covered by specifying chromaloc?
Last edited by Stephen R. Savage; 8th October 2015 at 19:02. |
|
9th October 2015, 04:50 | #32 | Link | |
Registered User
Join Date: Aug 2011
Posts: 103
|
Quote:
2. To do top-left aligned resampling; for example, mv.Super and warp.AWarp may need it.
3. Any other time you want to do (sub-pixel) cropping/padding. |
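Why a shift argument is enough for top-left aligned resampling can be seen from the coordinate mapping. The sketch below assumes pixel centres sit at integer coordinates and that a positive shift moves the sampling position to the right; both are conventions chosen for the example, and an actual resizer may define the sign the other way round. The function names are hypothetical.

```python
def source_position(dst_index, src_size, dst_size, shift=0.0):
    """Centre-aligned mapping from an output pixel index to a source coordinate."""
    scale = src_size / dst_size
    return (dst_index + 0.5) * scale - 0.5 + shift


def top_left_shift(src_size, dst_size):
    """Shift that turns the centre-aligned mapping into dst_index * scale."""
    scale = src_size / dst_size
    return 0.5 - 0.5 * scale


# 2x downscale: centre-aligned samples fall between source pixels,
# top-left aligned samples fall exactly on every second source pixel.
src_w, dst_w = 8, 4
shift = top_left_shift(src_w, dst_w)
centred = [source_position(i, src_w, dst_w) for i in range(dst_w)]
top_left = [source_position(i, src_w, dst_w, shift) for i in range(dst_w)]
print(shift, centred, top_left)
```

A sub-pixel crop works the same way: a fractional shift moves the sampling window without touching the output size.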
|
24th October 2015, 11:49 | #33 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,847
|
It looks like the 'engine' gets saturated at around 32 threads.
How does the threading work? With slices? Does that mean higher resolutions will scale better? The latest ffmpeg added the zscale filter, which I think is great news! |
24th October 2015, 22:50 | #34 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,847
|
I meant that performance doesn't scale linearly with cores. The provided graphs show a big drop in speed per core at higher core counts. Speed still rises, but we are wasting many cores and CPU power to get e.g. just a 20% speedup.
I think my question is: does the z library engine scale linearly with the number of cores? If we were able to deliver raw video data at unlimited speed and just measure z performance, would processing speed rise linearly with the number of cores?
Last edited by kolak; 24th October 2015 at 22:54. |
25th October 2015, 09:37 | #35 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,847
|
If 48 threads give a 21x speedup then this is very good, but I can't see this on these graphs.
This information is enough. Sometimes adding hyperthreading to such a test can be a bit misleading. In this case we gain close to 20% from 'fake' threads, which is a very good result. I should look closer at the PC descriptions. Graphs should always have units. I think z will be a great addition to existing libraries, especially in ffmpeg. I would like to see a noise generator and maybe the Sierra algorithm, which seems to be as good as Floyd but can be way faster. Floyd plus a tiny amount of noise gives amazing results for high-quality video masters.
Last edited by kolak; 25th October 2015 at 09:47. |
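One rough way to put the "21x at 48 threads" figure in perspective is to compute the parallel efficiency and, under Amdahl's law, the parallel fraction it would imply. This is a simplified model: it treats all 48 threads as equal, which ignores that some of them are hyperthreads, and the function names are only illustrative.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling that was actually achieved."""
    return speedup / threads


def amdahl_parallel_fraction(speedup, threads):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / threads)


def amdahl_speedup(parallel_fraction, threads):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / threads)


efficiency = parallel_efficiency(21, 48)
p = amdahl_parallel_fraction(21, 48)
print(f"efficiency: {efficiency:.1%}, implied parallel fraction: {p:.1%}")
print(f"predicted speedup at 96 threads: {amdahl_speedup(p, 96):.1f}x")
```

Under this model even a small serial portion caps the achievable speedup, so a per-core drop at high thread counts does not by itself mean the library threads badly.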