fmtconv: resize, bitdepth and colorspace conversions [Archive]

cretindesalpes

16th November 2012, 21:33

Here is another plug-in for Vapoursynth.

>>> fmtconv-r31.zip <<< (https://ldesoras.fr/src/vs/fmtconv-r31.zip)

Fmtconv is a format-conversion plug-in for the Vapoursynth and Avisynth+ video processing engines. It does:

Resizing.
Bitdepth conversion with dithering.
Colorspace conversion (matrix, transfer characteristics and chromatic adaptation).

Supports:

8- to 12-, 14- and 16-bit integer, 32-bit float.
Colorspaces: RGB, Y, YUV, YCoCg, YDzDx and ICtCp in 4:4:4, 4:2:2, 4:2:0 and 4:1:1 chroma subsampling factors.
Progressive and interlaced content.

Fmtconv is focussed primarily on quality and exactness rather than execution speed. This does not mean it is slow or unoptimized, but fmtconv is clearly not on par with the fastest equivalent 8-bit filters.

The full documentation is included in the zip file.
New requirement from r29: Vapoursynth r55 or above (support for API v4 only).

If you’re curious, you’ll see undocumented functions in this plug-in. However they are temporary and will be removed later or moved to another plug-in, so please don’t use them.

Source code is also available as a Git repository (https://gitlab.com/EleonoreMizo/fmtconv).

sneaker_ger

16th November 2012, 22:26

Thank you, but do you really think it is a good idea to offer stacked? Maybe we should be getting rid of that hack once and for all.

cretindesalpes

16th November 2012, 22:32

The stacked stuff is only meant to offer interoperability with Avisynth plug-ins, before they all get ported.

kolak

16th November 2012, 23:01

Thanks a lot- great work :)

:thanks:

Keiyakusha

16th November 2012, 23:26

Here comes the first question from someone who has no clue about this plugin.

With it, can I convert, let's say, a 320x2 (or 2x320) YV12 clip into 320x2 YV24 without raising the bitdepth? For example 8bit in, 8bit calculation, 8bit out, not 8-16-8.

Edit: I'm not an expert on YCgCo; does the current implementation offer lossless conversion to and from RGB?

kolak

16th November 2012, 23:31

You probably can't as internal processing is done at high precision, but this is only a good thing :)

cretindesalpes

16th November 2012, 23:36

Keiyakusha: no it's not possible. All the resizing calculations are done in float anyway so in the end a bitdepth conversion has to be done (if you don't want a float clip).

I haven't tested if the YCgCo can restore lossless RGB. It would need at least 9 bits in theory.
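To put the "at least 9 bits" remark in concrete terms, here is a minimal Python sketch of the reversible lifting variant usually called YCoCg-R. It is an illustration only, not fmtconv's implementation (which, as said above, was not tested for losslessness), and the function names are made up for the example. The transform round-trips 8-bit RGB exactly, but the chroma signal Co spans -255..255, which needs 9 bits.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer RGB."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Coarse grid over the 8-bit RGB cube: 0, 17, ..., 255 on each axis.
levels = range(0, 256, 17)
samples = [(r, g, b) for r in levels for g in levels for b in levels]
coded = [rgb_to_ycocg_r(*s) for s in samples]
decoded = [ycocg_r_to_rgb(*c) for c in coded]

y_values = [c[0] for c in coded]
co_values = [c[1] for c in coded]
print("lossless:", decoded == samples)
print("Y range:", min(y_values), max(y_values))
print("Co range:", min(co_values), max(co_values))
```

Luma stays within 8 bits here; only the chroma planes need the extra bit.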

Keiyakusha

16th November 2012, 23:47

> You probably can't as internal processing is done at high precision, but this is only a good thing :)

Unfortunately I have a different opinion on this. Outside of professional 3D and 2.5D editing software, where high bitdepth is often used for better control when manipulating color, or may represent something like the strength of light that goes through less than 100% opaque obstacles (and maybe x264's 10bit streams too), I don't believe in high bitdepth. In other words, using more than 8 bits for things like resize, denoise or whatever gives, in my opinion, so little improvement that it is not worth even one extra CPU cycle spent on it.

This raises another question. Is it possible to estimate how big the speed difference is between similar colorspace (or other) conversions (8bit in, 8bit out) using this plugin compared to avisynth? (assuming everything is done in 1 thread)

cretindesalpes

17th November 2012, 00:02

Therefore this plug-in is most likely not for you. It’s much slower than the Avisynth resizers. You can check the speed difference by setting only one thread in vapoursynth (use core = vs.Core (threads=1) ) and do a benchmark with a simple blankclip as input. Or you can do everything in Avisynth, using Dither_resize16 which uses the same algorithms and should give a comparable speed, making sure that avstp is disabled (call avstp_set_threads(1) at the end of the script).

Myrsloik

17th November 2012, 00:04

> Unfortunately I have a different opinion on this. Outside of professional 3D and 2.5D editing software, where high bitdepth is often used for better control when manipulating color, or may represent something like the strength of light that goes through less than 100% opaque obstacles (and maybe x264's 10bit streams too), I don't believe in high bitdepth. In other words, using more than 8 bits for things like resize, denoise or whatever gives, in my opinion, so little improvement that it is not worth even one extra CPU cycle spent on it.
>
> This raises another question. Is it possible to estimate how big the speed difference is between similar colorspace (or other) conversions (8bit in, 8bit out) using this plugin compared to avisynth? (assuming everything is done in 1 thread)

Of course it is, just set vs to use one thread and use ffms2 as the video source in both and it should be very easy to compare 1:1.

Keiyakusha

17th November 2012, 00:11

Thanks for your answers. I kind of expected this answer but had to make sure. I mainly asked because right now, using vapoursynth's internal resizer, it seems to be impossible to do 320x2 YV12 -> 320x2 YV24 (and back) at all (or YV12->RGB, or whatever). And color conversion seems not to be included in the separate plugin with the internal avisynth functions. Thus right now I have to use ffms2 in avisynth, do the conversions there and only then pass this to vapoursynth... (using the awesome plugin by Chikuzen). Unless I missed something.

kolak

17th November 2012, 00:24

> Unfortunately I have a different opinion on this. Outside of professional 3D and 2.5D editing software, where high bitdepth is often used for better control when manipulating color, or may represent something like the strength of light that goes through less than 100% opaque obstacles (and maybe x264's 10bit streams too), I don't believe in high bitdepth. In other words, using more than 8 bits for things like resize, denoise or whatever gives, in my opinion, so little improvement that it is not worth even one extra CPU cycle spent on it.
>
> This raises another question. Is it possible to estimate how big the speed difference is between similar colorspace (or other) conversions (8bit in, 8bit out) using this plugin compared to avisynth? (assuming everything is done in 1 thread)

I deal with different types of data and have done many tests, and the results actually surprised me.

a= take a 10bit HD file and scale at 8bit to SD.
b= take a 10bit HD file, dither it and scale to SD (at 8bit).
c= take the same file, scale at 10bit to SD and then dither.

There is actually quite a visible difference between all of them, but I was surprised that even b and c show a visible difference. I thought that there would be no real difference, but there is.

Just a note: we're talking about a proper 10bit source, e.g. shot on a RED, Alexa etc. camera.

Keiyakusha

17th November 2012, 00:48

> I deal with different types of data and have done many tests, and the results actually surprised me.
>
> a= take a 10bit HD file and scale at 8bit to SD.
> b= take a 10bit HD file, dither it and scale to SD (at 8bit).
> c= take the same file, scale at 10bit to SD and then dither.
>
> There is actually quite a visible difference between all of them, but I was surprised that even b and c show a visible difference. I thought that there would be no real difference, but there is.
>
> Just a note: we're talking about a proper 10bit source, e.g. shot on a RED, Alexa etc. camera.
Yes, there can be quite a big visual difference when going from higher to lower bitdepth one way or another. It also depends on the actual content, though. I'll do "c" if it is possible. But in this particular case I was talking about an 8bit source to begin with. Sorry for not making it clear; this was kind of a continuation of my initial post. Also, what you say applies more to genuine 10bit sources than to upconverted ones.

sneaker_ger

17th November 2012, 12:28

A few questions:
1.) Does resample output 16 bit int or 32 bit float by default?
2.) "Bitdepth conversion with optional dithering."
How do I know if dithering is on or off?
3.) Is "MPEG2" the default chroma placement?

cretindesalpes

17th November 2012, 13:14

> A few questions:
> 1.) Does resample output 16 bit int or 32 bit float by default?
The default is 16 bits for integer input, and float for floating point input. The 16-bit integer output is a straight conversion from the intermediate float results without dithering.

> 2.) "Bitdepth conversion with optional dithering."
> How do I know if dithering is on or off?
Dithering is used when:
- Reducing the bitdepth of integer data, or converting from float to integer
- Doing a full-range ↔ TV-range conversion between integer formats, because the resulting values have no exact representation.

> 3.) Is "MPEG2" the default chroma placement?
Yes.
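The point about full-range ↔ TV-range conversion having no exact integer representation can be checked with a few lines of Python. This is a small sketch using the usual 8-bit luma ranges (0..255 full, 16..235 TV); the helper name is hypothetical and the code has nothing to do with fmtconv's internals.

```python
from fractions import Fraction


def full_to_tv_luma(x):
    """Map an 8-bit full-range luma code (0..255) to TV range (16..235), exactly."""
    return 16 + Fraction(219 * x, 255)


# Codes whose converted value is a whole number, i.e. needs no rounding at all.
exact_codes = [x for x in range(256) if full_to_tv_luma(x).denominator == 1]
print("codes with an exact 8-bit TV-range value:", exact_codes)
print("example: 100 ->", float(full_to_tv_luma(100)))
```

Only four of the 256 input codes land exactly on an integer; every other value has to be rounded, which is where dithering comes in.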

sneaker_ger

17th November 2012, 13:26

> The default is 16 bits for integer input, and float for floating point input. The 16-bit integer output is a straight conversion from the intermediate float results without dithering.

Does it make any sense to choose 32-bit float output for a 16-bit source? (Assuming I have more of your filters after that?)

Where can we find info on the dithering algos? Is there any good comparison that keeps video encoding in mind? Or is "Filter Lite" generally a good idea?

ajp_anton

17th November 2012, 14:09

Isn't YCgCo just YUV with different coefficients than 601 and 709? Because you're mentioning it as if it's an alternative to YUV.
And how do you use it? =)

Thread title is wrong BTW, "ftmconv".

cretindesalpes

17th November 2012, 14:39

sneaker_ger:

I’d say that keeping everything in 16 bit is sufficient, there is no need to use float. Float could be useful when working with linear light, because dark areas need more precision. Or when working with non-perceptually uniform colorspaces like CIE XYZ. Keeping a pipeline in float also avoids the conversion overhead, but the gain has to be balanced with the memory bandwidth doubling and related cache issues.
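A quick way to see why dark areas in linear light need more precision: the sketch below converts the darkest 8-bit gamma-encoded codes to linear light and stores them either as 16-bit integers or as 32-bit floats. It assumes a plain 2.2 power law rather than a real transfer curve, and the helper name is made up for the example.

```python
import struct

GAMMA = 2.2  # assumption: simple power law, not an exact transfer function


def to_float32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


codes = (0, 1, 2, 3)
linear = [(c / 255) ** GAMMA for c in codes]
int16_codes = [round(v * 65535) for v in linear]
float32_values = [to_float32(v) for v in linear]

for c, i, f in zip(codes, int16_codes, float32_values):
    print(c, i, f)
```

With these assumptions the gamma codes 0 and 1 both collapse to 0 in a 16-bit linear integer, while the float representation keeps them apart.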

For the dithering algorithms, you’ll find relevant information here:
http://caca.zoy.org/wiki/libcaca/study/2
http://caca.zoy.org/wiki/libcaca/study/3
http://caca.zoy.org/wiki/libcaca/study/4
Filter Lite (Sierra 2-4A) is OK as a general purpose dither algorithm and is very similar to Floyd-Steinberg. Anyway if you want to dither to 8 bits before encoding to avoid colorbanding, you’ll probably prefer using ordered dithering.
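For readers who have not met ordered dithering, here is a minimal sketch using a classic 4x4 Bayer matrix to go from 16 to 8 bits. It is not the algorithm fmtconv uses; the function names and the simple shift-by-8 conversion are assumptions made for the illustration.

```python
# 4x4 Bayer index matrix (values 0..15), a classic ordered-dither pattern.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def round_to_8(v16):
    """Plain rounding: every pixel of a flat area gets the same 8-bit code."""
    return min((v16 + 128) >> 8, 255)


def ordered_dither_to_8(v16, x, y):
    """Add a position-dependent threshold (8, 24, ..., 248) before dropping 8 bits."""
    threshold = (2 * BAYER_4X4[y % 4][x % 4] + 1) * 8
    return min((v16 + threshold) >> 8, 255)


flat = 128 * 256 + 64  # a flat 16-bit area worth 128.25 in 8-bit units
rounded = [round_to_8(flat) for y in range(4) for x in range(4)]
dithered = [ordered_dither_to_8(flat, x, y) for y in range(4) for x in range(4)]

print("mean after rounding: ", sum(rounded) / 16)
print("mean after dithering:", sum(dithered) / 16)
```

Rounding flattens the area to a single code, which is what produces banding on gradients; the dithered tile mixes two neighbouring codes so that its average keeps the fractional part.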

ajp_anton:

Technically, YCgCo is just another linear combination of the RGB values into signals approximating perceptual luminance and chrominance. But the result is different enough to create another category. I think it’s suited more for encoding than for processing, as it is reported to compress a bit better than YUV. It is specified by MPEG-4 part 10 (H.264) but I think players and encoders supporting it are still rare.

Thank you for the typo report, I’ll fix it.

mandarinka

17th November 2012, 17:37

Nice addition!

So I take it that if one is paranoid, it's best to pass around the float format and dither to 8/16 bit integer with bitdepth() as the last step (when the followup filters/encoder require it).

sneaker_ger

17th November 2012, 20:08

@cretindesalpes

Thanks for the detailed answers and your work on the plugin. I tested it out today and it worked really great.

cretindesalpes

18th November 2012, 18:21

fmtconv r2 (http://forum.doom9.org/showthread.php?t=166504):

resample: optimized paths involving float input or output
resample: fixed white/magenta screen with 8-bit input and float output
bitdepth: implemented fast dither mode (but not in SSE2 yet)
bitdepth: optimized float-to-integer path
bitdepth: faster dithering when ampo = 1 and ampn = 0
matrix: enabled the SSE path for float operations

Note: I'll rename the "bitdepth" argument of the bitdepth() function to "bits" in the next release.

Mug Funky

19th November 2012, 02:46

this is beautiful. thank you very much.

any speed losses in this plugin are made up for by the parallelism gained from using VS, and it can only get faster. i'm breaking realtime on BD transcodes for the first time ever, and in higher quality than i was ever able to achieve.

Keiyakusha

19th November 2012, 02:58

> any speed losses in this plugin are made up for by the parallelism gained from using VS, and it can only get faster.
I don't understand how this is possible.
If that is true, this means x264 or whatever software you use fails to fill all available CPU resources.
If we assume that filter A is twice as fast as filter B (both in one thread), then under 100% CPU load A will still be twice as fast, regardless of how many threads you give to filter B. Even if B outputs its result faster, it will eat cycles that would otherwise go to x264, so overall there shouldn't be a speed gain or unchanged speed; there should be a speed loss.

Mug Funky

19th November 2012, 07:09

Keiyakusha

19th November 2012, 08:33

> i'm comparing to encoding the same blu-ray through avisynth, which is single threaded. often it's the bottleneck when encoding, especially if using intermediate formats (because sometimes x264 isn't the final destination :))
If you were some new user, I could understand this response, but you're not, so it's kind of trolling. Anyhow, I got your point.

kolak

19th November 2012, 11:42

cretindesalpes: thank you for your work. Quick tests are promising: all seems to be working well and speed is very good. Are you going to implement adding noise? I found that whatever dithering method you use, it's always good to put in a bit of noise anyway.

Great job :thanks:

kolak

19th November 2012, 12:12

Have crash on these lines:

ret = core.ffms2.Source(source=r'S:\test.mov')  # 10bit DNxHD source
b= core.fmtc.resample(clip=ret, w=720, h=576)
c= core.fmtc.matrix (clip=b, mats="709", matd="601", col_fam=vs.YUV)
a= core.fmtc.bitdepth (clip=c, bitdepth=8, dmode=3)

It's the matrix line, as without it everything was working fine.
It also looks like dithering mode=5 does not work at all: the video looks like it's without any dithering.

Any idea why (vs R16)?

cretindesalpes

20th November 2012, 00:19

kolak

20th November 2012, 01:01

> I couldn’t reproduce your problems here (with both VS r15 and r16). Your script works and dmode=5 is effective. Are you sure your input is 4:4:4? Matrix needs 4:4:4 to work, but it should issue an error message, not a crash.

No- source is 4:2:2 10bit (most likely YUV422P10 color space).
So I need to add line with conversion to 4:4:4 before matrix?

It crashes Vdub badly- straight away, no log etc.

Another question - how lossless is/can be YUV->RGB->YUV with your tool?
If source is YUV will it stay in YUV for all conversions?

Keiyakusha

20th November 2012, 01:23

kolak:
Maybe I'm missing something, but you're not supposed to open anything in VDub that is not V210 or an 8bit format. Sure it crashes, closes, whatever; it doesn't support what's coming into it.
Edit: and yes, in theory it should give an error, but in practice for me, just after some message pops up, the process terminates.

Myrsloik

20th November 2012, 01:25

> kolak:
> Maybe I'm missing something, but you're not supposed to open anything in VDub that is not V210 or an 8bit format. Sure it crashes, closes, whatever; it doesn't support what's coming into it.

Stop trolling around. Vdub doesn't crash on unsupported formats. It simply says it can't decode that fourcc.

Keiyakusha

20th November 2012, 01:29

> Stop trolling around. Vdub doesn't crash on unsupported formats. It simply says it can't decode that fourcc.
It does. Not sure if it's a crash or whatever, but it stops working.

kolak

20th November 2012, 01:35

I convert to v210 or YUY2 if I want to see it in Vdub- I know Vdub very well. Read above.

cretindesalpes

20th November 2012, 08:34

> No- source is 4:2:2 10bit (most likely YUV422P10 color space).
> So I need to add line with conversion to 4:4:4 before matrix?
Yes, just add css="444" in your resample line.

> Another question - how lossless is/can be YUV->RGB->YUV with your tool?
It's not lossless. For the matrix operation, it will be bound by numerical error noise. So the higher the bitdepth, the better. Anyway a 16-bit chain should be enough to process a 10-bit input.

And if you add inverse and direct chroma subsampling, it will add more errors because of the aliasing and the limited kernel bandwidth. Even using a very large kernel, the losslessness will not be guaranteed.

If you want more chance to recover the original pixels, dither with dmode=1 at the very last step. But if you inserted some operations in your processing like color or level correction, use a real dithering and forget about losslessness.

> If source is YUV will it stay in YUV for all conversions?
I'm not sure if I correctly understand your question. We now have real high-bitdepth and planar RGB colorspaces in Vapoursynth, so there is no need to use the same tricks as in the Dither tools for Avisynth. When you specify a RGB colorspace in matrix, the clip is really converted to this colorspace.
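The remark about inverse and direct chroma subsampling adding errors can be illustrated on a single chroma line. The sketch below upsamples by 2 with linear interpolation and downsamples again with a [1, 2, 1] / 4 low-pass. Both kernels are simple stand-ins chosen for the example, not the kernels fmtconv uses, and the function names are hypothetical.

```python
def upsample_2x_linear(c):
    """Double a chroma line; new samples are the mean of their two neighbours."""
    n = len(c)
    out = []
    for i in range(n):
        nxt = c[min(i + 1, n - 1)]  # replicate the last sample at the edge
        out += [c[i], (c[i] + nxt) / 2]
    return out


def downsample_2x_121(u):
    """Halve a line with a [1, 2, 1] / 4 low-pass centred on even samples."""
    out = []
    for i in range(0, len(u), 2):
        left = u[max(i - 1, 0)]
        right = u[min(i + 1, len(u) - 1)]
        out.append((left + 2 * u[i] + right) / 4)
    return out


chroma = [0, 0, 8, 0, 0]  # a single sharp chroma detail
round_trip = downsample_2x_121(upsample_2x_linear(chroma))
print("original:  ", chroma)
print("round trip:", round_trip)
```

The isolated detail comes back attenuated and smeared over its neighbours, so the original samples are not recovered even before any rounding to integers takes place.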

kolak

20th November 2012, 12:19

If the import filter reads the file as e.g. YUV422P10 and I want to scale it to SD while staying in YUV mode, is this possible (or will it always go to RGB during internal conversions)?
Like in the case above- DNxHD scaled to PAL with 709->601 conversion- will this stay in YUV ?

I don't want to go to RGB when source file is read as YUV.

Thanks for your clear answers :)

cretindesalpes

20th November 2012, 13:12

Yes of course, resample and matrix will keep the same colorspace family by default. For matrix, you can specify another family as target colorspace.

If your input is YUV422P10 it must be converted to 4:4:4 before applying the matrix, and converted back to 4:2:2 after. This is how it should be done. This will yield a more reliable result than for example ColorMatrix, which processes the chroma without taking the luma into account (well, this is not important for the specific 601<->709 case), and processes the luma by point-resizing the chroma on the fly. This approximation is fast but not very accurate.

If resampling the chroma twice is not acceptable for you, there is another possibility, a bit more complex to set up: use the basic 4:4:4 conversion to generate the luma plane, resample the luma to match exactly the initial chroma (half-sized 4:4:4) to generate the chroma planes, and merge the resulting planes.
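To see why the matrix wants luma and chroma samples at the same position, here is a single-pixel 709 to 601 conversion in plain Python. It is a sketch on normalised floats (Y in 0..1, Cb and Cr in -0.5..0.5) built from the standard Kr/Kb constants; the function names are made up and this is not how fmtconv implements its matrix filter.

```python
BT709 = (0.2126, 0.0722)  # (Kr, Kb)
BT601 = (0.299, 0.114)


def yuv_to_rgb(y, cb, cr, coef):
    kr, kb = coef
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def rgb_to_yuv(r, g, b, coef):
    kr, kb = coef
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    return y, (b - y) / (2.0 * (1.0 - kb)), (r - y) / (2.0 * (1.0 - kr))


def convert_709_to_601(y, cb, cr):
    """Go through RGB: decode with the 709 constants, re-encode with 601."""
    return rgb_to_yuv(*yuv_to_rgb(y, cb, cr, BT709), BT601)


grey = convert_709_to_601(0.5, 0.0, 0.0)
coloured = convert_709_to_601(0.5, 0.1, -0.1)
print("grey pixel:    ", grey)
print("coloured pixel:", coloured)
```

The grey pixel keeps its luma, but the coloured one does not: the new luma depends on the chroma values of that same pixel, which is why each luma sample needs its own chroma sample (4:4:4) for an accurate result.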

kolak

20th November 2012, 13:17

Well- if it stays in YUV or RGB depending on incoming format by default, then this is good for me :)

I did try YUV->RGB->YUV but it ended up quite far from being lossless.

Source is YUY2, then:

ret= core.fmtc.resample(clip=ret, css="444")
ret= core.fmtc.matrix (clip=ret, mat="601", col_fam=vs.RGB)
ret= core.fmtc.matrix (clip=ret, mat="601", col_fam=vs.YUV)
ret= core.fmtc.resample (clip=ret, css="422")
ret = core.fmtc.bitdepth (clip=ret, bitdepth=8, dmode=1)

dmode=-1 keeps crashing.

cretindesalpes

20th November 2012, 13:37

Hmm right, it's dmode=1 now; it was -1 in the Dither tools for avisynth so I got confused. I fixed my previous message. Also, for this kind of stuff, you can resample with a better kernel than the default (an 8-tap Blackman for example).

kolak

20th November 2012, 13:53

Are all kernels as in dither, should I look in dither docs? I used spline36 I think (it was just a guess :)).

Sorry- I used spline36 when I tested resizing.

Reel.Deel

20th November 2012, 14:55

Hi cretindesalpes, like always thank you very much for another awesome tool. :)
For testing purposes only, I used one of your resizing examples in the Dither documentation. I don't know if I'm doing something wrong but I'm getting some strange results.

Example in Dither doc:
Sharpening the luma using the convolver of the resizer:

Dither_convert_8_to_16 ()
Dither_resize16 (Width (), Height () / 2, kernel="impulse -1 6 -1",
\ fh=-1, fv=-1, cnorm=true, center=false, y=3, u=2, v=2)
DitherPost ()

My Script:
import vapoursynth as vs
import sys
core = vs.Core()

# Load Plugins
core.std.LoadPlugin(path=r'C:\Vapoursynth\FMTConv\fmtconv.dll')
core.avs.LoadPlugin(path=r'C:\AviSynth 2.5\plugins\DGDecodeNV.dll')

# Blu-ray source cropped to 1920x1080 by DGSource
src = core.avs.DGSource(dgi=r'X:\Test.dgi')

# Processing
src = core.fmtc.resample(clip=src, w=960, h=540, impulse="-1 6 -1", fh=-1, fv=-1, cnorm=1, css="420", planes="3,2,2", center=2)
src = core.fmtc.bitdepth(clip=src, bitdepth=8)

# VDub Output
last = src

The problems I'm encountering:
Sometimes it opens without resizing at all.
Sometimes it resizes to 960x1080 or to 1920x540.
Sometimes (rare) it resizes correctly.
But most of the time VDub gives me the following error:

Avisynth open failure:
python exception: 'resample: argument clip is required'

I tried removing some parameters to try to find the problem. The following still produces the problems mentioned above.
.....
src = core.fmtc.resample(clip=src, w=960, h=540, impulse="-1 6 -1")
src = core.fmtc.bitdepth(clip=src, bitdepth=8)
.....

I'm using VS r16 on 32-bit Windows XP SP3.

*edit*
Using the following, the correct results are more consistent but the other problems still remain.
src = core.fmtc.resample(clip=src, w=1920, h=540, impulse="-1 6 -1")

sneaker_ger

20th November 2012, 18:40

> If your input is YUV422P10 it must be converted to 4:4:4 before applying the matrix, and converted back to 4:2:2 after. This is how it should be done. This will yield a more reliable result than for example ColorMatrix, which processes the chroma without taking the luma into account (well, this is not important for the specific 601<->709 case), and processes the luma by point-resizing the chroma on the fly. This approximation is fast but not very accurate.

So matrix on something other than 4:4:4 is not just not yet implemented, but is not implemented on purpose?

cretindesalpes

20th November 2012, 20:14

Reel.Deel:

Try with the syntax: impulse=[-1, 6, -1]. This is no longer a string but a wonderful datastructure called array.

However I don't understand what's happening with these error messages that don't match the actual error. I need to investigate this.
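As a side note on what an impulse of [-1, 6, -1] does once it is normalised, here is a 1-D sketch in plain Python. It assumes that normalisation simply scales the coefficients so they sum to 1 (my reading of the cnorm option, to be checked against the documentation); the helper names are hypothetical and no resizing is involved.

```python
def normalise(kernel):
    """Scale the coefficients so that they sum to 1 (flat areas stay unchanged)."""
    total = sum(kernel)
    return [k / total for k in kernel]


def convolve_1d(signal, kernel):
    """Convolve with a centred odd-length kernel, replicating edge samples."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            acc += k * signal[min(max(i + j - half, 0), last)]
        out.append(acc)
    return out


impulse = [-1, 6, -1]  # a list of numbers, as in the corrected syntax
kernel = normalise(impulse)
edge = [0, 0, 0, 100, 100, 100]
sharpened = convolve_1d(edge, kernel)
print("kernel:   ", kernel)
print("sharpened:", sharpened)
```

Flat areas are left alone while the step gets an undershoot and an overshoot on either side, which is the sharpening effect the Dither example was after.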

sneaker_ger:

Yes, more or less. As mentioned previously, I could implement specific cases differently but at the moment it works as designed. It needs to be optimized, though.

ajp_anton

21st November 2012, 14:27

Would it be possible to add support for the Jinc resizer (in madVR)?

kolak

22nd November 2012, 17:56

cretindesalpes:

Source is YUY2:

b= core.fmtc.resample(clip=ret, w=720, h=576, kernel="spline36", css="444")
c= core.fmtc.matrix (clip=b, mats="709", matd="601", col_fam=vs.YUV)
a= core.fmtc.bitdepth (clip=c, bitdepth=8, dmode=1)

I seem to have some issues with dither tools: resizing seems to mess with chroma placement, especially with css=444, which is required for matrix conversions. There are issues without css=444 also, but less visible.
Tests on HD bars going to SD show chroma shifts compared to Vdub and internal Edius resizing (they are both about the same). There is something wrong going on with dither tools. Another issue is dithering artefacts (higher modes, e.g. 3), which seem to break the solid nature of the bars (especially the green one).
This was also very visible when I tested YUV->RGB->YUV conversion: avisynth was way more lossless, which is not how it should be, taking dither's precision into account.
I will try to put some grabs later.

update: it's all related to last line in my script which is: last=core.resize.Spline(clip=h,format=vs.COMPATYUY2) - it's not dither tools, but this line messing with good output of dither tools. Is this part of swscale?

cretindesalpes

23rd November 2012, 23:52

ajp_anton:

Not at the moment. Jinc is quite different from the other kernels (the transform is not separable) so it would need a significant rewrite to add it. Currently I focus on making everything work correctly.
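To make "not separable" concrete: a separable 2-D kernel is a product of two 1-D kernels, so a resize can run as a horizontal pass followed by a vertical pass, whereas a radially symmetric kernel depends on the distance and cannot be split that way. The sketch below uses a triangle kernel as a stand-in for both cases (Jinc itself needs a Bessel function, which the Python standard library does not provide); all names are made up for the example.

```python
import math


def tent(x):
    """Triangle (bilinear) kernel with support [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def separable_weight(dx, dy):
    """Product of two 1-D kernels: can be applied as two 1-D passes."""
    return tent(dx) * tent(dy)


def radial_weight(dx, dy):
    """Same profile applied to the distance: radially symmetric."""
    return tent(math.hypot(dx, dy))


def factorises(weight, dx, dy):
    """A product kernel satisfies w(dx, dy) * w(0, 0) == w(dx, 0) * w(0, dy)."""
    return math.isclose(weight(dx, dy) * weight(0, 0),
                        weight(dx, 0) * weight(0, dy))


print("separable:", separable_weight(0.5, 0.5), factorises(separable_weight, 0.5, 0.5))
print("radial:   ", radial_weight(0.5, 0.5), factorises(radial_weight, 0.5, 0.5))
```

The radial version fails the factorisation check, so it has to be evaluated as a true 2-D filter, which is the kind of rewrite mentioned above.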

kolak:

Yes, core.resize.Spline uses swscale. Do I understand correctly that you no longer have any problems with fmtconv?

— - —

Time for a new release:

fmtconv r3 (http://forum.doom9.org/showthread.php?t=166504):

bitdepth: changed the "bitdepth" parameter to "bits"
bitdepth: added SSE2 optimizations for upconversions.
resample: added interlaced resizing ("interlaced" parameter)
resample: now sets "_ColorRange" and "_ChromaLocation" properties when known
resample: fixed the "planes" parameter previously interpreted as 0 (black or green screen).

kolak

24th November 2012, 00:14

Yes- fmtconv is absolutely fine (at least bits which I tested)- all problems were related to swscale!

Question: how is interlaced resizing handled? Resize per field?

cretindesalpes

9th December 2012, 16:18

kolak:

For interlaced resizing, you have to separate the fields first. Then the relative field position is automatically taken into account when resizing.
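Separating the fields simply means splitting each frame into its even and odd lines. Here is a minimal sketch on a frame stored as a list of rows; the function names and the tff flag are local to the example (the flag only mirrors the idea of field parity) and do not correspond to any Vapoursynth or fmtconv call.

```python
def separate_fields(frame, tff=True):
    """Split a frame (list of rows) into (first_field, second_field)."""
    top, bottom = frame[0::2], frame[1::2]
    return (top, bottom) if tff else (bottom, top)


def weave(first, second, tff=True):
    """Interleave two fields back into a full frame."""
    top, bottom = (first, second) if tff else (second, first)
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame


# 8 rows of 4 pixels; each pixel holds its row number to make the split visible.
frame = [[row] * 4 for row in range(8)]
first, second = separate_fields(frame)
print("first field rows: ", [r[0] for r in first])
print("second field rows:", [r[0] for r in second])
print("round trip ok:", weave(first, second) == frame)
```

The bottom field starts one frame line below the top field, and that vertical offset is the relative field position a resizer has to take into account when it processes each field on its own.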

fmtconv r4 (http://forum.doom9.org/showthread.php?t=166504):

Added documentation.
Filters now write some frame properties when known.
Fixed the code so it can be compiled on Linux. Thanks to Jackoneill/Nodame for testing.
bitdepth: no need to specify any bitdepth or colorspace (for simple range conversions).
matrix: Added SSE2 implementation for integer processing.
matrix: Allows the destination bitdepth to be higher than the input (added the bits parameter).
matrix: col_fam completes csp instead of replacing it.
resample: Added interlacedd to specify if output is interlaced (allows simple bobbing).
resample: Added tff and tffd to specify field parity.
resample: Added scale, scaleh and scalev for easier magnification.
resample: Added a two-digit mode to css.
resample: Fixed a typo that prevented selecting 4:1:1 chroma subsampling.
Added nativetostack16.