Merge / Blend n frames together

ChaosKing · 19th November 2018, 22:14

I'm playing around with this OCR script here https://github.com/pocketsnizort/pythOCR

I extracted some text from an old movie, but the text is unstable. My idea is to blend all text frames together to see if this improves the text quality for OCR.

Scroll to 14s: https://www.dropbox.com/s/59sicf9gqosa614/subs.mkv?dl=0

But I don't know how to approach this. I have a scenechanges.csv, and now I need a function to blend (or maybe merge, I don't know which is better) frames n to m together. Where should I start? Or maybe someone can give an example.

SceneChanges.csv

Code:

[Video Informations]
fps=23.976024
frame_count=2878

[Scene Informations]
frame,is_start,is_end,subimage
0269,1,0,"0269.png"
0326,0,1,""
1071,0,1,""
1072,1,0,"1072.png"
1178,0,1,""
1681,1,0,"1681.png"
1757,0,1,""
1758,1,1,"1758.png"
1759,1,1,"1759.png"
1760,1,1,"1760.png"
1761,1,1,"1761.png"
1762,1,1,"1762.png"
1763,1,0,"1763.png"
1787,0,1,""
1895,1,0,"1895.png"
1943,0,1,""
1945,1,0,"1945.png"
2017,0,1,""
2018,1,0,"2018.png"
2297,1,0,"2297.png"
2377,0,1,""
2479,1,0,"2479.png"
2535,0,1,""
2613,1,0,"2613.png"
2670,0,1,""
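A possible first step is turning the CSV into (first, last) frame pairs. The sketch below is only an illustration: the function name `read_sub_ranges` is made up, the meaning of `is_start` / `is_end` is inferred from the column names, and two choices are assumptions rather than anything the file format documents: an end row without an open start is skipped, and a second start replaces an earlier start that was never closed.

```python
import csv
import io


def read_sub_ranges(text):
    """Return a list of (first_frame, last_frame) tuples, one per subtitle block."""
    lines = [line.strip() for line in text.splitlines()]
    # Skip everything up to and including the "[Scene Informations]" header.
    start = lines.index("[Scene Informations]") + 1
    rows = csv.DictReader(io.StringIO("\n".join(lines[start:])))

    ranges = []
    open_start = None  # frame number of a start that has not been closed yet
    for row in rows:
        frame = int(row["frame"])
        if row["is_start"] == "1":
            open_start = frame  # assumption: a newer start replaces an unclosed one
        if row["is_end"] == "1" and open_start is not None:
            ranges.append((open_start, frame))
            open_start = None
        # assumption: an end row with no open start is silently skipped
    return ranges


# Shortened copy of the file above, keeping the irregular rows.
SAMPLE = """[Video Informations]
fps=23.976024
frame_count=2878

[Scene Informations]
frame,is_start,is_end,subimage
0269,1,0,"0269.png"
0326,0,1,""
1071,0,1,""
1758,1,1,"1758.png"
2018,1,0,"2018.png"
2297,1,0,"2297.png"
2377,0,1,""
"""

print(read_sub_ranges(SAMPLE))
```

Rows flagged as both start and end (such as 1758) come out as one-frame ranges, which may or may not be what the OCR script intends.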

DJATOM · 19th November 2018, 22:32

Try std.Expr(). The basic idea is to split the range into single frames and feed them to Expr. Your expression will be something like 'x y max z max ... etc' (a per-pixel maximum). Alternatively, count the frames in your range, divide every clip by that count and add all the frames together afterwards (an average):
rangeLen = len(range)
f'x {rangeLen} / y {rangeLen} / + z {rangeLen} / +'
Then just loop the resulting image to match the original length and add it to the clip you will OCR later.
In my opinion the first method should be better, but you'll probably end up with a lot of unwanted white dots.
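Both expression strings can be generated for any number of clips instead of being typed by hand. This is a small sketch, not part of the original post: the helper names `max_expr` and `mean_expr` are hypothetical, and it only builds the strings. It relies on Expr naming its inputs x, y, z, a, b ... w in that order, which gives 26 clips at most.

```python
# Expr refers to its input clips as x, y, z, a, b, ... w, in that order.
EXPR_VARS = "xyzabcdefghijklmnopqrstuvw"


def _check(n):
    if not 1 <= n <= len(EXPR_VARS):
        raise ValueError(f"need between 1 and {len(EXPR_VARS)} clips, got {n}")


def max_expr(n):
    """Per-pixel maximum of n clips: 'x y max z max ...'."""
    _check(n)
    terms = [EXPR_VARS[0]] + [f"{v} max" for v in EXPR_VARS[1:n]]
    return " ".join(terms)


def mean_expr(n):
    """Per-pixel average of n clips: 'x n / y n / + z n / + ...'."""
    _check(n)
    terms = [f"{EXPR_VARS[0]} {n} /"] + [f"{v} {n} / +" for v in EXPR_VARS[1:n]]
    return " ".join(terms)


print(max_expr(3))   # x y max z max
print(mean_expr(3))  # x 3 / y 3 / + z 3 / +
```

The resulting string would then be passed as `expr` together with a list of n one-frame clips.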

lansing · 19th November 2018, 22:53

What does OCR do? Read the subtitle in the frame and turn it into text?

ChaosKing · 20th November 2018, 11:04

@lansing yes ocr = img to text

@DJATOM
Do you mean like this?

Code:

rangeLen = 8
Clip = Clip.std.Trim(1895,1943) # one sub block
Clip = core.std.Expr(clips=[Clip,Clip,Clip], expr=[f'x {rangeLen} / y {rangeLen} / + z {rangeLen} / +'])

I needed to pass the clip 3 times to get it to work (the clip is GRAY8), but the result looks like a "fade to black": with rangeLen = 1 the text is white, with rangeLen = 48 the text is no longer visible.
I have never used Expr and don't really understand it yet.

@HolyWu
Yes, this is more or less what I wanted. I also tried tla.TempLinearApproximate(radius=7); it looks very similar to AverageFrames() but needs its own Trim() for every sub. Your solution is more elegant.

The OCR result has definitely improved now, but it still contains some random garbage characters.
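On the "fade to black" seen with the three-clip attempt earlier in this post: Expr combines the same frame number from each input clip, so passing one clip three times mixes every frame with itself, not with its neighbours. Each pixel then becomes 3 * value / rangeLen. The arithmetic below is plain Python, not VapourSynth; the function name is hypothetical, and rounding to the nearest integer with clamping at 255 is an assumption about how an 8-bit result is stored.

```python
def three_copies(value, range_len):
    """Evaluate 'x N / y N / + z N / +' for x == y == z == value (8-bit result)."""
    result = value / range_len + value / range_len + value / range_len
    return min(255, round(result))


for range_len in (1, 3, 8, 48):
    print(range_len, three_copies(255, range_len))
```

White text (255) stays white only while rangeLen is 3 or less; at 8 it drops to about 96 and at 48 to about 16, which matches the fading described above.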

DJATOM · 20th November 2018, 14:41

Not really. I'll describe it below.

Code:

Clip = Clip.std.Trim(1895,1943) # one sub block
rangeLen = len(Clip)
rangeList = []
exprString = ''
for i in range(rangeLen):
    rangeList.append(Clip[i])
    if len(exprString) == 0:
        exprString += f'x {rangeLen} / '
    else:
        exprString += f'y {rangeLen} / + ' # 'y' is a placeholder: every clip needs its own variable. I think you can make a string with the "xyz..." symbols and then pick the proper variable by indexing it with i

Clip = core.std.Expr(clips=rangeList, expr=exprString)

lansing · 20th November 2018, 16:26

Quote:

Originally Posted by ChaosKing

Yes, this is more or less what I wanted. I also tried tla.TempLinearApproximate(radius=7); it looks very similar to AverageFrames() but needs its own Trim() for every sub. Your solution is more elegant.

The OCR result has definitely improved now, but it still contains some random garbage characters.

Maybe a despot filter to remove the random spots before averageframe?

ChaosKing · 21st November 2018, 13:03

Quote:

Originally Posted by DJATOM

Not really. I'll describe it below.

Code:

Clip = Clip.std.Trim(1895,1943) # one sub block
rangeLen = len(Clip)
rangeList = []
exprString = ''
for i in range(rangeLen):
    rangeList.append(Clip[i])
    if len(exprString) == 0:
        exprString += f'x {rangeLen} / '
    else:
        exprString += f'y {rangeLen} / + ' # 'y' is a placeholder: every clip needs its own variable. I think you can make a string with the "xyz..." symbols and then pick the proper variable by indexing it with i

Clip = core.std.Expr(clips=rangeList, expr=exprString)

I always get vapoursynth.Error: Expr: More than 26 input clips provided
I guess this is an Expr limitation? It works when I set the Trim to Trim(1895,1895+25)
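One way around the 26-clip limit, sketched here under assumptions, is to sum the block in chunks of at most 26 frames, dividing each frame by the total frame count, and then add the partial sums. The names `chunk_bounds`, `sum_expr` and `average_block` are hypothetical. The intermediate results use a float format (`vs.GRAYS`) so they are not rounded to 8 bit; that choice, and the assumption that the input is GRAY8, are mine and not from the thread. The VapourSynth part is untested here.

```python
EXPR_VARS = "xyzabcdefghijklmnopqrstuvw"  # Expr input names; 26 is the limit


def chunk_bounds(n_frames, limit=len(EXPR_VARS)):
    """Split range(n_frames) into consecutive (start, stop) chunks of <= limit."""
    return [(s, min(s + limit, n_frames)) for s in range(0, n_frames, limit)]


def sum_expr(n_inputs, divisor=1):
    """RPN string that adds n_inputs clips, dividing each by divisor first."""
    scale = "" if divisor == 1 else f" {divisor} /"
    terms = [f"{v}{scale}" for v in EXPR_VARS[:n_inputs]]
    return " ".join(terms) + " +" * (n_inputs - 1)


def average_block(clip, first, last):
    """Average frames first..last (inclusive) of a GRAY8 clip into one frame."""
    import vapoursynth as vs  # imported here so the helpers above run without it
    core = vs.core

    block = clip[first:last + 1]
    n = len(block)
    partial = []
    for start, stop in chunk_bounds(n):
        frames = [block[i] for i in range(start, stop)]
        # Float output keeps the fractional part of each partial sum.
        partial.append(core.std.Expr(frames, sum_expr(len(frames), n), format=vs.GRAYS))
    # Every frame was already divided by n, so adding the partial sums gives the mean.
    # Assumes at most 26 chunks, i.e. blocks of up to 676 frames.
    return core.std.Expr(partial, sum_expr(len(partial)), format=vs.GRAY8)


print(chunk_bounds(49))  # the 1895..1943 block has 49 frames
print(sum_expr(3, 49))
```

For the block above this would be called as `average_block(Clip, 1895, 1943)`, which yields two chunks of 26 and 23 frames.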

The result is 1 frame, which looks a bit better but is still very similar to the version without Expr. On the plus side, it has fewer artifacts.

Here is a comparison

EDIT
OK, it depends on the start frame. Since the artifacts in the AverageFrames() version are "less white", they can now be filtered out more easily. Big THX @ all
With std.Binarize(threshold=80) the gray smear is gone in the averaged version.
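A rough illustration of why averaging followed by a threshold removes the smear: a pixel that is white in every frame keeps its value after averaging, while noise that is white in only a few frames is pulled far below the threshold. The numbers are made up for the example (a speck lit in 5 of 49 frames), and `binarize` is a plain-Python stand-in for the filter, assuming values at or above the threshold become white.

```python
def binarize(value, threshold=80, low=0, high=255):
    """Stand-in for a threshold filter: below threshold -> low, otherwise high."""
    return high if value >= threshold else low


def mean_over_block(values):
    """Average one pixel position over all frames of a block, rounded to 8 bit."""
    return round(sum(values) / len(values))


n = 49                                    # frames in the block
text_pixel = [255] * n                    # part of the subtitle in every frame
speck_pixel = [255] * 5 + [0] * (n - 5)   # hypothetical noise, lit in 5 frames only

for name, pixel in (("text", text_pixel), ("speck", speck_pixel)):
    averaged = mean_over_block(pixel)
    print(name, averaged, binarize(averaged))
```

The text pixel stays at 255, while the speck averages to about 26 and is set to black by the threshold of 80.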

ChaosKing · 28th November 2018, 16:32

One additional question. Is it possible to make the black background transparent (RGBA)? What would this look like in VapourSynth code?

poisondeathray · 28th November 2018, 17:37

Quote:

Originally Posted by ChaosKing

One additional question. Is it possible to make the black background transparent (RGBA)? What would this look like in VapourSynth code?

Copy the image itself, converted to Y8, into the alpha channel. VapourSynth passes RGB24 + Y8 as the alpha.

e.g. (starting from your RGB PNG)

Code:

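import vapoursynth as vs
core = vs.core
v = core.imwri.Read(r'PATH\7MQEv0x.png')
# Grayscale copy of the image: the black background becomes alpha 0 (transparent)
a = core.resize.Point(v, format=vs.GRAY8, matrix_s="709", range_s="full")
v.set_output(alpha=a)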
