Old 19th November 2018, 22:14   #1  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
Merge / Blend n frames together

I'm playing around with this OCR script here https://github.com/pocketsnizort/pythOCR

I extracted some text from an old movie, but the text is unstable. My idea is to blend all frames of a text block together to see if this improves the text quality for OCR.

Scroll to 14s: https://www.dropbox.com/s/59sicf9gqosa614/subs.mkv?dl=0

But I don't know how to approach this. I have a scenechanges.csv, and now I need a function to blend (or maybe merge, I don't know which is better) the frames from n to m together. Where should I start? Or maybe someone can give an example.


SceneChanges.csv
Code:
[Video Informations]
fps=23.976024
frame_count=2878

[Scene Informations]
frame,is_start,is_end,subimage
0269,1,0,"0269.png"
0326,0,1,""
1071,0,1,""
1072,1,0,"1072.png"
1178,0,1,""
1681,1,0,"1681.png"
1757,0,1,""
1758,1,1,"1758.png"
1759,1,1,"1759.png"
1760,1,1,"1760.png"
1761,1,1,"1761.png"
1762,1,1,"1762.png"
1763,1,0,"1763.png"
1787,0,1,""
1895,1,0,"1895.png"
1943,0,1,""
1945,1,0,"1945.png"
2017,0,1,""
2018,1,0,"2018.png"
2297,1,0,"2297.png"
2377,0,1,""
2479,1,0,"2479.png"
2535,0,1,""
2613,1,0,"2613.png"
2670,0,1,""
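A possible starting point for turning this file into frame ranges, as a minimal sketch. It assumes that a `1,0` row opens a block, a `0,1` row closes it, and a `1,1` row is a one-frame block. The name `parse_scene_ranges` is made up for illustration, and unmatched start or end rows are simply skipped, which may not be what pythOCR intends.

```python
import csv
import io


def parse_scene_ranges(text):
    """Return (first_frame, last_frame) tuples from SceneChanges.csv content."""
    # Only the part after the "[Scene Informations]" header holds the rows.
    _, _, body = text.partition("[Scene Informations]")
    ranges = []
    start = None
    for row in csv.DictReader(io.StringIO(body.strip())):
        frame = int(row["frame"])
        is_start = row["is_start"] == "1"
        is_end = row["is_end"] == "1"
        if is_start and is_end:
            ranges.append((frame, frame))  # one-frame block
            start = None
        elif is_start:
            start = frame  # a later start replaces an unclosed one
        elif is_end and start is not None:
            ranges.append((start, frame))
            start = None
    return ranges


sample = """[Video Informations]
fps=23.976024
frame_count=2878

[Scene Informations]
frame,is_start,is_end,subimage
0269,1,0,"0269.png"
0326,0,1,""
1071,0,1,""
1072,1,0,"1072.png"
1178,0,1,""
1758,1,1,"1758.png"
"""

ranges = parse_scene_ranges(sample)
print(ranges)
```

Each tuple could then be used as the first and last frame of a Trim for one sub block.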
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth
VapourSynth Portable FATPACK || VapourSynth Database
Old 19th November 2018, 22:32   #2  |  Link
DJATOM
Registered User
 
Join Date: Sep 2010
Location: Ukraine, Bohuslav
Posts: 377
Try std.Expr(). The basic idea is to split the range into single frames and feed them to Expr. Your expression will be something like 'x y max z max ...' (a per-pixel maximum). Alternatively, count the frames in your range, divide every clip by that count, and add all the frames together afterwards (a per-pixel average):
rangeLen = len(range)
f'x {rangeLen} / y {rangeLen} / + z {rangeLen} / +'
Then just loop the resulting image to match the original length and add it to the clip you will OCR later.
In my opinion the first method should work better, but you'll probably end up with a lot of unneeded white dots.
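The two expressions described above could be generated like this. It is a minimal sketch: `expr_vars`, `max_expr`, and `avg_expr` are illustrative names, not VapourSynth functions, and the code only builds the strings that would be passed to std.Expr. It relies on Expr naming its input clips x, y, z, a, b, ... w, which caps the number of clips at 26.

```python
import string

# Expr refers to its input clips as x, y, z, then a through w.
EXPR_VARS = "xyz" + string.ascii_lowercase[:23]


def expr_vars(count):
    """Return the variable names for `count` input clips."""
    if not 1 <= count <= len(EXPR_VARS):
        raise ValueError("Expr takes between 1 and 26 input clips")
    return EXPR_VARS[:count]


def max_expr(count):
    """Per-pixel maximum over all clips: 'x y max z max ...'."""
    first, *rest = expr_vars(count)
    return " ".join([first] + [f"{v} max" for v in rest])


def avg_expr(count):
    """Per-pixel average: every clip divided by count, then summed."""
    first, *rest = expr_vars(count)
    return " ".join([f"{first} {count} /"] + [f"{v} {count} / +" for v in rest])


print(max_expr(3))
print(avg_expr(3))
```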
__________________
Me on GitHub
PC Specs: Ryzen 5950X, 64 GB RAM, RTX 2070
Old 19th November 2018, 22:53   #3  |  Link
lansing
Registered User
 
Join Date: Sep 2006
Posts: 1,657
What does OCR do? Does it read the subtitles in the frame and turn them into text?
Old 20th November 2018, 11:04   #4  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
@lansing Yes, OCR = image to text.

@DJATOM
Do you mean like this?
Code:
rangeLen = 8
Clip = Clip.std.Trim(1895,1943) # one sub block
Clip = core.std.Expr(clips=[Clip,Clip,Clip], expr=[f'x {rangeLen} / y {rangeLen} / + z {rangeLen} / +'])
I needed to pass the clip 3 times to get it to work (the clip is GRAY8), but the result looks like a "fade to black": with range = 1 the text is white, with range = 48 the text is no longer visible.
I have never used Expr and don't really understand it yet.
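Expr expressions are written in postfix (reverse Polish) notation: operands are pushed onto a stack and each operator consumes the top two values. The toy evaluator below is only an illustration in plain Python, not how VapourSynth implements Expr, and `eval_rpn` is a made-up name. It shows why passing the same clip three times fades the text: with x = y = z the expression yields 3 * x / rangeLen, so a white pixel gets darker as rangeLen grows.

```python
def eval_rpn(expr, **pixels):
    """Evaluate a small postfix expression for a single pixel value."""
    ops = {
        "+": lambda a, b: a + b,
        "/": lambda a, b: a / b,
        "max": max,
    }
    stack = []
    for token in expr.split():
        if token in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[token](a, b))
        elif token in pixels:
            stack.append(pixels[token])  # clip variable such as x, y, z
        else:
            stack.append(float(token))  # numeric constant
    return stack[0]


white = 255  # a white text pixel in a GRAY8 clip
for range_len in (1, 8, 48):
    expr = f"x {range_len} / y {range_len} / + z {range_len} / +"
    # Values above 255 would be clipped to 255 in a real 8-bit clip.
    print(range_len, eval_rpn(expr, x=white, y=white, z=white))
```

For the average to come out right, each variable has to stand for a different frame of the block, and there have to be rangeLen of them.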

@HolyWu
Yes, this is more or less what I wanted. I also tried tla.TempLinearApproximate(radius=7). It also looks very similar to AverageFrames() but needs its own Trim() for every sub. Your solution is more elegant.


The OCR result has definitely improved now but still contains some random crap chars.

Last edited by ChaosKing; 20th November 2018 at 11:09.
Old 20th November 2018, 14:41   #5  |  Link
DJATOM
Registered User
 
Join Date: Sep 2010
Location: Ukraine, Bohuslav
Posts: 377
Not really. I'll describe it below.
Code:
import string

Clip = Clip.std.Trim(1895,1943) # one sub block
rangeLen = len(Clip)
rangeList = [Clip[i] for i in range(rangeLen)] # one single-frame clip per frame
# Expr names its input clips x, y, z, a, b, ... w
exprVar = 'xyz' + string.ascii_lowercase[:23]
exprString = ''
for i, var in enumerate(exprVar[:rangeLen]):
    if i == 0:
        exprString += f'{var} {rangeLen} / '
    else:
        exprString += f'{var} {rangeLen} / + ' # divide each frame by the count, add to the running sum

Clip = core.std.Expr(clips=rangeList, expr=exprString)

Last edited by DJATOM; 20th November 2018 at 15:05.
Old 20th November 2018, 16:26   #6  |  Link
lansing
Registered User
 
Join Date: Sep 2006
Posts: 1,657
Quote:
Originally Posted by ChaosKing View Post
Yes, this is more or less what I wanted. I also tried tla.TempLinearApproximate(radius=7). It also looks very similar to AverageFrames() but needs its own Trim() for every sub. Your solution is more elegant.


The OCR result has definitely improved now but still contains some random crap chars.
Maybe run a despot filter to remove the random spots before AverageFrames?
Old 21st November 2018, 13:03   #7  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
Quote:
Originally Posted by DJATOM View Post
Not really. I'll describe it below.
Code:
import string

Clip = Clip.std.Trim(1895,1943) # one sub block
rangeLen = len(Clip)
rangeList = [Clip[i] for i in range(rangeLen)] # one single-frame clip per frame
# Expr names its input clips x, y, z, a, b, ... w
exprVar = 'xyz' + string.ascii_lowercase[:23]
exprString = ''
for i, var in enumerate(exprVar[:rangeLen]):
    if i == 0:
        exprString += f'{var} {rangeLen} / '
    else:
        exprString += f'{var} {rangeLen} / + ' # divide each frame by the count, add to the running sum

Clip = core.std.Expr(clips=rangeList, expr=exprString)
I always get vapoursynth.Error: Expr: More than 26 input clips provided
I guess this is an Expr limitation? It works when I set the Trim to Trim(1895,1895+25).

The result is one frame that looks a bit better but is still very similar to the version without Expr. On the plus side, it has fewer artifacts.
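One way around the 26-clip limit, sketched here in plain Python, is to average in two passes. Frames are grouped into chunks of at most 26, each chunk is summed with every frame divided by the total length, and the chunk results are then added together. `two_pass_plan` is a hypothetical helper that only builds the expression strings; they would still have to be fed to std.Expr. Intermediate results stored in an 8-bit clip are rounded, so some precision is probably lost unless the clips are converted to a float format first.

```python
import string

EXPR_VARS = "xyz" + string.ascii_lowercase[:23]  # x, y, z, a, ... w
MAX_CLIPS = len(EXPR_VARS)  # 26


def two_pass_plan(first, last):
    """Plan an average of frames first..last (inclusive) in two Expr passes.

    Returns (chunks, final_expr). Each chunk is (start, end, expr), where
    expr sums that chunk's frames after dividing each by the total length.
    final_expr adds the chunk results together.
    """
    total = last - first + 1
    if total < 1 or total > MAX_CLIPS * MAX_CLIPS:
        raise ValueError("range must hold between 1 and 676 frames")
    chunks = []
    for start in range(first, last + 1, MAX_CLIPS):
        end = min(start + MAX_CLIPS - 1, last)
        names = EXPR_VARS[: end - start + 1]
        terms = [f"{names[0]} {total} /"]
        terms += [f"{v} {total} / +" for v in names[1:]]
        chunks.append((start, end, " ".join(terms)))
    names = EXPR_VARS[: len(chunks)]
    final_expr = " ".join([names[0]] + [f"{v} +" for v in names[1:]])
    return chunks, final_expr


chunks, final_expr = two_pass_plan(1895, 1943)
for start, end, expr in chunks:
    print(start, end, expr[:24] + " ...")
print(final_expr)
```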

Here is a comparison


EDIT
Ok, it depends on the startframe. Since the artifacts in the averageframe() version are "less white" they can now be filtered out more easily. Big THX @ all
With std.Binarize(threshold=80) the gray smear is gone in the average version
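Why averaging followed by a threshold helps can be shown with plain numbers. In this toy model (an assumption for illustration, not measured data) a text pixel is white in every frame of the block, while a noise pixel is white in only a few of them. The average keeps the text at full brightness and pulls the noise below the threshold of 80 used above. The `binarize` function here is a simplified stand-in, not the std.Binarize implementation.

```python
def average(values):
    """Mean brightness of one pixel position across all frames."""
    return sum(values) / len(values)


def binarize(value, threshold=80):
    """Map a value to black (0) or white (255) with a simple threshold."""
    return 255 if value >= threshold else 0


frames = 26
text_pixel = [255] * frames                   # white in every frame
noise_pixel = [255] * 5 + [0] * (frames - 5)  # white in 5 of 26 frames

text_avg = average(text_pixel)
noise_avg = average(noise_pixel)
print(text_avg, binarize(text_avg))
print(round(noise_avg, 1), binarize(noise_avg))
```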

Last edited by ChaosKing; 21st November 2018 at 13:12.
Old 28th November 2018, 16:32   #8  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
One additional question: is it possible to make the black background transparent (RGBA)? How would this look in a VapourSynth script?
Old 28th November 2018, 17:37   #9  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,346
Quote:
Originally Posted by ChaosKing View Post
One additional question. Is it possible to make the black background transparent? RGBA? How would this look in vs code?
Copy the image itself as Y8 into the alpha channel. VapourSynth passes RGB24 plus a Y8 clip as the alpha.

E.g. (starting from your RGB PNG):

Code:
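import vapoursynth as vs
core = vs.core
v = core.imwri.Read(r'PATH\7MQEv0x.png')
a = core.resize.Point(v, format=vs.GRAY8, matrix_s="709", range_s="full") # grayscale copy, used as alpha
v.set_output(alpha=a) # black areas of the alpha clip are transparent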
All times are GMT +1. The time now is 23:37.

