Doom9's Forum > General > Subtitles
Old 30th September 2007, 14:46   #221  |  Link
Taktaal
Registered User
 
Join Date: May 2003
Posts: 114
Quote:
Originally Posted by Deckard2019
Thanks. I was able to fix two more bugs that led to your problems. The new 0.82 doesn't have that anymore.
http://x0r.ch/suprip/

Quote:
Originally Posted by madshi
Hmmmm... Just beginning to look into subtitles stuff. @Taktaal, thanks for investing work here! I've one crazy idea. Maybe you'll like it, or maybe not. But I thought I'd just post it, just in case:

Couldn't you create one monstrous bitmap (e.g. 800x100000 pixels) and draw all subtitles to that one large bitmap? Additionally you could manually add the timestamps (as written text) to that bitmap, too. We could then feed such a monster bitmap to a good OCR software and the result should be a full SRT subtitle text file. Maybe we'd need a little helper tool which cleans up the final text file to make it fully SRT compatible, but that should be no big problem. What do you think?
Did you have any OCR software in mind that can produce an output that's easily postprocessed? Anyway, making such a bitmap isn't very hard, I'll add an option to the next version, maybe it'll lead somewhere.
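To make the idea concrete, here is a minimal sketch of how the layout of such a combined bitmap could be worked out before anything is drawn. This is not SupRip's code; `stack_layout`, `label_height` and `gap` are made-up names, and reserving a text row above each subtitle for its timestamp is just one possible reading of madshi's suggestion.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    index: int   # position of the subtitle in the stream
    y: int       # top edge of the subtitle image inside the big bitmap
    height: int  # height of the subtitle image


def stack_layout(heights, label_height=20, gap=10):
    """Stack subtitle images top to bottom.

    Each subtitle gets `label_height` rows above it for the timestamp text
    and `gap` empty rows below it. Returns (placements, total_height).
    """
    placements = []
    y = 0
    for index, height in enumerate(heights):
        y += label_height                      # row reserved for the timestamp
        placements.append(Placement(index, y, height))
        y += height + gap                      # move past the image and the gap
    return placements, y


placements, total_height = stack_layout([40, 60])
print(placements, total_height)
```

The total height tells you up front how tall the output image has to be, which is handy for deciding whether to split it into several files.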
Old 30th September 2007, 20:37   #222  |  Link
Deckard2019
Registered User
 
Join Date: Jan 2005
Posts: 110
Quote:
Originally Posted by Taktaal
Thanks. I was able to fix two more bugs that led to your problems. The new 0.82 doesn't have that anymore.
It's ok now. Thank you !
Old 30th September 2007, 20:47   #223  |  Link
Rectal Prolapse
Registered User
 
Join Date: Mar 2005
Posts: 433
madshi - I wrote a tool that can take supread .srt output and mix it in with ABBYY's FineReader HTML output and create a perfectly usable SRT file.

The technique is this: Load the .sup file in SupRead. Save the SRT file WITHOUT using OCR. Then save the subtitles as PNG files.

Close SupRead. Open Photoshop and batch process the subs to maximize the contrast for each PNG file and lower the brightness (to force the subtitle outlines to be the same color as the black background for easier OCR).

After photoshop is done, open ABBYY FineReader and load in all the PNGs.

OCR them. Save as HTML (some options are needed - like include horizontal lines to signify the end of each subtitle group) - you will get an HTML file.

I wrote a little tool to merge the subtitles from the SRT and the HTML into one SRT file.

That's it.

To give you an idea of how reliable this is: There are occasional problems with the letter I being confused for the number one (when the line begins with a dash). OCR accuracy appears to be 99% using FineReader.

The most important step is the contrast and brightness adjustment - the black outline around the white letters on a transparent background is VERY BAD for OCR.
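If you don't have Photoshop, the effect of that adjustment can be approximated with a hard threshold. The sketch below is only a stand-in, not what Photoshop does internally: it assumes the subtitle has already been flattened onto a black background and converted to grayscale, and the `binarize` name and the threshold of 200 are arbitrary choices.

```python
def binarize(pixels, threshold=200):
    """Return a copy of a grayscale image with every pixel forced to 0 or 255.

    `pixels` is a list of rows, each row a list of 0-255 values. Anything at
    or above `threshold` becomes white (255); everything else becomes black,
    which swallows the dark outline and any drop shadow.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


# White text (255), grey anti-aliasing (180, 220), dark outline (30), background (0).
sample = [
    [0, 30, 255],
    [180, 220, 10],
]
print(binarize(sample))
```

A lower threshold keeps more of the anti-aliased edge pixels, a higher one thins the letters, so it is worth trying a few values on a sample image.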

Last edited by Rectal Prolapse; 30th September 2007 at 20:50.
Old 30th September 2007, 21:02   #224  |  Link
Zelos
Registered User
 
Join Date: May 2007
Location: Marseille
Posts: 73
Thanks Taktaal, your software works great now!
Apart from the confusion between "I" and "1", the accuracy is about 99%.
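The "I" versus "1" mix-up could probably be cleaned up after OCR with a small heuristic. The sketch below is an assumption about what such a cleanup might look like, not a feature of any tool in this thread: it only touches a "1" at the start of a line (optionally after a dash) that is followed by an apostrophe or by a space and a lowercase letter. It will misfire on genuine counts such as "1 day left", so check the result.

```python
import re

# A lone "1" at the start of a line, optionally after a dash, followed by
# an apostrophe ("1'm") or by whitespace and a lowercase letter ("1 think").
_LEADING_ONE = re.compile(r"^([ \t]*-?[ \t]*)1(?=['’]|[ \t]+[a-z])", re.MULTILINE)


def fix_leading_one(text):
    """Replace a misread leading "1" with the letter "I"."""
    return _LEADING_ONE.sub(r"\g<1>I", text)


print(fix_leading_one("- 1 think so\n- 10 men were there\n1'm here"))
```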
Old 30th September 2007, 22:01   #225  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Taktaal
Did you have any OCR software in mind that can produce an output that's easily postprocessed?
Haven't checked yet. I think I'd try FineReader first.

Quote:
Originally Posted by Taktaal
Anyway, making such a bitmap isn't very hard, I'll add an option to the next version, maybe it'll lead somewhere.
Great, thank you...
Old 30th September 2007, 22:07   #226  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Rectal Prolapse
madshi - I wrote a tool that can take supread .srt output and mix it in with ABBYY's FineReader HTML output and create a perfectly usable SRT file.

The technique is this: Load the .sup file in SupRead. Save the SRT file WITHOUT using OCR. Then save the subtitles as PNG files.

Close supread. Open photoshop and batch process the subs to maximize the contrast for each PNG file and lower the brightness (to force subtitle outlines to be same color as the black background for easier OCR).

After photoshop is done, open ABBYY FineReader and load in all the PNGs.

OCR them. Save as HTML (some options are needed - like include horizontal lines to signify the end of each subtitle group) - you will get an HTML file.

I wrote a little tool to merge the subtitles from the SRT and the HTML into one SRT file.

That's it.

To give you an idea of how reliable this is: There are occasional problems with the letter I being confused for the number one (when the line begins with a dash). OCR accuracy appears to be 99% using FineReader.

The most important step is the contrast and brightness adjustment - the black outline around the white letters on a transparent background is VERY BAD for OCR.
Sounds good. But I think there's still a lot of workflow optimization possible. I think e.g. having a tool which saves all subtitles in one big bitmap (as I suggested) would help a lot to reduce the time needed to do all the steps you described. Maybe it would also be possible for Taktaal's tool to remove the letter outlines without having to use photoshop. Basically I'm looking for as much automation as possible. If Taktaal can implement some of the needed stuff into his tool, and if you throw your FineReader HTML parsing tool into the mix, maybe we can achieve something nice?
Old 30th September 2007, 22:36   #227  |  Link
Rectal Prolapse
Registered User
 
Join Date: Mar 2005
Posts: 433
Madshi - yeah - a good optimization would be using a bitmap tool that can do the contrast/brightness adjustment without the bloat of Photoshop!

I'm not so sure about a massive bitmap though - I can imagine an oversized bitmap crashing a lot of programs! And processing time could go up sharply as well. (Although to be fair, FineReader is really horribly slow processing each file...)
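To put a number on the size concern, here is the back-of-the-envelope arithmetic for the 800x100000 example madshi mentioned. It only counts the uncompressed pixel buffer a program would need once the image is loaded; the `bitmap_bytes` helper and the choice of 4 bytes per pixel (32-bit RGBA) are assumptions for illustration.

```python
def bitmap_bytes(width, height, bytes_per_pixel=4):
    """Size of an uncompressed pixel buffer in bytes."""
    return width * height * bytes_per_pixel


size = bitmap_bytes(800, 100_000)
print(f"{size / 1024**2:.0f} MiB uncompressed at 32 bits per pixel")
print(f"{bitmap_bytes(800, 100_000, 1) / 1024**2:.0f} MiB at 8 bits per pixel")
```

So the full-color version needs roughly 305 MiB in memory, while an 8-bit grayscale version would need about a quarter of that.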

Included is a Visual Studio 2005 project for my little tool, written in C++. It is called SRTMaker. It's very rough though - it doesn't properly check command-line arguments, for example, but hopefully it will help you if you decide to make a better tool. No restrictions, do whatever you like with it.
Attached Files
File Type: zip SRTMaker.zip (35.6 KB, 247 views)

Last edited by Rectal Prolapse; 30th September 2007 at 22:43.
Old 30th September 2007, 22:37   #228  |  Link
Rectal Prolapse
Registered User
 
Join Date: Mar 2005
Posts: 433
Code:
Here are the notes I wrote on my Blu-ray subtitle conversion workflow:

Tools needed:

• Xport
• Supread
• Adobe Photoshop
• ABBYY Finereader

Extract the subtitle with xport.
Load the SUP file with Supread:
	○ Save the subtitles in PNG format using Save Bitmaps.
	○ Save an SRT file (do not OCR the subtitles within Supread)
Open Photoshop:
	1. Use an action that modifies the images:
		i.  Brightness to -100, Contrast to +100
		ii. Overwrites the PNG.
	This forces the background color to pure black and should eliminate any dropshadows in the subs. This makes OCRing much more reliable.
	2. Select File->Batch from Photoshop's menu.
		i. Select the folder containing all the PNGs.
		ii. Select the action. Execute.
	This should convert all PNGs to have a black background and removes dropshadows.
Launch Finereader:
	1. Press the Open Image button, and select all the PNGs.
	2. Select all the images (press CTRL-A), then select Image->Correct Resolution from the menu.
	3. Choose a resolution of 300 DPI, and apply to All images in the batch.
	A dialog box with progress of the DPI assignment should appear.
	4. Go to Image->Load Block Template. 
	The template should be a Text Block that is the size of the whole image (roughly 1920x1080 or a bit less). It will be applied to all the images.
	5. Click the Read All button (dropdown button next to the Read All button).
	6. After everything is OCR'd, you can go through each page, making sure things look ok.
	7. Click on Save Pages button.
		i. Formatting options should be set for HTML: 
			1) Retain Layout: Retain font and font size.
			2) Save mode: Simple.
			3) Text settings: Keep Line breaks checked, Use Solid Line as Page break checked, uncheck Retain text color.
		ii. Save as type Unicode HTML (UTF-8).
		iii. Save Pages: All pages.
		iv. Create a single file for all pages.
	8. Click Okay to save the OCR'd pages to one file.
	9. Use SRTMaker, giving it the SRT and HTML file on the command-line to generate a new SRT that is piped to standard output.
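For anyone who wants to see what step 9 amounts to, here is a rough Python sketch of the merge. It is not a port of SRTMaker. It assumes the HTML has one OCR'd page per subtitle, in the same order as the SRT, separated by `<hr>` tags, with line breaks as `<br>` or paragraph ends; the function names are invented for this example.

```python
import html
import re

TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_timings(srt_text):
    """Return the timing lines of an SRT file, in order."""
    return TIMING.findall(srt_text)


def html_pages(html_text):
    """Split OCR HTML into one plain-text chunk per page."""
    body = re.sub(r"(?is).*?<body[^>]*>", "", html_text, count=1)
    body = re.sub(r"(?is)</body>.*", "", body)
    pages = []
    for chunk in re.split(r"(?i)<hr[^>]*>", body):
        chunk = re.sub(r"(?i)<br\s*/?>|</p>", "\n", chunk)   # keep line breaks
        chunk = re.sub(r"<[^>]+>", "", chunk)                # drop other tags
        lines = [html.unescape(line).strip() for line in chunk.splitlines()]
        text = "\n".join(line for line in lines if line)
        if text:
            pages.append(text)
    return pages


def merge(srt_text, html_text):
    """Pair each SRT timing with the OCR text of the matching page."""
    timings = srt_timings(srt_text)
    pages = html_pages(html_text)
    if len(timings) != len(pages):
        raise ValueError(f"{len(timings)} timings but {len(pages)} OCR pages")
    blocks = [
        f"{number}\n{timing}\n{text}\n"
        for number, (timing, text) in enumerate(zip(timings, pages), start=1)
    ]
    return "\n".join(blocks)


srt = "1\n00:00:01,000 --> 00:00:02,000\n\n2\n00:00:03,000 --> 00:00:04,500\n\n"
ocr = "<html><body><p>Hello<br>world</p><hr><p>It&#39;s me</p></body></html>"
print(merge(srt, ocr))
```

The count check matters: if OCR returns nothing for one image, every later subtitle would otherwise be paired with the wrong timing.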

Last edited by Rectal Prolapse; 30th September 2007 at 22:42.
Old 1st October 2007, 00:05   #229  |  Link
Taktaal
Registered User
 
Join Date: May 2003
Posts: 114
OK, I made a sample PNG showing how such a combined subtitle image would look.

http://x0r.ch/suprip/serenity-50.png

That Finereader looks decent, but it's 130€, and to be honest, I'd rather have less, not more commercial software that people have to "acquire" one way or another to rip HD movies.
Old 1st October 2007, 03:12   #230  |  Link
Rectal Prolapse
Registered User
 
Join Date: Mar 2005
Posts: 433
I'm sure there are a lot of opensource/free OCR apps that work well with PNGs too! At least I hope so!

My method is just one way of getting reliable OCR - for those who cannot wait for the proper tools to come out.

One problem with the steps I outlined above - you may not want to use unicode output - I had issues on some titles because of it (usually accented characters). They will be recognized easily by FineReader, but they don't work well with the SRT format.
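One possible workaround, if your player chokes on UTF-8, is to re-encode the merged text in a legacy single-byte code page before writing the .srt. This is only a guess at what helps; Windows-1252 is an assumed target that suits Western European languages, and `encode_for_srt` is a made-up helper name.

```python
def encode_for_srt(text, encoding="cp1252"):
    """Encode subtitle text in a legacy single-byte code page.

    Characters the code page cannot represent are replaced with "?"
    instead of raising an error.
    """
    return text.encode(encoding, errors="replace")


print(encode_for_srt("Voilà, le café est prêt"))
```

Write the returned bytes to the file in binary mode so nothing re-encodes them on the way out.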

Last edited by Rectal Prolapse; 1st October 2007 at 03:15.
Old 1st October 2007, 07:47   #231  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Taktaal
Ok I made a sample PNG how such a combined subtitle image would look.

http://x0r.ch/suprip/serenity-50.png
That looks nice! However, the timecodes are missing. I'd suggest using TextOut to add the timecodes (in SRT format) to that PNG file. Thank you...
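For reference, this is the text each label would need to contain. The sketch just formats millisecond times as an SRT timing line; the function names are mine, and how the string gets drawn onto the image is left to the tool.

```python
def srt_timecode(ms):
    """Format a time in milliseconds as HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_label(start_ms, end_ms):
    """Build the 'start --> end' line used in SRT files."""
    return f"{srt_timecode(start_ms)} --> {srt_timecode(end_ms)}"


print(srt_label(3_723_456, 3_725_000))
```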

Quote:
Originally Posted by Taktaal
That Finereader looks decent, but it's 130€, and to be honest, I'd rather have less, not more commercial software that people have to "acquire" one way or another to rip HD movies.
Sure, I agree. But once we have a good PNG we can use for OCRing, maybe we can dig out a good freeware OCR tool or programming component to do the rest of the work. Or those of us who already own professional OCR software can of course use that.

@Rectal Prolapse, thanks for the code!
Old 5th October 2007, 21:11   #232  |  Link
bagge1
Registered User
 
Join Date: Feb 2005
Posts: 36
@Pelican9

I have a similar problem as Musky5790 described here
http://forum.doom9.org/showthread.ph...54#post1036754

I'm trying to add subtitles from a .srt file into Scenarist ACA (HDDVD).
I converted the .srt file to .sup with SubtitleCreator. Then I imported the .sup into SUPread and exported it to .PNGs and scn-sst.

I get the same error from Scenarist as Musky5790:

internal software error:.\core\AUs\Advmux_timeGrip.ccp. line 94 -2
Advmuxmux::TimeGrid::addgrippointfill -- time not on field boundary.

Is this because the .sup file I get from SubtitleCreator is not an HD sup but just an SD DVD sup? Is there any way around it?

Thanks for the great effort you put in developing this!
Old 5th October 2007, 21:19   #233  |  Link
manusse
SubtitleCreator's Co-Dev
 
Join Date: Oct 2005
Location: France
Posts: 564
Quote:
time not on field boundary
I don't know if Scenarist ACA (HDDVD) can accept SD Sup. However, the message seems to mean that the times of your subtitles are not a multiple of the field duration. Depending on whether your DVD is 50Hz or 60Hz, you should try to align the timings of your subtitles to an integer number of field durations:

1/50 or 1/60 seconds.
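In code, the alignment could look like the sketch below. It is only an illustration of the arithmetic: `snap_to_field` is a made-up name, times are assumed to be in milliseconds as in an SRT file, and I have not checked what tolerance Scenarist actually applies.

```python
from fractions import Fraction


def snap_to_field(ms, fields_per_second=60):
    """Round a time in milliseconds to the nearest field boundary.

    The result is rounded back to whole milliseconds, so at 60 fields per
    second (field duration 16.67 ms) it is the closest millisecond to the
    boundary rather than the exact boundary.
    """
    field_ms = Fraction(1000) / Fraction(fields_per_second)
    field_count = round(Fraction(ms) / field_ms)   # nearest whole field
    return round(field_count * field_ms)


print(snap_to_field(1234, 50))   # 50Hz material
print(snap_to_field(1010, 60))   # 60Hz material
```

For NTSC material the real field rate is 60000/1001 rather than 60; passing `Fraction(60000, 1001)` as `fields_per_second` covers that case.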

Cheers
Manusse
Old 5th October 2007, 23:53   #234  |  Link
bagge1
Registered User
 
Join Date: Feb 2005
Posts: 36
Quote:
Originally Posted by manusse
I don't know if Scenarist ACA (HDDVD) can accept SD Sup. However the message seems to mean that the times of your subtitles are not a multiple of the frame duration. Depending if your DVD is 50Hz or 60Hz, you should try to align the timings of your subtitles with an integer number of the field duration:

1/50 or 1/60 seconds.

Cheers
Manusse
The video is NTSC 1280x720@29.97, so I guess that should be 60Hz. The .srt file was only 23.98. This time, I converted the .srt to 29.97 before converting it to .sup. I get the exact same error in Scenarist though.

I'm not sure if Scenarist accepts SD Sup. Wouldn't the position of the subtitles be messed up if the resolution is wrong?

Another solution might be to add a text-based subtitle (eg. .srt) as an "advanced subtitle track" directly into Scenarist. But I have to convert it to XML first somehow.
Old 6th October 2007, 01:27   #235  |  Link
woah!
Registered User
 
Join Date: Oct 2003
Posts: 435
I believe it's a supread/evodemux issue with how it makes/outputs the HD sups, as I get the same error from all HD DVD sups ripped from the source EVO files with evodemux. But I have had success with Blu-ray subs extracted using xport and authored to an HD DVD using ACA. It is a bitch though, huh...

Last edited by woah!; 6th October 2007 at 01:31.
Old 6th October 2007, 10:10   #236  |  Link
manusse
SubtitleCreator's Co-Dev
 
Join Date: Oct 2005
Location: France
Posts: 564
Quote:
and exported it to .PNG: s and scn-sst.
Quote:
but i have had success with blu-ray subs extracted using xport
I believe the timings are in the .scn-sst file. You could have a look at this text file to see what the difference is between BD and HD sups. I suspect the time frame is different.

Cheers
Manusse
Old 9th October 2007, 17:56   #237  |  Link
Taktaal
Registered User
 
Join Date: May 2003
Posts: 114
The problem I see here is that .scn-sst files are supposed to use frame numbers and not timecodes. Misaligned frames can't happen with those.
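To show why: going from a time to a frame number always lands on a whole frame. The sketch below is just that arithmetic, not the actual .scn-sst writer; `ms_to_frame` and the frame rate constant are names picked for this example.

```python
from fractions import Fraction

NTSC_30 = Fraction(30000, 1001)   # 29.97 frames per second


def ms_to_frame(ms, fps):
    """Convert a time in milliseconds to the nearest whole frame number."""
    return round(Fraction(ms) * Fraction(fps) / 1000)


print(ms_to_frame(1001, NTSC_30))
print(ms_to_frame(1000, 25))
```

Because the result is an integer frame count, any error in the source time is absorbed by the rounding instead of ending up between two frames.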
Old 12th October 2007, 09:57   #238  |  Link
tribble222
Registered User
 
Join Date: Oct 2003
Posts: 3
In regard to open source OCR software, the current best appears to be tesseract (http://code.google.com/p/tesseract-ocr). It seems to accept TIFFs.

The best open source command line image editing program that I've found is ImageMagick (http://www.imagemagick.org)
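A sketch of how the two could be chained from a script, one subtitle image at a time. It only builds the command lines so you can inspect them first. The ImageMagick options (flatten onto black, then threshold at 50%) are my guess at a reasonable starting point, not tested settings, and `ocr_commands` is a made-up helper name.

```python
from pathlib import Path


def ocr_commands(png_path, threshold="50%"):
    """Return the two command lines needed to OCR one subtitle PNG.

    1. ImageMagick `convert`: flatten transparency onto black, apply a hard
       threshold, and write a TIFF next to the PNG.
    2. `tesseract`: read the TIFF and write <name>.txt next to it.
    """
    png = Path(png_path)
    tif = png.with_suffix(".tif")
    out_base = png.with_suffix("")   # tesseract appends .txt itself
    return [
        ["convert", str(png), "-background", "black", "-flatten",
         "-threshold", threshold, str(tif)],
        ["tesseract", str(tif), str(out_base)],
    ]


for command in ocr_commands("0001.png"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once both programs are installed and on the PATH.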

One request I have for SUPRead is to have an "auto save images" option to make it save all the subtitles as images on load. Or better yet let it operate from the command line.

I'm working in Linux on making a completely automated "rip HD DVD to mkv" script and it'd be nice not to have to spawn the window in the right spot and simulate a mouse click :-P
Old 14th October 2007, 18:31   #239  |  Link
Taktaal
Registered User
 
Join Date: May 2003
Posts: 114
I posted a new version that pretty much does BluRay perfectly now, and it also has a command line option to output the subtitles to a PNG image.

http://x0r.ch/suprip/

Now it's just the dictionary feature that's still missing for a 1.0 release
Old 15th October 2007, 09:01   #240  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Taktaal
I posted a new version that pretty much does BluRay perfectly now, as well as has a command line option to output the subtitles to a PNG image.

http://x0r.ch/suprip/

Now it's just the dictionary feature that's still missing for a 1.0 release
Sounds good, will test later! Thanks!!

If you have the time and feel like it, maybe you can check out this one?

http://de.wikipedia.org/wiki/OCRopus

It's an open source command line OCR tool (found through tribble222's link). Maybe it would work on that PNG image you're outputting?
Tags
supread, suprip
