30th September 2007, 14:46 | #221 | Link | ||
Registered User
Join Date: May 2003
Posts: 114
|
Quote:
http://x0r.ch/suprip/ Quote:
|
||
30th September 2007, 20:47 | #223 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
madshi - I wrote a tool that can take supread .srt output and mix it in with ABBYY's FineReader HTML output and create a perfectly usable SRT file.
The technique is this: Load the .sup file in SupRead. Save the SRT file WITHOUT using OCR. Then save the subtitles as PNG files. Close supread. Open photoshop and batch process the subs to maximize the contrast for each PNG file and lower the brightness (to force subtitle outlines to be same color as the black background for easier OCR). After photoshop is done, open ABBYY FineReader and load in all the PNGs. OCR them. Save as HTML (some options are needed - like include horizontal lines to signify the end of each subtitle group) - you will get an HTML file. I wrote a little tool to merge the subtitles from the SRT and the HTML into one SRT file. That's it. To give you an idea of how reliable this is: There are occasional problems with the letter I being confused for the number one (when the line begins with a dash). OCR accuracy appears to be 99% using FineReader. The most important step is the contrast and brightness adjustment - the black outline around the white letters on a transparent background is VERY BAD for OCR. Last edited by Rectal Prolapse; 30th September 2007 at 20:50. |
30th September 2007, 22:07 | #226 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
|
|
30th September 2007, 22:36 | #227 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Madshi - yeah - a good optimization would be using a bitmap tool that can do the contrast/brightness adjustment without the bloat of Photoshop!
I'm not so sure about a massive bitmap though - a bitmap that large could crash a lot of programs! And processing time could increase sharply as well (although, to be fair, FineReader is really horribly slow processing each file...).
Included is a Visual Studio 2005 project for my little tool, written in C++. It is called SRTMaker. It's very rough though - it doesn't properly check command-line arguments, for example - but hopefully it will help you if you decide to make a better tool. No restrictions, do whatever you like with it.
Last edited by Rectal Prolapse; 30th September 2007 at 22:43. |
30th September 2007, 22:37 | #228 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
Code:
Here are the notes I wrote on my Blu-ray subtitle conversion workflow:

Tools needed:
• Xport
• Supread
• Adobe Photoshop
• ABBYY Finereader

Extract the subtitle with xport.

Load the SUP file with Supread:
○ Save the subtitles in PNG format using Save Bitmaps.
○ Save an SRT file (do not OCR the subtitles within Supread)

Open Photoshop:
1. Use an action that modifies the images:
   i. Brightness to -100, Contrast to +100
   ii. Overwrites the PNG.
   This forces the background color to pure black and should eliminate any dropshadows in the subs. This makes OCRing much more reliable.
2. Select File->Batch from Photoshop's menu.
   i. Select the folder containing all the PNGs.
   ii. Select the action. Execute.
   This should convert all PNGs to have a black background and remove dropshadows.

Launch Finereader:
1. Press the Open Image button, and select all the PNGs.
2. Select all the images (press CTRL-A), then select Image->Correct Resolution from the menu.
3. Choose a resolution of 300 DPI, and apply to All images in the batch. A dialog box with progress of the DPI assignment should appear.
4. Go to Image->Load Block Template. The template should be a Text Block that is the size of the whole image (roughly 1920x1080 or a bit less). It will be applied to all the images.
5. Click the Read All button (dropdown button next to the Read All button).
6. After everything is OCR'd, you can go through each page, making sure things look ok.
7. Click on Save Pages button.
   i. Formatting options should be set for HTML:
      1) Retain Layout: Retain font and font size.
      2) Save mode: Simple.
      3) Text settings: Keep Line breaks checked, Use Solid Line as Page break checked, uncheck Retain text color.
   ii. Save as type Unicode HTML (UTF-8).
   iii. Save Pages: All pages.
   iv. Create a single file for all pages.
8. Click Okay to save the OCR'd pages to one file.
9. Use SRTMaker, giving it the SRT and HTML file on the command-line to generate a new SRT that is piped to standard output.
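For anyone who wants to redo the last step without the Visual Studio project, here is a rough Python sketch of the merge. It is not the SRTMaker source. It assumes that the "solid line as page break" option shows up in the HTML as an `<hr>` tag, that line breaks appear as `<br>` or `</p>`, and that the SRT saved without OCR still contains one timing line per subtitle; the function names are made up for illustration.

```python
import html
import re

# One SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_timings(srt_text):
    """Return the timing lines of an SRT file, in order."""
    return TIMING.findall(srt_text)


def ocr_pages(html_text):
    """Split the OCR'd HTML into one plain-text chunk per subtitle image.

    Assumption: each page ends with an <hr>.
    """
    body = re.search(r"<body[^>]*>(.*)</body>", html_text, re.S | re.I)
    content = body.group(1) if body else html_text
    pages = []
    for chunk in re.split(r"<hr[^>]*>", content, flags=re.I):
        chunk = re.sub(r"<br\s*/?>|</p>", "\n", chunk, flags=re.I)
        chunk = re.sub(r"<[^>]+>", "", chunk)  # drop the remaining tags
        lines = [ln.strip() for ln in html.unescape(chunk).splitlines()]
        pages.append("\n".join(ln for ln in lines if ln))
    # The text after the final <hr> is empty; drop it.
    while pages and not pages[-1]:
        pages.pop()
    return pages


def merge(srt_text, html_text):
    """Pair the n-th timing line with the n-th OCR'd page."""
    timings = srt_timings(srt_text)
    pages = ocr_pages(html_text)
    if len(timings) != len(pages):
        raise ValueError(f"{len(timings)} timings but {len(pages)} OCR pages")
    blocks = [f"{i}\n{t}\n{p}\n"
              for i, (t, p) in enumerate(zip(timings, pages), 1)]
    return "\n".join(blocks)
```

The count check matters: if one image is dropped or split during OCR, every later subtitle would otherwise be silently paired with the wrong timing.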
Last edited by Rectal Prolapse; 30th September 2007 at 22:42. |
1st October 2007, 00:05 | #229 | Link |
Registered User
Join Date: May 2003
Posts: 114
|
OK, I made a sample PNG showing how such a combined subtitle image would look.
http://x0r.ch/suprip/serenity-50.png
FineReader looks decent, but it's 130€, and to be honest, I'd rather have less, not more, commercial software that people have to "acquire" one way or another to rip HD movies. |
1st October 2007, 03:12 | #230 | Link |
Registered User
Join Date: Mar 2005
Posts: 433
|
I'm sure there are a lot of open-source/free OCR apps that work well with PNGs too! At least I hope so!
My method is just one way of getting reliable OCR - for those who cannot wait for the proper tools to come out.
One problem with the steps I outlined above: you may not want to use Unicode output. I had issues on some titles because of it (usually with accented characters). FineReader recognizes them easily, but they don't work well with the SRT format.
Last edited by Rectal Prolapse; 1st October 2007 at 03:15. |
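One possible workaround for the accented-character issue, sketched in Python. This assumes the trouble comes from a player or muxer that expects a legacy single-byte code page rather than UTF-8; the posts do not say which encoding actually worked, so `cp1252` and the function name are only guesses for illustration.

```python
def to_legacy_srt_bytes(srt_text, encoding="cp1252"):
    """Encode SRT text for tools that mishandle UTF-8.

    Strips a leading byte-order mark and replaces characters the target
    code page cannot represent with '?', so the file stays readable.
    The default code page is an assumption, not a known requirement.
    """
    return srt_text.lstrip("\ufeff").encode(encoding, errors="replace")


if __name__ == "__main__":
    data = to_legacy_srt_bytes("1\n00:00:01,000 --> 00:00:02,000\nCafé\n")
    with open("out.srt", "wb") as f:
        f.write(data)
```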
1st October 2007, 07:47 | #231 | Link | ||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
@Rectal Prolapse, thanks for the code! |
||
5th October 2007, 21:11 | #232 | Link |
Registered User
Join Date: Feb 2005
Posts: 36
|
@Pelican9
I have a similar problem to the one Musky5790 described here: http://forum.doom9.org/showthread.ph...54#post1036754
I'm trying to add subtitles from a .srt file into Scenarist ACA (HD DVD). I converted the .srt file to .sup with SubtitleCreator. Then I imported the .sup into SUPread and exported it to PNGs and scn-sst.
I get the same error from Scenarist as Musky5790: internal software error:.\core\AUs\Advmux_timeGrip.ccp. line 94 -2 Advmuxmux::TimeGrid::addgrippointfill -- time not on field boundary.
Is this because the .sup file I get from SubtitleCreator is not an HD sup but just an SD DVD sup? Is there any other way around it?
Thanks for the great effort you put in developing this! |
5th October 2007, 21:19 | #233 | Link | |
SubtitleCreator's Co-Dev
Join Date: Oct 2005
Location: France
Posts: 564
|
Quote:
1/50 or 1/60 seconds. Cheers Manusse |
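If the Scenarist error really means that a subtitle time falls between two fields, one thing to try is snapping every timestamp to the field grid before export. The following is a small Python sketch that takes the 1/50 or 1/60 second figure above at face value; the function name is made up, and whether Scenarist wants exactly this grid is an assumption.

```python
from fractions import Fraction


def snap_to_field(ms, fields_per_second=60):
    """Round a time in milliseconds to the nearest field boundary.

    Returns (field_index, snapped_ms). Exact fractions are used so that
    1/60 s (16.666... ms) does not accumulate rounding error.
    """
    field = Fraction(1000, fields_per_second)  # field duration in ms
    index = round(Fraction(ms) / field)
    return index, float(index * field)


if __name__ == "__main__":
    print(snap_to_field(1013, 50))  # (51, 1020.0)
    print(snap_to_field(1000, 60))  # (60, 1000.0)
```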
|
5th October 2007, 23:53 | #234 | Link | |
Registered User
Join Date: Feb 2005
Posts: 36
|
Quote:
I'm not sure if Scenarist accepts SD sup. Wouldn't the position of the subtitles be messed up if the resolution is wrong? Another solution might be to add a text-based subtitle (e.g. .srt) as an "advanced subtitle track" directly in Scenarist. But I would have to convert it to XML first somehow. |
|
6th October 2007, 01:27 | #235 | Link |
Registered User
Join Date: Oct 2003
Posts: 435
|
I believe it's a supread/evodemux issue with how it makes/outputs the HD sups, as I get the same error from all HD DVD sups extracted from the source EVO files with evodemux. But I have had success with Blu-ray subs extracted using xport and authored to an HD DVD using ACA. It is a bitch though, huh...
Last edited by woah!; 6th October 2007 at 01:31. |
6th October 2007, 10:10 | #236 | Link | ||
SubtitleCreator's Co-Dev
Join Date: Oct 2005
Location: France
Posts: 564
|
Quote:
Quote:
Cheers Manusse |
||
12th October 2007, 09:57 | #238 | Link |
Registered User
Join Date: Oct 2003
Posts: 3
|
Regarding open source OCR software, the current best appears to be tesseract (http://code.google.com/p/tesseract-ocr). It seems to accept TIFFs.
The best open source command-line image editing program that I've found is ImageMagick (http://www.imagemagick.org).
One request I have for SUPRead is an "auto save images" option that makes it save all the subtitles as images on load. Or better yet, let it operate from the command line. I'm working in Linux on a completely automated "rip HD DVD to mkv" script, and it'd be nice not to have to spawn the window in the right spot and simulate a mouse click :-P |
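As a rough idea of how those two tools could be chained from a script, here is a Python sketch that builds the two command lines for one subtitle image. It assumes ImageMagick's `convert` and `tesseract` are on the PATH; the 50% threshold, the `eng` language code, and the function names are illustrative choices, not tested settings from this thread.

```python
import subprocess
from pathlib import Path


def ocr_commands(png_path, threshold="50%", lang="eng"):
    """Build the ImageMagick and tesseract command lines for one PNG.

    Step 1 flattens the transparent background onto black, thresholds the
    image to pure black/white, and writes a TIFF for tesseract.
    Step 2 runs tesseract, which writes its text to <base>.txt.
    """
    png = Path(png_path)
    tif = png.with_suffix(".tif")
    base = png.with_suffix("")
    convert_cmd = ["convert", str(png), "-background", "black", "-flatten",
                   "-threshold", threshold, str(tif)]
    tesseract_cmd = ["tesseract", str(tif), str(base), "-l", lang]
    return convert_cmd, tesseract_cmd


def run_ocr(png_path):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in ocr_commands(png_path):
        subprocess.run(cmd, check=True)
```

Calling `run_ocr` on each exported PNG in turn would leave one `.txt` file per subtitle next to the image, ready to be paired with the timings.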
14th October 2007, 18:31 | #239 | Link |
Registered User
Join Date: May 2003
Posts: 114
|
I posted a new version that pretty much handles Blu-ray perfectly now, and also has a command-line option to output the subtitles to a PNG image.
http://x0r.ch/suprip/
Now it's just the dictionary feature that's still missing for a 1.0 release. |
15th October 2007, 09:01 | #240 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
If you have the time and inclination, maybe you can check this one out? http://de.wikipedia.org/wiki/OCRopus It's an open source command-line OCR tool (found through tribble222's link). Maybe it would work on that PNG image you're outputting? |
|