Old 8th October 2011, 12:19   #1  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Subtitle Edit

Subtitle Edit 4.0.7 is now out
https://github.com/SubtitleEdit/subtitleedit

SE is an open-source (C#) subtitle editor whose main focus is creating, editing, syncing, adjusting, and fixing subtitles. SE can also import and OCR VobSub and Blu-ray image-based subtitles (even from Matroska/MP4 files), as well as DVB subtitles and teletext from .ts files.
SE supports 300+ subtitle formats - let me know if you need more.
It can also create and edit Blu-ray SUP and BDN XML files.

Available for Windows and Linux

Last edited by tebasuna51; 30th August 2024 at 11:10. Reason: Updated version number
Old 9th October 2011, 21:43   #2  |  Link
mastrboy
Registered User
 
Join Date: Sep 2008
Posts: 365
thanks...
Old 10th October 2011, 06:59   #3  |  Link
Ghitulescu
Registered User
 
Join Date: Mar 2009
Location: Germany
Posts: 5,772
It looks promising.
I have never used it before, so I would like to ask how well SE32 handles DVD subtitles (SUP): retiming, syncing (to other/pre-existing SUP), bitmap editing, etc.?
__________________
Born in the USB (not USA)
Old 10th October 2011, 20:09   #4  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by Ghitulescu View Post
It looks promising.
I have never used it before, so I would like to ask how well SE32 handles DVD subtitles (SUP): retiming, syncing (to other/pre-existing SUP), bitmap editing, etc.?
Not too well.
SE can read and OCR DVD/VobSub/Blu-ray SUP plus a few more image-based formats, but the only image-based format SE can write is BDN XML/PNG.

(The Blu-ray SUP code is ported from 0xdeadbeef's Java code for BDSup2Sub.)
Old 16th October 2011, 03:36   #5  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,018
Crash during "Fix Common Errors".

Crash Report & Mi2.srt here:-

EDIT: Link removed.
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 20th October 2011 at 10:43.
Old 16th October 2011, 20:33   #6  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by StainlessS View Post
Crash during "Fix Common Errors".

Crash Report & Mi2.srt here:-

http://www.mediafire.com/?4z8obiqskr8ikui
Hi StainlessS!

Thx for reporting this
Fixed here: http://www.nikse.dk/SubtitleEdit.zip

Last edited by Nikse555; 16th October 2011 at 21:28.
Old 20th October 2011, 10:46   #7  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,018
Thanks Nikse555, got something else to keep you busy.

SubTitle Edit 3.2.2, Build 25663

Crash during Spell Check (Hunspell; I don't know if it is the same error as previously reported in the other thread).

Crash Report & DWL.srt here:-
http://www.mediafire.com/?6pltpi72a82lz52
Old 21st October 2011, 02:19   #8  |  Link
MajorX
Registered User
 
Join Date: Mar 2010
Posts: 52
Thanks Nikse555 for new version of SE.
I have a problem with OCR, please help. With 3.2, OCR of VobSub and Blu-ray SUP files worked perfectly, but after I uninstalled it and installed the new version 3.2.2, OCR no longer works: it only shows orange lines and no text.

Here is a sample of the *.sup subtitles; could you please check them?

http://www.mediafire.com/?zqml8hbrcy6jqgt

Last edited by MajorX; 21st October 2011 at 07:20.
Old 22nd October 2011, 07:42   #9  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by StainlessS View Post
Crash during Spell Check (Hunspell; I don't know if it is the same error as previously reported in the other thread).
Looks like it's still Hunspell suggest!
I could not re-create this error on my Win7 machine, but I've tried to fix it here (by running suggestions in a separate thread): http://www.nikse.dk/SubtitleEdit.zip
Any better?
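For readers wondering what "running suggestions in a separate thread" can look like, here is a minimal Python sketch of the general idea. It is not SE's actual C# code: `safe_suggest` and the `suggest` callable are hypothetical names, and the sketch only guards against Python exceptions and slow calls, not against a crash inside a native library.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One shared worker thread, so a slow suggestion call does not block the caller.
# Note: a call that never returns would keep occupying this single worker.
_executor = ThreadPoolExecutor(max_workers=1)


def safe_suggest(suggest, word, timeout=2.0):
    """Run suggest(word) on the worker thread.

    `suggest` is a hypothetical callable standing in for the spell checker's
    suggestion function. Returns an empty list on error or timeout.
    """
    future = _executor.submit(suggest, word)
    try:
        return list(future.result(timeout=timeout))
    except FutureTimeout:
        # The suggestion call took too long; give up on suggestions for this word.
        return []
    except Exception:
        # The suggestion call raised; treat it as "no suggestions".
        return []
```

The point of the design is that spell checking can continue without suggestions for one word instead of taking the whole program down.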
Old 22nd October 2011, 18:52   #10  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by MajorX View Post
Thanks Nikse555 for new version of SE.
I have a problem with OCR, please help. With 3.2, OCR of VobSub and Blu-ray SUP files worked perfectly, but after I uninstalled it and installed the new version 3.2.2, OCR no longer works: it only shows orange lines and no text.

Here is a sample of the *.sup subtitles; could you please check them?

http://www.mediafire.com/?zqml8hbrcy6jqgt
Aye aye, Major
New version upped: http://www.nikse.dk/SubtitleEdit.zip

Any better?
Old 23rd October 2011, 19:30   #11  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,018
Sorry for the delay, Nikse555,
Before I tried your update, I had to download the srt from MediaFire as I did not
keep a verbatim copy. I tried it with the original faulting build 25663, and it did
not fault. Tried this several times, no fault. Got the version srt that I kept,
(probably spell checked via other means) and checked that, same thing, no
fault. Have not ripped any other subs since then (I think) and made no changes
to the setup. I guess it will have to remain a mystery.

EDIT: Also tried with build 13726, no fault (earlier build No ???).

Last edited by StainlessS; 23rd October 2011 at 19:37.
Old 24th October 2011, 06:51   #12  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
What do I do to OCR German subs? I've downloaded a German Tesseract package and unpacked it to the program dir but to no avail. I pretty much have to type every word?
__________________

MultiMakeMKV: MakeMKV batch processing (Win)
MultiShrink: DVD Shrink batch processing
Offizieller Übersetzer von DVD Shrink deutsch
Old 24th October 2011, 07:36   #13  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by StainlessS View Post
Sorry for the delay, Nikse555,
...
Tried this several times, no fault.
...
I guess it will have to remain a mystery.
Yep, the NHunspell "suggest" method is not entirely stable.


Quote:
Originally Posted by Chetwood View Post
What do I do to OCR German subs? I've downloaded a German Tesseract package and unpacked it to the program dir but to no avail. I pretty much have to type every word?
The German Tesseract package should be unpacked to Tesseract\tessdata. Once unpacked, the file is called deu.traineddata.
Do choose Tesseract as the OCR method (not image compare).
And if you're lazy, just get this version with German dictionaries included: http://subtitleedit.googlecode.com/files/SE322DE.zip
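If you want to double-check where the file ended up, a small Python sketch like the following can help. It only assumes the layout described above (Tesseract\tessdata\deu.traineddata under the Subtitle Edit folder); the function names are made up for illustration, and the folder you pass in is whatever your own install path is.

```python
from pathlib import Path


def traineddata_path(se_dir, lang="deu"):
    """Expected location of <lang>.traineddata under the Subtitle Edit folder."""
    return Path(se_dir) / "Tesseract" / "tessdata" / f"{lang}.traineddata"


def has_language(se_dir, lang="deu"):
    """True if the language file sits where the post above says it should."""
    return traineddata_path(se_dir, lang).is_file()


def find_misplaced(se_dir, lang="deu"):
    """List copies of <lang>.traineddata found anywhere else under se_dir."""
    expected = traineddata_path(se_dir, lang)
    return sorted(p for p in Path(se_dir).rglob(expected.name) if p != expected)
```

A copy reported by `find_misplaced` (for example one unpacked straight into the program folder) is a likely reason for German OCR not kicking in.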
Old 25th October 2011, 07:55   #14  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Mm, I had unpacked it to Subtitle Edit\tesseract\tessdata, but OK, your de package is fine, thanks. It also works pretty well; however, some events described in parentheses for the hearing impaired are recognized with mixed case, like

(KEucH†) instead of (KEUCHT)
(I_AcH†) instead of (LACHT).

Also, the small t is recognized as a small l, which messes up a lot of items and can only be fixed manually. These new words don't even exist in the German language, so shouldn't spellchecking kick in with "prompt for unknown words" being checked? Then the space between two words ending with r and starting with j is not recognized. Instead of "aber jetzt" it reads "aberjetzt". What can I do to improve this? Thanks.

Last edited by Chetwood; 25th October 2011 at 08:20.
Old 25th October 2011, 18:04   #15  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by Chetwood View Post
These new words don't even exist in the German language, so shouldn't spellchecking kick in with "prompt for unknown words" being checked?
Yes... problems with loading German dictionary should be fixed here: http://www.nikse.dk/SubtitleEdit.zip
(Tesseract should also be a bit faster now)

Quote:
Originally Posted by Chetwood View Post
Then the distance between two words ending with r and starting with j is not recognized. Instead of "aber jetzt" it reads "aberjetzt". What can I do to improve this? Thanks.
When you press "change all" or "use always" the correction is remembered in OcrFixReplacelist.xml...
Old 25th October 2011, 22:00   #16  |  Link
xekon
Registered User
 
Join Date: Jul 2011
Posts: 224
This actually works really well! Almost all of the text is right on, and the GUI guides you through smoothly when it needs a fix.

I have a rather strange auto fix though (some kind of error or bug): http://i1208.photobucket.com/albums/...kon/weirds.png

If you want the .SUP that caused this error, give me an email address I can send the file to. (Upon further testing, this weird error only happens if the "Try MS MODI OCR for unknown words" checkbox is checked; if I uncheck it, this strange substitution does not happen.)

The only OCR error that I get that does not get automatically corrected is the letter "k" being detected as "l<" and not being auto corrected: http://i1208.photobucket.com/albums/cc361/xekon/k.png

I notice similar errors in OCR but they DO get auto corrected like "I\/ly" -> "My"

Is there a way I can add "l<" to be autocorrected to "k"?

Also, a setting in the options panel to disable "Try MS MODI OCR for unknown words" by default would be handy; then I wouldn't have to uncheck it for every subtitle I load.

I have also had "d" detected as "ol" pretty often; the spell checker then doesn't recognize the word, so I edit it manually and change the "ol" to "d".

For example, the word "worried" gets detected as "worrieol".

Last edited by xekon; 25th October 2011 at 22:57.
Old 26th October 2011, 10:03   #17  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by xekon View Post
...a setting in the options panel to disable "Try MS MODI OCR for unknown words" by default
OK, this setting is now remembered - but I've also improved the check for when to use MODI, so do try to keep it on.

Tesseract (new 3.01 version) now runs in its own thread, so it should be a bit faster too.

Link to new version:
http://www.nikse.dk/SubtitleEdit.zip

How is it working?
Old 26th October 2011, 14:22   #18  |  Link
Chetwood
Registered User
 
Join Date: Nov 2001
Posts: 1,104
Quote:
Originally Posted by Nikse555 View Post
problems with loading German dictionary should be fixed here: http://www.nikse.dk/SubtitleEdit.zip
Mh, this file contains no German dictionary so I copied the one from the de.zip over.

Quote:
Originally Posted by Nikse555 View Post
When you press "change all" or "use always" the correction is remembered in OcrFixReplacelist.xml...
The German Umlaut ü (u with two dots above) is often recognized as two i's: ii. Since it's common in several words, how do I replace it for all of them? Thx.

Last edited by Chetwood; 26th October 2011 at 14:25.
Old 26th October 2011, 14:52   #19  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 432
Quote:
Originally Posted by Chetwood View Post
The German Umlaut ü (u with two dots above) is often recognized as two i's: ii. Since it's common in several words, how do I replace it for all of them? Thx.
You can edit [Subtitle edit folder]\Dictionaries\deu_OCRFixReplaceList.xml - add a new WordPart under PartialWords:
<PartialWords>
...
<WordPart from="ii" to="ü" />
</PartialWords>
SE will now look for correctly spelled words where "ii" is replaced with "ü".
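To illustrate the idea (this is only a rough Python sketch, not SE's actual code): for a word the dictionary does not know, try each from/to pair and keep the result only if it is a correctly spelled word. The function name, the rule list and the tiny dictionary below are assumptions made up for the example.

```python
def fix_with_word_parts(word, word_parts, dictionary):
    """Return a dictionary word obtained by applying one from/to rule, else None.

    word_parts: list of (wrong, right) pairs, like the WordPart entries above.
    dictionary: a set of correctly spelled words (stand-in for the spell checker).
    """
    if word in dictionary:
        return word  # already fine, e.g. a real word that contains "ii"
    for wrong, right in word_parts:
        if not wrong:
            continue
        # Try replacing one occurrence at a time...
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in dictionary:
                return candidate
            start = word.find(wrong, start + 1)
        # ...and then all occurrences at once.
        candidate = word.replace(wrong, right)
        if candidate in dictionary:
            return candidate
    return None


rules = [("ii", "ü"), ("ol", "d")]
words = {"für", "müde", "worried", "Hawaii"}
print(fix_with_word_parts("fiir", rules, words))
print(fix_with_word_parts("worrieol", rules, words))
```

Checking against the dictionary is what keeps a rule like "ii" to "ü" from damaging words that legitimately contain "ii".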

You can also take a look at "eng_OCRFixReplaceList.xml".
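If you would rather not edit the file by hand, a WordPart can also be added with a few lines of Python. This is a sketch under assumptions: the root element name `OCRFixReplaceList` in the sample is a guess (only PartialWords and WordPart with from/to are shown above), and `add_word_part` is a made-up helper. One detail worth knowing: a rule such as "l<" to "k" needs the `<` escaped as `&lt;` inside an XML attribute, which ElementTree does automatically. Whether SE then picks up that particular rule is untested here.

```python
import xml.etree.ElementTree as ET

# Sample content; the root element name is an assumption for illustration.
SAMPLE = """<OCRFixReplaceList>
  <PartialWords>
    <WordPart from="ii" to="ü" />
  </PartialWords>
</OCRFixReplaceList>"""


def add_word_part(xml_text, wrong, right):
    """Add (or update) a WordPart rule under PartialWords and return the new XML."""
    root = ET.fromstring(xml_text)
    partial = root.find("PartialWords")
    if partial is None:
        partial = ET.SubElement(root, "PartialWords")
    for part in partial.findall("WordPart"):
        if part.get("from") == wrong:
            part.set("to", right)  # rule already exists: just update it
            break
    else:
        ET.SubElement(partial, "WordPart", {"from": wrong, "to": right})
    return ET.tostring(root, encoding="unicode")


print(add_word_part(SAMPLE, "l<", "k"))
```

Make a backup copy of the real file before writing anything back to it.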
Old 26th October 2011, 23:41   #20  |  Link
chainring
Registered User
 
Join Date: Sep 2006
Posts: 179
Just wanted to stop in here and say thank you for this awesome tool. I love loading up a .sup, letting OCR rip through and having minimal work to correct errors. I can get through an entire movie in 20 minutes.