
SubExtractor - New Sub Ocr App



Thunderbolt8
18th August 2013, 07:16
tappen, could you please add something to show the subtitle line number of each line listed when using advanced word spacing adjustment? when trying to find a certain problematic line in combination with a letter that is used a lot (e.g. non-italic "y"), it can take quite some time to find the correct line in that list. if there's some indication of the subtitle line number, say in the top right corner of the presented subtitle picture, it would be much faster to work through the list until you find the line you are looking for.

Tappen
19th August 2013, 21:26
rhaz: in the first problem, I think you can just match a, then split the accent+, character, then when the accent lights up for matching select it and the a and type in the real character. This shouldn't cause other problems.

in the 2nd problem - I'll look into it.

Thunderbolt8: good idea. I'll see if the line number can be added

Taskforce
20th August 2013, 00:35
Hi, thanks for the terrific tool. I was wondering if it would be possible to change things so that when you are at the saving dialog, it defaults to the directory of the open file, instead of the last directory used (at least when using save as).

It does this when you first open the tool for the first OCR job. However, if you're doing a series, and are doing multiple episodes separately, it defaults to the last directory used instead of the one you've loaded the next ep from. I know this really isn't anything major, but would help speed up processing episodic series a little bit by saving a few mouse clicks.

Weirdo
22nd August 2013, 19:26
Isn't it possible to press CTRL+(key) simultaneously for italics, or is it my Logitech keyboard that won't accept it (I get the Windows error sound)? I'm forced to press Ctrl and use the mouse for italics, which is a bit of a time-waster in the otherwise ultra-smooth SubExtractor.

edit: Just saw the space bar option, nice workaround.

speedoflight
27th August 2013, 00:42
Does the current version of the app work with older subtitles, even if they are still .sup? I'm trying to OCR a pair of subtitles and the results are just incredibly horrible, like an alien language. xD I run the OCR, waste about an hour, and when I open the subtitle file, not even a single word is OCRed correctly. I don't know what I'm doing wrong.

Here is the sub; download it and investigate, I don't know what's going on. (It is in Spanish.)

https://www.dropbox.com/s/mtcat2dz9sah44j/Extra%203%20-%20spanish.sup

EDIT: Well, since it looks like nobody is answering, I've discarded this program. I'm using Subtitle Edit instead, which, surprisingly, produces far better results (at least in my case). In fact, this program started really well and I used it on a couple of subtitles, but I realized that it is not really a good OCR program: too many bugs, too many errors, and it looks like the author is not supporting the main languages after all.

rhaz
31st August 2013, 14:03
Yeah, I agree this thread is very dead. It's been two weeks and I'm still waiting for a reply about the subs issue I mentioned in #453 and the subs attached below. I can't even OCR those in Subtitle Edit; it gives too many errors (at least it opens them, though).

Thunderbolt8
8th September 2013, 12:11
Thunderbolt8: good idea. I'll see if the line number can be added

did you already find some time for this? would be really helpful to me now :thanks:

btw. could you please also change the spacing adjustment view so that the point of focus no longer jumps back to the first line each time you change any spacing? if you want to find out how many clicks are needed for a character before there is a visible change in the text, and you are dealing with a character that has many examples in the subtitle, then you have to find and click your way down to the instance you are inspecting again each time to see if anything has changed. that's just annoying. it would be nice if the point of focus just stayed where it is after committing a change.

Thunderbolt8
8th September 2013, 15:29
got a movie in which the exclamation mark "!" is recognized as " '. " (apostrophe+fullstop). deleting the characters for both fullstops and apostrophes & commas doesn't help, and deleting all characters of the movie doesn't do anything either. any idea what I could do here?

Tappen
9th September 2013, 18:07
Thunderbolt8: I'm very busy with other things at the moment. It'll probably be a month or more before I can tackle the list of issues.
I also can't help with your apostrophe+fullstop problem, sorry. I can see how this would happen but the software just isn't set up to deal with the case where the top part of an exclamation mark is identical to an apostrophe. Normally the apostrophe is a much smaller character.

Thunderbolt8
9th September 2013, 18:54
it's not that big of a problem, I can just find & replace all the wrong ones for good. just want to be sure that doesn't carry over to other movies when saving characters. so I deleted all characters for this movie after OCRing.

about the rest, better late than never :D

Thunderbolt8
22nd September 2013, 21:18
are changes made to letter spacing also saved in OcrMap.bin?

Tappen
22nd September 2013, 21:38
are changes made to letter spacing also saved in OcrMap.bin?

No they are not.

Thunderbolt8
24th September 2013, 11:52
so which files would I have to save before reinstalling Windows to keep the spacing changes I made?

Tappen
24th September 2013, 22:05
They're saved in the user.config file. It's in the directory that's somewhere like:

C:\Users\AccountName\AppData\Local\DvdSubExtractor\DvdSubExtractor.exe_Url_vxbiiw1ruyu1vjmfw1fhh1ifophvjesa\1.0.1.3
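
A minimal Python sketch of backing that file up before a reinstall. It assumes the settings live under %LOCALAPPDATA%\DvdSubExtractor, as in the example path above; the hashed folder name and version number differ per install, so the script searches for every user.config instead of hard-coding them. The function name and the backup folder name are made up for illustration.

```python
import os
import shutil
from pathlib import Path


def backup_user_configs(settings_root: Path, backup_root: Path) -> list[Path]:
    """Copy every user.config found under settings_root into backup_root,
    keeping the relative folder layout (hashed folder and version)."""
    copied = []
    for config in sorted(settings_root.rglob("user.config")):
        target = backup_root / config.relative_to(settings_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(config, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Assumed location, based on the example path in the post above.
    settings = Path(os.environ.get("LOCALAPPDATA", "")) / "DvdSubExtractor"
    if settings.is_dir():
        for path in backup_user_configs(settings, Path("SubExtractorBackup")):
            print("saved", path)
    else:
        print("settings folder not found:", settings)
```

Restoring would be the reverse copy into the same folder layout; whether a reinstalled copy ends up with the same hashed folder name is not something this thread confirms.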

johnsonlam
25th September 2013, 19:35
I have some idx and sub files, which is why I started using Subtitle Extractor, but I soon found that it may still be ANSI-only? The Chinese input method simply can't be activated in the program; however, I can copy a Chinese character (or Japanese, in Unicode) into "Manually Enter Character" and it works fine. Since the program isn't optimized for this kind of block character, it always produces duplicates; I suggest adding fuzzy logic with a percentage tolerance (±2%).

And I'm still trying to figure out how to save the "trained data", since an anime series simply has the same subtitles throughout.

Great program, thanks!
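
To illustrate the tolerance idea suggested above: a rough Python sketch that treats two glyph bitmaps as the same character when at most 2% of their pixels differ. This is only an illustration of the suggestion, not how SubExtractor's own fuzzy matching works; the function names and the list-of-rows bitmap format are assumptions.

```python
def differing_fraction(a, b):
    """Fraction of pixels that differ between two equally sized bitmaps.

    Each bitmap is a list of rows, each row a list of 0/1 pixel values.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


def is_fuzzy_match(a, b, tolerance=0.02):
    """True when the two bitmaps differ in at most `tolerance` of their pixels."""
    return differing_fraction(a, b) <= tolerance
```

A fixed percentage is a blunt instrument: on small glyphs, 2% can be less than a single pixel, which may be one reason to restrict this kind of matching to higher resolutions.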

Tappen
25th September 2013, 20:32
johnsonlam: Could you try downloading version 1032d, the latest beta, and checking the box "Wait for Enter Key"? This might solve your problem for entering Chinese characters.
There is a fuzzy logic component but it is only active on HD (720p or higher) subtitles.

johnsonlam
26th September 2013, 05:21
johnsonlam: Could you try downloading version 1032d, the latest beta, and checking the box "Wait for Enter Key"? This might solve your problem for entering Chinese characters.
There is a fuzzy logic component but it is only active on HD (720p or higher) subtitles.

Thanks for your advice. I'm using that function already.

After a bit more research: the program swallows the special key combination, so Windows can't switch to that input method. Using an alternate way to switch the language seems to have solved the problem.

Too bad the great fuzzy logic is not enabled at lower resolutions.

Chetwood
1st October 2013, 11:45
Chetwood, That's a bug that's been around forever and I've never bothered to fix. Maybe now that someone other than me has found it...
Will it be fixed in the next release? The more subs I rip in a row, the more likely it becomes that I mistype. It just happened to me again and there's still no Undo for the last item. Thx.

rhaz
22nd October 2013, 11:29
Hi. I have a question. I have been using this tool for two months now and have built up a big collection of various characters, versions, etc.

So now for some reason when I start OCR'ing all my collected characters are gone from OCR Matches and I have to start from scratch for no reason. Why's that?

Tappen
25th October 2013, 06:22
The OCR Matches dialog only shows the ones that have been used in the current file (either added or re-used).

I can't think of a way to show all the matches (100s for each character just in the starting database) in a way that would be useful. So it's heavily trimmed down.

Wizzu
27th October 2013, 18:02
Very glad to have discovered this nifty app.

Really helps.

Congrats Tappen! And thanks! :cool:

Wizzu
3rd November 2013, 11:10
After having processed about 25 movie subtitle files with this app, I really want to congratulate the developer again.

Everything is so well thought-out, and the ergonomics are top-notch (I really love the [ctrl-arrow] shortcuts to select character parts).

Thanks for making my life easier! Where's the "donate" button?

Thunderbolt8
3rd November 2013, 17:28
tappen, just asking: are you still busy? or do you think you might have a bit of time soon to work on those little improvements in the advanced word spacing tab I suggested?

Thunderbolt8
10th November 2013, 17:19
is the fullstop character "." for some reason excluded from the spacing adjustment? got a line in which a pistol calibre, e.g. " .22", is set right next to the word preceding it, e.g. "the.22", and even when I increase the left spacing of "." to 20 and the right spacing of "e" to 20, nothing happens (apart from all other "e" characters getting set apart from their following letter).

Tappen
11th November 2013, 00:44
Sorry Thunderbolt8, I'm still busy at work. I think you're right and there's a rule about "." (and other punctuation) spacing that overrules the word spacing options on the left side. It saves so many mistakes and causes so few that I'd hesitate to remove such a rule.

Thunderbolt8
12th November 2013, 00:45
well, the problem usually only occurs with weapon calibres like " .xx". it's rather easy to find such cases by searching for the digits 0-9 (even though it's annoying to potentially have to do this for each file). so if such a change would really lead to more problems, then I'd say it's better to keep it as it is.
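
For anyone doing that search by hand, a small Python sketch that narrows it down to the glued calibre pattern. It assumes the subtitle text is already loaded as a list of lines and looks only for a letter immediately followed by "." and a digit (as in "the.22"), so ordinary decimals like "3.50" are left alone. The function names are hypothetical.

```python
import re

# A "." that sits between a letter and a digit, e.g. "the.22".
GLUED_CALIBRE = re.compile(r"(?<=[A-Za-z])\.(?=\d)")


def find_glued_calibres(lines):
    """Return (line_number, text) for every line containing a glued calibre."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if GLUED_CALIBRE.search(line)
    ]


def fix_glued_calibres(line):
    """Insert the missing space before the calibre: "the.22" -> "the .22"."""
    return GLUED_CALIBRE.sub(" .", line)
```

Reviewing the hits from find_glued_calibres before applying the fix is probably safer than replacing blindly, since a letter-dot-digit sequence could occasionally be intentional.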

CoolRaoul
13th November 2013, 13:48
Hello everybody
Just discovered this application, which I thought could help me convert DVB subtitle streams from recordings I made with a USB stick TNT recorder.

I demux the streams using "TS Doctor", then in Subtitle Extractor I use file->open to open the .sub file.

Unfortunately, after a few seconds a popup appears with a "no subtitles found" error message.

What am I doing wrong?

**edit**
I can upload the .sub file somewhere if it would help someone diagnose the problem.

Tappen
16th November 2013, 03:22
The problem is that this tool doesn't support .sub files, sorry. Only .sup and idx/sub pairs from dvds

CoolRaoul
17th November 2013, 16:22
The problem is that this tool doesn't support .sub files, sorry. Only .sup and idx/sub pairs from dvds

Oh, typo on my part: TS Doctor generates ".sup" files, not ".sub".

Would you like a sample one?

NB: answering the random questions when posting in this forum is definitely not easy for newbies like myself!

CoolRaoul
1st December 2013, 11:24
Maybe this thread is not the "official" place to discuss SubExtractor issues.

In that case could someone give me the correct link?

Thunderbolt8
1st December 2013, 18:59
it is but the creator is busy atm with other things as he said.

CoolRaoul
7th December 2013, 17:37
it is but the creator is busy atm with other things as he said.

Ah ok,
I'll wait then..

(And since I'm not receiving forum email notifications I will have to check this thread periodically)

Note: answering the random questions to validate a post here is a horrible thing for a non-specialist like me!!!

Thunderbolt8
21st January 2014, 04:29
would it be possible to add "er..." and "erm..." to the SHD removal as well?

for " er..." " erm..." the space and the er(m) need to go

for ",er..." ",erm..." the comma and the er(m) need to be removed

for "Er..." "Erm..." at the beginning of a sentence or line, it should be removed and the first letter of the next word italicised.

I hope that won't break anything.

edit: well, not too sure about the beginning of a sentence or line thing, because it could potentially lead to a slight loss of sync for that specific sentence/line, depending on how long the pause in the speech is. that's a boundary we perhaps shouldn't cross.
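
A rough Python sketch of the first two rules, in case it helps pin down the request. It is not SubExtractor's removal code. It assumes the "..." itself stays and only the filler plus the preceding space or comma goes, and it deliberately leaves "Er..." / "Erm..." at the start of a sentence or line untouched, given the sync concern in the edit above. The function name is made up.

```python
import re

# A lowercase "er" or "erm" directly before "...", preceded by spaces and/or a comma.
FILLER = re.compile(r"[, ]+erm?(?=\.\.\.)")


def strip_fillers(text):
    """Remove " er", " erm", ",er" and ",erm" when they are followed by "..."."""
    return FILLER.sub("", text)
```

With this pattern "Well er... I think so" becomes "Well... I think so", while a word that merely ends in "er", such as "her...", is not touched because the filler must follow a space or comma directly.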

Betsy25
7th February 2014, 22:36
Regarding pure OCR, this is by far the best tool out there, it would be a shame if this project would just die out.:(

Chetwood
8th February 2014, 07:15
Word. Apparently Tappen is busy with RL so I guess, we'll just have to wait.

Thunderbolt8
26th March 2014, 15:26
tappen, do you now have time for little improvements?

CoolRaoul
29th April 2014, 19:12
Regarding pure OCR, this is by far the best tool out there, it would be a shame if this project would just die out.:(

Unfortunately I've been unable to make it work yet and didn't find any alternative either.

It would be a pity if the project were discontinued, as it seems to be.

rhaz
21st May 2014, 15:59
Hi. Using this great tool for over a year now. Still using 1.0.3.2, why no updates? Anyway, I have a question. No matter what ♪ I use for that music symbol (ALT 13), when I save to .srt, all ♪ symbols become ? marks. It's a real pain in the ass to then replace each ? with ♪ manually. So why's that? Maybe it would work if it saved in UTF-8 format.

deco20
21st May 2014, 16:14
Hi. Using this great tool for over a year now. Still using 1.0.3.2, why no updates? Anyway, I have a question. No matter what ♪ I use for that music symbol (ALT 13), when I save to .srt, all ♪ symbols become ? marks. It's a real pain in the ass to then replace each ? with ♪ manually. So why's that? Maybe it would work if it saved in UTF-8 format.
Definitely, you have to save it with UTF-8 encoding.

rhaz
21st May 2014, 18:41
How do you do that? There's no option to choose UTF8 when clicking Save as.

deco20
21st May 2014, 18:44
How do you do that? There's no option to choose UTF8 when clicking Save as.
Go to Options and uncheck "Store Srt files as ANSI (instead of UTF-8) Codepage".
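
A short Python illustration of why the ANSI setting turns ♪ into "?". It assumes the ANSI code page is Windows-1252 (the actual code page depends on the Windows locale), which has no slot for ♪, so the character is replaced on save; UTF-8 can represent it.

```python
NOTE = "♪ La la la ♪"

# Windows-1252 has no music note, so each one is replaced with "?".
ansi_bytes = NOTE.encode("cp1252", errors="replace")

# UTF-8 can encode any Unicode character, so nothing is lost.
utf8_bytes = NOTE.encode("utf-8")

print(ansi_bytes.decode("cp1252"))  # the notes come back as "?"
print(utf8_bytes.decode("utf-8"))   # the notes survive
```

Once a file has been saved with the "?" substitution, the original symbols cannot be recovered from it, so the option needs to be changed before saving.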

rhaz
5th June 2014, 17:36
Edited. Never mind, solved. Used a better ripper to extract the subs first.

Thunderbolt8
20th February 2015, 22:16
why can't some HD DVD .sups be opened with this prog?

Thunderbolt8
5th March 2015, 22:12
is SubExtractor actually open source? if so, would anyone like to take over and work on this project? there hasn't been any development for over two years now and there are a few bits and pieces which could still use improvement. I still like to use this tool because it's really fast at OCRing and has good removal of hearing impaired stuff for certain types of subtitles which need to be .ass in order to retain their original on-screen line position.

Thunderbolt8
6th December 2020, 18:30
any updates here? still my most reliable & fastest goto program to OCR subs.

locotus
6th December 2020, 19:29
any updates here? still my most reliable & fastest goto program to OCR subs.

Plus 1, hope Tappen is still on line.