Old 5th January 2014, 21:25   #641  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
Just a little help.
I've uploaded my matrix file for people to grab here:

http://www.filedropper.com/sbmatrix

It's based on aispam, but I did a lot of cleanup on it and have added results from more DVDs.


Setup help:
Please remember to set the sensitivity to 1000,
and if you have issues with "i" and the upside-down "!" (¡) in Spanish, lower "max char top diff..." to 4.


Cleanup:
- Removed all the smaller characters (, . ; : ' ") and the like to avoid wrong detections.
- Removed all "take with next" characters, since they are less likely to be recognized correctly.
- Removed all instances of multi-character entries.
- Manually removed entries based on only partial matching of characters.
- Manually removed entries which contained parts of the next/previous character.

This was done to reduce false detections, as I would rather have SubRip ask me a few more times than deliver wrong detections.

Last edited by Sven Bent; 7th January 2014 at 01:53.
Old 13th January 2014, 20:42   #642  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
Does anyone have the same problem, where you always have to type the same letters over and over again?



Quote:
Originally Posted by Sven Bent
Just a little help.
I've uploaded my matrix file for people to grab here:

http://www.filedropper.com/sbmatrix

It's based on aispam, but I did a lot of cleanup on it and have added results from more DVDs.

Thank you, might come in handy!
Old 14th January 2014, 01:36   #643  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
Quote:
Originally Posted by Dean007
Does anyone have the same problem, where you always have to type the same letters over and over again?

Hmm, sounds like the character matrix file is not getting saved/read.

Is your character matrix file in a write-protected area?
If it's because the characters look a bit different even though they are the same, you can lower the sensitivity for this DVD only.
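
One way to check for a write-protected folder is to try creating a file in it. Below is a rough Python sketch of that idea. The helper name `is_writable` is made up for illustration, and the path is just the example install location mentioned in this thread, so adjust it to wherever your matrix file lives. On Windows, folders under "Program Files" are often not writable without administrator rights.

```python
import os
import tempfile


def is_writable(folder):
    """Return True if a file can actually be created inside `folder`."""
    try:
        # The temporary file is removed again as soon as the block exits.
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers a missing folder as well as a permission problem.
        return False


# Example install path (an assumption); change it to your own SubRip folder.
folder = r"C:\Program Files\SubRip 1.50b5\ChMatrix"
print(folder, "writable:", is_writable(folder))
```

If this prints False for the folder holding the matrix file, moving the file to a folder under your user profile and pointing SubRip at it there may be worth a try.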
Old 14th January 2014, 08:42   #644  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
How do I check for that?

I tried on two PCs: first on mine, and then on a laptop.
Old 4th February 2014, 15:04   #645  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
Done with the "A" section of my wife's DVD list, so I thought it was time to re-upload my matrix file.

http://www.speedyshare.com/zSgmF/SBmatrix.zip

I've done a bit more cleanup. I removed the (few) "°" entries, as they sometimes got detected as the beginning of "%".
Old 9th April 2014, 05:36   #646  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
A new matrix file update:

http://speedy.sh/5JRf8/SBmatrix.zip

UPDATES:
I've changed my approach a bit with this one. I'm now including punctuation signs, since the matrix file is so big now that it can easily run with even more sensitive settings. I've also included musical note signs, even though they need a "take with next" approach, since no other character can be confused with the start of a note sign.

Percentage signs that are only partially detected are now handled as a combination of degree sign, forward slash, degree sign.


RECOMMENDED SETTINGS:
It's highly recommended to adjust the OCR sensitivity to the following (Options > Advanced OCR setup > OCR engine setup):
1000
1
1
2

Also add these lines to your punct.dic file:
°/°
%
to ensure correct handling of percentage signs.
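
What that punct.dic entry is meant to achieve can be pictured as a plain text replacement on the recognized subtitle text. The Python below is only an illustration of the effect, not how SubRip does it internally, and the function name `fix_percent` is made up.

```python
def fix_percent(text):
    """Turn the partially detected sequence '°/°' into a real percent sign."""
    return text.replace("°/°", "%")


print(fix_percent("Sales went up 20°/° this year."))
```

The same replacement could be run over an already saved subtitle file if a few "°/°" sequences slipped through.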

Disable "correct capital letters" and "format whole words only",
as they make things worse more often than better.
Old 23rd April 2014, 04:25   #647  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
New matrix file update:

http://speedy.sh/sbrpS/SBmatrix.sum

Remember to set those OCR settings to:
1000
1
1
2
Old 11th September 2014, 15:53   #648  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
New update

http://speedy.sh/unG6F/SBmatrix.zip
* Had a few more DVDs where it had to detect every character, so plenty of stuff was updated, plus plenty of DVDs with minor updates to the matrix.
* 2 characters deleted, as they once got misdetected as the wrong case.
* I'm also now starting to keep entries of multiple characters.

Partial detections, noise from neighbouring characters and "take with next" entries are still cleaned out of the matrix to reduce/remove false detections.


Please remember to set OCR sensitivity to:
1000
1
1
2
Old 11th September 2014, 16:21   #649  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
Thank you! Will come in handy
Old 15th November 2014, 20:41   #650  |  Link
gonwk
Registered User
 
Join Date: Aug 2006
Posts: 164
Quote:
Originally Posted by Sven Bent View Post
New update

http://speedy.sh/unG6F/SBmatrix.zip
...

Please remember to set OCR sensitivity to:
1000
1
1
2
Hi Sven Bent,

A Newbie question ...

Q1: Where do I put your file?

Q2: OCR Sensitivity ... for my SubRip, are you talking about under "Advanced OCR Setup" tab and under "OCR Engine Setup" for the numbers that you have suggested?

Because on mine I think they are currently as follows:
980
2
2
6

Am I looking at the right place? See the attached jpg.

Thanks,

G!
Old 24th November 2014, 16:45   #651  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
You can put the file anywhere, really, but I prefer to put it in the
"ChMatrix" subfolder of wherever you installed SubRip.

E.g.:
C:\Program Files\SubRip 1.50b5\ChMatrix



Yes, you are looking at the right place.

Move the slider so it says 1000
and adjust the values below the slider to
1
1
2


This makes the program search for more exact matches.
The trade-off is that it might not find one that resembles the character closely enough and will ask you to define it. However, with this big database that should happen very rarely, and the improved accuracy lowers the risk of getting a wrong but similar character, like "i" and the inverted "!" (¡) from Spanish.

If you want it to "guess" a bit more, you can increase the three values to 2, 2, 2 to reduce the number of times it will ask you. I would not go above 2, 2, 2.
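
To picture the trade-off, here is a toy sketch in Python. It is not SubRip's actual algorithm: the 3x5 bitmaps, the pixel-counting score and the names `similarity` and `recognize` are all made up for illustration, and the 0 to 1000 scale only mimics the slider. The matrix knows "i" but has never seen this font's "l".

```python
def similarity(a, b):
    """Score two equal-sized bitmaps from 0 to 1000 by counting equal pixels."""
    pixels_a = "".join(a)
    pixels_b = "".join(b)
    same = sum(x == y for x, y in zip(pixels_a, pixels_b))
    return 1000 * same // len(pixels_a)


def recognize(glyph, matrix, threshold):
    """Return the best-scoring known character, or None to ask the user."""
    best_char, best_score = None, -1
    for char, bitmap in matrix.items():
        score = similarity(glyph, bitmap)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


# Toy matrix holding a single known character.
MATRIX = {"i": (".#.", "...", ".#.", ".#.", ".#.")}

# An unseen "l": it differs from the stored "i" by one pixel out of 15.
L_GLYPH = (".#.", ".#.", ".#.", ".#.", ".#.")

print(recognize(L_GLYPH, MATRIX, 1000))  # strict: no match, so the user is asked
print(recognize(L_GLYPH, MATRIX, 900))   # loose: wrongly accepted as "i"
```

With the strict setting the unknown glyph gets handed to you to type in once. With the looser setting it is silently accepted as the wrong character, which is the kind of error that is tedious to find afterwards.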


-- edit --
If you want it to go a bit faster, disable "show pict." and the autoscroll in the text window. It helps a bit on systems under heavy load.

Last edited by Sven Bent; 24th November 2014 at 16:51. Reason: -- more info --
Old 24th November 2014, 22:43   #652  |  Link
gonwk
Registered User
 
Join Date: Aug 2006
Posts: 164
Hi Sven Bent,

Thanks for your dummy-proof explanation. Appreciate the Help.

G!
Old 5th December 2014, 06:19   #653  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
Just a little update: I'm now hosting the file myself, so I don't need to upload it every time I update it.

http://162.248.14.188:12380/

Note that this is the direct link to the file I work with, so every time I've done some subtitle work, the file is updated.
So basically it's always up to date.

Last edited by Sven Bent; 5th December 2014 at 07:30.
Old 7th December 2014, 03:46   #654  |  Link
gonwk
Registered User
 
Join Date: Aug 2006
Posts: 164
Hi SubRip team,

@ SubRip team, I downloaded and tried both the SubRip 1.50 b6 and b7 versions, and the Text and Outline Color selection does not work. I mean it does not give the 3 Color Bar choices; it stays blank.

@ Sven Bent, I was wondering if SubRip can do Chinese characters? I don't read or speak Chinese, so I wonder if the Chinese OCR can be made dummy-proof. Is this possible?

Thanks,

G!

Last edited by gonwk; 7th December 2014 at 03:50.
Old 7th December 2014, 14:19   #655  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
Quote:
Originally Posted by Sven Bent
Just a little update: I'm now hosting the file myself, so I don't need to upload it every time I update it.

http://162.248.14.188:12380/

Note that this is the direct link to the file I work with, so every time I've done some subtitle work, the file is updated.
So basically it's always up to date.

Thank you for this! Really appreciate it!

I do have a problem! In v1.50b5, when I try to rip subtitles (idx), I get this for every single letter:


In 1.50b7, with the same subtitles, I get this:


Any suggestions?
Old 9th December 2014, 16:35   #656  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
I don't know about the b7 version. It looks like the data is getting corrupted before it hits the OCR part.

Regarding b5, does it ask you to identify all characters?
If so, you might just have hit a font very different from anything that has ever gone through my SubRip.
If you are only going to rip English, go to the OCR section and set the last parameter (the third value below the slider) back to the default 6. Lowering this value, as I originally suggested, mostly helps with the Spanish/Mexican inverted "!" (¡).
If you do run it with the third OCR value set to 6, make sure to check for confused "i" vs. inverted "!".

If this does not help, you can lower the general OCR accuracy from 1000 to 980 (or even lower), but you run into dangerously low accuracy and might see a lot of confused letters (especially lower case "v", "x", "o" vs. their capital versions).
To be honest, I wouldn't advise doing this. Instead, just bite the bullet and type the unrecognized characters into the matrix file.

With my matrix file, however, it should happen very rarely that you need to type in all the characters yourself.
Old 9th December 2014, 18:52   #657  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
Even with suggested options it's still the same!
Old 10th December 2014, 21:45   #658  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
I don't think there is anything else to do but type in the characters. It should "only" be once for each character.
I am, however, very interested in which DVD it is. Maybe I can track it down so I can add this "font" to the matrix.
Old 11th December 2014, 00:00   #659  |  Link
Dean007
Registered User
 
Join Date: Feb 2011
Posts: 15
It's from the TV show "The Strain"! The subtitles are in IDX/SUB format. If you can't find it, tell me and I'll upload them!
Old 4th January 2015, 13:36   #660  |  Link
Sven Bent
Registered User
 
Join Date: Oct 2001
Posts: 145
I haven't been able to find it.

I have, however, sorted the matrix with the new version of SubRip (v.1.51.1), so it should go faster now.
I'm still trying to get the current developer to adopt a better sorting algorithm for a faster matrix file.

Nevertheless, the most up-to-date version can be found here and gets updated within seconds after I finish a new subtitle set:
http://techcenterdk.ddns.net:12380/SBmatrix.sum
All times are GMT +1.