
SubRip and Japanese kanji


lankou
20th June 2005, 12:11
Hi,

Does SubRip support Japanese kanji, hiragana and katakana?

Thanks and regards
Lankou

ai4spam
20th June 2005, 18:26
The new 1.30 version supports Unicode, so I think it should support kanji too. Just select the appropriate CharSet/CodePage and type in the characters. I don't know whether you can type them in directly (I've never tried a Japanese keyboard) or whether you need to copy and paste them. Also, a feature added in 1.20 helps with Japanese characters: you can extend the selection rectangle during OCR when a character is made up of several disjoint parts.
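To picture why extending the selection matters: a kanji such as 川 is drawn as three separate strokes, so a pass that boxes each connected blob of pixels finds three pieces instead of one character. The sketch below merges neighbouring boxes when the horizontal gap between them is small. It only illustrates that idea under a fixed-gap assumption; `Box`, `merge_parts` and `max_gap` are hypothetical names, not SubRip's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one blob of subtitle pixels (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    def union(self, other: "Box") -> "Box":
        """Smallest box that contains both boxes."""
        return Box(min(self.left, other.left), min(self.top, other.top),
                   max(self.right, other.right), max(self.bottom, other.bottom))


def merge_parts(boxes: list[Box], max_gap: int) -> list[Box]:
    """Scan boxes left to right and merge each one into the previous group
    when the horizontal gap between them is at most max_gap pixels."""
    merged: list[Box] = []
    for box in sorted(boxes, key=lambda b: b.left):
        if merged and box.left - merged[-1].right <= max_gap:
            merged[-1] = merged[-1].union(box)
        else:
            merged.append(box)
    return merged


# The three strokes of 川, followed by a separate character further right.
strokes = [Box(0, 0, 2, 20), Box(5, 0, 7, 20), Box(10, 0, 12, 20), Box(30, 0, 45, 20)]
print(merge_parts(strokes, max_gap=4))
```

A fixed gap is a crude rule: set too wide, it can also glue two tightly spaced characters together, which is one reason a manual way to adjust the rectangle is useful.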
Please report in the SubRip topic from now on: http://forum.doom9.org/showthread.php?t=93680
I'd love to hear if it actually works.
PS: Note that only vsFilter supports Unicode subtitles (2 bytes per char); no other player that I know of does.
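To make the "2 bytes per char" point concrete: in UTF-16, the encoding Windows calls "Unicode", every kana and every common kanji (anything in the Basic Multilingual Plane) takes exactly two bytes. A minimal sketch, assuming a standard SRT entry layout and UTF-16LE with a byte-order mark; `srt_entry_utf16` is a hypothetical helper, not something SubRip exposes.

```python
def srt_entry_utf16(index: int, start: str, end: str, text: str, bom: bool = True) -> bytes:
    """Build one SRT entry and encode it as UTF-16LE, optionally with a leading BOM."""
    entry = f"{index}\r\n{start} --> {end}\r\n{text}\r\n\r\n"
    if bom:
        entry = "\ufeff" + entry  # U+FEFF encodes as FF FE in UTF-16LE
    return entry.encode("utf-16-le")


line = "日本語のテキスト"  # kanji, hiragana and katakana
print(len(line), len(line.encode("utf-16-le")))  # 8 characters, 16 bytes

data = srt_entry_utf16(1, "00:00:01,000", "00:00:03,000", line)
print(data[:2].hex())  # 'fffe', the UTF-16LE byte-order mark
```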

lankou
21st June 2005, 07:05


Thanks, I will try this and report in the thread you mentioned. I think I forgot to set the codepage, because I could only see the "*" buttons in the char set.

Thanks and regards
Lankou

ai4spam
21st June 2005, 18:05
Well, the "*" buttons are shortcuts for special characters in Latin-based languages (such as À, â and the like), so they won't help you with Japanese. You can assign single characters you use often to them by copying a character in Character Map (use the corresponding button in SubRip to open it), then right-clicking a "*" button to paste it.
What I'd like you to try is to just type in kanji as you would in a regular application like MS Word. I know Chinese users have special keyboard drivers that let them type several Latin letters to produce one Chinese character, but I don't know if it's the same with Japanese.
I'm not sure, but we may need to change the default font, so your report will be very helpful.

ai4spam
21st June 2005, 23:52
BTW, you should probably set the CharSet to SHIFTJIS and leave the CodePage at 932 unless you know otherwise (let me know if the CodePage should be different).
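For reference, code page 932 is Microsoft's variant of Shift-JIS, and Python ships a codec for it named `cp932`, which makes it easy to see what that CharSet/CodePage pair implies: Japanese characters (other than half-width katakana) become two-byte sequences whose first (lead) byte is 0x81 or higher, while plain ASCII stays one byte. A small sketch under those assumptions; `cp932_bytes_per_char` is a hypothetical helper.

```python
def cp932_bytes_per_char(text: str) -> list[tuple[str, int, str]]:
    """For each character, report its byte length and first byte in code page 932."""
    result = []
    for ch in text:
        encoded = ch.encode("cp932")
        result.append((ch, len(encoded), f"0x{encoded[0]:02X}"))
    return result


text = "漢字とかなA"  # two kanji, three hiragana, one ASCII letter
for ch, size, first in cp932_bytes_per_char(text):
    print(ch, size, first)

encoded = text.encode("cp932")
assert encoded.decode("cp932") == text  # lossless round trip
print(len(text), "characters ->", len(encoded), "bytes")
```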
What I need to know is if the menus and other elements in the windows still show up correctly.

Thanks.

lankou
22nd June 2005, 03:23

Hi,

I did not have much time to test it yesterday, but here is some information. My OS is Windows XP (English) with East Asian language support added (I live in Japan, so this is useful to me), so SubRip's menus are in English and display correctly. The video is a DVD I recorded from TV, and the subs are hardcoded in the VOB.

Thanks and regards
Lankou.

ai4spam
22nd June 2005, 05:52
Thanks for the info. I'll be waiting to hear whether you are able to type in Japanese characters. A new version of SubRip is coming soon, with some bug fixes and a true Unicode font used everywhere.

unmei
22nd June 2005, 19:09
Yes, you can. At least for hiragana and katakana, you simply type the syllables as normal Latin characters. For other characters it is supposed to work too, but I guess it takes a bit of practice; at least I have a hard time getting them right (maybe I also simply type too slowly :D). Set the language bar to JPN. I think you need to launch one of the helper tools on it, but then you simply type into the normal app's input box instead of the helper tool thingy.
This is mostly independent of the application: as long as the app's input box can display the characters, it works (otherwise you get question marks).
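The romaji step described here can be pictured as a lookup from typed Latin syllables to kana. The toy below is only that: a tiny hypothetical table with greedy longest-match, nothing like a real IME, which also offers kanji candidates for each reading. `ROMAJI` and `romaji_to_hiragana` are illustrative names.

```python
# Tiny hypothetical romaji table; a real IME knows every syllable and then
# converts the kana reading into kanji candidates.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "go": "ご", "na": "な", "ni": "に", "ho": "ほ", "n": "ん",
    "shi": "し", "tsu": "つ",
}


def romaji_to_hiragana(typed: str) -> str:
    """Convert romaji to hiragana by always taking the longest matching syllable."""
    out, i = [], 0
    while i < len(typed):
        for size in (3, 2, 1):
            chunk = typed[i:i + size]
            if len(chunk) == size and chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += size
                break
        else:
            out.append(typed[i])  # no match: keep the letter as typed
            i += 1
    return "".join(out)


print(romaji_to_hiragana("nihongo"))  # にほんご
print(romaji_to_hiragana("kana"))     # かな
```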

Sorry I can't give an exact description now; I'm on my work computer, which doesn't have Asian language support installed.

ai4spam
22nd June 2005, 19:20
Thanks for the report, good to know it works.
As for getting meaningful characters instead of question marks, again, set the SHIFTJIS CharSet in the General Options window, then set the appropriate font in the OCR window using the Font button (i.e., use one that covers the Unicode range you want).
The next version (due in a few hours) will use Arial Unicode MS by default, if available (it's installed with recent versions of MS Office).
Remember, only vsFilter displays Unicode subtitles at the moment (as far as I know). You can try saving as ANSI, but I have no idea what the results will be for Japanese (I think it works for Hebrew and Chinese Big5).
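The question marks and the uncertainty about saving Japanese as ANSI both come down to whether each character has a slot in the target code page. A minimal sketch using Python's codecs with `errors="replace"` as a stand-in for whatever conversion SubRip actually performs (an assumption); `save_as_ansi` is a hypothetical helper. On Western Windows the ANSI code page is 1252, which has no Japanese characters, while on Japanese Windows it is 932.

```python
def save_as_ansi(text: str, codepage: int) -> bytes:
    """Encode text in a Windows 'ANSI' code page; characters that do not exist there become '?'."""
    return text.encode(f"cp{codepage}", errors="replace")


line = "こんにちは"
print(save_as_ansi(line, 1252))  # b'?????': Western code page, nothing survives
print(save_as_ansi(line, 932))   # real Shift-JIS bytes, 2 per character

# Only the code page that actually contains the characters gives the text back.
assert save_as_ansi(line, 932).decode("cp932") == line
assert save_as_ansi(line, 1252).decode("cp1252") == "?????"
```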

Edit: Beta 9 allows font selection, just select a font that has the SHIFTJIS CharSet available (it will show up in the list).

lankou
23rd June 2005, 06:43

Well, I did not have much luck, as my screen capture shows.


I see the main window. I select the movie and can see it, and detection appears to work, since a new window (New character) pops up.

Each kanji seems well delimited by the little red rectangle, but after that I don't know what to do. There is a combo box, but it has no Japanese option.
I chose MS Mincho as the font (it contains Japanese characters). If I click Character Map, I can see the kanji in my chosen font. I tried typing the Unicode number in hex, but it did not help.

So I am at a loss as to what to do next.

Thanks and regards
Lankou

lankou
23rd June 2005, 11:20


Actually, it worked :) I forgot to use the IME and switch to Japanese :mad:

I'll post a screen shot.

By the way, is there a way to stop the New character window from flashing red? :)

Thanks and regards
Lankou

ai4spam
23rd June 2005, 12:28
Actually, it worked :) I forgot to use the IME and switch to Japanese :mad:
I'll post a screen shot.
Cool, thanks.

By the way, is there a way to stop the New character window from flashing red? :)
Yes, uncheck the Wake me up! checkbox in the Options window.

lankou
23rd June 2005, 16:23

Great!

Thanks for this fantastic software!!

Regards
Lankou.

darkavatar1470
23rd June 2005, 16:48
That's cool. I assume it works for hiragana characters too, right?

Now I can OCR my DVDs with Japanese subs...
By the way, there is a nice Chinese subtitle OCR tool called "SubOCR"; it works with both GB and Big5 VobSub *.idx/*.sub files.

ai4spam
23rd June 2005, 18:42
By the way, there is a nice Chinese subtitle OCR tool called "SubOCR"; it works with both GB and Big5 VobSub *.idx/*.sub files.
What does it do, exactly? I mean, .sub and .idx files contain bitmaps for subtitles, not text. There are at least 2 programs by that name, one by DarkCracker (I think he's French) and another one which is Chinese, but I have no idea how to download it.
SubRip will ask you if you want to save the subs as Unicode, and if not, it will try to convert them to ANSI. The catch is that there are only 224 characters in the ANSI one-byte set, so I don't know if conversions to GB and Big5 work. It would be nice to get confirmation if they do, and feedback on whether the default CharSet-CodePage mapping I put in is valid.
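For anyone checking the CharSet-CodePage defaults: Windows GDI defines numeric CharSet constants in wingdi.h, and each is conventionally paired with one ANSI code page. The sketch below lists a few common pairs and tests whether a string survives conversion without loss. SubRip's own table may differ, and `CHARSET_TO_CODEPAGE` and `converts_losslessly` are hypothetical names.

```python
# A few Windows GDI CharSet constants (wingdi.h) and the code page usually paired with each.
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET (Western European)
    128: 932,   # SHIFTJIS_CHARSET (Japanese)
    129: 949,   # HANGUL_CHARSET (Korean)
    134: 936,   # GB2312_CHARSET (Simplified Chinese)
    136: 950,   # CHINESEBIG5_CHARSET (Traditional Chinese)
    177: 1255,  # HEBREW_CHARSET
}


def converts_losslessly(text: str, charset: int) -> bool:
    """True if every character of text exists in the code page paired with charset."""
    codec = f"cp{CHARSET_TO_CODEPAGE[charset]}"
    try:
        text.encode(codec)  # strict mode raises on any unmappable character
    except UnicodeEncodeError:
        return False
    return True


for charset in (0, 128, 134, 136):
    print(charset, converts_losslessly("中文", charset))
```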

darkavatar1470
25th June 2005, 07:13
I mean it can OCR both traditional and simplified Chinese, and it can output *.srt files in Big5 or GB that are usable in Win98.
The output file is a little buggy; you need to use Subresync from VobSub 2.23 to fix/convert it into something other subtitle tools accept.

It uses an OCR algorithm ripped from commercial software, so you don't need to teach it 5000 Chinese characters... so I'm not actually going to use SubRip on Chinese stuff. Sorry!

lankou
25th June 2005, 09:26
That's cool. I assume it works for hiragana characters too, right?


Yes, if you have the IME, it works for kanji, hiragana and katakana.

Lankou

ai4spam
27th June 2005, 22:27
It uses an OCR algorithm ripped from commercial software, so you don't need to teach it 5000 Chinese characters... so I'm not actually going to use SubRip on Chinese stuff. Sorry!
Well, I don't know how Big5 or GB work with only 224 characters available. Is there some escape character, or...? And what does the rest of the application look like? Is it a rip-off of SubRip ;)? Do you still type in characters it can't recognize? What font do you use?
As for the 5000... in your experience, how "alike" are the subtitles? I mean, do subtitles from different DVDs look similar? I'm asking because, theoretically, you could use the "Fill matrix from text" feature (with some tweaks to select the character range, which currently defaults to Latin letters only) to fill your character matrix, and at worst you'd have to press Ctrl-Enter to confirm each guess. I'll see what I can do about letting users select the character range, if you're willing to give me some feedback on how it works.
There is a freely available OCR library that I could use for SubRip, but who will train its neural network on all the Chinese characters?
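On the escape-character question: Big5 and GB are double-byte code pages, so there is no escape sequence. Instead, byte values from 0x81 up act as lead bytes: a lead byte together with the byte after it encodes one character, and anything below 0x81 is a single-byte (mostly ASCII) character, which is how these code pages fit thousands of characters into a byte stream. A minimal sketch, assuming that simplified lead-byte rule for cp950 (Big5) and cp936 (GBK, a superset of GB2312); `split_dbcs` is a hypothetical helper. The trail byte can fall in the ASCII range, so a reader has to track lead bytes rather than scan bytes in isolation.

```python
def split_dbcs(data: bytes) -> list[bytes]:
    """Split Big5/GBK bytes into one chunk per character.

    Simplified rule: a byte >= 0x81 is a lead byte and takes the next byte with it;
    any other byte is a character on its own."""
    chunks, i = [], 0
    while i < len(data):
        size = 2 if data[i] >= 0x81 else 1
        chunks.append(data[i:i + size])
        i += size
    return chunks


text = "SubRip 中文 OCR"
for codec in ("cp950", "cp936"):  # Big5 and GBK
    chunks = split_dbcs(text.encode(codec))
    print(codec, [len(c) for c in chunks])
    assert [c.decode(codec) for c in chunks] == list(text)
```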
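A character-range option for "Fill matrix from text" could, for instance, boil down to keeping only the characters of the text that fall in chosen Unicode blocks. A sketch, assuming the standard block boundaries listed below; `RANGES` and `characters_in` are hypothetical and not SubRip's actual implementation.

```python
# Standard Unicode block boundaries (inclusive).
RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "cjk_unified": (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common kanji/hanzi)
}


def characters_in(text: str, selected: list[str]) -> list[str]:
    """Unique characters of text inside the selected ranges, in first-seen order."""
    bounds = [RANGES[name] for name in selected]
    seen: dict[str, None] = {}  # dicts keep insertion order
    for ch in text:
        if any(lo <= ord(ch) <= hi for lo, hi in bounds):
            seen.setdefault(ch)
    return list(seen)


sample = "日本語のテキスト、日本"
print(characters_in(sample, ["cjk_unified"]))           # ['日', '本', '語']
print(characters_in(sample, ["hiragana", "katakana"]))  # ['の', 'テ', 'キ', 'ス', 'ト']
```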