UNICODE characters in the subs...
ukb008
26th April 2009, 05:22
Hi, PROs
I've made an .avi from one of my DVDs. The spoken language is English, but some lines are spoken in an Indian language (Bengali), and the subs translate them into English. I want to display exactly what the characters say, not the translation. There are text-input tools that work with MS Office applications and can enter Bengali words. But when I replace the relevant lines of a subtitle text file with this Bengali script, the file can only be saved in a UNICODE format. Saving it as ANSI turns the Bengali parts into "????????????????", both in the text file and during playback, and the subtitle display during playback apparently can show only ANSI characters.
1. Is there any subtitle-display software that will show UNICODE text characters as they are?
2. Is there any way to make MS Word or Notepad save UNICODE characters in an ANSI file while keeping how they look?
3. Are foreign characters always UNICODE? Can't we get them in ANSI?
Maybe I'm not asking the right questions. Please tell me what to do, other than converting the text subs into picture subs such as .idx + .sub.
Regards.
PROBLEM SOLVED!
I've found that Media Player Classic Home Cinema v 1.2.908.0 (http://sourceforge.net/project/downloading.php?group_id=170561&filename=mplayerc_homecinema_x86_v1.2.908.0.zip&a=77136836) has great UNICODE support. This was mentioned by LUCHOO in a post below.
This is for everyone interested in this issue.
Regards.
Adub
26th April 2009, 08:30
To answer most of your questions: most foreign characters have to be encoded in Unicode, because ANSI was designed mainly with English in mind and covers little else.
However, have you tried using Aegisub and .ass subs? I believe they can use Unicode encoding. Actually, I know they do, and here is a good post about Unicode by one of the authors of Aegisub. Clicky. (http://www.aegisub.net/2008/10/unicode-utf-8-utf-16-ucs-2-in-nutshell.html)
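Adub's point about ANSI, and the UTF-8/UTF-16 distinction covered in the linked Aegisub post, can be seen directly in Python. This is a minimal sketch, assuming "ANSI" means the Windows-1252 code page (the usual default on English-language Windows). The Bengali phrase is the one ukb008 quotes later in the thread, spelled with escapes so the source stays plain ASCII.

```python
# Bengali phrase from the thread, written as code point escapes
phrase = "\u0995\u09bf \u09aa\u09c7\u09b2\u09c7?"

# Assumption: "ANSI" = Windows-1252, which has no Bengali letters.
# A lossy save replaces each unmappable character with "?", much like
# the "????" described in the first post.
ansi_bytes = phrase.encode("cp1252", errors="replace")

# UTF-8 and UTF-16 can both represent every Unicode character.
utf8_bytes = phrase.encode("utf-8")       # 3 bytes per Bengali character here
utf16_bytes = phrase.encode("utf-16-le")  # 2 bytes per character here

print(ansi_bytes)                           # b'?? ?????'
print(len(utf8_bytes), len(utf16_bytes))    # 20 16
print("\u0995".encode("utf-8").hex(" "))    # e0 a6 95
```

Once the Bengali text has gone through the ANSI step, the "?" bytes are all that is left in the file, so no player can recover the original script from it.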
jmartinr
26th April 2009, 08:37
Try UTF-8 for your character encoding.
Notepad++ does a fine job converting text to UTF-8.
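The same conversion can be approximated in a few lines of Python when Notepad++ is not at hand. This is a minimal sketch, assuming the input was saved with Notepad's "Unicode" option (UTF-16 with a byte order mark). The function name `convert_srt_to_utf8` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_srt_to_utf8(src: Path, dst: Path, with_bom: bool = True) -> None:
    """Re-encode a UTF-16 subtitle file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec uses the BOM to pick the byte order and drops it.
    text = src.read_text(encoding="utf-16")
    # "utf-8-sig" writes the EF BB BF signature; plain "utf-8" omits it.
    dst.write_text(text, encoding="utf-8-sig" if with_bom else "utf-8")


# Usage example with a throwaway file standing in for a real .srt
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "movie.unicode.srt"
    dst = Path(tmp) / "movie.srt"
    cue = "1\n00:00:01,000 --> 00:00:03,000\n\u0995\u09bf \u09aa\u09c7\u09b2\u09c7?\n"
    src.write_text(cue, encoding="utf-16")  # BOM + platform byte order
    convert_srt_to_utf8(src, dst)
    print(dst.read_bytes()[:3])             # b'\xef\xbb\xbf'
```

Keeping the BOM (`with_bom=True`) is the safer default for Windows players of this era, which tend to rely on it to recognise UTF-8.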
LUCHOO
26th April 2009, 14:38
I'm adding Japanese characters to a Spanish .srt subtitle and saving it as Unicode,
and during playback it looks good in MPC.
b66pak
26th April 2009, 18:37
If you are using Win XP, you must install Bengali language support:
http://www.telegraphindia.com/1070917/asp/knowhow/story_8323317.asp
P.S. Here is a virtual keyboard for Bengali:
http://www.gate2home.com/?language=bn
or use the one provided by Win XP:
Start > All Programs > Accessories > Accessibility > On-Screen Keyboard
Control Panel > Regional and Language Options > Languages > Details > Language Bar > Show the Language bar on the desktop
Now, with Left ALT + SHIFT (the default), you can switch the on-screen keyboard layout (or switch it from the language bar on the taskbar).
ukb008
27th April 2009, 06:41
Hello jmartinr
Thanks for your post. I have encoded my Bengali text in UTF-8 in Notepad. Opened in Notepad, it looks fine. During playback (in VLC/WMP) it looks like
কি পেলে?
Maybe I am doing something wrong somewhere?
Regards.
*************
Hello LUCHOO
Thanks for your post. I always thought MPC was a command-line player, so I avoided it. Does it have a GUI or something an ordinary user can work with?
Regards.
*************
Hello b66pak
Thanks for your post. So if I type Bengali script using WinXP's own built-in language support, can the file be saved as an ANSI .srt and still show the Bengali script on playback? I'll try this and report.
Regards.
Report of experiment:
Windows XP's built-in language support does indeed put Bengali script into Notepad, but only saving as UNICODE keeps the characters. Of course, playback of this UNICODE file still displays them as "??????????"
Regards.
*************
Hello Merlin7777
The Aegisub and .ass idea is fine. I'll implement it and report. Thanks and regards,
jmartinr
27th April 2009, 14:09
I suppose you have an SRT file. Does the SRT file have a UTF-8 BOM (EF BB BF) in front?
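jmartinr's question can be answered without a hex editor. Text editors hide the BOM, so it never appears as visible letters in the file. Below is a minimal Python sketch that reads the first bytes and reports which BOM, if any, is there. The helper names `detect_bom` and `hex_head` are hypothetical; the byte signatures are the constants from Python's standard `codecs` module. It assumes Notepad's "Unicode" and "Unicode big endian" options correspond to UTF-16 LE and UTF-16 BE.

```python
import codecs

# Known signatures (UTF-32 is not checked in this sketch)
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF (Notepad's "UTF-8")
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE (Notepad's "Unicode")
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF (Notepad's "Unicode big endian")
]


def detect_bom(path: str) -> str:
    """Return the encoding implied by the file's byte order mark, if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return "no BOM (ANSI, or UTF-8 without BOM)"


def hex_head(path: str, n: int = 8) -> str:
    """Show the first n bytes in hex, the way a hex editor would."""
    with open(path, "rb") as f:
        return f.read(n).hex(" ").upper()
```

For example, `detect_bom("movie.srt")` (hypothetical file name) on a file saved by Notepad as UTF-8 should report "UTF-8", and `hex_head` should begin with `EF BB BF`.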
Midzuki
27th April 2009, 16:17
Notepad doesn't have the option to save as UTF-8 without the BOM.
Both VSFilter and ffdshow fully support UTF-8/Unicode subtitles.
You can even display several different charsets at the same time,
without the need for the stupid "codepages". I recommend that you take a long look at the Subtitle forum of VideoHelp dot com.
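Midzuki's point about mixing charsets can be illustrated with one SRT cue containing Spanish, Japanese and Bengali lines (the combinations discussed in this thread; the sample strings are illustrative only). The sketch checks whether a few encodings can hold the whole cue. Treating cp1252 (Western) and cp932 (Japanese) as the relevant "ANSI" code pages is an assumption for illustration; each one fails on at least one script, while UTF-8 holds all of them.

```python
# One SRT cue mixing three scripts (sample text is illustrative)
cue = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "\u00bfQu\u00e9 pasa?\n"                    # Spanish
    "\u3053\u3093\u306b\u3061\u306f\n"          # Japanese (hiragana)
    "\u0995\u09bf \u09aa\u09c7\u09b2\u09c7?\n"  # Bengali
)


def fits(text: str, encoding: str) -> bool:
    """True if every character of text can be stored in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for enc in ("cp1252", "cp932", "utf-8"):
    print(enc, fits(cue, enc))
# cp1252 False  (no Japanese or Bengali)
# cp932 False   (no Bengali)
# utf-8 True
```

Because a single code page can only cover one group of languages, a file that has to show several scripts at once needs a Unicode encoding.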
ukb008
27th April 2009, 20:16
Hi, Midzuki
Amazing; I have ffdshow installed, and its icon appears at the bottom when I play a video file with matching subs. Yet the UNICODE characters look like "?????????????????". Why, I wonder! But OK, I'm going to visit the VideoHelp subtitle forum.
Regards.
Hi, jmartinr
In front of what? Where? (Sorry for being such a novice.) I haven't seen these letters (EF BB BF) in the files.
Regards.
jmartinr
27th April 2009, 23:59
As Midzuki commented, Notepad saves with a BOM, so that's probably not the problem.
Have you tried playing the video with MPC? You have to find out whether the problem is in the SRT file itself or in playback.
And you might take a look here: http://forum.videolan.org/viewtopic.php?f=2&t=55536
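One way to split "file problem" from "playback problem" is to inspect the raw bytes of the subtitle file. Below is a rough triage sketch in Python, not a complete validator. It assumes the file is meant to be UTF-8, treats the Unicode Bengali block (U+0980 to U+09FF) as the sign of intact text, and treats runs of three or more "?" as the sign of a lossy ANSI save. The function name `check_subtitle_bytes` is hypothetical.

```python
import re

BENGALI = re.compile("[\u0980-\u09ff]")  # Unicode Bengali block
QMARK_RUN = re.compile(r"\?{3,}")        # "???" left behind by a lossy ANSI save


def check_subtitle_bytes(data: bytes) -> str:
    """Rough triage: is the text damaged in the file, or only at playback?"""
    try:
        text = data.decode("utf-8-sig")  # also strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        return "not valid UTF-8 (maybe ANSI or UTF-16); re-save as UTF-8"
    if BENGALI.search(text):
        return "Bengali text is intact in the file; suspect the player or font"
    if QMARK_RUN.search(text):
        return "question marks are in the file itself; the text was lost when saving"
    return "no Bengali characters found"
```

To use it, pass the bytes of the subtitle file, for example `check_subtitle_bytes(open("movie.srt", "rb").read())` with your own file name. If the Bengali text is intact in the file, switching players (as the thread eventually did with MPC) is the next thing to try.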
ukb008
28th April 2009, 02:24
PROBLEM SOLVED!
Hi, LUCHOO
I have downloaded Media Player Classic Home Cinema v 1.2.908.0 (the latest as of now), and found that, as you have said, it has great UNICODE support.
So, the problem of playing foreign scripts in UNICODE has been solved.
Thanks to all of you, and you have my regards.