Trying to render Polish subtitles with ffmpeg, many chars aren't shown properly
konstantin1
12th March 2018, 18:59
I have downloaded a Polish subtitle file for a movie, and I am trying to render it onto the video with ffmpeg. As far as I can tell, the subtitle file is encoded as ISO-8859-2: to convert it to the ASS subtitle format, I had to use the ffmpeg option -sub_charenc iso-8859-2
After that, I can see many accented Polish characters in the resulting UTF-8 encoded .ass subtitle file. However, some of the accented Polish characters don't show up correctly. For example, when I open the file in the Geany text editor, I see the following:
https://i.imgur.com/M7kWsuK.png
Even when I render the text with the ffmpeg option ass=subtitle_file.ass, I get similar results: Polish characters with diacritics don't appear properly in the rendered text.
What should I do to render the Polish subtitles properly? Should I use a different (TTF) font? And which (TTF) fonts support Polish?
Midzuki
13th March 2018, 09:39
I know next to nothing about FOSS fonts, so I will recommend some well-known Windows fonts that support the Polish character set:
Arial, Arial Unicode MS, Times New Roman, Tahoma, Georgia, Trebuchet MS, Verdana, Consolas, Lucida Console.
Regarding the subtitle file itself: you'd better convert it to a UTF-16 SSA or ASS file with a dedicated subtitle editor, not with ffmpeg.
Dulus_No
13th March 2018, 10:33
https://bboxtype.com/typefaces/FiraGO/
Ghitulescu
13th March 2018, 10:40
The OP deserves that if he chooses to stray from well-known and well-implemented methods.
If anything, subtitle processing software is ahead of all other types of video software, so there are plenty of choices.
sneaker_ger
13th March 2018, 15:54
Originally Posted by Midzuki: "Regarding the subtitle file itself: you'd better convert it to a UTF-16 SSA or ASS file with a dedicated subtitle editor, not with ffmpeg."
UTF-8 is the de facto standard for ASS. (And the most common container for ASS, Matroska, uses UTF-8 for all text as well.)
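If you'd rather do the re-encoding yourself instead of relying on a tool's guess, here is a minimal Python sketch. The function name transcode_to_utf8 and its parameters are made up for illustration, and it assumes you already know the real source encoding and pass it in explicitly:

```python
from pathlib import Path


def transcode_to_utf8(src, dst, source_encoding):
    """Re-encode a text subtitle file as UTF-8.

    source_encoding must be the file's real encoding (for example
    "iso-8859-2" or "cp1250"); nothing is auto-detected here.
    """
    # Work on raw bytes so that line endings are passed through untouched.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Passing the wrong source_encoding will usually not raise an error. It just produces wrong characters, so check the output by eye.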
mkver
13th March 2018, 17:06
1. If your font doesn't support some characters, you get a placeholder symbol for the missing glyph (often a square). This is not what you see here.
2. Your original subtitle file is probably Windows-1250, not ISO 8859-2. Reason: look at line 105. IND is the Unicode code point 0x84, but ISO 8859-2 does not define that position at all (https://en.wikipedia.org/wiki/ISO/IEC_8859-2#Code_page_layout). A proper ISO 8859-2 text can't contain IND, and (barring bugs in the converter) no IND can appear in a file converted from ISO 8859-2 to Unicode. The same goes for STS, code point 0x93. But in Windows-1250, 0x84 is „ and 0x93 is “, which fits perfectly. So iconv seems to map the bytes that are undefined in the input encoding straight to the Unicode code points with the same values, and that explains what you see.
(Similarly, ST, code point 0x9C, is ś in Windows-1250.)
3. Notice that Windows-1250 and ISO 8859-2 do not agree at every position where both are defined. 0xB9 is š in ISO 8859-2, but ą in Windows-1250. Line 104 reads "patrzą" if it is treated as Windows-1250, but "patrzš" if it is treated as ISO 8859-2, which is what you did. Google Translate doesn't think that "patrzš" is proper Polish (and a Google search for it gives just 438 results); "patrzą" is recognized as Polish and gives 20,000,000 search results.
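The three points above can be checked with a few lines of Python. This is only a sketch using Python's built-in codecs, which are not necessarily what ffmpeg or iconv use internally. The sample bytes are typed in by hand from the values discussed above, and the helper c1_controls is a made-up name:

```python
# "patrz" followed by the single byte 0xB9
raw = b"patrz\xb9"
as_latin2 = raw.decode("iso-8859-2")  # 'patrzš'
as_cp1250 = raw.decode("cp1250")      # 'patrzą'

# The bytes that showed up as IND, STS and ST
odd = bytes([0x84, 0x93, 0x9C])
odd_latin2 = odd.decode("iso-8859-2")  # C1 control characters U+0084, U+0093, U+009C
odd_cp1250 = odd.decode("cp1250")      # '„“ś'


def c1_controls(text):
    """Return the distinct C1 control characters (U+0080..U+009F) in text.

    Finding any of them in decoded subtitle text is a hint (not proof)
    that the wrong single-byte encoding was used when decoding.
    """
    return sorted({ch for ch in text if 0x80 <= ord(ch) <= 0x9F})


suspicious = c1_controls(odd_latin2)  # three characters
clean = c1_controls(odd_cp1250)       # empty list
```

Running c1_controls over the converted .ass text is a cheap way to spot this kind of mix-up before rendering anything.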