Subtitle Edit
Nikse555
17th January 2014, 21:29
@lansing:
I might just have added "select even/odd lines" to Edit -> Modify selection.
The "Untitled" issue might have been fixed too... I think it was related to the undo function.
The waveform mouse wheel scrolling direction can be changed in Options -> Settings -> Waveform/spectrogram.
http://www.nikse.dk/SubtitleEdit.zip (beta, portable version)
How does it work?
@Nozdrum: You can try to open the .mp4 file in SE via File -> Open - to check if it contains subtitle tracks (vobsub/bluray sup).
SE cannot extract hardcoded/burned-in subtitles, sorry.
lansing
18th January 2014, 09:04
The select even/odd function works flawlessly, thanks for adding it.
I have another request: please add hover-to-focus to the waveform, because I constantly have to switch back and forth between the list view and the waveform window while syncing subtitles. Right now I have to click somewhere inside the waveform to give it focus, but doing so moves the waveform cursor's current position, which is not good. It also adds several hundred unnecessary clicks to the workflow.
Nikse555
18th January 2014, 10:13
@lansing: nice idea about the waveform auto-focus on mouse over :) - I've added it in latest svn and here: http://www.nikse.dk/SubtitleEdit.zip
lansing
18th January 2014, 11:01
Doesn't quite work yet: it still keeps focus on the waveform even after I hover out.
Nikse555
18th January 2014, 14:18
@lansing: Focusing something other than the currently focused control on mouse out/leave sounds complicated/confusing: what should be focused, and how do we keep focus?
Instead I've made two custom shortcuts: one for shifting focus from the list view to the waveform, and one for shifting focus from the waveform to the list view... anyone got a better idea?
Test version: http://www.nikse.dk/SubtitleEdit.zip
lansing
18th January 2014, 20:03
I'm thinking of switching focus depending on the view window, since those are the main areas that use mouse scrolling.
So if the current view window is "list view", focus would switch between the waveform and the list view on mouse hover: hovering over the waveform focuses the waveform, and moving out of the waveform focuses the list view. If the "source view" window is selected, focus would switch between the waveform and the source view instead.
Betsy25
20th January 2014, 21:11
Hi,
The Hunspell spelling engine is quite clunky at finding words: things like 'Forrest' pop up, reporting that Forrest' is not a recognized word (this happens with the dictionary set to Dutch), plus all kinds of clumsy "mistakes" like one-letter words.
Would it be too hard to implement a smarter engine like Aspell, please?
Nikse555
20th January 2014, 21:45
@lansing: OK, I've added an option to also focus the "list view" on mouse enter: http://www.nikse.dk/SubtitleEdit.zip
@Betsy25: Prompt for unknown "One letter words" is an option in Settings -> Tools. Perhaps you can find a better Dutch dictionary? Hunspell seems to be used more than Aspell... Yet another option is the "Word spell check" plugin.
minhjirachi
21st January 2014, 04:55
I still get the center alignment bug. Please fix it.
Nikse555
21st January 2014, 19:46
How is this version: http://www.nikse.dk/SubtitleEdit.zip ?
If it still has center alignment issues, then which text + font causes the problem?
minhjirachi
23rd January 2014, 15:19
I have three scripts that cause the centering problem.
First script:
<font color="#FF0000">VAV STUDIO Hân hạnh Giới thiệu bộ phim:
<b><i>ĐỊCH NHÂN KIỆT</i></b>
</font>
It causes the problem in version 3.3.12 but no longer in 3.3.13.
Second script and third script:
<font color="#FFFF80"><b><i>Phụ đề Việt ngữ bởi
trwng_tamphong ~ phudeviet.org</i></b>
</font>
<font color="#FFFF80">Phim thuyết minh độc quyền tại
<b><i>www.thuyetminh.net</i></b>
</font>
These still happen in the latest version of Subtitle Edit.
Betsy25
23rd January 2014, 16:27
@Betsy25: Prompt for unknown "One letter words" is an option in Settings -> Tools. Perhaps you can find a better Dutch dictionary? Hunspell seems to be used more than Aspell... Yet another option is the "Word spell check" plugin.
Yeah, I know that option; the problem is that Dutch has loads of words with a ' in front, like 't for het (it), 's for eens (once), 'n for een (a), etc.
The Hunspell engine doesn't seem to handle them, and neither does the Word spell check plugin: with that option unchecked, it still halts at every occurrence of such words :(
Nikse555
23rd January 2014, 18:08
@minhjirachi: thx for the examples :)
Should work now: http://www.nikse.dk/SubtitleEdit.zip
@Betsy25: Hm, it might actually be my code that prevented these 't, 's and 'n from working... is the above version any better?
Also, I think there's a slightly newer Dutch dictionary on this page: http://www.opentaal.org/bestanden/doc_download/20-woordenlijst-v-210g-voor-openofficeorg-3 (rename .oxt to .zip and copy the .aff and .dic files to SE\Dictionaries)
Ghitulescu
23rd January 2014, 18:56
Any progress with the extraction of subtitles from TS/M2TS streams? In particular from HD broadcasts?
Nikse555
23rd January 2014, 19:06
The above version should be able to open and extract bitmap based subtitles from .ts files... and it might work with .m2ts files as well (just use File -> Open).
Please test :)
Betsy25
23rd January 2014, 21:12
@Betsy25: Hm, it might actually be my code that prevented these 't 's and 'n from working... is this the above version any better?
Also, there's a slightly newer spell check on this page I think: http://www.opentaal.org/bestanden/doc_download/20-woordenlijst-v-210g-voor-openofficeorg-3 (rename oxt to zip and copy .aff and .dic files to SE\Dictionaries)
Hi Nikse, your updated version is way better, but there are still some other instances like 'm for hem (him), 'r for haar (her), 'k for ik (I), and 'n is still flagged.
In the "Word not found" text field it asks what to do with m etc. (without the preceding '). I can't really add these to the user dictionary, because that could quickly lead to a really messy user dictionary. It would be correct if it offered to add the preceding ' AND the letter together to the dictionary (though that might be worse if the language were English).
The problem I pointed out before still pops up for occurrences like 'Forrest': nothing is painted in red, and the "Word not found" text field again asks what to do with Forrest'.
BTW, replacing the Dutch dictionary didn't change anything about the problems I'm having.
Duh, I just realized I can of course add all Dutch '{one letter} items to the "User wordlist" as a workaround.
P.S. Sorry if I'm hard to understand, I'm Dutch by nature.
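One way to handle the Dutch elisions described above is to keep a leading apostrophe only when the token is a known elision, and otherwise treat the apostrophe as an opening quote (as in 'Forrest'). The Python sketch below is not Subtitle Edit's code; the elision whitelist (taken from the examples in this thread) and the function names are hypothetical, and the letter-based tokenizer assumes the dictionary is set to Dutch.

```python
import re

# Hypothetical whitelist built from the examples in this thread.
DUTCH_ELISIONS = {"'t", "'s", "'n", "'m", "'r", "'k"}

# An optional leading apostrophe followed by letters (Unicode-aware, so "hè" is one word).
WORD_RE = re.compile(r"'?[^\W\d_]+")


def tokenize(text):
    """Split text into words, keeping a leading ' only for known Dutch elisions."""
    words = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        if word.startswith("'") and word.lower() not in DUTCH_ELISIONS:
            word = word[1:]  # an opening quote as in 'Forrest', not an elision
        words.append(word)
    return words


def unknown_words(text, dictionary):
    """Return words that are neither elisions nor in the (lowercase) dictionary."""
    return [
        w for w in tokenize(text)
        if w.lower() not in DUTCH_ELISIONS and w.lower() not in dictionary
    ]


print(unknown_words("'t Is 'n test met 'Forrest'.", {"is", "test", "met"}))  # ['Forrest']
```

Because a trailing apostrophe is never part of a token here, Forrest' is looked up as Forrest, and 't or 'm never reach the dictionary at all.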
Betsy25
23rd January 2014, 22:29
One small misbehavior regarding the "Break long lines" feature: it would be good to prevent breaking between a <br /> and a - sign.
For instance, it sometimes breaks lines like:
Heeft hij onlangs 166 niet gerepareerd?
- Jack heeft onlangs alle drones gerepareerd.
into...
Heeft hij onlangs 166 niet gerepareerd? -
Jack heeft onlangs alle drones gerepareerd.
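A minimal sketch of the rule requested above: if a subtitle is a two-speaker dialog (sentence-ending punctuation followed by a dash), always break right before the dash; otherwise break at the space nearest the middle. This is not SE's actual "Break long lines" algorithm; the function name, the 43-character default, and the use of "\n" as the line separator are assumptions.

```python
import re

# Whitespace after . ? or ! that is directly followed by a dialog dash and a space.
DIALOG_BREAK = re.compile(r"(?<=[.?!])\s+(?=-\s)")


def break_line(text: str, max_len: int = 43) -> str:
    """Re-break a subtitle, never separating a dialog dash from the start of its line."""
    single = " ".join(text.split())  # flatten existing line breaks and extra spaces
    parts = DIALOG_BREAK.split(single, maxsplit=1)
    if len(parts) == 2:
        return "\n".join(parts)  # dialog: the second speaker's line starts with "- "
    if len(single) <= max_len:
        return single
    spaces = [i for i, ch in enumerate(single) if ch == " "]
    if not spaces:
        return single
    middle = len(single) // 2
    cut = min(spaces, key=lambda i: abs(i - middle))  # space closest to the middle
    return single[:cut] + "\n" + single[cut + 1:]


print(break_line("Heeft hij onlangs 166 niet gerepareerd?\n"
                 "- Jack heeft onlangs alle drones gerepareerd."))
```

Checking for the dialog pattern before any length balancing is what keeps the "- Jack" line intact, even when a middle break would give more even line lengths.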
Nikse555
24th January 2014, 08:53
@Betsy25: I've tried to improve the 'Forrest' issue + tried to fix the 'break' issue too: http://www.nikse.dk/SubtitleEdit.zip
Let me know how it works!
kalehrl
25th January 2014, 20:53
Hi Nikse
I opened an H.264 .ts file with subtitles captured from satellite and tried to OCR the subtitles. Character recognition went fine. However, the resulting .srt file had incorrect start times: they were shifted by about 10 hours, e.g. the first subtitle was at 10:00:03,000 instead of 00:00:03,000, and so on. Please try the attached sample to reproduce the issue. The subtitle language is Serbian.
https://mega.co.nz/#!l9gCgDDL!vjLjmKS_wnwSyW90EOBRiDF4sHqigWyn-UnnRASdgf8
Also, with the latest version some .mkv files have no sound, but I know DirectShow works fine because MPC plays them.
Nikse555
26th January 2014, 15:52
@kalehrl: thx for the file :)
SE just takes the raw time codes - perhaps subtracting the first video time code will give better time codes? Like this: http://www.nikse.dk/SubtitleEdit.zip
About the .mkv files - do you have the latest version of LAV Filters? https://code.google.com/p/lavfilters/
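Nikse's suggestion of subtracting the first video time code can be sketched as follows. MPEG-TS presentation time stamps (PTS) count at 90 kHz in a 33-bit field, so the difference is taken modulo 2^33 to survive a wraparound. The function names and the sample PTS values (chosen to reproduce a 10-hour offset like the one kalehrl reported) are hypothetical; this is not the code in SE.

```python
PTS_CLOCK_HZ = 90_000  # MPEG-TS PTS tick rate
PTS_WRAP = 1 << 33     # PTS is a 33-bit counter


def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT time code (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def rebase(subtitle_pts, first_video_pts):
    """Convert raw subtitle PTS values to SRT times relative to the first video frame."""
    times = []
    for pts in subtitle_pts:
        delta = (pts - first_video_pts) % PTS_WRAP  # handles a 33-bit wraparound
        times.append(ms_to_srt(delta * 1000 // PTS_CLOCK_HZ))
    return times


first_video = 10 * 3600 * PTS_CLOCK_HZ     # hypothetical stream starting at 10 h
subtitle = first_video + 3 * PTS_CLOCK_HZ  # subtitle 3 s into the video
print(ms_to_srt(subtitle * 1000 // PTS_CLOCK_HZ))  # raw: 10:00:03,000
print(rebase([subtitle], first_video))             # rebased: ['00:00:03,000']
```

A subtitle that starts before the first video frame would wrap to a very large value in this sketch; real code would need to decide how to handle that case.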
kalehrl
26th January 2014, 19:42
No, thank you for fixing this issue :)
I use the latest CCCP codec pack which includes LAV Filters 0.60.1.0-22da8ba and MPC-HC 1.7.1.322 (shows up as 333):
http://www.cccp-project.net/forums/index.php?topic=7105.0
jinkazuya
27th January 2014, 19:21
Would it be possible to add Chinese & Japanese spell checking and OCR auto-correction, please?
Also, is it possible to add words to the dictionaries so that in the future we don't have to retype or fix those spelling or word errors?
Betsy25
30th January 2014, 18:41
@Betsy25: I've tried to improve the 'Forrest' issue + tried to fix the 'break' issue too: http://www.nikse.dk/SubtitleEdit.zip
Let me know how it works!
Sorry for the late reply.
'Forrest' replace issue seems to work fine :o
Regarding the break issue: it doesn't show up in the "Fixes" list when running Fix common errors, and it doesn't show up even when I set the max. line length as low as 10 on the settings page. It's still flagged in red in the regular window, but it doesn't appear in the "Fixes" window. Is this how it's supposed to be?
EDIT: This was the "problem" line:
Heeft hij onlangs 166 niet gerepareerd?
- Jack heeft onlangs alle drones gerepareerd.
BTW: Dutch uses a lot of dash-connected words, like "mede-speler" (fellow player), "gevechts-troepen" (fighting troops), etc. Perhaps the spell checker could take that into account when the dictionary is set to Dutch? (Right now we get popups about what to do with "gevechts", but we need to be able to replace the whole dash-connected word, add it to the user dictionary, or look it up in the Dutch dictionary.)
( EDIT: I posted this on the SE Issue Tracker : issue 207 )
P.S.: Would it be a good idea to have an icon directly in the default window for the "Fix Common Errors..." function? I see someone already asked for this in issue 122.
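The dash-connected word request above amounts to looking up the whole word before falling back to its parts, so that a compound the user has added (or one the Dutch dictionary already lists) is accepted as a unit. A minimal Python sketch; the function name and the plain-set dictionaries are hypothetical stand-ins for Hunspell and the user word list, not SE's implementation.

```python
def is_known(word, dictionary, user_words):
    """Accept a word if the whole word, or every dash-separated part, is known."""
    w = word.lower()
    known = dictionary | user_words
    if w in known:
        return True  # whole compound, e.g. "gevechts-troepen" in the user list
    parts = [p for p in w.split("-") if p]
    return "-" in w and bool(parts) and all(p in known for p in parts)


dutch = {"mede", "speler", "troepen"}
print(is_known("mede-speler", dutch, set()))                      # True
print(is_known("gevechts-troepen", dutch, set()))                 # False
print(is_known("gevechts-troepen", dutch, {"gevechts-troepen"}))  # True
```

When this returns False, a "Word not found" dialog built on it would offer the whole word (gevechts-troepen, not just gevechts) for replacing or adding.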
Nikse555
4th February 2014, 18:44
@kalehrl: I just use quartz.dll for the video player... and for me LAV Filters works fine on its own.
@jinkazuya: Is a hunspell dictionary available? Or some open source code (with c# bindings)?
@Betsy25: SE spell check already tries to spell check dash-connected words.
I'll add a "Fix common errors" icon to the main window if someone can make (or find) an icon that matches with the other icons...
Also, SE has moved to GitHub: https://github.com/SubtitleEdit/subtitleedit
(as Google has discontinued downloads on code.google.com)
mood
4th February 2014, 23:15
Icons for commonly used tasks in the main window: Fix Common Errors, MS Word spell check, and the Split Long Lines window.
It would be great if you added these icons to the main window.
Nikse555
5th February 2014, 22:24
Latest beta is here (with "Fix common errors" available as a toolbar button): http://www.nikse.dk/SubtitleEdit.zip (SE 3.3.13 should be out soon)
I need icons for the toolbar... I'm just not good with gfx ;)
lansing
7th February 2014, 20:16
I've used Microsoft Office Document Imaging for OCR of Traditional Chinese characters with another subtitle program called IdxSubOcr, and the accuracy is close to 99%. However, when I use the same option in Subtitle Edit, the accuracy is less than 5%: most of the time it doesn't recognize the characters, and the process is VERY slow. I tried changing the image palette, but there's hardly any improvement.
Nikse555
7th February 2014, 22:21
@lansing: I don't read Chinese, so perhaps you can find and compare the source code with SE? Perhaps they do some image scaling or some other tricks?
I did not have MODI installed, but I do now - http://www.microsoft.com/en-us/download/details.aspx?id=21581 (customize the setup and choose only MODI)
If someone wants to have a go at improving this, you can create a fork of the SE source code on GitHub: https://github.com/SubtitleEdit/subtitleedit/fork
lansing
10th February 2014, 17:14
Unfortunately I'm not a coder, and IdxSubOcr doesn't seem to be open source.
johner23
12th February 2014, 18:15
Hello, dear all.
@lansing: does IdxSubOcr have an English version, or an English translation? I could find it, but it's in Chinese, and I can't understand Chinese.
Thanks.
DMD
23rd February 2014, 21:47
SORRY
Edit...
Ghitulescu
9th March 2014, 20:13
The above version should be able to open and extract bitmap based subtitles from .ts files... and it might work with .m2ts files as well (just use File -> Open).
Please test :)
I did this today. However, I couldn't find any option or way to save them as they are (as images) rather than OCRed.
Nikse555
9th March 2014, 21:21
Yes, this is not very obvious... but try to right-click in the list view (also check the attached screenshot).
So the subtitle import from the .ts file worked? :)
Ghitulescu
10th March 2014, 08:08
Yes, it recognised the bitmaps :) but I didn't scroll far enough to see whether the colours were kept (ARTE HD uses different colours for different characters, so even a blind person could see who's talking).
I'll wait for the image... it hasn't been approved yet.
Ghitulescu
12th March 2014, 17:43
Well, this is what I did (before the image has been approved). And it worked.
It remains to be seen what all these options do (like Transparent background), i.e. whether they affect only the display or are saved within the subtitles.
Anyway, big thanks: the SUP saved by your software was recognised by many tools (like tsmuxer or bdsup2sub), unlike the one saved by ProjectX.
Music Fan
14th March 2014, 19:43
I was going to ask how to open DVB-SUB included in TS, but I just found out; I'm posting it for those who search: File, Open, file type: All files, choose the .ts, Open.
Ghitulescu
17th March 2014, 10:12
I think that drag'n'drop works too.
Music Fan
17th March 2014, 18:18
Right, I didn't think of this ;)
Music Fan
17th March 2014, 19:21
I discovered a little OCR bug in French: the t is sometimes (at least once) recognized as an l, while it is correctly detected as a t in another line of the same subtitle (same font, same size, exactly the same t). It was in the word "tes" (which means "your").
@Nikse555: it's in the .ts file I sent you a few weeks ago; you can test it if you still have it.
von Suppé
17th March 2014, 19:52
... the t is sometimes (at least one time) seen as a l while it is well detected as a t in another line of the same subtitle (same font, same size, exactly the same t).
Can it be that different adjacent characters/letters/signs also influence OCR sometimes? I tend to feel that way.
Music Fan
17th March 2014, 20:19
Probably ;
défaite : ok
tes : ko (seen as "les")
Astonishing, because in both cases the t is followed by an e.
I don't know if a dictionary is used during OCR; if so, I understand that défaite is not read as défaile because défaile does not exist in French, while les and tes both exist.
von Suppé
29th March 2014, 09:27
Nikse555, is it much work to make SE remember (and preferably save and load) the settings in the SUP export window? It doesn't seem to remember all the settings, even within the same session. At least the frame rate and shadow alpha channel are reset to their defaults every time.
minhjirachi
31st March 2014, 07:44
The "Export BDN XML/PNG" function doesn't save its settings. Please fix this.
Thank you so much.
Nikse555
13th April 2014, 13:46
Sorry, I cannot change a lot about how Tesseract works - for more info about Tesseract go here: https://code.google.com/p/tesseract-ocr/
I'll try to make SE remember all values from export in next version...
SE 3.3.15 is out - sub/idx files created by SE should now work in handbrake + gpac/mp4box - thx Ryan for fixing this (added an extra byte of value 255 in image data :)
von Suppé
14th April 2014, 08:01
Thanks Nikse555, much appreciated :)
Ghitulescu
14th April 2014, 08:29
To get it as close to perfection as possible, it could also remember the position of the DVB subtitles (by default they are placed in the middle). Sometimes subtitles are placed under the character who speaks, to help identify them. Others use different colours.
Thanks :)
dsmbr
19th April 2014, 00:12
Windows 8.1 (x64) - German
Subtitle Edit 3.3.15 (NET4) --->
1) The .NET 2-3.5 version crashed when starting OCR via Tesseract, so I had to switch to the .NET 4 version. That doesn't make any sense, since all .NET versions are included in Windows 8.1.
2) Error on using OCR via image compare.
System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\myusername\AppData\Roaming\Subtitle Edit\Ocr\German_Images.db'.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
at System.IO.FileStream..ctor(String path, FileMode mode)
at Nikse.SubtitleEdit.Forms.VobSubOcr.SaveCompareItem(NikseBitmap newTarget, String text, Boolean isItalic, Int32 expandCount)
at Nikse.SubtitleEdit.Forms.VobSubOcr.SplitAndOcrBitmapNormal(Bitmap bitmap, Int32 listViewIndex)
at Nikse.SubtitleEdit.Forms.VobSubOcr.MainLoop(Int32 max, Int32 i)
at Nikse.SubtitleEdit.Forms.VobSubOcr.mainOcrTimer_Tick(Object sender, EventArgs e)
at Nikse.SubtitleEdit.Forms.VobSubOcr.ButtonStartOcrClick(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
I had to create this missing file manually.
3) Installed the German dictionary, started OCR via Tesseract -> error
System.NullReferenceException: Object reference not set to an instance of an object.
at Nikse.SubtitleEdit.Forms.VobSubOcr.OcrViaTesseract(Bitmap bitmap, Int32 index)
at Nikse.SubtitleEdit.Forms.VobSubOcr.MainLoop(Int32 max, Int32 i)
at Nikse.SubtitleEdit.Forms.VobSubOcr.mainOcrTimer_Tick(Object sender, EventArgs e)
at Nikse.SubtitleEdit.Forms.VobSubOcr.ButtonStartOcrClick(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
Nikse555
29th April 2014, 17:59
Hi guys,
The "OCR via image compare" is surely broken in SE 3.3.15... sorry about that. I was working on a better image->letter splitter + a more compact/faster file format.
The Tesseract should still work though... just don't install Tesseract via the installer: http://www.nikse.dk/SubtitleEdit/Help#issues ;)
Tesseract should not have anything to do with the .net framework - and win 8 strangely only included .net framework 4.5 last I checked.
(programs compiled for .NET Framework 2.0 work on machines with .NET Framework 2-3.5, and programs compiled with .NET 4 work on machines with .NET Framework 4-4.5; but perhaps .NET programs can compile to native soon: http://msdn.microsoft.com/en-US/vstudio/dn642499.aspx )
von Suppé
30th April 2014, 07:00
Hi Nikse555
I encountered playback issues in MPC-HC with SUP files in an mkv container. Within the same SUP, some lines are displayed, others not. Setting aside whether MPC-HC has subtitle issues of its own, while testing I found that SE's SUP output is not consistent.
I exported an srt file as SUP several times, with exactly the same export settings of course. The output files have different hashes. I don't know whether this has anything to do with the problems in MPC-HC.
I checked several repeated outputs of EasySUP and GoSUP, and all of their SUP files have identical hashes.
VLC and my Dune mediaplayer play SE created SUPs without problems though. I also used them while authoring blu-ray. The burned disks play fine; no subtitle problems.
Any thoughts, please?
Thanks in advance :)
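To narrow down where two supposedly identical SUP exports differ, one can hash both files and, if the hashes differ, report the first differing byte offset. The helper names below are hypothetical; the sketch only compares bytes and says nothing about why SE's output varies.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def first_difference(a: bytes, b: bytes):
    """Offset of the first differing byte, or None if a and b are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) == len(b):
        return None
    return min(len(a), len(b))  # one file is a prefix of the other


def compare_exports(path_a, path_b):
    """Return (same_hash, first_diff_offset) for two exported files."""
    a, b = Path(path_a).read_bytes(), Path(path_b).read_bytes()
    return sha256_hex(a) == sha256_hex(b), first_difference(a, b)
```

Knowing the offset lets you open both files in a hex viewer at that position and see whether the difference lies in timing data, image data, or padding.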
Betsy25
13th May 2014, 14:07
Hi Nikse,
- Perhaps this asks for a lot of code, but is some kind of "export/import settings" option on the agenda?
- Another problem: with Dutch subtitles, "Fix common errors..." (Fix common OCR errors) capitalizes lines that start with 't (the Dutch abbreviation of "het", English "it"). Example:
't Is geen rugzaktoerist, hè?
't Was alsof ik voor 'n afgrond stond.
get replaced by...
'T Is geen rugzaktoerist, hè?
'T Was alsof ik voor 'n afgrond stond.
(In Dutch, 't abbreviations are always lowercase. When one occurs at the start of a sentence, the following word must be capitalized instead.)
:o
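The capitalization rule Betsy describes can be sketched as: if a line starts with a Dutch elision, force the elision to lowercase and capitalize the next word instead. This is a hypothetical helper, not SE's "Fix common errors" code; the elision list is taken from the examples in this thread.

```python
# Hypothetical list of Dutch elisions that stay lowercase at the start of a sentence.
DUTCH_ELISIONS = ("'t", "'s", "'n", "'k", "'m", "'r")


def upper_first(word: str) -> str:
    return word[:1].upper() + word[1:]


def capitalize_line_start(line: str, dutch: bool = True) -> str:
    """Capitalize the start of a line; in Dutch, skip an elision and capitalize the next word."""
    words = line.split(" ")
    if not words[0]:
        return line
    if dutch and words[0].lower() in DUTCH_ELISIONS:
        words[0] = words[0].lower()           # 'T -> 't
        if len(words) > 1:
            words[1] = upper_first(words[1])  # is -> Is
    else:
        words[0] = upper_first(words[0])
    return " ".join(words)


print(capitalize_line_start("'T Is geen rugzaktoerist, hè?"))  # 't Is geen rugzaktoerist, hè?
```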