How can I strip the timecodes from SRT subtitles? [Archive]

Perenista

4th May 2020, 20:46

I have a video with two SRT subtitles.

The first one is synchronized perfectly. The second is not, and fixing it would be hard or take me too much time. However, I need both, since they are in different languages (one is English, the other Portuguese, and I want both languages).

Let's call the perfect subtitle A (english) and the wrong one B (portuguese).

So I had the following idea:

1) Save subtitle B as plain text, so that all timecodes are stripped from the file. That means that instead of:

************

1
00:00:10,969 --> 00:00:12,471
Look.

2
00:00:14,806 --> 00:00:17,600
And you said they were incompatible.

************

We would have:

************
Look.
And you said they were incompatible.
************

For lines 1 and 2 of said TXT.

2) Since subtitle A has the correct timecodes, I was thinking of doing this:

- Remove all text from subtitle A, and preserve only the timecodes.
- Insert all text from subtitle B into A. Then save as a new file.

Is there a way to do this?

Doing this manually would take a looooooooooong time, since there are 883 lines in the plain text file.
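
The first step of the idea above (dropping the index and timing lines and keeping only the text) could be sketched in Python as follows. This is only an illustration and not something posted in the thread: the name `strip_timecodes` is made up, and the sketch assumes a well-formed SRT in which cues are separated by blank lines and each cue is an index line, a timing line, then the text.

```python
import re

# Matches an SRT timing line such as "00:00:10,969 --> 00:00:12,471".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def strip_timecodes(srt_text):
    """Return only the subtitle text lines (hypothetical helper)."""
    out = []
    # Normalize line endings, then split into cues on blank lines.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Expect: index line, timing line, then one or more text lines.
        if len(lines) >= 3 and TIMING.match(lines[1].strip()):
            out.extend(line.strip() for line in lines[2:])
    return "\n".join(out)

sample = """1
00:00:10,969 --> 00:00:12,471
Look.

2
00:00:14,806 --> 00:00:17,600
And you said they were incompatible.
"""
print(strip_timecodes(sample))
```

Note that the plain-text result no longer records where one cue ends and the next begins, so a cue with two text lines becomes two lines of output. That is why pasting the text back line by line only works when every cue has exactly one line.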

StainlessS

4th May 2020, 23:17

For anyone who might want to consider this, providing more details might be a good idea.
Do both subs correspond exactly, with the exact same number of subtitles,
and does each subtitle number have the same meaning in each language?

Emulgator

5th May 2020, 00:39

Swap every 5th line ?
Maybe there is a script that can do this in notepad++.
Well, 2- and 3-liners are not covered by that approach...
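
Whether a fixed "every Nth line" swap could work at all can be checked by counting the text lines in each cue. A rough Python sketch of that check (the helper name `text_lines_per_cue` is hypothetical, and it assumes cues are separated by blank lines):

```python
import re

def text_lines_per_cue(srt_text):
    """Return the number of text lines in each cue (hypothetical helper)."""
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    # Each block is: index line, timing line, then the text lines.
    return [max(len(b.split("\n")) - 2, 0) for b in blocks if b.strip()]

# Made-up sample: the second cue is a 2-liner.
sample = """1
00:00:10,969 --> 00:00:12,471
Look.

2
00:00:14,806 --> 00:00:17,600
And you said
they were incompatible.
"""
counts = text_lines_per_cue(sample)
print(counts)                 # text line count of each cue
print(len(set(counts)) == 1)  # True only if every cue has the same length
```

If the second value is False, the cues do not sit at a constant stride and a line-based swap will drift after the first multi-line cue.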

StainlessS

5th May 2020, 02:05

OK, I've got an avisynth script that parses a srt file OK, all the way to the end. (PAL Alien Eng) [So that is a start at least, not capable of doing what you want just yet]

However, something that only just occurred to me [when I encountered a problem] was that the srt I used was encoded as UTF-8 with BOM;
I needed to convert it to UTF-8 without BOM, otherwise it failed on parsing the BOM (the first 3 hidden bytes).

PsPad text editor - Menu/Encoding/Unicode UTF8 no BOM (65001)

I know zip about character encoding, but once converted the script works [might have lost some weird accents or special characters].

Are your srt files in some weird encoding? [Portuguese, so I guess so]
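
For reference, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. Outside of a text editor, it can also be dropped programmatically. The Python sketch below is not part of the AviSynth script; the function names are made up, and it assumes the file really is UTF-8.

```python
import codecs

def read_srt_without_bom(path):
    """Read a UTF-8 SRT file, dropping a leading BOM if present."""
    # "utf-8-sig" decodes UTF-8 and silently skips the BOM when it is there.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return f.read()

def strip_bom_bytes(data):
    """Same idea on raw bytes: remove the 3-byte BOM prefix if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

print(codecs.BOM_UTF8)                              # the three BOM bytes
print(strip_bom_bytes(codecs.BOM_UTF8 + b"1\r\n"))  # BOM removed, rest untouched
```

Only the BOM is removed; accented characters are left as they are, as long as the file is decoded and written back as UTF-8.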

StainlessS

5th May 2020, 03:35

OK, try this, needs Avisynth+, and RT_Stats v2.0Beta12 [See Mediafire link below this post in my sig]

1_ParseAndWriteDBase.avs [Run once for each srt file: set PROC_SRT = 1 and run for the first file, then set PROC_SRT = 2 and run again, then run script 2]

# 1_ParseAndWriteDBase.avs

/*
Requires Avs+, RT_Stats v2.0 Beta 12

Will Require non weird character encoding, if necessary try UNICODE UTF8 without BOM (65001) [I did my test convert via PsPad text editor]
*/

##################
### CONFIG ######
PROC_SRT = 1 # 1 Process FN1, 2 Process FN2 (and write DB if WRITE_DBASE=true)
WRITE_DBASE= True # If False then does Not write DBase : Need create a DBase for EACH srt File. [FALSE for testing script parsing only]
##
FN1 =".\Alien_1.srt" # MUST Create DBASE for Each SRT file.
FN2 =".\Alien_2.srt" #
##################
##################

FN = (PROC_SRT==2) ? FN2 : FN1
FN=RT_GetFullPathName(FN)
DB=FN+".DB"
TypeStr="s512s512" # DBase, 2 fields both String[512]
(WRITE_DBASE) ? RT_FileDelete(DB) : NOP # Delete any existing DBase if writing new one
(WRITE_DBASE) ? RT_DBaseAlloc(DB,0,TypeStr) : NOP # Create Empty DBase (0 records)
##################
LINES=RT_FileQueryLines(FN)
IN = False
SubN=1 # Subtitle Number
TimeS="" # Times String
Subtitles="" # Subtitle String
SubLines=0 # Lines Gotten for Subtitle
SubStartLine=0 # Line number where subtitle start [ie the subtitle number line, relative 1]
SubIx=0 #
DIGITALP="0123456789"
TIMEALP=DIGITALP+":,"

For(i=0,LINES-1) {
    Txt=RT_ReadTxtFromFile(FN,Lines=1,Start=i).RT_TxtGetLine.ChrEatWhite.RevStr.ChrEatWhite.RevStr # Get Line of text, remove EOL and Eat leading & trailing White Space
    RT_DebugF("%d] IN=%s %s",i,IN,Txt)
    len=Txt.StrLen
    if(!IN) {
        if(len>0) {
            numlen=Txt.StrMatchChrLen(DIGITALP,sig=True)
            Assert(numlen>0,RT_String("Line %d Subtitle Number %d NOT FOUND\n'%s'",i+1,SubN,Txt))
            Number=txt.RT_NumberValue
            Assert(Number==SubN,RT_String("Line %d Expecting subtitle Number %d Got %d\n'%s'",i+1,SubN,Number,Txt))
            s=Txt.MidStr(numlen+1)
            Assert(s=="",RT_String("Line %d Expecting nothing after subtitle number %d, Got '%s'\n'%s'",i+1,SubN,s,Txt))
            TimeS=""
            Subtitles=""
            SubLines=0
            SubStartLine = i+1
            SubIx=1 # Expecting Times next
            IN = True
        }
    } else {
        if(len>0) {
            if(SubIx==1) { # Times
                OpenTimeLen=Txt.StrMatchChrLen(TIMEALP,sig=True)
                Assert(Len==29,RT_String("Line %d Times expecting 29 characters, got %d \n'%s'",i+1,len,Txt))
                Assert(OpenTimeLen==12,RT_String("Line %d OpenTime expecting 12 characters, got %d \n'%s'",i+1,OpenTimeLen,Txt))
                s=Txt.MidStr(13)
                Assert(s.LeftStr(5)==" --> ",RT_String("Line %d expecting ' --> '\n'%s'",i+1,Txt))
                s=s.MidStr(6)
                CloseTimeLen=s.StrMatchChrLen(TIMEALP,sig=True)
                Assert(CloseTimeLen==12,RT_String("Line %d CloseTime expecting 12 characters, got %d \n'%s'",i+1,CloseTimeLen,txt))
                Times=Txt
                SubIx=2
            } else if(SubIx==2) { # Text
                Subtitles=(SubLines==0)?Txt:Subtitles+Chr(10)+Txt
                SubLines=SubLines+1
            }
        }
        if(len==0 || i+1>=LINES) {
            Assert(SubIx==2,RT_String("Line %d Expecting %s\n'%s'",i+1,SubIx==0?"Subtitle Number":"Times",Txt))
            RT_DebugF("###########\n%d] %d\n %s\n %s\n###########",SubStartLine,SubN,Times,Subtitles)
            If(WRITE_DBASE) {
                RT_DBaseAppend(DB,TimeS,Subtitles)
            }
            SubN=SubN+1
            IN = False
        }
    }
}

Return (!WRITE_DBASE)
\ ? MessageClip(RT_String("Parse Only\n'%s'",FN))
\ : MessageClip(RT_String("Parse And Write DBAse\n'%s'\n'%s'",FN,DB))

##############################

Function ChrEatWhite(String S) {i=1 C=RT_Ord(S,i) While(C==32||C>=8&&C<=13) {i=i+1 C=RT_Ord(S,i)} return i>1?MidStr(S,i):S}

# Return extent of string S [ie length from the beginning] that matches any character in Chars set of characters [Default case insignificant]. # StrMatchChrLen("1234.567abcd","0123456789.") = 8
Function StrMatchChrLen(String s,String Chars,Bool "Sig") {
Function __StrMatchChrLen_LOW(String s,String Chars,int n) { c=s.MidStr(n+1,1) Return(c==""||Chars.FindStr(c)==0) ? n : s.__StrMatchChrLen_LOW(Chars,n+1) }
Sig=Default(Sig,False) # Default Case Insignificant
s=(Sig)?s:s.UCASE Chars=(Sig)?Chars:Chars.UCASE
__StrMatchChrLen_LOW(s,Chars,0)
}

2_WriteFixSrt.avs [Write output SRT file].

# 2_WriteFixSrt.avs

/*
Requires Avs+, RT_Stats v2.0 Beta 12

NUMBER OF SUBTITLES MUST MATCH ELSE ABORTS

*/

##################
FN1 =".\Alien_1.srt" # Same As in 1_ParseAndWriteDBase.avs
FN2 =".\Alien_2.srt" # Same As in 1_ParseAndWriteDBase.avs
SRT =".\Alien_Out.srt" # Output Srt file
TIMES_FROM = 1 # Which DBase to Get Times From
SUBS_FROM = 2 # Which DBase to Get Subtitles From
##################

FN1=RT_GetFullPathName(FN1)
FN2=RT_GetFullPathName(FN2)
SRT=RT_GetFullPathName(SRT)
DB1=FN1+".DB"
DB2=FN2+".DB"

###

Records = RT_DBaseRecords(DB1)
Records2 = RT_DBaseRecords(DB2)
Assert(Records == Records2,RT_String("DBase Subtitle Count MisMatch DB1=%d DB2=%d",Records,Records2))
Assert(1 <= TIMES_FROM <= 2,"1 <= TIMES_FROM <= 2")
Assert(1 <= SUBS_FROM <= 2,"1 <= SUBS_FROM <= 2")
TDB = (TIMES_FROM==1) ? DB1 : DB2
SDB = (SUBS_FROM ==1) ? DB1 : DB2
RT_FileDelete(SRT) # Prep for write
###

for(i=0,Records-1) {
    TimeS = RT_DBaseGetField(TDB,i,0)
    SubS = RT_DBaseGetField(SDB,i,1)
    RT_WriteFile(SRT,"%d\n%s\n%s\n\n",i+1,TimeS,SubS,Append=True)
}

MessageClip("All Done")

I tried with the exact same srt file for each DBase, and wrote the output srt file.
Compared the output srt with one of the inputs in KDiff, and it said "Binary Equal", ie exactly the same as the source srt [both input srt files were identical].
Then changed the last subtitle's time millisecs to "000" in the source 1 srt, and a single word in the last subtitle's text in the source 2 file,
and repeated both scripts.
Did a KDiff compare of the output against each individual input, and KDiff flagged only the deliberate changes in each, so it seems pretty spot on if
your subs files correspond exactly 1:1.
Will throw an error if the number of subs differs between the input files.
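
For readers without AviSynth, the same two-step idea (parse both files, then pair the times from one with the text from the other) can be sketched in Python. This is an illustrative rewrite, not a port of the scripts above: the names `parse_srt` and `merge_srt` are made up, the sample cues are invented, and like the scripts it assumes both files have the same number of cues in 1:1 correspondence.

```python
import re

def parse_srt(srt_text):
    """Return a list of (timing_line, text) tuples, one per cue."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        if len(lines) < 2 or " --> " not in lines[1]:
            raise ValueError("malformed cue: %r" % block)
        cues.append((lines[1].strip(), "\n".join(lines[2:])))
    return cues

def merge_srt(times_from, subs_from):
    """Take timings from one SRT text and subtitle text from the other."""
    a, b = parse_srt(times_from), parse_srt(subs_from)
    if len(a) != len(b):
        raise ValueError("cue count mismatch: %d vs %d" % (len(a), len(b)))
    out = []
    # Renumber from 1 and pair timing i of the first with text i of the second.
    for n, ((timing, _), (_, text)) in enumerate(zip(a, b), start=1):
        out.append("%d\n%s\n%s\n" % (n, timing, text))
    return "\n".join(out)

# Made-up sample data: one well-timed cue and one badly timed cue.
english = "1\n00:00:10,969 --> 00:00:12,471\nLook.\n"
portuguese = "1\n00:00:09,000 --> 00:00:10,500\nOlha.\n"
print(merge_srt(english, portuguese))
```

As with the scripts, a differing cue count stops the merge with an error rather than producing a silently misaligned file.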

StainlessS

5th May 2020, 05:31

Thank you HolyWu, I have never used Aegisub [well, only ever used SubRip and occasionally SubEdit] and had
no idea that you could do that. Cheers.

Emulgator

5th May 2020, 06:15

Thank you both !

Nikse555

5th May 2020, 07:56

In Subtitle Edit you can:

1) Load sub with bad time codes
2) File -> Import time codes... choose sub with good time codes

SE can (as Aegisub) also do column paste.

Alternately you can sync via "Visual sync" or "Point sync via other subtitle".

StainlessS

5th May 2020, 16:27

Thanks for that too Nikse, we learn something new every day.
If I had OP problem, I probably would have used Visual Sync [I sometimes do],
I only wrote that script because OP seemed to want exact same timecodes, and I
thought it would be an interesting way to pass some time [and have a subs SRT parsing
script that could be easily modified for other purposes].

I did leave it for a couple of hours before I did the script [started it after Emulgator posted],
and as I had little else to do whilst waiting for an avs SysInfo plugin update, I thought I'd give it a go.
I probably should have just pointed out Visual Sync in SubEdit. [I did not know about "Import time codes"]

sneaker_ger

5th May 2020, 18:55

There are now also a few solutions that try to cleverly sync to existing subtitles or even audio (with speech recognition) without relying on both files matching each other perfectly.

https://subsync.online/
https://github.com/kaegi/alass
https://github.com/saurabhshri/CCAligner

Don't know how good they are, though.

StainlessS

5th May 2020, 19:12

And thanks also SG, got a real awkward one where those first two might come in handy, was also probably the real reason
that I did the SRT parsing thing, was gonna try similar in script.

marsoupilami

15th October 2020, 19:56

@Perenista
I don't know if it is of interest any more...

For syncing 2 different subtitle files I use my crazy tool SubSplicer - although I've written it for a different purpose, it does exactly what you want:
.) You can load your "in sync" english subs with "load upper" into the left-side panel and the "out of sync" portuguese subs with "load lower" into the right one.
.) Select/click the 1st english sub and then the corresponding portuguese one. Click "add link"
.) Do the same for the last english and the last (corresponding) portuguese one, "add link" - now you have two syncing points defined
.) Press the right (portuguese) "Synchronize" button - this will adjust the first and the last subtitle exactly, all subs between will be squeezed or stretched like a "rubberband"
.) Now you can save the synchronized portuguese subs by pressing the right "Save As". (Codepage could be changed if required by changing codepage selector near the "Load" button)

This "rubberbanding" doesn't care about exact subtitle count, line numbers or whatever
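
The "rubberband" step described above amounts to a linear mapping of timestamps that is pinned at two sync points. A hedged Python sketch of that arithmetic follows. It is a guess at the general technique and not SubSplicer's actual code; all function names and the sample sync points are made up.

```python
def to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    hms, ms = stamp.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

def to_stamp(ms):
    """Convert milliseconds back to 'HH:MM:SS,mmm' (clamped at zero)."""
    ms = max(int(round(ms)), 0)
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

def rubberband(stamp, bad_first, bad_last, good_first, good_last):
    """Linearly map a time so that both sync points land exactly."""
    b0, b1 = to_ms(bad_first), to_ms(bad_last)
    g0, g1 = to_ms(good_first), to_ms(good_last)
    # Offset from the first sync point, stretched by the ratio of the spans.
    return to_stamp(g0 + (to_ms(stamp) - b0) * (g1 - g0) / (b1 - b0))

# Made-up sync points for the out-of-sync and the in-sync file.
bad = ("00:00:09,000", "01:00:00,000")
good = ("00:00:10,000", "01:00:03,000")
for t in ("00:00:09,000", "00:30:04,500", "01:00:00,000"):
    print(t, "->", rubberband(t, bad[0], bad[1], good[0], good[1]))
```

Because only times are remapped, the two files do not need the same number of cues; each start and end time of the out-of-sync file would be passed through the same mapping.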

Happy syncing :)

dev-null

19th November 2020, 20:36

Perenista

13th June 2023, 21:06

If you already have one subtitle in proper sync and just want to time a second subtitle according to it (that may or may not have the exact number of lines) then you can use "alass" (https://github.com/kaegi/alass) for it:

$ alass reference_subtitle_with_proper_sync.srt subtitle_that_needs_to_be_fixed.srt output.srt

For the record, this program worked perfectly. I was able to fix subtitles even across different languages: I just wanted to export the timecodes from a correctly synced subtitle and import them into a subtitle in a different language that had been synced to another video, not the one I was working with.