X2: X-Men United. Closed Captions are off by about a chapter - ??
JFerguson
20th January 2004, 20:29
Just did Disc 1 of the R1 release. Its DVD structure seems pretty simple; pretty much just the movie on this disc.
I didn't process the non-English tracks or DTS track and kept only the English (W) and English (L) subtitles.
The closed captions are off, though -- it's like they get a late start; by Chapter 2, I start seeing Chapter 1 captions.
Everything else works. Has anyone seen this before? Thanks...
jel
21st January 2004, 02:57
hi JFerguson,
the only thing i can suggest is to open the closed caption .sst file in notepad, look at the times, and check whether they correspond to how the captions appear in scenarist as opposed to the original.
beyond that you could try a few manual things -
re-import your closed captions:
guide by Timekills (http://www.doom9.org/index.html?/mpg/cc-addendum.htm)
or take another step back and re-rip your closed captions:
guide by Eyes`Only (http://www.doom9.org/mpg/ccguide.htm)
or for another approach (i believe McPoodle is something of a closed captioning guru) you can check out McPoodle's guides and 'tools of the trade':
SCC Tools (http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)
and for a bit of 'light' reading :rolleyes: type in closed captions / mcpoodle in the search engine.
then again, Eyes could come along at any moment and tell you there's an easy solution that takes 2 seconds and voilà... problem solved :D
j
Eyes`Only
21st January 2004, 03:09
Haha, I wish I could, jel. But if he's talking about true closed captions (the .scc file), I really don't know how to fix that. Maybe one of McPoodle's tools can fix it; other than that, I have no idea how you'd fix a delay in an .scc...
HyperYagami
21st January 2004, 07:16
see if you have the same problem as I do:
http://forum.doom9.org/showthread.php?threadid=61063
no solution at the mo.
JFerguson
21st January 2004, 19:33
I think HyperYagami might be onto something here. I looked at the .SCC file and the timecodes extend out to twice the length of the movie.
And, my .RAW file does have double entries like his example.
Something interesting...when I run SubRip 1.17.1 against the VOB files containing the captions, it pops up a dialog that says "Closed Captions detected (2x).". In the dropdown for its language streams, it lists:
00 - English 0, (closed caption/normal size char) wide
01 - English 1, (closed caption/normal size char) letterbox
So this seems to be the root of the problem: either the captions really exist twice, or they are incorrectly identified as two streams and extracted twice.
What's the solution?
As a stopgap, I could write a little script to postprocess that .RAW file and remove the double entries; then one could just run vobsub2scc on it to generate a new .SCC file. However:
- Are there any other files besides the .RAW file that vobsub2scc depends on?
- The .RAW file has double entries; that much is known. Examining the file, for each duplicate timecode entry there seem to be three conditions: both entries contain only "80"s; the first entry contains data and the second contains "80"s; or the first entry contains "80"s and the second contains data. In the latter two conditions, which entry should be trimmed? I'm guessing whichever one doesn't contain data...
Here are examples for the second bullet above:
Condition 1:
00:00:00.000
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
80 80 80 80 80 80 80 80 80 80
00:00:00.000
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
80 80 80 80 80 80 80 80
Condition 2:
00:01:49.275
54 20 45 ce 45 cd 49 45 d3 2c 94 6e
00:01:49.275
80 80 80 80 80 80 80 80 80 80 80 80
Condition 3:
00:01:48.808
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
80 80
00:01:48.808
94 20 94 20 13 6e 13 6e 5b 20 57 ef 6d 61 6e 20
5d 80 94 ce 94 ce a2 57 45 20 c1 52 45 20 ce 4f
Two things to note about Conditions 2 and 3:
- Condition 3 was more prevalent.
- In the examples above, the Condition 3 entry is the timecode just before the Condition 2 entry.
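For the stopgap postprocessing idea, a minimal sketch of reading the .RAW text layout exactly as it appears in these excerpts (a timecode line followed by lines of hex bytes) and labelling each duplicate-timecode pair with its condition number. The layout is inferred from the excerpts only, and `parse_raw`, `is_filler` and `classify_pairs` are hypothetical helper names, not part of vobsub2scc.

```python
import re

# Timecode lines look like 00:01:48.808 in the excerpts (assumed layout).
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}$")


def parse_raw(text):
    """Return a list of (timecode, bytes) entries in file order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMECODE.match(line):
            entries.append((line, bytearray()))
        elif entries:
            # Continuation line: append its hex bytes to the current entry.
            entries[-1][1].extend(int(tok, 16) for tok in line.split())
    return [(tc, bytes(data)) for tc, data in entries]


def is_filler(data):
    # 0x80 is a CEA-608 null (0x00) with its odd-parity bit set, i.e. padding.
    return all(b == 0x80 for b in data)


def classify_pairs(entries):
    """Label each pair of entries sharing a timecode as Condition 1, 2 or 3."""
    labels = []
    i = 0
    while i + 1 < len(entries):
        (tc_a, a), (tc_b, b) = entries[i], entries[i + 1]
        if tc_a != tc_b:
            i += 1
            continue
        if is_filler(a) and is_filler(b):
            labels.append((tc_a, 1))
        elif is_filler(b):
            labels.append((tc_a, 2))     # first has data, second is filler
        elif is_filler(a):
            labels.append((tc_a, 3))     # first is filler, second has data
        else:
            labels.append((tc_a, None))  # both carry data: not seen so far
        i += 2
    return labels


sample = """00:01:48.808
80 80 80 80
00:01:48.808
94 20 94 20 13 6e
00:01:49.275
54 20 45 ce
00:01:49.275
80 80 80 80
"""
print(classify_pairs(parse_raw(sample)))  # [('00:01:48.808', 3), ('00:01:49.275', 2)]
```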
Long-term, this seems like a bug in the software that does the subtitle/CC processing. Could the author fix it?
HyperYagami
22nd January 2004, 07:13
Yeah, I wrote a little program for that to kill the duplicate set, but then I got to situations #2 and #3 in the same movie, and I was like "so... should I kill the 1st one or the 2nd one?". I was hoping McPoodle might see the thread that I started, but he/she never replied...
I see a new version of cctools there -> http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML
I wonder if the newer VOBSUB2SCC there will fix the problem, although I'm not sure whether the problem is in VOBSUB or VOBSUB2SCC...
McPoodle
25th February 2004, 06:14
I'm sorry that I missed both threads on this subject until now. I have a theory on what could be going on, but I need to confirm it. Could one of you download the diagnostic tool DVBDump (http://www.geocities.com/mcpoodle43/SCC_TOOLS/dvbdump.exe), run it against one of the VOB files (or a ripped MPEG) that is displaying the problem, and send the output to me?
--McPoodle (mcpoodle43 (at) yahoo.com)
JFerguson
20th April 2004, 19:12
McPoodle, I just sent you output for Runaway Jury R1 Widescreen. Same problem, subs extended out to over 4 hours in the .SCC file.
JFerguson
21st September 2004, 04:09
I just did "The Passion of the Christ" R1. The same thing happened again, here. All the same conditions as above: double timecode entries, SubRip v1.1.7 says "Closed Captions detected (2x)", etc. etc.
McPoodle, where are you??? :(
I'm ready to help!
Some things:
- The .SRT file looks fine. I converted it to .SCC using subrip2scc.exe, but the end result (after the merge) had playback bugs.
- I used CCExtract.gp (General Parser) to extract the captions from the .M2V file, but the timings were progressively off... 10 minutes by the end of the movie.
p.s. - I'll message McPoodle...
McPoodle
21st September 2004, 06:06
I'm not sure where a 10-minute delay would come from. A 1-minute error could be drop-frame vs. non-drop-frame timecode, and a half-hour or more could be from caption doubling.
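To gauge the scale of a drop-frame mix-up, a quick calculation (a sketch, not part of any of the tools discussed). SCC files commonly use SMPTE drop-frame timecode, which skips frame labels 00 and 01 at the start of every minute except each tenth minute so that the labels keep pace with NTSC's 30000/1001 fps; reading the same frames as non-drop timecode (30 labels per second) runs about 0.1% slow. The function names are hypothetical.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # about 29.97 frames per second


def drop_frame_to_frames(hh, mm, ss, ff):
    """Frame count for a drop-frame timecode HH:MM:SS;FF (input not validated)."""
    total_minutes = 60 * hh + mm
    # Two labels are skipped each minute, except every tenth minute.
    dropped = 2 * (total_minutes - total_minutes // 10)
    return 108000 * hh + 1800 * mm + 30 * ss + ff - dropped


def non_drop_lag_seconds(real_seconds):
    """How far non-drop timecode (30 labels per second) falls behind real time."""
    frames = real_seconds * NTSC_FPS
    return float(real_seconds - frames / 30)


print(drop_frame_to_frames(1, 0, 0, 0))      # 107892 frames in one hour
print(round(non_drop_lag_seconds(3600), 2))  # 3.6 seconds per hour
print(round(non_drop_lag_seconds(7200), 2))  # 7.19 seconds over two hours
```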
I'm not sure if it will help, but I've got a tool that will split out duplicate captions (i.e. the problem from April). Download http://www.geocities.com/mcpoodle43/SCC_TOOLS/split.exe. Run it with the arguments "-h4 -s2", followed by the name of the .bin file created by General Parser. This will create two new .bin files that you can play with. As I said before, this probably won't fix a 10-minute delay, but try it anyway.
By the way, if Gabest happens to read this, your VobSub program is suffering from the same caption-doubling problem--if the closed captions on the DVD are incorrectly formatted (both fields are flagged as 0xff for Field 1), then both are extracted. The correct behavior is to only take the first pair and discard the second.
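To make the doubling concrete, a minimal sketch assuming each frame of a caption packet is two (marker, byte, byte) triplets. A scan that keeps every pair flagged 0xff returns two pairs per frame when a disc flags both fields 0xff, while keeping only the first 0xff pair per frame follows the rule stated above. The function names are hypothetical; this is not VobSub's code.

```python
def every_ff_pair(triplets):
    """Keep every byte pair whose marker is 0xff (doubles on malformed discs)."""
    return [(b1, b2) for marker, b1, b2 in triplets if marker == 0xFF]


def first_ff_pair_per_frame(triplets):
    """Keep only the first 0xff pair of each frame (two triplets per frame)."""
    pairs = []
    for i in range(0, len(triplets), 2):
        frame = triplets[i:i + 2]
        flagged = [(b1, b2) for marker, b1, b2 in frame if marker == 0xFF]
        if flagged:
            pairs.append(flagged[0])
    return pairs


well_formed = [(0xFF, 0x94, 0x20), (0xFE, 0x80, 0x80)]
malformed = [(0xFF, 0x94, 0x20), (0xFF, 0x80, 0x80)]  # both fields flagged 0xff

print(every_ff_pair(well_formed))          # [(148, 32)]
print(every_ff_pair(malformed))            # [(148, 32), (128, 128)]
print(first_ff_pair_per_frame(malformed))  # [(148, 32)]
```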
JFerguson
21st September 2004, 06:28
Hey, McPoodle...
I guess I should've been clearer...
The caption doubling did result from Gabest's Vobsub Ripper Wizard. His .cc.srt file looked fine, but the .sub.cc.raw had the caption doubling. I tried converting the .cc.srt file to .scc using subrip2scc.exe and building with that...the timings looked good, but there were errors in caption playback.
I then tried your CCextract.gp script. It did not double the captions, but the playback timing was off. At first I thought it was off by a few seconds, so I shifted the file, but it was still off. I dumped its .scc file and the one I made from Gabest's .cc.srt file to .ccd files and compared them: Gabest's timings were on the money (but again, not all of his captions played correctly when built).
I guess I need to post-process Gabest's .sub.cc.raw file, but I don't think your SPLIT.EXE utility was designed for this?
D3s7
21st September 2004, 13:51
Jumping into this cold, but I'll take a look at vsrip and see where the issue is...
Guessing from what I already know, though, vsrip just does a raw dump of the packets in the .vob into a .raw file, tagging each packet with its PTS timestamp.
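For reference, MPEG presentation timestamps (PTS) count a 90 kHz clock, so turning one into the HH:MM:SS.mmm form seen in the .raw excerpts is a division plus formatting. This is a sketch only: whether vsrip truncates or rounds to the millisecond, and whether it first subtracts the stream's starting PTS, are assumptions the snippet does not settle. `pts_to_timecode` is a hypothetical name.

```python
PTS_CLOCK_HZ = 90_000  # MPEG system clock rate used for PTS values


def pts_to_timecode(pts):
    """Format a PTS tick count as HH:MM:SS.mmm, truncating to milliseconds."""
    ms = pts * 1000 // PTS_CLOCK_HZ
    hh, rem = divmod(ms, 3_600_000)
    mm, rem = divmod(rem, 60_000)
    ss, ms = divmod(rem, 1000)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{ms:03d}"


print(pts_to_timecode(9_834_750))  # 00:01:49.275
```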
It seems more likely that we need to look at the vobsub2scc app to get rid of any duplicates...
McPoodle: not knowing much about the CC packets, where in the packet can this be found: "both fields are flagged as 0xff for Field 1"? Is the correct assumption that this should be line 21 and the second should be line 22?
McPoodle
21st September 2004, 16:29
Close--it's Field 1 and Field 2 of Line 21, where Field 1 is the only part that anyone wants.
Each DVD closed caption packet is supposed to look like this:
00 00 01 b2 43 43 01 f8 9e ff 94 20 fe 80 80
ff 94 20 fe 80 80 ff 94 f4 fe 80 80 ...
That's the User Data header (00 00 01 b2), followed by the DVD Closed Caption header (43 43 01 f8), followed by the attribute byte (9e, which means that there are 15 frames of caption data, 0x0f * 2, following the ff...fe pattern, + 0x80). This is followed by each frame's worth of caption data: 0xff followed by two bytes for Field 1 (closed captions), then 0xfe followed by two bytes for Field 2 (XDS). The algorithm I originally used to handle this was to look for 0xff and output the two bytes that follow.
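A minimal sketch of reading the layout just described, checked against the (truncated) example packet above: header check, attribute byte split into the high bit and the frame count ((attribute & 0x7f) / 2, so 0x9e gives 15), then 6 bytes per frame. Bit 0 of the attribute byte is not interpreted, and `parse_dvd_cc_packet` is a hypothetical name.

```python
USER_DATA_START = bytes.fromhex("000001b2")
DVD_CC_HEADER = bytes.fromhex("434301f8")


def parse_dvd_cc_packet(packet):
    """Split a DVD closed-caption user-data packet into per-frame triplets.

    Returns (frame_count, high_bit_set, frames), where each frame is two
    (marker, byte1, byte2) triplets. Bit 0 of the attribute byte is ignored.
    """
    if packet[:4] != USER_DATA_START or packet[4:8] != DVD_CC_HEADER:
        raise ValueError("not a DVD closed-caption user-data packet")
    attr = packet[8]
    high_bit_set = bool(attr & 0x80)
    frame_count = (attr & 0x7F) >> 1  # 0x9e -> 0x1e >> 1 -> 15
    body = packet[9:]
    frames = []
    for i in range(frame_count):
        chunk = body[6 * i:6 * i + 6]
        if len(chunk) < 6:
            break  # packet shorter than the attribute byte claims
        frames.append((tuple(chunk[0:3]), tuple(chunk[3:6])))
    return frame_count, high_bit_set, frames


# The first three frames of the example packet (the rest is elided there).
example = bytes.fromhex("000001b2434301f89e"
                        "ff9420fe8080" "ff9420fe8080" "ff94f4fe8080")
count, high_bit, frames = parse_dvd_cc_packet(example)
print(count, high_bit, len(frames))  # 15 True 3
print(frames[2])                     # ((255, 148, 244), (254, 128, 128))
```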
However, for whatever reason there are a number of DVDs that do not follow this pattern. Instead they use 0xff to flag both Field 1 and Field 2. The solution is to use the attribute byte to determine the pattern and then grab alternate pairs of bytes. If the high bit of the attribute byte is set, then the pattern is Field 1 followed by Field 2, while if the high bit is clear, then the pattern is Field 2 followed by Field 1.
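The attribute-byte approach can be sketched as follows: ignore the 0xff/0xfe marker values and pick each frame's Field 1 pair by position, using the high bit as described. This assumes the six-bytes-per-frame layout above; `field1_pairs` is a hypothetical name, and bit 0 of the attribute byte is not handled.

```python
def field1_pairs(attr, payload):
    """Return each frame's Field 1 byte pair, chosen by position.

    attr:    the attribute byte (high bit set -> Field 1 comes first)
    payload: the bytes after the attribute byte, 6 per frame:
             marker, b1, b2, marker, b1, b2 (marker values are ignored)
    """
    field1_first = bool(attr & 0x80)
    frame_count = (attr & 0x7F) >> 1
    offset = 1 if field1_first else 4  # skip the marker byte of the chosen half
    pairs = []
    for i in range(frame_count):
        frame = payload[6 * i:6 * i + 6]
        if len(frame) < 6:
            break
        pairs.append((frame[offset], frame[offset + 1]))
    return pairs


# Both halves flagged 0xff: only the position tells the fields apart.
field1_then_2 = bytes.fromhex("ff9420ff8080")
field2_then_1 = bytes.fromhex("ff8080ff9420")
print(field1_pairs(0x82, field1_then_2))  # [(148, 32)]
print(field1_pairs(0x02, field2_then_1))  # [(148, 32)]
```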
Anyway, I'll get to work adding a switch to VOBSUB2SCC to handle duplicates.
D3s7
21st September 2004, 17:21
ah got it ...
I ran across your layout too (http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_FORMAT.HTML), which really helps... just noticed it was yours. hehe..
so you look at bit 7 of the attributes to see if it's 0 or 1, then grab (if 0) the first pair and (if 1) the second pair (FF/FE respectively)
I have to say the caption bit flag kinda threw me a little: "Caption Count: How many caption segments in the packet (same as number of frames in GOP)." On the DVD I was looking at, it was set to 3 (0 011).
I can add the code to vsrip to skip the bit too... but it might be easier to leave it there and do it via vobsub2scc. I thought about merging the two apps, adding the conversion directly to vsrip so it generates both a .raw and an .scc file, but that'll be a project for another day :)
McPoodle
22nd September 2004, 06:30
Alright, I've got a new version of VOBSUB2SCC (http://www.geocities.com/mcpoodle43/SCC_TOOLS/vobsub2scc.exe) out. The documentation will follow in a few days, but you can see the new options by running the program from the command line without any arguments. Basically, use "-1" to keep the odd-numbered packets, which are probably the ones you want. If you end up with an empty file, use "-2" instead to keep the even-numbered packets.
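In terms of the (timecode, bytes) entries of a .sub.cc.raw file, the two switches amount to keeping alternate entries. A sketch of that idea only (not VOBSUB2SCC's actual code; `keep_alternate` is a hypothetical name), with an example showing why a fixed choice can drop captions when the data moves between the first and second entry of a pair.

```python
def keep_alternate(entries, odd=True):
    """Keep entries 1, 3, 5, ... (odd=True) or 2, 4, 6, ... (odd=False), 1-based."""
    return entries[0::2] if odd else entries[1::2]


FILLER = bytes([0x80, 0x80])
entries = [
    ("00:01:48.808", FILLER),               # Condition 3: data in second entry
    ("00:01:48.808", bytes([0x94, 0x20])),
    ("00:01:49.275", bytes([0x54, 0x20])),  # Condition 2: data in first entry
    ("00:01:49.275", FILLER),
]
print([data.hex() for _, data in keep_alternate(entries, odd=True)])   # ['8080', '5420']
print([data.hex() for _, data in keep_alternate(entries, odd=False)])  # ['9420', '8080']
```

Either choice loses one of the two caption pairs in this example.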
JFerguson
22nd September 2004, 14:46
Hey, McPoodle. I'll give this a try...
I was reading your exchange here yesterday between you and D3s7 and did a little testing with this .sub.cc.raw file.
It doesn't appear that you can uniformly strip one entry or the other. I tried it and the subtitles were truncated both ways. This file that I'm working with exhibits the same conditions (2 and 3) that were documented above, and it was only when I selectively kept the packet that contained data that the captions came out whole.
So, I'm not sure what your utility will do, but I'll try it...
JFerguson
22nd September 2004, 15:18
Ok, I just tried it. You run this against the .sub.cc.raw file, right? The output seemed pretty messed up (reading it via ccasdi disassemble), so I'm not sure if I did something wrong?
D3s7
23rd September 2004, 20:37
hmm.. would it be easier if I changed vsrip to generate 2 files, one for field 1 and one for field 2, given we know the appropriate order and don't need to rely on the FF & FE flags?
or would it be better if, knowing which was supposed to be FF and which was supposed to be FE, I updated the .raw file accordingly to fix these types of issues?
Either is doable.
Although, based on what JFerguson just said, the app (one of them) should check which of the two byte pairs has data (80 being empty/padding).
although that condition seems like an authoring error somehow?
McPoodle
24th September 2004, 03:41
OK, the version I posted last night clearly didn't work. I re-worked the file and it can be downloaded from the same location. This time it looks for duplicate timecodes and decides which packet to use. If one of them has a caption in it, the choice is easy, but if both are filler, it uses the second one. Due to the changing pattern bit, sometimes it is the first packet that has the data, and sometimes the second one.
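A minimal sketch of the selection rule just described, working on (timecode, bytes) entries in file order. Treating a packet as filler when every byte is 0x80, and keeping the second entry when both carry data, are assumptions; `collapse_duplicates` is a hypothetical name.

```python
def is_filler(data):
    # Treat a packet made only of 0x80 bytes (parity-set nulls) as filler.
    return all(b == 0x80 for b in data)


def collapse_duplicates(entries):
    """Merge entries that share a timecode, preferring the one with captions."""
    out = []
    i = 0
    while i < len(entries):
        tc, first = entries[i]
        if i + 1 < len(entries) and entries[i + 1][0] == tc:
            second = entries[i + 1][1]
            # Keep the first only if it has data and the second is filler;
            # otherwise (second has data, both filler, or, by assumption
            # here, both have data) keep the second.
            keep = first if (not is_filler(first) and is_filler(second)) else second
            out.append((tc, keep))
            i += 2
        else:
            out.append((tc, first))
            i += 1
    return out


FILLER = bytes([0x80, 0x80])
entries = [
    ("00:00:00.000", FILLER), ("00:00:00.000", FILLER),                 # Condition 1
    ("00:01:48.808", FILLER), ("00:01:48.808", bytes([0x94, 0x20])),    # Condition 3
    ("00:01:49.275", bytes([0x54, 0x20])), ("00:01:49.275", FILLER),    # Condition 2
]
print([data.hex() for _, data in collapse_duplicates(entries)])  # ['8080', '9420', '5420']
```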
This version will provide all of the captions, but the times are still a little off (45 seconds late after 2 hours). I'm working on figuring this out and hopefully will get a completely working version out before the weekend is over.
McPoodle
24th September 2004, 03:49
Answering that latest post, I suppose either solution could work. There are two complications to worry about. First of all, some DVDs change the pattern in the middle of the movie from Field 1, Field 2 to Field 2, Field 1. And second, 0xff is sometimes used to flag both Field 1 and Field 2. The Passion of the Christ captions I'm working on for JFerguson show both problems. If you can work up a reliable way to get just Field 1, then that's what is needed, as Field 2 is always empty (in case you're wondering, Field 2 has some use in the world of broadcasting, but absolutely no use in the world of DVDs). I believe that the attribute byte and counting packets is the most reliable method available.
McPoodle
27th September 2004, 01:15
I've got a new version of VOBSUB2SCC available for download at http://www.geocities.com/mcpoodle43/SCC_TOOLS/vobsub2scc.exe. I found that the Field 1 and Field 2 packets at each timecode do not always cover the same amount of time. The pattern for the set I was working on was for Field 1 to be one frame longer, then at the next timecode Field 1 would be one frame shorter. Since which of the two packets was Field 1 and which was Field 2 would change, with no way for me to know which was which, I found that keeping track of which packet last had captions reduced my error from 45 seconds in 2 hours to only a few frames. I'll never be able to get it perfect without knowing which packet is which, however.
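One possible reading of the "keep track of which packet had captions last" idea, as a sketch only: when exactly one packet of a duplicate pair has data, take it and remember its slot; when both are filler (or both have data), take the packet in the remembered slot. This is an interpretation, not VOBSUB2SCC's code; starting with the second slot is an assumption, and `choose_slots` is a hypothetical name.

```python
def is_filler(data):
    # A packet consisting only of 0x80 bytes (parity-set nulls) carries no caption.
    return all(b == 0x80 for b in data)


def choose_slots(pairs, start_slot=1):
    """pairs: list of (first, second) packets sharing a timecode.

    Returns the chosen slot (0 = first, 1 = second) for each pair.
    """
    last_slot = start_slot  # assumption: default to the second packet
    chosen = []
    for first, second in pairs:
        first_has, second_has = not is_filler(first), not is_filler(second)
        if first_has != second_has:
            last_slot = 0 if first_has else 1  # exactly one has data: follow it
        chosen.append(last_slot)
    return chosen


FILLER = bytes([0x80, 0x80])
DATA = bytes([0x94, 0x20])
pairs = [(FILLER, DATA), (FILLER, FILLER), (DATA, FILLER), (FILLER, FILLER)]
print(choose_slots(pairs))  # [1, 1, 0, 0]
```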
I've also updated the documentation at http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML to cover the latest changes to CCExtract, RAW2SCC, VOBSUB2SCC and CCASDI.