
Closed Captions from ripped PVR stream - how to extract?


Yufi
31st January 2003, 06:52
Before I say anything, I wasn't sure which forum to put this into, but I figured Subtitles is the most relevant, so here it is.

Ok, I have Digital Satellite through Dish Network using a PVR501 receiver. In case you don't know, these receivers copy the direct stream from the satellite to a hard drive inside the unit. One can then put the hard drive in a computer and extract the streams. Now, most of these streams have video of 544x480. I've been interested in ripping the closed captions from these streams; it's the same as normal TV, with the two or so pixels' worth of lines at the top containing all of the CC code. However, I'm not sure how to convert this into a usable format. I know that there is the GraphEdit method for DVD VOBs, but that doesn't work for this: the text file that I output the raw data to is completely empty when the video is finally done playing. I was hoping that perhaps someone knew of a way to get this working so that I can extract the closed captions and make them into a subtitle format.

Yufi
2nd February 2003, 20:04
Someone here must know how to get the closed captions out of the stream... I'm getting desperate here hehe, I really need to turn them into subtitles for a friend.

paula001
2nd March 2003, 05:11
I have the same Dish receiver. You need some device to grab the caps from the stream. A bunch of cheapie video capture cards (with the BT878 chip) will save the closed captions into memory and dump them to a file when the program is closed. The bad news is that these captions have no time references. That leaves you with a nightmare of manually entering the times.

There are a very few affordable devices around that attach a time reference to captured captions. I found a Hubcap CC from the mid-90s that does this. It is the PC's time so some adjustments are necessary. I use MS Excel for that.

What program are you trying to caption?

McPoodle
3rd March 2003, 05:48
Are you working off of MPEG-2 files? I don't know how Dish Network transmits the captions, but ReplayTV packages them inside the MPEG-2 files as a user data packet (if you open the file in a hex editor you will find the sequence 00 00 01 b2) with the signature of bb 02. If you can see this six-byte sequence in your files, then you should be able to use the DVB2SCC tool (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe) I wrote for ReplayTV files. Run it from the command line with an argument of "-d" followed by the name of the .mpg file to process. The program will create two files with the same base name as the .mpg, one with an extension of .bin, and another with an extension of .scc. DVB2SCC's not the fastest program in the world, processing 4 MB/second on my PC.

The .scc format, in case you are not familiar with it, is a plain text file listing hexadecimal data for various timecodes (in other words, it's not very readable). If you are interested in using the file to add closed captions for DVD authoring, you leave it in .scc format, but you can edit it using the CCASDI tool (http://www.geocities.com/mcpoodle43/SCC_TOOLS/ccasdi.exe), which converts .scc files to and from the human-readable .ccd format. This format is documented here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#ccd).
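To give a feel for what a data line in an .scc file holds, here is a minimal Python sketch that splits one line into its timecode and its two-byte hex words. The function names are made up for illustration, and the sketch only handles data lines (not any header line a real file may carry). It also assumes the usual line-21 convention that the high bit of each byte is a parity bit, which is why 8080 reads as "nothing" once that bit is cleared.

```python
def parse_scc_line(line):
    """Split one SCC data line into (timecode, list of 2-byte words)."""
    timecode, *words = line.split()
    return timecode, [bytes.fromhex(word) for word in words]


def strip_parity(word):
    """Clear the high (parity) bit of each byte, leaving 7-bit values."""
    return bytes(b & 0x7F for b in word)


timecode, words = parse_scc_line("00:00:00:05 626d 0000 8080")
print(timecode)                                  # 00:00:00:05
print([strip_parity(w).hex() for w in words])    # ['626d', '0000', '0000']
```

This is only a reading aid; CCASDI remains the tool for turning the hex into readable captions.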

On the other hand, if you want to turn the captions into subtitles, then you can use the .bin file as input to the CCParser program, which is available for download on Doom9 (the output is SubRip format). This will only work if you remembered to put "-d" in the command line of DVB2SCC.

If DVB2SCC doesn't work, look for 00 00 01 b2 in the .mpg file and post from there to the next 00 00 01 sequence, so I can see if I can figure it out.
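For anyone who would rather not hunt through a hex editor, the search described above can be sketched in a few lines of Python. This is only an illustration, not how DVB2SCC works internally: the function name is hypothetical, the sample bytes are invented, and a real script would read the .mpg or .m2v from disk (ideally in chunks, since these files are large).

```python
START_PREFIX = b"\x00\x00\x01"          # every MPEG start code begins with this
USER_DATA = START_PREFIX + b"\xb2"      # 00 00 01 b2 = user data start code


def user_data_packets(data):
    """Yield each span from a 00 00 01 b2 start code up to the next 00 00 01."""
    pos = data.find(USER_DATA)
    while pos != -1:
        end = data.find(START_PREFIX, pos + len(USER_DATA))
        if end == -1:
            end = len(data)             # packet runs to the end of the buffer
        yield data[pos:end]
        pos = data.find(USER_DATA, end)


# Invented sample: one packet with the bb 02 signature, one without.
sample = bytes.fromhex("000001b2bb0201020304" "00000100" "000001b20502008001")
for packet in user_data_packets(sample):
    tag = "ReplayTV-style" if packet[4:6] == b"\xbb\x02" else "other"
    print(packet.hex(), tag)
```

Each printed hex string is the kind of excerpt that is useful to post when the signature does not match.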

Yufi
25th April 2003, 05:08
I didn't expect a reply after the couple weeks I waited, so I stopped checking this thread hehe. Here's the hex for the points you told me to go to:

000001B20502008001B36905040080018A5B040980808080020A850200000000000000000101

DVB2SCC says it has no captions. The only really popular ripping program for my PVR (the DishNet PVR508) outputs the audio and video separately: audio to an .mp2 and video to an .m2v. It also has the option to output a transport stream, but I have never tried using it before. Do you think ripping from the transport stream would be more likely to work?

McPoodle
27th April 2003, 01:57
OK, I changed my DVB2SCC program to handle this format: get it here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe). Take a close look at any file you ultimately get out of the tool and let me know if there's anything extra or missing.

P.S. The captions are stored inside of the video stream, so the transport stream is not necessary.

Yufi
30th April 2003, 05:38
I get the following when I try to run it:
Creating buffy.binCan't call method "size" without a package or object reference
at dvb2scc.pl line 98.

It then creates a 1kb .bin, which contains only:
CC

I just ran it as dvb2scc -d buffy.m2v. I also tried without the -d and it does the same.

McPoodle
30th April 2003, 08:41
I re-posted a version of the file that won't give that error message (use the link in my earlier post), but that doesn't correct the fundamental problem of not getting any captions. If there's any way you can get a sample video file where I can download it, I'll see what I can do. A length between 1 MB and 10 MB would be ideal, as I have a slow modem, but if you can't get an MP2 file smaller than that, then I'll just have to bite the bullet and spend a day downloading. It can't be too small, however, or it might not have any captions in it at all.

Yufi
30th April 2003, 22:48
Ok, I ran that version on my .m2v and it began creating the .bin. I checked the properties for the file and the size was rising, so I knew stuff was being written. After it was done, it created the .scc as well. The .bin was 290kb and the .scc was 234kb.

Opening the .bin, I found it filled with the following:
CC 

 

 

 


 

 

 
That repeated on and on the rest of the file.

The .scc was filled with the following:
00:00:00:05 626d 0000 0000 0000 0000 8080 626d 0000 0000 0000 0000 8080 etc etc
That number sequence was repeated throughout the entire file- never varying. Also, the time codes never went above the 26 minute mark- the file for the show I'm processing is 62 minutes.

I tried CCParser, and it said the creation of the subtitle file went fine, but upon checking, the file was 0kb.

I just checked right now, and the mpeg itself doesn't have any closed captions until the show itself starts- does that make any difference? Would the fact there's no captions for the first 5 or so minutes impact it at all?

If you want, I can try reencoding the file, but cropping so only the top 8 pixels are there, so that the resulting .m2v will be smaller and easier to transfer for you to look at.

Edited in:
Also I just checked the .m2v in a hex editor- it appears as if that strand of hex is different for each .m2v. I just checked a couple others, and they're all different (although they all start with 01B2 0502 I believe). Does this make a difference?

McPoodle
2nd May 2003, 09:47
OK, I wrote a program that will strip just the raw closed caption data from an MPEG file. Download DVBDUMP (http://www.geocities.com/mcpoodle43/SCC_TOOLS/dvbdump.exe) and run it with the name of the file as the only argument. The program is incredibly slow, so I set it to only process a small part of the file by default. The program will create a 100-line text file--open it up and take a look at it. If the lines are all identical, then the program didn't get far enough into the MPEG file to get to the captions--run it again, but stick in an extra argument of "-n500" between "dvbdump" and the name of the MPEG file. Increase the number after "-n" as necessary until the lines in the output start varying from each other. At that point, post your results here (if the file is really huge, post the last few identical lines followed by about a hundred of the differing lines), and I'll try to figure out how to change DVB2SCC to work correctly.
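Checking a long dump by eye gets tedious, so here is a small Python sketch of the "find where the lines start varying" step. The function name and the sample lines are hypothetical (the real DVBDUMP output format is not shown in this thread); in practice the list would come from something like open("dump.txt").read().splitlines().

```python
def first_varying_line(lines):
    """Return the 0-based index of the first line that differs from line 0,
    or None if every line is identical (or the list is empty)."""
    for index, line in enumerate(lines):
        if line != lines[0]:
            return index
    return None


# Made-up dump lines, purely to show the behaviour.
dump = ["8080 8080", "8080 8080", "9420 9420", "c8e5 ecec"]
print(first_varying_line(dump))   # 2
```

A result of None would mean the "-n" value needs to go up before there is anything worth posting.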

Yufi
2nd May 2003, 22:31
Ok, I ran that program, and after a few tries, just in case, I ran it with 1,000,000 lines as output (guess that might have been a bit too many... hehe), and it came out as a 9.5mb file. I rar'd it and it's only ~580kb or so.
You can grab it from http://home.attbi.com/~RockStarJenn/angel.-3ms.rar
The -3ms in the name is just to indicate the delay on the audio file; I didn't bother to rename the .txt or .rar or anything.
And I also just wanted to say, thanks for all the hard work you've put into this lately! I didn't expect any real answers, let alone someone writing and rewriting programs to get it working!

McPoodle
3rd May 2003, 04:01
Good thing you sent me the whole file--it turned out that the pattern was rather complicated. You can download the revised DVB2SCC here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe). I had two different ways of writing this: one way as slow as that diagnostic program I sent you, and another that was four times faster. This is the faster one, but there's a chance that it might be dropping letters as it goes, so check the first file you run and let me know if it works correctly or not. I've got another program that translates an SCC file into something that's more human-readable--probably a lot easier than authoring a DVD and playing it on your TV just to check the captions. You can download it here (www.geocities.com/mcpoodle43/SCC_TOOLS/ccasdi.exe). Just type "ccasdi" followed by the .scc file--it will create a file with an extension of .ccd that you can open up in Notepad.

Yufi
3rd May 2003, 05:02
I ran dvb2scc again, on the same file that I got those raw results from, and it doesn't appear to have worked; I included the .bin, .scc, and .ccd in a rar at:
http://home.attbi.com/~RockStarJenn/dish.new.results.rar

I'll try dvb2scc on a couple other m2v's as well, and see if it works on them.

Yufi
3rd May 2003, 05:42
I just tried it on two other shows- South Park & Family Guy, both of which have the closed caption lines at the top- and it gives the "has no captions" line after going through the entire file. I think dishnet is just trying to make it hard for us hehe.

McPoodle
3rd May 2003, 10:14
In that case, we try the slow version: download from here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe).

Yufi
3rd May 2003, 10:34
Well, I just got done running the slow version on the file; it produced pretty much the exact same .bin & .scc file as the fast version. Do you want me to try cutting a 5mb or so chunk out of the .m2v where there are closed captions? I could do it with two different shows in case the data in each .m2v is different or something.

McPoodle
3rd May 2003, 10:45
It looks like there's no other choice.

Yufi
3rd May 2003, 17:33
Ok, I used TMPGEnc and cut out a 10 second clip of two different shows. When rar'd, it comes out to a total 4.8mb- so not too big. If you need longer than 10sec, I can recut them (20sec/each should fit, or just one sample of around 30 or 40 seconds). The link is:
http://home.attbi.com/~RockStarJenn/mpegsamples.rar

McPoodle
4th May 2003, 20:31
Neither file had captions in them--maybe TMPGEnc stripped them out when chopping up the files? You can try for the 20 second clips, but be sure to run dvbdump on them to see if there's anything extractable in them.

Yufi
4th May 2003, 20:46
I thought the two pixels at the top of the video indicated that there were closed captions? When I view those parts of the episodes off my PVR with closed captions enabled, they display the captions. I guess I just don't have that great of an understanding of them hehe. Anyway, I tried making both 20sec and 40sec clips with TMPGEnc, and both of them had no captions according to dvbdump, so I'm totally at a loss for what to do now.

McPoodle
7th May 2003, 08:04
Is there any way you can use your receiver to record something short with captions? I know on Tivo, if you cancel a recording more than about 10 minutes in, it doesn't automatically delete it. That way you'll have a complete file with captions that isn't too big. If that doesn't work, then put up a complete 30 minute file, so long as DVBDUMP will create output from it.

Yufi
8th May 2003, 05:53
I started, and then stopped, recording a clip of 22 seconds or so for the first program I found with captions, Dharma and Greg apparently. DVBDump does work on it, and dvb2scc also processes it and spits out a .bin and .scc (when converted, the .ccd still doesn't contain any of the real captions though). The link for just the video part from it is:
http://home.attbi.com/~RockStarJenn/rawstream.rar

McPoodle
8th May 2003, 07:52
I don't need to download it--I got another file from an e-mail correspondent. It appears that the caption bytes are not being transmitted in order. For example, a line in the CCD that looks like this:

00:00:01:10 {ENM}{}{}
00:00:01:13 {RCL}{ENM}
00:00:01:15 {RCL}OH{1520}{1520}THIN, NO{}G.{}{}{}{}{EDM}{EDM}
00:00:02:02 {EOC}
00:00:02:04 {EOC}


should in fact be this:
00:00:01:12 {ENM}{ENM}{RCL}{RCL}{1520}{1520}OH, NOTHING.{}{}{}{}{}{EDM}{EDM}{EOC}{EOC}

So, if it's transmitted as aabbccaabbcc, it should be displayed as bbccaabbccaa, noting that one command or two letters are transmitted at a time.
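The aabbcc example can be played with in a few lines of Python. This is a toy that only reproduces the example string above, treating each letter as one transmitted unit and rotating every block of three pairs by one pair; the function name and the block sizes are assumptions, and the real pattern in the files may well be more involved.

```python
def reorder(units, group=3, width=2):
    """Within each block of `group` pairs, move the first pair to the end."""
    block = group * width
    out = []
    for start in range(0, len(units), block):
        chunk = units[start:start + block]
        out.extend(chunk[width:] + chunk[:width])
    return out


print("".join(reorder(list("aabbccaabbcc"))))   # bbccaabbccaa
```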

I'm currently working on not only disentangling this pattern, but keeping the program from slowing down too much doing it. I hope to have something by the weekend.

Yufi
8th May 2003, 12:05
ahh, ok. Is that related to the issue of dvb2scc apparently not going the entire length of the file too? When I've tried before, each time it only seems to have gone up to the 26 minute mark- no further- on each one of the shows I've tried that has captions. Is that just part of the captions getting scrambled too?

McPoodle
11th May 2003, 00:54
OK, I've got a new version of the file up that doesn't scramble the captions: download it here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe). As for ending the file early, let me know if you're still getting that with this version.

Yufi
11th May 2003, 08:10
Yay! It works! The .ccd shows all the text as correct.
The only problem I have now is that, despite using the -d option, CCParser only generates a 0kb file whenever I use it on the .bin. Should I just try it without -d? I'll look around tonight to see if I can find a program that will go from .scc to SubRip format or something similar I can use for SVCD subtitles.

McPoodle
12th May 2003, 06:36
Well actually, I've got an SCC to DVD raw program--maybe you can try that while I figure out why the -d switch doesn't work. Download from here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc2dvd.exe).

Oh, and you do know you can use the non -d output to add closed captions to an SVCD, right?

Yufi
12th May 2003, 08:13
Oh really? How? The only way I've ever known of adding subtitles to an SVCD is using SubMux, which seems to only accept .sub files. I also need to do a bit of editing before I import it, as I have to cut all the commercials out and compensate the times for that. What program can add it from the raw files created without using -d? I've heard I-Author can add subtitles as well, but I've never used that program before.
Once again, thanks for all the help in just getting me this far- I started out with low expectations on ever getting closed captions out of these files, and now that I've been able to I couldn't be happier.

McPoodle
14th May 2003, 06:42
Closed Captions in SVCD's are an alternative to subtitles (they're also the only choice you have with VCD's). They work by putting the raw files in a particular directory of the disc. They will work correctly for any set-top DVD player and any software DVD player that supports captions (not all do). Here's the procedure:

1. Get the captions in CCD format so you can read them, then adjust the timing until they're the way you want. The times for captions represent when they are built in the off-screen buffer, up to half a second before they appear on screen. If you use the "-a" switch with CCASDI, it will automatically adjust the times for you: going from SCC to CCD will move the timecodes to align with the display time of the caption, while using the same switch going from CCD to SCC will put the times back where they belong.
2. Use CCASDI to put the captions back in SCC format.
3. Use the SCC2RAW program to convert the SCC captions into broadcast raw format. Download this particular tool here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc2raw.exe). You'll end up with a .bin file.
4. Set up the SVCD in whatever tool you're using for that.
5. Rename the .bin file to match the file on the SVCD it's going to accompany. The captions for the first file on the disc should be named CAPT01.DAT, the captions for the second CAPT02.DAT, and so on. I'm assuming you've got a single menu before the first video, which would contain the episode, so CAPT02.DAT would be the name you'd use. If you are splitting up the video, you need to split up the captions too, with the timecodes starting over for each new video.
6. Add all of the caption files to the EXT directory. How you do this depends on which tool you are using to build the SVCD. For VCDEasy, you'd go to the ISO Files screen (the last button on the top), click on the EXT directory, and use Add Files to add the caption files. By the way, under the MPEGAV file you will see the video files that will be added to the CD in order. For VCDEasy, if you scroll over, you can see which renamed file is which, so you can be sure you named your caption file correctly.
7. Burn the SVCD.
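The renaming part of the procedure above is easy to get wrong by one, so here is a minimal Python sketch of it. Everything here is an assumption for illustration: the function names, the two-digit limit of 99, and the idea of staging the file in an EXT folder on the hard drive before handing it to the authoring tool (a GUI tool like VCDEasy adds the file itself).

```python
import shutil
from pathlib import Path


def caption_filename(track_number):
    """Name of the caption file for the Nth file on the disc (1-based)."""
    if not 1 <= track_number <= 99:      # assumed limit: two-digit numbering
        raise ValueError("track number must be between 1 and 99")
    return f"CAPT{track_number:02d}.DAT"


def install_captions(bin_path, disc_root, track_number):
    """Copy a raw caption .bin into <disc_root>/EXT under its CAPTnn.DAT name."""
    ext_dir = Path(disc_root) / "EXT"
    ext_dir.mkdir(parents=True, exist_ok=True)
    target = ext_dir / caption_filename(track_number)
    shutil.copyfile(bin_path, target)
    return target


print(caption_filename(2))   # CAPT02.DAT
```

With a single menu as the first file on the disc, the episode is the second file, hence the 2 in the example.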

Now if you can get the original MPEG file trimmed exactly the way you want without converting to AVI (i.e. if there are no commercials in the middle), you can use the .bin file created by DVB2SCC directly, eliminating the first three steps.

I've never actually done any of this, but that's what the documentation I've seen says. I'm also assuming that you do have commercials in the middle and the timing is still going to be off, so of course burn to a CD-RW first. Since I've already given you the links to every other closed caption tool I've ever written, you might want to take a look at CCADJ (download here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/ccadj.exe)), which can allow you to move all the timecodes in an SCC or CCD file by the same amount. The arguments are "-o" followed immediately by the amount to adjust by ("-o0:00:01:00" to move everything ahead 1 second, "-o-0:00:01:00" to move everything back one second), then the name of the file to change, and finally the name of the new file to create.
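To show what a uniform timecode offset amounts to, here is a Python sketch of the arithmetic on HH:MM:SS:FF timecodes. It is not CCADJ's code. It assumes plain 30 frames per second with no drop-frame handling, and it assumes a positive offset means "later"; the function names are hypothetical.

```python
FPS = 30  # assumption: non-drop-frame, 30 frames per second


def to_frames(timecode):
    """Convert [-]H:MM:SS:FF to a signed frame count."""
    sign = -1 if timecode.startswith("-") else 1
    hours, minutes, seconds, frames = (int(p) for p in timecode.lstrip("-").split(":"))
    return sign * (((hours * 60 + minutes) * 60 + seconds) * FPS + frames)


def to_timecode(total_frames):
    """Convert a non-negative frame count back to HH:MM:SS:FF."""
    if total_frames < 0:
        raise ValueError("offset would move the timecode before zero")
    seconds, frames = divmod(total_frames, FPS)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def shift(timecode, offset):
    """Move one timecode by a signed offset written in the same notation."""
    return to_timecode(to_frames(timecode) + to_frames(offset))


print(shift("00:00:01:15", "0:00:01:00"))    # 00:00:02:15
print(shift("00:00:01:15", "-0:00:01:00"))   # 00:00:00:15
```

After cutting commercials, each segment following a cut would need its own offset, which is the manual part of the job.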

McPoodle
18th May 2003, 02:46
The -d flag now outputs a raw file compatible with CCParser. I've also added a -td flag (to get dropframe timecodes). More importantly, I've added logic to rip captions from DVD MPEG files. This means that DVB2SCC is no longer the right name for this tool, so I renamed it to SCC_RIP. It can be downloaded here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/scc_rip.exe).

Also, I've finally updated my website. Go here (http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML) to download the entire suite of closed caption related tools and to view their documentation.

Yufi
20th May 2003, 07:43
It's been a few days, but I've been preoccupied and haven't been able to try out the latest versions until recently.

And good news- it works amazingly! It ripped the files fine and CCParser converted to SubRip format fine. After that it was just a simple matter of redoing the timecodes: I loaded the encoded mpg into SubCreator and used the Ctrl-A command to place the timecode for each line, so it really only takes the length of the show. (For some reason, the .srt/.scc is 6-7 minutes longer than the m2v itself, as if it's gotten stretched, but that's no big deal to me; I don't mind manually syncing it.)
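If the extra length really is a uniform stretch (an assumption; nothing in this thread confirms it), the SubRip timestamps could in principle be rescaled by a constant factor instead of re-timed by hand. A minimal Python sketch, with hypothetical function names and a deliberately simple factor in the example:

```python
import re

# SubRip timestamps look like 00:10:02,500
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def rescale_stamp(match, factor):
    """Scale one HH:MM:SS,mmm timestamp by `factor`."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = round((((h * 60 + m) * 60 + s) * 1000 + ms) * factor)
    s, ms = divmod(total, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def rescale_srt(text, factor):
    """Scale every timestamp in a block of SubRip text."""
    return STAMP.sub(lambda match: rescale_stamp(match, factor), text)


line = "00:10:00,000 --> 00:10:02,500"
print(rescale_srt(line, 0.5))   # 00:05:00,000 --> 00:05:01,250
```

The factor would be the video length divided by the subtitle length; cut commercials break the uniformity, so manual syncing may still be needed.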

I've also succeeded in converting the edited SubRip file into a SubMux .sub file so I can mux in selectable subtitles (as well as use the captions you walked me through two posts above as a backup) for people to try (both SVCD style and CVD style subs). The only problem I have now is with VCDXBuild itself- it seems I can't have Subs, Menus, and Chapters at the same time... something about the Subs and PBC. But oh well, that has nothing really to do with your wonderful programs. Once again, I must thank you. I would have never gotten this far without your help, and I was surprised there are actually people out there that are willing to work so hard to update their software as soon as a bug is found or they see a feature they can add!