Closed caption; decode to file


Imlurker2
25th March 2006, 14:43
Hi,
I recorded some programs off the TV (USA) which contained closed captioning (line 21 (?)).
I now want the computer to transcribe the closed captioned data into a text file.
I have been unable to locate a software package for decoding and transcribing. If you know of one, please advise (a moderately priced package is also acceptable).

Use of VOB files is preferred. The recordings appear to have cell lengths of approximately one to two minutes, which converts into a lot of m2v files. I tried Restream 0.9 to see how it would work, but all I got was a very ugly non-ASCII string.

TIA
lurker

mpucoder
25th March 2006, 18:24
McPoodle's site (http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML) should help you.

Imlurker2
27th March 2006, 23:41
Thank you but that merely gets me deeper into the belly of the beasts.
Discussion begins with closed caption and immediately progresses to subtitles.
As such, I did not find a definitive answer. Mention was made of CCextract, which uses MPEG files. What will be the effect of the conversion from VOB to MPEG on the data on line 21? That may end up being a rhetorical question. Thus the result may be there, but the path is uncertain.

I was hoping for an answer similar to "use xxx.exe".

If I use Windows Media Player, the captions can be displayed, i.e. the data is accessible.

slk001
28th March 2006, 17:55
Use VSRip to extract the RAW data from your VOBs. It extracts RAW data, because that is the way the data exists in a VOB. Then go back to McPoodle's site and fetch his SCCTools package. Use VOBSUB2SCC to convert your **.cc.raw file to **.scc. Then use CCASDI to convert the **.scc file to **.txt file. Once here, you have the closed captions in true TEXT format. What else do you want?

Imlurker2
29th March 2006, 04:24
Thank you for the response.

[QUOTE]What else do you want?[/QUOTE]
Ouch!

I did run vsrip and got no output.
I thought it due to the recording method.
I wrote the 'cc line 21' post.
I read your post.
Reran vsrip.
Got output.
Concluded operator error on previous attempt.
Excuse me. :)

I will continue on with your suggestions.

:thanks:

VoodooShizzle
5th April 2006, 01:46
There's a program called ATI TV, part of the ATI series. That will do it. I do captioning for a living so I know it works. You might need to use the "Video Magazine" feature, or it may just start automatically when you run the program with video patched in...

slk001
5th April 2006, 16:42
Where do you get ATI TV? Or are you talking about the video card line?

Imlurker2
6th April 2006, 03:39
Thanks. Sounds good.

After I received the info above regarding vsrip, I did try again and it did yield some output but it was woefully lacking. Two hours of video produced about five minutes of text. I did happen to notice the text had line 1 or line 3. My recorder has two inputs: line 1 or line 3. So I began with a new disc and things worked. Bad disc plus whatever.

I do find myself doing a great deal of editing however.

The output looks like:
00:00:06:00 NUESTRO CASA DEL DI{Í}A DE HOY,
00:00:12:00 NO ES DE LA MISMA
00:00:32:01 RAZA AFGANA._

As opposed to: NUESTRO CASA DEL DI{Í}A DE HOY, NO ES DE LA MISMA
(but I can get it!)

The editing is more complicated than merely getting rid of the numbers.

If you could post a sample (or PM) of what the text output should look like, I'd appreciate it. I would purchase it if it helped me avoid all the editing.

I briefly looked at the ATI stuff and my impression is that I have to go the "Video Magazine" route.

TIA
lurker

slk001
6th April 2006, 15:31
I re-read your posts, and I cannot determine what you are trying to do. Are you trying to capture CCs from a DVD or off-the-air? What do you want to do with them? If you're trying to convert the CCs to subtitles, then you will HAVE to do considerable editing, since Closed Captions are limited to 32 characters per line.
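
For the timestamped text quoted earlier in the thread, a small script can take over much of that editing. The following is only a minimal Python sketch, not part of SCCTools or any other package mentioned here. It rests on three assumptions drawn from the three sample lines alone: each line starts with an HH:MM:SS:FF timecode, a pair such as I{Í} is a plain fallback letter followed by the accented character that replaces it, and a trailing underscore marks the end of a caption. The function names are made up for illustration.

```python
import re

# Assumed line shape, taken from the sample in this thread:
# "HH:MM:SS:FF<space>CAPTION TEXT"
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2}[:;]\d{2}\s+")

# Assumption: "I{Í}" is a plain fallback letter followed by the extended
# character that replaces it, so the pair collapses to the braced character.
EXTENDED = re.compile(r".\{([^{}])\}")


def clean_line(line):
    """Strip the timecode, resolve brace pairs, drop a trailing underscore."""
    text = TIMECODE.sub("", line.strip())
    text = EXTENDED.sub(r"\1", text)
    # Assumption: a trailing "_" is an end-of-caption marker, not text.
    return text.rstrip("_").strip()


def captions_to_text(lines):
    """Join the cleaned caption rows into one run of text."""
    parts = [clean_line(line) for line in lines]
    return " ".join(part for part in parts if part)


sample = [
    "00:00:06:00 NUESTRO CASA DEL DI{Í}A DE HOY,",
    "00:00:12:00 NO ES DE LA MISMA",
    "00:00:32:01 RAZA AFGANA._",
]
print(captions_to_text(sample))
```

On the three sample lines this prints NUESTRO CASA DEL DÍA DE HOY, NO ES DE LA MISMA RAZA AFGANA. If the converter's real output uses the braces or the underscore differently, the two patterns at the top are the only places that would need to change.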

Imlurker2
6th April 2006, 18:27
I was thinking of future activity re: the ATI TV thing (off-the-air), but the original post concerned past activity where the shows were already on DVD. Thus two separate activities.

I had not considered a computer plug-in module because I thought it would cost more, so I simply ignored it.

If the editing would have been substantively easier, I simply would have ignored my current set of off-the-air recordings.

I was using it to study Spanish. For normally recorded DVDs, there seem to be two different Spanish languages: one for gringos and one for Spanish-speaking individuals. The Spanish language text != the Spanish language verbiage. Off-the-air recordings, though not perfect, are much closer in text and speech.

Thanks for the CC line limitation. I assume that the off-the-air method would not include the 00:00:01:02 stuff.

You've been helpful. Sorry about the confusion.

Thanks