How to convert an image sequence to Vobsub format?


Le Furet
27th April 2003, 11:27
Here's the problem: I am currently working with a friend on a way to extract hardcoded subtitles (that is, detecting them, not removing them).

Although the project is in a very early stage, it works quite well.

I'd now like to convert the image sequence produced by the detection into a .sub VobSub file, so that I can use SubRip to run OCR on it.

So I'm looking for a program that does such a conversion or, maybe even better, a routine from an existing program that can do this job.

If neither exists, could someone tell me where I can find specifications for the VobSub .sub format?

madoka
28th April 2003, 23:43
I thought SubRip can process image sequences directly, no?

Le Furet
29th April 2003, 08:16
Are you sure?

I may be dumb, but I don't see how. I know SubRip can save image sequences, but not that it can read them. Could I have missed something?

madoka
30th April 2003, 03:55
I thought that's what the menu File->Open Image Sequence does, but I've never tried it...

MasterYoshidino
30th April 2003, 06:15
When you say hardcoded, you mean not from a bup file that can be streamed into SubRip, but a permanently encoded sub like a fansub would have. No?

SubRip only works with bup images, by opening an IFO or VOB and streaming the subtitle stream, so hardcoded subs (as opposed to optional subs) are impossible for SubRip to handle :rolleyes:

If you could make a VOB with a sub stream, though ...

ppera2
1st May 2003, 14:54
There is a VirtualDub plugin named Sublog Extractor from vielle@bigfoot.com with the same purpose. I suggest you try it before developing further. It has an option to export pictures in VobSub format, and it works more or less well.
What is not good, from my testing, is that the plugin is not intelligent enough: if part of the clipped picture near the subtitle has the same color, it can't separate the two.
SubRip has no chance with the result, except when the subtitle is placed in a black border.
Maybe your program is better at this? (It would be very useful.)

Shalcker
1st May 2003, 16:31
I've briefly looked into the SubRip source, and it seems that color selection is disabled by default when you process an image sequence. That makes it impossible to OCR an image sequence properly unless some rules (unknown to me so far) are followed when creating the BMPs, which should make color selection unnecessary... The BMPs should probably contain no more than 4 colors in the proper order...

Originally posted by ppera2
There is a VirtualDub plugin named Sublog Extractor from vielle@bigfoot.com with the same purpose. I suggest you try it before developing further. It has an option to export pictures in VobSub format, and it works more or less well.
Too bad that the source code of SubLog wasn't released :(
...and the author doesn't seem to respond either.

What is not good, from my testing, is that the plugin is not intelligent enough: if part of the clipped picture near the subtitle has the same color, it can't separate the two.
SubRip has no chance with the result, except when the subtitle is placed in a black border.
Well, regarding possible separation algorithms, either an _extreme_ edge-restricted picture blur (maybe a long chain of 2DCleaners or SmartSmootherHQ) and/or temporal analysis should separate them.

But if the scene is static between the subtitle's appearance and disappearance, the temporal approach will fail; and if the colors around the subtitle are really close to the subtitle color and cover a large area, the blur approach can fail too...

Maybe your program is better at this? (It would be very useful.)
A color-based approach to separating subtitles from non-subtitles is (in my opinion) doomed from the start and works in a quite limited number of cases.

Not to mention that in the case of SubLog we can only set one color, while an average fansub can have five font colors or more.

Decisions should be made based on the spatial and temporal properties of subtitles, with colors used only as an "aid" in areas selected by other means.

I've made a similar program to help our Russian fansubbers, which generates timecodes based on hardcoded subtitles. It works rather well exploiting only basic spatial properties of subtitles (close to 95-99% "true" results for subtitle appearance/disappearance detection, and close to 60-80% for subtitle change detection; no color selection is needed in most cases).

And I'm currently thinking about a simple temporal analysis to enhance the separation of subtitles from non-subtitles, which should allow saving a "separated" picture usable for OCR.
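To make the temporal idea concrete, here is a minimal sketch in Python. It is only an illustration, not the code of the program mentioned above: it assumes the frames covering one subtitle are already available as grayscale 2D lists, and the name stable_pixel_mask and the tolerance value are made up for the example. A pixel is kept as a subtitle candidate when its brightness barely changes across the frames.

```python
def stable_pixel_mask(frames, tolerance=8):
    """Mark pixels whose brightness stays nearly constant across all frames.

    frames: list of grayscale images, each a list of rows of ints (0-255).
    tolerance: hypothetical threshold on (max - min) brightness per pixel.
    Returns a 2D list with 1 for stable pixels and 0 for changing ones.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            # Subtitle pixels do not move, so their value range stays small.
            if max(values) - min(values) <= tolerance:
                mask[y][x] = 1
    return mask


# Two tiny 2x2 frames: the left column is "subtitle", top right is moving
# background, bottom right is static background.
frames = [
    [[255, 10], [255, 200]],
    [[255, 90], [250, 200]],
]
mask = stable_pixel_mask(frames)
```

Note that the static bottom-right background pixel is also marked as stable, which is exactly the failure case of a purely temporal approach on static scenes.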

ppera2
1st May 2003, 20:33
From what I saw, areas with the same or a similar color as the subtitle were in most cases much larger than the subtitle characters, so that could be the basis for an algorithm to remove them.
A problem could arise if the characters aren't well separated, but that's a rare case.
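A rough sketch of that size-based idea in Python, assuming the picture has already been reduced to a binary mask of "subtitle-colored" pixels. The function name remove_large_blobs and the max_area value are only assumptions for the example: connected areas bigger than a plausible character are cleared, and smaller ones are kept.

```python
from collections import deque


def remove_large_blobs(mask, max_area):
    """Clear 4-connected components of a binary mask larger than max_area.

    mask: list of rows of 0/1 values (1 = pixel has the subtitle color).
    max_area: hypothetical upper bound on the pixel count of one character.
    Returns a new mask; the input is left unchanged.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Too big to be a character: treat it as background and erase it.
            if len(component) > max_area:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
cleaned = remove_large_blobs(mask, max_area=4)
```

Characters that touch each other or touch the background merge into one component, which is the badly separated case mentioned above.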

Le Furet
2nd May 2003, 00:07
Sorry, I have no time to read all of Shalcker's post.

Just wanted to mention that I knew about Sublog. I also know that colour-based algorithms are doomed, and our algorithm uses a quite different idea: a DCT followed by a frequency analysis. It works far better than colour-based algorithms. Typically, I tested it on the beginning of Wolf's Rain #1, white text on snow, and the detection works perfectly: it misses nothing and adds very few fake subs. However, the separation of subs from background still needs improvement, and if I have time, I'd like to use the extracted subs as a mask for a filter I plan to write to remove hardcoded subtitles.
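For readers unfamiliar with the general principle, here is a small generic sketch in Python of what "DCT plus frequency analysis" can look like. It is not our actual algorithm: the block contents, the function names dct_2d and ac_energy, and the idea of scoring a block by its non-DC energy are all assumptions made for the example. Sharp text edges put a lot of energy into the non-DC coefficients, while flat areas put almost none there.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def ac_energy(block):
    """Sum of squared DCT coefficients, excluding the DC term at [0][0]."""
    coeffs = dct_2d(block)
    energy = sum(c * c for row in coeffs for c in row)
    return energy - coeffs[0][0] ** 2


# A flat 8x8 block versus a block of sharp vertical stripes (text-like edges).
flat = [[100] * 8 for _ in range(8)]
stripes = [[255 if x % 2 else 0 for x in range(8)] for _ in range(8)]
flat_score = ac_energy(flat)
stripes_score = ac_energy(stripes)
```

A detector along these lines would compare such a score (or the energy in selected frequency bands) against a threshold for each block of the frame; picking the bands and the threshold is where the real work lies.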

I'd also like to use Sublog's conversion routines, and I was about to mail the guy who made this tool, but if you say he doesn't answer, it would be useless. Since we're both French, I'll see if it's possible to contact him some other way.

If you have docs on the VobSub .sub format, I'm interested.