Closed-caption issue with VLC (+ video about it) [Archive]

Perenista

2nd June 2025, 00:48

https://www.youtube.com/watch?v=OSCOQ6vnLwU

There is a video explaining that "Closed captions on DVDs are getting left behind".

In it, the author mentions VLC, saying it displays closed captions (CC) from DVDs incorrectly. Is that still the case?

Also, this comment on the video explains something the video leaves out:

+++++++++++++++++
I'm a former chip architect for Set top box, DVD player, BluRay and DTV chips. Nobody forgot. The CC render is not implemented because the IP licensor for EIA captions demanded royalties which were far higher than the market would bear, and the requirement to use the official EIA font meant a 2nd license from the font licensor was also required. So it was decided not to include this tax on every unit sold since the captions were expected to be rendered on the subtitle channel.
+++++++++++++++++

Another user also pointed out:

+++++++++++++++++
FFmpeg maintainer here, and the details behind the caption decoding issues you're seeing in VLC are complex and horrific. They largely stem from how the EIA-608 caption format expects text to be laid out in a monospace grid onscreen, which isn't really how the text rendering stacks used for modern subtitling work (this is probably why changing the font caused problems on those Sony players); beyond that, the behavior can just end up pretty complex, and there's no convenient public-domain corpus of sample files for open-source software developers to test against.

These kinds of issues also affect the Japanese (ARIB) and European (Teletext) formats to varying extents. These days, a lot of the focus ends up being on converting the text into modern Unicode text formats, styled using modern techniques, so direct rendering of the legacy formats hasn't had as much attention lately. If anybody reading this has some badly-behaved samples and wants to contribute, patches to improve the decoding and rendering behavior are definitely welcome, though!

One small correction: at 1:49 and 14:53, you say the players are "decoding" the captions, which isn't quite right. The captioning data embedded in the MPEG-2 bitstream headers (or H.264 in some more recent contexts) is the exact same format found on line 21 of the NTSC TV signal, so the player doesn't actually have to decode anything!

It just takes the 16 bits of caption data for the current field of video, and modulates them into low (0, black) and high (1, white) signals on its analog output, without having to know anything about what those bits actually mean. (EDIT: reworded for clarity, since some people seemed to think I meant that the caption data was literally stored as a row of pixels in the video frames on the disk, which is not the case.)
+++++++++++++++++
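For anyone wondering what a software player has to do with those 16 bits before it can render anything: as far as I know, each of the two bytes per field carries 7 data bits plus an odd-parity bit in the top position. Here is a minimal sketch of that first step only. The function names are my own and not taken from VLC or FFmpeg.

```python
def has_odd_parity(byte: int) -> bool:
    """True if the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def strip_parity(pair: bytes) -> list:
    """Return the 7-bit payload of each byte, or None where parity fails.

    Assumption: 'pair' holds the two caption bytes of one video field.
    """
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes per field")
    return [(b & 0x7F) if has_odd_parity(b) else None for b in pair]


# 0xC8 and 0xE9 are 'H' (0x48) and 'i' (0x69) with the parity bit set.
print(strip_parity(bytes([0xC8, 0xE9])))  # -> [72, 105]
```

Everything after this step (control codes, the character grid, the caption modes) is where the actual complexity lives.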

Emulgator

4th June 2025, 17:26

If somebody wants to dig into that I can point to a hardware implementation:
Zilog Z86229 NTSC Line 21 CCD Decoder (DIP18-300)
Can decode and overlay for 525- and 625-line systems
NTSC fH = 15.734265 kHz, tH = 63.555 µs
Generated Output:
The line width is divided into 48 cells, each 16 dots wide (768 dots per line).
Character Cell Width: 1.324 µs (tH/48)
Dot clock: 768*fH = 12.0839 MHz -> Dot Period: tH/768 = 82.75 ns
We sacrifice 2x 7 cells on the sides and use only 34 cells width for the CC canvas.
Box Row Width: 45.018µs 34chars = 34/48*tH
On the sides we further sacrifice 2x 1 cell width for canvas border and yield 32 cells net width for filling with characters.
Char Row Width: 42.370µs 32chars = 2/3*tH
Character Cell: 16x26 dots (horizontal: 2 + character + 2; vertical: 2 on top + character + 6 at the bottom).
The 2 dots of headroom are never used; descenders use the 6 dots of footroom fully.
Normal Capital Character: 12x18 dots, stroke thickness: 2 dots
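If you want to re-derive the timing figures above, here is a rough Python sketch. It only uses the numbers from this post (48 cells per line, 16 dots per cell, 34 and 32 cells for box and character rows); the function and key names are my own invention, not anything from the Zilog datasheet.

```python
F_H_HZ = 15_734.265      # NTSC line frequency in Hz, as quoted above
CELLS_PER_LINE = 48      # cells across one line period
DOTS_PER_CELL = 16       # dots across one character cell


def line21_timing(f_h_hz: float = F_H_HZ) -> dict:
    """Derive the horizontal timing figures from the line frequency."""
    t_h_us = 1e6 / f_h_hz                              # line period in µs
    dots_per_line = CELLS_PER_LINE * DOTS_PER_CELL     # 768
    return {
        "line_period_us": t_h_us,
        "cell_width_us": t_h_us / CELLS_PER_LINE,
        "dot_clock_mhz": dots_per_line * f_h_hz / 1e6,
        "dot_period_ns": t_h_us * 1000 / dots_per_line,
        "box_row_width_us": t_h_us * 34 / CELLS_PER_LINE,   # 34-cell canvas
        "char_row_width_us": t_h_us * 32 / CELLS_PER_LINE,  # 32 characters
    }


for name, value in line21_timing().items():
    print(f"{name}: {value:.4f}")
```

The results should agree with the rounded values listed above to within the last printed digit.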

Vertical rendering is per field, on scan lines 43..237 of 262 (195 scan lines per field in total, i.e. 13 per caption row).

FCC: Caption data can appear in any of the 15 display rows, but a single caption may consist of no more than 4 rows.
Pop-on: After the buffering is completed: 1 complete caption is rendered
Paint-on: As characters are buffered, they are rendered one by one.
Roll-up: Once a line is buffered completely, the old line is shifted up and the new one becomes the base line;
up to 4 rows are filled, then the top row is deleted as the next moves up.
The Z86229 can display single captions spanning up to 8 rows.
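To make the roll-up behaviour concrete, here is a toy model in Python. It only mimics the "shift up, drop the top row" rule described above for a window of up to 4 rows; the class and method names are made up for illustration, and it ignores control codes, timing and positioning entirely.

```python
from collections import deque


class RollUpWindow:
    """Toy roll-up window: the newest line is the base row, old rows scroll up."""

    def __init__(self, depth: int = 4):
        # Assumption: the window depth is limited to 4 rows, as stated above.
        if not 1 <= depth <= 4:
            raise ValueError("roll-up depth must be between 1 and 4 rows")
        # A bounded deque discards its oldest (topmost) entry when full.
        self.rows = deque(maxlen=depth)

    def push_line(self, text: str) -> None:
        """Call once a line is completely buffered; it becomes the base row."""
        self.rows.append(text)

    def visible(self) -> list:
        """Rows from top to bottom; the last entry is the base row."""
        return list(self.rows)


window = RollUpWindow(depth=4)
for line in ["one", "two", "three", "four", "five"]:
    window.push_line(line)
print(window.visible())  # -> ['two', 'three', 'four', 'five']
```

Pop-on and paint-on would need a separate off-screen buffer and per-character updates respectively, which this sketch does not attempt.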

0x00: x is row number. Row 8 is centered, with 7 rows above and 7 rows below, so the canvas extends 195 scanlines (7.5 rows of 26 scanlines) to either side of its vertical center.
The canvas height would be 390 scanlines, 81.25% of the total 480 scanlines.
Using all video lines would yield 18.46 cell rows of height, so there is headroom/footroom of ~1 3/4 cell heights each.
The canvas width would be 34/48 of the total (510 video pixels, 70.83% of the total 720 video pixels).
Characters can use 2/3 of the total (480 video pixels of width, 66.67% of the total 720 video pixels).
Per side, 1/6 of the width stays empty. Top and bottom: 9/96 (9.375%) of the height stays empty each.
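The same canvas arithmetic as a small Python sketch, in case somebody wants to map the caption grid onto a 720x480 frame in a software renderer. The 26 scanlines per row (2 fields x 13) and the 48/34/32 cell counts come from the figures above; the function and key names are hypothetical.

```python
from fractions import Fraction


def canvas_geometry(width_px: int = 720, height_px: int = 480,
                    rows: int = 15, scanlines_per_row: int = 26,
                    cells_total: int = 48, box_cells: int = 34,
                    char_cells: int = 32) -> dict:
    """Map the Line 21 caption grid onto a full video frame."""
    canvas_height = rows * scanlines_per_row            # frame scanlines
    return {
        "canvas_height": canvas_height,
        "canvas_height_frac": Fraction(canvas_height, height_px),
        "rows_if_full_height": height_px / scanlines_per_row,
        "box_width_px": width_px * box_cells // cells_total,
        "char_width_px": width_px * char_cells // cells_total,
        # Empty margin on each side of the character area.
        "side_margin_frac": Fraction(cells_total - char_cells, 2 * cells_total),
        # Empty margin above and below the canvas, each.
        "top_bottom_margin_frac": Fraction(height_px - canvas_height,
                                           2 * height_px),
    }


for name, value in canvas_geometry().items():
    print(f"{name}: {value}")
```

Fractions are used so the margins come out as exact ratios (9/96 is printed in lowest terms as 3/32).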

And, IIRC, there is still the Microsoft Line21 Decoder waiting to be fed and hooked into.