Doom9's Forum - View Single Post

Mosu · 13th October 2008, 18:57

Quote:

Originally Posted by KoD

I noticed that mkvextract extracts chapters to utf-8 files with a signature, which is great.

However, when extracting chapters with mkvextract, the contents of a <ChapterSegmentUID format="hex"> element are made up of illegal UTF-8 characters. Is that the expected behavior? I think the file is not a legal XML file in this case. It can't be opened in a text editor either; it has to be opened in a hex editor that doesn't care about the contents of the file.

You're mixing two things up here. The XML file itself is encoded in UTF-8; that part is correct. From the point of view of the XML layer, the content of the ChapterSegmentUID element consists of only 17 different characters: 0-9, A-F and the space. All of those are perfectly valid UTF-8 characters.

The fact that mkvmerge translates this into binary data is another matter entirely. That happens after the data has been taken out of the XML container, so XML no longer cares what happens to it.
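
To illustrate the two layers, here is a minimal Python sketch. It is not mkvmerge's actual code; the sample XML is a simplified, made-up fragment, and the function name segment_uid_bytes and the UID value are assumptions for illustration only. The XML parser only ever sees plain text (hex digits and spaces), and the conversion to binary is a separate step done afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified chapter fragment; the UID value is invented.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ChapterAtom>
  <ChapterSegmentUID format="hex">01 23 45 67 89 AB CD EF 01 23 45 67 89 AB CD EF</ChapterSegmentUID>
</ChapterAtom>
"""

# The 17 characters named in the post: 0-9, A-F and the space.
ALLOWED = set("0123456789ABCDEF ")


def segment_uid_bytes(xml_text: str) -> bytes:
    """Return the binary UID stored as hex text in the first ChapterSegmentUID."""
    # XML layer: parse the UTF-8 document and read the element's text.
    root = ET.fromstring(xml_text.encode("utf-8"))
    elem = next(root.iter("ChapterSegmentUID"), None)
    if elem is None or elem.text is None:
        raise ValueError("no ChapterSegmentUID text found")
    hex_text = elem.text.strip()

    # Still plain text at this point: every character is ordinary ASCII.
    unexpected = set(hex_text) - ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")

    # Binary layer: only now is the text turned into raw bytes.
    # bytes.fromhex skips the spaces between byte pairs.
    return bytes.fromhex(hex_text)


if __name__ == "__main__":
    uid = segment_uid_bytes(SAMPLE_XML)
    print(len(uid), uid.hex(" ").upper())
```

If an extracted file contains raw bytes inside that element instead of hex text like the above, the problem is in what was written to the file, not in the UTF-8 encoding of the XML itself.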