I'm not sure what the goal is here. If it's about storage and being able to split, append, and perform similar transformations, you can just mux the subtitles into Matroska.
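As a rough sketch of the muxing route, assuming the `mkvmerge` tool from MKVToolNix is installed: the helper names `build_mux_command` and `mux` and the file names are made up for illustration, and only the basic `mkvmerge -o OUTPUT INPUT...` form is used.

```python
import shutil
import subprocess


def build_mux_command(video, subtitles, output):
    """Build an mkvmerge argument list: output file first, then every input."""
    cmd = ["mkvmerge", "-o", str(output), str(video)]
    cmd.extend(str(s) for s in subtitles)  # text or bitmap subtitle files
    return cmd


def mux(video, subtitles, output):
    """Run mkvmerge; assumes it is available on PATH."""
    if shutil.which("mkvmerge") is None:
        raise RuntimeError("mkvmerge not found on PATH")
    subprocess.run(build_mux_command(video, subtitles, output), check=True)
```

Something like `mux("movie.mkv", ["en.srt", "en.sup"], "out.mkv")` would then put a text track and a bitmap track in the same container, each kept in its original format.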
Otherwise, are you trying to make a universal subtitle format that can contain both bitmap and text subtitles, and that makes rendering easier by having everything in one unified format? That could be a bit difficult, since different formats support very different feature sets. For example, SSA/ASS subtitles allow a wide range of transformations and other effects, and that format is pretty popular.
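To illustrate where the difficulty shows up, here is a minimal sketch of what such a unified cue might look like. Every name in it (`TextPayload`, `BitmapPayload`, `Cue`, `needs_native_renderer`) is hypothetical and not taken from any existing library. The point is that format-specific markup, such as ASS override tags, either has to be dropped or carried along verbatim for a format-aware renderer.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPayload:
    text: str             # plain text with markup stripped
    raw_markup: str = ""  # original format-specific markup, kept verbatim


@dataclass
class BitmapPayload:
    x: int
    y: int
    width: int
    height: int
    rgba: bytes           # raw pixel data; the layout is an assumption


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    payload: Union[TextPayload, BitmapPayload]


def needs_native_renderer(cue: Cue) -> bool:
    """True when the cue carries markup the generic model cannot interpret."""
    return isinstance(cue.payload, TextPayload) and bool(cue.payload.raw_markup)
```

A plain SRT line or a bitmap cue fits this model fine, but an ASS line with something like `{\fad(200,200)}` still needs an ASS-aware renderer, so the unified format does not really simplify rendering for it.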