This is specifically about decoding OpenType fonts, where data is not always encoded sequentially: byte 100 can contain data that is needed to properly decode byte 20. For instance, byte 100 might start a uint16 that defines the length of an array starting at byte 20. Currently, it is hard (maybe impossible) to move to byte 100, decode the length, then move back and decode from byte 20.
This might be a quirk of OpenType (1), but I suspect there are more cases where it would be useful to work with offsets when decoding Bytes.
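To make the byte 100 / byte 20 example concrete, here is a minimal sketch in Python (used purely for illustration, since its struct module can read at arbitrary offsets). The function name, the element type (uint8) and the buffer contents are assumptions made up for this example; only the two offsets come from the description above.

```python
import struct

def decode_array_with_remote_length(data: bytes, length_offset: int, array_offset: int) -> list:
    """Read a big-endian uint16 count at length_offset, then that many uint8 values at array_offset."""
    # Jump ahead to the length field first...
    (length,) = struct.unpack_from(">H", data, length_offset)
    # ...then jump back and decode the array it describes.
    return list(struct.unpack_from(f">{length}B", data, array_offset))

# Synthetic 102-byte buffer: three array elements at byte 20, their count at byte 100.
buf = bytearray(102)
buf[20:23] = bytes([7, 8, 9])
struct.pack_into(">H", buf, 100, 3)

values = decode_array_with_remote_length(bytes(buf), length_offset=100, array_offset=20)
```

The point of the sketch is that both reads take an absolute position, so the order in which fields are decoded no longer has to match the order in which they are stored.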
If this is indeed common, then decoding binary data is not really a sequential process. Currently, binary decoders and JSON decoders move from the beginning to the end of the input. Decoding bytes with offsets means jumping around in the input while decoding.
That seems much more complex, so I’m trying to be really cautious. Can we find a nice way to decode this kind of structure? Are offsets actually commonly used in binary protocols?
Some more context
I’m not particularly knowledgeable about fonts or binary protocols. My eventual goal (which a bunch of folks in the svg/visualization space have also been thinking about and working on) is to use font information for smart layout of svg text, for instance smart label positioning in visualizations and maybe an elm-ui-like layout mechanism. I believe font rendering with webgl also interests some people.
The OpenType spec defines a table at the start of the file that lists the remaining tables and the starting position of each, as a number of bytes from the start of the file.
E.g. table “cmap” starts from byte 6234, table “os2” starts at byte 7134, etc.
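As a rough illustration of that directory, the following Python sketch reads the table records from a synthetic buffer. It assumes the layout given in the OpenType spec: a 12-byte header (a uint32 version, a uint16 table count and three more uint16 fields) followed by one 16-byte record per table (4-byte tag, then checksum, offset and length as uint32). The function name and the table lengths are made up, and the table written "os2" above carries the tag OS/2 in the spec.

```python
import struct

def read_table_directory(font: bytes) -> dict:
    """Map each table tag to (offset, length), both in bytes; offset is from the start of the file."""
    # Header: sfntVersion (uint32), numTables (uint16); the remaining 6 header bytes are skipped.
    _version, num_tables = struct.unpack_from(">IH", font, 0)
    tables = {}
    for i in range(num_tables):
        # Each 16-byte record: tag (4 bytes), checksum, offset, length (uint32 each).
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", font, 12 + 16 * i)
        tables[tag.decode("ascii")] = (offset, length)
    return tables

# Synthetic directory reusing the offsets from the example above; the lengths are invented.
header = struct.pack(">IHHHH", 0x00010000, 2, 0, 0, 0)
records = struct.pack(">4sIII", b"cmap", 0, 6234, 900) + struct.pack(">4sIII", b"OS/2", 0, 7134, 96)
directory = read_table_directory(header + records)
```

The directory itself can be read front to back; the jumping only starts once a decoder follows one of the recorded offsets.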
The most difficult problem is decoding the hhea (horizontal header) and hmtx (horizontal metrics) tables. Tables are not stored in any particular order, so hmtx can occur before hhea, yet it is the hhea header that specifies the actual length of the hmtx table. While a hacky solution might be possible, it would be extremely fragile.
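With random access, the ordering problem goes away. The Python sketch below assumes the layout from the OpenType spec: hhea is 36 bytes long and ends with a uint16 numberOfHMetrics (at byte 34 within the table), and hmtx starts with that many entries of uint16 advanceWidth plus int16 leftSideBearing. The function name, the tables dictionary and the metric values are hypothetical, and the trailing leftSideBearings array of hmtx is ignored.

```python
import struct

def read_advance_widths(font: bytes, tables: dict) -> list:
    """tables maps a tag to its byte offset from the start of the file."""
    # Read the entry count from hhea first, wherever hhea happens to be stored.
    (num_h_metrics,) = struct.unpack_from(">H", font, tables["hhea"] + 34)
    widths = []
    for i in range(num_h_metrics):
        # Each hmtx entry is 4 bytes: advanceWidth (uint16), leftSideBearing (int16).
        advance, _lsb = struct.unpack_from(">Hh", font, tables["hmtx"] + 4 * i)
        widths.append(advance)
    return widths

# Synthetic data with hmtx stored *before* hhea.
hmtx = struct.pack(">HhHh", 500, 10, 600, -5)
hhea = bytes(34) + struct.pack(">H", 2)  # only numberOfHMetrics is filled in
font = hmtx + hhea
tables = {"hmtx": 0, "hhea": len(hmtx)}
widths = read_advance_widths(font, tables)
```

A purely sequential decoder would reach hmtx here before it knows how many entries to read, which is exactly the situation described above.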
Notes
(1): This section at the bottom mentions that the use of offsets allows sharing of data between multiple fonts in the same file. So maybe this style of encoding is specific to OpenType.
(2): Offsets might be related to slices (for which a use case is described in this thread), but according to MDN a slice will copy (and thus allocate), which is not really required here.