MUS Serialization Format
The MUS (Marshal, Unmarshal, Size) format is straightforward and won’t take long to learn. Its specification contains only two things: rules for encoding simple data types (such as byte, int, float, bool, string, list, map, or pointer) and the definition of DTM (more on it later). “Keep it small and simple!”, as one well-known design principle states.
As you can see, the format uses almost no metadata, which gives it its main advantage: a small number of bytes required for encoding. For example, [144, 3] is what the number 200 of type int looks like when encoded in the MUS (Varint) format. However, neither of these bytes tells us what type is encoded here. That’s fine when we know the type in advance, but what if we don’t?
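The two bytes above can be reproduced with Go's standard library alone. This is a minimal sketch, not the MUS serializer itself: it assumes that the MUS Varint encoding of an int matches the signed (zigzag) Varint from encoding/binary, which does hold for the value 200. The names EncodeInt and DecodeInt are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EncodeInt encodes v as a signed (zigzag) Varint using the standard library.
// Assumption: this matches the MUS Varint encoding of an int.
func EncodeInt(v int) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, int64(v))
	return buf[:n]
}

// DecodeInt reads a signed Varint from bs and reports how many bytes it used.
func DecodeInt(bs []byte) (int, int) {
	v, n := binary.Varint(bs)
	return int(v), n
}

func main() {
	bs := EncodeInt(200)
	fmt.Println(bs) // [144 3]

	v, n := DecodeInt(bs)
	fmt.Println(v, n) // 200 2
}
```

Zigzag maps 200 to 400, and 400 split into 7-bit groups gives 16 (plus the continuation bit, 144) followed by 3. Nothing in those bytes says "int", which is exactly the limitation the text describes.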
The MUS format suggests using Data Type Metadata (DTM):
DTM + data
Having decoded the DTM, we know what type of data follows.
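The "DTM + data" layout could look roughly like this in Go. This is only a sketch under stated assumptions: the DTM is written as an unsigned Varint, and the constants IntDTM and StringDTM as well as the functions MarshalWithDTM and UnmarshalDTM are hypothetical names, not the API of the actual DTM module.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical DTM values, chosen only for this sketch.
const (
	IntDTM    = 1
	StringDTM = 2
)

// MarshalWithDTM writes the DTM first, then the already-encoded data.
// Assumption: the DTM is stored as an unsigned Varint.
func MarshalWithDTM(dtm uint64, data []byte) []byte {
	bs := binary.AppendUvarint(nil, dtm)
	return append(bs, data...)
}

// UnmarshalDTM reads the DTM and returns it together with the remaining bytes.
func UnmarshalDTM(bs []byte) (uint64, []byte, error) {
	dtm, n := binary.Uvarint(bs)
	if n <= 0 {
		return 0, nil, errors.New("malformed DTM")
	}
	return dtm, bs[n:], nil
}

func main() {
	payload := binary.AppendVarint(nil, 200) // [144 3]
	bs := MarshalWithDTM(IntDTM, payload)
	fmt.Println(bs) // [1 144 3]

	dtm, rest, err := UnmarshalDTM(bs)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The DTM tells us how to interpret the bytes that follow.
	switch dtm {
	case IntDTM:
		v, _ := binary.Varint(rest)
		fmt.Println("int:", v) // int: 200
	case StringDTM:
		fmt.Println("string payload:", rest)
	}
}
```

The cost of knowing the type is a single extra byte here, and it is paid only when the receiver cannot know the type in advance.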
You may ask: “But how does the MUS format encode and decode structured data types?” The fields are encoded in order: the first, then the second, then the third, and so on. Decoding works the same way: if we know the type of the encoded structure, we know the types and order of all its fields.
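Field-by-field encoding of a structure might be sketched as follows. The type Foo, its fields, and the functions MarshalFoo and UnmarshalFoo are hypothetical, and the string layout used here (an unsigned Varint length followed by the bytes) is an assumption made for this example rather than a quote from the specification.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Foo is a hypothetical structure used only for illustration.
type Foo struct {
	ID   int
	Name string
}

// MarshalFoo encodes the fields one after another, in declaration order.
// No field names or tags are written.
func MarshalFoo(f Foo) []byte {
	bs := binary.AppendVarint(nil, int64(f.ID))        // field 1: int
	bs = binary.AppendUvarint(bs, uint64(len(f.Name))) // field 2: string length (assumed layout)
	return append(bs, f.Name...)                       // field 2: string bytes
}

// UnmarshalFoo decodes the fields in the same order and returns the number
// of bytes consumed.
func UnmarshalFoo(bs []byte) (Foo, int, error) {
	id, n := binary.Varint(bs)
	if n <= 0 {
		return Foo{}, 0, errors.New("malformed ID")
	}
	l, m := binary.Uvarint(bs[n:])
	if m <= 0 || uint64(len(bs)-n-m) < l {
		return Foo{}, 0, errors.New("malformed Name")
	}
	n += m
	end := n + int(l)
	return Foo{ID: int(id), Name: string(bs[n:end])}, end, nil
}

func main() {
	bs := MarshalFoo(Foo{ID: 200, Name: "hi"})
	fmt.Println(bs) // [144 3 2 104 105]

	f, n, err := UnmarshalFoo(bs)
	fmt.Printf("%+v %d %v\n", f, n, err) // {ID:200 Name:hi} 5 <nil>
}
```

Because the decoder relies purely on field order, swapping two fields in Foo without changing the decoder would silently corrupt the result.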
You may ask again: “Okay, but data changes over time. What if we want to do something simple like reorder fields, change a field’s type, or delete a field?” The MUS format proposes not changing the data in place but creating a new version of it and using DTM to distinguish one version from another. DTM can define both the data type and its version at once, for example:
const FooV1DTM = 1 // type Foo, version V1
However, dealing with multiple versions of the same data (supporting legacy clients can lead to this) is not something you would wish on anyone. So what can we do? Actually, it’s not that bad, because among all these versions there is only one we are interested in: the current one. For example, given two versions of Foo, FooV1 and FooV2 (current), DecodeFoo can always return the current one by migrating:
func DecodeFoo(...) (Foo, error) { // Where Foo is an alias for FooV2.
    // Using DTM we can distinguish one version from another:
    // - For the first version: migrate it to the current one and return.
    // - For the second version: simply return it as the current one.
    ...
}
This functionality can be encapsulated in a separate package or module, and the rest of our code won’t even know that different versions exist.
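To make the migration idea concrete, here is one possible sketch of such a DecodeFoo. Everything beyond FooV1DTM, FooV1, FooV2, Foo, and the DecodeFoo name is an assumption for illustration: the field contents of both versions, the FooV2DTM value, the byte-slice signature, and the helper functions are hypothetical and do not reflect the actual versioning module.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// DTM values identify both the type and its version.
const (
	FooV1DTM = 1 // type Foo, version V1
	FooV2DTM = 2 // type Foo, version V2 (hypothetical value)
)

// FooV1 is a hypothetical legacy version that stores a duration in seconds.
type FooV1 struct{ Seconds int }

// FooV2 is a hypothetical current version that stores it in milliseconds.
type FooV2 struct{ Millis int }

// Foo is an alias for the current version.
type Foo = FooV2

// encode writes the DTM followed by the single int field of either version.
func encode(dtm uint64, field int) []byte {
	bs := binary.AppendUvarint(nil, dtm)
	return binary.AppendVarint(bs, int64(field))
}

func EncodeFooV1(v FooV1) []byte { return encode(FooV1DTM, v.Seconds) }
func EncodeFoo(v Foo) []byte     { return encode(FooV2DTM, v.Millis) }

// migrateV1 converts a legacy value to the current version.
func migrateV1(v FooV1) Foo { return Foo{Millis: v.Seconds * 1000} }

// DecodeFoo always returns the current version, whichever version was encoded.
func DecodeFoo(bs []byte) (Foo, error) {
	dtm, n := binary.Uvarint(bs)
	if n <= 0 {
		return Foo{}, errors.New("malformed DTM")
	}
	// Both versions in this sketch consist of exactly one int field.
	field, m := binary.Varint(bs[n:])
	if m <= 0 {
		return Foo{}, errors.New("malformed field")
	}
	switch dtm {
	case FooV1DTM:
		return migrateV1(FooV1{Seconds: int(field)}), nil
	case FooV2DTM:
		return Foo{Millis: int(field)}, nil
	default:
		return Foo{}, fmt.Errorf("unknown DTM %d", dtm)
	}
}

func main() {
	old, _ := DecodeFoo(EncodeFooV1(FooV1{Seconds: 2}))
	cur, _ := DecodeFoo(EncodeFoo(Foo{Millis: 1500}))
	fmt.Println(old.Millis, cur.Millis) // 2000 1500
}
```

Callers only ever see Foo, so adding a FooV3 later would mean one more case in the switch and one more migration function, with no changes elsewhere.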
The MUS serializer, a DTM support module, and a data versioning (migration) support module have already been implemented for Go. The serializer itself, by the way, is both fast and simple. Writing its main code took me only 3 days (without tests, of course)!
One more thing: the MUS format is also suitable for streaming. Moreover, there is no need to encode the length of the data before the data itself. All the receiving side needs to know is the data type.
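The streaming property can be illustrated with standard-library Varints: each value ends where its last byte has no continuation bit, so the reader needs no length prefix to find the boundary. This is a rough sketch that assumes the receiver knows every value is an int; ReadInts, EncodeInts, and DecodeStream are hypothetical names, and bytes.Reader stands in for a network connection.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// EncodeInts writes the values back to back, with no length prefix and no
// separators between them.
func EncodeInts(vs ...int) []byte {
	var bs []byte
	for _, v := range vs {
		bs = binary.AppendVarint(bs, int64(v))
	}
	return bs
}

// ReadInts decodes consecutive Varint-encoded ints from r until the stream ends.
func ReadInts(r io.ByteReader) ([]int, error) {
	var out []int
	for {
		v, err := binary.ReadVarint(r)
		if err == io.EOF { // clean end of stream: no bytes of a new value were read
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, int(v))
	}
}

// DecodeStream wraps a byte slice in a reader to simulate an incoming stream.
func DecodeStream(bs []byte) ([]int, error) {
	return ReadInts(bytes.NewReader(bs))
}

func main() {
	vals, err := DecodeStream(EncodeInts(200, -1, 7))
	fmt.Println(vals, err) // [200 -1 7] <nil>
}
```

With a real connection, wrapping it in a bufio.Reader would provide the io.ByteReader that ReadInts expects.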
That’s all, thanks for your attention.