Beyond JSON – Alternates
Previous - Beyond JSON – The Dominance
When you want to write data to a file or send it over the
network, you have to encode it as some kind of self-contained sequence of bytes
(for example, a JSON document). Since a pointer would not make sense to any
other process, this sequence-of-bytes representation looks quite different from
the data structures that are normally used in memory.
Thus, we need some kind of translation between the two
representations. The translation from the in-memory representation to a byte
sequence is called encoding (also known as serialization or marshalling), and
the reverse is called decoding (parsing, deserialization, unmarshalling).
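As a minimal sketch of that round trip, the snippet below encodes an in-memory Python dict to a byte sequence and decodes it back. The record contents are made-up sample data, and JSON is used only because it is the example format mentioned above.

```python
import json

# An in-memory structure: a dict holding a string, an integer and a list.
record = {
    "userName": "Martin",
    "favoriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

# Encoding (serialization): in-memory object -> self-contained byte sequence.
encoded = json.dumps(record).encode("utf-8")

# Decoding (deserialization): byte sequence -> a new in-memory object.
decoded = json.loads(encoded.decode("utf-8"))

print(type(encoded).__name__, len(encoded))
print(decoded == record)
```

The decoded object is equal to the original but is a separate object in memory; only the bytes travel between processes.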
Most programming languages have built-in libraries for encoding
in-memory objects (Java: java.io.Serializable, Ruby: Marshal, etc.).
The problem is that these encodings are language-specific, so data written
by one language is hard to read from another. It’s generally a bad idea to
use your language’s built-in encoding for anything other than very
transient purposes.
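The same issue can be seen with Python's pickle module, used here as an analogue of the Java and Ruby libraries named above (it is not mentioned in the original text). A small sketch, with an arbitrary sample date:

```python
import datetime
import pickle

# A value of a Python-specific type (sample data).
value = datetime.date(2015, 1, 1)

# pickle is Python's built-in, language-specific encoding.
payload = pickle.dumps(value)

# The payload names the Python module that defines the type, so a reader
# needs a Python runtime with that module to make sense of the bytes.
names_python_module = b"datetime" in payload

restored = pickle.loads(payload)
print(names_python_module, restored == value)
```

The bytes round-trip fine inside Python, but a Java or Ruby consumer has no standard way to interpret them.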
Moving to standardized encodings that can be written and
read by many programming languages, JSON and XML are the obvious contenders. JSON
is less verbose than XML, but both still use a lot of space compared to binary
formats. This observation led to the development of binary encodings for JSON
(MessagePack, BSON, BJSON, etc.). For data that is used only internally within
your organization, you could choose a format that is more compact or faster to
parse.
In 2005, a new variant of SOA was coined: microservices. By
2012, people were adopting and experimenting with it, and by 2015 the
architectural style had gained momentum. This is the biggest change that has
prompted people to look for alternatives to JSON. The reasons for seeking
alternatives are:
1. Every microservice should be capable of being deployed in a separate
memory space.
2. The whole application is divided into small functions that interact
with each other over the network.
3. We need a more efficient way to exchange data, as JSON still has a
high overhead for encoding and decoding.
Everyone was looking back to binary data exchange for its benefits
in speed, memory footprint, storage size, and so on. The difference from the
old days was that the community was looking for a binary format that multiple
languages could exchange, to avoid technology lock-in. Let’s first look at a
few of the binary data serialization formats:
1. Apache Thrift
Thrift is an interface definition language that is used to
define and create services for numerous languages. It is used as a remote
procedure call (RPC) framework and was developed at Facebook. Thrift’s goal is
“to enable efficient and reliable communication across programming languages”.
Solving many aspects of cross-platform services, it generates RPC code for
clients and servers, providing a compact, deterministic, and versionable
interchange protocol. Thrift is based on an RPC-style architecture with a
binary data exchange format. So Thrift is a complete package: it shifts the
web service architecture to RPC and adds the advantage of binary encoding
(BinaryProtocol and CompactProtocol).
2. Protocol Buffers
Protocol Buffers is an encoding format from Google. Protocol
Buffers and Thrift appeared at about the same time and, not surprisingly, are
very similar. Protocol Buffers (which has only one binary encoding format) does
the bit packing slightly differently, but is otherwise very similar to Thrift’s
CompactProtocol.
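To make the bit packing concrete, here is a hand-rolled sketch of the Protocol Buffers wire format for two fields. It does not use the protobuf library or a .proto file; the message layout (field 1 as a string, field 2 as an integer) and the sample values are assumptions for illustration, and only non-negative integers are handled.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRE_VARINT = 0  # wire type for integers
WIRE_LEN = 2     # wire type for length-delimited values such as strings


def tag(field_number: int, wire_type: int) -> bytes:
    # The field number and wire type are packed into a single varint.
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return tag(field_number, WIRE_LEN) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    return tag(field_number, WIRE_VARINT) + encode_varint(value)


# Hypothetical message: field 1 = user name, field 2 = favorite number.
message = encode_string_field(1, "Martin") + encode_int_field(2, 1337)
print(message.hex(" "))  # 0a 06 4d 61 72 74 69 6e 10 b9 0a
```

Field names never appear in the output; each value is preceded only by a small numeric tag, which is why both sides need the schema to interpret the bytes.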
3. Apache Avro
Avro is a row-oriented remote procedure call and data
serialization framework developed within Apache's Hadoop project. It uses JSON
for defining data types and protocols, and serializes data in a compact binary
format. Avro is one of the most compact binary encoding formats because the
encoding simply consists of values concatenated together. A string is just a
length prefix followed by UTF-8 bytes, but there’s nothing in the encoded data
that tells you that it is a string. To parse the binary data, you go through
the fields in the order in which they appear in the schema and use the schema
to tell you the datatype of each field.
Apache Avro is used in Apache Kafka and Apache Hadoop, both of
which are high-traffic, high-volume systems.
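The schema-driven encoding of Avro described above can be sketched by hand as follows. This is not the Avro library: the two-field schema, the helper names and the sample record are assumptions, and only the string and long types are covered. Avro writes longs (including string lengths) as zig-zag variable-length integers.

```python
def encode_long(n: int) -> bytes:
    """Avro-style long: zig-zag map the sign, then write 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit long
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_long(len(data)) + data  # length prefix, then UTF-8 bytes


def read_long(data: bytes, pos: int):
    """Return (value, next position) for a zig-zag varint starting at pos."""
    z, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos


# Hypothetical schema: field order and types live here, not in the data.
SCHEMA = [("userName", "string"), ("favoriteNumber", "long")]
ENCODERS = {"string": encode_string, "long": encode_long}


def encode_record(record: dict, schema) -> bytes:
    # Values are simply concatenated in schema order, with no tags or names.
    return b"".join(ENCODERS[ftype](record[name]) for name, ftype in schema)


def decode_record(data: bytes, schema) -> dict:
    record, pos = {}, 0
    for name, ftype in schema:
        if ftype == "long":
            record[name], pos = read_long(data, pos)
        elif ftype == "string":
            length, pos = read_long(data, pos)
            record[name] = data[pos:pos + length].decode("utf-8")
            pos += length
    return record


record = {"userName": "Martin", "favoriteNumber": 1337}
encoded = encode_record(record, SCHEMA)
print(encoded.hex(" "))  # 0c 4d 61 72 74 69 6e f2 14
print(decode_record(encoded, SCHEMA) == record)
```

Decoding with a schema that lists the fields in a different order or with different types would misread the bytes, which is why the reader must know the writer's schema.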
4. BSON
BSON is a computer data interchange format used mainly as a
data storage and network transfer format in the MongoDB database. It is a
binary form for representing simple data structures and associative arrays
(called objects or documents in MongoDB). BSON has a huge number of
implementations. Compared to JSON, BSON is designed to be efficient both in
storage space and scan speed. Its key advantage is traversability, which
makes it suitable for storage purposes, but this comes at the cost of a larger
over-the-wire encoding size.
5. MessagePack
MessagePack is a compact binary representation of JSON. Compared
to BSON, MessagePack is more space-efficient. BSON is designed for fast
in-memory manipulation, whereas MessagePack is designed for efficient
transmission over the wire. The Protocol Buffers format also aims to be compact
and is on par with MessagePack. However, while JSON and MessagePack aim to
serialize arbitrary data structures with type tags, Protocol Buffers requires a
schema to define the data types.
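The type tags of MessagePack mentioned above can be seen in a hand-written encoder for a small subset of the format. This is a sketch rather than the msgpack library: it only handles small non-negative integers, short strings, and small lists and dicts, and the sample record is made up.

```python
import json


def pack(value) -> bytes:
    """Encode a small subset of MessagePack; every value starts with a type tag."""
    if isinstance(value, int) and not isinstance(value, bool):
        if 0 <= value <= 0x7F:
            return bytes([value])                       # positive fixint
        if 0 <= value <= 0xFF:
            return b"\xcc" + bytes([value])             # uint 8
        if 0 <= value <= 0xFFFF:
            return b"\xcd" + value.to_bytes(2, "big")   # uint 16
    elif isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data     # fixstr
    elif isinstance(value, list):
        if len(value) <= 15:
            items = b"".join(pack(item) for item in value)
            return bytes([0x90 | len(value)]) + items   # fixarray
    elif isinstance(value, dict):
        if len(value) <= 15:
            pairs = b"".join(pack(k) + pack(v) for k, v in value.items())
            return bytes([0x80 | len(value)]) + pairs   # fixmap
    raise ValueError(f"value outside the subset handled by this sketch: {value!r}")


record = {
    "userName": "Martin",
    "favoriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

packed = pack(record)
compact_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(len(compact_json), len(packed))  # 81 66
```

The field names are still carried in the payload, so the saving over JSON is modest; the schema-based formats above avoid sending names at all.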
Conclusion
The conclusion of this long story is that we have gone through several
evolutions and, each time, chosen what made the best sense for the solution
architecture. Over the past few years, with the evolution of deployment
architecture (cloud and containerization), application architecture
(microservices), and API design architecture (REST to RPC), we need to think
through the choice of data interchange encoding rather than being overly
obsessed with JSON.