Beyond JSON – The Alternatives

Previous: Beyond JSON – The Dominance

When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes (for example, a JSON document). Since a pointer would not make sense to any other process, this sequence-of-bytes representation looks quite different from the data structures that are normally used in memory.

Thus, we need some kind of translation between the two representations. The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshalling), and the reverse is called decoding (parsing, deserialization, unmarshalling).

Most programming languages have built-in libraries for encoding in-memory objects (Java: java.io.Serializable, Ruby: Marshal, etc.). The issue with them is that the encoding is language-specific. It's generally a bad idea to use your language's built-in encoding for anything other than very transient purposes.
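
For instance, here is a minimal sketch of Java's built-in serialization (the Person class is just a hypothetical example type). It shows how tightly the encoding is tied to the JVM and to the exact class definition on the classpath:

```java
import java.io.*;

// Hypothetical example type; any class must implement Serializable to be encoded.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;
    Person(String name, int age) { this.name = name; this.age = age; }
}

public class BuiltInSerializationDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Encode the in-memory object into a byte sequence (serialization).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Person("Alice", 30));
        }

        // Decode the byte sequence back into an object (deserialization).
        // Only another JVM with the same class on its classpath can read this,
        // which is why built-in encodings are best kept to transient uses.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Person restored = (Person) in.readObject();
            System.out.println(restored.name + ", " + restored.age);
        }
    }
}
```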

Moving to standardized encodings that can be written and read by many programming languages, JSON and XML are the obvious contenders. JSON is less verbose than XML, but both still use a lot of space compared to binary formats. This observation led to the development of binary encodings for JSON (MessagePack, BSON, BJSON, etc.). For data that is used only internally within your organization, you could choose a format that is more compact or faster to parse.
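
As a point of reference, here is a rough sketch of encoding an in-memory map as a JSON document, assuming the Jackson library (the field names are illustrative). Every field name and value is spelled out as text, which is exactly the overhead the binary encodings try to remove:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonEncodingDemo {
    public static void main(String[] args) throws Exception {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("userName", "Martin");
        record.put("favoriteNumber", 1337);

        // Encode the in-memory map as a JSON text document.
        ObjectMapper mapper = new ObjectMapper();
        byte[] json = mapper.writeValueAsBytes(record);
        System.out.println(new String(json)); // {"userName":"Martin","favoriteNumber":1337}

        // Every field name and every value travels as text; binary encodings
        // such as MessagePack or BSON shrink exactly this overhead.
    }
}
```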

In 2005, a new variant of SOA was coined: the microservice. By 2012, people were adopting and experimenting with it, and by 2015 the architectural style had gained real momentum. This is the biggest change that has prompted people to look for alternatives to JSON. The reasons for the alternatives are:

1. Every microservice should be capable of being deployed in a separate memory space.

2. The whole application was divided into small functions interacting with each other over the network.

3. We needed a more efficient way to exchange data, as JSON still has a high overhead for encoding and decoding.

Everyone was looking back at binary data exchange to leverage its benefits in speed, memory footprint, storage size, etc. The difference from the old days was that the community now wanted a binary format that multiple languages could exchange, to avoid technology lock-in. Let's first look at a few of the binary data serialization formats:

1. Apache Thrift

Thrift is an interface definition language that is used to define and create services for numerous languages. It is used as a remote procedure call (RPC) framework and was developed at Facebook. Thrift’s goal is “to enable efficient and reliable communication across programming languages”. Solving many aspects of cross-platform services, it generates RPC code for clients and servers, providing a compact, deterministic, and versionable interchange protocol. Thrift is based on an RPC-style architecture with a binary data exchange format, so it is a complete package: a web service architecture shift to RPC plus the advantage of binary encodings (BinaryProtocol and CompactProtocol). A rough sketch of encoding with CompactProtocol follows below.
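
In this sketch, the Person class is assumed to be generated by the Thrift compiler from the IDL shown in the comment; only the TSerializer/TDeserializer calls are the library's own API:

```java
import org.apache.thrift.TSerializer;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.protocol.TCompactProtocol;

public class ThriftEncodingDemo {
    public static void main(String[] args) throws Exception {
        // 'Person' is assumed to be generated by the Thrift compiler from an
        // IDL file along these lines:
        //   struct Person {
        //     1: required string userName,
        //     2: optional i64 favoriteNumber
        //   }
        Person person = new Person();
        person.setUserName("Martin");
        person.setFavoriteNumber(1337);

        // Encode with CompactProtocol (a TBinaryProtocol.Factory() would
        // select the plain BinaryProtocol instead).
        TSerializer serializer = new TSerializer(new TCompactProtocol.Factory());
        byte[] encoded = serializer.serialize(person);

        // Decode back into a fresh object using the same protocol.
        Person decoded = new Person();
        new TDeserializer(new TCompactProtocol.Factory()).deserialize(decoded, encoded);
        System.out.println(decoded.getUserName());
    }
}
```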

2. Protocol Buffers

Protocol Buffers is an encoding format by Google. Protocol Buffers and Thrift came about around the same time and, not surprisingly, are very similar. Protocol Buffers (which has only one binary encoding format) does the bit packing slightly differently, but is otherwise very similar to Thrift’s CompactProtocol.
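
A sketch of the same idea with Protocol Buffers in Java; the PersonProtos.Person class is assumed to be generated by protoc from the schema shown in the comment:

```java
// 'PersonProtos.Person' is assumed to be generated by protoc from a schema
// along these lines (proto3 syntax):
//   message Person {
//     string user_name = 1;
//     int64 favorite_number = 2;
//   }
public class ProtobufEncodingDemo {
    public static void main(String[] args) throws Exception {
        PersonProtos.Person person = PersonProtos.Person.newBuilder()
                .setUserName("Martin")
                .setFavoriteNumber(1337)
                .build();

        // Encode: numeric field tags plus packed values, no field names on the wire.
        byte[] encoded = person.toByteArray();

        // Decode using code generated from the same schema.
        PersonProtos.Person decoded = PersonProtos.Person.parseFrom(encoded);
        System.out.println(decoded.getUserName());
    }
}
```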

3. Apache Avro

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Avro is one of the most compact binary encoding formats because the encoding simply consists of values concatenated together. A string is just a length prefix followed by UTF-8 bytes, but there’s nothing in the encoded data that tells you that it is a string. To parse the binary data, you go through the fields in the order that they appear in the schema and use the schema to tell you the datatype of each field.
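
A small sketch of Avro's GenericRecord API in Java illustrates this: the schema is declared in JSON, and the binary encoder writes only the concatenated values (the record and field names here are just an example):

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroEncodingDemo {
    public static void main(String[] args) throws Exception {
        // The schema itself is written in JSON; the encoded data carries no
        // field names or type tags, only the concatenated values.
        String schemaJson = "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
                + "{\"name\":\"userName\",\"type\":\"string\"},"
                + "{\"name\":\"favoriteNumber\",\"type\":\"long\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        GenericRecord person = new GenericData.Record(schema);
        person.put("userName", "Martin");
        person.put("favoriteNumber", 1337L);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(person, encoder);
        encoder.flush();

        // A reader must walk the same schema to know each field's type.
        System.out.println("Encoded size: " + out.size() + " bytes");
    }
}
```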

Apache Avro is used in Apache Kafka and Apache Hadoop; notably, both are high-traffic, high-volume systems.

4. BSON

BSON is a computer data interchange format used mainly as a data storage and network transfer format in the MongoDB database. It is a binary form for representing simple data structures and associative arrays (called objects or documents in MongoDB). BSON has a huge number of implementations. Compared to JSON, BSON is designed to be efficient in both storage space and scan speed. Its key advantage is traversability, which makes it suitable for storage purposes, but this comes at the cost of over-the-wire encoding size.
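
Here is a minimal sketch using the org.bson classes that ship with the MongoDB Java driver (my assumed library choice; the document fields are illustrative). Field names and per-element lengths stay in the encoding, which is what makes BSON easy to traverse but larger on the wire:

```java
import org.bson.BsonBinaryWriter;
import org.bson.Document;
import org.bson.codecs.DocumentCodec;
import org.bson.codecs.EncoderContext;
import org.bson.io.BasicOutputBuffer;

public class BsonEncodingDemo {
    public static void main(String[] args) {
        // The same kind of document MongoDB would store.
        Document doc = new Document("userName", "Martin")
                .append("favoriteNumber", 1337L);

        // Encode the document to BSON bytes.
        BasicOutputBuffer buffer = new BasicOutputBuffer();
        new DocumentCodec().encode(new BsonBinaryWriter(buffer), doc,
                EncoderContext.builder().build());
        byte[] bson = buffer.toByteArray();

        System.out.println("BSON size: " + bson.length + " bytes");
        System.out.println("JSON form: " + doc.toJson());
    }
}
```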

5. MessagePack

MessagePack is a compact binary representation of JSON. Compared to BSON, MessagePack is more space-efficient. BSON is designed for fast in-memory manipulation, whereas MessagePack is designed for efficient transmission over the wire. The Protocol Buffers format aims to be compact and is on par with MessagePack. However, while JSON and MessagePack aim to serialize arbitrary data structures with type tags, Protocol Buffers require a schema to define the data types.
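
A small sketch with the msgpack-java core API (an assumed dependency; the map keys are illustrative) shows the schema-less, type-tagged style: type information travels with the data, so no schema is needed to read it back:

```java
import org.msgpack.core.MessageBufferPacker;
import org.msgpack.core.MessagePack;
import org.msgpack.core.MessageUnpacker;

public class MessagePackEncodingDemo {
    public static void main(String[] args) throws Exception {
        // Pack a small map; each value carries its own type tag.
        MessageBufferPacker packer = MessagePack.newDefaultBufferPacker();
        packer.packMapHeader(2);
        packer.packString("userName");
        packer.packString("Martin");
        packer.packString("favoriteNumber");
        packer.packLong(1337);
        packer.close();
        byte[] encoded = packer.toByteArray();

        // Unpack by reading back the self-describing stream.
        MessageUnpacker unpacker = MessagePack.newDefaultUnpacker(encoded);
        int entries = unpacker.unpackMapHeader();
        for (int i = 0; i < entries; i++) {
            String key = unpacker.unpackString();
            System.out.println(key + " = " + unpacker.unpackValue());
        }
        unpacker.close();
    }
}
```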

Conclusion

The conclusion of this long story is that we have gone through evolutions, made choices, and settled on what makes the best sense for our solution architectures. Over the past few years, with the evolution of deployment architecture (cloud and containerization), application architecture (microservices), and API design architecture (REST to RPC), we need to think through the choice of data interchange encoding rather than being over-obsessed with JSON.

