Wednesday 20 July 2016

Let's order some bytes to eat

Data. The single most essential element in all of computing. It's what everything is based on. It's what everything is for. Ever wondered how data is represented in a computer? Of course you have. Else why would you be reading this? Computers use binary digits to represent information. Everything is encoded into a sequence of bits and stored in memory. Bits are usually handled in groups of eight, and a group of eight bits is called a byte. There you have it. It's settled then.
Now to the actual point of this post, which has very little to do with what the title suggests. No, I am not going to order something to eat. Instead I'll be talking about byte ordering.
A single byte rarely holds a whole value. A number usually spans several bytes (four of them for a 32-bit integer), and there are two common ways in which those bytes can be arranged in memory without losing their sequence and hence their meaning. These are:
  1. Most significant byte first.
  2. Least significant byte first.
These two orderings are known by their popular names: big endian and little endian. The nomenclature stems from the logic that the most significant byte carries the highest-valued part of the number and hence is the "big" end. So if the most significant byte is stored first (at the lowest address), that's big endian. Little endian follows the same logic: the least significant byte is stored first.
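To make the two orderings concrete at the level of whole bytes, here is a small Python sketch. The value 0x12345678 is just an arbitrary example chosen for illustration; the calls used are the built-in `int.to_bytes` and `sys.byteorder`.

```python
import sys

value = 0x12345678  # arbitrary 32-bit example value

# Lay the same number out in both byte orders.
big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# The order this particular machine uses internally.
print(sys.byteorder)    # "little" or "big", depending on the machine
```

Note that the bits inside each byte are untouched; only the order of the bytes changes.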
As it turns out, neither of these two orderings has achieved a monopoly over all computers. Both are widely used by machines of all sorts and kinds. But a given machine generally uses only one of the two. This system works well as long as computers don't feel the need to exchange data among themselves. But we know that data confined to a single computer constrains its influence and restricts its usefulness. It must be shared. Networking is necessary.
So if computers are to exchange data, then there must be a way to make their byte orders match. Otherwise, multi-byte values sent between machines that use different byte orderings will be misread on arrival. To make this additional networking issue a little less of a concern, the whole world has more or less agreed upon a standard practice, which goes like this:

No matter what the byte ordering is on a machine, when data is to be transmitted onto the network from that machine, it must be in big endian format. This is why big endian is also called network byte order.

In order for this to work, there must be a way for every computer to change the order of the bytes in a value by means of nothing more than a simple library function call. And this is the case. Almost all programming languages with networking ambitions offer such an abstraction. And even if no such function is provided to you, creating one for yourself is not that hard and is more fun.
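As a rough illustration of both options, the sketch below hand-rolls a 32-bit byte swap and compares it with Python's `socket.htonl` (host to network, long). The names `swap32` and `to_network32` are made up for this example and are not part of any library.

```python
import socket
import sys

def swap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return (
        ((x & 0x000000FF) << 24)    # lowest byte moves to the top
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)  # highest byte moves to the bottom
    )

def to_network32(x: int) -> int:
    """Hypothetical helper: swap only if this machine is little endian."""
    return swap32(x) if sys.byteorder == "little" else x

value = 0x12345678  # arbitrary example value
print(hex(swap32(value)))                          # 0x78563412
print(to_network32(value) == socket.htonl(value))  # True
```

Swapping twice gives back the original value, which is why the same routine works for both sending and receiving.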
Do you realize how easily this resolves the whole issue? Since everything on the network is assured to be in big endian, every machine receiving this data can simply reverse the byte order of each value if it uses little endian internally, or otherwise leave the data as is. And this, my dear reader, will preserve the meaning of data communicated over any network between different computers.
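Here is a minimal sketch of that round trip using Python's standard `struct` module, where the `!` prefix selects network (big endian) byte order. The function names `encode` and `decode` and the sample value 1024 are assumptions made for the example; no real socket is involved.

```python
import struct

def encode(value: int) -> bytes:
    # "!" = network (big endian) byte order, "I" = 32-bit unsigned integer
    return struct.pack("!I", value)

def decode(payload: bytes) -> int:
    # unpack always returns a tuple, even for a single field
    (value,) = struct.unpack("!I", payload)
    return value

wire = encode(1024)   # what the sender would put on the network
print(wire.hex(" "))  # 00 00 04 00
print(decode(wire))   # 1024
```

Both sides run the same code regardless of their own byte order, because `struct` does any swapping that the host needs.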
