A thermos is an under-appreciated
invention. The entirety of its concept is both simple and beautiful. Its job is
to keep things as they are. The biggest challenge a thermos (or anyone for that
matter) has to face is not to submit to the test of time. The heat quotient (temperature)
of the thing contained by a thermos must remain unchanged (or changed
insignificantly) for a given amount of time. What is this given amount of time,
you ask? Well, the time for which we need the thing inside our thermos. If I put
boiling water in a thermos which I’ll be needing in an hour, then that water
must retain its fire, or else what good does the thermos do me? The same must be
true for this thermos when the occasion is different and the water is freezing.
The point is that the thermos
must preserve its contents for a certain amount of time (subject to the thermos’
capability).
Does that remind you of anything
related to computers?
Memory. But for the purpose of
this post, Random Access Memory. Replace in the above line the word “thermos”
with “RAM” and see if the fit isn’t perfect.
Where do we go from here? Well,
what about the fundamental limitation of a thermos?
A thermos cannot contain hot and
cold things at the same time. The water inside of the thermos can either be
cold or hot, never both.
Now can I find a similar behavioral
pattern for RAMs? Of course I can. When we declare a variable in a program, a
place is reserved in memory and is marked with the name of the variable. This
place corresponds to our thermos and the contents of this thermos are the bits
representing the value given to the variable by the programmer. This variable (which
is nothing but a bunch of bits) can be classified into different types just
like water (a bunch of molecules in liquid form) can be classified based on its
temperature. Some of these types for a variable are int, char, float, double,
etc. With this in mind, let me rewrite a line from before with certain
replacements:
A place in memory cannot contain
int and char (and float and double...) at the same time. The bits in this memory
can either be an int or a char (or a float or a double...), never all of them at once.
Did that make sense? It did to me
when I was new to programming. I knew for a fact that every variable must have
a place in memory exclusively for itself. But then I read about unions.
And of course I read about them in the context of C++.
Unions (in C++) set aside a single block
of memory. This block of memory is shared by all the members of the union, which
all start at the same address. Members of a union are ordinary variables with normal
names and types of your choosing. As an example, consider the following union:
union U
{
    int i;
    char c;
    float f;
};
U obj; // An object of the union U
The first thing union U is going to
need is enough space in memory to contain the largest of its members. On most
current platforms int and float are both 4 bytes, so U will reserve 4 bytes (on
older 16-bit compilers int was only 2 bytes, which made float f the largest member).
Now assume these 4 bytes are somehow filled with some bits. The question
is: what do these bits mean? The answer is that they can mean anything,
depending on what the programmer wants them to mean. If the programmer wishes to
use this union as a float, then he/she could do that using obj.f and the bits
would behave like a float. If the programmer wants this union to behave like an
int, then obj.i will treat those same bytes as an int (all 4 of them when int is
4 bytes wide). And finally, if a character is needed, then obj.c would
do that by taking only the first byte of the union and treating its bits as a
char. One caution: standard C++ only guarantees a meaningful result when you read
the member you most recently wrote. Reading a different member is undefined
behavior, even though many compilers do what you would expect.
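To make the sharing concrete, here is a small sketch built on the same union U. The helper names members_share_address and round_trip_float are mine, invented purely for illustration. Each helper only reads the member it has just written, which is the one use standard C++ guarantees.

```cpp
#include <cassert>

union U
{
    int i;
    char c;
    float f;
};

// The union must be big enough for its largest member.
static_assert(sizeof(U) >= sizeof(int), "U must be able to hold an int");
static_assert(sizeof(U) >= sizeof(float), "U must be able to hold a float");

// Hypothetical helper: true when all three members start at the same address.
bool members_share_address(const U& obj)
{
    const void* base = &obj;
    return static_cast<const void*>(&obj.i) == base
        && static_cast<const void*>(&obj.c) == base
        && static_cast<const void*>(&obj.f) == base;
}

// Hypothetical helper: store a float and read the same member back.
float round_trip_float(float value)
{
    U obj;
    obj.f = value;                       // f is now the member holding a value
    assert(members_share_address(obj));  // i and c sit on top of the same bytes
    return obj.f;                        // reading the member we just wrote is fine
}
```

Calling round_trip_float(1.5f) should hand back 1.5f unchanged. The interesting part is the address check: three names, one place in memory.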
You see, bits are bits. There is
no such thing as a bit being hot or cold. The same bits make an integer and the
same bits make a character. It’s simply the choice of the programmer to use
these bits a certain way. And that is exactly what type specifications are for,
telling the compiler the way in which a group of bits is to be treated. Unions
take advantage of this.
Anonymous Unions:
In the above example I used a
union to create a class of sorts (U). Using this I was able to create an object
of this union. This is useful when the program requires you to use a union
multiple times in a program under different cases. But when all you need is a
group of variables sharing the same memory location (the possible reasons for
which I’ll discuss later), then you can choose not to name your union. This way
you will be saved from first having an object of the union and then using the members
via the member operator: dot. These are called anonymous unions. Here’s an
example to clarify things syntactically:
union
{
    int i;
    char c;
    float f;
};
Now you are free to use the
variables i, c and f as you would use any normal variable. But in the
background, these three will be sharing the same memory place.
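Here is a minimal sketch of an anonymous union in use. I have put it inside a function, and the function name anonymous_union_demo is just something I made up for the example.

```cpp
#include <cassert>

// Hypothetical helper: uses an anonymous union declared inside a function.
int anonymous_union_demo()
{
    union
    {
        int i;
        char c;
        float f;
    };

    // No object name and no dot: the members are used like local variables.
    i = 42;
    int saved = i;   // copy the value out before the storage is reused

    // All three names refer to the same memory location.
    assert(static_cast<void*>(&i) == static_cast<void*>(&c));
    assert(static_cast<void*>(&i) == static_cast<void*>(&f));

    c = 'x';         // overwrites the bytes that held 42; c is now the live member
    return saved + (c == 'x' ? 1 : 0);
}
```

One detail worth knowing: if you declare an anonymous union at global or namespace scope instead of inside a function, C++ requires it to be declared static.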
Need:
Well, first of all, it saves
memory. Nah, that’s not it. Saving a few bytes doesn’t matter much. The real
essence of unions lies in something else, something quite rare.
Humor me for a moment. Imagine
your job is picking up people at the airport, people you have never met before. My
question is: what size of car would you take with you to carry them? It is possible
that on some days you’ll get a really thin person but on others you can very
well end up with a huge one. The answer is a car which is big enough to hold
any size of a person imaginable. And that is exactly how a union is to be used
in a computer program. When you are not sure what type of data you’re going to
get (from a file or some other source of input), simply use a union to enable
different treatments of the same memory location. A good example can be found
in Flex and Bison, but it would be cruel to force that on you just
yet. Maybe later.
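Until then, here is a much smaller sketch of the same idea: a union paired with a tag that remembers which member was filled in. Everything in it (Kind, Value, parse, as_number, and the rule parse uses to guess the type) is an assumption of mine for illustration. It is not how Flex and Bison work internally.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class Kind { Int, Float, Char };

struct Value
{
    Kind kind;   // remembers which member of the union is meaningful
    union
    {
        int i;
        float f;
        char c;
    };
};

// Hypothetical rule: a lone non-digit character is a char,
// text containing '.' is a float, anything else is an int.
Value parse(const std::string& text)
{
    assert(!text.empty());
    Value v;
    if (text.size() == 1 && !std::isdigit(static_cast<unsigned char>(text[0])))
    {
        v.kind = Kind::Char;
        v.c = text[0];
    }
    else if (text.find('.') != std::string::npos)
    {
        v.kind = Kind::Float;
        v.f = std::stof(text);
    }
    else
    {
        v.kind = Kind::Int;
        v.i = std::stoi(text);
    }
    return v;
}

// Reads only the member that the tag says was written.
double as_number(const Value& v)
{
    switch (v.kind)
    {
        case Kind::Int:   return v.i;
        case Kind::Float: return v.f;
        case Kind::Char:  return v.c;
    }
    return 0.0; // not reached for the three kinds above
}
```

The tag is what keeps this honest: the car is big enough for any passenger, and kind is the note on the dashboard telling you who is actually sitting in it.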