Sunday 12 June 2016

Writing a parser for XML

XML stands for Extensible Markup Language. It is used to tag data so that the data is easier for a variety of software to read and use. But I guess you knew that already. So let me go directly to things that are less obvious.

Yesterday, while attending a presentation on XML, I got into an oddly spirited discussion about why one should learn and use XML when there are many RDBMS packages available that seem to do the same job. Whether comparing the two is fair or even relevant is a whole other issue. But during the discussion, I found myself automatically inclined towards XML (without any solid reason). To gain some leverage over the others in the discussion, I remarked, rather nonchalantly, that XML is easier to parse. If that were true, then surely a point would go to XML over RDBMS (again, I am not at all qualified to make this comparison).

Today, following the presentation, I took it upon myself to write a parser for XML. My motivations for this undertaking are twofold:
1) If I am able to do it, then that will surely mean that it is easy.
2) Even if it turns out to be easy, no one needs to know that. And then it can serve as a great endorsement of my false prowess in computer science.

Before getting into the details (in subsequent posts) of how I did it (am doing it), I would like to state, once and for all, the following about the scope and ambition of my XML parser:

A) In competence and performance, it is very modest. No fancy algorithms. Pure and simple hacks in the name of parsing, based entirely on my understanding of the matter.
B) It is not complete. At least not as of this moment. This incompleteness is not merely a choice; it also follows from the fact that writing a full parser for XML would require complete knowledge of the language and rigorous periods of coding and testing, none of which I intend to promise.
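
As a rough illustration of the kind of modest, incomplete parsing described in (A) and (B), here is a minimal Python sketch. It is not the code linked from this post; the names `Node`, `TOKEN` and `parse` are hypothetical, and it assumes input made only of plain opening tags, closing tags and text (no attributes, comments, entities, CDATA or self-closing tags).

```python
import re
from dataclasses import dataclass, field

# One token is an opening tag (<a>), a closing tag (</a>), or a run of text.
# Anything else (attributes, comments, <a/>, ...) is deliberately unsupported.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # child Node objects, in order
    text: str = ""  # stripped text found directly inside this element


def parse(source: str) -> Node:
    """Build a tree of Node objects using a stack of currently open elements."""
    root = Node("#document")  # synthetic root holding the top-level element(s)
    stack = [root]
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unsupported or malformed input at offset {pos}")
        closing, name, text = match.groups()
        if text is not None:
            # Character data belongs to the innermost open element.
            stack[-1].text += text.strip()
        elif closing:
            # A closing tag must match the innermost open element.
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unexpected closing tag </{name}>")
            stack.pop()
        else:
            # An opening tag starts a new child of the innermost open element.
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1].name}>")
    return root
```

Calling `parse("<note><to>Ann</to></note>")` returns the synthetic root, whose single child is the `note` element. The stack is what keeps this simple: the top of the stack is always the element currently being filled, so nesting errors show up as a mismatch between a closing tag and the top of the stack.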

I feel like I have provided some essential introduction in this post. You can check out the code at the following address:


Do look for other posts on this very blog to completely understand the code, and then maybe write one such parser for your own amusement.
