Monday 13 June 2016

Writing a parser for XML 2

Any good software must have a defined list of requirements. Requirements that the software needs to fulfill. And by inheritance, any good program must have that virtue as well. Also this requirement set should be ready and locked before any development begins. Although that was not the opted strategy for this program but still I would like to start by specifying functions my parser needs to perform:

1) Check to see if the XML file is valid.
2) Parse and store details about the elements in the XML file and file it self in the program. 
3) Use this stored information to display the contents in a well formatted form.

In this post, I'll be writing how I achieved the first of the above three points i.e. checking the validity of the document.

Checking for the validity of the file:

Before checking the body of document, we must see if it contains a valid header or not. This is easy enough as the header contains only one line that two with a fixed syntax. The major part of this validation consists of validating the body of the document.
To do this check I took advantage of the fact that XML does not allow any leniency in the ordering of the tags. If a tag opens in XML then it can be followed only by it's child elements, data inside the tag or the closing tag it self with the same name. 
Also since the names of the tag can be anything, the question of matching to see if the tags used mean anything to XML goes out the window. For all my program cares, the tags can be any string without any spaces between them.
Now, to check the validity, all I have to see is whether the tag names match and is the order of opening and closing of the tags correct.
This is rather easy. All one needs is a stack. So create one for yourself or use an inbuilt one. 
At the checking stage we do not care about the data contained in the tags. All we need is some come that will recognize all tag openings and closings in the exact order as they appear in the input file. Everything else is to be ignored at this stage. Once we have this, follow along the following algorithm:

1) If the encountered tag is an opening tag:
2)       Push the tag name onto the stack.
3) If the encountered tag is a closing tag:
4)       Pop a tag name from the stack.
5)       Compare the two, the popped tag and the closing one.
6)       If they don't match:
7)              Document is INVALID
8)              exit.
9) Repeat till end of the document is reached.  

This simple algorithm does the job just fine. Some exceptions of course need to be handled for if encountered. For example, there must be some restrictions on what a tag name could be. That must also be taken care of in this algorithm it self. 


The XML Parser


The checkElements() function (look for it in the above link) is used for this job.

No comments:

Post a Comment