Monday, 20 June 2016

Spaces and Tabs in Flex

As I have mentioned in previous blog posts, Flex is a scanner generator: it builds a scanner that looks for patterns defined by the programmer. The whole thing begins with something known as regular expressions.

Regular expressions provide a way of describing textual patterns to the computer or a programming language. They form the basis of many advanced concepts in computer science, pattern matching itself being one of them. Learning regular expressions is absolutely crucial if one wishes to learn Flex or any other related scanner.

In Flex, we have to specify a regular expression. This tells Flex what it should be looking for in the input text. With each such regular expression we also have to specify some action code. This action code is what runs whenever Flex finds a match for that regular expression. One can use as many regular expressions as one needs.
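To make the pattern and action pairing concrete, here is a minimal sketch of a complete Flex file. The two patterns and the NUMBER and WORD labels are made up for illustration and have nothing to do with any real project file.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
[0-9]+      { printf("NUMBER: %s\n", yytext); /* action for a run of digits */ }
[A-Za-z]+   { printf("WORD: %s\n", yytext);   /* action for a run of letters */ }
.|\n        { /* anything else: the action is empty, so nothing happens */ }
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

Assuming the file is saved as example.l (a hypothetical name), running flex example.l and then cc lex.yy.c -o example should produce a scanner that reads from standard input.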

One of the most commonly used characters in any text file is the space. Every word of every sentence, and then every sentence itself, is delimited by a space. So it becomes very important to deal with spaces in the input file. In most cases spaces are to be ignored, i.e. the action code for the regular expression describing a space does nothing. But sometimes spaces matter to the grammar of our input, and then we need to identify them, acknowledge them, and write apt action code to deal with them.

I needed to do exactly that while writing a Flex file for a STAAD Pro input file under the project SIM. I assumed spaces would work the same way as other characters do, and so I directly wrote a simple expression for identifying one or more consecutive occurrences of a space:

[ ]+ 

There is a space inside those square brackets, which is the character we are looking for. The + sign tells Flex to expect any of the characters inside the square brackets (in this case only the space) one or more times in a row.
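To see what the + does in practice, here is a small sketch that replaces every run of spaces with its length. The output format is my own choice, and yyleng is cast to int because its exact type differs between Flex versions.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
[ ]+    { printf("<%d>", (int) yyleng); /* one match per run of spaces */ }
%%
int main(void)
{
    yylex();    /* characters that match no rule are echoed unchanged */
    return 0;
}
```

Feeding it the line a   b (with three spaces in between) should print a<3>b, because the whole run of spaces is consumed as a single match rather than one match per space.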

So far so good. But this did not work out well for me. This regular expression was leading me into trouble: syntax error after syntax error. Only then did I realize that there was something wrong with my regular expression. Actually, it was pointed out to me by a friend, Amarjeet Singh. The thing is that whitespace in an input file is not always a plain space: it can also be a tab, which looks the same on screen but is a different character that [ ]+ does not match. So whitespace wants extra treatment, and the right way of doing this is:

[ \t]+

That's right: a space and a tab (written \t) inside the brackets, so the pattern matches one or more spaces or tabs in any combination. Now that is something to take note of.
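Here is how that pattern might sit in a complete file. This is only a sketch and not the actual STAAD Pro scanner: the TOKEN label and the catch-all pattern for everything that is not whitespace are assumptions for illustration.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
[ \t]+      { /* spaces and tabs in any mix: ignore them */ }
[^ \t\n]+   { printf("TOKEN: %s\n", yytext); /* any run of other characters */ }
\n          { /* line breaks are ignored here; a real grammar may need them */ }
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

With this version, a line whose words are separated by tabs is split into the same tokens as one separated by spaces. With [ ]+ alone, the tabs would match no whitespace rule and would end up in whatever rule catches the remaining characters.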
