Charming Python: Parsing with the SimpleParse module
Tutorial Description - Many parsing tools have been written for Python. This column discusses a high-level parsing language built on top of Python. SimpleParse provides an EBNF-style syntax on top of mxTextTools that can greatly clarify the expression of grammars.
Like most programmers, I have frequently needed to identify parts and structures that exist inside textual documents: log files, configuration files, delimited data, and more free-form (but still semi-structured) report formats. All of these documents have their own "little languages" for what can occur within them.
The way I have programmed these informal parsing tasks has always been something of a hodgepodge of custom state machines, regular expressions, and context-driven string tests. The pattern in these programs was always, roughly, "read a bit of text, figure out if we can make something of it, maybe read a bit more text afterwards, keep trying."
Parsers of the formal variety distill descriptions of the parts and structures in documents into concise, clear, and declarative rules for how to identify what makes up a document. The declarative aspect is particularly interesting here. All my old ad hoc parsers were imperative in flavor: read some characters, make some decisions, accumulate some variables, rinse, repeat. As this column's installments on functional programming have observed, the recipe style of program flow is comparatively error-prone and difficult to maintain.
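To make the contrast concrete, here is a minimal sketch of the two flavors applied to a tiny "key = value" configuration format. Everything in it is hypothetical and invented for illustration: the sample text, the function names, and the format itself. The declarative version uses a plain regular expression from the standard library as a stand-in; it is not SimpleParse's EBNF syntax, only a rough indication of what stating a rule looks like compared with stepping through the text.

```python
import re

# Hypothetical sample input, invented for this illustration.
SAMPLE = "host = example.org\n# a comment\nport = 8080\n"


def parse_imperative(text):
    """Recipe style: read a line, make some decisions, accumulate, repeat."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            continue  # cannot make anything of this line; keep trying
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Declarative style: a single rule states what a setting looks like.
# A regular expression is only a stand-in for a real grammar here.
SETTING = re.compile(
    r"^[ \t]*(?P<key>\w+)[ \t]*=[ \t]*(?P<value>[^\n]*?)[ \t]*$",
    re.MULTILINE,
)


def parse_declarative(text):
    """Collect every span of the text that matches the SETTING rule."""
    return {m.group("key"): m.group("value") for m in SETTING.finditer(text)}


if __name__ == "__main__":
    print(parse_imperative(SAMPLE))
    print(parse_declarative(SAMPLE))
```

On this sample both functions produce the same dictionary. The difference is where the knowledge of the format lives: in the first it is spread across a sequence of tests and branches, while in the second it sits in one rule that can be read and changed in one place.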