September 14, 2018 by Roberto Di Remigio and Radovan Bast
Redesigning Getkw _or_ reinventing the wheel: 0. State of affairs and requirements
This is part 0 of a series of posts on designing a general input parsing library.
By general input parsing library we mean that the format of
the input is not fixed by the library itself.
In this post we will cover the state of affairs and lay out in detail the
requirements for our new library.
State of affairs
The Getkw library was designed with the goal of being general.
The principle of operation of Getkw is:
- Give me an input file.
- I’ll parse it and check whether it’s a legal input.
- If it is not legal, stop.
- If input is legal, I’ll dump a machine-readable file to disk. The file has
every possible input parameter set to a value: either its default or the value
passed by the user.
- A reader for your language of choice (currently C, C++, and Fortran) will
gobble the machine-readable file et voilà you have your input parameters.
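The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the actual Getkw code: the parameter names in DEFAULTS, the functions validate_and_fill and dump, and the choice of JSON as the machine-readable format are all assumptions made for the example.

```python
import json

# Hypothetical parameters: every known keyword with its default value.
DEFAULTS = {"max_iterations": 50, "threshold": 1.0e-6, "method": "jacobi"}


def validate_and_fill(user_input):
    """Return all parameters set to a value, or raise ValueError and stop."""
    unknown = set(user_input) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    for name, value in user_input.items():
        expected = type(DEFAULTS[name])
        if not isinstance(value, expected):
            raise ValueError(f"{name} should be of type {expected.__name__}")
    # User values override the defaults; everything else keeps its default.
    return {**DEFAULTS, **user_input}


def dump(parameters, path):
    """Write the complete set of parameters to a machine-readable file."""
    with open(path, "w") as handle:
        json.dump(parameters, handle, indent=2, sort_keys=True)
```

A reader in C, C++, or Fortran would then only need to load the dumped file, since every parameter is guaranteed to be present in it.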
The structure of the input revolves around the concept of keywords and sections.
Keywords are the basic entity; sections collect keywords or other
sections. Loosely speaking, keywords are the bottom layer, while sections can be
recursive. We’ll detail this point later.
Each section gets a callback function to perform checks on its contents when the
section is parsed.
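To make the idea concrete, here is a rough sketch of how sections, keywords, and per-section callbacks could fit together. The Section class, its fields, and the check_solver callback are hypothetical names chosen for illustration; they are not the actual Getkw API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Section:
    """A section collects keywords and, recursively, other sections."""

    name: str
    keywords: Dict[str, Any] = field(default_factory=dict)
    sections: Dict[str, "Section"] = field(default_factory=dict)
    callback: Optional[Callable[["Section"], None]] = None

    def check(self):
        """Run the callbacks of all nested sections, then this section's own."""
        for child in self.sections.values():
            child.check()
        if self.callback is not None:
            self.callback(self)


def check_solver(section):
    """Hypothetical callback: it only sees the contents of its own section."""
    if section.keywords["max_iterations"] <= 0:
        raise ValueError("max_iterations must be positive")


solver = Section("solver", keywords={"max_iterations": 50}, callback=check_solver)
root = Section("root", sections={"solver": solver})
root.check()  # passes silently for a legal input
```

Note that a callback in this sketch receives only its own section, which is why checks that span several sections need extra plumbing.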
Strengths
- Invalid inputs are blocked before CPU cycles are consumed.
- No regexes are used, rather the input is treated as a formal grammar with its
Backus-Naur form.
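As an illustration of the grammar-based approach, the sketch below parses a tiny, made-up input format with a hand-written recursive-descent parser and no regexes. The grammar in the comment is an assumption invented for this example. It is not the grammar used by Getkw, which relies on pyparsing rather than a hand-written parser.

```python
# Hypothetical grammar, in Backus-Naur form:
#   <input>   ::= <item>*
#   <item>    ::= <keyword> | <section>
#   <keyword> ::= NAME "=" VALUE
#   <section> ::= NAME "{" <item>* "}"
SYMBOLS = {"{", "}", "="}


def tokenize(text):
    """Split the text into names, values, and the three grammar symbols."""
    for symbol in SYMBOLS:
        text = text.replace(symbol, f" {symbol} ")
    return text.split()


def parse_items(tokens, pos=0, nested=False):
    """Parse <item>* starting at pos; return the items and the new position."""
    items = {}
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        marker = tokens[pos + 1] if pos + 1 < len(tokens) else None
        if name in SYMBOLS:
            raise ValueError(f"expected a name, got {name!r}")
        if marker == "=":
            value = tokens[pos + 2] if pos + 2 < len(tokens) else None
            if value is None or value in SYMBOLS:
                raise ValueError(f"keyword {name!r} has no value")
            items[name] = value  # values are kept as strings in this sketch
            pos += 3
        elif marker == "{":
            items[name], pos = parse_items(tokens, pos + 2, nested=True)
        else:
            raise ValueError(f"expected '=' or an opening brace after {name!r}")
    if nested:
        if pos == len(tokens):
            raise ValueError("missing closing brace")
        pos += 1  # consume the closing brace
    elif pos < len(tokens):
        raise ValueError("unmatched closing brace")
    return items, pos


def parse(text):
    """Parse a whole input; any grammar violation raises ValueError."""
    items, _ = parse_items(tokenize(text))
    return items
```

Because each function mirrors one rule of the grammar, an illegal input is rejected at the exact rule it violates, before any computation starts.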
Weaknesses
- Getkw is a combination of a Python module and a wrapper in a compiled language:
  - The Python module uses pyparsing.py to define the grammar for all
    valid input files and performs the actual parsing.
  - The wrapper in a compiled language means dependencies! The C++ layer
    depends on Boost Any, which is header-only, but pulls in a lot of other
    stuff from Boost.
- Performing checks across sections can be cumbersome.
- The Backus-Naur form is defined in the getkw.py Python module and is fixed
once and for all. This falls short of the aim of generality, since it fixes the
format of the input.
Requirements for a Getkw for the future
These are the requirements for the new Getkw:
- It has to be a lean dependency.
- Usable from codes in any language.
- The input is structured as sections and keywords.
- Sections can be nested.
- Possibility to easily bypass parsing the file. The data structure holding the
parsed input data should be easily reproducible programmatically, without
having to parse a file.
- The input parser provides a validator mode, which only checks the internal
consistency of the input given by the user. Defining a formal grammar is what guarantees this.
- Validation is possible not only within a section, but across the entire tree/dictionary/form, without too
much coupling.
- Make the grammar customisable.
- Self-documenting. Adding new sections and keywords will require documentation
strings to work and there has to be a tool that can parse these and output
documentation in some standard format.
- Composable. New sections/keywords should be addable to the parsers in a decentralized manner.
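Several of these requirements can be illustrated together in a short sketch. Everything here is hypothetical and only meant to show one possible shape for the design: the Keyword class, the helper functions, and the solver and output sections are invented for the example and do not describe an existing implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Keyword:
    """A keyword with a default value and a mandatory documentation string."""

    default: Any
    doc: str

    def __post_init__(self):
        if not self.doc.strip():
            raise ValueError("a keyword needs a non-empty documentation string")


# Each part of a code declares its own section (decentralized)...
SOLVER = {"solver": {"max_iterations": Keyword(50, "Maximum number of iterations.")}}
OUTPUT = {"output": {"frequency": Keyword(10, "Print every this many iterations.")}}
# ...and the full template is composed from the pieces.
TEMPLATE = {**SOLVER, **OUTPUT}


def defaults(template):
    """Build the input tree programmatically, without parsing any file."""
    return {
        name: node.default if isinstance(node, Keyword) else defaults(node)
        for name, node in template.items()
    }


def document(template, prefix=""):
    """Collect one documentation line per keyword from the docstrings."""
    lines = []
    for name, node in template.items():
        path = f"{prefix}{name}"
        if isinstance(node, Keyword):
            lines.append(f"{path}: {node.doc} (default: {node.default!r})")
        else:
            lines.extend(document(node, prefix=f"{path}."))
    return lines


def check_frequency(tree):
    """Cross-section check: it receives the whole tree, not a single section."""
    if tree["output"]["frequency"] > tree["solver"]["max_iterations"]:
        raise ValueError("output.frequency exceeds solver.max_iterations")
```

Keeping the parsed input as a plain nested dictionary is one way to stay lean and language-agnostic, since such a structure maps directly onto common serialization formats.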