September 25, 2018 by Roberto Di Remigio
This is part 1 of a series of posts on designing a general input parsing library. By general input parsing library we mean that the format of the input is not fixed by the library itself.
In this post, we will describe the data structure we will use for the input.
In the first installment in this series, we said:
The structure of the input revolves around the concept of keywords and sections. Keywords are the basic entity, sections collect keywords, or other sections. Loosely speaking, keywords are the bottom layer, while sections can be recursive. We’ll detail this point later.
It is now time to give details.
We assume that the input is structured as a collection of sections and keywords. Keywords are 2-tuples (pairs) of a key and a corresponding value:
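A minimal sketch of such a keyword as a plain Python class. The class name `Keyword`, the attribute names `key` and `value`, and the example key are assumptions made here for illustration.

```python
class Keyword:
    """A keyword: a key paired with its corresponding value."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __repr__(self):
        return f"Keyword(key={self.key!r}, value={self.value!r})"


# Hypothetical example keyword
kw = Keyword("charge", -1)
print(kw)
```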
Anticlimactic, eh? We could have used a tuple! Or a namedtuple! And I agree, but this is our first attempt at implementing a data structure.
In the first instance, sections are collections of keywords. Hence a dictionary would suffice to represent them. Each key in the dictionary would be the key of a keyword, while the corresponding value would be the keyword itself:
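As a rough illustration, a flat section could be built like this. The `namedtuple` stand-in for the keyword type and the example keys are assumptions, not the implementation used in this series.

```python
from collections import namedtuple

# Hypothetical stand-in for the keyword type
Keyword = namedtuple("Keyword", ["key", "value"])

keywords = [Keyword("charge", 0), Keyword("multiplicity", 1)]

# The section maps each key to the whole keyword, not just to its value
section = {kw.key: kw for kw in keywords}

print(section["charge"].value)
```

Storing the whole keyword as the dictionary value leaves room for extra fields on the keyword later on.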
Question 0 Why not just collect keys and values in the section dictionary? Because the keyword data type might be richer than just a simple 2-tuple. It might contain units of measure, for example. Or type information.
However, we soon come up with the brilliant idea that sections might have subsections. And subsections, sub-subsections! Ad libitum! So really the input is a sort of multi-way (or rose) tree. The root is a section, with multiple sections as nodes and keywords as leaves. And so forth.
A possible implementation of section has a name, a dictionary of keywords, and a dictionary of sections:
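One way this could look, as a sketch only: the constructor signature, the attribute names, and the helper `count_keywords` are hypothetical and chosen for illustration.

```python
class Keyword:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Section:
    """A node of the tree: a name, keywords (leaves), and subsections (children)."""

    def __init__(self, name, keywords=None, sections=None):
        self.name = name
        # Index keywords by key and subsections by name
        self.keywords = {kw.key: kw for kw in (keywords or [])}
        self.sections = {s.name: s for s in (sections or [])}


def count_keywords(section):
    """Count the keywords in a section and, recursively, in all its subsections."""
    nested = sum(count_keywords(s) for s in section.sections.values())
    return len(section.keywords) + nested


inner = Section("inner", keywords=[Keyword("a", 1)])
root = Section("root", keywords=[Keyword("b", 2), Keyword("c", 3)], sections=[inner])
print(count_keywords(root))
```

The recursive helper is there only to show that the structure really is a tree: any traversal visits a section's keywords and then descends into its subsections.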
In addition, we want to serialize to a standard text format. We start with JSON since Python has a standard library module for it. For the moment we implement a toJSON method for both Keyword and Section; later we might decide to do something fancier.
Thus bringing it all together:
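The following is a sketch of how the pieces might fit together, not a verbatim copy of the original listing. Here `toJSON` returns a JSON-serializable dictionary and `__str__` dumps it with the standard `json` module; that split, the attribute names, and the example keywords are assumptions.

```python
import json


class Keyword:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def toJSON(self):
        """Return a JSON-serializable dictionary for this keyword."""
        return {"key": self.key, "value": self.value}


class Section:
    def __init__(self, name, keywords=None, sections=None):
        self.name = name
        self.keywords = {kw.key: kw for kw in (keywords or [])}
        self.sections = {s.name: s for s in (sections or [])}

    def toJSON(self):
        """Return a JSON-serializable dictionary, recursing into subsections."""
        return {
            "name": self.name,
            "keywords": {k: kw.toJSON() for k, kw in self.keywords.items()},
            "sections": {n: s.toJSON() for n, s in self.sections.items()},
        }

    def __str__(self):
        return json.dumps(self.toJSON(), indent=2)


# Hypothetical example data
sect2 = Section("sect2", keywords=[Keyword("tolerance", 1.0e-6)])
sect1 = Section("sect1", keywords=[Keyword("method", "scf")], sections=[sect2])

print(sect2)
```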
where we added a custom __str__ method. sect2 is a subsection of sect1 but does not have subsections itself.
The corresponding JSON for sect2 looks as follows:
Finally, the whole input is built simply as yet another section:
and its JSON shows no particular surprises:
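To make the idea concrete, here is a compact sketch that wraps everything in one top-level section and dumps it. It swaps in `dataclasses` and `dataclasses.asdict` in place of hand-written `toJSON` methods, purely to keep the example short; the field names, the variable `input_tree`, and the example data are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Keyword:
    key: str
    value: Any


@dataclass
class Section:
    name: str
    keywords: Dict[str, Keyword] = field(default_factory=dict)
    sections: Dict[str, "Section"] = field(default_factory=dict)


sect2 = Section("sect2", keywords={"tolerance": Keyword("tolerance", 1)})
sect1 = Section("sect1", sections={"sect2": sect2})

# The whole input is just one more section wrapping the others
input_tree = Section(
    "input",
    keywords={"title": Keyword("title", "demo")},
    sections={"sect1": sect1},
)

# asdict recurses through nested dataclasses and dictionaries
text = json.dumps(asdict(input_tree), indent=2)
print(text)
```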