Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Employee - Parsing into structs

It's a common question in the Spirit Mailing List: How do I parse and place the results into a C++ struct? Of course, at this point, you already know various ways to do it, using semantic actions. There are many ways to skin a cat. Spirit X3, being fully attributed, makes it even easier. The next example demonstrates some features of Spirit X3 that make this easy. In the process, you'll learn about:

First, let's create a struct representing an employee:

namespace client { namespace ast
{
    struct employee
    {
        int age;
        std::string surname;
        std::string forename;
        double salary;
    };
}}

Then, we need to tell Boost.Fusion about our employee struct to make it a first-class fusion citizen that the grammar can utilize. If you don't know fusion yet, it is a Boost library for working with heterogeneous collections of data, commonly referred to as tuples. Spirit uses fusion extensively as part of its infrastructure.

In fusion's view, a struct is just a form of a tuple. You can adapt any struct to be a fully conforming fusion tuple:

BOOST_FUSION_ADAPT_STRUCT(
    client::ast::employee,
    (int, age)
    (std::string, surname)
    (std::string, forename)
    (double, salary)
)

Now we'll write a parser for our employee. Inputs will be of the form:

employee{ age, "surname", "forename", salary }

Here goes:

namespace parser
{
    namespace x3 = boost::spirit::x3;
    namespace ascii = boost::spirit::x3::ascii;

    using x3::int_;
    using x3::lit;
    using x3::double_;
    using x3::lexeme;
    using ascii::char_;

    x3::rule<class employee, ast::employee> const employee = "employee";

    auto const quoted_string = lexeme['"' >> +(char_ - '"') >> '"'];

    auto const employee_def =
        lit("employee")
        >> '{'
        >>  int_ >> ','
        >>  quoted_string >> ','
        >>  quoted_string >> ','
        >>  double_
        >>  '}'
        ;

    BOOST_SPIRIT_DEFINE(employee);
}

The full cpp file for this example can be found here: ../../../example/x3/employee.cpp

Let's walk through this one step at a time (not necessarily from top to bottom).

Rule Declaration
x3::rule<class employee, ast::employee> employee("employee");
Lexeme
lexeme['"' >> +(char_ - '"') >> '"'];

lexeme inhibits space skipping from the open brace to the closing brace. The expression parses quoted strings.

+(char_ - '"')

parses one or more chars, except the double quote. It stops when it sees a double quote.

Difference

The expression:

a - b

parses a but not b. Its attribute is just A; the attribute of a. b's attribute is ignored. Hence, the attribute of:

char_ - '"'

is just char.

Plus
+a

is similar to Kleene star. Rather than match everything, +a matches one or more. Like it's related function, the Kleene star, its attribute is a std::vector<A> where A is the attribute of a. So, putting all these together, the attribute of

+(char_ - '"')

is then:

std::vector<char>
Sequence Attribute

Now what's the attribute of

'"' >> +(char_ - '"') >> '"'

?

Well, typically, the attribute of:

a >> b >> c

is:

fusion::vector<A, B, C>

where A is the attribute of a, B is the attribute of b and C is the attribute of c. What is fusion::vector? - a tuple.

[Note] Note

If you don't know what I am talking about, see: Fusion Vector. It might be a good idea to have a look into Boost.Fusion at this point. You'll definitely see more of it in the coming pages.

Attribute Collapsing

Some parsers, especially those very little literal parsers you see, like '"', do not have attributes.

Nodes without attributes are disregarded. In a sequence, like above, all nodes with no attributes are filtered out of the fusion::vector. So, since '"' has no attribute, and +(char_ - '"') has a std::vector<char> attribute, the whole expression's attribute should have been:

fusion::vector<std::vector<char> >

But wait, there's one more collapsing rule: If the attribute is followed by a single element fusion::vector, The element is stripped naked from its container. To make a long story short, the attribute of the expression:

'"' >> +(char_ - '"') >> '"'

is:

std::vector<char>
Rule Definition
employee =
    lit("employee")
    >> '{'
    >>  int_ >> ','
    >>  quoted_string >> ','
    >>  quoted_string >> ','
    >>  double_
    >>  '}'
    ;

BOOST_SPIRIT_DEFINE(employee);

Applying our collapsing rules above, the RHS has an attribute of:

fusion::vector<int, std::string, std::string, double>

These nodes do not have an attribute:

[Note] Note

In case you are wondering, lit("employee") is the same as "employee". We had to wrap it inside lit because immediately after it is >> '{'. You can't right-shift a char[] and a char - you know, C++ syntax rules.

Recall that the attribute of parser::employee is the ast::employee struct.

Now everything is clear, right? The struct employee IS compatible with fusion::vector<int, std::string, std::string, double>. So, the RHS of start uses start's attribute (a struct employee) in-situ when it does its work.


PrevUpHomeNext