There are a lot of ways to evaluate expressions in infix notation e.g 1 + 2 * 8 - 5. One way is to convert the expression from infix notation to reverse polish notation using Shunting-yard algorithm. Evaluating reverse polish notation is very easy with the help of a stack. Another approach is using an algorithm called precedence climbing. However in this post, I will show you how to build a calculator using Flex and Bison.

Flex is an open source lexer generator and Bison is a parser generator. A lexer splits the text into tokens and a parser builds an abstract syntax tree from the tokens. Hand-writing a parser is not easy. However with the help these two tools, we can build a parser quickly. The job of us is to define the grammar of the language we are going to parse.

First, let’s write the lexer file.

%{
#include "parser.h"
%}

%option noyywrap
%option nodefault

%%
"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return DIVIDE; }
"("         { return LEFT_PAREN; }
")"         { return RIGHT_PAREN; }
[0-9]+      { yylval = atoi(yytext); return NUMBER; }
0x[a-f0-9]+ { yylval = strtol(yytext, NULL, 16); return NUMBER; }
\n          { return NEWLINE; }
[ \t]
.           { return UNKNOWN; };
%%

In the lexer file, we define the patterns for the tokens using regular expressions. The lexer we are going to generate will split the text into +, -, *, /, (, ) and numbers, ignoring whitespaces.

Then, we write the grammar file for Bison.

%{
#include <stdio.h>
#include "lexer.h"
%}

%token NUMBER
%token PLUS
%token MINUS
%token TIMES
%token DIVIDE
%token LEFT_PAREN
%token RIGHT_PAREN
%token NEWLINE
%token UNKNOWN

%%
commands: command
        | commands command

command: NEWLINE
       | expr NEWLINE { printf("= %d\n", $1); }

expr: term
    | expr PLUS term { $$ = $1 + $3; }
    | expr MINUS term { $$ = $1 - $3; }

term: factor
    | term TIMES factor { $$ = $1 * $3; }
    | term DIVIDE factor { $$ = $1 / $3; }

factor: NUMBER
      | LEFT_PAREN expr RIGHT_PAREN { $$ = $2 }
%%

In the first part, we define the tokens we are going to use in our grammar. In the second part, we define the context free grammar for our language. Our language is composed of commands. A command is an expression followed by a newline. An expression is one or more terms connected by the + and - operators. A term is one or more factors. A factor is a number or an expression wrapped in parentheses. In the right side of the rules, we embed C code to compute the value for that node.

Finally, we write a program to invoke the parser.

#include "lexer.h"
#include "parser.h"

int main(int argc, char **argv) {
    yyparse();
}

int yyerror(char *s) {
    fprintf(stderr, "error: %s\n", s);
    return 0;
}