Parser with LEX and YACC

I'm trying to implement a time parser with LEX & YACC. I'm a complete newbie to those tools and to C programming.

The program has to print a message (Valid time format 1: input ) when one of these formats is entered: 4pm, 7:38pm, 23:42, 3:16, 3:16am; otherwise an "Invalid character" message is printed.

lex file time.l :

%{
#include <stdio.h>
#include "y.tab.h"
%}

%%

[0-9]+                {yylval=atoi(yytext); return digit;}
"am"                   { return am;}
"pm"                   { return pm;}
[ \t\n]               ;
[:]                    { return colon;}
.                     { printf ("Invalid character\n");}

%%

yacc file time.y:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>

%}

%start time
%token digit
%token am
%token pm
%token colon

%%

time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}
            |  hour colon minute   {printf ("Valid time format 2 : %s:%s\n",$1, $3);}
            |  hour colon minute ampm {printf ("Valid time format 3 : %s:%s%s\n",$1, $3, $4); }
            ;

ampm        :   am               {$$ = "am";}
            |   pm               {$$ = "pm";}
            ;

hour        :   digit digit             {$$ = $1 * 10 + $2;}
            |   digit             { $$ = $1;}
            ;

minute      :   digit digit         {$$ =  $1 * 10 + $2;} 
            ;

%%
int yywrap()
{
        return 1;
} 

int main (void) {

  return yyparse();
}

void yyerror (char *s) {fprintf (stderr, "%s\n", s);}

compiling with this command:

yacc -d time.y && lex time.l && cc lex.yy.c y.tab.c -o time

I'm getting some warnings:

time.y:17:47: warning: format specifies type 'char *' but the argument has type
      'YYSTYPE' (aka 'int') [-Wformat]
    {printf ("Valid time format 1 : %s%s\n ", (yyvsp[(1) - (2)]), (yyvsp.

This warning appears for all the variables in printf statements. The values are all char, because even the number in the time string is converted with the atoi function.

Executing the program with a valid input throws this error:

./time

1pm

[1]    2141 segmentation fault  ./time

Can someone help me? Thanks in advance.


This (f)lex rule:

[0-9]+                {yylval=atoi(yytext); return digit;}

recognizes any integer, not just a digit. (It allows leading zeros, which is probably appropriate for a time parser.) It assumes that yylval is an int, which is the case if you don't do something to declare the type of yylval.

Meanwhile, this (f)lex rule:

"am"                 { return am;}

recognizes the token am, but does not set the value of yylval.

Now, in your bison file, you have:

hour        :   digit digit       { $$ = $1 * 10 + $2; }
            |   digit             { $$ = $1;}
            ;

Since digit actually represents an entire integer, the digit digit production is incorrect. It would recognize, for example, the input 23 75 (since your flex file ignores whitespace), but it would turn that into the value 305 (10*23 + 75). That hardly seems appropriate. Again, it assumes that the type of the semantic values $$ and $1 is int, which is the default case.
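To make the arithmetic concrete, here is a small C sketch of what that action computes when each digit token carries a whole integer. The function name combine_as_digits is hypothetical and only mirrors the action { $$ = $1 * 10 + $2; }; it is not part of the generated parser.

```c
#include <stdlib.h>

/* Hypothetical helper: mirrors the action { $$ = $1 * 10 + $2; } when
   each "digit" token carries a whole integer from atoi(yytext). */
int combine_as_digits(const char *first_lexeme, const char *second_lexeme)
{
    int first = atoi(first_lexeme);   /* e.g. "23" -> 23 */
    int second = atoi(second_lexeme); /* e.g. "75" -> 75 */
    return first * 10 + second;
}
```

With the lexemes "23" and "75" this returns 305, which is the surprising value described above; only when both lexemes are single digits does the result match the intended two-digit number.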

However, the production:

ampm        :   am               {$$ = "am";}
            |   pm               {$$ = "pm";}
            ;

requires that the type of the result semantic value be char * (or even const char*). Since you have not done anything to declare the type of semantic values, their type is int and the assignment is just as invalid as would be the C statement:

int ampm = "am";

So the C compiler issues an error message.

Furthermore, in your production:

time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}

you assume that the semantic values $1 and $2 are strings (char*). But the values are actually integers, so printf will do something undefined and probably disastrous (in this case, a segfault). (Because of the nature of C this is not a compile-time error, but most C compilers will issue a warning. Apparently, your C compiler does so.)
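As a rough illustration of matching conversion specifiers to argument types, the sketch below formats an int hour with %d and a string suffix with %s. The helper name format_time1 is an assumption made for this example; it is plain C, not something bison provides.

```c
#include <stdio.h>

/* Hypothetical helper: the hour is an int, so it is printed with %d;
   only the am/pm suffix is a string and is printed with %s. */
int format_time1(char *buf, size_t size, int hour, const char *suffix)
{
    return snprintf(buf, size, "Valid time format 1 : %d%s", hour, suffix);
}
```

Calling format_time1(buf, sizeof buf, 1, "pm") fills buf with "Valid time format 1 : 1pm" instead of handing an integer to %s.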

How this should be fixed depends on your interpretation of the assignment. When it says "print a message (Valid time format 1: input )", does it mean that the literal input string should be printed, or is it ok to print an interpretation of the string? That is, given actual inputs

8:23am
08:23am

Would you want the messages to be

Valid time format 1: 8:23am
Valid time format 1: 08:23am

Or is it appropriate to normalize:

Valid time format 1: 8:23am
Valid time format 1: 8:23am

You should (re-)read the section in the bison manual on semantic types, and then decide whether you want the type to be int, char*, or a union of the two.
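If you go with a union, the semantic value type would look roughly like the C sketch below. The type name SemVal and the member names num and str are assumptions for illustration; in a real grammar the union comes from a %union declaration and each token or nonterminal is tagged with the member it uses.

```c
#include <stdio.h>

/* Hypothetical stand-in for a semantic value union; the names here
   are assumptions, not what bison generates. */
typedef union {
    int num;          /* hours and minutes */
    const char *str;  /* "am" or "pm" */
} SemVal;

/* Each value is read through the member that matches how it was stored. */
int describe(char *buf, size_t size, SemVal hour, SemVal suffix)
{
    return snprintf(buf, size, "%d%s", hour.num, suffix.str);
}
```

The point of the sketch is that the parser has to know, for every symbol, which member holds its value; reading an int through str (or the reverse) is the same mistake as the original printf.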

Some other things you need to think about:

  1. Your flex file recognizes any integer, but neither hours nor minutes can be arbitrary integers. Both are limited to two digits; normally, the minutes should always be two digits (so that 9:3am is not a way of writing 9:03am). They both have limited ranges of valid values; minutes must be between 00 and 59, while hours is between 1 and 12 if am or pm is specified, and otherwise between 0 and 23. Or perhaps 24. (Actually, there are lots of different possible validity conventions for hours; you might choose to be flexible or strict.)

  2. Your problem description doesn't appear to allow spaces in the time specifications, but your flex file ignores whitespace. So that might lead you to recognize incorrect inputs (depending, again, on how strict you wish to be). Also see the note about output in this case: does the whitespace appear in the output (assuming it is acceptable)?

  3. Your flex file issues an error message when it sees a character it doesn't recognize, but it does not stop lexing. In effect, that means that illegal characters will be dropped from the input stream, so that an input like:

    1;:17rpm
    

    will result in two illegal character messages followed by a message saying that the input was a valid 1:17pm. That is unlikely to be what you wanted.
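The range rules from the first point can be sketched as a plain C function. This assumes one of the stricter conventions mentioned there (1 to 12 with am/pm, 0 to 23 without); the name time_in_range is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical checker: with am/pm the hour must be 1..12, without it
   0..23; minutes must always be 0..59. */
bool time_in_range(int hour, int minute, bool has_ampm)
{
    if (minute < 0 || minute > 59)
        return false;
    if (has_ampm)
        return hour >= 1 && hour <= 12;
    return hour >= 0 && hour <= 23;
}
```

A parser action could call such a function once the hour, minute and suffix are known, rather than scattering the comparisons across several productions.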

As a final note, I have to say that in my opinion, understanding C is an absolute prerequisite to using flex and bison. Trying to teach all three at the same time strikes me as pedagogically suspect.


The error message

time.y:17:47: warning: format specifies type 'char *' but the argument has type
  'YYSTYPE' (aka 'int') [-Wformat]

for a line such as

printf ("Valid time format 1 : %s%s\n ", $1, $2);

says that you specified %s (which expects a C-style string of type char *), but the argument is actually of type YYSTYPE (which here is int).


As @Elyasin pointed out, the message you're given is telling you exactly what's wrong: YYSTYPE defaults to int, but you're attempting to use it as a string (this is the case on every line where you get the warning). Furthermore, you're attempting to use it as an int in some places and as a string in others, which cannot work with a single type.

What you can do is create a string to hold your input as you go and concatenate into that. You can do this with a variable in your initial yacc block, so something like this:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>

char time_str[15];
%}

time_str is now available throughout your parser steps, so you can copy into that, then in your final step you can just print out the built up string, like

printf ("Valid time format 1 : %s\n", time_str);
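One way to fill that buffer safely is with snprintf, as in the sketch below. The helper build_time_str and its convention of passing a negative minute for "no minute part" are assumptions for this example; the buffer name and size are taken from the declaration above.

```c
#include <stdio.h>

char time_str[15];  /* same buffer name and size as declared above */

/* Hypothetical helper: builds the display string from parsed pieces.
   Pass minute < 0 when there is no minute part, and "" when there is
   no am/pm suffix. snprintf never writes past the end of the buffer. */
int build_time_str(int hour, int minute, const char *suffix)
{
    if (minute < 0)
        return snprintf(time_str, sizeof time_str, "%d%s", hour, suffix);
    return snprintf(time_str, sizeof time_str, "%d:%02d%s", hour, minute, suffix);
}
```

For example, build_time_str(7, 38, "pm") leaves "7:38pm" in time_str, which the final action can then print with %s.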


I've solved the warnings by defining a char array for the am and pm values and treating the YYSTYPE values as int (as suggested).

I've also added handling for empty lines, comma separation after each input, validation of hours and minutes, and an exit command:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char ampm_str[15] = "";

typedef int bool;
bool validFormat = 1;
%}

%start input
%token digit
%token am
%token pm
%token colon
%token sep
%token exit_command

%%
input       : /* empty */
            | input line 
            ;

line        : '\n'
            | list '\n' 
            ;

list        : time
            | time sep list 
            | exit_command  {exit(EXIT_SUCCESS);}
            ;


time        :  hour ampm                {if ($1 > 12 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf("Valid time format %d%s\n", $1, ampm_str); } validFormat = 1;}
            |  hour colon minute        {if ($1 > 24 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf("Valid time format   %d:%02d\n", $1, $3); } validFormat = 1;}
            |  hour colon minute ampm   {if ($1 > 12 || $1 <= 0)  {printf ("Hour out of range\n");validFormat = 0;} else if(validFormat) {printf ("Valid time format   %d:%02d%s\n", $1, $3, ampm_str); } validFormat = 1;}
            ;


hour        :   two_digits        { $$ = $1; }
            |   digit             { $$ = $1; }
            ;

minute      :   two_digits          { $$ = $1; if ($$ > 59) {printf ( "minute out of range\n");validFormat = 0;}}
            |   digit               { $$ = $1; if ($$ > 59) {printf ( "minute out of range\n");validFormat = 0;}}
            ;

two_digits  :  digit digit          {$$ = 0; $$ = $1 * 10 + $2; }
            ;

ampm        :   am               {strcpy(ampm_str, "am");}
            |   pm               {strcpy(ampm_str, "pm");}
            ;


%%
int yywrap()
{
        return 1;
} 

int main (void) {
printf ("Insert time, and press enter\n");
printf ("Type , after each time\n");
printf ("Valid formats : 2am, 12:00, 1:30pm\n");
printf ("exit to quit\n");

  return yyparse();
}


void yyerror (char *s) {fprintf (stderr, "Invalid character: %s\n", s); validFormat = 0;}


For byte parsing, the lex file can contain a rule like:

0x[0-9a-f]{8}         { yylval.number = strtoll(yytext+2, NULL, 16); return BYTE_4; }

In the yacc file, you need to declare number as a member of the %union.
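The conversion in that rule can be tried out in plain C. The function name parse_hex_lexeme is hypothetical; skipping two characters mirrors the yytext+2 in the rule, which steps over the "0x" prefix.

```c
#include <stdlib.h>

/* Hypothetical helper: converts a lexeme such as "0xdeadbeef" to its
   numeric value. lexeme + 2 skips the "0x" prefix, as yytext+2 does. */
long long parse_hex_lexeme(const char *lexeme)
{
    return strtoll(lexeme + 2, NULL, 16);
}
```

Because the result is a long long, the union member that receives it should have a type wide enough to hold it; eight hex digits do not always fit in a signed 32-bit int.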
