A parser for C# using flex/bison
James Power
Department of Computer Science,
National University of Ireland, Maynooth.
As part of a separate project I knocked together a flex/bison scanner
and parser for C#. I really just wanted a clean C# grammar, but the
one in the spec. had a lot of duplication.
I worked off version 0.28 of the C# Language Specification (of
5/7/2001).
I've tested the scanner/parser on about 3,000 C# programs, and
it seems to be working OK. In all cases, when resolving conflicts in
the grammar I've gone for the "most general" approach - thus, the
parser should accept all valid programs, but will also allow some
invalid ones.
There are a few points you should note:
- There's no preprocessor - I just ignore all lines starting with a
"#". This could mean that some technically correct programs get
rejected, but I think this is reasonably unlikely.
- The bison grammar has one shift/reduce conflict - the dangling else. It
also has one reduce/reduce conflict related to the ambiguity between types
and expressions, but I don't think this is a problem.
- It's very basic - there's no fancy error handling or anything nice like
that.
- As regards the disambiguation - you should read chapter 19 of the
Java Language Specification, as the comments they make are relevant
to C# too.
- I had lots of problems trying to disambiguate rank specifiers
from array element access. In the end, I just made rank specifiers
into a token, rather than suffer any more rewriting of the grammar
rules. This could cause problems if there's something unexpected
(like a comment) in the middle of a rank specifier.
- The lexical specification talks about Unicode characters - I
haven't done anything about this.
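To make the rank-specifier point concrete, here is a minimal C sketch of what matching a rank specifier as a single token amounts to. It is not taken from csharp-lex.l: the function name match_rank_specifier and the decision to allow blanks between the brackets are assumptions made for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from csharp-lex.l): try to match a rank
 * specifier such as "[]" or "[,,]" at the start of s, the way a single
 * lexer pattern for the whole specifier would.
 * Returns the number of characters matched, or 0 if s does not start
 * with a rank specifier.
 */
size_t match_rank_specifier(const char *s)
{
    size_t i = 0;

    if (s[i] != '[')
        return 0;
    i++;

    /* Assumption: only commas and blanks may sit between the brackets. */
    while (s[i] == ',' || s[i] == ' ' || s[i] == '\t')
        i++;

    if (s[i] != ']')
        return 0;
    return i + 1;
}
```

With this sketch, "[,,]" matches as one token of length 4, while "[i]" does not match and is left to the ordinary element-access rules. A comment inside the brackets, as in "[,/* c */,]", also fails to match, which is exactly the kind of problem described above.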
C# has what seem to be "context sensitive keywords" - that is, words
which can be either keywords or identifiers depending on where they
occur in the program. It seems a particularly silly thing to design
into a language. Anyway, I've put in some flex states to deal with
this.
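The idea behind handling such words with scanner states can be sketched in plain C. This is only an illustration, not the code in csharp-lex.l: the state names, token names and the function classify_word are hypothetical, and it covers just "get" and "set", which act as keywords inside a property's accessor block and as ordinary identifiers elsewhere.

```c
#include <string.h>

/* Hypothetical lexer states and token kinds, for illustration only. */
enum lex_state { STATE_NORMAL, STATE_IN_ACCESSOR_BLOCK };
enum token_kind { TOK_IDENTIFIER, TOK_GET, TOK_SET };

/*
 * Classify a word the way a state-dependent scanner rule would:
 * "get" and "set" become keywords only while the scanner is inside
 * an accessor block; in any other state they are plain identifiers.
 */
enum token_kind classify_word(const char *word, enum lex_state state)
{
    if (state == STATE_IN_ACCESSOR_BLOCK) {
        if (strcmp(word, "get") == 0)
            return TOK_GET;
        if (strcmp(word, "set") == 0)
            return TOK_SET;
    }
    return TOK_IDENTIFIER;
}
```

In a flex scanner the same effect would come from rules that are active only in a particular start condition, so the choice between keyword and identifier is made by the scanner's current state rather than by the word alone.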
You might also like to check out the Mono C# compiler. It includes
a parser written for a bison-like tool (one that generates C#), along
with lots of C# code. (I'm not connected with the Mono project in any way.)
I presented a paper about the parser's design, "Applying Software
Engineering Techniques to Parser Design", at the Conference of the
South African Institute of Computer Scientists and Information
Technologists, in Port Elizabeth, South Africa, September 16-18 2002.
If you find any bugs in this, or think there's something else I should
mention here, let me know.
Download
The scanner is csharp-lex.l, and the parser is in csharp.y. There's
just one (non-generated) header file, lex.yy.h, needed to tie these
together.
To make an application out of this, here's a minimal program main.c, and a Makefile.
Or you can get all of these files as a single .zip file: csharp.zip.
I made these using flex version 2.5.4 and bison version 1.28, but I
don't think there's anything too special in there. I do use exclusive
start conditions as well as start-condition stacks in the lexer -
these may not be available in all flex/lex variants.
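For readers whose lex variant lacks start-condition stacks, the behaviour is easy to model. The following C sketch is a stand-alone model, not flex's implementation and not code from csharp-lex.l: the struct scanner type, MAX_DEPTH, and the functions push_state and pop_state are hypothetical names that mimic what flex's yy_push_state() and yy_pop_state() do when the stack option is enabled.

```c
#include <stddef.h>

#define MAX_DEPTH 32  /* arbitrary limit chosen for this sketch */

/* A tiny model of a scanner with a start-condition stack. */
struct scanner {
    int current;           /* the active start condition */
    int stack[MAX_DEPTH];  /* previously active start conditions */
    size_t depth;          /* number of saved entries */
};

/* Save the active condition and switch to new_state.
 * Returns 0 on success, -1 if the stack is full. */
int push_state(struct scanner *sc, int new_state)
{
    if (sc->depth == MAX_DEPTH)
        return -1;
    sc->stack[sc->depth++] = sc->current;
    sc->current = new_state;
    return 0;
}

/* Switch back to the most recently saved condition.
 * Returns 0 on success, -1 if nothing was saved. */
int pop_state(struct scanner *sc)
{
    if (sc->depth == 0)
        return -1;
    sc->current = sc->stack[--sc->depth];
    return 0;
}
```

The point of the stack is that a rule can enter a nested context and later return to whatever context was active before, without that rule needing to know which one it was.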
James Power
Last Modified: 19 Dec 2004