Below are materials from recent courses that I taught, for which I created material that might be interesting to others.
In addition to the slides, there were exercises and assignments, which I am not publishing. If you wish to see those, send me an email. These are slides on the basics of Prolog. Since the course teaches not only concrete programming languages, but also principles of programming languages, there is also a discussion of programming paradigms.
Explanation of tokenizing. The task of the tokenizer is to group the input (a sequence of characters) into separate blocks, and to classify these blocks by their type. If a block represents something other than a plain string, for example a number, the corresponding value is constructed by the tokenizer.
Tokenizing is usually separated from parsing because it is computationally much easier, and because it has a few requirements that are hard to express in context-free grammars: the tokenizer always creates the longest possible token, ignores comments, and possibly takes indentation into account. There are two ways of obtaining a tokenizer: one can write it by hand, which seems easy at first, but the pain comes later; or one can use a tool, which means that one has to read a manual and prepare some starting code. This creates some additional effort in the beginning, but much less in the long term. Of course, I shamelessly promote my own great tool, and force students to use it in the exercises.
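To make the longest-match rule concrete, here is a minimal sketch of a hand-written tokenizer (it is my own illustration, not code from the slides or from my tool; the token classes and output format are assumptions). It classifies blocks by type, always takes the longest possible match at each position, skips comments and whitespace, and constructs the value of a number itself:

```python
import re

# Each token class is a regular expression. At every position we take the
# longest match over all classes ("maximal munch"), and SKIP tokens
# (whitespace and comments) are dropped entirely.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()=]"),
    ("SKIP",   r"\s+|%[^\n]*"),   # whitespace and %-style line comments
]

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        # Try every token class and keep the longest match at this position.
        best = None
        for kind, pattern in TOKEN_SPEC:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1].group())):
                best = (kind, m)
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, m = best
        if kind == "NUMBER":
            tokens.append(("NUMBER", int(m.group())))  # value built by tokenizer
        elif kind != "SKIP":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

For example, tokenize("x1 = 42 + y % comment") groups "x1" into a single identifier token (the longest match, rather than "x" followed by "1"), turns "42" into the integer 42, and discards the comment.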
Explanation of parsing. The purpose of parsing is to transform the output of the tokenizer into a tree structure that represents the structure of the program at hand. As with tokenizing, one can either write a parser by hand, or use a parser generator.
Writing a parser by hand is doable if one knows how to do it, and may sometimes be the better option. In these slides I give a general approach: one writes a set of parse functions, one for each left hand side symbol, that recursively call each other. This approach is called recursive descent parsing. It is nearly always necessary to modify the grammar, because when there are rules with the same left hand side whose right hand sides can start with overlapping terminal symbols, one does not know which rule to select. The easiest way to modify the grammar is by allowing regular expressions in the right hand sides of grammar rules. There is no general algorithm for adapting the grammar, but I give examples. I also explain how I think that bottom-up parsing should be implemented.
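The approach above can be sketched as follows. This is my own illustration under an assumed token format of (kind, value) pairs, for a small expression grammar whose left recursion has been rewritten into repetition, i.e. a regular expression in the right hand side:

```python
# Grammar, with (...)* repetition in the right hand sides:
#   Expr   -> Term   ( ('+'|'-') Term )*
#   Term   -> Factor ( ('*'|'/') Factor )*
#   Factor -> NUMBER | '(' Expr ')'
# One parse function per left hand side symbol; the functions call each
# other recursively and build a tree of nested tuples.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens   # list of (kind, value) pairs
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, value):
        if self.peek()[1] != value:
            raise SyntaxError(f"expected {value!r}, got {self.peek()[1]!r}")
        self.pos += 1

    def parse_expr(self):
        tree = self.parse_term()
        while self.peek()[1] in ("+", "-"):       # the (...)* repetition
            op = self.tokens[self.pos][1]; self.pos += 1
            tree = (op, tree, self.parse_term())
        return tree

    def parse_term(self):
        tree = self.parse_factor()
        while self.peek()[1] in ("*", "/"):
            op = self.tokens[self.pos][1]; self.pos += 1
            tree = (op, tree, self.parse_factor())
        return tree

    def parse_factor(self):
        kind, val = self.peek()
        if kind == "NUMBER":
            self.pos += 1
            return val
        self.expect("(")
        tree = self.parse_expr()
        self.expect(")")
        return tree
```

On the token sequence for "1 + 2 * 3", parse_expr builds the tree ("+", 1, ("*", 2, 3)): the one-token lookahead (peek) is what decides which rule to continue with, which is exactly why overlapping starting symbols force a grammar modification.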
In most cases, using a parser generator is better than writing the parser by hand. The initial cost of reading the documentation and preparing the required code interface pays off in the long term. Once it is set up, extending the grammar or making changes is easy. I explain the theory and practice of bottom-up parsing, and of course promote Maphoon.