In: Computer Science
Using the descriptions below, write a scanner for numbers, symbols, comments, arithmetic operators, and parenthesis in Racket or scheme.
- a number is
one or more digits |
zero of more digits followed by a decimal point followed by one or more digits |
one or more digits followed by a decimal point followed by zero or more digits
- a symbol is
one or more characters from the set
[_A-Za-z] followed by zero or more characters from the set
[_A-Za-z0-9]
- a comment is
the literal string "#" followed by zero or
more characters until the end of the line
- an arithmetic operator is
+ |
- |
* |
/
- a parenthesis is
( |
)
Your scanner must convert input text into the following
tokens: NUM, SYM, ADD, SUBTRACT, MULTIPLY, DIVIDE, LPAREN,
RPAREN, ERROR. Turn in your scanner implementations and
test cases. Your test cases should show that your scanner produces
a correct list of tokens when given valid input
and the scanner output should contain an ERROR
token when given invalid input. the output must all be list
containing each output token.
Sample Racket return value from scanner: (NUM NUM NUM ADD NUM MULTIPLY NUM ADD NUM EOF) or (NUM NUM NUM ERROR NUM MULTIPLY NUM ADD NUM EOF)
For Racket, you may use the parser-tools/lex library that comes with Racket.
lang racket ;;; IMPORT ;; Import the lexer tools (require parser-tools/yacc parser-tools/lex (prefix-in : parser-tools/lex-sre) ; names from lex-sre are prefixed with : ; to avoid name collisions syntax/readerr) ;;; REGULAR EXPRESSIONS ;; Names for regular expressions matching letters and digits. ;; Note that :or are prefixed with a : due to (prefix-in : ...) above (define-lex-abbrevs [letter (:or (:/ "a" "z") (:/ #\A #\Z) )] [digit (:/ #\0 #\9)]) ;;; TOKENS ;; Tokens such as numbers (and identifiers and strings) carry a value ;; In the example only the NUMBER token is used, but you may need more. (define-tokens value-tokens (NUMBER IDENTIFIER STRING)) ;; Tokens that don't carry a value. (define-empty-tokens op-tokens (newline := = < > + - * / ^ EOF)) ;;; LEXER ;; Here the lexer (aka the scanner) is defined. ;; The construct lexer-src-pos evaluates to a function which scans an input port ;; returning one position-token at a time. ;; A position token contains besides the actual token also source location information ;; (i.e. you can see where in the file the token was read) (define lex (lexer-src-pos [(eof) ; input: eof of file 'EOF] ; output: the symbol EOF [(:or #\tab #\space #\newline) ; input: whitespace (return-without-pos (lex input-port))] ; output: the next token ; (i.e. skip the whitespace) [#\newline ; input: newline (token-newline)] ; ouput: a newline-token ; ; note: (token-newline) returns 'newline [(:or ":=" "+" "-" "*" "/" "^" "<" ">" "=") ; input: an operator (string->symbol lexeme)] ; output: corresponding symbol [(:+ digit) ; input: digits (token-NUMBER (string->number lexeme))])) ; outout: a NUMBER token whose value is ; ; the number ; ; note: (token-value token) ; returns the number ;;; TEST (define input (open-input-string "123+456")) (lex input) ; (position-token (token 'NUMBER 123) (position 1 #f #f) (position 4 #f #f)) (lex input) ; (position-token '+ (position 4 #f #f) (position 5 #f #f)) (lex input) ; (position-token (token 'NUMBER 456) (position 5 #f #f) (position 8 #f #f)) (lex input) ; (position-token 'EOF (position 8 #f #f) (position 8 #f #f)) ;; Let's make it a little easier to play with the lexer. (define (string->tokens s) (port->tokens (open-input-string s))) (define (port->tokens in) (define token (lex in)) (if (eq? (position-token-token token) 'EOF) '() (cons token (port->tokens in)))) (map position-token-token (string->tokens "123*45/3")) ; strip positions ; Output: ; (list (token 'NUMBER 123) ; '* ; (token 'NUMBER 45) ; '/ ; (token 'NUMBER 3))