Roll A Lisp In C - Part 1
Lisp is often known as one of the oldest programming languages. Indeed, it's conception can be traced back to research done by John McCarthy in 1960. In these series of articles I present an implementation of a Lisp written in the C language. These articles assume some familiarity in a Lisp like Scheme or Common Lisp.
A Lisp interpreter can be thought of as having 3 distinct parts. The Lisp reader, which takes a character string representation of a program and loads it into data for evaluation. The Lisp evaluator, which can compute an expression. The Lisp printer, which can take data and create a character string representation for displaying to a console. So, the first step is to create the Lisp reader so that we can have expressions to evaluate.
The Lisp reader is essentially just a parser. So, we should have a way to lex the input string. Which is to say we want to turn a string representation of our expression into a series of tokens. Thankfully, Lisp is quite simple to lex. The only tokens we care about are parenthesis and symbols.
// We'll have 128 tokens. Each token can be up to 32 characters long.
char token[128][32];
int lexer(char* input) {
int ii = 0; // input index
int ti = 0; // token index
// Loop thru the whole string
while(input[ii] != '\0')
switch(input[ii]) {
// Ignore whitespace and newlines
case ' ':
case '\n':
++ii;
break;
// Turn a left parenthesis into a token.
case '(':
token[ti][0] = '(';
token[ti][1] = '\0';
++ii;
++ti;
break;
// Turn a right parenthesis into a token.
case ')':
token[ti][0] = ')';
token[ti][1] = '\0';
++ii;
++ti;
break;
// Anything else is a symbol
default:
for(int i = 0;; ++i) {
if(input[ii] != ' ' &&
input[ii] != ')' &&
input[ii] != '(' &&
input[ii] != '\n' &&
input[ii] != '\0') {
token[ti][i] = input[ii++];
} else {
token[ti][i] = '\0';
break;
}
}
++ti;
break;
}
return ti;
}
This code will create 3 types of tokens. A left and right parenthesis token and a symbol token. It would be nice to have some way of representing iteration thru the tokens. An interface or a way of talking about the array.
This will be our way of talking about the token array. We can take the next token in the array or, look at the current token in the stream.
Our expressions are held in list structure so, we need list structured memory.
typedef struct {
void* car;
void* cdr;
} Pair;
Pair text[256];
Pair* textptr;
Pair* cons(void* x, void* y) {
textptr->car = x;
textptr->cdr = y;
return textptr++;
}
int ispair(void* x) {
return x >= (void*)&text &&
x <= (void*)&text[256];
}
Here we’re using pairs to represent list memory. Our interface to this memory is cons
and ispair
. cons
does exactly what we would want cons
to do. It makes an new pair from unused memory. ispair
is just way to check if the thing we’re refering to is in list memory or not.
We now have enough infrastructure laid out to begin implementing the reader.
void* read(char* ln) {
// Initialize the lexer and list memory.
curtok = 0;
textptr = text;
lexer(ln);
return read_exp();
}
void* read_exp() {
char* tok = nexttok();
if(tok[0] == '(')
return read_list();
else
return tok;
}
void* read_list() {
char* tok = peektok();
if(tok[0] == ')') {
nexttok();
return NULL;
} else {
void* fst = read_exp();
void* snd = read_list();
return cons(fst, snd);
}
}
This is the Lisp reader. read
will take in a character string representation of our program and, return a pointer to it’s Lisp representation.
Now that we have a representation of lists and symbols we can print them out.
void print(void* exp) {
print_exp(exp);
printf("\n");
}
void print_exp(void* exp) {
if(ispair(exp)) {
printf("(");
print_list(exp);
}
else
printf("%s", exp);
}
void print_list(Pair* list) {
if(list->cdr == NULL) {
print_exp(list->car);
printf(")");
} else {
print_exp(list->car);
printf(" ");
print_list(list->cdr);
}
}
Putting everything together we can make a basic REPL interface.
int main(int argc, char** argv) {
printf("Lisp REPL\n\n");
char buffer[256];
for(;;) {
printf(">> ");
fgets(buffer, 256, stdin);
print(read(buffer));
}
return 0;
}
In the next article we’ll augment this basic representation with syntax and datum and, create an evaluator for our language.