UNIVERSITY OF

ILLINOIS LIBRARY

AT URBANA-CHAMPAIGN

JJUefi)

uiucdcs-R-7^-658

y)uuti

A BASIC LANGUAGE INTERPRETER FOR THE INTEL 8008 MICROPROCESSOR

by

9

June, 197^

Alfred C. Weaver Michael H. Tindall Ronald L. Danielson

Uf - ^ 1974

DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

URBANA, ILLIN<

UIUCDCS-R-7i+-658

A BASIC LANGUAGE INTERPRETER FOR THE INTEL 8008 MICROPROCESSOR

by

Alfred C. Weaver Michael H. Tindall Ronald L. Danielson

June, 197^

Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois

Ill

ACKNOWLEDGEMENT

The authors wish to express their appreciation to Professor James Vander Mey, who originally suggested and subsequently supervised the entire project; to Professor Thomas Wilcox, who consulted on the design of the software techniques; to Gordon Peterson, who wrote an assem- bler for a version of the Intel assembly language; and to Guy Riddle, who wrote an Intel 8008 simulator for the IBM 360/75.

Digitized by the Internet Archive in 2013

http://archive.org/details/basiclanguageint658weav

IV

TABLE OF CONTENTS

page

1. INTRODUCTION 1

2 . OPERATING SYSTEM k

2.1 Text Editing k

2 . 2 Syntax Checking 6

2.3 Run-time Monitoring 7

2 . k Error Handling 7

2 . 5 Immediately Executable Statements . 8

2.6 Input/ Output 8

3 . SYNTACTIC AND SEMANTIC ANALYSIS 10

3.1 Syntactic Analysis 10

3.2 Semantic Analysis 13

k. LEXICAL ANALYZER AND SYMBOL TABLE 15

k.l Lexical Analyzer 15

k.2 Symbol Table Manager 17

1+.3 Symbol Table Structure 18

5 . EXECUTION PACKAGE 20

6. SUMMARY o 21

APPENDIX

A. BASIC LANGUAGE BNF 22

B. DESCRIPTION OF THE STATE TRANSITIONS FOR THE SYNTACTIC RECOGNIZER 2k

C. STATE DIAGRAMS FOR BASIC LANGUAGE TABLE-DRIVEN INTERPRETER.. 31

D. PROGRAM EXAMPLE 37

1. INTRODUCTION

This report concerns the design and implementation of a BASIC language interpreter which is restricted to the environment of the Intel 8008 microprocessor. The value of the project was in overcoming the severe restrictions imposed by an 8-bit machine with only four general purpose registers, one accumulator, and two registers used simultaneously to form one core address. Furthermore, the machine's numerical capability is characterized by its two arithmetic instructions : 8-bit add and subtract. All addresses (ik bits) must be broken into high and low order bytes and stored in the H and L registers, respectively, for any memory reference. With a cycle time of 20 microseconds average, the machine is no number -crunching giant, but is more than adequate for the dedicated processor we envisioned.

This project is the direct result of a seminar course on mini- computers taught at the University of Illinois by Professor James Vander Mey. While the project itself was executed entirely on an Intel simulator, we feel that the techniques developed are directly applicable to the hardware described.

The hardware environment envisioned for this project included: a microprocessor with a l6Kx 8-bit word memory (Intel MCS-8), an ASCII keyboard, a black-and-white television for the display, and a custom- designed hardware interface which continuously displays the contents of a 2K buffer in Intel memory. Design and implementation of the interface was wholly controlled by another group.

The language which has been implemented is a version of BASIC which incorporates all of the most common features of the language and a few versatile extensions. A conscious effort was made to "design with forethought" so that future modifications, either by the authors or others, would not require extensive modification of the existing routines. The BASIC language statements currently implemented include: LET, DIM, DEF, REM, GOTO, IF, FOR, TO, NEXT, GOSTJB, RETURN, INPUT, PRINT, STEP, THEN, STOP, and END. The operators provided are addition, subtraction, multi- plication, division, exponentiation, and the Boolean operators AND, OR, NOT, LESS-THAN, LESS-THAN-OR-EQUAL, GREATER-THAN, GREATER -THAN-OR- EQUAL, EQUAL, and NOT-EQUAL. The extensions to the language include: up to 26 3-dimensional arrays, up to 26 user-defined functions, user functions with nested (but not recursive) definitions 5 levels deep, six built-in functions (sine, cosine, arctangent, exponential, logarithm, and square root), full floating point arithmetic performed with 5 byte operands (h byte mantissa, 1 byte exponent) with results rounded to h bytes (3 byte mantissa and 1 byte exponent), character strings in the PRINT statement, and FOR-loops nested up to 5 levels deep.

The BNF for the BASIC language implemented is supplied in Appendix A. The use of specially encoded keyboard keys for all multi- character keywords (LET, DIM, ..., END, relational operators, and user functions FNx) allows our grammer to be LL(l), thus simplifying the parsing technique.

The project broke neatly into two phases - design and development and the development phase itself was split into three basic areas: (l) the operating system, including master control of the CPU, mode switching, context switching, polling routine for the keyboard,

complete text editing system, screen display management, user source text memory management, and user "output page" memory management;

(2) the syntactic/semantic analyzer, including BASIC syntax recognizer, error detection/location routines, lexical analyzer, and symbol table manager;

(3) the execution routines , including specific utility routines needed by the syntax analyzer (stack control and lexical pointer movement), a floating point arithmetic package (add, subtract, multiply, divide, fix, float, compare, load, store, and error branching),

a transcendental function evaluator (sine, cosine, log, exponential, arctangent, and square root), and a two-way floating -point-to-character- string and character-string-to-floating -point conversion package.

The following sections and appendices present an explicit report of the design philosophy and characteristics of the three major areas.

h

2. OPERATING SYSTEM

The operating system is charged with the responsibility of managing the CPU. As a dedicated processor, the CPU exists only to serve the user, but it does so in one of six possible environments.

2.1 Text Editing

In pure text editing mode the keyboard is used to build text on a display screen using the elementary operations of insert character, delete character, replace character, delete line, scroll up 1 or 8 lines, and scroll down 1 or 8 lines. No syntax checking of any kind is performed in this mode. To aid the user, a moveable cursor (@) is positioned immediately to the left of the character or line to be inserted, replaced, or deleted. Thus the programmer has constant and precise control over the location of any of his changes. The cursor is moved by depressing one of its special control keys :

->

(right arrow) move right one character;

AB@CD becomes AB0@D

(left arrow) move left one character;

AB@CD becomes A@BCD

t

(up arrow) move cursor to the beginning of the current

line or, if it is already at the beginning of a line, to the beginning of the previous line;

ABCD becomes ABCD EP@G @EFG

and

ABCD becomes @ABCD @EFG EFG

I

(down arrow) move cursor to the beginning of the next

line;

AB@CD becomes ABCD EFGH @EFGH

The user's memory space, at addresses lkK through l6K-l} is

bounded by non -ins er table markers at the beginning of the user's source

text (static), end of source text (dynamic), and end of user's data

page (dynamic). The display is initialized as follows:

[~/gi iiiiiiiitiiiiiiirfiiiititttiiiiit*]

where [ represents the beginning -of -text marker

(source page and output page),

@ represents the cursor,

I represents the end-of-line marker,

and ] represents the end-of -memory marker.

The text editor functions are accomplished by manipulating a line of text or the entire program as necessary. A replace character requires no text movement, but an insert or delete character requires that all of the text from the cursor to the end-of -memory marker be shifted one byte in core.

Insertion is typical of the operations performed, and some operation counts show that if an insertion is to occur n characters

to the left of the end-of -memory marker then the elapsed time t from

picking up the character in the polling loop until the CPU returns to

the polling loop is :

20 microseconds/instruction * (50 + 20n) instructions

For example:

n t

10 0.5 milliseconds

100 kl milliseconds

1000 0.^4- seconds

Assuming an average keyboard rate of 3 characters/second, a burst rate of 10 characters/second, and an average text movement of less than 100 characters, we found that the machine "will adequately keep up with the user by using only a single one- byte hardware buffer.

Additional command codes were reserved (but not implemented) for such handy functions as FILE-SCREEN-TO-TAPE, MOVE-TAPE-TO-SCREEN, SORT-SCREEN-BY-SEQUENCE-NUMBERS , etc .

Appendix D documents the operating system/ text editor. Each screen display shows the state of the screen just prior to the receipt of the next character from the keyboard.

2.2 Syntax Checking

When operating in the "BASIC" mode, the operating system provides line-by-line syntax checking of the. text being inserted on the screen, in addition to providing all of the text editing features described in section 2.1 above. Every time the cursor moves right over an end-of - line marker (in response to a right arrow, down arrow, or insertion of an end-of -line character), the syntax analyzer is called to parse the line just created. Whenever the cursor moves left over an end-of -line,

the syntax analyzer (SYNA) is called to "unparse" the line just entered. This feature allows the user to interactively construct a syntactically correct BASIC program. Special command keys are provided to allow the user to change from "BASIC" mode to "TEXT" mode, and vice -versa.

2.3 Run -time Monitoring

When a RUN command is detected by the operating system (OS) command decoder, OS enters the third of its major roles, that of an execution monitor. In response to the RIM command, OS initializes all of the interpreter variables (primarily the symbol table and the parsing stacks), sets the lexical (LEXl) pointer to the first executable state- ment, removes the cursor from the program text, switches the screen display base address to point to the data page, and begins calling SYNA on a line-by-line basis. After each return from SYNA the keyboard strobe (an "in use" bit) is checked and any character on the data bus is read. The only character recognized while executing is a STOP IMMEDIATE command, which causes OS to jump to the immediately executable statement routines described below. Other characters are ignored until execution is term- inated, either normally (by execution of a STOP statement) or as the result of an error. Normal termination leaves OS in the "BASIC" (syntax checking) mode. Commands are provided for jumping from data page to program page and back again. Detection of an error causes SYNA to jump to an OS routine for error handling.

2.k Error Handling

When the error routine is entered the screen base pointer is shifted back to the program page, a message describing the error is inserted on a line following the one in which the error was detected,

8

and the cursor is inserted in the program at the spot nearest the error. This enables the user to quickly locate his error. The operating system then jumps to the text editing/syntax checking routines in anticipation of error correction attempts by the user.

2.5 Immediately Executable Statements

Immediately executable statements are handled as described under "run-time monitoring". The user may stop a running program by pressing the STOP IMMEDIATE key, thus causing the program to halt at the end of the currently executing line. The operating system then allows code to be entered line-by-line in the "immediately executable" mode. Each line must be followed by the EXECUTE IMMEDIATE command key, which causes the line just constructed to be parsed and immediately executed. The user may jump back and forth between the program and data pages, but may not modify the program unless he leaves the immediately executable mode. A RESTART command is provided which causes the user program to begin execution at the point where it was interrupted (unless one of the immediately executed statements caused a branch, in which case execution begins at the branch address). The user may operate the system as a desk calculator by initially jumping to the data page and inserting unnumbered BASIC statements, each followed by the EXECUTE IMMEDIATE command.

2 . 6 Input/Output

The operating system provides input-output facilities which may be used by SYNA. when in the execute or execute immediate mode. All input requests and output text are displayed on the data page. The conversion routines in the floating point arithmetic package are used to provide the ASCII-to-binary and binary-to-ASCII conversions. Input

requests may be made to provide data for a scalar variable or an array element. Array subscripts are taken from SYNA's expression stack. A typical such input request might be:

INPUT X(2,5) > @

The input data is inserted to the right of the great er-than sign, and is converted to internal floating point by the appropriate conversion routine.

There are three separate output routines: one which causes the data page display to skip to a new line , one which outputs numbers , and one which outputs character strings. The character string routine requires the H and L registers to point to the beginning of the string and leaves the LEXI pointer to the right of the terminal quote. The numeric routine displays a number pointed to by the top of SYNA's address stack. Commands are ignored during this I/O activity. The output routine for numbers will print them in its choice of FORTRAN I, F, or E format, depending upon the magnitude and precision of the particular number being output. The input routine accepts the input string in any format.

All of the above routines are written as pure procedures, and in a modular fashion. There is a reasonable amount of optimization possible in the OS routines, particularly in the n-way branches. Total size of the text editor/ operating system is about 3000 bytes.

10

3. SYNTACTIC AND SEMANTIC ANALYSIS

3.1 Syntactic Analysis

SYNA is the heart of the BASIC language capabilities of our system. It performs both syntactic correctness checking and interpretive BASIC language program execution. SYNA is called on a statement-by- statement basis from the operating system; when a statement has been correctly parsed or executed, a normal return is made to OS; if an error is detected, control passes to an error recovery procedure in OS where appropriate recovery measures are taken (these are documented elsewhere).

The syntax analyzer is essentially a stack oriented pushdown automaton. As a statement is parsed, the analyzer makes a sequence of state transitions that depend on the current input symbol (token) in the statement. Entering certain states causes the syntax analyzer to accept the statement, and likewise, other combinations of the analyzer's state and input token cause the analyzer to reject the statement and signal an error.

In an effort to conserve space, the state transitions are encoded in table form, making the syntax analyzer table-driven. A fairly small table driver routine causes new tokens to be input when required by a lexical analysis routine, moves the analyzer through the state table, manipulates the analyzer's stack, and invokes a set of execution routines that perform the actual operations defined in the BASIC language. For a more detailed explanation of the state transitions and a view of the

11

syntax table, see appendices B and C.

In addition to the analyzer's parser stack, a number of other stacks are maintained by SYHA.. These include an address stack, an expression value stack and corresponding operator stack used for expres- sion evaluation, a FOR-loop stack and a user function evaluation stack. The parser stack is used strictly for state transitions by the pushdown automaton. The address stack (2 bytes wide) is used for executing GOTO and GOSUB statements, for array reference resolution, and for any general bookkeeping operation requiring two bytes of stack storage. The expres- sion stack (k bytes wide) is used to store temporary arithmetic components and results involved in expression evaluation. The operator stack (2 bytes wide) uses one byte to identify the arithmetic operator involved and the other byte to hold the operator's F precedence (see D. Gries , Compiler Construction for Digital Computers , chapter 5, "Simple Precedence Grammers"). The stack is used in conjunction with the expression stack in evaluating expressions. The FOR-loop stack is used to keep track of FOR-loop nesting in the program. Each time a new FOR-loop is entered, an entry is pushed onto the FOR-loop stack. When the loop is exited, the stack is popped one level. The last stack is a user function stack (3 bytes wide). One byte points to the symbol table location of the function, and the other two bytes hold the return address for use after the function has been evaluated. The function stack is needed since one function may be defined in terms of another function.

Under the current implementation (easily modifiable) the function stack is 5 deep, the FOR-loop stack is k deep, the expression and oper- ator stacks are each 10 deep, and the parser stack is 15 deep.

The syntax analyzer operates in one of three modes, determined

12

by the operating system. When the programmer pushes the RUN key, all variables (both compiler and program) are initialized to zero and then SYNA is called statement -by-statement throughout the program; SYNA. again performs a syntax check on each statement and then executes the statement. This is the EXECUTE mode, in which all stacks are fully operational and utilized.

As explained in the operating system chapter, the programmer can type in his program in an interactive BASIC environment, giving him a syntax check on his statement as it is entered, but not actually exe- cuting the statement. This is the normal PARSE mode. The FOR-loop stack, parser stack, and address stack are operational in this mode. The expression stack, operator stack, and function evaluation stacks are not used since the statement is not actually executed.

One primary goal for the project was to provide good syntax checking and error detection in this normal PARSE mode. To do this, FOR-loop nesting should be checked even though the loop is not actually being executed (indeed, at the time the analyzer is parsing the FOR statement, the loop contents and the NEXT statement probably do not even exist.). Thus, SYNA builds the FOR-loop stack at parse time, enabling the check on loop nesting. Also, we require array and function definition before use so that we can insure that all uses of the same variable are consistent. To handle this, SYNA sets a flag bit in the symbol table when an array is defined by a DIM statement, and likewise when a function is defined by a DEF statement. Now all uses of the array or function name can be checked for syntactic correctness and previous definition.

The only complication which arises in the above scheme of syntax checking occurs when the programmer backs up the cursor and changes,

13

inserts, or deletes statements which have already been parsed. To adequately handle these occurances, the syntax parser actually needs to back up with the cursor and undo any FOR-loop nest checking or array/ function definitions. For this purpose, the operating system will call S"YNA in the UNPARSE mode whenever the cursor is backed up over an end- of-line. For example, if a NEXT statement is backed over, the FOR-loop stack will be pushed up one level to indicate that we are back inside a FOR-loop again. If a DIM statement is backed over, all the arrays dimensioned within that statement will have their "defined" bit turned off (i.e., "not defined") in the symbol table.

Eventually, the programmer will move the cursor forward again using the right arrow or down arrow keys ; when this happens SYNA is called again in the regular PARSE mode, the statement is parsed again, and the normal parsing actions are taken.

3.2 Semantic Analysis

It is apparent that, depending upon the mode when SYNA is called, different operations are performed when a statement is parsed. As far as the formal parser is concerned, it means that any given state can have different semantic routines associated with it, depending upon the mode. Thus, each state entry in the table has three bits, one for each of the EXECUTE, PARSE, and UNPARSE modes, as well as an execution/semantic rou- tine number. When a state is used in the table the execution/semantic routine is invoked only if the bit corresponding to the current mode is set. For example, the expression evaluation routine is an execution- only routine, having its EXECUTE bit set and its PARSE and UNPARSE bits not set in the SYNA table; thus this routine is called only if the mode is EXECUTE.

Ik

This concludes the discussion of the syntatic/semantic analyzer, The lexical analyzer and symbol table manager are discussed separately.

15

h. LEXICAL ANALYZER AND SYMBOL TABLE

k.l Lexical Analyzer

The lexical analyzer is a duty routine for the syntactic recog- nizer. The lexical pointer LPTR contains in two contiguous bytes the page number (high order 6 bits specif ing which 256-byte page) and byte number (low order 8 bits) of the ik bit addresses within the user area, lk-K. through 16K-I. On entry to LEXI, LPTR points one byte beyond the most recently scanned token (initially SYNA points it to l^K+1, the first byte of user source text). LEXI isolates the next non-blank (possibly the current) character, sets SEM (semantic class variable) to the ASCII code of the character being examined if it is not an identifier or sets SEM to the variable's displacement in the symbol table if it is an identifier, and performs a pseudo binary search to identify the currently scanned symbol as one of 15 token classes. The token class is recorded in SYMB. If the token is an operator, LEXI records its F and G precedence (see Gries) in the variables F and G. The token classes are:

TOKEN ASCII SYMB SEM F G

LET

DIM

DEF

REM

GOTO

IF

FOR

ASCII

SYMB

SEM

129

15

129

130

15

130

131

15

131

132

15

132

133

15

133

13^

15

13^

135

15

135

16

TOKEN ASCII SYMB SEM F G

TO

136

15

136

NEXT

138

15

138

GOSUB

139

15

139

RETURN

11+0

15

11+0

INPUT

ll+l

15

11+1

PRINT

ikz

15

11+2

STOP

1^3

15

11+3

END

Ikk

15

11+1+

STEP

ll+5

15

1^5

THEN

lk6

15

ll+6

NOT

ll+7

3

ll+7

12

13

AND

ll+8

k

ll+8

'5

1+

OR

ll+9

k

ll+9

3

2

FN 15k 7 stp

SIN

155

6

155

7

6

ATN

156

6

156

7

6

COS

157

6

157

7

6

EXP

158

6

158

7

6

LOG

159

6

159

7

6

SQR

160

6

160

7

6

<

161

5

161

12

13

<=

162

5

l62

12

13

>

163

5

163

12

13

>=

161+

5

16k

12

13

=

165

5

165

12

13

#

166

5

166

12

13

+

1*3

1

1+3

9

8

-

1+5

1

1+5

9

8

■*

k2

2

1+2

11

10

/

kl

2

1*7

11

10

t

9h

2

9k

11

10

17

TOKEN

ASCII

(

4o

)

kl

tt

34

5

44

end-of-line

92

A

65

B

66

SYMB

8 9

10

10 10

11 11

SEM

4o

kl

34 44 92

stp stp

0 0

G

Ik

0

90

65 ko

66 4o

90 4o

11

12 12

12

U8

14

k9

14

57

14

16

stp

stp stp

stp

stp = symbol table pointer

A( B(

o

Z(

0

1

others (unknown)

4.2 Symbol Table Manager

If a token is an identifier, a simple one character lookahead resolves the question of whether the identifier is an array or a simple scalar; likewise, after receiving the special FN key, the next character determines which function (FNA, FNB, ..., FNZ) is being referenced. If an identifier is referenced, LEXI calls the symbol table manager (LOOKUP) which, in turn, finds or inserts that identifier in the symbol table and returns the identifier's location as a displacement from the beginning of the symbol table; the displacement is returned in the variable SEM.

18

The symbol table search is based on a linear hash generated by AOT)ing the symbol's ASCII code with 31 (=00011111p) and multiplying the result by 8 (three left shifts). The 26 alphabetic characters now hash uniquely to locations 8, l6, 2k, . .., 208. Then a linear search is begun at the hash address HASH = 8 * mod (ASCII, 32 ) , checking each entry for one of three possibilities: (l) an empty entry (flag byte pointed to by HASH is zero), indicating that the desired entry is presently not in the table and causing insertion; (2) the variable name in the symbol table matches the recognized name in the B,C registers (SEM then returns the offset from STABH, the base address of the symbol table); or (3) the search fails 32 times and then signals an error - "symbol table full".

k.3 Symbol Table Structure

The structure of the symbol table is :

flags

first char

second char

binary value of floating point variables

where

The flag byte is subdivided into its bit fields as

V6b5V3b2blbO

b„ =0 entry empty

= 1 entry in use

b = 0 not an array

= 1 variable is an array

IDs - 0 not a user function

= 1 variable is a user function

19

SYNA uses b to signal "defined" or -undefined" during the parse. The remaining bits could be used individually or in combination to signal other possible data types (e.g., character strings) in a future improved version.

Single character scalar names are stored in the symbol table as [first char] [second char] = [letter] [blank], double character scalars as [letter] [digit], arrays as [letter] [(], and user functions as [FN] [letter]. Thus the type of any entry may be determined from either its flag bits or its name.

For arrays and user functions whose value is not stored in the symbol table, the 5 byte value field is used for other information. In particular, the entries look like:

scalar entry

1

5 byte value

array entry

X

X

upx

up2

up3

user function

high

low

disp

X

X

where: up. is the i upper bound of the array, with all

arrays internally having three dimensions whose default upper and lower bounds are zero;

(high, low) is an address pointer which marks the

return address when the function has been evaluated;

disp is the symbol table displacement of the function's dummy parameter.

Note that the design of the symbol table allows the simultan- eous use of the scalar A, the scalars AO, Al, ..., A9, the array A( , and the user function FNA with no ambiguity and no extra overhead in either LEXI or LOOKUP.

20

5. EXECUTION PACKAGE

No computing system intended for scientific use would be complete without true number -crunching ability. Since our BASIC system is designed for a whole range of applications, it seemed imperative that extensive floating point arithmetic capabilities be included. In an effort not to "reinvent the wheel," we decided to purchase the floating point arith- metic package utilized by the Datapoint 2200, a popular minicomputer whose CPU is similar to an Intel chip. The Datapoint software provides the means for floating point addition, subtraction, multiplication, division, and comparison, as well as both ASCII-character-string -to- internal-floating -point and internal-floating -point-to-ASCII-character- string conversion0

21

6. SUMMARY

The whole project, which extended for a period of approximately five months, successfully illustrated the feasibility of a BASIC inter- preter running in a microprocessor environment. The basic design of the system is good, and serves well as a model for similar interactive systems .

It is regretable that time did not permit us to try our design on real -world hardware. We believe that the actual implementation of such a system might well be a project for future research.

22

APPENDIX A BASIC LANGUAGE BW

<basic program>

<program statement> <sequence number> <basic statement>

<let statement>

<formiila>

<conj-unction>

<Boolean>

<expression>

<term>

<factor>

<primary>

<relative>

<operand>

<variable>

<scalar variable>

<array varia"ble>

<subscript>

<program statement> | <basic program>

<program statement>

<sequence number> <basic statement> (CR) ,2

= <digit> {<digit>}

0

<let statement> | <dim statement> J <def statement>| <rem stateraent> | <goto statements] <if statement> <for statement> | <next statements* | <gos-ub statement> | <return statement>| < input statement> | <print statement>| <stop statement> | <end stateraent> LLET) <variable> f^\ <formula> <con;junction> | <formula> (ORj <conjunction> <Boolean> | <conjunction> rANDj <Boolean> <expression> I <Boolean> <relative> <expression> <term> [ <expression> (+ | -) <term> <factor> | <term> (*|/) <factor> <primary> | <unary> <primary> <operand> | <primary> ( t) <operand>

©10101 ©101©

<variable> | <number> I <function>

<scalar variable> I <array variable>

<letter> [<digit>]

<letter> <subscript>

_(_ <expression> { , <expression>} _ )_

23

<letter> <digit> <unary> <function> <function name>

<number>

<dira statement> <array list>

<upper bound> <statement> <rem statement> <goto statement> <seq"uence number> <if statement> <for statement>

<next statements <gosub statement> <return statement> <input statement> <print statement> <print list> <character string> <stop statement> <end statement>

A | B | C I ... | Z

0 | 1 | 2 | 3 | k | 5 | 6 | 7 [ 8 | 9

= +

NOT)

<f unction name> (<expression>) (FN) <letter> | (SIN) f (cos) | (LOG) [~ (sqr)

(exp) | ft

[+ | -] {<digit>+ [.] | <digit>* 0<digit>+]

[E[+ | -] <digit> [<digit>]]

(DIM) <array list>

= <letter> _£_ <upper bound> { , <upper bound>] _ )

2 ,+

0

{ ,<Letter> _£_ <upper bound> { ,<upper bound>} } <number> such that 0 < number < 255

JDEF) (FN) <letter> {_ <letter> ]_ (^\ <formula>

(REM) <character string> (cr)

K)TQ <sequence number> <number> such that 0 < number < 999

(if) <formula> (THEM <sequence number>

(FORJ <scalar variable> C=) <formula> (TO) <formula> [ (STEFJ <formula> ]

(NEXT) <scalar variable>

^GOSUBy <sequence number>

(return)

(INPUT) <variable> ( , <variable> } (PRINT) <print list> { , <print list>| "<character string>" | <formula> any TTY character except " (STOP] KNm

2k

APPENDIX B DESCRIPTION OF THE STATE TRANSITIONS FOR THE SYNTACTIC RECOGNIZER

The parser that is used in our BASIC system is essentially a one stack push down automata; the parsing technique is top-down goal oriented with the stack used for state sub-calls and recursion. book- keeping. In the following discussion the abbreviations used are:

SYMB: a token's symbol class, i.e., identifier, reserved word, etc.

SEM: a token's concrete representation, i.e., "+", "IF", etc.

In order to describe the parser's state transitions a few state transition diagram conventions or notations have been developed. A transition from one state to another will occur at least once for every new token from the input string. Each state in the diagram will have at least one branch coming out of it. These branches are usually labeled with the conditions necessary for that branch to be taken. An unlabeled branch is taken as a last resort. If none of the branches is satifsfied, it indicates an error condition. If a branch has more than one condition on it, the conditions are each placed on a separate branch and the branches are connected together; to reach the end of a branch, all inter- mediate branches must be satisfied. The end of a branch sequence contains the state transition information,, Usually, just the "next state to move into" is indicated; sometimes a box, such as

<state>

25

may appear at the end of a branch; this indicates an intermediary "state call" must take place before the transition to the "next state" occurs (in other words, save the "next state" on the parser stack and make the state transition to the state in the box)., Likewise, sometimes a symbol such as

may occur in a branch; this indicates the state transition is to the state at the top of the stacko

For example, examine the BASIC GOTO statement: <goto> : : = GOTO <seq.#> EOL The state transition diagram for this statement is:

SI) SEM = "goto" . S2

S2j SYMB = <seg..#> , S3

S3 J SEM = eol . <new statement>

This is interpreted as :

If you are in state SI and the current token is a "GOTO", then move into state S2. If you are in S2 and the current token class is <seq.#>, move into state S3. If you are in state S3,

and the current token is "EOL" (end of line), move to the state for the beginning of a new statement. Note that in all cases here, if the given condition is not met, there is an error.

As another example, consider the IF statement:

<if> ::= IF <expression> THEN <seq.#> EOL

The transition diagram for this is:

SI SM="if

<expression^

S2

S2 J SEM = "then" , S3

SYMB = <sea.#> g Sk

26

SEM = eol # <new statement>

This is interpreted as:

If you are in state SI and the current token is an "if", then push S2 onto the parser stack and make the transition to state <expression> (although not shown, state <expression> will parse an expression as defined by BASIC and, when finished, will make the transition to the state at the top of the stack, namely S2, and then pop the stack). The rest of the diagram is interpreted similarily.

This concludes the discussion of the state transitions for our system.

The table system

Given the state transition diagrams for a language, the remaining problem is to encode the information and actions of the diagrams into a table format. To succeed, the table must contain fields for:

1) Checking the branch conditions.

(a) Checking the SYMB or SEM values of the current token

(b) Ability to check efficiently for all the branches of a state

(c) Ability to detect an error condition

(d) Ability to check multiple conditions, i.e., more than one branch condition to reach the end of the branch

2) Stack operations.

(a) Push a state onto the parser stack

(b) Pop a state off the parser stack and make the transition to that state

3) Normal "next state" transitions.

k) Since taking a branch normally means accepting the current token and moving on to the next token in the input string, there must be someway of making an 'epsilon move', i.e., changing the state while not moving the input pointer.

27

5) In this system we wanted to be able to have the table drive the actual execution of the statement as well as just syntactically parse it. Also, since the system will keep up with the programmer (FOR- loop nesting checks, defined functions and arrays) even during simple parsing mode, the table needs to indicate what routines to perform and when they should be performed.

With these requirements in mind, we describe the actual table system. The system is divided into two parts, the table itself and a routine to interpret the table. Consider the table to be actually a large vector, where each element of the vector is one "entry" in the table. There are four types of entries in the table: HEADER, VALUE, CONTROL, and ERROR.

HEADER

The HEADER entry contains information about the next few entries in

the table. It tells what tests to make, how many of them to make, and what

to do if they all fail. In particular, there are three fields in the entry:

TYPE (2 bits): The TYPE field contains a number (0-3) that ident- ifies the type of branches for that state.

QUAN {k bits): The QUAN field contains the number of branches for the state.

FAIL (2 bits): The FAIL field contains a number indicating what to do if all the branch tests fail.

These fields are discussed in more detail on page 29.

VALUE

The VALUE entry contains only one field. This field holds a number which is to be compared with either the current SYMB or SEM token value, depending on the previous HEADER that was processed. If the field's number matches the current token properly, then the correct branch has been

28

located for this state and the next CONTROL entry should be interpreted. On the other hand, if the field's number did not match the token, then this branch has failed; in this case, the previous HEADER information will determine what table entry to look at next.

CONTROL

The CONTROL entry is interpreted only when a correct branch has been located for a state. The CONTROL entry corresponds to "the end" of a branch in the state transition diagram. It contains fields for Push state, POP, Next state, whether more conditions must be checked, whether this is an epsilon move, and also an execution/parse routine number, along with three bits for the the mode checking - one for each mode: unparse, parse, execute, If the mode bit in the table that corresponds to the actual mode of the system is set, the routine is called. In all cases, if a field value is 0, then. that option is ignored when the CONTROL entry is interpreted.

ERROR

The ERROR entry is interpreted when all the branches have failed for a state and the HEADER'S FAIL field equals 1, indicating an error exists, The ERROR entry contains an error number, which can be extracted and passed on to the error display routine.

These are the entries in the table. The detailed bit implementation follows

HEADER

quantity

fail

type

29

TYPE

The TYPE field determines exactly what branch conditions must be checked for and what the order of the next entries in the table is.

type = 0 Unconditional TRUE branch. The next table entry is a CONTROL entry that is immediately interpreted.

type = 1 Branch condition : SYMB value check.

The following two table entries consist of a VALUE entry followed by a CONTROL entry; the current token's SYMB value is compared to the VALUE table entry; if they are equal, then the CONTROL entry is interpreted; if unequal, then the CONTROL entry is jumped over in the table and different tests can be made (see fail).

type = 2 Branch condition: SEM value check. Same form as type=l.

type = 3 Special case test. QUAN contains routine number. CONTROL follows .

QUAN

The QUAN field determines how many similar branches a state has. It gives the number of VALUE/CONTROL pair entries following the HEADER. For type =3, the QUAN value is a special case routine number.

FAIL

The FAIL field is used only when all the other checks have failed.

fail=0 The next entry is a new HEADER with a new set of condition checks. It is "flowed" into.

f ail = 1 The next entry is an ERROR entry.

fail = 2 The next entry is an alternate CONTROL entry that should be immediately interpreted.

30

VALUE

value

Contains one number to be used according to the previous HEADER information,

ERROR

ERROR #

8

Contains an error number that is passed to the error message writer when an error is detected. CONTROL entry: 18 bits

Unparse mode bit

POP

Push state

Execution

Parse Routine §

Mode bits Parse Execute

End of

branch

bit

Epsilon

move reparse

Next state

1

1

6

6

1 1

1

1

6

The CONTROL entry is divided into three bytes. With the exception of the Unparse bit, all the fields are interpreted in order as they come. For the Execution/Parse/Unparse routine, the Unparse bit is ORed into the third bit position of a byte, and the other mode bits are ORed into the first two bit positions. Then this byte is compared with the current mode switch (which has values: Unparse = h> Parse =2, Execute = l). If the corresponding bit is set the execution routine is called.

APPENDIX C

STATE DIAGRAMS' FOR BASIC LANGUAGE TABLE-DRIVEN INTERPRETER

31

SYMB = Const

err 1

ST1 (2)

STEND (k6)

ST1 (2)

SYMB = rswd

err2

SEM = let

SEM = goto

SEM = if

SEM= stop

SEM= end

SEM = for

SEM= next

SEM = rem

SEM = gosub

SEM= return

SEM= input

SEM= print

SEM= dim

SEM= def

err 3

II (3)

el) Gl (6)

EXPR (37) IP1 (8)

ST0P1 (9) ST0P1 (9) FORI (10) NEXT1 (16)

pelUV (pop) reparse

el) G0SUB1 (17)

jS) G2 (7)

INI (21) (e23j- PR1 (2k)

DIM1 (28)

DEF1 (31)

32

SYMB = ident

.SYMB = array

err

k

e2

ARR1(19)

12(h)

^(^7) reparse

< L2 ( h )) SEM=eq

err 5

EXPR(37)

13(5)

L3(5)L__SM=eol

err

SYMB = const

err 7

-(pop)

reparse

G2(7)

G2(7)l SEM=eol

err

8

pop) reparse

IF1 ( 8 )) SEM=then

err

6

Gl(6)

SEM= eol

err 8

pop) reparse

FORI

,(io),

SYMB = ident

err

F0R2 (11)

SEM= eg

err 5

EXPR(37)

F0R3(12)

F0R3 (12)

SEM= to

err 9

EKPR(37)

-F0RJ+(13)

33

SEM= step

EM= eol

err 6

S'EM = eol

err

SYMB = ident

SYMB= const

err 7

SEM = eol

SEM= comma

SEM= thesis

err

)) SYMB = ident

SYMB = array

err 11

elOl

ell)

Jel2

&

ARR1(19)

EXPR(37)

F0R5(lU)

-F0R6 (15) reparse

-FOR6(15) reparse

pop) reparse

•G2(7)

•G0SUB2(l8)

pop) reparse

EXPR(37)

JVRR2(20)

EXPR(37)

■ARR2(20)

IN3(23) -IN2(22) reparse

IN3(23) reparse

3k

DEF (3D

SEM = comma

SEM = eol

err 11

PR1 ( 2k )) SEM=eol

err

SYMB = array

err 12

SEM= comma

SEM= thesis

err

Sffl= comma

EM= eol

err 13

SYMB = user fen

err

ik

INI (21) 'pop)

.(POP)

^>R2(25)

Pe27l ^pe28\

DIMl(28) (pop)

reparse

reparse reparse

PRM27)

PR3(26) repa:

rse

PR4(27) reparse

reparse

EXPR(37)

_DIM2(29)

EKPR(37)[ DIM2(29)

DIM3(30)

reparse

DEF1(32)

35

SEM = par en

SYMB = ident

err

16

SEM = thesis

/def^A

err 17

SEM = ea

err 5

SEM= eol

SYMB = punlef t

SYMB = ident

SYMB = array

SYMB = const

SYMB = opun/opnot

SYMB = bif

SYMB = user fen

err

SEM= par en err 15

DEF2(33)

DEF^(35)

.(pe3l).

IP°P

ARR1(19)

<e37>

pe33)

EXPR(37)

DEF3(3*0

EXPR(37)

.DEF5(36)

reparse

EXPR(37)| 0PER(^5)

0PER(l+5)

EXl(38) reparse

0PER(1+5)

EXPR(37)

ARGl(^O)

ARGl(^O)

BIF1(39) ■FCNl(te)

0PER(i+5) reparse

0PER(^5) reparse

,ARG2()4l)

36

SEM = thesis

err 17

reparse

EXPRFCN ( kk ) FCM2 r epar s e

(U3)

test

Mode = execute

Special Case 1

SYMB = oplla

SYMB = opbin

SYMB = punrite

SYMB = oprel

SYMB = opcond

STACKTOP=oper

special case 3 -err 15

EXPR(37)

(ek2\ [eh^

Special Case 2 stacktop- o-yevik'o)

OPER(l^) reparse

reparse

.reparse

EXPR(37) EXPR(37)

reparse

EKPR(37) EXPR(37) reparse

^ -UeW

l£(k) repar

se

37

APPENDIX D PROGRAM EXAMPLE

The following pages show the successive (simulated) states of the TV screen in response to various keyboard commands issued in the text -editing mode.

38

The simulated TV screens below show the successive states of the visual image as it responds to insertion, deletion, replacement, and cursor movement under control of the BASIC text editor.

The screens are presented in columns, and should he read top to bottom and left to right.

9 LET A9 LET A = 9 LET A = 6

9

LET 9 LEt A 3 LET A = 63 LET A = 6

IF 9

39

LET A = 6

IF A S

LET A = 6

IF A = 7 THEN 9

LET A = 6

IF A = 7 THEN P

= OS

LET A = 6 IF A = »

LET A = 6

IF A = 7 THEN P3

LET A = 6

IF A = 7 THEN P = Q

3

LET A = 6 IF A = 7a

LET A = 6

IF A = 7 THEM P 3

LET A = 6

IF A = 7 THEN P = 0

Da

LFT A = 6 IF A = 7 ^

LET A = 6

IF A = 7 THEN P = 3

LET A = 6

IF A = 7 THEN P

DE9

= 0

1+0

LFT A = 6

IF A = 7 THEN DEF3

LET A = 6

P = 0 if A = 7 THEN P = DEFINE A<^

LET A = 6

IF A = 7 THEN P

OEFINF At It 2) fl

= 0

LET A = 6

TF A = 7 "HEM DEFI?

P = 0

LET A

IF A = DEFINE

7 THEN P = AtlS)

LET A = 6 IF A = 7 THEN DEFINE All, 2) 3

P = 0

LET A = 6 IF A = 7 DEFINE

'HEN P = 0

LET A = 6

IF A = 7 THEN P =

DEFINE A(l,»

LET A = 6

IF A = 7 THEN P aDEFINF A (1,2)

= 0

LET A = 6

IF A * 7 THEN

DEFINE?

LET A = 6 P = 0 IF A = 7 THEI^ P = DEFINE AC1,25>

LET A = 6

IF A = 7 THEN P

D^EFINE A (Lf 2)

= 0

kl

LET ft = 6

IF ft = 7 THEN P = DEFINE 4(1,2)

LET A' - 6

IF A = 7 THEN P =

DEF3> A<1,2»

LET A = 6

IF A = 7 THEN P =

3EF A (1,2)

Q

LET A = 6

IF A = 7 THFN P =

DEF3INE A(l,2)

LET A = 6

IF A = 7 THEN P =

DE3F A(l,2)

LET A = 6

IF A = 7 THEN P = Q

3F A (1,2)

LET A = 6

IF A = 7 THEN DEFINE A(l,2)

P = 0

LET A = 6

IF ft = 7 THEN P =

03>EF A( 1,2)

LET A = 6

IF A = 7 THEN P =

a A( 1,2)

LET A = 6

IF A = 7 THEN P = 0

DFF^E A(l,2)

LET A = 6

IF ft = 7 THEN P =

3DEF A(l,2)

LET A = 6

IF A = 7 THEN P =

3 A(l,2)

42

LET A = 6

IF A = 7 THFN

DEF a A( I ,?)

p = O

LFT A = 6 TF A = 7 THEN OFF Ml,?) THIS

P = 0

LET A = 6

TF A = 7 THEN

DEF A(l,2)

THIS LH

p = n

LET A = 6

IF A = 7 THFN

DEF Ml, 2)

P = q

LET A = 6 IF A = 7 THFN DEF A (1,2) THISS

P = 0

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LIN3

P = 0

LET A = 6

tf A = 7 THEN

DEF A (1,2)

T3

P = Q

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS ??

P = 0

LET * = 6

IF A = 7 'HEN

DEF A(l,2)

THIS LI NFS

P = O

LET A = 6

IF A = 7 THEN

DEF A (1,2)

TH3

P = 0

LET A = 6 IF A = 7 THFN DEF A(l,2) THIS LS

P = 0

LET A = 6 IF A = 7 THFN DEF Ad, 2) THIS LINE 9

P = 0

^3

LET A = 6

IF A = 7 THEN DEF A(l,2) THIS LINF W3!

LET A = 6 P = 0 IF A = 7 THEN! P = OEF A (1,2) THIS LINF WILL S>

LET A = 6

IF A = 7 THEN P =

OEF A (1,21

THIS LINE WILL BE

OS)

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINF WIS)

LET A = 6 P = Q IF A = 7 THEN P =

DEF A(l,2) THIS LINF WILL B3

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINE WILL

P = 0

BE DE3

LET A = 6

IF A = 7 THFN P

DEF A(l,2)

THIS LINE WHS!

LET A = 6 = 0 IF A = 7 THEN P = I OEF A<1,2) THIS LINE WTLL BES>

LET A = 6 IF A = 7 THEN DEF A(L,2) THIS LINE WILL

P = 0

BE DELS*

LET A = 6

IFA=7THFNP=Q DEF A(L,2) THIS LINF WILL^

LET A = 6

IF A = 7 THEN P =

DEF A(l,2)

THIS LINE WILL BE

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINE WILL

P = 0

BE DELE 3

kk

LET A = 6

IF A = 7 THFM DEF A (1,2) THIS LINE WILL

p = 0

BE DELETE

LET A = 6

IF A = 7 THEN

DEF A(l,2)

THIS LINE WILL

R9

P = 0

BE DELE_ED

LET A = 6 IF A = 7 THEN DEF 4(1,2) THIS LIME WILL

p = 0

BE DELETE?

LET A = 6

IF A = 7 THEN

DEF A (1,2)

THIS LINF WILL RES!

P = 0

nE DELETED

LET A = 6

IF A = 7 THFN DEF All, 2) ^HIS LINE WILL

p =

KE DELETED?

LET ft = 6

IF A = 7 THEN

DEF A(L,2)

THIS LINE WILL

REPS

p = 0

«E DELFTED

LET A = 6 IF A = 7 THEN DEF AQ,2) THIS LINE WILL

P = 0

BE DELETED

LET A = 6

IE A = 7 THEN

VEF A(lf2)

THIS LINE WILL

RFPL?

P = Q

BE DFLETED

^

LET A = 6

IF A = 7 THEN

DEF A<1,2)

THIS LINE WILL

REPLAY

P = 0

RE DELETED

LET A = 6

IF A = 7 THEN

DEF A (1,2)

THIS LINE WILL REPLACE R^

P = 0

BE DELETED

LET A = 6

IF A = 7 "THEN

DEF A(l»2)

THIS LINE WILL

REPLACE

P = Q

BE DELETED

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINE WILL REPLACE P a

P = 0

BE DELETED

LET A = 6

IF A = 7 THEN

DEF A(l,2)

THIS LINF WILL

REPLACED

P = 0

BE DELETFD

LET A = 6

IF A = 7 THEN

DEF A(l,2)

THIS LINE WILL REPLACE P W3

P = 0

BE DELETED

LET A = 6 TF A = 7 THFN DEF A(L,2) THIS LINE WILL

REPLACE a

P = 0

BE DELETED

LET A = 6 IF A = 7 THEN DEF A(lf2) THIS LINE WILL REPLACE P WIS)

P = 0 BE DELETED

k6

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINE WILL REPLACE B WITa

P = 0

PE DELETED

LET A = 6 IF A = 7 THEN DEF A(l,2) THIS LINE WILL REPLACE R WI^h

P = 0

BE DELETED L3

LET A = 6

IF A = 7 THEN P = 0

DEF A(l,2)

THIS LINE WILL BE DELETED

REPLACE P Wl^Hfl!

LET A = 6

IF A = 7 THEN P = 0

DEF A (1,2)

THIS LINE WILL BE DELETED

REPLACE R WI~H L S)

LET A = 6

IF A = 7 THEN DEF Ml, 2)

THIS LINE WILL REPLACE R with

P = 0

BE

DFLETFO

LET A = 6

IF A = 7 THEN P = 0

DEF A(l,2)

THIS LINE WILL BF DELETED

REPLACE R WITH L 3

LET A = 6

IF A = 7 THEN P = 0

DEF A(l,2)

THIS LINF WILL BE DELETED

RE°LACE p WITH 3

LET A = 6

IF A = 7 THEN P = o

DEF A (1,2)

THIS LINE WILL BE DELE REPLACE P WITH L P

TD

k!

LET A = 6

IF A = 7 THEN P = 0

DEE AM, 2)

THIS LINE WILL BE

REPLACE R WITH L

9

DELETED P

LET A = 6

IF A = 7 THEN P = 0

DEF AM, 2)

THIS LINE WILL BE DELETED

REPLACE R WITH L La

LET A = 6

IF A = 7 THEN P = 0

DEF A (1,2)

THIS LINE WILL PE DELETED

REPLACE R WITH L P3

LET A = 6

IF A = 7 THEN

P = Q

DEF AM, 2)

THIS LINE WILL BE DELETED ^REPLACE P WITH L L

LET A = 6

IF A = 7 THEN P = Q

DEF AM, 2)

THIS LINE WILL BE DELETFD REPLACE R WITH L ?DR

LET A = 6 IF A = 7 THE>^ DEF AM, 2) 3THIS LINE WIl L REPLACE P WITH

P = 0

BE

L

DELETED L

LET t = 6

IF A = 7 THEN P = 0

nEF AM, 2)

THIS LINE WILL BE DELETED

REPLACE P WITH L ^P

LET A = 6

IF A = 7 THEN P = 0 DEF AM, 2) ^REPLACE R WITH L

48

LET A = 6

IF A = 7 THEN

OEF A (1,2)

REPLACE R WITH 3

P = 0

LET A = 6

IF A = 7 THEN P = DEF Ad, 2) REPLACE R WITH L

BIBLIOGRAPHIC DATA SHEET

1. Report No.

UIUCDCS-R-7U-658

3. Recipient's Accession No.

4. Title and Subtitle

A BASIC Language Interpreter for the Intel Microprocessor

5. Report Date

June 197^

7. Author(s)

Weaver, A.C., Tindall, M.H. ,

Danielson, R.L.

8- Performing Organization Rept. No. UIUCDCS-R-7U-b50

°. Performing Organization Name and Address

Department of Computer Science

University of Illinois at Urbana-Champaign

Urbana, Illinois 6l801

10. Project/Task/Work Unit Nc

11. Contract/Grant No.

12. Sponsoring Organization Name and Address

Department of Computer Science

University of Illinois at Urbana-Champaign

Urbana, Illinois 6l801

13. Type of Report & Period Covered

14.

IT :»iij pk-mentary Notes

'6. Abstracts

A BASIC language interpreter has been designed for use in a microprocessor environment. This report discusses the development of 1) an elaborate text editor and 2) a table-driven interpreter. The entire system, including text editor, interpreter, user text buffer, and full floating point arithmetic routines fits in l6K 8-bit words.

17. K v Words and Document Analysis. 17a. Descriptors

BASIC language interpreter microprocessor text editor

17b. (Jentificrs Open-Ended Terms

17c. ( OSATI Kield/Group

IK. til.ibility Statement release unlimited

19. Security ( lass (This Report )

UNU.AS.SU-Il.p

20. Sa urn v ( lass ( Th i Page

UN( LASSIFIED

21. No. ol I

k8

ages

M N Til IB I 10- 70)

22. I'n, e

no charge

USCOMM-DC 40'jJI-l' Jl

FEB 1 7 1981