kForth Technical Information

Technical Information

Release Specifications
Implementation
Benchmarks and Tests
VM Error Codes
Source Code Map
Embedding kForth

Release Specifications

The current kForth release is:

Versions: 1.5.2 (x86-linux), 1.4.1 (ppc-osx and x86-cygwin)
Last Release Date: 2011-03-05
Systems: Linux (x86), Mac OS X (ppc), Windows 98/NT/2000/XP with Cygwin (x86)

Implementation

ANS Forth Compliance

kForth is specified as a subset of the ANS Forth standard, as given in DPANS94. Code written for kForth is portable to ANS-compliant Forth systems with the use of trivially defined extensions (see the Special Features section below). Compliance with ANS Forth may be checked using John Hayes' suite of tests for the core words of an ANS Forth system: tester.4th and core.4th. Tests involving unsupported words such as HERE, the comma operator (,), and C, have been commented out, as have tests involving the BEGIN ... WHILE ... WHILE ... REPEAT ... THEN structure and some unusual variants of CREATE and DOES> usage. Compliance with the ANS Forth extension words for working with double length numbers may be checked using dbltest.4th, in which tests are commented out for words that are not implemented in kForth.

Threading Model

kForth is an indirect threaded code (ITC) system. The kForth compiler/interpreter parses the input stream into a vector of pseudo op-codes or Forth Byte Code. Upon execution, the vector of byte codes is passed on to a virtual machine which looks up the execution address of the words and performs either a call or an indirect jump to the next execution address. The type of threading used in the virtual machine is a hybrid of indirect call threading and indirect jump threading. The kForth virtual machine is implemented as a mixture of assembly language, C, and C++ functions. Only the assembly language portion of the virtual machine utilizes indirect jump threading.

Signed Integer Division

kForth versions 1.2.10 and earlier implement symmetric integer division. An alternative form of signed integer division is called floored integer division. Both symmetric and floored division yield identical results when the two operands, dividend and divisor, are either both positive integers or both negative integers. However, when the two operands differ in sign, symmetric and floored integer division can give different results. For example,

Floored Division: -8 3 / . -3 ok

Symmetric Division: -8 3 / . -2 ok

Similarly, the word MOD yields different results on floored and symmetric division systems. Under floored division, MOD is truly a modulus operator (i.e. for a positive divisor n2, the result of n1 n2 MOD is a number in the range [0, n2)), while under symmetric division, MOD simply returns a remainder that takes the sign of the dividend. The following paper provides a discussion of integer division in computing languages: Division and Modulus for Computer Scientists by Daan Leijen.

Floored integer division was guaranteed by the Forth-83 standard. However, the DPANS94 standard revoked this guarantee and allowed system implementors to choose either symmetric or floored integer division. The rationale for revoking a fixed standard was to allow Forth systems to implement whatever form of integer division was best supported by the microprocessor hardware. Most microprocessors which provide signed integer division implement symmetric division. In kForth, the original rationale for using symmetric division was simply to maintain consistency with the GNU C implementation, which uses symmetric integer division as mandated by the ISO C99 standard (the symmetric version of MOD corresponds to the % operator in C). In general, floored division is considered by computer scientists and mathematicians to be the more useful form of signed integer division.

A significant problem with the DPANS94 standard is that, in practice, implementors of ANS-compliant Forth systems for a single hardware platform such as Intel x86 have chosen to use different forms of division. Consider the behavior of the Forth systems below, all running under Linux on an Intel PII:

gforth:   -8 3 MOD .  -2 ok
pfe:      -8 3 MOD .   1 ok
kforth:   -8 3 MOD .  -2 ok
iforth:   -8 3 MOD .  -2 ok
bigforth: -8 3 MOD .   1 ok

Therefore, a Forth program using signed integer division words (/ MOD /MOD */MOD) may produce different outputs under two different ANS-compliant Forth systems. The DPANS94 standard addresses the portability issue by calling for use of the explicit floored and symmetric division words FM/MOD and SM/REM whenever it is important to explicitly specify the type of division. However, it is highly likely that Forth programmers will casually use signed integer division words such as MOD without always remembering the portability issue.
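
As a minimal sketch of that advice, the definitions below wrap FM/MOD and SM/REM so that a program gets the same quotient and remainder on any system that provides those two words. The names /F, MODF, /S, REMS, and .DIVISION are hypothetical; they are not part of kForth or of the standard.

```forth
\ Sketch only: the word names below are hypothetical.
\ FM/MOD and SM/REM both have the stack effect ( d n -- rem quot ).
: /F    ( n1 n2 -- quot )  >R S>D R> FM/MOD NIP ;    \ floored quotient
: MODF  ( n1 n2 -- rem )   >R S>D R> FM/MOD DROP ;   \ floored modulus
: /S    ( n1 n2 -- quot )  >R S>D R> SM/REM NIP ;    \ symmetric quotient
: REMS  ( n1 n2 -- rem )   >R S>D R> SM/REM DROP ;   \ symmetric remainder

\ Report which rule the host system applies to the plain / word.
: .DIVISION ( -- )
    -8 3 /  -3 = IF ." floored" ELSE ." symmetric" THEN ;

\ Expected values: -3, 1, -2, -2
-8 3 /F   .
-8 3 MODF .
-8 3 /S   .
-8 3 REMS .
```

Code that must behave identically everywhere can call these wrappers instead of / and MOD, and .DIVISION can be run once to see which rule a given system uses for the unqualified words.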

Double Numbers

kForth supports working with signed and unsigned double length numbers, and implements nearly all of the optional double number word set specified by DPANS94, either intrinsically or in the form of Forth source definitions (see ans-words.4th for the latter). In addition to the ANS Forth tests involving double numbers given in core.4th, further tests of double number words implemented in kForth are given in dbltest.4th.

One significant departure in kForth from typical Forth systems which provide double numbers is the method of entry of double length numbers. Traditional Forth recognizes the decimal point as a marker for a double number, e.g.

234.

is interpreted as a double number. kForth does not permit double number entry in this manner. The rationale behind this restriction is that such entries may easily be confused with floating point numbers. Such confusion will likely be common for new Forth users who have previously used other computer languages such as C. Even experienced Forth users who make frequent use of floating point calculations are also susceptible to such confusion. Since kForth uses the data stack to hold floating point numbers, and since a floating point number also occupies two stack cells (see next section), mistakes arising from misinterpreting entries with a decimal point may not be as readily apparent, leading to hard-to-find bugs.

The prohibition on standard double number entry in kForth demands that an alternate method be provided for entry of double numbers. This may be easily accomplished by using a string to double number conversion word. There are two ways to accomplish this. The first method is simple, but it is specific to kForth, while the second is more complex, but portable to other ANS systems. In the simple method, we may make use of the non-standard word, NUMBER?, to convert a counted string to a signed double length number, as follows.


c" -20123456789" NUMBER? DROP

NUMBER? actually returns a flag indicating whether or not the conversion succeeded, but we drop the flag in the above example for simplicity. If the conversion did not succeed, a double length zero will result.
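
When the flag matters, it can be tested rather than dropped. The following is a sketch under the assumption that NUMBER? has the stack effect ( ^str -- d flag ), as the example above implies, and that D. is available for printing a double number. The word name .DOUBLE? is hypothetical.

```forth
\ Sketch: print the converted double, or a message if conversion failed.
\ Assumes NUMBER? ( ^str -- d flag ) and that D. is available.
: .DOUBLE? ( ^str -- )
    NUMBER? IF D. ELSE 2DROP ." not a number" THEN ;

c" -20123456789" .DOUBLE?
c" 12abc"        .DOUBLE?
```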

The second method should be used if it is desired to port the code to other ANS Forth systems. ANS Forth provides >NUMBER for converting a string to an unsigned double number. A more general string to double number conversion word, handling both signed and unsigned double numbers, may be written as follows.


variable dsign

: >d ( a u -- d|ud | convert string to a signed/unsigned double )
    0 0 2SWAP
    \ skip leading spaces and tabs, stopping at the end of the string
    BEGIN
        DUP IF OVER C@ DUP BL = SWAP 9 = OR ELSE FALSE THEN
    WHILE 1 /STRING REPEAT
    ?DUP IF
        FALSE dsign !
        OVER C@
        CASE
            [char] - OF TRUE dsign ! 1 /STRING ENDOF
            [char] + OF 1 /STRING ENDOF
        ENDCASE
        >NUMBER 2DROP
        dsign @ IF DNEGATE THEN
    ELSE DROP THEN ;

Using the above definition of >D, examples of double number entry are:


  s"  20123456789"  >d
  s" -20123456789"  >d
  s" +20123456789"  >d

It should be noted that the method used above is not needed if the double number being entered fits within the bounds of a signed single number. Most cases of double number entry fit this scenario. In such a case, we may simply enter the single number, followed by S>D, e.g.


-234         S>D
 2147483647  S>D
-2147483648  S>D
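
For a value that fits in a single cell, the two entry methods should agree. The line below is a small check of this, assuming the >d definition given earlier has been loaded and that D= from the double number word set is available.

```forth
\ Compare string conversion with S>D for a small value.
\ Assumes >d (defined above) is loaded; prints -1 (true) if both doubles match.
s" -234" >d   -234 S>D   D= .
```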

Floating Point Implementation

The ANS Forth specification allows floating point numbers to be stored either on the data stack or on a separate floating point stack. kForth uses the data stack for holding floating point numbers. Even though many current Forth systems for PCs feature a separate floating point stack, the rationale for using the data stack for floating point operations in kForth was to allow legacy code written for earlier Forth systems (in particular the Forths from Laboratory Microsystems Inc.) to run without significant modifications under kForth. In kForth, a floating point number on the stack occupies two cells. Thus, under 32-bit Windows or Linux, floating point numbers are 64-bit double-precision numbers (equivalent to C's double).
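
One way to see where a given system keeps its floating point numbers is to measure how many data stack cells a float occupies. The sketch below uses only DEPTH and FDROP. The word name FCELLS is hypothetical. On a system matching the description above it should report 2, and on a system with a separate floating point stack it should report 0.

```forth
\ Sketch: count the data stack cells taken up by one floating point number.
: FCELLS ( -- n )
    1.0E0 DEPTH >R     \ push a float, remember the data stack depth
    FDROP DEPTH        \ remove the float and take the depth again
    R> SWAP - ;        \ difference = cells the float occupied

FCELLS .
```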

The quality of the floating point arithmetic in kForth may be checked using the program, paranoia.4th.

Special Features

Special features of kForth are described in a two-part article in Forthwrite magazine, issues 116 and 117. These features are:

The kForth dictionary is dynamically allocated as new definitions are added. Thus kForth does not implement a monolithic, fixed size dictionary, but can use as much memory as provided by the host operating system. Several side effects result from using dynamic memory allocation to grow the dictionary:
- There is no HERE address in kForth.
- There is no , (comma operator) in kForth.
- There is no C, operator in kForth.
Because HERE does not exist, the word ALLOT not only allocates the requested amount of memory but also, as a non-standard behavior, assigns the address of the new memory region to the parameter field address (PFA) of the last defined word. In kForth, the use of ALLOT must always be preceded by the use of CREATE. A variant of ALLOT, named ?ALLOT, is also provided. ?ALLOT behaves like ALLOT and additionally returns the start address of the dynamically allocated region on the parameter stack. ?ALLOT has the following equivalent definition under ANS Forth:

: ?ALLOT ( u -- a ) HERE SWAP ALLOT ;

?ALLOT is particularly useful in writing defining words in the absence of HERE and the comma operators. For example, to write your own integer constant defining word:

: CONST ( n -- ) CREATE 4 ?ALLOT ! DOES> @ ;

or to write an address constant defining word (see below):

: PTR ( a -- ) CREATE 4 ?ALLOT ! DOES> A@ ;
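
The same pattern extends to larger data structures. The following is a sketch of a cell array defining word written with ?ALLOT in place of HERE and the comma operators. The names CELL-ARRAY and data are hypothetical, and the example assumes that adding an offset to the address supplied by DOES> still yields a usable address. On other ANS systems, the definition of ?ALLOT shown above must be loaded first.

```forth
\ Sketch: define an array of n cells without HERE or the comma operators.
: CELL-ARRAY ( n -- )
    CREATE CELLS ?ALLOT DROP          \ reserve n cells; the address is not needed here
    DOES> ( i -- a ) SWAP CELLS + ;   \ at run time, return the address of element i

10 CELL-ARRAY data
42 3 data !
\ Expected value: 42
3 data @ .
```
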

kForth maintains type stacks corresponding to both the data and return stacks. The type stacks contain a type code for each corresponding data stack cell or return stack cell. This allows kForth to perform some rudimentary type checking. For example, when an address is accessed, kForth verifies that the value's type is that of an address. Address values that are stored in variables must be retrieved with the word A@ instead of @ so that the type can be validated. Code written for kForth may be ported to other ANS Forth implementations by defining A@ as follows:

: A@ @ ;

Unlike a conventional Forth interpreter which executes each token as it is interpreted, kForth continues to build up a vector of byte codes, until a keyword or end of line in the input stream necessitates execution. Deferred execution in interpreter mode is implemented by extending the normal concept of precedence in Forth. Instead of a single precedence-bit associated with each word, kForth uses a precedence-byte having two significant bits to describe the behavior of each word in both compiled and interpreted modes. Thus, a word may have one of four possible precedence values:

0    not IMMEDIATE    DEFERRED
1    IMMEDIATE        DEFERRED
2    not IMMEDIATE    NONDEFERRED
3    IMMEDIATE        NONDEFERRED

To understand the execution behavior of a word in each of these states, it is helpful to view a table of execution modes for each precedence value and for the two compilation states: interpret and compile. We define the following execution modes:
- E0 -- no execution, the opcode for the word is compiled into the opcode vector.
- E1 -- execute current opcode vector up to and including current opcode.
- E2 -- execute only current opcode and remove it from the opcode vector.

Precedence    Interpret    Compile
0             E0           E0
1             E2           E2
2             E1           E0
3             E1           E2

The ability to defer execution in interpreter mode allows "one-liners" to be executed from the kForth prompt without having to define a word. For example, the following line can be typed directly at the kForth prompt:

10 0 do i . loop

Ordinary Forth interpreters do not allow do-loop, begin-while-repeat, and if-then structures to occur outside of word definitions. kForth can interpret and execute such structures as long as they are completed on a single line of input.
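
As a further illustration, the sketch below shows a one-line loop of the kind described above, followed by the form that other Forth systems would require. The word name SQUARES is hypothetical. Both forms should print 0 1 4 9 16.

```forth
\ At the kForth prompt, the whole structure is entered on one line:
5 0 DO I DUP * . LOOP

\ Portable form for systems without deferred interpretation:
: SQUARES ( -- ) 5 0 DO I DUP * . LOOP ;
SQUARES
```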

Words which are NONDEFERRED are those for which interpretation of the rest of the input line will depend on the execution of the word. Thus, the following intrinsic words in kForth have the nondeferred precedence attribute:

\            .(        BINARY      DECIMAL      HEX
WORD         '         CREATE      FORGET       COLD
ALLOT        ?ALLOT    CONSTANT    FCONSTANT    VARIABLE
FVARIABLE    CHAR      >FILE       CONSOLE

Only in very special cases will it be necessary for a programmer to use the NONDEFERRED keyword to set explicitly the interpretation precedence of a word. This is due to the automatic inheritance of the nondeferred attribute: if a word definition includes a nondeferred word, then the new word is automatically nondeferred also. Thus, for example, any word which has a definition including WORD is also a nondeferred word. Another example is a defining word, i.e. one which uses CREATE. Since CREATE is nondeferred the new defining word is also nondeferred.

The most common case in which the NONDEFERRED keyword should be explicitly used is in the definition of a word which changes the number base. For example,

DECIMAL : BASE3 3 BASE ! ; NONDEFERRED BASE3 21

If BASE3 were not declared to be a nondeferred word, then 21 in the above line would be interpreted as decimal 21 rather than as decimal 7 (which is 21 in base 3).

kForth can be started up in debug mode using the command line switch -D. Compiled op-codes and other debugging information are displayed in this mode. It is useful primarily for programmers interested in extending and debugging their own versions of kForth.

Benchmarks and Tests

Versions of standard benchmark programs for measuring kForth execution speed may be found on the FTP site under /software/kforth/examples/benchmarks.

The following Forth source files provide tests for ANS compliance of core and standard extension words in Forth-94, for words which are specific to kForth, and for floating point arithmetic. Most of the test files require ttester.4th and tester.4th.

core.4th
coreplus.4th
memorytest.4th
filetest.4th
searchordertest.4th
stringtest.4th
dbltest.4th
to-float-test.4th
regress.4th
asm-x86-test.4th
divtest.4th
fatan2-test.4th
ieee-fprox-test.4th
ieee-arith-test.4th
fpzero-test.4th
fpio-test.4th
paranoia.4th

VM Error Codes

Non-zero return codes from the virtual machine (VM) indicate the following conditions:

1. Value on the stack did not have type addr.
2. Value on the stack did not have type ival.
3. Value on the stack has unknown type.
4. Division by zero.
5. Return stack has been corrupted.
6. Invalid kForth op-code encountered.
7. Stack underflow.
8. Return code for QUIT (not seen by user).
9. Attempted to re-ALLOT memory for a word.
10. Failed on CREATE (bad word name).
11. End of string not found.
12. No matching DO.
13. No matching BEGIN.
14. ELSE without matching IF.
15. THEN without matching IF.
16. ENDOF without matching OF.
17. ENDCASE without matching CASE.
18. Cannot open file.
19. Address outside of stack space.
20. Division overflow.

Executing the word ABORT will reset the stack pointers. This procedure should be used to recover from VM errors 5 and 7, and whenever there is a suspicion that the stacks have been corrupted.

Source Code Map

Source code for kForth consists of the following C++, C, and assembly language files:

kforth.cpp ForthCompiler.cpp ForthVM.cpp vmc.c vm-common.s vm.s vm-fast.s fbc.h ForthWords.h ForthCompiler.h ForthVM.h kfmacros.h

The source code is made available to users under the GNU General Public License. The Linux version is provided as source code only and must be built locally on the user's machine (see installation). Under Linux, the standard GNU assembler, GNU C and C++ compilers, and the C++ Standard Template Library (STL) are required to build the executable. The Windows 95/98/NT console application was built using the free Cygwin port of the GNU development tools.

Embedding kForth

The file kforth.cpp serves as a skeleton C++ program to illustrate how the kForth compiler and virtual machine may be embedded in a standalone program. XYPLOT for Linux is a more complex GUI program which embeds kForth to allow user extensibility. The file xyplot.cpp shows how to set up hooks for calling C++ functions in the host program from the embedded kForth interpreter and vice-versa.
