Technical Information




  1. Release Specifications
  2. Implementation
    1. ANS Forth Compliance
    2. Threading Model
    3. Signed Integer Division
    4. Double Numbers
    5. Floating Point Implementation
    6. Special Features
  3. Benchmarks and Tests
  4. VM Error Codes
  5. Source Code Map
  6. Embedding kForth



Release Specifications

The current kForth release is:

Versions: 1.5.2 (x86-linux), 1.4.1 (ppc-osx and x86-cygwin)
Last Release Date: 2011-03-05
Systems: Linux (x86), Mac OS X (ppc), Windows 98/NT/2000/XP with Cygwin (x86)


Implementation

ANS Forth Compliance

kForth is specified as a subset of the ANS Forth standard, given in DPANS94. Code written for kForth is portable to ANS-compliant Forth systems with the use of trivially defined extensions (see the Special Features section below). Compliance with ANS Forth may be checked using John Hayes' suite of tests for the core words of an ANS Forth system: tester.4th and core.4th. Tests involving unsupported words such as HERE, "," (comma), and "C," (c-comma) have been commented out, as have tests involving the BEGIN ... WHILE ... WHILE ... REPEAT ... THEN structure and some unusual variants of CREATE and DOES> usage. Compliance with the ANS Forth extension words for double length numbers may be checked using dbltest.4th; tests are commented out for words which are not implemented in kForth.

Threading Model

kForth is an indirect threaded code (ITC) system. The kForth compiler/interpreter parses the input stream into a vector of pseudo op-codes or Forth Byte Code. Upon execution, the vector of byte codes is passed on to a virtual machine which looks up the execution address of the words and performs either a call or an indirect jump to the next execution address. The type of threading used in the virtual machine is a hybrid of indirect call threading and indirect jump threading. The kForth virtual machine is implemented as a mixture of assembly language, C, and C++ functions. Only the assembly language portion of the virtual machine utilizes indirect jump threading.
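To make the dispatch step concrete, here is a minimal C sketch of indirect call threading over a byte-code vector. Everything in it is hypothetical: the op-codes (OP_LIT, OP_ADD, ...), the table xt_table, and the function run() are invented for illustration and are not kForth's Forth Byte Code or its VM interface. The indirect jump threading used in the assembly language portion of kForth is not modeled, and the sketch assumes a well-formed byte-code vector (no stack checks).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mini-VM illustrating indirect call threading.
   The op-codes below are invented for this sketch; they are not
   kForth's actual Forth Byte Code. */

typedef int32_t cell;

enum { OP_LIT, OP_ADD, OP_MUL, OP_DUP, OP_RET };

static cell stack[64];       /* data stack (no overflow/underflow checks) */
static int sp;               /* number of items on the data stack */
static const int *ip;        /* instruction pointer into the byte-code vector */

static void op_lit(void) { stack[sp++] = *ip++; }   /* push inline literal */
static void op_add(void) { sp--; stack[sp - 1] += stack[sp]; }
static void op_mul(void) { sp--; stack[sp - 1] *= stack[sp]; }
static void op_dup(void) { stack[sp] = stack[sp - 1]; sp++; }

/* Table mapping each op-code to the execution address of its word. */
static void (*const xt_table[])(void) = { op_lit, op_add, op_mul, op_dup };

/* Execute a byte-code vector: fetch an op-code, look up its execution
   address in the table, and call it; OP_RET ends execution.
   Returns the value left on top of the data stack. */
cell run(const int *code)
{
    sp = 0;
    ip = code;
    for (;;) {
        int op = *ip++;
        if (op == OP_RET)
            break;
        assert(op >= 0 && op < OP_RET);   /* reject unknown op-codes */
        xt_table[op]();                   /* indirect call through the table */
    }
    return stack[sp - 1];
}
```

Here the vector { OP_LIT, 3, OP_LIT, 4, OP_ADD, OP_DUP, OP_MUL, OP_RET } plays the role of the compiled phrase 3 4 + DUP *, and run() returns 49. Each op-code costs a table lookup plus a call and return; replacing that call/return pair with a jump straight to the next word's code is, roughly, the idea behind indirect jump threading.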

Signed Integer Division

kForth versions 1.2.10 and earlier implement symmetric integer division. An alternative form of signed integer division is called floored integer division. Both symmetric and floored division yield identical results when the two operands, dividend and divisor, are either both positive integers or both negative integers. However, when the two operands differ in sign, symmetric and floored integer division can give different results. For example,

Floored Division:   -8 3 / .  -3 ok

Symmetric Division: -8 3 / .  -2 ok

Similarly, the word MOD yields different results on floored and symmetric division systems. Under floored division, MOD is truly a modulus operator: the result of n1 n2 MOD is either zero or has the sign of the divisor n2, so for a positive n2 it lies in the range [0, n2). Under symmetric division, MOD simply returns a remainder, which is either zero or has the sign of the dividend n1. The following paper provides a discussion of integer division in computing languages: Division and Modulus for Computer Scientists by Daan Leijen.
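The difference can be reproduced in C, whose / and % operators truncate toward zero (symmetric division) under C99. The sketch below derives floored versions from them. The function names are hypothetical, int stands in for a Forth cell, and the overflow case INT_MIN / -1 is not handled.

```c
#include <assert.h>

/* Symmetric division: C99 '/' truncates toward zero, and '%' gives a
   remainder with the sign of the dividend (like symmetric / and MOD). */
int sym_div(int n1, int n2) { assert(n2 != 0); return n1 / n2; }
int sym_mod(int n1, int n2) { assert(n2 != 0); return n1 % n2; }

/* Floored division: round the quotient toward negative infinity.
   Start from the truncating result and step down by one when the
   remainder is non-zero and its sign differs from the divisor's
   (i.e. the operands differ in sign). */
int floor_div(int n1, int n2)
{
    assert(n2 != 0);
    int q = n1 / n2;
    int r = n1 % n2;
    if (r != 0 && ((r < 0) != (n2 < 0)))
        q -= 1;
    return q;
}

/* Floored modulus: the result is zero or has the sign of the divisor. */
int floor_mod(int n1, int n2)
{
    assert(n2 != 0);
    int r = n1 % n2;
    if (r != 0 && ((r < 0) != (n2 < 0)))
        r += n2;
    return r;
}
```

With these definitions, sym_div(-8, 3) is -2 and floor_div(-8, 3) is -3, while sym_mod(-8, 3) is -2 and floor_mod(-8, 3) is 1. When both operands have the same sign the two families agree, e.g. both give 2 for 8 / 3.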

Floored integer division was guaranteed by the Forth-83 standard. However, the DPANS94 standard revoked this guarantee and allowed system implementors to choose either symmetric or floored integer division. The rationale for revoking the guarantee was to allow Forth systems to implement whatever form of integer division was best supported by the microprocessor hardware. Most microprocessors which provide signed integer division implement symmetric division. In kForth, the original rationale for using symmetric division was simply to maintain consistency with C: the ISO C99 standard mandates symmetric (truncating) integer division, which GNU C implements (the symmetric version of MOD corresponds to the % operator in C). In general, floored division is considered by computer scientists and mathematicians to be the more useful form of signed integer division.

A significant problem with the DPANS94 standard is that, in practice, implementors of ANS-compliant Forth systems for a single hardware platform such as Intel x86 have chosen different forms of division. Consider the behavior of the Forth systems below, all running under Linux on an Intel PII:

gforth:   -8 3 MOD .  -2 ok
pfe:      -8 3 MOD .   1 ok
kforth:   -8 3 MOD .  -2 ok
iforth:   -8 3 MOD .  -2 ok
bigforth: -8 3 MOD .   1 ok
Therefore, a Forth program using signed integer division words (/ MOD /MOD */MOD) may produce different outputs under two different ANS-compliant Forth systems. The DPANS94 standard addresses the portability issue by calling for use of the explicit floored and symmetric division words FM/MOD and SM/REM whenever it is important to explicitly specify the type of division. However, it is highly likely that Forth programmers will casually use signed integer division words such as MOD without always remembering the portability issue.
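The semantics of the two explicit words can be sketched in C. This assumes 32-bit cells with a 64-bit double-cell dividend; the function names sm_rem and fm_mod and the output-parameter interface are hypothetical, and a quotient that does not fit in a cell (an ambiguous condition in ANS Forth) is simply caught by an assertion.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t cell;    /* assumption: 32-bit cells */
typedef int64_t dcell;   /* double-cell signed integer */

/* SM/REM ( d1 n1 -- n2 n3 ): symmetric division of a double-cell
   dividend by a single-cell divisor; n2 is the remainder, n3 the quotient. */
void sm_rem(dcell d1, cell n1, cell *rem, cell *quot)
{
    assert(n1 != 0);
    dcell q = d1 / n1;                          /* C99: truncates toward zero */
    assert(q >= INT32_MIN && q <= INT32_MAX);   /* quotient must fit in a cell */
    *quot = (cell)q;
    *rem  = (cell)(d1 % n1);
}

/* FM/MOD ( d1 n1 -- n2 n3 ): floored division; the remainder n2 is zero
   or has the sign of the divisor. */
void fm_mod(dcell d1, cell n1, cell *rem, cell *quot)
{
    assert(n1 != 0);
    dcell q = d1 / n1;
    dcell r = d1 % n1;
    if (r != 0 && ((r < 0) != (n1 < 0))) {      /* operands differ in sign */
        q -= 1;
        r += n1;
    }
    assert(q >= INT32_MIN && q <= INT32_MAX);
    *quot = (cell)q;
    *rem  = (cell)r;
}
```

In portable Forth the choice is made at the call site: -8 S>D 3 FM/MOD leaves remainder 1 and quotient -3 on the stack, while -8 S>D 3 SM/REM leaves -2 and -2, regardless of which convention the system uses for / and MOD.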

Double Numbers

kForth supports working with signed and unsigned double length numbers, and implements nearly all of the optional double number word set specified by DPANS94, either intrinsically or in the form of Forth source definitions (see ans-words.4th for the latter). In addition to the ANS Forth tests involving double numbers given in core.4th, further tests of double number words implemented in kForth are given in dbltest.4th.

One significant departure in kForth from typical Forth systems which provide double numbers is the method of entry of double length numbers. Traditional Forth recognizes the decimal point as a marker for a double number, e.g.

234.

is interpreted as a double number. kForth does not permit double number entry in this manner. The rationale behind this restriction is that such entries may easily be confused with floating point numbers. Such confusion is likely to be common among new Forth users who have previously used other computer languages such as C, and even experienced Forth users who make frequent use of floating point calculations are susceptible to it. Since kForth uses the data stack to hold floating point numbers, and since a floating point number also occupies two stack cells (see next section), mistakes arising from misinterpreting entries with a decimal point may not be readily apparent, leading to hard-to-find bugs.

The prohibition on standard double number entry in kForth demands that an alternate method be provided for entering double numbers. This is easily done with a string to double number conversion word, and there are two ways to do it. The first method is simple but specific to kForth, while the second is more complex but portable to other ANS systems. In the simple method, we use the non-standard word NUMBER? to convert a counted string to a signed double length number, as follows.


c" -20123456789" NUMBER? DROP


NUMBER? actually returns a flag indicating whether or not the conversion succeeded, but we drop the flag in the above example for simplicity. If the conversion did not succeed, a double length zero will result.

The second method should be used if it is desired to port the code to other ANS Forth systems. ANS Forth provides >NUMBER for converting a string to an unsigned double number. A more general string to double number conversion word, handling both signed and unsigned double numbers, may be written as follows.


variable dsign

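: >d ( a u -- d|ud | convert string to a signed/unsigned double )
    0 0 2SWAP
    \ skip leading spaces and tabs, without reading past the end of the string
    BEGIN DUP IF OVER C@ DUP BL = SWAP 9 = OR ELSE FALSE THEN
    WHILE 1 /STRING REPEAT
    \ a non-empty remainder may begin with a sign character
    ?DUP IF
	FALSE dsign !
	OVER C@
	CASE
	    [char] - OF TRUE dsign ! 1 /STRING ENDOF
	    [char] + OF 1 /STRING ENDOF
	ENDCASE
	\ accumulate digits; any unconverted trailing characters are discarded
	>NUMBER 2DROP
	dsign @ IF DNEGATE THEN
    ELSE DROP THEN ;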


Using the above definition of >D, examples of double number entry are:

  s"  20123456789"  >d
  s" -20123456789"  >d
  s" +20123456789"  >d

 

It should be noted that the method used above is not needed if the double number being entered fits within the bounds of a signed single number. Most cases of double number entry fit this scenario. In such a case, we may simply enter the single number, followed by S>D, e.g.

-234         S>D
 2147483647  S>D
-2147483648  S>D
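What S>D does can be sketched in C as sign extension of one cell into two. This assumes 32-bit cells; s_to_d and d_value are hypothetical helper names, and the placement of the two cells on the stack is left out (ANS Forth places the high-order cell on top).

```c
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* Widen a signed single to a signed double by sign extension:
   the low cell keeps the value's bits, and the high cell is all
   ones (-1) for a negative number and zero otherwise. */
void s_to_d(cell n, cell *lo, cell *hi)
{
    *lo = n;
    *hi = (n < 0) ? -1 : 0;
}

/* Reassemble the two cells into a 64-bit value, for checking.
   Uses multiplication rather than shifting a negative value. */
int64_t d_value(cell lo, cell hi)
{
    return (int64_t)hi * 4294967296LL + (uint32_t)lo;
}
```

For example, s_to_d(-234, &lo, &hi) gives lo = -234 and hi = -1, which d_value reassembles as -234. Any value within the single-cell range survives this round trip, which is why S>D suffices for such entries.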


Floating Point Implementation

The ANS Forth specification allows floating point numbers to be stored either on the data stack or on a separate floating point stack. kForth uses the data stack for holding floating point numbers. Even though many current Forth systems for PCs feature a separate floating point stack, the rationale for using the data stack for floating point operations in kForth was to allow legacy code written for earlier Forth systems (in particular the Forths from Laboratory Microsystems Inc.) to run without significant modifications under kForth. In kForth, a floating point number on the stack occupies two cells. Thus, under 32-bit Windows or Linux, floating point numbers are 64-bit double-precision numbers (equivalent to C's double).
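The two-cell layout can be illustrated in C by splitting the bit pattern of a 64-bit double into two 32-bit cells. This is a sketch assuming 32-bit cells and IEEE 754 doubles; double_bits and double_to_cells are hypothetical names, and the order in which kForth places the two cells on its stack is not modeled.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t cell;   /* assumption: 32-bit cells, as under 32-bit Windows or Linux */

/* Copy the 64 bits of a double into an integer without converting the
   value, so the raw bit pattern can be inspected. */
uint64_t double_bits(double f)
{
    uint64_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split a double into two 32-bit cells: the high and low halves of
   its bit pattern. */
void double_to_cells(double f, cell *hi, cell *lo)
{
    uint64_t u = double_bits(f);
    *hi = (cell)(uint32_t)(u >> 32);
    *lo = (cell)(uint32_t)u;
}
```

For 234.0 the cells are 0x406D4000 and 0, so the same two cells read as a double-cell integer would be 0x406D400000000000, nothing like 234. Mistaking one kind of two-cell value for the other therefore produces wildly wrong numbers rather than an immediate error.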

The quality of the floating point arithmetic in kForth may be checked using the program, paranoia.4th.

Special Features

Special features of kForth are described in a two-part article in Forthwrite magazine, issues 116 and 117. These features are:



Benchmarks and Tests

Versions of standard benchmark programs for measuring kForth execution speed may be found on the ftp site under /software/kforth/examples/benchmarks.

The following Forth source files provide tests for ANS compliance of core and standard extension words in Forth-94, for words which are specific to kForth, and for floating point arithmetic. Most of the test files require ttester.4th and tester.4th.

core.4th
coreplus.4th
memorytest.4th
filetest.4th
searchordertest.4th
stringtest.4th
dbltest.4th
to-float-test.4th
regress.4th
asm-x86-test.4th
divtest.4th
fatan2-test.4th
ieee-fprox-test.4th
ieee-arith-test.4th
fpzero-test.4th
fpio-test.4th
paranoia.4th


VM Error Codes

Non-zero return codes from the virtual machine (VM) indicate the following conditions:

  1. Value on the stack did not have type addr.
  2. Value on the stack did not have type ival.
  3. Value on the stack has unknown type.
  4. Division by zero.
  5. Return stack has been corrupted.
  6. Invalid kForth op-code encountered.
  7. Stack underflow.
  8. Return code for QUIT (not seen by user).
  9. Attempted to re-ALLOT memory for a word.
  10. Failed on CREATE (bad word name).
  11. End of string not found.
  12. No matching DO.
  13. No matching BEGIN.
  14. ELSE without matching IF.
  15. THEN without matching IF.
  16. ENDOF without matching OF.
  17. ENDCASE without matching CASE.
  18. Cannot open file.
  19. Address outside of stack space.
  20. Division overflow.
Executing the word ABORT will reset the stack pointers. This procedure should be used to recover from VM errors 5 and 7, and whenever there is a suspicion that the stacks have been corrupted.

Source Code Map

Source code for kForth consists of the following C++, C, and assembly language files:

kforth.cpp
ForthCompiler.cpp
ForthVM.cpp
vmc.c
vm-common.s
vm.s
vm-fast.s
fbc.h
ForthWords.h
ForthCompiler.h
ForthVM.h
kfmacros.h

The source code is made available to users under the GNU General Public License. The Linux version is provided as source code only and must be built locally on the user's machine (see installation). Under Linux, the standard GNU assembler, GNU C and C++ compilers, and the C++ Standard Template Library (STL) are required to build the executable. The Windows 95/98/NT console application was built using the free Cygwin port of the GNU development tools.



Embedding kForth

The file kforth.cpp serves as a skeleton C++ program to illustrate how the kForth compiler and virtual machine may be embedded in a standalone program. XYPLOT for Linux is a more complex GUI program which embeds kForth to allow user extensibility. The file xyplot.cpp shows how to set up hooks for calling C++ functions in the host program from the embedded kForth interpreter and vice-versa.