asm-x86 is an assembler written in Forth, for use from within a Forth environment. Since Forth itself is usually interactive, the asm-x86 assembler may be considered an interactive assembly language programming environment. The version provided with kForth may be used to create words using the 80x86 instruction set. Such words may be executed interactively, or used within the definitions of words written in Forth, just as with any ordinary high-level Forth definitions. Words defined in assembly language provide the advantage of speed and precise control of the processor, and in certain cases, simplicity, when compared with the higher level Forth definitions. Some notes on the development of the asm-x86 assembler are given in the comments at the beginning of asm-x86.4th.
Some basic familiarity with assembly language programming on Intel x86 and compatible processors, and the architecture of these processors, is assumed for this introduction. However, a few of these basics will be covered in the tutorial for using asm-x86.
asm-x86 is loaded just as any other Forth program from kforth:
Assembly language word definitions begin with CODE and end with END-CODE.
Between these two statements are the assembly language statements which
explicitly define the processor instructions to be performed by the word.
The words CODE and END-CODE also introduce some instructions for the purpose
of creating an interface between the Forth environment and the assembly code,
for example in setting the value of the EBX machine register to point to the
top of the stack upon entry into the word. Thus, the structure of a word
written for asm-x86 is of the form

    CODE name
        assembler statement
            "        "
            "        "
            :
            :
    END-CODE

where name is the name of the word appearing in the
dictionary. The new word may be executed from the Forth environment, just like
an ordinary Forth word, and arguments may be passed to it on the Forth data stack.
We will refer to such words as "CODE words".
An assembler statement consists of zero, one, or more operands, and one
machine instruction, i.e. a built-in operation of the 80x86 processor.
asm-x86 is a postfix assembler, meaning that the operands are specified
before the instruction. This is consistent with the stack-oriented nature of
Forth. An operand may be a machine register, a memory
reference, or an immediate value. Ultimately an operand
is, of course, a number, but within the context of a particular instruction,
the associated number indicates which physical source supplies the value of the
operand. Much of the work of the assembler is then to translate the specified
operands into the appropriate numbers, for a given instruction. These numbers
form the machine code which is stored in the memory associated with the
word, and which is actually executed by the processor when the word is used.
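An example of a typical assembler statement is

    4 [ebx] eax mov,

which moves the 32-bit value at address EBX+4 into EAX, and which assembles
to the machine code bytes (in hexadecimal)

    8B 43 04
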
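To see how the operands end up as numbers, the second byte above (43) can be taken apart by hand. It is the x86 ModRM byte, which packs an addressing mode, a register number, and a base register number into one byte. The following is a minimal sketch in ordinary Forth; the word modrm-fields is a hypothetical helper written for this illustration and is not part of asm-x86.

```forth
\ Hypothetical helper, not part of asm-x86:
\ split a ModRM byte into its mod, reg and r/m fields.
: modrm-fields ( byte -- mod reg rm )
    dup 6 rshift swap         \ mod = bits 7..6
    dup 3 rshift 7 and swap   \ reg = bits 5..3
    7 and ;                   \ r/m = bits 2..0

hex 43 decimal modrm-fields . . .   \ prints 3 0 1  (r/m, reg, mod)
```

Here r/m = 3 selects EBX as the base register, reg = 0 selects EAX, and mod = 1 says that an 8-bit displacement follows, which is the final byte 04. The first byte, 8B, is the opcode for moving a 32-bit value from a register or memory operand into a register.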
Notice that in our example of an assembler statement above, the instruction is
"MOV," with the comma being part of the instruction. In asm-x86, all
instructions have the comma suffix. Also, note the order of our
operands. Even though there are three operands, the first two operands specify
a source, and the third a destination. In asm-x86, the order of
operands is such that the source precedes the destination:

    source destination instruction

For example, the following statement adds the contents of EAX to EBX:

    eax ebx add,

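The way a postfix instruction word can consume its operands from the Forth stack and turn them into machine code can be sketched in a few lines of ordinary Forth. This is a toy illustration only, assuming register-to-register operands and decimal number base when the definition is loaded; the names toy-eax, toy-ebx and toy-add, are invented for this sketch and do not reflect how asm-x86 is implemented.

```forth
\ Toy illustration only: these words are NOT part of asm-x86.
0 constant toy-eax      \ x86 register numbers as used in the ModRM byte
3 constant toy-ebx

: toy-add, ( src dst -- )
    \ ADD r/m32, r32 is opcode 01 followed by a ModRM byte with
    \ mod = 11 (register direct), reg = source, r/m = destination.
    swap 3 lshift or        \ src<<3 | dst
    192 or                  \ set mod = 11 (192 = C0 hex)
    base @ >r hex
    1 . .                   \ print the opcode byte, then the ModRM byte
    r> base ! ;

toy-eax toy-ebx toy-add,    \ prints 1 C3
```

The two bytes 01 C3 are one valid encoding of the instruction written "add ebx, eax" in Intel syntax. A real assembler would store the bytes in the word's code memory rather than print them, and would handle memory and immediate operands as well.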
As alluded to earlier, the Forth environment must place its stack pointer in a
location accessible to the assembler statements, allowing a CODE word access
to arguments passed to it on the Forth data stack. Although one might use a
variable in which to store the stack pointer, it is much more convenient and
faster to place the stack pointer in a CPU register. The particular register
is EBX, and the address of the top of the stack (TOS) may be assumed to be stored in
this register upon entry into the CODE word.
With the above preliminaries, we are now in a position to illustrate some
examples of actual CODE words, and discuss further the specification of
operands in asm-x86. Some examples are taken from
asm-x86-examples.4th, provided with kForth.

    CODE adrop ( n -- | drop an item from the Forth stack using assembly code )
        4 # ebx add,
    END-CODE

The above example is equivalent to the Forth word DROP. It consists of a single assembler statement, which adds the immediate value 4 (the size of one cell) to the EBX register, thereby advancing the stack pointer. The assembler word "#" informs the assembler that "4" is an operand of type immediate value, ensuring that the operand is not mistaken for some other type, such as a register or a memory reference.
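As a quick interactive check, adrop can be compared against the built-in DROP. The sketch below assumes a 32-bit kForth session in which asm-x86 has already been loaded; it repeats the definition from above so that the lines can be entered on their own.

```forth
\ Assumes asm-x86 is already loaded in a 32-bit kForth session.
CODE adrop ( n -- )
    4 # ebx add,       \ advance the stack pointer by one 4-byte cell
END-CODE

10 20 adrop .          \ adrop discards 20, so . should print 10
10 20 drop  .          \ the built-in DROP should print the same value
```

If both lines print the same value, the CODE word is adjusting the stack pointer by exactly one cell.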
We may also use ordinary Forth CONSTANTs and VARIABLEs to supply immediate
values:

    1 CELLS CONSTANT TCELL

    CODE adrop ( n -- )
        TCELL # ebx add,
    END-CODE

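Now, define a variable "V":

    VARIABLE v

Executing v places its address on the Forth stack, so the statement

    v # edx mov,

loads the address of v into EDX as an immediate value, whereas

    v #@ edx mov,

treats the address as a direct memory reference and loads the contents of v
into EDX.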