THE BISC ARCHITECTURE

Let me describe My Favourite Processor Architecture (MFPA). If you don't
like Intel processors, stop reading now, unless you find them too orthogonal
and you dislike their relative lack of complexity. If you wish the good
features of VAX, IBM370, Prime and Intel were combined into one single
machine, then I think you will like MFPA.

I dub this architecture BISC (Beautiful Instruction Set Computer). CISC is
for processors whose instruction set description fits into a single volume.
Some people suggest that the B should stand for BLOATed where BLOAT means
Bulky Large Overhead Automation Technology, but I don't think this
description fits my elegant architecture very well.

What makes my design unique is that I designed it from scratch _as if_ it
had been the result of a long evolution. You know, all good processors are
the result of a long evolution. When the PR1ME 200 first arrived, it was a
16 bit machine that could address 16 kilowords of memory. Two bits of the
address were reserved for special purposes. Later machines had special modes
that dropped one or even two of the reserved address bits, so they could
address up to 64 kilowords. Later machines added virtual memory mode and
still later designs had full 32-bit data and address capabilities, of course
with all earlier modes still there. I assume that you are familiar
with the line of 8086 -> 80286 -> 80386 -> 80486 -> Pentium processors.

In my favourite processor architecture, you will find that the skill of
assembly language programming is rewarding, albeit with a very steep
learning curve. If you _know_ what instructions you have and in which modes
they are available, you never need to write long programs. You know you have
instructions for checking the castling rules in chess and for conversion of
numbers to Roman numerals. OK, you need to convert the bastardized EBCDIC
back to ASCII, but that's all there is to it.
 

1 MODES MAKE FLEXIBILITY, EXTENSIBILITY AND COMPATIBILITY

Has this ever happened to you? You have an excellent computer design with a very
elegant instruction set. One day you decide that you want more precision,
more memory, more instructions or all of the above. But you already used
nearly all opcodes and memory addresses and you must keep the new design
compatible with the old version. Then a miracle happens and you add an
Extended Mode. Use one unused opcode or status flag to switch to the
Extended Mode and you have all instructions and memory you want. Back to the
old mode? Why should you? You can always press reset to do this!

1.1 A LITTLE HISTORY FICTION.

As I said before, good architectures come with a long evolution. Such
designs have a rich history. As I created the whole design from scratch, I
had to make up the history myself. I'm not a very good fiction writer, so
this history is a bit boring. But so are most real histories of great
computers.

The history of MFPA starts in, guess what?, Sweden! In the late fifties one
of the first transistorized computers, SABINA, was built there for the
execution of programs written in 'kvikkalkul'. So far the real part of the
story, if one can trust the veracity of an anonymous posting anyway. SABINA
was a 15 bit design with 14 bit addressability. One address bit was for
indirection.  Not that the machine had 16 kilowords of core anyway! I'd
guess 1 or 2k. It used one's complement number representation. On overflow
it would generate the special -0 value with all bits set. The multiply and
divide instructions were optimized for fractional numbers. It had a hardware
random generator on board that was too unreliable for any practical use. It
was a single accumulator design, the 10 kvikkalkul 'registers' were just
memory variables. SABINA's instructions had no mnemonics, just numbers.
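
For readers who have never met one's complement, here is a minimal Python
sketch of what a SABINA-style addition might look like. The word size and
the all-ones -0 overflow marker come from the description above; the
end-around carry, the sign-based overflow test and the helper names
(encode, decode, add) are my own assumptions, not a record of the real
machine.

```python
WORD_BITS = 15                      # SABINA word size, per the text
MASK = (1 << WORD_BITS) - 1         # 0x7FFF, which is also the -0 pattern
SIGN = 1 << (WORD_BITS - 1)         # top bit of the word acts as the sign
NEG_ZERO = MASK                     # all bits set


def encode(n: int) -> int:
    """Encode a Python int as a 15-bit one's complement word."""
    if abs(n) >= SIGN:
        raise ValueError("value does not fit in 15 bits")
    return n if n >= 0 else (~(-n)) & MASK


def decode(w: int) -> int:
    """Decode a word; both +0 and -0 decode to 0."""
    return w if not (w & SIGN) else -((~w) & MASK)


def add(a: int, b: int) -> int:
    """Add two words with end-around carry; return -0 on overflow."""
    s = a + b
    if s > MASK:                    # carry out of the top bit wraps around
        s = (s & MASK) + 1
    same_sign = (a & SIGN) == (b & SIGN)
    if same_sign and (s & SIGN) != (a & SIGN):
        return NEG_ZERO             # assumed overflow rule: signal with -0
    return s
```

Note that 5 + (-5) also yields -0 in this scheme, so a program cannot tell
an overflow from an honest negative zero. That seems entirely in the spirit
of the architecture.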

1.2 BD and MBD modes

Around 1960 the English army buys the SABINA design from Sweden. They want
to do real work with it and they add more core than SABINA ever had. They
call it the Ballistic Data computer, hence the BD mode. The BD computer has
the same instruction set as SABINA, but the hardware random generator is a
bit more reliable. The English even devise a set of Assembly mnemonics for
it. In 1961 they decide they need more memory and they could also use a bit
of extra precision. They add the MBD mode, 'More Ballistic Data'. This mode
has 16 bit data and the memory addressability is extended by Data Object
Groups (or DOGs). There are two DOG registers, one usually used for programs
and one for variables. The 16th address bit serves as the DOG register
selector. The DOG registers are 16 bits: one bit specifies the direction and the
other 15 specify the DOG start address, which could be any multiple of 8
within a 256 kiloword space. Two index registers were added and you could
address with autoincrementing and (by selecting a DOG that runs in the
reverse direction) autodecrementing.
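
The MBD address calculation can be sketched in a few lines of Python. The
field widths follow the description above, and the inversion of the offset
for reverse DOGs follows section 2. Which bit of the DOG register holds the
direction, and what happens when the sum runs off the end of the 256
kiloword space, are not specified anywhere, so the choices below (top bit,
wrap-around) are assumptions, as is the function name translate.

```python
OFFSET_BITS = 15
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # low 15 address bits: offset in the DOG
SELECT_BIT = 1 << 15                   # 16th address bit: picks a DOG register
DIRECTION_BIT = 1 << 15                # assumed position of the direction bit
PHYS_MASK = (1 << 18) - 1              # 256 kiloword physical space


def translate(address: int, dog_regs: tuple[int, int]) -> int:
    """Map a 16-bit MBD address to a physical word address."""
    dog = dog_regs[1 if address & SELECT_BIT else 0]
    base = (dog & OFFSET_MASK) * 8     # start address is a multiple of 8
    offset = address & OFFSET_MASK
    if dog & DIRECTION_BIT:            # reverse DOG: invert the offset first
        offset = ~offset & OFFSET_MASK
    return (base + offset) & PHYS_MASK  # assumed: wrap around at 256K
```

With a forward and a reverse DOG on the same base, offset 0 of the forward
DOG and offset 0x7FFF of the reverse DOG land on the same word, so stepping
an index register upwards through the reverse DOG walks memory downwards.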

1.3 SM Mode

In 1965 it's time for a bit more serious computing. They design a 25 bit
computer with a radically different architecture from BD. Numbers were
sign-magnitude instead of one's complement. A 50-bit floating point format
was added. Addresses were 18 bit, so 256 kilowords could be accessed. They
don't get the money to build it, because the higher management wants an
improved BD. Finally the new design is implemented as an extended mode on
top of MBD. The concept of DOGs (though unnecessary at the moment) is added
to SM (SM means sign-magnitude) mode as well. The normal address size is
reduced to 15 bits and 4 DOG registers are added. The DOGs do not start at
an 8-word boundary, but at boundaries whose spacing increases with the value
of the DOG register (from 1 to 32768 words).

1.4 SBD mode and IAP, FAP, CAP, JAP and LAP application program modes.

In 1969 the British Army decides that they want to go for 32 bit computing.
They also want multitasking. They design a glorious new machine on top of
the existing stuff. The end result is the SBD mode, which is fully 32 bit.
The address registers have a size of 16 bits and the memory is
byte-addressable. The SBD mode is a system programming mode. DOGs have a
size up to 65536 bytes and they can start at any address. There is a set of
kennel descriptor tables that store the start addresses and sizes of the
DOGs. Each program has its own local kennel descriptor table. The programs
themselves do not run in the SBD mode, but in any of five application
program modes. The SBD mode runs the kernel and at each context switch the
CPU mode is also switched. In 1971 demand paging is added to this mode.
In 1972 fat DOGs are added, which can contain up to 16 megabytes.

The IAP mode is for integer applications written in assembler. It has a
two's complement number representation. It is the only mode that has ADD
with Carry instructions, bitwise logical operations etc, the things that
make assembly programming so attractive.

The FAP mode is for FORTRAN applications. It has special instructions for
polynomial root calculation and Bessel functions. There is even a SIMPSN
instruction for numerical integration. The functions for formatted I/O use
the CDC6600 character set.

The CAP mode is for COBOL. It has multiprecision BCD arithmetic instructions
and lots of file I/O instructions. There are also instructions for LSD
arithmetic (pounds, shillings, pennies). It is ideal for accounting and
payroll administration, but not for much else. This mode uses the BBCDIC
character set, a British version of EBCDIC.

The JAP mode is for Jovial programs. Jovial is a British Algol dialect. The
floating point format is different from that of the FAP mode, which is again
different from the SM mode. This is the only mode that supports a frame
pointer to access local variables in recursive functions.

The LAP mode is for Lisp applications. This is the only 36 bit mode. CAR,
CDR and CONS are all machine instructions in this mode. As this mode is 36
bit instead of 32, the SBD mode operating system cannot access the topmost
four bits of each machine word. It cannot even load any LAP mode machine
code as the topmost 4 bits are essential for that. Months after the machine
had been built, someone discovered that in the SM mode a shift instruction
with a memory operand shifted the whole 36 bits instead of just the 25
lowest bits and thus it could access the topmost 4 bits. An SM mode program
had to load a LAP mode code loader, then the SBD mode could run LISP
programs.

1.5 CS and CG mode.

The hot topic in the seventies becomes, guess what, AI. Due to bureaucratic
reasons that permit the development of BD compatible machines only, they
implement the Cognitive Science mode on top of the LAP mode. This mode is
also 36 bits, but one cannot switch back to any other mode. Chess specific
instructions are part of this mode and the British computer was the best
chess player in the world in 1973. Famous are the instructions 'vcasp'
(verify castling permission) and 'capep' (capture en passant). In 1975 a new
unit of memory allocation is introduced, the CAT (Certain Arbitrary Thing).
The CS mode inherits all the DOGs from the LAP mode application from which
it was entered and no new DOGs can be added. But now a DOG can be subdivided
into several CATs. 

In 1977 they decide to make a more useful CS-like mode. The chess
instructions are removed and string instructions are added. This is called
the CG mode. CG stands for catgut, because that's what strings are made of.

1.6 TLA Mode.

In 1981 it's time for something more useful. The TLA mode is introduced. TLA
means 'The Last Architecture'. They decide to clean up all the errors of
previous models in the BD line. From the TLA mode one can jump back to the
BD mode again. One can jump from CG to TLA and one can jump back and forth
between SBD and TLA mode. Finally one can reach every mode from every other
mode. TLA is a hybrid 32/64 bit mode. All 36 bits of the Lisp-like modes
are accessible from it. Memory management features both DOGs and CATs (plus
COW 'copy on write' pages). The DR (dung remover) manages the free memory.
There is only one instruction set for both the system and applications. As C
became the language of choice, C specific instructions such as 'vsprintf'
and 'vsscanf' were introduced. They even used ASCII! Nowadays they are
useless again due to the new ANSI C standard. Floating point is standardized
to a preliminary version of IEEE 754 in which the denorms and NANs are just
a bit different.

1.7 GUI, AC and RISC modes.

TLA was not really the last architecture. In 1988 they decide it is time for
some serious graphics. They introduce the GUI mode, in which you can find
instructions to draw windows, to scroll bitmaps, to draw lines, etc. All for
a monochrome screen of 760x570 pixels. The proposed 'tetris' instruction was
voted down by a very small majority, but some tetrisoid block drawing
instructions did make it. This mode is little used today, as it is
restricted to monochrome and one resolution.

In 1990, MFPA enters the nineties with super computing. The AC mode
(advanced computing) is added. This mode is truly 64 bit. This mode would be
ideal for finite element computations, SPICE simulations, ray tracing etc.
were it not for (a) the lack of speed and (b) a few bugs that cause the
result of the division operation to be a bit inaccurate for some operands.
Some programs use EPC (error precompensation) and this seals the fate of
this mode: it has to remain bug-compatible in future generations.

Finally in 1993, MFPA follows Intel in the RISC race. RISC isn't that boring
after all. What you lose in raw instruction set complexity, you can gain in
pipeline dependencies. MIPS (Microprocessor without Interlocked Pipeline
Stages) has lots of interesting rules as to which instructions you can
combine within 1, 2 or 3 cycles. In some RISC machines the pipeline stalls
if you put two instructions too close together. This wastes cycles. In MIPS
(or in MFPA) one can often combine interdependent instructions if one knows
exactly what one is doing. Sometimes the result depends on a race condition
that is temperature dependent. The program 'softherm' is a thermometer in
software that measures the temperature by examining the chance of one
particular race condition happening. It will be a true challenge to provide
an exactly compatible mode in future generations of the chip that even runs
the softherm program correctly.

1.8 Summary.

Eager to see the complete picture? Here it is in figure 1. Everything
revolves around the TLA mode. Both the OS and most applications are written
in it. RISC is increasing in popularity. GUI and AC never caught on. Chess
programs are faster in RISC than they were in CS and it is too much trouble
to switch back and forth between both. Therefore the chess instructions see
very little use today. An SBD mode system with FAP and CAP applications can
run under the usual TLA system. BD, MBD and SM mode are for booting only,
though some programs use the true hardware random generator of BD mode. 

                                    AC Mode
                    GUI Mode <------- ^    ------> RISC Mode
                                    | |    |
                                    v v    v
                                --- TLA Mode <-------------
                                |    ^                    |
Reset                           |    |                    |
  |------------------------------    |                    |
  v                                  v                    |
BD Mode -> MBD Mode -> SM Mode -> SBD Mode <-------     CG Mode
                                  ^   ^   ^   |   |       ^
                                  |   |   |   |   |       |
                                  v   v   v   v   v       |
                                 IAP FAP CAP JAP LAP -> CS Mode
                                 
Fig 1: Mode Transition Diagram.
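
The claim that every mode can be reached from every other mode can be
checked mechanically. The Python sketch below is my reading of Fig 1 and
the text of section 1: arrowheads that the ASCII art leaves ambiguous (the
application modes under SBD, and GUI, AC and RISC around TLA) are treated
as two-way switches, which is an assumption. Reset is left out because it
is a button, not a mode. The names EDGES, reachable and fully_connected are
made up for this example.

```python
from collections import deque

# Directed mode switches, one entry per mode (15 modes in total).
EDGES = {
    "BD": ["MBD"],
    "MBD": ["SM"],
    "SM": ["SBD"],
    "SBD": ["IAP", "FAP", "CAP", "JAP", "LAP", "TLA"],
    "IAP": ["SBD"],
    "FAP": ["SBD"],
    "CAP": ["SBD"],
    "JAP": ["SBD"],
    "LAP": ["SBD", "CS"],
    "CS": ["CG"],
    "CG": ["TLA"],
    "TLA": ["BD", "SBD", "GUI", "AC", "RISC"],
    "GUI": ["TLA"],
    "AC": ["TLA"],
    "RISC": ["TLA"],
}


def reachable(start: str) -> set[str]:
    """Breadth-first search: all modes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        mode = queue.popleft()
        for nxt in EDGES[mode]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def fully_connected() -> bool:
    """True if every mode can reach every other mode."""
    return all(reachable(mode) == set(EDGES) for mode in EDGES)
```

Under this reading the graph is indeed strongly connected, although the
road from CS back to LAP leads through CG, TLA and SBD.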

2 CATS AND DOGS HELP STRUCTURED PROGRAMMING AND DATA SECURITY.

When the SABINA architecture came to England, it was not equipped with DOGs.
They say this was to avoid the six month quarantine period. Just the address
was enough to designate a memory location. With MBD the Data Object Group
(DOG) concept was introduced. Addresses were still small, but by putting
them in different DOGs one could span a larger range. The DOGs were chained
to fixed addresses, 8 words apart. For each DOG there was another DOG that
spanned the same address range but that ran in the reverse direction.
Addresses were inverted before they were added to the DOG base. 
In SM mode, DOGs were still chained, but VDS (variable DOG spacing) was 
introduced. DOG 0 and DOG 1 were 1 word apart, but the highest numbered DOGs
were 32768 words apart. Thus one could be very memory efficient and one
could span a large range. Chained DOGs helped structured programming, but
they did not offer much protection.

With SBD mode and the application modes under it, this changed completely.
Each DOG had its own start address and a length. A program could only access
the DOGs in its own kennel descriptor table and hence it could not access
the memory of other programs. The start addresses and lengths of the DOGs
could be changed by the OS, so the DOGs were no longer chained. They were
called floating DOGs. 
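
A floating DOG lookup might look roughly like the Python sketch below. The
text only says that each DOG has a start address and a length and that a
program can see nothing but its own kennel descriptor table; the table
layout, the Dog and KennelViolation names and the choice to raise an
exception on a bad access are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    start: int    # physical start address in bytes
    length: int   # size in bytes (up to 65536 for an ordinary DOG)


class KennelViolation(Exception):
    """Raised when a program strays outside its own DOGs (hypothetical)."""


def translate(kennel: list[Dog], dog_index: int, offset: int) -> int:
    """Map (DOG number, offset) to a physical byte address."""
    if not 0 <= dog_index < len(kennel):
        raise KennelViolation(f"no DOG {dog_index} in this kennel")
    dog = kennel[dog_index]
    if not 0 <= offset < dog.length:
        raise KennelViolation(f"offset {offset} outside DOG {dog_index}")
    return dog.start + offset
```

Because the OS owns the table, it can move or resize a DOG by rewriting one
entry, and the program never notices as long as it stays within bounds.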

Application programs could not themselves create new DOGs. This was a pain
in the CS and CG mode programs, which inherited the DOGs of the LAP mode
program that switched to CS mode, but which could not call the SBD system to
make new DOGs for them. To fix this, the CAT (Certain Arbitrary Thing) was
introduced. Within one DOG, several CATs could be created by the
application. CATs offered a very limited degree of protection but it was
better than nothing. 

When TLA mode was added, both DOGs and CATs were retained. Thus the new
mode had hierarchic segmentation, which is not a bad thing to have. TLA
mode had protection rings, like the 386, to offer more protection. As 
opposed to the SBD mode, this was needed because applications now ran
in TLA mode as well and not in some application specific mode. Of course
there were also Task State DOGs and Colgates in this mode. The best
protection, however, was offered by STO, Security Through Obscurity.

The GUI, AC and RISC modes are again application specific submodes under
TLA. Programs in these modes inherit a certain set of DOGs. Depending on the 
protection ring they are started from, they can or cannot create DOGs or
CATs of their own. 

3 PERSONALIZED REGISTERS HELP OPTIMIZATION.

What's in your tool chest? Twenty Swiss-army knives that you can use as 
screwdrivers, pliers, saws or anything else you need? I guess not! 
General-purpose tools are not a good idea and neither are general-purpose 
registers. 

This text would be too long if I discussed all registers of all modes, so
I only discuss the register set for the TLA mode. Registers of different modes
are partially overlapping. The ALBERT register of TLA is known as PAULA in the
SBD mode. 

The following registers are available to application programs in TLA. Note
that the register set is not very homogeneous. Every register is tailored to
a dedicated function and if it's multipurpose, it serves two or three
purposes at best.

Male Registers.

ALBERT          The address register, 32 bits. 
BRIAN           The base address, 32 bits. 
COLIN           The Count register, 32 bits.
DICK            Floating point data register, 64 bits.
EDWARD          The extended data register, 64 bits.
FRED            The frame pointer, 32 bits.

Female Registers.

ANNIE           The accumulator, 32 bits.
BETTY           The byte register, 8 bits.
CHRISTINE       The code register, 16 bits.
DIANA           The Destination address, 32 bits.
ELIZA           The string pointer register, 32 bits.
FLORENCE        The floating point accumulator, 64 bits.

DOG Registers, all 16 bits.

ROTWEILER       Data DOG.            16 bits
SHEPHERD        Stack DOG.           16 bits
TERRIER         Program DOG.         16 bits

Other Registers.

PUKE            Program Counter.     32 bits.
FAINT           Flags Register.      17 bits.
ITCH            Instruction register 24.5 bits.
SICK            Stack pointer        32 bits.

CAT Registers. (designed by a cat hater?)

FUR             Floating point CAT.  13 bits
BONES           Byte CAT.            15 bits
DOGFOOD         Data CAT.            14 bits

Most instructions accept only one or two different registers for each of 
their operands. This keeps the code size down and it relieves the programmer
or compiler writer from having to choose among many registers. 

Add and subtract can only take registers from the same set, multiply and divide
can only take registers from different sets. They are only permitted
on male and female registers anyway. Floating point addition is only permitted
between DICK and EDWARD, but not between FLORENCE and any other register.
You can add memory operands to FLORENCE, but their addresses have to come from
the female register set. Only ELIZA will do, as DIANA can only be used
for writing. FLORENCE is the only true FP accumulator so you pretty much
_have_ to use it if you want status information (overflow detection and
such). The DICK/EDWARD addition won't generate status flags or INFs or NANs on
overflow. The bottom line is this: for frequently used instructions such as
FPADD, you don't have to take the trouble to make a sensible choice of
registers; there's only one. For FPMUL and FPDIV you can choose from the
male register set for the other operand, so you can use DICK, EDWARD
and the addresses in ALBERT, BRIAN and FRED. But you really don't need these
instructions that often so you don't have to make this difficult choice
often. 
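
The integer operand rule (add and subtract within one set, multiply and
divide across sets, male and female registers only) is simple enough to
check in a few lines of Python. This is only a sketch: the mnemonics ADD,
SUB, MUL and DIV and the function names are placeholders of mine, register
widths are ignored, and the extra floating point restrictions on DICK,
EDWARD and FLORENCE are not modelled.

```python
MALE = {"ALBERT", "BRIAN", "COLIN", "DICK", "EDWARD", "FRED"}
FEMALE = {"ANNIE", "BETTY", "CHRISTINE", "DIANA", "ELIZA", "FLORENCE"}


def register_set(name: str):
    """Return 'male', 'female', or None for DOG, CAT and other registers."""
    if name in MALE:
        return "male"
    if name in FEMALE:
        return "female"
    return None


def operands_allowed(op: str, a: str, b: str) -> bool:
    """Check the same-set / different-set rule for two register operands."""
    set_a, set_b = register_set(a), register_set(b)
    if set_a is None or set_b is None:
        return False                 # only male and female registers qualify
    if op in ("ADD", "SUB"):
        return set_a == set_b        # same set required
    if op in ("MUL", "DIV"):
        return set_a != set_b        # different sets required
    raise ValueError(f"unknown operation {op!r}")
```

An assembler built on such a check would accept adding BETTY to ANNIE and
multiplying ANNIE by COLIN, and reject the reverse pairings.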

4 APPLICATION ORIENTED INSTRUCTIONS KEEP PROGRAMMING COST DOWN.

In the long and rich history of MFPA many, many instructions were
introduced. You know that a programmer can write 20 lines of code per day
(including such unimportant things as design, debugging and testing). A
numerical integration routine using Simpson's rule could easily take 10 lines
of FORTRAN, maybe 100 lines of Assembly. But in MFPA it's only 1
instruction! So instead of a whole programmer's week it's just 25 minutes!
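
For comparison, this is roughly what the ten-line version looks like in a
high-level language. It is a plain composite Simpson's rule in Python, not
a description of what the SIMPSN instruction does internally; the function
name simpson and the default of 100 intervals are my own choices.

```python
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float,
            n: int = 100) -> float:
    """Approximate the integral of f over [a, b] with n even intervals."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd) and 2 (even).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3
```

Simpson's rule is exact for polynomials up to degree three, so integrating
x**3 from 0 to 2 should give 4 up to rounding error.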

You need to print Roman numerals? CRN is in the CAP mode. You must compute
the root of a complex polynomial using Newton's method? It's in the FAP
mode. The JAP mode has a single instruction for Ackermann's function, but
that one does cheat. You need a chess program? Half of the gory subroutines
are already there in the CS mode. Need some serious graphics? Use GUI mode!
Choosing the right mode for YOUR application is half the fun of MFPA
programming. If you're really good and you KNOW how to switch modes really
fast without using the operating system, then you can use more modes in one
program.  I've yet to see a program that combines the best of the RISC and
the CS modes to beat the socks off those dinky human chess players such as
Kasparov.

Of course most of the work is done in the TLA and RISC modes today, but 
good old COBOL and FORTRAN programs still feel best at home in the CAP and
FAP modes, unless they are memory hogs. If you know what your errors are
like, the AC mode can be really cool for numerical applications.  
Some of the best compilers run in the CG mode. Some of the string and list
processing instructions are put to really creative uses, far beyond their
original intentions. You probably won't believe that the DSW (Deutsch Schorr
Waite) instruction is at the heart of a FORTRAN optimizer! Some people who
know the GUI mode very well, once used it for some very specific matrix
operations that they needed for numerical work.

Even BD, MBD and SM modes aren't yet dead, but they are a bit awkward to use
and you have to know the back doors of your OS pretty damn well to ever get
there. Chained DOGs and multiple indirection can gain a lot of speed on
modern machines. We all lost that long ago with the SBD mode. Every DOG
register load will read a complete kennel descriptor table entry which slows
things down. And there is a true hardware random generator in the BD mode. 

The instruction set descriptions of MFPA comprise 12 volumes at the moment. 
There are some more volumes for the secret features and opcodes. Some
instructions are described by fifty-odd lines of COBOL or FORTRAN without
any hint towards their function. It's here that the real programmer really
can save time and money for the boss. 

5 KNOW WHAT YOU WANT WITH THOSE 100 MILLION TRANSISTORS OF 1998.

MFPA does not fit into a single chip. It's more like 20 chips with 2 million
transistors each. It is a bit expensive for the average desk top, but that
may radically change in as little as 3 years. The estimated maximum
transistor count for 1998 is 100 million. That's 2.5 times as much as MFPA
will need to fit on a single chip. We can have more than twice the
complexity of MFPA by 1998 in an ordinary PC and the question is what we will
use the extra complexity for. Do we want basically the same instruction set
architecture but with superscalar execution? Or do we go for more bizarre
modes? We could patch a Pentium and a 68040 in a tiny corner of the chip,
just to make it Windows/DOS/Mac compatible. But we also could use the same
space for special instructions to draw "Abort Retry Ignore" windows under
the then current version of Windows '95. 

Multimedia will certainly be part of our life in 1998, but what do we need
in our CPU to support it? I guess we will have dedicated video hardware for
all kinds of display manipulation and hence we won't need loads of graphics
instructions in our CPU. But maybe I'm completely mistaken about that. What we
do need is lots of AI, even if it were only for speech recognition. But it's
uncertain if we need a souped up LISP Application Mode, a fast RISC mode or
a speech-specific mode for it.

It would be wise though to reserve a tiny piece of the chip area for a copy
of the Amiga chipset, because Amiga emulators will be in great demand for a
few years to come and, as opposed to many other computers, Amigas are
impossible to emulate without a dedicated chipset. 

Maybe we will go for parallel computers. One could think of multiple
parallel RISC machines, each with a dedicated cache, that can all be started
up from the TLA mode and run concurrently with it. It would be really cool
to exploit all the features of such an arrangement. Apart from the problems
of parallel computers in general, you have the possibility of going from the
TLA mode to a different mode while some of the RISC applications are still
running with their inherited TLA mode DOGs and CATs that don't really exist
in BD mode.

Maybe the rage will be supercomputing instead of RISC. The AC mode is
broken, but nothing prevents us from adding yet another souped up advanced
computing mode. And why not in the form of several parallel machines?
Of course with the same interesting consequences as parallel RISC machines. 

Of course we need a power saving mode as well. Even a lousy Pentium has it,
so there. Without a power saving mode the next generation MFPA will run so
hot that the 'softherm' program would be trivial to implement. It only needs
to return the maximum temperature it can measure. That would be no challenge
for the preservation of compatibility with previous designs of the RISC mode. 

All in all, there's enough room to improve upon MFPA for the 1998 single
chip version.