THE BISC ARCHITECTURE

Let me describe My Favourite Processor Architecture (MFPA). If you don't like Intel processors, stop reading now, unless you find them too orthogonal and you dislike their relative lack of complexity. If you wish the good features of VAX, IBM370, Prime and Intel were combined into one single machine, then I think you will like MFPA. I dub this architecture BISC (Beautiful Instruction Set Computer). CISC is for processors whose instruction set description fits into a single volume. Some people suggest that the B should stand for BLOATed, where BLOAT means Bulky Large Overhead Automation Technology, but I don't think this description fits my elegant architecture very well.

What makes my design unique is that I designed it from scratch _as if_ it had been the result of a long evolution. You know, all good processors are the result of a long evolution. When the PR1ME 200 first arrived, it was a 16 bit machine that could address 16 kilowords of memory. Two bits of the address were reserved for special purposes. Later machines had special modes that dropped one or even two of the reserved address bits, so they could address up to 64 kilowords. Later machines added virtual memory mode and still later designs had full 32-bit data and address capabilities, of course with all earlier modes still there. I assume you are familiar with the line of 8086 -> 80286 -> 80386 -> 80486 -> Pentium processors.

In my favourite processor architecture, you will find that the skill of assembly language programming is rewarding, albeit with a very steep learning curve. If you _know_ what instructions you have and in which modes they are available, you never need to write long programs. You know you have instructions for checking the castling rules in chess and for conversion of numbers to Roman numerals. OK, you need to convert the bastardized EBCDIC back to ASCII, but that's all there is to it.

1 MODES MAKE FLEXIBILITY, EXTENSIBILITY AND COMPATIBILITY

Has this ever happened to you? You have an excellent computer design with a very elegant instruction set. One day you decide that you want more precision, more memory, more instructions or all of the above. But you have already used nearly all opcodes and memory addresses and you must keep the new design compatible with the old version. Then a miracle happens and you add an Extended Mode. Use one unused opcode or status flag to switch to the Extended Mode and you have all the instructions and memory you want. Back to the old mode? Why should you? You can always press reset to do this!

1.1 A LITTLE HISTORY FICTION.

As I said before, good architectures come with a long evolution. Such designs have a rich history. As I created the whole design from scratch, I had to make up the history myself. I'm not a very good fiction writer, so this history is a bit boring. But so are most real histories of great computers.

The history of MFPA starts in, guess what? Sweden! In the late fifties one of the first transistorized computers, SABINA, was built there for the execution of programs written in 'kvikkalkul'. So far the real part of the story, if one can trust the veracity of an anonymous posting anyway.

SABINA was a 15 bit design with 14 bit addressability. One address bit was for indirection. Not that the machine had 16 kilowords of core anyway! I'd guess 1 or 2k. It used one's complement number representation. On overflow it would generate the special -0 value with all bits set. The multiply and divide instructions were optimized for fractional numbers. It had a hardware random generator on board that was too unreliable for any practical use. It was a single accumulator design; the 10 kvikkalkul 'registers' were just memory variables. SABINA's instructions had no mnemonics, just numbers.

1.2 BD and MBD modes

Around 1960 the English army buys the SABINA design from Sweden.
They want to do real work with it and they add more core than SABINA ever had. They call it the Ballistic Data computer, hence the BD mode. The BD computer has the same instruction set as SABINA, but the hardware random generator is a bit more reliable. The English even devise a set of Assembly mnemonics for it.

In 1961 they decide they need more memory and they could also use a bit of extra precision. They add the MBD mode, 'More Ballistic Data'. This mode has 16 bit data and the memory addressability is extended by Data Object Groups (or DOGs). There are two DOG registers, one usually used for programs and one for variables. The 16th address bit serves as the DOG register selector. The DOG registers are 16 bits: one bit specifies the direction and the other 15 specify the DOG start address, which could be any multiple of 8 within a 256 kiloword space. Two index registers were added and you could address with autoincrementing and (by selecting a DOG that runs in the reverse direction) autodecrementing.

1.3 SM Mode

In 1965 it's time for a bit more serious computing. They design a 25 bit computer with a radically different architecture from BD. Numbers were sign-magnitude instead of one's complement. A 50-bit floating point format was added. Addresses were 18 bit, so 256 kilowords could be accessed. They don't get the money to build it, because the higher management wants an improved BD. Finally the new design is implemented as an extended mode on top of MBD. The concept of DOGs (though unnecessary at the moment) is added to SM (SM means sign-magnitude) mode as well. The normal address size is reduced to 15 bits and 4 DOG registers are added. The DOGs do not start at an 8 word boundary, but at boundaries whose spacing increases with the value of the DOG register (it varies from 1 to 32768).

1.4 SBD mode and IAP, FAP, CAP, JAP and LAP application program modes.

In 1969 the British Army decides that they want to go for 32 bit computing. They also want multitasking.
They design a glorious new machine on top of the existing stuff. The end result is the SBD mode, which is fully 32 bit. The address registers have a size of 16 bits and the memory is byte-addressable. The SBD mode is a system programming mode. DOGs have a size up to 65536 bytes and they can start at any address. There is a set of kennel descriptor tables that store the start addresses and sizes of the DOGs. Each program has its own local kennel descriptor table. The programs themselves do not run in the SBD mode, but in any of five application program modes. The SBD mode runs the kernel and at each context switch the CPU mode is also switched. In 1971 demand paging is added to this mode. In 1972 fat DOGs are added, which can contain up to 16 megabytes.

The IAP mode is for integer applications written in assembler. It has a two's complement number representation. It is the only mode that has ADD with Carry instructions, bitwise logical operations etc., the things that make assembly programming so attractive.

The FAP mode is for FORTRAN applications. It has special instructions for polynomial root calculation and Bessel functions. There is even a SIMPSN instruction for numerical integration. The functions for formatted I/O use the CDC6600 character set.

The CAP mode is for COBOL. It has multiprecision BCD arithmetic instructions and lots of file I/O instructions. There are also instructions for LSD arithmetic (pounds, shillings, pennies). It is ideal for accounting and payroll administration, but not for much else. This mode uses the BBCDIC character set, a British version of EBCDIC.

The JAP mode is for Jovial programs. Jovial is a British Algol dialect. The floating point format is different from that of the FAP mode, which is again different from the SM mode. This is the only mode that supports a frame pointer to access local variables in recursive functions.

The LAP mode is for Lisp applications. This is the only 36 bit mode.
CAR, CDR and CONS are all machine instructions in this mode. As this mode is 36 bit instead of 32, the SBD mode operating system cannot access the topmost four bits of each machine word. It cannot even load any LAP mode machine code, as the topmost 4 bits are essential for that. Months after the machine had been built, someone discovered that in the SM mode a shift instruction with a memory operand shifted the whole 36 bits instead of just the 25 lowest bits and thus it could access the topmost 4 bits. An SM mode program had to load a LAP mode code loader; then the SBD mode could run LISP programs.

1.5 CS and CG mode.

The hot topic in the seventies becomes, guess what, AI. For bureaucratic reasons that permit the development of BD compatible machines only, they implement the Cognitive Science mode on top of the LAP mode. This mode is also 36 bits, but one cannot switch back to any other mode. Chess specific instructions are part of this mode and the British computer was the best chess player in the world in 1973. Famous are the instructions 'vcasp' (verify castling permission) and 'capep' (capture en passant).

In 1975 a new unit of memory allocation is introduced, the CAT (Certain Arbitrary Thing). The CS mode inherits all the DOGs from the LAP mode application from which it was entered and no new DOGs can be added. But now a DOG can be subdivided into several CATs.

In 1977 they decide to make a more useful CS-like mode. The chess instructions are removed and string instructions are added. This is called the CG mode. CG stands for catgut, 'cause that's what strings are made of.

1.6 TLA Mode.

In 1981 it's time for something more useful. The TLA mode is introduced. TLA means 'The Last Architecture'. They decide to clean up all the errors of previous models in the BD line. From the TLA mode one can jump back to the BD mode again. One can jump from CG to TLA and one can jump back and forth between SBD and TLA mode.
Finally one can reach every mode from every other mode. TLA is a hybrid 32/64 bit mode. All 36 bits of the Lisp-like modes are accessible from it. Memory management features both DOGs and CATs (plus COW 'copy on write' pages). The DR (dung remover) manages the free memory. There is only one instruction set for both the system and applications. As C became the language of choice, C specific instructions such as 'vsprintf' and 'vsscanf' were introduced. They even used ASCII! Nowadays they are useless again due to the new ANSI C standard. Floating point is standardized to a preliminary version of IEEE 754 in which the denorms and NANs are just a bit different.

1.7 GUI, AC and RISC modes.

TLA was not really the last architecture. In 1988 they decide it is time for some serious graphics. They introduce the GUI mode, in which you can find instructions to draw windows, to scroll bitmaps, to draw lines, etc. All for a monochrome screen of 760x570 pixels. The proposed 'tetris' instruction was voted down by a very small majority, but some tetrisoid block drawing instructions did make it. This mode is little used today, as it is restricted to monochrome and one resolution.

In 1990, MFPA enters the nineties with super computing. The AC mode (advanced computing) is added. This mode is truly 64 bit. This mode would be ideal for finite element computations, SPICE simulations, ray tracing etc. were it not for (a) the lack of speed and (b) a few bugs that cause the result of the division operation to be a bit inaccurate for some operands. Some programs use EPC (error precompensation) and this seals the fate of this mode. It has to remain bug compatible in future generations.

Finally in 1993, MFPA follows Intel in the RISC race. RISC isn't that boring after all. What you lose in raw instruction set complexity, you can gain in pipeline dependencies.
MIPS (Microprocessor without Interlocked Pipeline Stages) has lots of interesting rules as to which instructions you can combine within 1, 2 or 3 cycles. In some RISC machines the pipeline stalls if you put two instructions too close together. This wastes cycles. In MIPS (or in MFPA) one can often combine interdependent instructions if one knows exactly what one is doing. Sometimes the result depends on a race condition that is temperature dependent. The program 'softherm' is a thermometer in software that measures the temperature by examining the chance of one particular race condition happening. It will be a true challenge to provide an exactly compatible mode in future generations of the chip that even runs the softherm program correctly.

1.8 Summary.

Eager to see the complete picture? Here it is in figure 1. Everything revolves around the TLA mode. Both the OS and most applications are written in it. RISC is increasing in popularity. GUI and AC never caught on. Chess programs are faster in RISC than they were in CS and it is too much trouble to switch back and forth between both. Therefore the chess instructions see very little use today. An SBD mode system with FAP and CAP applications can run under the usual TLA system. BD, MBD and SM mode are for booting only, though some programs use the true hardware random generator of BD mode.

[Figure 1: ASCII diagram of the mode transitions. It shows the AC, GUI and RISC modes attached to TLA Mode; Reset leading into BD Mode; the chain BD Mode -> MBD Mode -> SM Mode -> SBD Mode; the IAP, FAP, CAP, JAP and LAP modes under SBD Mode; LAP -> CS Mode; and CG Mode.]

Fig 1: Mode Transition Diagram.

2 CATS AND DOGS HELP STRUCTURED PROGRAMMING AND DATA SECURITY.

When the SABINA architecture came to England, it was not equipped with DOGs. They say this was to avoid the six month quarantine period. Just the address was enough to designate a memory location. With MBD the Data Object Group (DOG) concept was introduced.
Addresses were still small, but by putting them in different DOGs one could span a larger range. The DOGs were chained to fixed addresses, 8 words apart. For each DOG there was another DOG that spanned the same address range but that ran in the reverse direction. Addresses were inverted before they were added to the DOG base.

In SM mode, DOGs were still chained, but VDS (variable DOG spacing) was introduced. DOG 0 and DOG 1 were 1 word apart, but the highest numbered DOGs were 32768 words apart. Thus one could be very memory efficient and one could span a large range. Chained DOGs helped structured programming, but they did not offer much protection.

With SBD mode and the application modes under it, this changed completely. Each DOG had its own start address and a length. A program could only access the DOGs in its own kennel descriptor table and hence it could not access the memory of other programs. The start addresses and lengths of the DOGs could be changed by the OS, so the DOGs were no longer chained. They were called floating DOGs.

Application programs could not themselves create new DOGs. This was a pain in the CS and CG mode programs, which inherited the DOGs of the LAP mode program that switched to CS mode, but which could not call the SBD system to make new DOGs for them. To fix this, the CAT (Certain Arbitrary Thing) was introduced. Within one DOG, several CATs could be created by the application. CATs offered a very limited degree of protection but it was better than nothing.

When TLA mode was added, both DOGs and CATs were retained. Thus the new mode had hierarchic segmentation, which is not a bad thing to have. TLA mode had protection rings, like the 386, to offer more protection. As opposed to the SBD mode, this was needed because applications now ran in TLA mode as well and not in some application specific mode. Of course there were also Task State DOGs and Colgates in this mode.
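
For readers who like to see the DOG arithmetic spelled out, here is a rough sketch in Python of two of the schemes described above: the chained DOGs of MBD mode and the floating DOGs of SBD mode. The history does not say which bit of a DOG register is the direction bit, nor exactly what "inverting" an address means, so the sketch assumes the top bit is the direction bit and that a reverse DOG simply counts down from its base. The names mbd_physical, sbd_physical, KennelEntry and DogFault are made up for this illustration.

```python
from dataclasses import dataclass

MBD_SPACE = 1 << 18  # the 256 kiloword space of MBD mode


def mbd_physical(address16, dog_regs):
    """Translate a 16-bit MBD address through one of the two DOG registers."""
    selector = (address16 >> 15) & 1   # the 16th address bit selects the DOG register
    offset = address16 & 0x7FFF        # the remaining 15 bits are the offset
    reg = dog_regs[selector]
    reverse = (reg >> 15) & 1          # ASSUMPTION: top bit is the direction bit
    base = (reg & 0x7FFF) * 8          # start address is a multiple of 8 words
    # ASSUMPTION: a reverse DOG counts down from its base instead of up.
    physical = base - offset if reverse else base + offset
    return physical % MBD_SPACE


@dataclass
class KennelEntry:
    """One floating DOG: a start address and a length, both set by the OS."""
    start: int
    length: int


class DogFault(Exception):
    """Raised when a program strays outside its own kennel."""


def sbd_physical(kennel, dog, offset):
    """Translate (DOG, offset) through a program's local kennel descriptor table."""
    entry = kennel.get(dog)
    if entry is None:
        raise DogFault(f"DOG {dog} is not in this program's kennel")
    if not 0 <= offset < entry.length:
        raise DogFault(f"offset {offset} is outside DOG {dog}")
    return entry.start + offset
```

With dog_regs = [0x0002, 0x8004], address 0x0005 lands on word 21 (forward from base 16) and address 0x8005 lands on word 27 (backward from base 32). In the SBD sketch the bounds check is what keeps one program out of another program's memory; the chained DOGs have no such check, which is why they offered little protection.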
The best protection, however, was offered by STO, Security Through Obscurity.

The GUI, AC and RISC modes are again application specific submodes under TLA. Programs in these modes inherit a certain set of DOGs. Depending on the protection ring they are started from, they can or cannot create DOGs or CATs of their own.

3 PERSONALIZED REGISTERS HELP OPTIMIZATION.

What's in your tool chest? Twenty Swiss-army knives that you can use as screwdrivers, pliers, saws or anything else you need? I guess not! General-purpose tools are not a good idea and neither are general-purpose registers. This text would be too long if I discussed all registers of all modes, so I only discuss the register set for the TLA mode. Registers of different modes are partially overlapping. The ALBERT register of TLA is known as PAULA in the SBD mode.

The following registers are available to application programs in TLA. Note that the register set is not very homogeneous. Every register is tailored to a dedicated function and if it's multipurpose, it serves two or three purposes at best.

Male Registers.
  ALBERT     The address register, 32 bits.
  BRIAN      The base address, 32 bits.
  COLIN      The Count register, 32 bits.
  DICK       Floating point data register, 64 bits.
  EDWARD     The extended data register, 64 bits.
  FRED       The frame pointer, 32 bits.

Female Registers.
  ANNIE      The accumulator, 32 bits.
  BETTY      The byte register, 8 bits.
  CHRISTINE  The code register, 16 bits.
  DIANA      The Destination address, 32 bits.
  ELIZA      The string pointer register, 32 bits.
  FLORENCE   The floating point accumulator, 64 bits.

DOG Registers, all 16 bits.
  ROTWEILER  Data DOG, 16 bits.
  SHEPHERD   Stack DOG, 16 bits.
  TERRIER    Program DOG, 16 bits.

Other Registers.
  PUKE       Program Counter, 32 bits.
  FAINT      Flags Register, 17 bits.
  ITCH       Instruction register, 24.5 bits.
  SICK       Stack pointer, 32 bits.

CAT Registers. (designed by a cat hater?)
  FUR        Floating point CAT, 13 bits.
  BONES      Byte CAT, 15 bits.
  DOGFOOD    Data CAT, 14 bits.

Most instructions accept only one or two different registers for each of their operands. This keeps the code size down and it relieves the programmer or compiler writer from having to choose among many registers. Add and subtract can only take registers from the same set; multiply and divide can only take registers from different sets. They are only permitted on male and female registers anyway.

Floating point addition is only permitted between DICK and EDWARD, but not between FLORENCE and any other register. You can add memory operands to FLORENCE, but their addresses have to come from the female register set. Only ELIZA will do, as DIANA can only be used for writing. FLORENCE is the only true FP accumulator so you pretty much _have_ to use it if you want status information (overflow detection and such). The DICK/EDWARD addition won't generate status flags or INFs or NANs on overflow. The bottom line is this: for frequently used instructions such as FPADD, you don't have to take the trouble to make a sensible choice of registers; there's only one. For FPMUL and FPDIV you can choose from the male register set for the other operand, so you can use DICK, EDWARD and the addresses in ALBERT, BRIAN and FRED. But you really don't need these instructions that often so you don't have to make this difficult choice often.

4 APPLICATION ORIENTED INSTRUCTIONS KEEP PROGRAMMING COST DOWN.

In the long and rich history of MFPA many, many instructions were introduced. You know that a programmer can write 20 lines of code per day (including such unimportant things as design, debugging and testing). A numerical integration routine using Simpson's rule could easily take 10 lines of FORTRAN, maybe 100 lines of Assembly. But in MFPA it's only 1 instruction! So instead of a whole programmer's week it's just 25 minutes!

You need to print Roman numerals? CRN is in the CAP mode. You must compute the root of a complex polynomial using Newton's method? It's in the FAP mode.
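
To give a feel for the "10 lines" claim, here is roughly what a composite Simpson's rule routine looks like in a high-level language. This is a sketch in Python rather than FORTRAN, the name simpson and its parameters are invented for the illustration, and it makes no claim about what the SIMPSN instruction actually does inside.

```python
def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3
```

A call such as simpson(math.sin, 0.0, math.pi) (after import math) comes out very close to 2. At 20 lines per day, those ten lines are half a programmer's day.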
The JAP mode has a single instruction for Ackermann's function, but that one does cheat. You need a chess program? Half of the gory subroutines are already there in the CS mode. Need some serious graphics? Use GUI mode!

Choosing the right mode for YOUR application is half the fun of MFPA programming. If you're really good and you KNOW how to switch modes really fast without using the operating system, then you can use more modes in one program. I've yet to see a program that combines the best of the RISC and the CS modes to beat the socks off those dinky human chess players such as Kasparov.

Of course most of the work is done in the TLA and RISC modes today, but good old COBOL and FORTRAN programs still feel best at home in the CAP and FAP modes, unless they are memory hogs. If you know what your errors are like, the AC mode can be really cool for numerical applications. Some of the best compilers run in the CG mode. Some of the string and list processing instructions are put to really creative uses, far beyond their original intentions. You probably won't believe that the DSW (Deutsch Schorr Waite) instruction is at the heart of a FORTRAN optimizer! Some people who know the GUI mode very well once used it for some very specific matrix operations that they needed for numerical work.

Even BD, SBD and SM modes aren't yet dead, but they are a bit awkward to use and you have to know the back doors of your OS pretty damn well to ever get there. Chained DOGs and multiple indirection can gain a lot of speed on modern machines. We lost all that long ago with the SBD mode. Every DOG register load will read a complete kennel descriptor table entry, which slows things down. And there is a true hardware random generator in the BD mode.

The instruction set descriptions of MFPA comprise 12 volumes at the moment. There are some more volumes for the secret features and opcodes.
Some instructions are described by fifty-odd lines of COBOL or FORTRAN without any hint towards their function. It's here that the real programmer really can save time and money for the boss.

5 KNOW WHAT YOU WANT WITH THOSE 100 MILLION TRANSISTORS OF 1998.

MFPA does not fit into a single chip. It's more like 20 chips with 2 million transistors each. It is a bit expensive for the average desktop, but that may radically change in as little as 3 years. The estimated maximum transistor count for 1998 is 100 million. That's 2.5 times as much as MFPA will need to fit on a single chip. We can have more than twice the complexity of MFPA by 1998 in an ordinary PC and the question is what we will use the extra complexity for.

Do we want basically the same instruction set architecture but with superscalar execution? Or do we go for more bizarre modes? We could patch a Pentium and a 68040 in a tiny corner of the chip, just to make it Windows/DOS/Mac compatible. But we also could use the same space for special instructions to draw "Abort Retry Ignore" windows under the then current version of Windows '95.

Multimedia will certainly be part of our life in 1998, but what do we need in our CPU to support it? I guess we will have dedicated video hardware for all kinds of display manipulation and hence we won't need loads of graphics instructions in our CPU. But maybe I'm completely mistaken about that. What we do need is lots of AI, even if it were only for speech recognition. But it's uncertain if we need a souped up LISP Application Mode, a fast RISC mode or a speech-specific mode for it. It would be wise though to reserve a tiny piece of the chip area for a copy of the Amiga chipset, because Amiga emulators will be in great demand for a few years to come and, as opposed to many other computers, Amigas are impossible to emulate without a dedicated chipset.

Maybe we will go for parallel computers.
One could think of multiple parallel RISC machines, each with a dedicated cache, that can all be started up from the TLA mode and run concurrently with it. It would be really cool to exploit all the features of such an arrangement. Apart from the problems of parallel computers in general, you have the possibility of going from the TLA mode to a different mode while some of the RISC applications are still running with their inherited TLA mode DOGs and CATs that don't really exist in BD mode.

Maybe the rage will be supercomputing instead of RISC. The AC mode is broken, but nothing prevents us from adding yet another souped up advanced computing mode. And why not in the form of several parallel machines? Of course with the same interesting consequences as parallel RISC machines.

Of course we need a power saving mode as well. Even a lousy Pentium has it, so there. Without a power saving mode the next generation MFPA will run so hot that the 'softherm' program would be trivial to implement. It only needs to return the maximum temperature it can measure. That would be no challenge for the preservation of compatibility with previous designs of the RISC mode.

All in all, there's enough room to improve upon MFPA for the 1998 single chip version.