In the early 1980s it appeared that conventional CPUs were reaching their performance limits. Up to this point designers had been limited primarily by the amount of circuitry they could place on a chip, a constraint imposed by manufacturing. But as the fabrication ("fabbing") process continued to improve, the problem became that chips could hold more circuitry than the designers knew how to use. Traditional CISC designs were reaching a performance plateau, and it wasn't clear that it could be surpassed.
It appeared that the only way forward was to increase the use of parallelism, the use of several CPUs that would work together to solve several tasks at the same time. This depended on the machines in question being able to run several tasks at once, a process known as multitasking. Multitasking had generally been too difficult for previous CPU designs to handle, but more recent designs were able to support it effectively. It was clear that in the future this would be a feature of all operating systems.
A side effect of most multitasking designs is that they often also allow the processes to be run on physically different CPUs, in which case it is known as multiprocessing. A low-cost CPU built with multiprocessing in mind could allow the speed of a machine to be increased by adding additional CPUs, potentially for far less money than switching to a single faster CPU design.
The Transputer (Transistor Computer) was the first general-purpose microprocessor designed specifically to be used in parallel computing systems. The goal was to produce a family of chips ranging in power and cost that would then be wired together to form a complete computer. The name was selected to indicate the role the individual Transputers would play: numbers of them would be used as basic building blocks, just as transistors had been earlier.
Originally the plan was to make the Transputer cost only a few dollars per unit. INMOS saw them being used for practically everything, from operating as the main CPU for a computer, to acting as a channel controller for disk drives in the same machine. Spare cycles on any of these Transputers could be used for other tasks, greatly increasing the overall performance of the machines.
Even a single Transputer would have all the circuitry needed to work by itself, a feature more commonly associated with microcontrollers. The idea in this case was to allow the Transputers to be connected together as easily as possible, without the requirement for a complex bus (or motherboard). Instead you simply supplied power and a simple clock signal. You did not have to provide RAM, a RAM controller, bus support or even an RTOS; these were all built in.
The basic design of the Transputer included serial links that allowed it to communicate with up to four other Transputers, each at 5, 10 or 20 Mbit/s, which was very fast for the 1980s. Any number of Transputers could be connected together over fairly long links (tens of meters) to form a single computing "farm". A typical desktop machine might have two of the "low end" Transputers handling I/O tasks on some of their serial lines (hooked up to appropriate hardware) while they talked to one of their larger cousins acting as a CPU on another. Transputers could be booted over the network links (as opposed to from memory, as in most machines), so a single Transputer could start up the entire network.
There were limits to the size of a system that could be built in this fashion. Since each Transputer was linked to other Transputers in a fixed point-to-point layout, sending a message to a more distant Transputer required it to be forwarded by each chip along the route. This introduced a delay with every "hop" over a link, leading to long delays on large nets. To solve this problem INMOS also provided a zero-delay switch that connected up to 32 Transputers (or switches) into even larger networks.
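The effect of hop-by-hop forwarding can be illustrated with a small hop-count calculation. This is only a sketch: the `chain` and `star` layouts and the function names are hypothetical, and real networks could use up to four links per chip rather than a simple line.

```python
from collections import deque

def hop_count(links, source, target):
    """Breadth-first search: fewest links a message must cross."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in links[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None  # target unreachable

def chain(n):
    """n nodes wired in a line, each linked to its immediate neighbours."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def star(n):
    """n nodes that all connect to one central switch (assumed layout)."""
    links = {i: ["switch"] for i in range(n)}
    links["switch"] = list(range(n))
    return links

worst_chain = hop_count(chain(32), 0, 31)  # end to end along the line
worst_star = hop_count(star(32), 0, 31)    # node -> switch -> node
```

In the line layout the worst-case path grows with the number of chips, while routing through a single switch keeps every pair of chips two link traversals apart.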
Supporting the links was additional circuitry that handled scheduling of the traffic over them. Processes waiting on communications would automatically pause while the networking circuitry finished its reads or writes, and other processes running on the Transputer would then be given that processing time. The scheduler included two priority levels to help avoid deadlocks. The same logical system was used to communicate between programs running on a single Transputer, implemented as "virtual network links" in memory. Programs asking for any input or output therefore paused automatically while the operation completed, a task that normally requires the operating system to act as the arbiter of hardware. Operating systems on the Transputer did not have to handle scheduling; in fact, one could consider the chip itself to have an OS inside it.
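The behavior described here, where a process is taken off the run queue until its communication partner is ready, can be modeled in software. The following is a minimal sketch using Python generators, not a description of the actual hardware: the `Scheduler` and `Channel` classes, the request tuples, and the rule that each channel joins exactly one sender to one receiver are all assumptions made for illustration.

```python
from collections import deque

class Channel:
    """Unbuffered channel: remembers at most one process waiting on it."""
    def __init__(self):
        self.waiting = None  # (process, priority, kind, payload) or None

class Scheduler:
    HIGH, LOW = 0, 1

    def __init__(self):
        self.ready = (deque(), deque())  # one run queue per priority level

    def spawn(self, process, priority=LOW):
        self.ready[priority].append((process, priority, None))

    def run(self):
        while self.ready[self.HIGH] or self.ready[self.LOW]:
            # High-priority processes are always served first.
            queue = self.ready[self.HIGH] or self.ready[self.LOW]
            process, priority, wake_value = queue.popleft()
            try:
                request = process.send(wake_value)
            except StopIteration:
                continue  # process finished
            self._communicate(process, priority, request)

    def _communicate(self, process, priority, request):
        kind, channel, *payload = request
        if channel.waiting is None:
            # Partner not ready yet: the process stays off the run queues.
            channel.waiting = (process, priority, kind, payload)
            return
        other, other_priority, other_kind, other_payload = channel.waiting
        channel.waiting = None
        value = payload[0] if kind == "send" else other_payload[0]
        # Both sides become runnable again; only the receiver gets the value.
        self.ready[priority].append(
            (process, priority, value if kind == "recv" else None))
        self.ready[other_priority].append(
            (other, other_priority, value if other_kind == "recv" else None))

def producer(channel, items):
    for item in items:
        yield ("send", channel, item)

def consumer(channel, count, received):
    for _ in range(count):
        value = yield ("recv", channel)
        received.append(value)

received = []
link = Channel()
scheduler = Scheduler()
scheduler.spawn(producer(link, [1, 2, 3]))
scheduler.spawn(consumer(link, 3, received))
scheduler.run()
```

Neither process contains any explicit waiting or polling code; pausing and resuming falls out of the channel operation itself, which is the property the text attributes to the chip.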
In order to include all this functionality on a single chip, the Transputer's core logic was simpler than most CPUs. It used a RISC-based design, but unlike the more common register-heavy load-store RISC CPUs, the Transputer was a stack-based system with only a few registers. This allowed for very fast context switching by simply changing the stack pointer to the memory used by another program (a technique used in a number of contemporary designs). The Transputer also included three "normal" registers, but they were in fact mirrors of the top three stack positions, used to allow for zero-address instructions.
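A toy model may help make the stack-pointer technique concrete. In this sketch the class `TinyStackMachine`, its method names, and the memory layout are hypothetical; the three registers are modeled as a small evaluation stack, locals are addressed as offsets from the stack pointer, and what happens to the registers during a switch is deliberately ignored.

```python
class TinyStackMachine:
    """Toy stack machine: three registers plus a per-process stack pointer."""

    def __init__(self, memory_words):
        self.mem = [0] * memory_words
        self.sp = 0                    # base of the running process's memory
        self.a = self.b = self.c = 0   # three-deep evaluation stack

    def push(self, value):
        self.c, self.b, self.a = self.b, self.a, value

    def pop(self):
        value = self.a
        self.a, self.b = self.b, self.c
        return value

    def load_local(self, offset):
        # Locals are addressed as small offsets from the stack pointer.
        self.push(self.mem[self.sp + offset])

    def store_local(self, offset):
        self.mem[self.sp + offset] = self.pop()

    def add(self):
        # Zero-address instruction: both operands come from the stack.
        right = self.pop()
        left = self.pop()
        self.push(left + right)

    def switch_to(self, sp):
        # A context switch is just a change of stack pointer.
        self.sp = sp

def increment_counter(machine):
    """The same code, run by any process: local[0] = local[0] + 1."""
    machine.load_local(0)
    machine.push(1)
    machine.add()
    machine.store_local(0)

machine = TinyStackMachine(16)
machine.switch_to(0)   # first process uses memory starting at word 0
increment_counter(machine)
increment_counter(machine)
machine.switch_to(8)   # second process uses memory starting at word 8
increment_counter(machine)
```

Because every local is reached through the stack pointer, the same code operates on a different process's data as soon as the pointer changes, with no register file to save and restore.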
The Transputer instruction set consisted of 8-bit instructions divided into two nibbles. The "upper" nibble contained the instruction code, making it truly RISC with only 16 basic instructions. The "lower" nibble contained data, either as a constant or, more commonly, as an offset from the stack pointer. Larger constants and offsets could be used, but they required additional bytes to be fetched and decoded. Processes with smaller contexts thus ran faster, but the whole idea of the Transputer was to run many small processes anyway. Additional, less frequently needed instructions were supported via the Operate (Opr) instruction code, which decoded the data constant as an extended opcode, allowing the instruction set to be expanded easily as newer implementations of the Transputer were introduced.
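The nibble encoding and the way larger operands are built up from extra bytes can be sketched as a small decoder. The function codes used below (`PFIX` = 0x2, `NFIX` = 0x6, `OPR` = 0xF) and the operand-register behavior come from published descriptions of the instruction set rather than from this article, so treat them as assumptions; the `decode` helper itself is hypothetical.

```python
PFIX, NFIX, OPR = 0x2, 0x6, 0xF   # assumed function codes
MASK = 0xFFFFFFFF                 # model a 32-bit operand register

def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value

def decode(code):
    """Yield (function_code, operand) pairs from a sequence of bytes."""
    operand = 0
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        operand |= data                       # low nibble always joins the operand
        if function == PFIX:
            operand = (operand << 4) & MASK   # make room for the next nibble
        elif function == NFIX:
            operand = (~operand << 4) & MASK  # same, but complemented (negatives)
        else:
            yield function, to_signed(operand)
            operand = 0                       # each instruction starts afresh

small = list(decode(bytes([0x45])))               # one byte, operand 5
large = list(decode(bytes([0x21, 0x22, 0x43])))   # two prefixes, operand 0x123
negative = list(decode(bytes([0x60, 0x4F])))      # negative prefix, operand -1
extended = list(decode(bytes([0xF5])))            # OPR selects extended opcode 5
```

Operands that fit in four bits cost a single byte, while each additional nibble costs one more prefix byte, which is why code that touches only a few nearby locals is both smaller and faster to fetch.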
Transputers were typically programmed using the Occam programming language. In fact it is fair to say that the Transputer was built specifically to run Occam, in much the same fashion that contemporary CISC designs were built to run languages like Pascal or C. Occam supported thread-style tasks in the language, and in most cases simply writing a program in Occam resulted in a threaded application. With the task support and communications built into the chip and the language interacting with it directly, writing code for things like device controllers became trivial: even the most basic code could watch the serial ports for I/O, and would automatically sleep when there was no data.
The first Transputer models were the 16-bit T212 and the 32-bit T414, announced in 1983 and released in 1984/5. In keeping with their role as microcontroller-like devices, they included 2 kB of RAM and a built-in RAM controller, which allowed you to add more memory without any additional hardware. Unlike other designs, the Transputers did not include I/O lines; these were to be added with hardware attached to the existing serial links. Nor did the Transputer include an MMU, although in a stack-based system this isn't terribly important, as addresses are almost always offsets and don't require complex translation.
The next major version was the T800 in 1987, which included a 64-bit floating point unit and three additional registers for floating point use. It also increased the RAM to 4 kB. Several new generations of all of these CPUs, known as the T-2, T-4 and T-8 families respectively, were released over the next few years to improve programming and debugging.
While the Transputer was simple but powerful compared to many contemporary designs, it never came close to meeting its goal of being used universally in both CPU and microcontroller roles. In the microcontroller realm the market was dominated by 8-bit machines and cost was the only serious consideration. Here even the T-2s were too powerful and expensive for most users. The Transputer's lack of support for virtual memory inhibited the porting of mainstream variants of the UNIX operating system, though ports of UNIX-like operating systems (such as Minix and IDRIS from Real Time Systems) were produced.
In the desktop/workstation world the Transputer was fairly fast, operating at about 10 MIPS at 20 MHz. This was excellent performance for the early 1980s, but by the time the FPU-equipped T800 was shipping, other RISC designs had already surpassed it. This could have been mitigated to a large extent if machines used multiple Transputers, but T800s cost about $400 each when introduced, so the price/performance ratio wasn't there.
INMOS attempted to correct this with the introduction of the T9000. The T9000 shared most features with the T800, but moved several pieces of the design into hardware, and added several features for superscalar support. Unlike the earlier models, the T9000 had a true 16 kB high-speed cache instead of RAM, which could also be used as ordinary memory, and it included MMU-like functionality (known as the PMI) to handle all of this. For additional speed the T9000 cached the top 32 locations on the stack, instead of three as in earlier versions.
The T9000 used a five-stage pipeline for added speed. An interesting addition was the grouper, which would collect instructions out of the stack and group them into larger packages of 4 bytes to feed the pipeline faster. Groups then completed in a single cycle, as if they were single larger instructions working on a faster CPU.
The link system was upgraded to a new 100 MHz mode, but unlike the previous systems the links were no longer backward compatible. The T9000 also added new networking hardware called the VCP, which changed the links from point-to-point to a true network. This allowed for the creation of any number of virtual channels on the links and meant programs no longer had to be aware of the physical layout of the connections.
Long delays in the T9000's development meant that the faster load-store designs were already outperforming it by the time it was to be released. In fact it consistently failed to reach its own performance goal of besting the T800 by ten times; when the project was finally cancelled it was still achieving only about 36 MIPS at 50 MHz. The production delays gave rise to the quip that the best host architecture for a T9000 was an overhead projector.
This was too much for INMOS, which didn't have the funding needed to continue development. The company was sold to SGS-Thomson (now STMicroelectronics), which cancelled the T9000 and turned the T212/T414 designs into microcontroller lines, the ST10 and ST20 families. These are no longer built, although parts of the technology are included in special-purpose chipsets (a GPS set, for instance).
Ironically, it was largely through additional parallelism that conventional CPU designs got faster. Instead of using a heavyweight explicit system like the Transputer's, modern CPU designs are parallel only at the instruction level: they examine the code being run and distribute the instructions they can be sure are independent across a fixed number of execution units. This form of parallelism, known as superscalar, appears to be much more suitable to general-purpose computing.
Note: The naming often includes a hyphen, "T-414" for instance.