Acorn RISC Machine

The Acorn RISC Machine (or ARM) is a RISC processor architecture that is widely used in a number of applications. It is a very "pure" RISC implementation, and is considered one of the most elegant modern processors.

Table of contents
1 History
2 Design notes

History

The ARM design was started in 1983 as a project at Acorn Computers Ltd. After being refused access to the upcoming Intel 80286 for newer generations of its computer line, Acorn responded by setting up a team to design and build a new RISC-based CPU, known as the Acorn RISC Machine.

The team, led by Roger Wilson and Steve Furber, started development of what in some ways represents an advanced MOS Technology 6502. Acorn had a long line of computers based on the 6502, so a chip that was similar to program could represent a significant advantage for the company.

The team completed development samples, called ARM1, by 1985, and the first "real" production systems, as ARM2, the following year. The ARM2 featured a 32-bit data bus and a 26-bit address bus, with 16 registers. It was possibly the simplest useful processor in the world, with only 30,000 transistors (compare with the several-years-older Motorola 68000's 68,000). Much of this simplicity comes from not having microcode (which represents about one quarter to one third of the 68000) and, like most CPUs of the day, not including any cache. This simplicity leads to its low power needs, and yet it performed better than the 286.

In the late 1980s Apple Computer started working with Acorn on newer versions of the ARM core. The work was so important that in 1990 Acorn spun off the design team into a new company, Advanced RISC Machines. For this reason ARM is often expanded as Advanced RISC Machine instead of Acorn RISC Machine.

This work would eventually turn into the ARM6, which made the ARM design a true 32-bit CPU while otherwise remaining similar to earlier models. The first models were released in 1991, and Apple used the ARM6-based ARM 610 as the basis for its Apple Newton PDA. The latest specification is ARM10 from 1998, which adds floating-point support and 32 registers.

The core has remained largely the same size throughout these changes. ARM2 had 30,000 transistors, while the ARM6 grew to only 35,000. The idea is that the end-user combines the ARM core with a number of optional parts to produce a complete CPU, one that can be built on old fabs and still deliver lots of performance at a low cost.

DEC licensed the design (which caused some confusion because they also produced the DEC Alpha) and produced the StrongARM. At 233 MHz this CPU drew only 1 watt of power (more recent versions draw far less). This work was later passed to Intel as part of a lawsuit settlement, and Intel took the opportunity to replace its ailing i860 and i960 designs with the StrongARM. Today these are known by the name XScale.

Motorola, IBM, Texas Instruments and Atmel have also licensed the basic ARM design for various uses. The ARM chip has become one of the most used CPU designs in the world, found in everything from hard drives to mobile phones to routers. Today it accounts for over 75% of all 32-bit embedded CPUs.

Design notes

The ARM instruction set follows the 6502 in concept, but includes a number of features designed to allow the CPU to pipeline instructions better for execution. In keeping with traditional RISC concepts, this included tuning the instructions to execute in well-defined times, typically one cycle. A more interesting addition to the ARM design is the use of a 4-bit condition code at the front of every instruction, meaning that every instruction can be made conditional.
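As a rough illustration of where that condition code lives, the following C sketch pulls the top four bits out of a 32-bit instruction word. The helper name arm_condition and the enum names are made up for this example; the bit position (bits 31 to 28) and the listed values are those of the classic 32-bit ARM encoding, and only a few of the sixteen conditions are shown.

```c
#include <stdint.h>

/* A few of the condition values used in the 32-bit ARM encoding.
   The enum names are illustrative, not taken from any header. */
enum arm_cond {
    COND_EQ = 0x0,  /* equal */
    COND_NE = 0x1,  /* not equal */
    COND_GT = 0xC,  /* signed greater than */
    COND_LE = 0xD,  /* signed less than or equal */
    COND_AL = 0xE   /* always, i.e. unconditional */
};

/* Hypothetical helper: return the 4-bit condition field (bits 31..28). */
unsigned arm_condition(uint32_t insn)
{
    return (unsigned)(insn >> 28) & 0xFu;
}
```

For instance, the word 0xE0800001 (ADD r0,r0,r1) carries the "always" condition, while 0xC0400001 (SUBGT r0,r0,r1) carries "greater than".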

The condition field cuts down significantly on the bits left over for other uses, such as displacements in memory access instructions, but on the other hand it makes it possible to avoid branch instructions when generating code for small if statements. The standard example of this is Euclid's GCD algorithm:

(This example is in the C programming language)

int
gcd(int i, int j)
{
   while (i != j) {
      if (i > j)
          i -= j;
      else
          j -= i;
   }
   return i;
} 

Expressed in ARM assembly, the loop, with a little rotation, might look something like

       b      test
loop   subgt  Ri,Ri,Rj
       suble  Rj,Rj,Ri
test   cmp    Ri,Rj
       bne    loop

which avoids the branches around the then and else clauses that one would typically have to emit.
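To make the behaviour of the assembly loop easier to follow, here is a small C model of it. This is only a sketch: the function name gcd_predicated is invented for the example, and the "flags" are modelled by a single variable that is set once per iteration, just as the cmp instruction sets them once while the two conditional subtractions leave them alone.

```c
/* C model of the predicated loop. Assumes i > 0 and j > 0,
   as the subtraction-based algorithm does implicitly. */
int gcd_predicated(int i, int j)
{
    while (i != j) {        /* cmp Ri,Rj ; bne loop */
        int gt = (i > j);   /* flags as left by cmp; the subs do not change them */
        if (gt)
            i -= j;         /* subgt Ri,Ri,Rj */
        if (!gt)
            j -= i;         /* suble Rj,Rj,Ri (here i < j, since i != j) */
    }
    return i;
}
```

Both subtractions are "issued" on every pass, but only the one whose condition matches the saved comparison has any effect, which is the point of the conditional instructions.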

Another unique feature of the instruction set is the ability to fold shifts and rotates into the "data processing" (arithmetic, logical, and register-register move) instructions, so that, for example, the C statement "a += (j << 2);" could be rendered as a single instruction on the ARM, register allocation permitting.
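As a minimal sketch, the statement can be wrapped in a function so that it compiles on its own. The function name add_scaled is invented here, unsigned operands are used only to keep the shift well defined in C, and the instruction in the comment is what a compiler could plausibly emit rather than a guaranteed result.

```c
/* a += (j << 2), written as a standalone function. */
unsigned add_scaled(unsigned a, unsigned j)
{
    /* On ARM the shift can ride along with the add:
       ADD Ra, Ra, Rj, LSL #2 */
    return a + (j << 2);
}
```

The same pattern shows up whenever an index is scaled by a power of two, for example when computing the address of an element in an array of 4-byte words.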

This results in the typical ARM program being denser than what would normally be expected of a RISC processor. This implies that there is less need for load/store operations and that the pipeline is being used more efficiently. Even though the ARM runs at what many would consider to be low speeds, it nevertheless competes quite well with much more complex CPU designs.

The ARM processor also has some features rarely seen on other architectures that are considered RISC, such as PC-relative addressing (indeed, on the ARM the PC is one of its 16 registers) and pre- and post-increment addressing modes.
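The increment addressing modes line up naturally with a common C idiom. In the sketch below (the function name sum_words is made up, and a 4-byte int is assumed), each *p++ reads a word and then advances the pointer, which a compiler may be able to express as a single post-indexed load.

```c
#include <stddef.h>

/* Sum n words, walking the pointer with post-increment. */
int sum_words(const int *p, size_t n)
{
    int total = 0;
    while (n--) {
        /* Candidate for one post-indexed load: LDR Rt, [Rp], #4 */
        total += *p++;
    }
    return total;
}
```

The pre-increment form, *++p, corresponds in the same way to a pre-indexed load with writeback, such as LDR Rt, [Rp, #4]!.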

Perhaps in part because the conditional execution facility uses up four bits of every instruction, recent ARM processors have a 16-bit instruction mode, called Thumb. This is intended to allow smaller code where possible.

Another item of note is that the ARM has been around for a while, with the instruction set increasing somewhat over time. Some ARM processors, for example, have no instruction to load a two-byte quantity, so that, strictly speaking, for them it's not possible to generate code that would behave the way one would expect for C objects of type "volatile short".
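The difficulty can be sketched in C. Without a halfword load, a 16-bit read has to be assembled from other accesses; one possible sequence, shown below, uses two byte loads. The function name load_u16_bytewise is hypothetical, little-endian byte order is assumed, and an actual compiler might choose a different sequence (for example a word load followed by masking).

```c
#include <stdint.h>

/* Read a 16-bit little-endian value as two separate byte accesses. */
uint16_t load_u16_bytewise(const volatile uint8_t *p)
{
    uint16_t lo = p[0];   /* first memory access */
    uint16_t hi = p[1];   /* second memory access */
    return (uint16_t)(lo | (hi << 8));
}
```

Because the value is fetched in two steps, a device register or shared variable could change between them, which is exactly what a "volatile short" access is supposed to rule out.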

See also: DirectBand.