
Non-Uniform Memory Access

Non-Uniform Memory Access (NUMA) is a computer memory architecture, used in multiprocessors, in which memory access time depends on the location of the memory relative to the processor. A processor can access its own local memory faster than non-local memory, that is, memory that is local to another processor or shared between processors.
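The effect of location on access time can be illustrated with a simple weighted average. The sketch below is a back-of-the-envelope model, not a measurement: the function name average_access_time_ns and the latencies of 100 ns (local) and 300 ns (remote) are assumptions chosen only for illustration.

```python
def average_access_time_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one access when local_fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Illustrative latencies only (assumed, not taken from any real machine).
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (1.0, 0.9, 0.5, 0.0):
    avg = average_access_time_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> {avg:.0f} ns average")
```

Under these assumed numbers, the average cost rises steadily as a larger share of accesses goes to non-local memory, which is why data placement matters on NUMA machines.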

NUMA architectures are the logical next step in scaling from symmetric multiprocessing (SMP) architectures.

Table of contents
1 Cache coherence and NUMA
2 NUMA vs. cluster computing
3 External links

Cache coherence and NUMA

Nearly all CPU architectures use a small amount of very fast non-shared memory, known as cache, to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory carries significant overhead.

Although simpler to design and build, non-cache-coherent NUMA systems are prohibitively complex to program in the standard von Neumann programming model. As a result, all fielded NUMA designs use special-purpose hardware to maintain cache coherence, and are thus classed as "cache-coherent NUMA" (ccNUMA).

Coherence is typically maintained by inter-processor communication between cache controllers, which keeps a consistent memory image when the same memory location is stored in more than one cache. As a consequence, ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession. Operating system support for NUMA therefore attempts to reduce the frequency of this kind of access by allocating processors and memory in NUMA-friendly ways, and by avoiding scheduling and locking algorithms that make NUMA-unfriendly accesses necessary.
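The cost of contention can be sketched with a toy write-invalidate model. This is a deliberate simplification and does not describe any particular machine's protocol: the class ToyCoherentMemory, the function count_messages, and the line addresses are all hypothetical, and the model counts only one invalidation message per other cached copy.

```python
class ToyCoherentMemory:
    """Toy write-invalidate model: a write invalidates every other cached copy of the line."""

    def __init__(self):
        self.sharers = {}    # line address -> set of CPU ids holding a valid copy
        self.messages = 0    # inter-processor coherence messages sent so far

    def write(self, cpu, line):
        holders = self.sharers.setdefault(line, set())
        others = holders - {cpu}
        self.messages += len(others)   # one invalidation per other cached copy
        self.sharers[line] = {cpu}     # the writer now holds the only valid copy


def count_messages(line_for_cpu, rounds=1000):
    """Two CPUs write alternately; line_for_cpu maps each CPU id to the line it updates."""
    memory = ToyCoherentMemory()
    for _ in range(rounds):
        for cpu in (0, 1):
            memory.write(cpu, line_for_cpu[cpu])
    return memory.messages


shared = count_messages({0: 0x40, 1: 0x40})    # both CPUs write the same line
private = count_messages({0: 0x40, 1: 0x80})   # each CPU keeps to its own line
print(f"same line: {shared} messages, separate lines: {private} messages")
```

In this model, nearly every write to the contended line triggers a message between caches, while writes to separate lines trigger none. Keeping each processor's working data apart is the kind of access pattern that NUMA-aware allocation and scheduling try to encourage.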

NUMA vs. cluster computing

NUMA can be viewed as a very tightly coupled form of cluster computing. The addition of virtual memory paging to a cluster architecture can allow NUMA to be implemented entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA is several orders of magnitude greater than that of hardware NUMA.
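The paging idea can be sketched as a toy model in which each page lives on one node and is pulled across the network when another node touches it. This is a minimal illustration under stated assumptions, not the design of any real software NUMA system: the class ToyPagedCluster, the migrate-on-access policy, and the 4096-byte page size are all hypothetical choices.

```python
PAGE_SIZE = 4096  # bytes per page (assumed for illustration)


class ToyPagedCluster:
    """Toy model: each page lives on exactly one node and migrates to the node that touches it."""

    def __init__(self):
        self.owner = {}          # page number -> node currently holding the page
        self.remote_faults = 0   # page transfers over the simulated interconnect

    def access(self, node, address):
        page = address // PAGE_SIZE
        holder = self.owner.setdefault(page, node)   # first touch places the page
        if holder != node:
            # Simulated page fault: fetch the whole page from the other node.
            self.remote_faults += 1
            self.owner[page] = node


cluster = ToyPagedCluster()
for address in range(0, 8 * PAGE_SIZE, 64):   # node 0 touches eight pages
    cluster.access(0, address)
for address in range(0, 8 * PAGE_SIZE, 64):   # node 1 then touches the same pages
    cluster.access(1, address)
print(f"remote page faults: {cluster.remote_faults}")
```

Each remote fault in such a scheme moves an entire page over the cluster interconnect, which gives some intuition for why software-based NUMA has far higher inter-node latency than hardware that transfers individual cache lines.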

External links


This article (or an earlier version of it) contains material from FOLDOC, used with permission.