NUMA architectures are the logical next step in scaling from SMP architectures.
Cache coherence and NUMA

Nearly all CPU architectures use a small amount of very fast non-shared memory known as cache to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory incurs significant overhead.
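As an illustration of locality of reference (not taken from the article itself), the following C sketch sums the same matrix twice: row by row, where consecutive accesses fall within the same cache line, and column by column, where nearly every access touches a new line. On cached hardware the first loop typically runs several times faster; the matrix size N is an arbitrary choice for the sketch.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 4096

    /* Sum a matrix row by row: consecutive accesses fall in the same
     * cache line, so most loads are cache hits (good spatial locality). */
    long sum_row_major(const int *m) {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i * N + j];
        return s;
    }

    /* Sum the same matrix column by column: each access jumps
     * N * sizeof(int) bytes, touching a new cache line almost every
     * time (poor spatial locality). */
    long sum_col_major(const int *m) {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i * N + j];
        return s;
    }

    int main(void) {
        int *m = calloc((size_t)N * N, sizeof *m);
        if (!m) return 1;
        printf("%ld %ld\n", sum_row_major(m), sum_col_major(m));
        free(m);
        return 0;
    }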
Although simpler to design and build, non-cache-coherent NUMA systems are prohibitively complex to program in the standard von Neumann programming model. As a result, virtually all fielded NUMA designs use special-purpose hardware to maintain cache coherence, and are thus classed as "cache-coherent NUMA" (ccNUMA).
This is typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache.
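To make this concrete, here is a toy C sketch of a MESI-style invalidation protocol, one common scheme for such inter-cache communication; the article does not name a specific protocol, and this model is deliberately simplified (for instance, it omits the Exclusive-state optimization on read misses).

    #include <stdio.h>

    /* Each cache line is in one of four states; messages snooped from
     * other cache controllers move it between states. */
    typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

    /* Events observed by one cache controller for a given line. */
    typedef enum {
        LOCAL_READ, LOCAL_WRITE,   /* this processor accesses the line */
        REMOTE_READ, REMOTE_WRITE  /* snooped from another cache       */
    } event_t;

    mesi_t next_state(mesi_t s, event_t e) {
        switch (e) {
        case LOCAL_READ:
            /* A read miss fetches the line; this sketch conservatively
             * installs it as SHARED (real MESI uses EXCLUSIVE when no
             * other cache holds a copy). */
            return (s == INVALID) ? SHARED : s;
        case LOCAL_WRITE:
            /* Writing requires ownership: other copies are invalidated
             * first, then the local line becomes MODIFIED. */
            return MODIFIED;
        case REMOTE_READ:
            /* Another cache reads the line: a dirty copy is written
             * back and all holders downgrade to SHARED. */
            return (s == INVALID) ? INVALID : SHARED;
        case REMOTE_WRITE:
            /* Another cache claims ownership: our copy is invalidated. */
            return INVALID;
        }
        return s;
    }

    int main(void) {
        mesi_t s = INVALID;
        s = next_state(s, LOCAL_READ);   /* INVALID  -> SHARED   */
        s = next_state(s, LOCAL_WRITE);  /* SHARED   -> MODIFIED */
        s = next_state(s, REMOTE_READ);  /* MODIFIED -> SHARED   */
        printf("final state: %d\n", s);
        return 0;
    }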
Consequently, ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession. Operating-system support for NUMA therefore attempts to reduce the frequency of this kind of access by allocating processors and memory in NUMA-friendly ways, and by avoiding scheduling and locking algorithms that make NUMA-unfriendly accesses necessary.
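As a sketch of such NUMA-friendly allocation, the following C example uses Linux's libnuma API (numa(3), linked with -lnuma) to pin the calling thread to one node and allocate its buffer from that node's local memory, so accesses stay local instead of crossing the interconnect; the node number and buffer size are arbitrary choices for illustration.

    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() == -1) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        int node = 0;                  /* arbitrary node for the sketch */
        size_t size = 64 * 1024 * 1024;

        /* Run this thread on CPUs belonging to the chosen node... */
        if (numa_run_on_node(node) != 0) {
            perror("numa_run_on_node");
            return 1;
        }

        /* ...and allocate memory from that node's local bank, so the
         * thread avoids remote-access latency for this buffer. */
        void *buf = numa_alloc_onnode(size, node);
        if (buf == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }
        memset(buf, 0, size);          /* touch pages to commit them */

        printf("allocated %zu bytes on node %d (of %d nodes)\n",
               size, node, numa_max_node() + 1);
        numa_free(buf, size);
        return 0;
    }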
NUMA vs. cluster computing

NUMA can be viewed as a very tightly coupled form of cluster computing. The addition of virtual memory paging to a cluster architecture can allow NUMA to be implemented entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware NUMA.
This article (or an earlier version of it) contains material from FOLDOC, used with permission.