CPU memory cache
CPU memory cache is a
cache used by a CPU to store data temporary from the main memory.
Typically the fastest piece of memory on a CPU is internal flip-flops and buffers. These are not generally accessible to the programmer. Next fastest is the architecture registers. These are the registers that are accessible to the programmer (if he is programming in assembler). Next there is usually at least one cache between those registers and main memory (RAM); often there are two caches, one on-chip (on the same piece of silicon as the rest of the CPU) and one off-chip. When there is more than one cache they are usually called level-1 (for the one nearest the CPU), level-2, level-3 and so on.
The level-1 cache might be between 32 kibibits and 512 kibibits, say. In a 32 bit architecture that would be between 1024 and 16384 words (each word being 32 bits).
For example, an Intel 80486 has an 8-KB on-chip cache, and most Pentiums have a 16-KB on-chip level one cache that consists of an 8-KB instruction cache and an 8-KB data cache.
In general, a given instruction will require the CPU to fetch between one and two words of information, including the instruction itself and any data it may read from or write to main memory. If the required word is not in L1 cache, it is a "cache miss", and the word must be fetched from lower memory before the instruction can continue. Because of the difference in speed between the CPU and main memory, a cache miss can result in a relatively long delay. Much of the effort in cache design, then, is to first reduce the chance of a cache miss, and, second, to reduce the penalty incurred by a miss.
A cache will usually arrange data in "lines" (sometimes also called blocks), a line is a contiguous sequence of words starting at an address that is a multiple of the line size. A line might be 8 words say. So we might have 1024 lines in our cache. A "line" in main memory when accessed will end up as a line in the cache. Usually a cache will transfer data to and from memory in chunks of a whole line, this is more efficient than transferring each word individually. This is both a source of efficiency and inefficiency: if we end up accessing all of the data in a line then bring the line into cache all at once will have been more efficient than bring each word into the cache; on the other hand, if we only access one word from the line then we paid the cost of having to bring the whole line in which will be slower. This inefficiency can be reduced if the requested or "critical" word is the first to be retrieved, though this approach requires more difficult design and hardware.
Cache architectures
When a line from main memory is stored in the cache the cache has to associate the address of the data (where it is in main memory) with the data, so that it can correctly process accesses to the data later on. Most caches are arranged so that a particular line in main memory can end up in only a few places in the cache. If any line in main memory can go in any line in the cache then the cache is said to be fully associative. If any line in main memory can go in only one particular line in the cache then the cache is said to be direct mapped. Usually the situation is somewhere in between, in which case we have an N-way set associative cache. For example, a cache is eight-way set associative if a particular line in main memory can go in one of up to eight lines in the cache.
A direct mapped cache is easier to implement: let's say we have 1024 lines of 32 bytes each on a 32-bit byte-addressed architecture. Each address in main memory must go into a corresponding line of the cache, as determined by some direct mapping scheme. If the cache line location corresponds to bits 5-14 of the main memory address (that is the address modulo 215, divided by 25), the correspondence between main memory addresses and cache lines looks like this:
main memory address cache line
0-31 0
32-63 1
64-95 2
... ...
32736-32767 1023
32768-32799 0
32780-32811 1
and so the pattern repeats through memory.
In a direct mapped cache, when we store a line in the cache all we need to associate with it are bits 15-31 of its address. The low order bits are determined by where it is in the cache. The real saving comes when an access is performed. When a location in main memory is accessed we need to check the contents of the cache to see whether we have that data cached; in a direct mapped cache we only have one line to check, because there is only one place in the cache that it can be; we simply check the high order address bits that we have associated with that line in the cache to see if they match the address of the data being accessed.
The more ways associative a cache is the more address bits we need to associate with each line, but more importantly, the more lines we need to check when an access is performed. In a eight-way set associative cache we need to check eight different lines in the cache to see if the data is stored. Because of the speeds at which caches operate the cost in power to drive the electronics of each line to check its contents becomes significant.
Whenever a new line is brought into the cache, a line must be removed to make room for it. Because the most recently removed line is slightly more likely to be required than a line in the next level cache, some architectures implement a victim cache, which holds that line. An L1 victim cache will be slightly slower than the L1 cache, but faster than L2.