Cache Notes
- Cache
- Cache is a small amount of fast memory combined with a large
amount (the non-cache) of slower memory.
- Memory access is generally much slower than the CPU, so a cache is a
scheme to help speed up the effective memory access time.
- Locality principle - the observation that the memory references
made in any short time interval tend to use only a small fraction
of the total memory.
- This forms the basis of caching systems.
- If a given memory reference is to address A, it is likely
that the next memory reference will be in the general vicinity of A.
- The general idea of the locality principle is that when a
word is referenced, it is brought from the large slow memory
into the cache, so that the next time it is used, it can be accessed quickly.
- If a word is read or written k times in a short interval, the computer
will need 1 reference to slow memory and k-1 references to fast memory (cache).
- The first reference to the word is made to main (slow) memory because
at this point the word is not in the cache.
- After the first reference to the word, the word is placed in the cache
- A reference to a word first checks the cache for the word.
If the word is not in the cache, then main memory is checked.
- The larger k is, the better the overall performance because all
of the references to the word (except the first one) are satisfied out of the cache
- h, the hit ratio, is the fraction of all references that can be
satisfied from the cache.
- h = (k-1)/k
- For example, if a word is read/written 10 times, main memory
is accessed once, and the cache is accessed 9 times (the word is placed
in the cache after the first reference, which misses).
- The hit ratio here is 9/10 or 90%. If k is 1000,
h = 999/1000 = 99.9%.
- The miss ratio is 1 - h.
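The hit/miss-ratio arithmetic above can be sketched in a few lines of Python (the k values are the examples from these notes; the function names are mine):

```python
# Hit ratio for a word referenced k times in a short interval:
# the first reference misses (goes to main memory), the other k-1 hit.
def hit_ratio(k):
    return (k - 1) / k

def miss_ratio(k):
    return 1 - hit_ratio(k)

for k in (10, 1000):
    print(f"k={k}: hit ratio={hit_ratio(k):.3f}, miss ratio={miss_ratio(k):.3f}")
```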
- mean access time = c + (1-h)m
where c is the cache access time, and m is the main memory access time
- As h approaches 1, most accesses are satisfied in the cache, and the
mean access time approaches c, the cache access time.
- As h approaches 0, the mean access time approaches c + m, first
a time c to check the cache (unsuccessfully) and then a time m to
do the memory reference.
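A minimal sketch of the mean-access-time formula; the 1 ns / 100 ns timings are made-up illustrative values, not from the notes:

```python
def mean_access_time(c, m, h):
    """c = cache access time, m = main memory access time, h = hit ratio."""
    return c + (1 - h) * m

# Hypothetical timings: c = 1 ns cache, m = 100 ns main memory.
print(mean_access_time(1, 100, 0.0))    # h -> 0: approaches c + m
print(mean_access_time(1, 100, 0.999))  # h -> 1: approaches c
```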
- Associative cache
- In associative cache:
Number of blocks that are needed = size of memory/block size
- In associative cache, each slot contains:
slot size (in bits) = 1 bit (valid/invalid) +
# bits needed for the block number + # bits needed to store the actual data
- The #bits needed to store the actual data is computed:
(block size in bytes) * 8 bits/byte
- Associative cache - consists of a number of slots or lines,
each containing: valid bit, block number, value
- When Valid = 1, the slot is in use in the cache
- In Fig 28, there is a cache with 1024 slots.
- In associative cache, the order of the entries is random. When
the computer is reset, all the Valid bits are set to 0, no cache entries
are valid.
- Fig 28 - the memory is 2^24 bytes divided into 2^22 4 byte blocks
- The cache needs 22 bits to store a block number, since there are 2^22 blocks.
- The block size is 4 bytes per block, which is 32 bits per block.
- The cache needs 1 bit (valid/invalid) + 22 bits (for the block number) +
32 bits (for the block value) = 55 bits per line X 1024 lines
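The Fig 28 sizing can be checked with a short Python sketch (the variable names are mine; the numbers are the ones above):

```python
# Associative-cache sizing for the Fig 28 example:
# 2^24-byte memory, 4-byte blocks, 1024 slots.
mem_bytes   = 2**24
block_bytes = 4
num_slots   = 1024

num_blocks     = mem_bytes // block_bytes        # 2^22 blocks
block_num_bits = (num_blocks - 1).bit_length()   # 22 bits to name a block
data_bits      = block_bytes * 8                 # 32 bits of data per slot
slot_bits      = 1 + block_num_bits + data_bits  # valid + block number + data

print(slot_bits)              # bits per slot
print(slot_bits * num_slots)  # total cache storage in bits
```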
- Direct-mapped cache
- In direct-mapped cache:
Each slot contains:
1 bit (valid/invalid) + # bits for tag + #bits for actual data
- To compute the number of bits for the tag:
#bits for address - log base 2 of the block size - log base 2 of number of slots
- Subtracting log base 2 of the block size is like dividing by
the block size.
- Subtracting log base 2 of the number of slots is like performing
the operation: block number mod the number of slots
- Fig 29: 2^24 addresses, 4-byte (2^2) blocks, 1024 (2^10) slots in the cache
- Slot number = (address / 2^2) mod 2^10, i.e., the block number mod
the number of slots
- This is the same as using the lower 10 bits of the block number.
- The upper 12 bits are the tag
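The Fig 29 address split can be sketched as bit operations (the function and constant names are mine):

```python
# Direct-mapped lookup for the Fig 29 cache: 24-bit byte addresses,
# 4-byte blocks (2 offset bits), 1024 slots (10 slot bits), 12 tag bits.
OFFSET_BITS = 2
SLOT_BITS   = 10

def split_address(addr):
    block_number = addr >> OFFSET_BITS            # divide by the block size
    slot = block_number & ((1 << SLOT_BITS) - 1)  # lower 10 bits of block number
    tag  = block_number >> SLOT_BITS              # upper 12 bits
    return tag, slot

# Addresses 0x000000 and 0x001000 are 1024 blocks apart, so they map
# to the same slot with different tags (a conflict in this cache).
print(split_address(0x000000))  # -> (0, 0)
print(split_address(0x001000))  # -> (1, 0)
```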