Sunday, October 23, 2011

L1 and L2 cache memory

Cache memory is an extremely fast memory that is built into the CPU. The CPU uses cache memory to store instructions and data that are required repeatedly while running programs, which helps to improve overall system performance. The advantage of cache memory is that the CPU does not have to use the memory bus (EMIF) for repeated program/data fetches. Whenever data has to be passed over the memory bus, the transfer rate drops to whatever the External Memory Interface (EMIF) can sustain. By avoiding that bottleneck, the CPU can process data much faster.
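To make the repeated-fetch argument concrete, here is a small, generic C sketch (not TI-specific; the array size and function names are invented for illustration). Both functions read exactly the same data, but the row-major walk reuses every cache line it fetches, while the column-major walk keeps forcing new line fills over the memory bus, so it typically runs noticeably slower on a cached system.

```c
#include <stdio.h>

#define N 512

static int buf[N][N];

/* Row-major walk: consecutive accesses touch the same cache line,
 * so most reads are served from L1/L2 instead of the memory bus. */
long sum_row_major(void)
{
    long sum = 0;
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            sum += buf[i][j];
    return sum;
}

/* Column-major walk: each access jumps a whole row ahead, so a new
 * cache line has to be fetched from external memory far more often. */
long sum_col_major(void)
{
    long sum = 0;
    int i, j;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            sum += buf[i][j];
    return sum;
}

int main(void)
{
    printf("row-major: %ld, col-major: %ld\n",
           sum_row_major(), sum_col_major());
    return 0;
}
```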

As it happens, once most programs are open and running, they use very few resources. When those resources are kept in cache, programs operate more quickly and efficiently. All else being equal, cache has such a large effect on system performance that a computer with a fast CPU but little cache can benchmark lower than a system with a somewhat slower CPU and more cache. Cache built into the CPU itself is referred to as Level 1 (L1) cache. Cache that resides on a separate chip next to the CPU is called Level 2 (L2) cache. Some CPUs have both L1 and L2 cache built in and designate the separate cache chip as Level 3 (L3) cache.

DSPs from Texas Instruments usually have both L1 and L2 built into the CPU. L1 is divided into L1P (program) and L1D (data). L1 and L2 are collectively referred to as internal system memory (ISRAM), because you can disable their cache functionality and use them as plain internal memory instead. Accessing this memory is far faster than reaching DDR/SDRAM through the EMIF bus. ISRAM is usually small, just a few KB; the maximum I have seen so far is 256 KB.

L1 is the fastest, so it is best to make full use of L1P and L1D, as this boosts the performance of the DSP application. L2 is the next fastest: slower than L1, but still far faster than DDR reached through the EMIF. Part of L2 is usually configured as cache (32 KB or 64 KB) and the remaining area is used as ISRAM. That ISRAM area holds the most critical parts of your code that require fast execution; routines such as FIR, IIR, DCT and DFT kernels are typically placed in ISRAM for this reason, as sketched below.
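As a rough sketch of how such placement is usually done with the TI code-generation tools, the critical routine and its data are assigned to named output sections with the compiler's CODE_SECTION and DATA_SECTION pragmas; the section names, symbol names and filter itself below are made up for illustration.

```c
/* fir.c -- hypothetical example of placing a critical routine in L2 SRAM.
 * The CODE_SECTION / DATA_SECTION pragmas put the function and its
 * coefficient table into named output sections; the linker command file
 * then maps those sections into internal memory. */

#pragma CODE_SECTION(fir_filter, ".fast_code")
#pragma DATA_SECTION(fir_coeffs, ".fast_data")

#define NUM_TAPS 32

short fir_coeffs[NUM_TAPS];

/* Straightforward FIR: y[n] = sum over k of h[k] * x[n-k], in Q15. */
void fir_filter(const short *x, short *y, int len)
{
    int n, k;
    for (n = 0; n < len; n++) {
        long acc = 0;
        for (k = 0; k < NUM_TAPS; k++) {
            if (n - k >= 0)
                acc += (long)fir_coeffs[k] * x[n - k];
        }
        y[n] = (short)(acc >> 15);
    }
}
```

The matching linker command file then maps these sections into the internal memory range defined in its MEMORY directive, along the lines of `SECTIONS { .fast_code > IRAM  .fast_data > IRAM }`, while the rest of the image stays in external DDR (here `IRAM` stands for whatever name the command file gives to the L2 SRAM region of your device).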




----------------------------------------------------------------------------------------------

A level 2 (L2) cache basically operates in the same manner as a level 1 cache; however, level 2 caches are typically larger in capacity. Level 1 and level 2 caches interact as follows: an address that misses in L1 is passed on to L2 for handling; L2 employs the same valid-bit and tag comparisons to determine whether the requested address is present in the L2 cache or not. L1 hits are serviced directly from the L1 caches and do not involve the L2 cache at all.

As with L1P and L1D, the L2 memory space can also be split into an addressable internal memory (L2 SRAM) and a cache (L2 cache) portion. Unlike the L1 caches, which are read-allocate only, the L2 cache is a read- and write-allocate cache. The L2 cache is used to cache external memory addresses only, whereas L1P and L1D cache both L2 SRAM and external memory addresses.
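The lookup described above can be sketched in plain C as a toy model (a direct-mapped cache with made-up sizes, not the actual TI hardware): an address is split into a tag and a set index, the valid bit and stored tag decide whether the access hits, and only an L1 miss is passed on to the larger L2.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Toy direct-mapped caches: sizes are illustrative, not real TI parameters. */
#define LINE_SIZE   64                 /* bytes per cache line */
#define L1_LINES    128                /* 8 KB L1              */
#define L2_LINES    1024               /* 64 KB L2             */

typedef struct {
    bool     valid;
    uint32_t tag;
} line_t;

static line_t l1[L1_LINES];
static line_t l2[L2_LINES];

/* Returns true on a hit; on a miss the line is allocated (valid bit set,
 * tag updated), mimicking a read-allocate policy. */
static bool lookup(line_t *cache, int num_lines, uint32_t addr)
{
    uint32_t index = (addr / LINE_SIZE) % num_lines;
    uint32_t tag   = (addr / LINE_SIZE) / num_lines;
    line_t *line   = &cache[index];

    if (line->valid && line->tag == tag)
        return true;                   /* hit: serviced from this level */

    line->valid = true;                /* miss: fill the line */
    line->tag   = tag;
    return false;
}

/* L1 is checked first; only an L1 miss is passed on to L2, and only an
 * L2 miss goes out to external memory over the EMIF. */
static int cache_access(uint32_t addr)
{
    if (lookup(l1, L1_LINES, addr)) return 1;   /* L1 hit          */
    if (lookup(l2, L2_LINES, addr)) return 2;   /* L2 hit          */
    return 3;                                   /* external memory */
}

int main(void)
{
    printf("first access:  serviced from level %d\n", cache_access(0x80001000));
    printf("repeat access: serviced from level %d\n", cache_access(0x80001000));
    return 0;
}
```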

 --------------------------------------------------------------------------------------------------

A CPU cache is more complicated than other kinds of caches, and CPU caches are therefore divided into two groups, the level 1 and level 2 caches, usually called L1 and L2. An L1 cache is memory built into the CPU itself, and it is the first place the CPU tries to access. The L2 cache is another memory, but instead of feeding the CPU directly it feeds the L1 cache, so the L2 cache can be thought of as a cache for the L1 cache.
L2 caches may be built into the CPU the same way as L1 caches, but they can also be located on another die inside a multi-chip package (MCP), or be a completely separate chip. With some exceptions, L1 and L2 caches are SRAM (static RAM), while the main memory of the computer is DRAM (dynamic RAM) or some variation of it. Some processors add another cache level, named L3.
The main difference between L1 and L2 (and L3, where present) is size. L1 is smaller than L2 and L3, so data is located more quickly in L1 than in L2, making access faster. If the data is not found in L1, it is looked for in the larger L2 cache, and if it is not there either, an access to main memory is needed, which is much slower than either L1 or L2.
How the caches are managed depends on the processor architecture, but there are two main approaches, inclusive and exclusive. In some processors, data stored in the L1 cache is also always present in L2; this is called inclusive or, more technically, "strictly inclusive". The AMD Athlon, for example, uses an exclusive policy, so a piece of data is available in either L1 or L2 but is never stored in both. The Intel Pentium II, III and 4 use a mixed approach in which data is not required to be in both caches but usually is; this is called a mainly inclusive policy.
Which method is better is a complicated question. The exclusive method can store more data in total because nothing is duplicated across the two caches, and this advantage grows as the two caches get closer in size. The major advantage of the inclusive policy appears in systems with several processors or other bus masters: when one of them needs to invalidate some data, it only has to check the L2 cache, because anything held in L1 is guaranteed to be in L2 as well; with an exclusive policy both the L1 and the L2 cache have to be checked, making the operation slower.
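That difference can be made concrete with a small sketch (purely illustrative; the one-flag-per-line cache model and the probe counting are not any particular processor's coherence protocol). Under an inclusive policy one probe of L2 decides whether a line is cached at all; under an exclusive policy both levels must be probed.

```c
#include <stdio.h>
#include <stdbool.h>

/* Minimal stand-in for each cache level: one "present" flag per line.
 * Sizes and the direct-mapped indexing are purely illustrative. */
#define L1_LINES 4
#define L2_LINES 16

static bool l1_present[L1_LINES];
static bool l2_present[L2_LINES];

typedef enum { POLICY_INCLUSIVE, POLICY_EXCLUSIVE } policy_t;

/* Invalidate the line at 'index' on behalf of another bus master.
 * Inclusive: anything in L1 is guaranteed to be in L2, so probing L2
 * alone decides whether the line is cached; L1 is back-invalidated.
 * Exclusive: the line may live in either level, so both are probed. */
static int snoop_invalidate(policy_t policy, unsigned index)
{
    int probes = 0;

    if (policy == POLICY_INCLUSIVE) {
        probes++;                                  /* probe L2 only       */
        if (l2_present[index % L2_LINES]) {
            l2_present[index % L2_LINES] = false;
            l1_present[index % L1_LINES] = false;  /* back-invalidate L1  */
        }
    } else {
        probes += 2;                               /* probe both levels   */
        if (l1_present[index % L1_LINES])
            l1_present[index % L1_LINES] = false;
        if (l2_present[index % L2_LINES])
            l2_present[index % L2_LINES] = false;
    }
    return probes;
}

int main(void)
{
    l1_present[1] = l2_present[1] = true;          /* line cached in both */
    printf("inclusive probes: %d\n", snoop_invalidate(POLICY_INCLUSIVE, 1));
    l1_present[1] = l2_present[1] = true;
    printf("exclusive probes: %d\n", snoop_invalidate(POLICY_EXCLUSIVE, 1));
    return 0;
}
```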