Handling the Hardware Cache and the TLB

The last topic of memory addressing deals with how the kernel makes optimal use of the hardware caches. Hardware caches and Translation Lookaside Buffers play a crucial role in boosting the performance of modern computer architectures. Several techniques are used by kernel developers to reduce the number of cache and TLB misses.





Handling the hardware cache





As mentioned earlier in this chapter, hardware caches are addressed by cache lines.
The L1_CACHE_BYTES macro yields the size of a cache line in bytes. On Intel models
earlier than the Pentium 4, the macro yields the value 32; on a Pentium 4, it yields
the value 128.
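As a minimal sketch of how this macro is used (the structure and its fields are invented for the example and are not taken from the kernel sources), a data structure can be padded so that it occupies a whole cache line:

    #include <linux/cache.h>

    /* Hypothetical per-CPU counter slot, padded to L1_CACHE_BYTES so
       that slots belonging to different CPUs never share a cache line. */
    struct counter_slot {
            unsigned long count;
            char pad[L1_CACHE_BYTES - sizeof(unsigned long)];
    };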




To optimize the cache hit rate, the kernel considers the architecture in making the
following decisions.




• The most frequently used fields of a data structure are placed at the low offset within the data structure, so they can be cached in the same line (see the sketch after this list).




• When allocating a large set of data structures, the kernel tries to store each of
them in memory in such a way that all cache lines are used uniformly.
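For instance, the first rule might look as follows; the structure and its fields are invented for illustration, and only the placement principle comes from the rule above:

    /* Fields touched on every operation come first, so they all fall
       in the first cache line; rarely used fields sit at higher
       offsets. */
    struct sample_conn {
            unsigned int  state;       /* hot: checked on every use */
            unsigned long last_used;   /* hot: updated on every use */
            char          name[64];    /* cold: used only for reporting */
            unsigned long debug_hits;  /* cold: used only when debugging */
    };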





Cache synchronization is performed automatically by the 80x86 microprocessors, thus the Linux kernel for this kind of processor does not perform any hardware cache flushing. The kernel does provide, however, cache flushing interfaces for processors that do not synchronize caches.
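For instance, code that writes into a page also mapped by User Mode might use one of these interfaces as follows. The helper function here is hypothetical; flush_dcache_page() is a real kernel interface, and on the 80x86 it expands to nothing:

    #include <linux/mm.h>
    #include <linux/string.h>
    #include <asm/cacheflush.h>

    /* On architectures with non-coherent caches, the explicit flush
       makes the kernel's write visible through the user mapping. */
    static void fill_user_visible_page(struct page *page,
                                       const void *src, size_t len)
    {
            memcpy(page_address(page), src, len);
            flush_dcache_page(page);   /* no-op on the 80x86 */
    }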




Handling the TLB





Processors cannot synchronize their own TLB cache automatically because it is the kernel, and not the hardware, that decides when a mapping between a linear and a physical address is no longer valid. Linux 2.6 offers several TLB flush methods that should be applied appropriately, depending on the type of page table change.
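As a sketch of applying the narrowest suitable method (the two wrapper functions are hypothetical; flush_tlb_page() and flush_tlb_range() are among the real 2.6 methods):

    #include <linux/mm.h>
    #include <asm/tlbflush.h>

    /* Invalidate only what the page table change made stale. */
    static void after_one_pte_change(struct vm_area_struct *vma,
                                     unsigned long addr)
    {
            flush_tlb_page(vma, addr);          /* a single entry */
    }

    static void after_range_change(struct vm_area_struct *vma,
                                   unsigned long start, unsigned long end)
    {
            flush_tlb_range(vma, start, end);   /* a range of entries */
    }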







Despite the rich set of TLB methods offered by the generic Linux kernel, every microprocessor usually offers a far more restricted set of TLB-invalidating assembly language instructions. In this respect, one of the more flexible hardware platforms is Sun’s UltraSPARC. In contrast, Intel microprocessors offer only two TLB-invalidating techniques (sketched in the code after this list):

• All Pentium models automatically flush the TLB entries relative to non-global pages when a value is loaded into the cr3 register.

• In Pentium Pro and later models, the invlpg assembly language instruction invalidates a single TLB entry mapping a given linear address.
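A simplified sketch of the two techniques, modeled on the kernel’s own inline assembly helpers (__flush_tlb() and the invlpg-based single-entry helper in the 2.6 i386 headers):

    /* Reloading cr3 flushes all TLB entries relative to non-global
       pages. */
    static inline void sketch_flush_tlb(void)
    {
            unsigned long tmp;
            asm volatile("movl %%cr3, %0\n\t"
                         "movl %0, %%cr3"
                         : "=r" (tmp) : : "memory");
    }

    /* invlpg invalidates the single entry mapping the given linear
       address. */
    static inline void sketch_flush_tlb_one(unsigned long addr)
    {
            asm volatile("invlpg (%0)" : : "r" (addr) : "memory");
    }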





To avoid useless TLB flushing in multiprocessor systems, the kernel uses a technique called lazy TLB mode. The basic idea is the following: if several CPUs are using the same page tables and a TLB entry must be flushed on all of them, then TLB flushing may, in some cases, be delayed on CPUs running kernel threads.

In fact, a kernel thread does not have its own set of page tables; rather, it makes use of the set of page tables belonging to a regular process. However, there is no need to invalidate a TLB entry that refers to a User Mode linear address, because no kernel thread accesses the User Mode address space.
When a CPU starts running a kernel thread, the kernel puts it in lazy TLB mode. When requests are issued to clear some TLB entries, a CPU in lazy TLB mode does not flush the corresponding entries; however, the CPU remembers that its current process is running on a set of page tables whose TLB entries for the User Mode addresses are invalid. As soon as the CPU in lazy TLB mode switches to a regular process with a different set of page tables, the hardware automatically flushes the TLB entries, and the kernel sets the CPU back in non-lazy TLB mode. However, if a CPU in lazy TLB mode switches to a regular process that owns the same set of page tables used by the previously running kernel thread, then any deferred TLB invalidation must be effectively applied by the kernel. This “lazy” invalidation is effectively achieved by flushing all non-global TLB entries of the CPU.
Some extra data structures are needed to implement the lazy TLB mode. The cpu_tlbstate variable is a static array of NR_CPUS structures (the default value for this macro is 32; it denotes the maximum number of CPUs in the system) consisting of an active_mm field pointing to the memory descriptor of the current process (see Chapter 9) and a state flag that can assume only two values: TLBSTATE_OK (non-lazy TLB mode) or TLBSTATE_LAZY (lazy TLB mode). Furthermore, each memory descriptor includes a cpu_vm_mask field that stores the indices of the CPUs that should receive Interprocessor Interrupts related to TLB flushing. This field is meaningful only when the memory descriptor belongs to a process currently in execution.
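A simplified sketch of these declarations, modeled on the 2.6 i386 sources (the field layout is abridged):

    #include <linux/cache.h>
    #include <linux/threads.h>     /* NR_CPUS */

    struct mm_struct;              /* the memory descriptor type */

    /* One element per CPU: the memory descriptor whose page tables
       the CPU is using, and whether the CPU is in lazy TLB mode. */
    struct tlb_state {
            struct mm_struct *active_mm;
            int state;             /* TLBSTATE_OK or TLBSTATE_LAZY */
    };
    struct tlb_state cpu_tlbstate[NR_CPUS] __cacheline_aligned;
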
When a CPU starts executing a kernel thread, the kernel sets the state field of its cpu_tlbstate element to TLBSTATE_LAZY; moreover, the cpu_vm_mask field of the active memory descriptor stores the indices of all CPUs in the system, including the one that is entering lazy TLB mode. When another CPU wants to invalidate the TLB entries of all CPUs relative to a given set of page tables, it delivers an Interprocessor Interrupt to all CPUs whose indices are included in the cpu_vm_mask field of the corresponding memory descriptor.

When a CPU receives an Interprocessor Interrupt related to TLB flushing and verifies that it affects the set of page tables of its current process, it checks whether the state field of its cpu_tlbstate element is equal to TLBSTATE_LAZY. In this case, the kernel refuses to invalidate the TLB entries and removes the CPU index from the cpu_vm_mask field of the memory descriptor (both sides of this mechanism are sketched after the following list). This has two consequences:





• As long as the CPU remains in lazy TLB mode, it will not receive other Interprocessor Interrupts related to TLB flushing.




• If the CPU switches to another process that is using the same set of page tables as the kernel thread that is being replaced, the kernel invokes __flush_tlb() to invalidate all non-global TLB entries of the CPU.
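Both sides of this mechanism can be sketched as follows; the wrapper functions are invented for illustration, while the logic is modeled on (and much simplified from) the 2.6 i386 smp_invalidate_interrupt() and switch_mm() routines:

    /* IPI handler side: a lazy CPU opts out instead of flushing. */
    static void tlb_ipi_check(int cpu, struct mm_struct *flush_mm)
    {
            if (flush_mm == cpu_tlbstate[cpu].active_mm) {
                    if (cpu_tlbstate[cpu].state == TLBSTATE_OK)
                            __flush_tlb();   /* really invalidate */
                    else                     /* lazy: skip the flush and
                                                stop further IPIs */
                            cpu_clear(cpu, flush_mm->cpu_vm_mask);
            }
    }

    /* Process-switch side: apply a flush skipped while lazy. */
    static void switch_back_check(int cpu, struct mm_struct *prev,
                                  struct mm_struct *next)
    {
            if (prev == next) {              /* same page tables */
                    cpu_tlbstate[cpu].state = TLBSTATE_OK;
                    if (!cpu_test_and_set(cpu, next->cpu_vm_mask))
                            __flush_tlb();   /* deferred invalidation */
            }
    }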
