Aligning to cache line and knowing the cache line size
To prevent false sharing, I want to align each element of an array to a cache line. First, I need to know the size of a cache line so that I can give each element that many bytes. Second, I want the start of the array to be aligned to a cache line.
I am using Linux on an 8-core x86 platform. First, how do I find the cache line size? Second, how do I align to a cache line in C? I am using the gcc compiler.
So the layout would be as follows, for example, assuming a cache line size of 64:
element occupies bytes 0-63
element occupies bytes 64-127
element occupies bytes 128-191
and so on, assuming of course that 0-63 is aligned to a cache line.
To know the sizes, you need to look them up in the documentation for the processor; afaik there is no portable programmatic way to do it. On the plus side, most cache lines are of a standard size, based on Intel's standards. On x86, cache lines are 64 bytes. However, to prevent false sharing you need to follow the guidelines of the processor you are targeting (Intel has some special notes on its NetBurst-based processors). Generally you need to align to 64 bytes for this (Intel states that you should also avoid crossing 16-byte boundaries).
To do this in C or C++, use the standard aligned_alloc function or one of the compiler-specific specifiers, such as __declspec(align(64)) for MSVC or __attribute__((aligned(64))) for GCC. To pad between members in a struct so that they land on different cache lines, insert a member big enough to push the next one to the next 64-byte boundary.
I am using Linux and 8-core x86 platform. First how do I find the cache line size.
$ getconf LEVEL1_DCACHE_LINESIZE
64
Pass the value as a macro definition to the compiler.
$ gcc -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` ...
sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can be used to get the L1 data cache line size at run time (this name is a glibc extension, not part of POSIX).
The correct way to do this is not to make your data structure bigger but to ensure that your threads access data at least one cache line apart, which avoids the problem altogether. This also gives you better cache usage, since each thread can work on a run of data from the cache before having to refetch from memory.
Another simple way is to just cat /proc/cpuinfo:
cat /proc/cpuinfo | grep cache_alignment
There's no completely portable way to get the cache line size. But if you're on x86/x64, you can call the cpuid instruction to get everything you need to know about the cache, including its size, the cache line size, how many levels there are, and so on.
(scroll down a little bit, the page is about SIMD, but it has a section getting the cacheline.)
As for aligning your data structures, there's also no completely portable way to do it. GCC and VS10 have different ways to specify alignment of a struct. One way to "hack" it is to pad your struct with unused variables until it matches the alignment you want.
To align your mallocs(), all the mainstream compilers also have aligned malloc functions for that purpose.
coherency_line_size level number_of_sets physical_line_partition shared_cpu_list shared_cpu_map size type ways_of_associativity
This gives you more information about the cache than you'd ever hope to know, including the cache line size (coherency_line_size) as well as which CPUs share this cache.
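posix_memalign can be used to align allocated memory to a cache line. valloc also works, since it returns page-aligned memory, but it is obsolete and rounds the alignment up to a whole page.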