Aligning to cache line and knowing the cache line size


To prevent false sharing, I want to align each element of an array to a cache line. First, I need to know the size of a cache line so that I can give each element that many bytes. Second, I want the start of the array to be aligned to a cache line.

I am using Linux on an 8-core x86 platform. First, how do I find the cache line size? Second, how do I align to a cache line in C? I am using the gcc compiler.

So the layout would be as follows, for example, assuming a cache line size of 64:

element[0] occupies bytes 0-63
element[1] occupies bytes 64-127
element[2] occupies bytes 128-191

and so on, assuming of course that bytes 0-63 are aligned to a cache line.

To know the sizes, you need to look them up in the documentation for the processor; afaik there is no portable programmatic way to do it. On the plus side, most cache lines are of a standard size, based on Intel's standards. On x86, cache lines are 64 bytes. However, to prevent false sharing, you need to follow the guidelines of the processor you are targeting (Intel has some special notes on its NetBurst-based processors). Generally you need to align to 64 bytes for this (Intel states that you should also avoid crossing 16-byte boundaries).

To do this in C or C++, use the standard aligned_alloc function or one of the compiler-specific specifiers such as __attribute__((aligned(64))) (GCC) or __declspec(align(64)) (MSVC). To pad between members in a struct so that they land on different cache lines, you need to insert a member big enough to push the next member to the next 64-byte boundary.
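As a minimal sketch of the approach described above, assuming GCC, C11, and a 64-byte line: the names CACHE_LINE, padded_counter, and alloc_counters are hypothetical and chosen only for illustration. The attribute rounds the struct size up to 64, so consecutive array elements fall on separate lines, and aligned_alloc puts the first element on a line boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed line size; check it for your target CPU */

/* The alignment attribute also pads sizeof() up to a multiple of 64,
 * so each array element gets a cache line to itself. */
struct padded_counter {
    long value;
} __attribute__((aligned(CACHE_LINE)));

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each element should occupy exactly one cache line");

/* Allocate n elements starting on a cache-line boundary (C11 aligned_alloc).
 * The requested size is a multiple of CACHE_LINE, as aligned_alloc expects.
 * Returns NULL on failure; release the memory with free(). */
static struct padded_counter *alloc_counters(size_t n)
{
    return aligned_alloc(CACHE_LINE, n * sizeof(struct padded_counter));
}
```

With this layout, element[0] would occupy bytes 0-63 of the allocation, element[1] bytes 64-127, and so on, which matches the layout asked for in the question.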


I am using Linux on an 8-core x86 platform. First, how do I find the cache line size?

$ getconf LEVEL1_DCACHE_LINESIZE
64

Pass the value as a macro definition to the compiler.

$ gcc -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` ...
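A sketch of how the source might consume that macro, assuming GCC. The struct per_thread_slot, the array slots, and the helper slot_stride are hypothetical names, and the fallback of 64 is an assumption for builds where the macro is missing or getconf reports 0 or "undefined".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Normally supplied on the command line via -DLEVEL1_DCACHE_LINESIZE=...
 * Falling back to 64 is an assumption, used when the value is absent or 0. */
#if !defined(LEVEL1_DCACHE_LINESIZE) || LEVEL1_DCACHE_LINESIZE <= 0
#undef LEVEL1_DCACHE_LINESIZE
#define LEVEL1_DCACHE_LINESIZE 64
#endif

/* One slot per thread; the attribute aligns and pads each slot to a line. */
struct per_thread_slot {
    unsigned long hits;
} __attribute__((aligned(LEVEL1_DCACHE_LINESIZE)));

/* 8 slots to match the 8-core machine in the question. */
static struct per_thread_slot slots[8];

/* Distance in bytes between consecutive slots. */
static size_t slot_stride(void)
{
    return sizeof slots[0];
}
```

Because the value is fixed at build time, a binary built this way bakes in the line size of the build machine, which may differ from the machine it later runs on.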

At run time, sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can be used to get the L1 data cache line size.
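A minimal sketch of the run-time query, assuming glibc on Linux. The wrapper name cache_line_size and its fallback parameter are hypothetical; the fallback is there because sysconf can return 0 or -1 when the value is not known (for example in some virtualized environments).

```c
#include <assert.h>
#include <unistd.h>

/* Returns the L1 data cache line size in bytes, or `fallback` when the
 * system cannot report it. _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension,
 * hence the #ifdef guard. */
static long cache_line_size(long fallback)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return fallback;
}
```

A caller might use cache_line_size(64) once at startup and pass the result to an aligned allocator.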


Another simple way is to read /proc/cpuinfo:

cat /proc/cpuinfo | grep cache_alignment


There's no completely portable way to get the cache line size. But if you're on x86 or x86-64, you can call the cpuid instruction to get everything you need to know about the cache, including its size, the cache line size, how many levels there are, and so on.

http://softpixel.com/~cwright/programming/simd/cpuid.php

(Scroll down a little; the page is about SIMD, but it has a section on getting the cache line size.)
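One way this could look with GCC's <cpuid.h> helper, as a sketch rather than a complete cache enumeration: it reads only the CLFLUSH line size from CPUID leaf 1 (EBX bits 15:8, in units of 8 bytes), which on current x86 processors matches the data cache line size. The function name cpuid_line_size is hypothetical, and returning 0 on non-x86 targets is an assumption made so the code still compiles there.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns the CLFLUSH line size in bytes as reported by CPUID leaf 1,
 * or 0 if CPUID is unavailable or the CPU does not report the field. */
static unsigned cpuid_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        /* EDX bit 19 (CLFSH) says whether the EBX[15:8] field is valid. */
        if (edx & (1u << 19))
            return ((ebx >> 8) & 0xffu) * 8;
    }
#endif
    return 0;
}
```

Per-level details (size, associativity, line size of each cache) need other CPUID leaves, which this sketch does not cover.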

As for aligning your data structures, there's also no completely portable way to do it. GCC and VS10 have different ways to specify alignment of a struct. One way to "hack" it is to pad your struct with unused variables until it matches the alignment you want.
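The padding "hack" mentioned above could be sketched like this, assuming a 64-byte line. The names counter_data, padded_slot, and CACHE_LINE are hypothetical. Note that padding only fixes the element size; the array's starting address still has to be aligned separately.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size */

/* The payload each thread actually uses. */
struct counter_data {
    long value;
    int  flags;
};

/* Pad the payload with unused bytes so the whole struct fills one line.
 * This works without compiler extensions, but only controls the size. */
struct padded_slot {
    struct counter_data data;
    char pad[CACHE_LINE - sizeof(struct counter_data)];
};

static_assert(sizeof(struct padded_slot) == CACHE_LINE,
              "padding should bring the struct to exactly one cache line");
```

If the payload ever grows past 64 bytes the pad array size underflows and the build fails, which is a reasonable early warning.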

To align memory that you would otherwise get from malloc(), all the mainstream compilers also provide aligned allocation functions for that purpose.


posix_memalign or valloc can be used to align allocated memory to a cache line (valloc is obsolete and returns page-aligned memory, which is also cache-line aligned).
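A short sketch of the posix_memalign route, assuming a POSIX system and gcc's default (GNU) mode, where the function is declared in <stdlib.h>. The wrapper name alloc_line_aligned is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `bytes` bytes whose address is a multiple of `line`.
 * `line` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the memory with free(). */
static void *alloc_line_aligned(size_t line, size_t bytes)
{
    void *p = NULL;
    if (posix_memalign(&p, line, bytes) != 0)
        return NULL;
    return p;
}
```

For the layout in the question, something like alloc_line_aligned(64, 64 * n) would give n line-sized, line-aligned elements.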


Comments
  • Perhaps this can help: stackoverflow.com/questions/794632/…
  • But it doesn't show how to align to a cache using gcc.
  • Possible duplicate of Programmatically get the cache line size?
  • It's not a bad idea to use a compile-time constant of 64 bytes as the cache-line size, so the compiler can bake that into functions that care about it. Making the compiler generate code for a runtime-variable cache line size could eat up some of the benefit of aligning things, especially in cases of auto-vectorization where it helps the compiler make better code if it knows a pointer is aligned to a cache line width (which is wider than the SIMD vector width).
  • @MetallicPriest: gcc and g++ both support __attribute__
  • @MetallicPriest: mmap & VirtualAlloc allocate page-aligned memory; generally the allocation granularity is 64 KB (under Windows), and since 64 KB is a multiple of 64, it will be aligned properly.
  • You can get the cache line size programmatically. Check here. Also, you cannot generalize to 64-byte cache lines on x86. It is only true for recent processors.
  • C++11 adds alignas, which is a portable way of specifying alignment
  • @NoSenseEtAl alignas officially only supports alignment up till the size of the type std::max_align_t, which is typically the alignment requirement of a long double, aka 8 or 16 bytes - not 64 unfortunately. See for example stackoverflow.com/questions/49373287/…
  • Where are these sysconf()s specified? POSIX / IEEE Std 1003.1-20xx ?
  • @BrianCain pubs.opengroup.org/onlinepubs/9699919799/functions/sysconf.html
  • @BrianCain I use Linux, so I just did man sysconf. Linux is not exactly POSIX compliant, so the Linux-specific documentation is often more useful. Sometimes it is out of date, so you just egrep -nH -r /usr/include -e '\b_SC'.
  • In case of Mac, use sysctl hw.cachelinesize.
  • Usually it's so much better to have a compile-time-constant line size that I'd rather hard-code 64 than call sysconf. The compiler won't even know it's a power of 2, so you'll have to manually do stuff like offset = ptr & (linesize-1) for remainder or bit-scan + right-shift to implement division. You can't just use / in code that's performance-sensitive.
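The compile-time-constant approach from the comments above might look like the following sketch, assuming a hard-coded 64-byte line. The helper names line_offset, line_base, and round_up_to_line are hypothetical. With a constant power of two, the mask operations are explicit and the compiler can fold them without any division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* hard-coded, as the comments suggest */

static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "mask arithmetic below requires a power-of-two line size");

/* Byte offset of p within its cache line (p modulo CACHE_LINE). */
static uintptr_t line_offset(const void *p)
{
    return (uintptr_t)p & (CACHE_LINE - 1);
}

/* Address of the first byte of the cache line containing p. */
static uintptr_t line_base(const void *p)
{
    return (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Round n up to the next multiple of CACHE_LINE. */
static size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}
```

round_up_to_line is handy for sizing a buffer passed to an aligned allocator, since some of them expect the size to be a multiple of the alignment.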