Why is dynamically allocated memory always 16 bytes aligned?

I wrote a simple example:

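#include <cstdlib>   // std::malloc, std::free
#include <iostream>
#include <new>       // ::operator new, ::operator delete

int main() {
    // Request a single byte from each allocation function.
    void* byte1 = ::operator new(1);
    void* byte2 = ::operator new(1);
    void* byte3 = std::malloc(1);
    std::cout << "byte1: " << byte1 << std::endl;
    std::cout << "byte2: " << byte2 << std::endl;
    std::cout << "byte3: " << byte3 << std::endl;
    // Release each block with the matching deallocation function.
    ::operator delete(byte1);
    ::operator delete(byte2);
    std::free(byte3);
    return 0;
}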

Running the example, I get the following results:

byte1: 0x1f53e70
byte2: 0x1f53e90
byte3: 0x1f53eb0

Each time I allocate a single byte of memory, the returned address is aligned to 16 bytes. Why does this happen?

I tested this code on GCC 5.4.0 as well as GCC 7.4.0, and got the same results.

Why does this happen?

Because the standard says so. More specifically, it says that dynamic allocations [1] are aligned to at least the maximum fundamental [2] alignment (they may have stricter alignment). There is a predefined macro (since C++17) just for the purpose of telling you exactly what this guaranteed alignment is: __STDCPP_DEFAULT_NEW_ALIGNMENT__. Why this might be 16 in your example... that is a choice of the language implementation, restricted by what is allowed by the target hardware architecture.

This is (was) a necessary design, considering that there is (was) no way to pass information about the needed alignment to the allocation function (until C++17 which introduced aligned-new syntax for the purpose of allocating "over-aligned" memory).

malloc doesn't know anything about the types of objects that you intend to create in that memory. One might think that new could in theory deduce the alignment since it is given a type... but what if you wanted to reuse that memory for other objects with stricter alignment, as for example in an implementation of std::vector? And once you know the API of operator new, void* operator new ( std::size_t count ), you can see that neither the type nor its alignment is an argument that could affect the alignment of the allocation.

[1] Made by the default allocator, or the malloc family of functions.

[2] The maximum fundamental alignment is alignof(std::max_align_t). No fundamental type (arithmetic types, pointers) has stricter alignment than this.

Generating aligned memory: malloc() takes a single argument (the amount of memory to allocate in bytes), while calloc() needs two arguments. The minimum chunk size is 16 bytes on 32-bit systems and 24/32 (depending on alignment) bytes on 64-bit systems. The "standard" C way to get a stricter alignment is to allocate 16 bytes more than you need and then align the value returned from malloc, but you have to keep the original value as well, so that you can free() it later. I don't see a way to do it in Fortran though.

It is probably a consequence of how the memory allocator gets the necessary information to the deallocation function. A deallocation function (like free or the general, global operator delete) takes exactly one argument, the pointer to the allocated memory, with no indication of the size of the block that was requested (or of the size that was actually allocated, if that is larger). That information (and much more) therefore has to be provided to the deallocation function in some other form.

The simplest yet efficient approach is to allocate room for that additional information plus the requested bytes, and return a pointer to the end of the information block; let's call it IB. The size and alignment of IB automatically align the address returned by either malloc or operator new, even if you allocate a minuscule amount: the real amount allocated by malloc(s) is sizeof(IB)+s.

For such small allocations the approach is relatively wasteful and other strategies might be used, but having multiple allocation methods complicates deallocation, as the function must first determine which method was used.

Allocating aligned memory blocks: the address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). The aligned_alloc function allocates a block of size bytes whose address is a multiple of alignment. A memory address is just a number, and for it to be aligned to a 16-byte boundary means that the address is a multiple of 16 (think of all of the computer's memory as being divided into 16-byte chunks; an aligned address is the beginning of one of those chunks).

There are actually two reasons. The first reason is that some kinds of objects have alignment requirements. Usually, these requirements are soft: a misaligned access is "just" slower (possibly by orders of magnitude). They can also be hard: on the PPC, for instance, you simply could not access a vector in memory if that vector was not aligned to 16 bytes. Alignment is not optional; it must be considered when allocating memory. Always.

Note that there is no way to specify an alignment to malloc(). There's simply no argument for it. As such, malloc() must be implemented to provide a pointer that is correctly aligned for any purposes on the platform. The ::operator new() in C++ follows the same principle.

How much alignment is needed is fully platform dependent. On a PPC, there is no way that you can get away with less than 16 bytes alignment. X86 is a bit more lenient in this, afaik.


The second reason is the inner workings of an allocator function. Typical implementations have an allocator overhead of at least two pointers: whenever you request a byte from malloc(), it will usually need to allocate space for at least two additional pointers to do its own bookkeeping (the exact amount depends on the implementation). On a 64-bit architecture, that's 16 bytes. As such, it is not sensible for malloc() to think in terms of bytes; it's more efficient to think in terms of blocks of at least 16 bytes. You can see that with your example code: the resulting pointers are actually 32 bytes apart. Each memory block occupies 16 bytes of payload plus 16 bytes of internal bookkeeping memory.

Since the allocators request entire memory pages from the kernel (4096 bytes, 4096 bytes aligned!), the resulting memory blocks are naturally 16 bytes aligned on a 64 bit platform. It's simply not practical to provide less aligned memory allocations.


So, taking these two reasons together, it is both practical and required for an allocator function to provide seriously aligned memory blocks. The exact amount of alignment depends on the platform, but will usually not be less than the size of two pointers.

Processors do not always access memory in byte-sized chunks, but rather in chunks of 2, 4, 8, or even 16 or 32 bytes. Why are the last three bits of a block header always zero, and what is the significance of this? A block is always aligned to a multiple of eight, so the last three bits will always be zero. This means that those bits can be used for something else, such as indicating whether or not the block has been allocated.

Why does this happen?

Because in the general case the library does not know what kind of data you are going to store in that memory, so the memory has to be aligned for the data type with the strictest alignment requirement on that platform. If you store data unaligned, you will pay a significant hardware performance penalty. On some platforms you will even get a segfault if you try to access unaligned data.

It depends on the platform. On x86 alignment isn't strictly necessary, but it improves the performance of memory operations. As far as I know it makes no difference on newer models, but the compiler still goes for the optimum. On other platforms improper alignment is fatal: for example, accessing a long that is not aligned to 4 bytes will crash on an m68k processor.

Memory management: dynamically allocated memory is guaranteed to have the same alignment as any other memory allocated from the free store: 16 bytes. The CPU does not read from or write to memory one byte at a time. Instead, it accesses memory in chunks of 2, 4, 8, 16, or 32 bytes at a time. The reason is performance: accessing an address on a 4-byte or 16-byte boundary is a lot faster than accessing an address on a 1-byte boundary.

Comments
  • @MosheRabaev As far as I know, alignas is used on a specific variable or type. How can I set a default alignas for every object?
  • @MosheRabaev If there is a default alignment, does it apply to objects on the stack too?
  • There is no global alignas, I don't know what @MosheRabaev wants to say with the comment.
  • I have no clue why by default it's aligning to 16 bytes. I phrased it wrongly, I mean to say use alignas for custom behavior.
  • Is there any synonym for __STDCPP_DEFAULT_NEW_ALIGNMENT__ in C++11?
  • According to your explanation, __STDCPP_DEFAULT_NEW_ALIGNMENT__ is 16, which is consistent with my test result in gcc 7.4 with C++17. But I found the value of sizeof(std::max_align_t) is 32 in gcc 5.4 with C++11 and gcc 7.4 with C++17.
  • @jinge interesting. Then I may have gotten something wrong about their relation. I thought __STDCPP_DEFAULT_NEW_ALIGNMENT__ would have been bigger.
  • @eerorika Since C++17 [new.delete.single]/1 says that this overload of operator new only needs to return a pointer suitably aligned for any complete object type of the given size given that it doesn't have new-extended alignment, where new-extended means larger than __STDCPP_DEFAULT_NEW_ALIGNMENT__. I didn't find anything requiring this to be at least as large as the largest fundamental alignment, which is alignof(std::max_align_t) (I think you mixed up sizeof and alignof).
  • @jinge Try alignof(std::max_align_t) instead of sizeof(std::max_align_t) and you will get the same result as for __STDCPP_DEFAULT_NEW_ALIGNMENT__. As I mentioned in the comments above, this was probably a mistake by eerorika, but as I also mentioned I don't think the two values are required to be ordered in a certain way (I don't know for sure though.)
  • And on other platforms you may even read/write the wrong data because the CPU simply ignores the last few bits of the address... (That's even worse than a SEGFAULT, imho.)
  • @cmaster In some cases, an incorrect address is even decoded as a shift instruction on the one word at the correct address. That is you get a diff result, w/o an error indication.