Google Sparsehash uses realloc() on type which is not trivially copyable

Consider this simple program:

#include <string>
#include <sparsehash/dense_hash_map>

int main()
{
    google::dense_hash_map<std::string, int> map;
    map["foo"] = 0;
}

Compiling with GCC 8.2 and -Wclass-memaccess (or -Wall) produces a warning:

sparsehash/internal/libc_allocator_with_realloc.h:68:40: warning:
‘void* realloc(void*, size_t)’ moving an object of non-trivially copyable type
    ‘struct std::pair<const std::__cxx11::basic_string<char>, int>’;
use ‘new’ and ‘delete’ instead [-Wclass-memaccess]
    return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
                                ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

The questions are:

  1. Is it undefined behavior?
  2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?
  3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

I've filed an issue here: https://github.com/sparsehash/sparsehash/issues/149

  1. Yes, it is undefined behavior. But don't despair yet, iff std::string doesn't store any internal pointers in your implementation, nor registers them anywhere, it will "work" anyway; making a bitwise copy will be equivalent to move-construction at the destination and destructing the source. Such is the case for most (not all) string-implementations, SSO or not.

  2. If you might use a type not guaranteed to be trivially destructively-moveable, use a different allocator (last template-argument) to avoid bitwise-moves.

  3. Making a program blow up due to invalid moveing by bitwise copy is trivial. Use this type with google::dense_hash_map:

    class bang {
        bang* p;
    public:
        bang() : p(this) {}
        bang(bang const&) : bang() {}
        bang& operator=(bang const&) { return *this; }
        ~bang() { if (p != this) std::abort(); }
    };
    

sparsehash/sparsehash, Building Sparsehash trunk or 2.0.3 from source with GCC 8.2 gives various Discussion: https://stackoverflow.com/questions/52436264/google-sparsehash-​uses-realloc-on-type-which-is-not-trivially-copyable Prevent compiler warning about calling realloc() on an object which cannot be relocated in  A regular allocator (e.g. std::allocator) must be used to avoid undefined behavior unless is_trivially_copyable<value_type> is true. As Sparsehash is written today, the default is a custom allocator using realloc() which invokes undefined behavior for many common types, including std::string as either key or value type.

1. Is it undefined behavior? Yes. You should never copy objects using realloc(), because sometimes they have internal pointers that point to a resource. The problem comes later when 2 different objects have their destructors run. Now a double deallocation occurs of the same resource, a complete no no.

2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?

Try

#include <memory>

and change line

google::dense_hash_map<std::string, int> map;

to

google::dense_hash_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator> map;

Now, it won't use google's allocator libc_allocator_with_realloc

3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

Not easily. Because you are trying to cause undefined behavour. In your test program, I would feed strings that are at least 32 characters in length, so the small string optimisation does not kick in. And there are tests that can be done in gcc's heap to see if it has gone corrupt. See 1

sparsepp.hpp, Google Sparsehash uses realloc() on type which is not trivially copyable realloc(void*, size_t)' moving an object of non-trivially copyable type  2 Google Sparsehash uses realloc() on type which is not trivially copyable Sep 25 '18 2 Merge-Sort on linked list Jul 7 '18 1 MPI_Bcast c++ STL vector Jul 18 '18

I suppose this code anticipates the maybe c++20 class attribute trivially relocatable. In essence, this is an object which memory location can safely be changed. In c++ parlance, this is an object that can safely be copied by coping the object representation and the program keeps the expected behavior as long as the copied object is not accessed any more, not even for destruction.

For example, this code might not be specified as "undefined behavior" by the C++20 standard:

alignas(string) unsigned char buffer[sizeof(string)];
auto p = new(buffer) string{"test"};
alignas(string) unsigned char buffer2[sizeof(string)];
memcpy(buffer,buffer2,sizeof(string));//create a new string object copy of *p;
auto np = reinterpret_cast<string*>(buffer2);
(*np)[0]="r";
// the object at p shall not be accessed, not even destroyed.

A type should not be trivially relocatable if it has a non static data member that refers to any part of itself:

struct fail{
    int a;
    int b;
    int* selected;
    fail(bool is_a,int i){
       if (is_a){ a=i; selected=&a;}
       else { b=i; selected=&b;}
       }
     };

Some implementation of linked list container could also not be trivially relocatable, for example if the container hold a member which is the root node. So dense_hash_map shall just not be used with those kind of self memory referencing types.

Sparsehash Internals, Neither the name of Google Inc. nor the names of its // contributors may be used native type, we need to // include a std lib header to detect this - not ideal, but we'll use), or else a pointer to // something that supports a Read()/Write() method. the array, assuming value_type has trivial copy // constructor and destructor,  C++ implementation of a memory efficient hash map and hash set - Tessil/sparse-map

Il2CppOutputProject/IL2CPP/external/google/sparsehash , Sparsehash also provides dense_hash_map and dense_hash_set , which What makes sparse_hash_map clever is that it uses a sparse array to but through clever bitmap manipulation techniques, Google's sparse No issues? std::string> MyHashMap; int main() { // Construct a sparse_hash_map  It is unacceptable behaviour because objects in C++ are not trivially copyable in general. Therefore `realloc()` cannot be used in generic code. However the ability to resize already allocated memory block can be very useful.

'[kdevplatform] /: update the google sparsehash classes', Type requirements: value_type is required to be Copy Constructible // and Use // <google/dense_hash_map> or <google/dense_hash_set> instead. reference operator*() const { return *pos; } pointer operator->() const (Really, we want it to have "trivial // move", because that's what realloc does. The point to note is that realloc() should only be used for dynamically allocated memory. If the memory is not dynamically allocated, then behavior is undefined. If the memory is not dynamically allocated, then behavior is undefined.

Pandas Loc Index and Column Condition, Likewise, insert() may invalidate +// pointers into the hashtable. + // NOTE: if Key or T are not POD types, the serializer MUST use + // placement-new to number of buckets, assuming value_type has trivial copy - // constructor and destructor. (Really, we want it to have "trivial - // move", because that's what realloc does. A copyable type is one that can be initialized or assigned from any other object of the same type (so is also movable by definition), with the stipulation that the value of the source does not change.

Comments
  • I can't recreate this using either GCC 8.1 or clang (using -Weverything) using sparsehash 2.0.3. What sparsehash version are you using?
  • @Yuushi: I'm using GCC 8.2 on 64-bit Linux (just edited the question from "GCC 8" to "GCC 8.2" in case it matters). I tried Sparsehash 2.0.2 and trunk (which should be newer than 2.0.3); both had the warning.
  • @Yuushi: I've now tried with Sparsehash 2.0.3 and it produces the same warning. In fact, I built it from source and GCC 8.2 issues similar warnings about class-memaccess while building Sparsehash itself.
  • From dense_hash_map documentation: "Alloc The STL allocator to use. By default, uses the provided allocator libc_allocator_with_realloc, which likely gives better performance than other STL allocators due to its built-in support for realloc, which this container takes advantage of." This kind-of seems that for (at least) non-triviallycopyable types one should provide another allocator to dense_hash_map. If fact, I don't think that realloc can be used with any types according to the standard, since it does not construct objects in the allocated memory.
  • @DanielLangr: That's helpful, thanks. The mysteries to me right now are (1) why such a seemingly major limitation is not called out in the comments or docs, and (2) why even building Sparsehash itself causes these sorts of warnings (making it seem like Sparsehash has UB even before you actually use it in an application).
  • The C and C++ languages have long been caught in a Catch-22: they don't want to add new definitions for constructs which have been in use for decades without need for an official definition, but some compiler writers insist that constructs that aren't "officially" defined aren't really part of the language. The notion that some types of objects can be safely relocated and others can't goes back to 1986; if the language survived three decades without it, why should the Committee recognize that it's needed "now"?
  • @supercat They could decide to make the standard c++ specification a representation of the standard behavior ;)! Or not!
  • The Committee likely thought the only time it would matter whether the Standard actually mandated something would be in cases where an implementer had a good reason for wanting to do something different; even in those cases the implementer's judgment would likely be better than that of a Committee with no idea what reasons an implementer might have for doing something unusual. They have no diplomatic way of handling the case where an implementation does something silly for no reason beyond the fact that the Standard didn't mandate the common behavior.
  • @supercat Sentence as this one [expr.rel] If both operands (after conversions) are of arithmetic or enumeration type, each of the operators shall yield true if the specified relationship is true and false if it is false. showes that the standard is also the result of an intention to formalize common/standard implementation (non silly one at least). But formalism always demonstrates their limit when applied to reality. The same happen in countries with a tradition of written law. In those countries, unscrupulous can follow the written law while they violate the spirit of the law.
  • @supercat The concept of "quality implementation" is something you are coming back recurrently. I can't find definition on internet for this, but I have read a lot of your comments. The idea is that the standard shall be more "relaxed" and offer opportunities to implementation to show their bests? Something wiser?