Why is std::string_view faster than const char*?

Or am I measuring something else?

In the code below I have a stack of tags (integers). Each tag has a string representation (const char* or std::string_view). In a loop, stack values are converted to their corresponding string values, which are then either appended to a preallocated string or assigned to an array element.

The results show that the version with std::string_view is slightly faster than the version with const char*.

Code:

#include <array>
#include <iostream>
#include <chrono>
#include <stack>
#include <string_view>

using namespace std;

int main()
{
    enum Tag : int { TAG_A, TAG_B, TAG_C, TAG_D, TAG_E, TAG_F };
    constexpr const char* tag_value[] = 
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };
    constexpr std::string_view tag_values[] =
        { "AAA", "BBB", "CCC", "DDD", "EEE", "FFF" };

    const size_t iterations = 10000;
    std::stack<Tag> stack_tag;
    std::string out;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;

    auto prepareForBenchmark = [&stack_tag, &out](){
        for(size_t i=0; i<iterations; i++)
            stack_tag.push(static_cast<Tag>(i%6));
        out.clear();
        out.reserve(iterations*10);
    };

// Append to string
    prepareForBenchmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_value[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << out[100] << "append string const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

    prepareForBenchmark();
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        out.append(tag_values[stack_tag.top()]);
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << out[100] << "append string string_view= " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

// Add to array
    prepareForBenchmark();
    std::array<const char*, iterations> cca;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        cca[i] = tag_value[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array const char* = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;

    prepareForBenchmark();
    std::array<std::string_view, iterations> ccsv;
    begin = std::chrono::steady_clock::now();
    for(size_t i=0; i<iterations; i++) {
        ccsv[i] = tag_values[stack_tag.top()];
        stack_tag.pop();
    }
    end = std::chrono::steady_clock::now();
    std::cout << "fill array string_view = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[µs]" << std::endl;
    std::cout << ccsv[ccsv.size()-1] << cca[cca.size()-1] << std::endl;

    return 0;
}

Results on my machine are:

Aappend string const char* = 97[µs]
Aappend string string_view= 72[µs]
fill array const char* = 35[µs]
fill array string_view = 18[µs]

Godbolt compiler explorer url: https://godbolt.org/z/SMrevx

UPD: Results after more accurate benchmarking (500 runs of 300000 iterations each):

Caverage append string const char* = 2636[µs]
Caverage append string string_view= 2096[µs]
average fill array const char* = 526[µs]
average fill array string_view = 568[µs]

Godbolt url: https://godbolt.org/z/aU7zL_

So in the second case const char* is faster, as expected. The first case is explained in the answers.

Simply because with std::string_view the length is passed along with the pointer, and you don't have to insert a null character whenever you want a new string. A char* has to search for the terminating null every time, and if you want a substring you'll probably have to copy it, since you need a null character at the end of the substring.

std::string_view is basically just a wrapper around a const char*. Passing a const char* means there is one less pointer indirection compared with passing a const std::string* (or const std::string&), because a std::string* implies something like string* -> char* -> char[]. Clearly, for the purpose of passing const arguments, the first pointer is superfluous.

std::string_view for practical purposes boils down to:

struct string_view {
  const char* __data_;
  size_t __size_;
};

The standard actually specifies (sec. 24.4.2 in C++17) that this is a pointer plus a size. It also specifies how certain operations interact with std::string: most notably, whenever you pass a string_view to std::string you end up in the overload that also takes the size as input. Hence str.append(sv) translates to str.append(sv.data(), sv.size()).

The significant difference is that you now know the size of the string after the append, which means you also know whether you will have to reallocate the internal buffer and how big it has to be. If you don't know the size up front, you could just start copying, but std::string gives the strong exception guarantee for append, so for practical purposes most implementations precompute the length and the required buffer size. (Technically it would also be possible to remember the old size and erase everything after it on failure; I doubt anyone does that, although it might be a valid local optimization for strings, since character destruction is trivial.)

std::string_view is not zero-terminated, because it's just a view onto another string. It has no influence on the string's data, so the only way to reliably know where the string ends is by storing the string's length. This also allows std::string_view to shrink the viewed string without modifying it: it simply reduces the stored length.

It may be because std::string_view stores the size of the string value. A const char* carries no size information, so the length has to be determined at run time (e.g. via strlen).

A std::string_view doesn't provide a conversion to const char* because it doesn't store a null-terminated string. It stores a pointer to the first element, and the length of the string, basically.


Visual Studio 2017 contains support for std::string_view, a type added in C++17 to serve some of the roles previously served by const char* and const std::string& parameters. string_view is neither a "better const std::string&" nor a "better const char*"; it is neither a superset nor a subset of either.

Good info here already, but a few higher-level comments: for many situations, the performance difference is not going to be big enough to worry about. So before rewriting code in a less readable and maintainable way, decide whether this difference actually matters for your use case.

Comments
  • From the disassembly it seems like the const char* variant calls strlen at run time. I would think the compiler can determine the length at compile time for string_view.
  • There's no reason to use std::endl here; you could make a \n part of the string you're printing anyway. Printing is outside the timed regions so it should be fine for a flush to happen or not, but probably the cout / stdio buffer will be big enough that no flushing has to happen until the end, if the output is going to a pipe. Not making a system call between each timed section is a Good Thing.
  • What about the benchmark with the array? Isn't it just a pointer being copied into the array, or is the string literal it references copied too?
  • @uni: do those numbers change if you benchmark the other one first? Your total benchmark is over so quickly that the CPU might just be ramping up to max turbo around then. Or the first array pays more in page-fault cost than the 2nd array. TL:DR: that part of your results is probably down to naive microbenchmarking methodology.
  • @uni: or maybe it's real; I tried reversing them, and running on Godbolt still shows fill array string_view = 19[µs] vs. 61[µs] for const char*. godbolt.org/z/MyUxqE. The loops look basically equivalent, assuming they never fall through to the part that calls operator delete. (Of course the string_view objects are 16 bytes wide and get copied with movdqa / movaps.) IDK, I'd have to try it locally with perf counters, or single-step to see if delete calls happen. Increasing the iteration count reduces the difference ratio somewhat: godbolt.org/z/jvM8Cr
  • @PeterCordes I followed your suggestion and increased iterations count and times benchmark runs to have time average across 500 runs. Here are results: iterations 10000 50000 100000 200000 300000 string_view 17 87 183 368 588 const char* 17 88 177 353 526 delta 0 -1 6 15 62 In this case the difference is minuscule and const char* is now faster
  • @uni: That makes more sense. Probably const char* is faster for large iteration counts because it's smaller, and those large sizes mean bigger arrays that start to get L1 or even L2 cache misses. (Modern x86-64 can copy an 8-byte object just as fast as a 16-byte object, especially when they're aligned.) Still not sure what was slowing down the small size without a repeat loop.