C++ High Performance File Reading and Writing (C++14)
I’m writing a C++14 program to load text strings from a file, do some computation on them, and write the results back to another file. I’m on Linux, and the files are relatively large (O(10^6 lines)). My typical approach is to use the old C sscanf utilities to read and parse the input, and fprintf(FILE*, …) to write the output. This works, but I’m wondering whether there’s a better way, with the twin goals of high performance and an approach that's idiomatic for the modern C++ standard I’m using. I’ve heard that iostream is quite slow; if that’s true, is there a more recommended alternative?
Update: To clarify the use case a bit: for each line of the input file, I'll be doing some text manipulation (data cleanup, etc.). Each line is independent. So loading the entire input file (or at least large chunks of it), processing it line by line, and then writing it out seems to make the most sense. The ideal abstraction would be an iterator over the read-in buffer, with each line being one entry. Is there a recommended way to do that with std::ifstream?
The fastest option, if you have the memory to do it, is to read the entire file into a buffer with 1 read, process the buffer in memory, and write it all out again with 1 write.
Read it all:
std::string buffer;
std::ifstream f("file.txt", std::ios::binary);
f.seekg(0, std::ios::end);
buffer.resize(f.tellg());
f.seekg(0);
f.read(&buffer[0], buffer.size());  // note: the non-const overload of std::string::data() is C++17, so use &buffer[0] in C++14
Then process it
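Since each line is independent, you can walk the in-memory buffer line by line without ever constructing a container of lines. A minimal sketch, where `clean()` is a hypothetical stand-in for whatever per-line cleanup you do (here it just uppercases):

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the per-line cleanup: uppercases the line.
std::string clean(const std::string& line) {
    std::string out = line;
    for (char& c : out)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}

// Walk the buffer in place: each iteration treats [pos, nl) as one line.
std::string process_buffer(const std::string& buffer) {
    std::string output;
    output.reserve(buffer.size());
    std::size_t pos = 0;
    while (pos < buffer.size()) {
        std::size_t nl = buffer.find('\n', pos);
        if (nl == std::string::npos) nl = buffer.size();  // last line may lack '\n'
        output += clean(buffer.substr(pos, nl - pos));
        output += '\n';
        pos = nl + 1;
    }
    return output;
}
```

In C++17 you could avoid the `substr` copy with `std::string_view`; in C++14 a pair of pointers into the buffer achieves the same thing.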
Then write it all:
std::ofstream f("out.txt", std::ios::binary);  // the question writes to a different file
f.write(buffer.data(), buffer.size());
I think you could read the file in parallel by creating n threads, each with its own offset (using David's method above), then pulling the data into separate areas that you map back to a single location. Check out ROMIO for ideas on how to maximize speed; the ROMIO ideas could be done in standard C++ without much trouble.
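A rough sketch of that parallel-read idea, assuming POSIX `pread` (thread-safe because each call carries its own offset). Note the chunk boundaries here are raw byte offsets, so real line-oriented code would still need to realign chunk edges to '\n' boundaries before parsing:

```cpp
#include <algorithm>
#include <string>
#include <thread>
#include <vector>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Read the whole file into one buffer using nthreads concurrent pread calls.
std::string parallel_read(const char* path, unsigned nthreads) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return {};
    struct stat st;
    ::fstat(fd, &st);
    std::string buffer(static_cast<std::size_t>(st.st_size), '\0');

    std::size_t chunk = (buffer.size() + nthreads - 1) / nthreads;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < nthreads; ++i) {
        std::size_t off = i * chunk;
        if (off >= buffer.size()) break;
        std::size_t len = std::min(chunk, buffer.size() - off);
        threads.emplace_back([fd, &buffer, off, len] {
            std::size_t done = 0;
            while (done < len) {  // pread may return short counts; loop until done
                ssize_t n = ::pread(fd, &buffer[off + done], len - done,
                                    static_cast<off_t>(off + done));
                if (n <= 0) break;
                done += static_cast<std::size_t>(n);
            }
        });
    }
    for (auto& t : threads) t.join();
    ::close(fd);
    return buffer;
}
```

Whether this beats a single large `read` depends heavily on the storage: it can help on NVMe or striped arrays, but on a single spinning disk the extra seeks usually make it slower.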
Another option is a three-stage pipeline. The first task reads the input file line by line and puts the lines into a queue. The second task gets lines from the queue, formats them, and puts the results into another queue. The third task gets the formatted results from the second queue and writes them to the resulting file.
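That three-stage pipeline could be sketched with a tiny mutex-protected queue like the one below. For simplicity the sketch uses an empty string as the end-of-stream marker (fine here since real lines would be non-empty; a production design would use a dedicated sentinel) and drives the stages from in-memory vectors rather than actual files:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Minimal blocking queue for passing lines between pipeline stages.
class LineQueue {
    std::queue<std::string> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(std::string s) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(s)); }
        cv_.notify_one();
    }
    std::string pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        std::string s = std::move(q_.front());
        q_.pop();
        return s;
    }
};

std::vector<std::string> run_pipeline(const std::vector<std::string>& input) {
    std::vector<std::string> output;
    LineQueue raw, formatted;

    std::thread reader([&] {               // stage 1: "read" lines
        for (const auto& line : input) raw.push(line);
        raw.push("");                      // end-of-stream marker
    });
    std::thread formatter([&] {            // stage 2: transform lines
        for (std::string s; !(s = raw.pop()).empty();)
            formatted.push("<" + s + ">"); // placeholder for real formatting
        formatted.push("");
    });
    std::thread writer([&] {               // stage 3: "write" results
        for (std::string s; !(s = formatted.pop()).empty();)
            output.push_back(s);
    });
    reader.join(); formatter.join(); writer.join();
    return output;
}
```

The payoff is overlap: while one line is being formatted, the next is being read and the previous one written, so the slowest stage (usually I/O) sets the pace rather than the sum of all three.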
If you have C++17 (std::filesystem), there is also this way, which gets the file's size through std::filesystem::file_size instead of seekg and tellg. I presume this would let you avoid the seek-to-end-and-back round trip.
It's shown in this answer
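For concreteness, a minimal C++17 sketch of that approach (the function name `read_all` is just for illustration):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Size the buffer up front with std::filesystem::file_size, then read the
// whole file in one call -- no seekg/tellg round trip required.
std::string read_all(const std::filesystem::path& p) {
    std::string buffer(std::filesystem::file_size(p), '\0');
    std::ifstream f(p, std::ios::binary);
    f.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    return buffer;
}
```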
Sequential file writing is very similar to reading, except that proper buffering is extremely important. Without buffering, when a program changes just one byte, the file system must fetch the disk block that contains the byte, modify the block, and then rewrite it. Buffering avoids this read-modify-write I/O behavior.
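One way to get bigger writes out of an ofstream is to hand it a large user-supplied buffer via `pubsetbuf`. A sketch (the name `write_lines` and the 1 MiB size are arbitrary; note that, to be portable, `pubsetbuf` must be called before the file is opened, and the buffer must outlive the stream):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Write n lines through a stream backed by a 1 MiB user buffer, so data is
// flushed to the OS in large chunks instead of many small writes.
void write_lines(const char* path, int n) {
    std::vector<char> buf(1 << 20);  // declared before the stream, so it is
                                     // destroyed after the stream flushes
    std::ofstream out;
    out.rdbuf()->pubsetbuf(buf.data(),
                           static_cast<std::streamsize>(buf.size()));
    out.open(path, std::ios::binary);
    for (int i = 0; i < n; ++i)
        out << "line " << i << '\n';
}
```

Alternatively, format into a std::string yourself and emit it with one `write()` call, as in the whole-buffer answer above.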