Multi stream iterators c++

The purpose of my program is to open a text file of m lines of the same length n, read the file column by column and print each column.

For example, for this text file

abcd
efgh 
jklm

I would like to print

a e j
b f k
c g l
d h m

As one line can be 200 000 000 characters long and there can be more than 10 000 lines, I can't load the whole file into memory as a matrix.

In theory, I would like a program that uses O(m) space and O(m*n) time.

At first, I considered these solutions:

  • If I read the whole file once for each column, the complexity is O(m*n²).
  • If I use seekg with an array of positions and jump from position to position, the complexity is O(mnlog(n)).

Finally, because of some server constraints, I can only use the STL.

My latest idea is to create an array of iterators over the file and initialize each iterator at the beginning of a line. Then, to move to the next column, I only need to increment each iterator. This is my code:

ifstream str2;
str2.open ("Input/test.data", ifstream::in);

int nbline = 3;
int nbcolumn = 4;
int x = 0;

istreambuf_iterator<char> istart (str2);
istreambuf_iterator<char> iend ;

istreambuf_iterator<char>* iarray;
iarray = new istreambuf_iterator<char>[nbline];


while (istart != iend){
    if (x % nbcolumn == 0){
        iarray[x/nbcolumn] = istart;
    }
    istart++;
    x++;
}

for (int j = 0; j<nbcolumn;j++){
    for (int i = 0; i<nbline;i++){
        cout  << *iarray[i] << "\t";
        iarray[i]++;
    }
    cout << endl;
}

Sadly, it does not work, and I get this output:

a       e       f       
�       �       �       
�       �       �       
�       �       �       

I think the problem is that the iterators in iarray are not independent of istart. How can I make them independent?

You can break the task into chunks, then process each chunk before moving on to the next.

You'd need a buffer for each line (the larger it is, the better the performance) and the seek position for that row. You may also need to make an initial pass through the file to get the correct offset for each row.

Read B bytes into the buffer for each row (using tellg to save the position in each row), then loop over those and generate your output. Go back and read the next B bytes from each row (using seekg to set the file position beforehand, and tellg to remember it afterwards) and generate the output. Repeat until you're done, being careful with the last chunk (or with small inputs) to not go past the end of the line.

Using your example, you have 3 rows to keep track of. Using a B size of 2, you'd read in ab, ef, and jk into your 3 buffers. Looping over those you'd output aej and bfk. Go back and read the next chunks: cd, gh, and lm. This gives cgl and dhm as output.

I would do it like this:

  1. Open the source file.
  2. Measure the line size.
  3. Measure the line count (file size / (line size + size of EOL)). Note that the EOL can be 2 bytes.
  4. Calculate the result file size. Open the result file and force it to have its desired size, so that later you can seek to any part of the file.
  5. Pick a square size that is manageable in memory, for example 1024x1024.
  6. Load a square part of the matrix: 1024 elements from each of 1024 consecutive rows.
  7. Transpose the square.
  8. Write it to the destination file by seeking to the proper column of each row part you are writing. (You can reduce memory consumption in the previous step by transposing one column and writing it as a row, instead of transposing the whole square at once.)
  9. Iterate the square over the whole file matrix.

IMO you can't do much better. The most critical choice is the size of the square; a large power of 2 is recommended.
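The steps above could be sketched roughly as follows. This is an illustration under assumptions the answer does not state: all lines have equal length, the result is written as n lines of m characters each terminated by '\n', both files are opened in binary mode, and the helper names measure and transpose_file are hypothetical. Step 4 is done by writing blank rows first, which stays within the STL.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Steps 2-3: line length n, EOL size (1 for "\n", 2 for "\r\n") and line count m.
// Assumes every line has the same length; the stream must be opened in binary mode.
void measure(std::ifstream& in, std::size_t& n, std::size_t& m, std::size_t& eol) {
    std::string first;
    std::getline(in, first);
    eol = (!first.empty() && first.back() == '\r') ? 2 : 1;
    n = first.size() - (eol - 1);
    in.clear();
    in.seekg(0, std::ios::end);
    const auto size = static_cast<std::size_t>(static_cast<std::streamoff>(in.tellg()));
    m = (size + eol) / (n + eol);  // correct with or without a final EOL
    in.clear();
}

// Steps 4-9: transpose the m x n character matrix in in_path into out_path,
// one T x T square at a time (step 5: T is the square size).
void transpose_file(const std::string& in_path, const std::string& out_path, std::size_t T) {
    std::ifstream in(in_path, std::ios::binary);
    std::size_t n = 0, m = 0, eol = 1;
    measure(in, n, m, eol);

    {   // Step 4: give the result file its final size so any offset can be sought.
        std::ofstream init(out_path, std::ios::binary | std::ios::trunc);
        const std::string blank_row = std::string(m, ' ') + '\n';
        for (std::size_t c = 0; c < n; ++c) init.write(blank_row.data(), blank_row.size());
    }
    std::fstream out(out_path, std::ios::in | std::ios::out | std::ios::binary);

    std::vector<char> tile(T * T);
    std::string out_row;
    for (std::size_t r0 = 0; r0 < m; r0 += T) {
        const std::size_t rows = std::min(T, m - r0);
        for (std::size_t c0 = 0; c0 < n; c0 += T) {
            const std::size_t cols = std::min(T, n - c0);
            // Step 6: load `cols` characters from each of `rows` consecutive lines.
            for (std::size_t r = 0; r < rows; ++r) {
                in.seekg(static_cast<std::streamoff>((r0 + r) * (n + eol) + c0));
                in.read(&tile[r * cols], static_cast<std::streamsize>(cols));
            }
            // Steps 7-8: transpose one column at a time and write it into its output row.
            for (std::size_t c = 0; c < cols; ++c) {
                out_row.clear();
                for (std::size_t r = 0; r < rows; ++r) out_row += tile[r * cols + c];
                out.seekp(static_cast<std::streamoff>((c0 + c) * (m + 1) + r0));
                out.write(out_row.data(), static_cast<std::streamsize>(rows));
            }
        }
    }
}
```

For the question's example, transpose_file with T = 2 would produce a file containing the lines aej, bfk, cgl and dhm. In a real run, T would be the large power of 2 recommended above.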

If you want to do this using multiple std::istreambuf_iterators, then you will need multiple fstreams for them to act on. Otherwise, when you increment one (i.e. istart++), that affects all the iterators for that fstream, meaning that the next time you increment another one (i.e. *iarray[i]++) you will skip a character. This is explained more clearly in the reference. Consider this snippet:

std::ifstream str;
str.open("test.data", std::ifstream::in);

std::istreambuf_iterator<char> i1 (str);
std::istreambuf_iterator<char> i2 (str);

std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;
i1++;
std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;
i2++;
std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;

which will output

i1 - a   i2 - a
i1 - b   i2 - a
i1 - b   i2 - c

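Here i2 appears to have 'skipped' b in the stream. (The exact characters printed can differ between standard-library implementations, depending on whether an istreambuf_iterator caches the last character it read, but the two iterators are never independent.) Even if you assign the second iterator later, e.g.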

std::ifstream str;
str.open("test.data", std::ifstream::in);

std::istreambuf_iterator<char> i1 (str);
std::istreambuf_iterator<char> i2;
std::istreambuf_iterator<char> iend;

int x = 0;
while (i1 != iend) {
    if (x % 4 == 0) {
        i2 = i1;
        break;
    }
    x++;
    i1++;
}

std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;
i1++;
std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;
i2++;
std::cout << "i1 - " << *i1 << "   i2 - " << *i2 << std::endl;

the output remains the same:

i1 - a   i2 - a
i1 - b   i2 - a
i1 - b   i2 - c

Why?

Because in both cases the two iterators act on the same stream object, and every time you increment one of them it removes a character from the stream. In the code in the question, every iterator (istart and each iarray[i]) acts on the same stream object, so every increment of any of them removes a character from the stream. The output then quickly becomes the result of undefined behavior: iterating beyond the end-of-stream is undefined, and since all the iterators consume the same stream, you reach it quickly.


If you want to do this the way you have outlined, you simply need multiple fstream objects, such as

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    const int nbline = 3;
    const int nbcolumn = 4;

    // One stream per line, so each iterator has its own independent read position.
    std::vector<std::ifstream> streams(nbline);
    std::vector<std::istreambuf_iterator<char>> iarray;
    for (int ii = 0; ii < nbline; ii++) {
        streams[ii].open("test.data", std::ifstream::in);
        iarray.emplace_back(streams[ii]);
        // Skip to the start of line ii. Assumes '\n' line endings:
        // each line is nbcolumn characters plus one '\n'.
        std::advance(iarray[ii], (nbcolumn + 1) * ii);
    }

    // Print column ii by taking the next character from every line.
    for (int ii = 0; ii < nbcolumn; ii++) {
        for (int jj = 0; jj < nbline; jj++) {
            std::cout << *iarray[jj]++ << "\t";
        }
        std::cout << std::endl;
    }
}

This produces the output you are expecting:

a       e       j
b       f       k
c       g       l
d       h       m

I can make no comment on the speed of this method relative to others that have been suggested, but this is how you would do what you are asking using this method.

You cannot use an istreambuf_iterator twice; it can only be used once. Anyhow, I hope the code below helps you.

Let me first explain what I am trying to do. File reads are much faster when done sequentially, so what I do here is a buffered read. Say that in your example I buffer two output lines: then I have to allocate 6 bytes of buffer (2 bytes for each of the 3 input rows) and fill it using seeks, and each read reads two bytes per row. This could be optimized further: if you print the first character as soon as you read it, you could buffer two lines using just 3 bytes, and three lines using just 6 bytes, in your example. Anyhow, I am giving you the non-optimized version.

Again, let me remind you that you cannot use an istreambuf_iterator twice: see "How do I use an iterator on an ifstream twice in C++?"

If you have to use an iterator, you can implement your own iterator that seeks and reads in the file, though that can get really messy.

#include <iostream>
#include <fstream>
#include <vector>
#include <stdexcept>
#include <string>
#include <algorithm>

std::vector<std::size_t> getPositions(std::ifstream& str2, int &numcolumns) {
    std::vector<std::size_t> iarray;

    iarray.push_back(0); // Add first iterator

    bool newlinereached = false;
    int tmpcol = 0;
    int currentLine = 0;
    char currentChar = 0;
    char previosChar = 0;

    numcolumns = -1;

    for (str2.seekg(0, std::ios_base::beg); !str2.eof(); previosChar = currentChar) {
        const std::size_t currentPosition = str2.tellg();
        str2.read(&currentChar, 1);
        if (newlinereached) {
            if (currentChar == '\r') {
                // Always error but skip for now :)
                continue;
            }
            else if (currentChar == '\n') {
                // ERROR CONDITION WHEN if (numcolumns < 0) or previosChar == '\n'
                continue;
            }
            else if (tmpcol == 0) {
                throw std::runtime_error("Line " + std::to_string(currentLine) + " is empty");
            }
            else {
                if (numcolumns < 0) {
                    // We just found first column size
                    numcolumns = tmpcol;
                    iarray.reserve(numcolumns);
                }
                else if (tmpcol != numcolumns) {
                    throw std::runtime_error("Line " + std::to_string(currentLine)
                        + " has an inconsistent number of columns; it should have " + std::to_string(numcolumns));
                }

                iarray.push_back(currentPosition);
                tmpcol = 1;
                newlinereached = false;
            }
        }
        else if (currentChar == '\r' || currentChar == '\n') {
            newlinereached = true;
            ++currentLine;
        }
        else {
            tmpcol++;
        }
    }

    if (currentChar == 0) {
        throw std::runtime_error("Line " + std::to_string(currentLine)
            + " contains a 'null' character");
    }

    str2.clear(); // Clear the EOF flag so the stream can be sought and read again

    return iarray;
}

int main() {
    using namespace std;

    ifstream str2;
    str2.open("Text.txt", ifstream::in);
    if (!str2.is_open()) {
        cerr << "Failed to open the file" << endl;
        return 1;
    }

    int numinputcolumns = -1;

    std::vector<std::size_t> iarray =
        getPositions(str2, numinputcolumns); // S(N)

    const std::size_t numinputrows = iarray.size();

    std::vector<char> buffer;
    const int numlinestobuffer = std::min(2, numinputcolumns); // 1 for no buffering

    buffer.resize(numinputrows * numlinestobuffer); // S(N)

    for (int j = 0; j < numinputcolumns; j += numlinestobuffer)
    {
        // Seek and fill the buffer. Needed because sequential reads are much faster, even on an SSD.
        // This can still be optimized: we can buffer n+1 rows, as we can discard the current row read.
        const std::size_t nread = std::min(numlinestobuffer, numinputcolumns - j);
        for (std::size_t i = 0; i < numinputrows; ++i)
        {
            str2.seekg(iarray[i], ios_base::beg);
            str2.read(&buffer[i * numlinestobuffer], static_cast<std::streamsize>(nread));
            iarray[i] += nread;
        }

        // Print the buffer
        for (std::size_t b = 0; b < nread; ++b)
        {
            for (std::size_t k = 0; k < numinputrows; ++k) {
                std::cout << buffer[b + k * numlinestobuffer] << '\t';
            }
            std::cout << std::endl;
        }
    }

    return 0;
}

Comments
  • If you open the same file multiple times, you can have independent streams and thus independent iterators.
  • Alternatively, you could memory map the file into your process and treat it like one giant byte array. The OS will take care of paging in and out file contents as needed.
  • I assume the first output row should be a e j, not a e f.
  • How did you get O(mnlog(n)) for using seekg (and why would you need an array of positions instead of calculated offsets)? It's the logarithmic factor that looks wrong to me; where did that come from?
  • I do not understand. Complexity is usually a theoretical calculation. How does a complexity factor come "in practice" with no theoretical explanation?
  • Maybe you have a better implementation than mine, but I have already tried this method and it is really slow. Can you give me the code that you have in mind?
  • @B.Hel: I don't know why you say this is slow. I'm skeptical that any faster way exists.
  • This method should be very fast if you choose a buffer size that plays well with your disk block size. Go with 1KB or 4KB and you should get great performance.
  • I know that it is really slow because I have already implemented this version. Try it and compare with reading the file line by line. I am talking about files of 200Tb.
  • @B.Hel You want to maximize the amount of data you read every time you read, and minimize the number of seeks. Use 16K, 32K, 64K read buffers if you can. When possible, only read data once. If your row length is not a multiple of the storage unit of your OS (e.g. 2K clusters), this gets complicated as you need to maintain the end-of-line for all rows (read when the beginning of the next line was read), and have different offsets on each line that you need to read things in at. Once you do that, you can use unbuffered reads using OS API calls to avoid copying data around in memory.
  • For now, this is the method I use, but I need to open the file once per line, and if I have a lot of lines, that can be a problem.
  • Thank you for your response, but using seekg is really too slow.
  • @B.Hel: Thank you for your response. OK, there may be a solution for you. If you answer my questions, I can analyse its feasibility for you, just for Stack Overflow reputation. I can even create the complete algorithm for Solution 4, and even do the integration, but for that I need a better motivator than Stack Overflow reputation. To investigate further, I need your commitment to proceed this way. Please tell me how we are going to proceed. Thank you.
  • @B.Hel: Please find the questions in my post under "Solution for trade-off 4".
  • @B.Hel: BTW.: Solution 4 goes into direction of 1201programalarm's and marek-r's idea.