How can I asynchronously load data from large files in Qt?

I'm using Qt 5.2.1 to implement a program that reads in data from a file (could be a few bytes to a few GB) and visualises that data in a way that's dependent on every byte. My example here is a hex viewer.

One object does the reading, and emits a signal dataRead() when it's read a new block of data. The signal carries a pointer to a QByteArray like so:

filereader.cpp
void FileReader::startReading()
{

    /* Object state code here... */

        {
            QFile inFile(fileName);

            if (!inFile.open(QIODevice::ReadOnly))
            {
                changeState(STARTED, State(ERROR, QString()));
                return;
            }

            while(!inFile.atEnd())
            {
                QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
                qDebug() << "emitting dataRead()";
                emit dataRead(qa);
            }
        }

    /* Emit EOF signal */

}

The viewer has its loadData slot connected to this signal, and this is the function that displays the data:

hexviewer.cpp
void HexViewer::loadData(QByteArray *data)
{
    QString hexString = data->toHex();

    for (int i = 0; i < hexString.length(); i+=2)
    {
        _ui->hexTextView->insertPlainText(hexString.at(i));
        _ui->hexTextView->insertPlainText(hexString.at(i+1));
        _ui->hexTextView->insertPlainText(" ");
    }

    delete data;
}

The first problem is that if this is just run as-is, the GUI thread will become completely unresponsive. All of the dataRead() signals will be emitted before the GUI is ever redrawn.

(The full code can be run, and when you use a file bigger than about 1kB, you will see this behaviour.)

Going by the response to my forum post Non-blocking local file IO in Qt5 and the answer to another Stack Overflow question How to do async file io in qt?, the answer is: use threads. But neither of these answers goes into any detail as to how to shuffle the data itself around, nor how to avoid common errors and pitfalls.

If the data were small (of the order of a hundred bytes) I'd just emit it with the signal. But if the file is gigabytes in size, or sits on a network filesystem (e.g. NFS or a Samba share), I don't want the UI to lock up just because reading the file blocks.

The second problem is that the mechanics of using new in the emitter and delete in the receiver seems a bit naive: I'm effectively using the entire heap as a cross-thread queue.

Question 1: Does Qt have a better/idiomatic way to move data across threads while limiting memory consumption? Does it have a thread safe queue or other structures that can simplify this whole thing?

Question 2: Do I have to implement the threading etc. myself? I'm not a huge fan of reinventing wheels, especially regarding memory management and threading. Are there higher-level constructs that can already do this, like there are for network transport?

First of all, you don't have any multithreading in your app at all. Your FileReader class is a subclass of QThread, but it does not mean that all FileReader methods will be executed in another thread. In fact, all your operations are performed in the main (GUI) thread.

FileReader should be a QObject and not a QThread subclass. Then you create a basic QThread object and move your worker (reader) to it using QObject::moveToThread. You can read about this technique here.

Make sure you have registered FileReader::State type using qRegisterMetaType. This is necessary for Qt signal-slot connections to work across different threads.

An example:

HexViewer::HexViewer(QWidget *parent) :
    QMainWindow(parent),
    _ui(new Ui::HexViewer),
    _fileReader(new FileReader())
{
    qRegisterMetaType<FileReader::State>("FileReader::State");

    QThread *readerThread = new QThread(this);
    readerThread->setObjectName("ReaderThread");
    connect(readerThread, SIGNAL(finished()),
            _fileReader, SLOT(deleteLater()));
    _fileReader->moveToThread(readerThread);
    readerThread->start();

    _ui->setupUi(this);

    ...
}

void HexViewer::on_quitButton_clicked()
{
    _fileReader->thread()->quit();
    _fileReader->thread()->wait();

    qApp->quit();
}
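With the reader living in its own thread, the connections do the cross-thread marshalling for you. A minimal sketch (the `startReadingRequested()` signal is a hypothetical addition to HexViewer, and the by-value `dataRead(QByteArray)` signature is the one suggested later in this answer):

```cpp
// Both connections are automatically queued because _fileReader lives in
// ReaderThread, so loadData() always runs in the GUI thread.
connect(_fileReader, SIGNAL(dataRead(QByteArray)),
        this, SLOT(loadData(QByteArray)));

// startReadingRequested() is a hypothetical signal added to HexViewer.
connect(this, SIGNAL(startReadingRequested()),
        _fileReader, SLOT(startReading()));

// Don't call _fileReader->startReading() directly from the GUI thread --
// that would execute it in the GUI thread. Emit the signal instead:
emit startReadingRequested();
```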

Also it is not necessary to allocate data on the heap here:

while(!inFile.atEnd())
{
    QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}

QByteArray uses implicit sharing with atomic reference counting. Its contents are not copied again and again when you pass a QByteArray object between functions in a read-only fashion, and the same holds when a copy crosses a thread boundary through a queued signal-slot connection: the last copy to go out of scope frees the data.

Change the code above to this and forget about manual memory management:

while(!inFile.atEnd())
{
    QByteArray qa = inFile.read(DATA_SIZE);
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}

But anyway, the main problem is not with multithreading. The problem is that QTextEdit::insertPlainText is not a cheap operation, especially when you have a huge amount of data. FileReader reads file data quite quickly and then floods your widget with new chunks of data to display.

It must also be noted that your implementation of HexViewer::loadData is very inefficient. You insert the text character by character, which forces QTextEdit to constantly redraw its contents and freezes the GUI.

You should prepare the resulting hex string first (note that data parameter is not a pointer anymore):

void HexViewer::loadData(QByteArray data)
{
    QString tmp = data.toHex();

    QString hexString;
    hexString.reserve(tmp.size() * 1.5);

    const int hexLen = 2;

    for (int i = 0; i < tmp.size(); i += hexLen)
    {
        hexString.append(tmp.mid(i, hexLen) + " ");
    }

    _ui->hexTextView->insertPlainText(hexString);
}

Anyway, the bottleneck of your application is not file reading but QTextEdit updating. Loading data in chunks and then appending it to the widget with QTextEdit::insertPlainText will not speed anything up. For files smaller than about 1 MB, it is faster to read the whole file at once and set the resulting text on the widget in a single step.

I suppose you can't easily display huge texts (larger than several megabytes) using the default Qt widgets. That task requires a non-trivial approach which in general has nothing to do with multithreading or asynchronous data loading: it's all about creating a clever widget which doesn't try to display its huge contents at once.



  1. If you are planning to edit 10 GB files, forget about QTextEdit. ui->hexTextView->insertPlainText will simply eat all your memory before you have read a tenth of the file. IMO you should use a QTableView to present and edit the data. To do that, subclass QAbstractTableModel. Each row should present 16 bytes: the first 16 columns in hex form, and a final column in ASCII form. This shouldn't be too complex; just read the QAbstractTableModel documentation carefully. Caching the data will be the most important part here. If I have time, I will give a code example.

  2. Forget about using multiple threads. This is a bad case for them, and you will most probably create lots of synchronization problems.

OK, I had some time. Here is code which works (I've tested it, and it runs smoothly):

#include <QObject>
#include <QFile>
#include <QQueue>

class LargeFileCache : public QObject
{
    Q_OBJECT
public:
    explicit LargeFileCache(QObject *parent = 0);

    char geByte(qint64 pos);
    qint64 FileSize() const;

signals:

public slots:
    void SetFileName(const QString& filename);

private:
    static const int kPageSize;

    struct Page {
        qint64 offset;
        QByteArray data;
    };

private:
    int maxPageCount;
    qint64 fileSize;

    QFile file;
    QQueue<Page> pages;
};

#include <QAbstractTableModel>

class LargeFileCache;

class LageFileDataModel : public QAbstractTableModel
{
    Q_OBJECT
public:
    explicit LageFileDataModel(QObject *parent);

    // QAbstractTableModel
    int rowCount(const QModelIndex &parent) const;
    int columnCount(const QModelIndex &parent) const;
    QVariant data(const QModelIndex &index, int role) const;

signals:

public slots:
    void setFileName(const QString &fileName);

private:
    LargeFileCache *cachedData;
};

#include "lagefiledatamodel.h"
#include "largefilecache.h"

static const int kBytesPerRow = 16;

LageFileDataModel::LageFileDataModel(QObject *parent)
    : QAbstractTableModel(parent)
{
    cachedData = new LargeFileCache(this);
}

int LageFileDataModel::rowCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    return (cachedData->FileSize() + kBytesPerRow - 1)/kBytesPerRow;
}

int LageFileDataModel::columnCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    return kBytesPerRow;
}

QVariant LageFileDataModel::data(const QModelIndex &index, int role) const
{
    if (index.parent().isValid())
        return QVariant();
    if (index.isValid()) {
        if (role == Qt::DisplayRole) {
            qint64 pos = index.row()*kBytesPerRow + index.column();
            if (pos>=cachedData->FileSize())
                return QString();
            return QString::number((unsigned char)cachedData->geByte(pos), 16);
        }
    }

    return QVariant();
}

void LageFileDataModel::setFileName(const QString &fileName)
{
    beginResetModel();
    cachedData->SetFileName(fileName);
    endResetModel();
}

#include "largefilecache.h"

const int LargeFileCache::kPageSize = 1024*4;

LargeFileCache::LargeFileCache(QObject *parent)
    : QObject(parent)
    , maxPageCount(1024)
    , fileSize(0)   // no file until SetFileName() is called
{
}

char LargeFileCache::geByte(qint64 pos)
{
    if (pos >= fileSize)
        return 0;

    // Look for a cached page that already contains this position.
    for (int i = 0, n = pages.size(); i < n; ++i) {
        qint64 k = pos - pages.at(i).offset;
        if (k >= 0 && k < pages.at(i).data.size()) {
            // Move the hit page to the back of the queue (most recently used).
            pages.enqueue(pages.takeAt(i));
            return pages.back().data.at(int(k));
        }
    }

    // Cache miss: read the page containing pos from the file.
    Page newPage;
    newPage.offset = (pos / kPageSize) * kPageSize;
    file.seek(newPage.offset);
    newPage.data = file.read(kPageSize);
    pages.enqueue(newPage);   // newest page goes to the back...

    while (pages.count() > maxPageCount)
        pages.dequeue();      // ...so the front holds the least recently used page

    return newPage.data.at(int(pos - newPage.offset));
}

qint64 LargeFileCache::FileSize() const
{
    return fileSize;
}

void LargeFileCache::SetFileName(const QString& filename)
{
    pages.clear();   // drop pages cached from any previous file
    file.close();
    file.setFileName(filename);
    if (file.open(QFile::ReadOnly))
        fileSize = file.size();
    else
        fileSize = 0;
}

It is shorter than I expected, and it needs some improvement, but it should be a good base.
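Hooking the model up to a view might look like this (a minimal sketch; `mainWindow` and the file path are placeholders):

```cpp
// Assumes a QMainWindow *mainWindow is available.
LageFileDataModel *model = new LageFileDataModel(mainWindow);
QTableView *view = new QTableView(mainWindow);
view->setModel(model);
model->setFileName("/path/to/huge.bin");  // placeholder path
mainWindow->setCentralWidget(view);
```

Because QTableView only asks the model for the cells currently visible, only the pages of the file the user actually scrolls to are ever read.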


For a hex viewer, I don't think you're on the right track at all, unless you think it will most likely be used on systems with SCSI or RAID arrays for speed. Why load gigabytes of data at a time at all? A file access to fill up a text box happens pretty fast these days. Granted, Notepad++, for example, has an excellent hex viewer plugin, and you have to load the file first; but that's because the file may be edited, and that's the way NPP works.

I think you would likely wind up subclassing a text box: go and get enough data to load up the text box, or even splurge and load up 500k of data before and after the current position. Then, say you are starting at byte zero: load enough data for your display, and maybe some extra besides, but set the scrollbar policy to always visible. Then, I think you'll probably intercept the scroll events by subclassing QTextEdit, writing your own scrollContentsBy() and changeEvent() and/or paintEvent().

Even more simply, you could just create a QTextEdit with no scrollbars at all, and a vertical QScrollBar beside it. Set its range and starting value. Then respond to the valueChanged() signal and change the contents of the QTextEdit. That way, the user doesn't have to wait for a long disk read before they can start editing, and it'll be a lot easier on resources (i.e. memory, so that if a lot of apps are open they don't get swapped out to disk). Subclassing these things sounds hard, but it's often easier than it seems, and there are usually good examples of somebody doing something similar already.
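The "text box plus external scroll bar" idea might be sketched like this (`hexDumpOfRows()` and `visibleRowCount` are hypothetical helpers, and `fileSize` comes from wherever you opened the file):

```cpp
// Text view with its own scrolling disabled; an external bar drives it.
QPlainTextEdit *view = new QPlainTextEdit(parent);
view->setVerticalScrollBarPolicy(Qt::ScrollBarAlwaysOff);

QScrollBar *bar = new QScrollBar(Qt::Vertical, parent);
bar->setRange(0, int(fileSize / 16));  // one scroll step per 16-byte row

// On every scroll step, fetch just the rows that fit the viewport.
QObject::connect(bar, &QScrollBar::valueChanged, [=](int topRow) {
    view->setPlainText(hexDumpOfRows(topRow, visibleRowCount));
});
```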

If you have multiple threads reading a file, by contrast, you may have one reading from the beginning, another from the middle, and another towards the end. A single read head will be jumping around trying to satisfy all requests, and therefore operate less efficiently. If it is an SSD instead, non-linear reads won't hurt you, but they won't help you either. If you'd prefer the trade-off of a perhaps noticeable loading time, so that the user can then scroll around willy-nilly a little faster (a textbox full of data really doesn't take very long to read, after all), then you might have a single thread read the file in the background while the main one keeps processing the event loop. More simply yet, just read in blocks of n megabytes at a time as you load the whole file, and after every block do a qApp->processEvents() to let the GUI respond to any events that may have transpired in the meantime.
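The "blocks of n megabytes plus processEvents()" variant might look like this (`appendToView()` is a hypothetical function that hands the chunk to the UI):

```cpp
const qint64 chunkSize = 4 * 1024 * 1024;  // read 4 MB per block
QFile file(fileName);
if (file.open(QIODevice::ReadOnly)) {
    while (!file.atEnd()) {
        QByteArray chunk = file.read(chunkSize);
        appendToView(chunk);    // hypothetical: hand the chunk to the UI
        qApp->processEvents();  // let the GUI handle pending events
    }
}
```

Note that processEvents() re-enters the event loop, so you may want to guard against the user triggering a second load while the first is still running.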

If you do believe it will most likely be used on a SCSI or RAID array, then it may make sense to use multithreading for the reading. A SCSI drive can have multiple read heads, and some RAID arrays are set up to spread their data across multiple disks for speed. Note that you would be better off with a single reading thread if the RAID array is instead set up to keep multiple identical copies of the data for redundancy. When I went to implement multithreading, I found the lightweight model proposed in "QThread: You were not doing so wrong" most helpful. I did have to do Q_DECLARE_METATYPE on the result structure, define a constructor, destructor, and move operator for it (I used memmove), and call qRegisterMetaType() on both the structure and the vector holding the results, for it to return the results correctly. You pay the price of locking the vector in order to return its results, but the actual overhead of that didn't seem to be much at all. Shared memory might be worth pursuing in this context too; but maybe each thread could have its own, so you won't need to lock out reads from other threads' results in order to write it.





Comments
  • Regarding question 2: Take a look at QtConcurrent::run. It allows you to execute a function (or member function) asynchronously.
  • I would suggest the use of Worker Threads (example in the "Detailed Description" section here). About heap allocation, I cannot see the need for it; just pass a const ref. Implement a Controller (your HexViewer, I suppose) and Worker thread as shown in the example.
  • First point: good catch on the QThread inheritance. I'm aware it won't make my app multithreaded, it was left over from my initial attempts at getting something working. I specifically avoided making the demo code threaded at all to avoid answers based on a possibly-wrong approach I started.
  • Second (and main) point: I'm concerned about what will happen in the situation where multiple signals are emitted from the FileReader before the HexViewer processes them (eg. due to a quirk of threading, perhaps the UI thread doesn't wake up on every emit). Does the QByteArray cope with that? From reading the docs on implicit sharing, I gather that yes, it should hang around while it's in scope in either the signal emitter or receiver, and be deleted when both (a) the emitter scope closes and (b) the receiver scope closes. Correct?
  • But there's a marshalling process in between (to get it across the thread boundary) and I don't know enough about that to reason about it.
  • Third point: from my initial scratchings, the FileReader::State seemed to cross thread boundaries just fine. Was I courting undefined behaviour, or does Qt5 deal automatically with basic enum types?
  • 4. The current implementation of FileReader won't wait for anyone to dispose of its data. Currently FileReader just reads the file and sends its data chunks off to nowhere, and that's all. If there is a listener, it will eventually catch that data. The behaviour you want assumes some sort of request-response interaction: the widget requests some data, gets a response from the reader, displays the data, and then sends a new request for another part.
  • So I actually tried to implement a semaphore-based consumer/producer setup, but couldn't get it to work (see my repo @da1a1c7a). The problem is, the producer thread blocks when it tries to acquire the semaphore, and so the event loop in that thread can't process any more events. I haven't quite implemented what you've described (there's no third class that holds the data, it
  • @detly, you should have two semaphores defined, as shown in the cited example, and a circular buffer to put your data in. With one semaphore you are going to block, because while the producer is putting data in, the consumer cannot consume it. I would also try to make the FileReader stateless, and remove both the state information and the associated mutex. Please excuse me if I have not helped much.
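  • The two-semaphore setup described above can be sketched in plain C++ so it's self-contained (Qt's QSemaphore offers the same acquire/release interface, as in Qt's "Semaphores Example"; the Semaphore class here stands in for it because std::counting_semaphore needs C++20):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (stand-in for QSemaphore / std::counting_semaphore).
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Two-semaphore producer/consumer over a circular buffer: freeSlots
// throttles the producer (bounding memory use), usedSlots wakes the consumer.
std::vector<int> runPipeline(int total)
{
    const int kSlots = 4;              // ring buffer capacity
    std::vector<int> buffer(kSlots);
    Semaphore freeSlots(kSlots);       // producer waits for a free slot
    Semaphore usedSlots(0);            // consumer waits for filled slots
    std::vector<int> consumed;

    std::thread producer([&] {
        for (int i = 0; i < total; ++i) {
            freeSlots.acquire();
            buffer[i % kSlots] = i;    // stands in for "read a chunk of the file"
            usedSlots.release();
        }
    });
    std::thread consumer([&] {
        for (int i = 0; i < total; ++i) {
            usedSlots.acquire();
            consumed.push_back(buffer[i % kSlots]);  // "display the chunk"
            freeSlots.release();
        }
    });

    producer.join();
    consumer.join();
    return consumed;
}
```

Neither side blocks permanently because each waits on a different semaphore: the producer can run at most kSlots chunks ahead, which is exactly the bounded cross-thread queue asked about in Question 1.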
  • @detly , check the updated answer and let me know if you have gotten through your problem.
  • I'd rather not focus on the UI itself. The hex viewer is just an example so I can understand the proper way to do asynchronous file IO. The point is, in situations where I'm reading a lot of data from a file, or reading it off a slow filesystem (eg. samba share, NFS), I don't want the UI to freeze just because file reading is blocking. So, if not threads, how would I do that?
  • You have to think about the UI. If the files are very big, you must have interaction between the UI and the data model, so the model knows which part of the file the user is currently viewing and keeps only that piece of the file in memory. Threads here are completely unnecessary. You are trying to read the data in a blocking way, and that is the only reason you are reaching for threads.
  • Besides, the file doesn't have to be huge to be a problem. My sample app locks up if I try to display as little as 1 kB of data. That's not much.
  • You app freezes because you insert 1kb of data char by char. That's terribly inefficient.
  • Even fixing that problem, it behaves exactly the same.