kdb q - save partitions in parallel

I have a function that "retrieves" (in this case just creates) a table and stores it in a new partition

npertable:100000;
dbname:`:partdb;

newpart:{[date]
    firstofmonth:"d"$"m"$date;
    table:([]date:npertable?firstofmonth+til 25;acc:npertable?`C123`C132`C321`C121`C131;c:npertable?til 100);
    table:`date`acc xasc table;
    partname:`$(("/" sv (string dbname;string "m"$date;enlist "t")),enlist "/");
    partname set .Q.en[dbname;table];
 };

Let's assume it takes a "long" time to create the table inside the function (e.g. it has a lot of rows). I would like to run the function on several threads, but this fails:

newpart peach 2018.03.01 2018.04.01 2018.05.01

with the error

ERROR: 'noupdate: `. `sym

This isn't so surprising: .Q.en has to update the sym variable and the sym file, and presumably that cannot be done from several threads at once.

Is there a way to store partitions in parallel at all in kdb?

Thanks for the help

You can do this, but it relies on knowing the full universe of possible symbols before the write-down. If, as it appears, these are some sort of account ID, you may well know all possible values in advance. In that case, create the sym vector in the main thread first, then peach the write-down, performing the enumeration with the $ operator. Unlike .Q.en, `sym$ only reads the global sym variable and never updates it, so it is safe to call from secondary threads. For example:

npertable:100000;
dbname:`:partdb;
sym:`C123`C132`C321`C121`C131;    //create sym vector
(` sv dbname,`sym) set sym;       //save sym vector in db

newpart:{[date]
    firstofmonth:"d"$"m"$date;
    table:([]date:npertable?firstofmonth+til 25;acc:npertable?`C123`C132`C321`C121`C131;c:npertable?til 100);
    table:`date`acc xasc table;
    partname:`$(("/" sv (string dbname;string "m"$date;enlist "t")),enlist "/");
    table:@[table;`acc;`sym$];    //enumerate acc column with hardcoded column name
    partname set table;           //table already enumerated, don't use .Q.en
 };

newpart peach 2018.03.01 2018.04.01 2018.05.01

Note that in this case the name of the column to be enumerated is hardcoded. In a more flexible implementation, you might use some modification of .Q.en to identify the columns that require enumeration and enumerate all of them automatically.
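A minimal sketch of that more flexible version, assuming the sym vector already holds every value that can occur. The name enumsyms is hypothetical (it is not part of .Q), and it only handles plain symbol columns, not nested ones:

```q
sym:`C123`C132`C321`C121`C131;            // assumed: complete symbol universe, already saved to the db

// enumsyms is a hypothetical helper, not a .Q function
enumsyms:{[t]
    symcols:where 11h=type each flip t;   // names of the plain symbol columns
    @[t;symcols;`sym$]                    // cast each of them to an enumeration over sym
 };

t:([]acc:`C123`C321;venue:`C121`C131;c:1 2);
et:enumsyms t;
meta et
```

Inside newpart, `partname set enumsyms table` would then replace the hardcoded @[table;`acc;`sym$] line. Because `sym$ signals an error for any value missing from sym, an unexpected symbol fails loudly instead of silently extending the sym file.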

Of course, if there is a chance of new values appearing in the acc field in your real newpart function, this poses a larger problem. Ideally, you would find out about any new values in the main thread before performing the peach, so that you can add them to the sym vector first.
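One way to arrange this, sketched under the assumption that all the tables fit in memory at once: split newpart into a slow build step and a write step, run the build in parallel, extend the sym file from the main thread, and only then peach the writes. The names buildtable and writepart are hypothetical, and I have not benchmarked this:

```q
npertable:100000;
dbname:`:partdb;

// buildtable and writepart are hypothetical names: newpart split into its slow half and its disk-writing half
buildtable:{[date]
    firstofmonth:"d"$"m"$date;
    table:([]date:npertable?firstofmonth+til 25;acc:npertable?`C123`C132`C321`C121`C131;c:npertable?til 100);
    `date`acc xasc table
 };

writepart:{[date;table]
    partname:`$(("/" sv (string dbname;string "m"$date;enlist "t")),enlist "/");
    partname set @[table;`acc;`sym$]                   // sym is only read here, never updated
 };

dates:2018.03.01 2018.04.01 2018.05.01;
tables:buildtable peach dates;                         // slow step in parallel, nothing written yet
(` sv dbname,`sym)?distinct raze {x`acc} each tables;  // main thread: extend the sym file and the sym variable
{writepart . x} peach dates,'enlist each tables;       // parallel write-down
```

The trade-off is that every table is held in memory and handed back to the main thread before anything is written, so this only helps when building the table, rather than writing it, is the expensive part.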

EDIT - I found an older online post from Kx which suggests that the approach below is not a good idea: "A handle must not be used concurrently between threads as there is no locking around a socket descriptor". I'll keep it here for reference.

It may be possible (though I haven't tested this thoroughly) to set up a separate writer process to handle the writing. You would then peach and send the data to the writer over a handle h, and the writer would in turn do the enumeration and writing. Something along the lines of:

{neg[h](`runThisFunc;onThisData);(neg h)[]} peach 1 2

I believe the flush, (neg h)[], is required. Newer versions of kdb+ allow large amounts of data to be sent via IPC, so that part shouldn't be a problem.

Again, I have never done this in a production setting, but in theory I can't think of why it wouldn't work.
