Importing JSON dataset into Cassandra


I have a large dataset consisting of about 80,000 records. I want to import this into Cassandra. I only see documentation for CSV format. Is this possible for JSON?

As of 2020, you can use the DataStax Bulk Loader utility (DSBulk) for loading and unloading Cassandra/DSE data in CSV and JSON formats. It's very flexible: it allows you to load only part of the data, flexibly map JSON fields onto table columns, and so on. It supports Cassandra 2.1+ and is very fast.

In the simplest case, the load command looks like this:

dsbulk load -k keyspace -t table -c json -url your_file.json

The DataStax blog has a six-part series of articles about DSBulk.

To insert JSON data, add the JSON keyword to the INSERT command. Note the absence of the VALUES keyword and of the column list that appear in other INSERT statements. Let's make a table with three columns and then write data into it using the JSON keyword; that tells Cassandra to map the JSON values to the column names, just as you would expect with JSON:

create table books (isbn text primary key, title text, publisher text);

To insert JSON data, add JSON to the INSERT command; refer to this link for details: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useInsertJSON.html
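As a sketch of how such statements can be generated in bulk from JSON records (this is not official tooling; the book data is made up, and actually executing the statements via cqlsh -f or a driver is left out):

```python
import json

def to_insert_json(table, record):
    # Serialize the record to JSON and double any single quotes,
    # which is the standard CQL escaping rule for string literals.
    payload = json.dumps(record).replace("'", "''")
    return f"INSERT INTO {table} JSON '{payload}';"

book = {"isbn": "978-0134494166", "title": "Clean Architecture",
        "publisher": "Prentice Hall"}
print(to_insert_json("books", book))
# INSERT INTO books JSON '{"isbn": "978-0134494166", ...}';
```

Each generated statement matches the INSERT ... JSON syntax described above, so the output file can be replayed against the books table.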

Native JSON support wasn't introduced until Cassandra 3.0 (see CASSANDRA-7970), and even then you still need to define a schema for your JSON data to map to. You do have some other options: use maps, which roughly correspond to JSON objects; maps can be indexed as of Cassandra 2.1 (CASSANDRA-4511). There is also a good Stack Exchange post about this.
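If you go the map route, note that a map&lt;text, text&gt; column can only hold flat key/value pairs, so nested JSON has to be flattened first. A minimal Python sketch (the dotted-key convention and sample document are invented for illustration):

```python
def flatten(obj, prefix=""):
    # Flatten nested JSON into a single-level dict of text keys and
    # values, suitable for insertion into a map<text, text> column.
    out = {}
    for k, v in obj.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "."))
        else:
            out[key] = str(v)
    return out

doc = {"user": {"name": "Ada", "age": 36}, "active": True}
print(flatten(doc))
# {'user.name': 'Ada', 'user.age': '36', 'active': 'True'}
```

The trade-off is that all values become text and type information is lost, which is part of why a real schema (or Cassandra 3.0's JSON support) is usually preferable.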

See the dsbulk solution as the ultimate one; however, you may consider this trick, which converts JSON-formatted messages (one per line) to CSV on the fly (no separate conversion step necessary) and loads them into Cassandra using cqlsh:

cat file.json | jq -r '[.uid,.gender,.age] | @csv' | cqlsh -e 'COPY labdata.clients(uid,gender,age) from STDIN;'

Explanations:

This requires the jq utility, which can be installed on Ubuntu, for example, with apt install jq.

Here I have a file with the following messages:

{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}
{"uid": "d502331d-621e-4721-ada2-5d30b2c3801f", "gender": "M", "age": 32}

This is how I convert it to csv on the fly:

cat file.json | jq -r '[.uid,.gender,.age] | @csv'

where -r makes jq output raw strings rather than JSON-escaped ones (removing the extra \"), but you still end up with quoted strings:

"d50192e5-c44e-4ae8-ae7a-7cfe67c8b777","F",19
"d502331d-621e-4721-ada2-5d30b2c3801f","M",32

Now, if you create a table clients in keyspace labdata for this data using cqlsh:

CREATE TABLE clients ( uid ascii PRIMARY KEY, gender ascii, age int);

then you should be able to run the COPY ... FROM STDIN command above.

Note that COPY FROM does not truncate the table before importing the new data; it adds to the preexisting data. When exporting data (COPY TO), the default is to output all columns from the Cassandra table metadata, in the order defined. If you only want to export a particular set of columns, you can specify the column names in parentheses after the table name.

Importing data into Cassandra is fairly simple if the data is small; for bulk data, things get more complicated.

Another option is an export/import script that works per keyspace or per table:

HOST=127.0.0.1 KEYSPACE=to_keyspace_name ./import

It will process all JSON files in the data directory and import them into the corresponding tables in the keyspace. To export/import a single table in a keyspace:

HOST=127.0.0.1 KEYSPACE=from_keyspace_name TABLE=my_table_name ./export
HOST=127.0.0.1 KEYSPACE=to_keyspace_name TABLE=my_table_name ./import

Comments
  • Which version of Cassandra are you using?
  • I have version 3.7.
  • Thanks! Worked for me, and is indeed fast. Could you please add the 'load' argument to the example? It complains without it.
  • I'm surprised there was no follow-up to this. It seems to me the INSERT command is different from COPY for loading from a file, which as far as I can see still only speaks CSV. Can you please be more specific about how to load an entire JSON file? Basically, the inverse of sstabledump...?