I have a large dataset consisting of about 80,000 records. I want to import this into Cassandra. I only see documentation for CSV format. Is this possible for JSON?

In 2020, you can use the DataStax Bulk Loader utility (DSBulk) to load and unload Cassandra/DSE data in CSV and JSON formats. It is very flexible: it lets you load only part of the data, map JSON fields to table columns as needed, and so on. It supports Cassandra 2.1+ and is very fast.

In the simplest case, the load command looks like this:

dsbulk load -k keyspace -t table -c json -url your_file.json

The DataStax blog has a six-part series of articles about DSBulk.

To insert JSON data into a table, add the JSON keyword to the INSERT command. Note the absence of the VALUES keyword and of the column list that are present in other INSERT commands. For example, make a table with three columns, then write data into it using the JSON keyword. That tells Cassandra to map the JSON values to the column names, just as you would expect with JSON.

create table books (isbn text primary key, title text, publisher text);

To insert JSON data, add JSON to the INSERT command. Refer to this link for details.
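INSERT ... JSON handles one row per statement, so loading a whole file this way means generating one statement per record. Here is a minimal sketch in Python, assuming the file holds one JSON object per line and that the keys match the columns of the books table above. The function name insert_json_statements and the sample record are hypothetical and only for illustration.

```python
import json


def insert_json_statements(lines, table="books"):
    """Yield one CQL INSERT ... JSON statement per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Parse and re-serialise so that malformed input fails early.
        doc = json.dumps(json.loads(line))
        # Inside a CQL string literal, a single quote is escaped by doubling it.
        doc = doc.replace("'", "''")
        yield f"INSERT INTO {table} JSON '{doc}';"


if __name__ == "__main__":
    # Hypothetical sample record; replace with the lines of your own file.
    sample = ['{"isbn": "123", "title": "A Book", "publisher": "Someone\'s Press"}']
    for statement in insert_json_statements(sample):
        print(statement)
```

The printed statements can be saved to a file and run with cqlsh -f. For a dataset of 80,000 records this issues one statement per row, so it is likely to be slower than a bulk loader such as dsbulk.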

Treat the dsbulk solution as the definitive one. However, you may also consider this trick, which converts JSON-formatted messages (one per line) to CSV on the fly (no separate conversion step is necessary) and loads them into Cassandra using cqlsh:

cat file.json | jq -r '[.uid,.gender,.age] | @csv' | cqlsh -e 'COPY labdata.clients(uid,gender,age) from STDIN;'


This requires the jq utility, which can be installed on Ubuntu, for example, with apt install jq.

Here I have a file with the following messages:

{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}
{"uid": "d502331d-621e-4721-ada2-5d30b2c3801f", "gender": "M", "age": 32}

This is how I convert it to CSV on the fly:

cat file.json | jq -r '[.uid,.gender,.age] | @csv'

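where -r prints raw text instead of a JSON-encoded string, which removes the extra escaped quotes (\"). The string fields still come out quoted:

"d50192e5-c44e-4ae8-ae7a-7cfe67c8b777","F",19
"d502331d-621e-4721-ada2-5d30b2c3801f","M",32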

Now, if you create a table clients in keyspace labdata for this data using cqlsh:

CREATE TABLE labdata.clients ( uid ascii PRIMARY KEY, gender ascii, age int);

then you should be able to run the COPY ... FROM STDIN command above.
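If jq is not available, the same JSON-to-CSV conversion can be sketched with Python's standard library. This is only an illustration under the same assumptions as above (one JSON object per line, with the fields uid, gender and age); the function name json_lines_to_csv is hypothetical.

```python
import csv
import io
import json


def json_lines_to_csv(lines, fields=("uid", "gender", "age")):
    """Convert JSON lines to CSV text, quoting strings but not numbers."""
    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes every non-numeric field, like jq's @csv does here.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # A record that lacks one of the fields raises KeyError.
        writer.writerow([record[name] for name in fields])
    return out.getvalue()


if __name__ == "__main__":
    # The two sample messages from the file shown above.
    sample = [
        '{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}',
        '{"uid": "d502331d-621e-4721-ada2-5d30b2c3801f", "gender": "M", "age": 32}',
    ]
    print(json_lines_to_csv(sample), end="")
```

To use it in the pipeline, pass sys.stdin as lines and write the result to sys.stdout, then pipe the script's output into the same cqlsh COPY ... FROM STDIN command in place of the jq step.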

  • Which version of Cassandra are you using?
  • I have version 3.7.
  • Thanks! Worked for me, and is indeed fast. Could you please add the 'load' argument to the example? It complains without it.
  • I'm surprised there was no follow-up to this. It seems to me that the INSERT command is different from COPY, which loads from a file and, as far as I can see, still only speaks CSV. Can you please be more specific about how to load an entire JSON file? Basically, the inverse of sstabledump...?