What delays a tombstone purge when using LCS in Cassandra


In a C* 1.2.x cluster we have 7 keyspaces, and each keyspace contains a column family that uses wide rows. The CF uses LCS. I periodically delete entries from the rows: initially each row may contain at most 1 entry per day; entries older than 3 months are deleted so that at most 1 entry per week is kept. I have been running this for a few months, but disk space isn't really being reclaimed, and I need to investigate why. It looks to me like the tombstones are not being purged. Each keyspace has around 1300 SSTable files (*-Data.db), and each file is around 130 MB in size (sstable_size_in_mb is 128). gc_grace_seconds is 864000 in each CF. tombstone_threshold is not specified, so it should default to 0.2. What should I look at to find out why disk space isn't reclaimed?
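
Before anything else, it's worth measuring how many tombstones are actually eligible to drop per SSTable. Newer sstablemetadata builds print an "Estimated droppable tombstones" line (1.2-era builds expose an estimated tombstone drop-times histogram instead). A minimal parsing sketch, assuming that line format is present in the output:

```python
import re

def droppable_ratio(metadata_output: str) -> float:
    """Pull the 'Estimated droppable tombstones' ratio out of captured
    sstablemetadata output (the exact line format varies by C* version)."""
    m = re.search(r"Estimated droppable tombstones:\s*([\d.]+)", metadata_output)
    if m is None:
        raise ValueError("no droppable-tombstone line found")
    return float(m.group(1))

# Illustrative output fragment only -- capture the real thing with
# something like: tools/bin/sstablemetadata <path>/*-Data.db
sample = "SSTable: ks-cf-ic-1234\nEstimated droppable tombstones: 0.31\n"
print(droppable_ratio(sample))  # -> 0.31
```

A ratio that stays high long after gc_grace_seconds has elapsed is a strong hint that compaction never revisits those SSTables.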

When we look at the source, we see that Cassandra will consider dropping a tombstone when an SSTable undergoes a compaction. Cassandra can only drop the tombstone if:

- The tombstone is older than gc_grace_seconds (a table property); and
- There is no SSTable outside of this compaction that both contains a fragment of the same partition the tombstone belongs to and holds data older than the tombstone (i.e., data the tombstone may still need to shadow).
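
The two conditions above can be sketched as a simplified model (this is not Cassandra's actual code; each outside SSTable is modeled as just a set of partitions plus a minimum data timestamp):

```python
import time

def tombstone_droppable(tombstone_ts_s, gc_grace_seconds,
                        overlapping_sstables, partition_key, now_s=None):
    """Simplified model of the two checks compaction makes before
    dropping a tombstone.

    overlapping_sstables: iterable of (partitions: set, min_timestamp_s)
    pairs describing SSTables *outside* the current compaction.
    """
    now_s = time.time() if now_s is None else now_s
    # 1. The tombstone must be older than gc_grace_seconds.
    if now_s - tombstone_ts_s < gc_grace_seconds:
        return False
    # 2. No outside SSTable may hold a fragment of the same partition
    #    with data older than the tombstone (data it could still shadow).
    for partitions, min_ts in overlapping_sstables:
        if partition_key in partitions and min_ts < tombstone_ts_s:
            return False
    return True
```

With LCS, wide partitions tend to have fragments spread across many levels, so the second check fails often, which is exactly the purge delay this thread is about.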

I was hoping for magic sauce here.

We are going to do a JMX-triggered LCS -> STCS -> LCS switch in a rolling fashion through the cluster. Switching the compaction strategy forces the LCS-structured SSTables to be restructured and the tombstones applied (in our version of Cassandra we can't force an LCS major compaction).
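
For reference, here is the cluster-wide CQL/nodetool variant of that procedure as a step list (the per-node JMX route above avoids altering the schema; the statements below are illustrative, and the LCS options shown assume the 128 MB sstable_size_in_mb from the question):

```python
def strategy_switch_plan(keyspace: str, table: str):
    """Sketch of the STCS round-trip: flip the table to STCS, force a
    major compaction so tombstones get applied, then restore LCS.
    Run one step at a time and watch cluster health in between."""
    stcs = "{'class': 'SizeTieredCompactionStrategy'}"
    lcs = "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 128}"
    return [
        ("cql",   f"ALTER TABLE {keyspace}.{table} WITH compaction = {stcs}"),
        ("shell", f"nodetool compact {keyspace} {table}"),
        ("cql",   f"ALTER TABLE {keyspace}.{table} WITH compaction = {lcs}"),
    ]

for kind, cmd in strategy_switch_plan("my_ks", "my_cf"):
    print(kind, cmd)
```

Note that the major compaction under STCS produces one huge SSTable per node, which LCS then has to re-level, so expect a lot of compaction I/O afterwards.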

There are nodetool/JMX commands to force compaction of specific SSTables, but that might screw up LCS's leveling. There are also JMX operations to reassign the level of SSTables, but again, that might foobar LCS if you muck with its structure.

What really should probably happen is that row tombstones should be placed in a separate sstable type that can be independently processed against "data" sstables to get the purge to occur. The tombstone sstable <-> data sstable processing doesn't remove the tombstone sstable, just removes tombstones from the tombstone sstable that are no longer needed after the data sstable was processed/pared/pruned. Perhaps these can be classified as "PURGE" tombstones for large scale data removals as opposed to more ad-hoc "DELETE" tombstones that would be intermingled with data. But who knows when that would be added to Cassandra.

Note: Keep in mind that the maximum overhead when using LCS is the sum of the N-1 lower levels. For example, given a maximum SSTable size of 160 megabytes, once past level 3 the overhead requirements expand drastically, from 1.7 terabytes at level 4 to 17 terabytes at level 5.
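
Those figures can be checked with back-of-the-envelope arithmetic, assuming level n targets roughly 10^n SSTables of the configured size:

```python
def lcs_cumulative_size_mb(sstable_size_mb: int, top_level: int) -> int:
    """Sum of the target sizes of levels 1..top_level, where level n
    holds roughly 10**n SSTables of sstable_size_mb each."""
    return sum(sstable_size_mb * 10 ** n for n in range(1, top_level + 1))

for level in (4, 5):
    mb = lcs_cumulative_size_mb(160, level)
    print(f"through L{level}: {mb / 1_000_000:.1f} TB")
# through L4: 1.8 TB, through L5: 17.8 TB -- matching the quoted
# 1.7 TB / 17 TB figures to rounding
```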

Thanks for the great explanation of LCS, @minaguib. I think this statement from DataStax is misleading, at least to me:

 at most 10% of space will be wasted by obsolete rows.

It depends on how we define "obsolete rows". If "obsolete rows" means all the rows that are due to be compacted away, then in your example those obsolete rows will be age=30, age=29, and age=28. We can end up wasting (N-1)/N of the space, since these "age" versions can sit in different levels.
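
As a toy model of that worst case (my simplification, not an LCS invariant): assume the same row has been rewritten once per level, so only the newest copy is live and every other level still carries a stale copy:

```python
def wasted_fraction(levels_holding_copy: int) -> float:
    """One live version plus (N-1) stale copies spread over N levels."""
    live = 1
    total = levels_holding_copy
    return (total - live) / total

# e.g. newest age=30 copy is live; age=29 and age=28 linger in lower
# levels -> 2/3 of the space for that row is wasted
print(wasted_fraction(3))
```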


Almost 60% of the data in the column families met the purge rules. Since a Cassandra delete is actually an upsert statement in disguise, it essentially translates to a momentary increase in db storage space until the gc_grace_seconds value is reached and compaction removes the tombstones. However, this was not the case.

Indeed, the ability to avoid tombstone creation by not binding specific parameters requires a combination of the DataStax Java Driver 3.0.0 and Cassandra 2.2+. Workarounds: if you are still running an earlier version of Cassandra (and at the time of this writing most production clusters may still be running Cassandra 2.0 or 2.1), there are a few options.

Question: Is there any way for me to force Cassandra to purge tombstones for all SSTables of a certain column family (running LCS)? I asked this question on IRC, and AFAIK the only way would be to switch to size-tiered compaction, issue a major compaction, and then switch back to LCS.