Been reading up on Cassandra, and I get the feeling that it's REALLY not fault tolerant, is it?
I mean, take a very simple scenario: an incoming write. You write to the WAL, then to the memtable, and then mark in the WAL that the write succeeded. Then the server crashes before the memtable gets full, so it's not flushed to disk as an SSTable, meaning I just lost this write, and I won't be able to redo it since it's marked as "Done" in the WAL.
Am I missing something here, or is it really not fault tolerant? That seems very weird to me since it's used in so many places and for so much data, which makes me think I'm missing something.
The commit log is written to before the memtable. You just write the mutation; there is no marking the mutation as applied to the memtable. The mutation is not removed from the commit log until after the memtable has been completely flushed to a new SSTable, so after a crash any unflushed mutations are replayed from the commit log on startup.
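To make that ordering concrete, here is a rough toy model in Python. It is only a sketch of the idea, not how Cassandra is actually implemented: `ToyNode` and all of its methods are made-up names, the "commit log" and "SSTables" are plain in-memory lists standing in for files on disk, and it assumes every commit log append is already durable.

```python
class ToyNode:
    """Toy model of the write path described above; not Cassandra's actual code."""

    def __init__(self):
        self.commitlog = []  # stands in for the on-disk commit log (survives a crash)
        self.memtable = {}   # in-memory only (lost on a crash)
        self.sstables = []   # stands in for on-disk SSTables (survive a crash)

    def write(self, key, value):
        self.commitlog.append((key, value))  # 1. append the mutation to the commit log
        self.memtable[key] = value           # 2. then apply it to the memtable

    def flush(self):
        # Write the memtable out as a new SSTable; only after that are the
        # commit log entries it covered discarded.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commitlog = []

    def crash_and_restart(self):
        self.memtable = {}                 # everything in memory is gone
        for key, value in self.commitlog:  # replay whatever was never flushed
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if key in sstable:
                return sstable[key]
        return None


node = ToyNode()
node.write("a", 1)
node.flush()              # "a" now lives in an SSTable
node.write("b", 2)        # written, but the memtable is never flushed
node.crash_and_restart()
print(node.read("a"), node.read("b"))  # prints: 1 2
```

The write to "b" survives the crash because nothing ever marked it as done in the log; it just stays there until a flush makes it redundant.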
Although it is important to know that some commitlog sync strategies don't block the write ack on the commitlog flush, so you can still have a data-loss window that is only protected by RF. So it's important to understand consistency levels and replication factors for durability in those cases as well. In 4.0+ I think the group commitlog sync is a great option between batch and periodic.
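A rough way to picture that loss window on a single node, again as a toy Python sketch rather than anything resembling the real implementation. `ToyCommitLog` and `lost_after_crash` are hypothetical names, the three modes are simplified down to "when does the ack happen relative to the sync", and the real sync timing is controlled by settings in cassandra.yaml that are not modelled here.

```python
class ToyCommitLog:
    """Simplified model: tracks which writes were acked and which were synced to disk."""

    def __init__(self, mode):
        assert mode in ("periodic", "batch", "group")
        self.mode = mode
        self.unsynced = []  # appended but not yet synced to disk
        self.synced = []    # safely on disk
        self.acked = []     # writes the client was told succeeded

    def append(self, mutation):
        self.unsynced.append(mutation)
        if self.mode == "periodic":
            self.acked.append(mutation)  # ack right away, sync happens later on a timer
        elif self.mode == "batch":
            self.sync()                  # sync first, ack only afterwards
        # "group": the ack is held back until the next sync() of the whole group

    def sync(self):
        for mutation in self.unsynced:
            self.synced.append(mutation)
            if mutation not in self.acked:
                self.acked.append(mutation)
        self.unsynced = []

    def crash(self):
        """Drop unsynced data and return the acked writes that were lost."""
        self.unsynced = []
        return [m for m in self.acked if m not in self.synced]


def lost_after_crash(mode):
    log = ToyCommitLog(mode)
    log.append("w1")
    log.sync()        # the timer / group window fires once
    log.append("w2")  # the node crashes before the next sync
    return log.crash()


for mode in ("periodic", "batch", "group"):
    print(mode, lost_after_crash(mode))
# periodic ['w2']
# batch []
# group []
```

In the periodic case the client got an ack for "w2" that this node can no longer honor, which is exactly where other replicas (RF) and the consistency level come in. In the group case "w2" is also gone from this node, but it was never acked, so the client knows the write did not succeed.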