Change Data Capture¶
Overview¶
Change data capture (CDC) provides a mechanism to flag specific tables for archival as well as rejecting writes to those
tables once a configurable size-on-disk for the combined flushed and unflushed CDC-log is reached. An operator can
enable CDC on a table by setting the table property cdc=true
(either when creating the table or altering it), after which any CommitLogSegments containing
data for a CDC-enabled table are moved to the directory specified in cassandra.yaml
on segment discard. A threshold
of total disk space allowed is specified in the yaml at which time newly allocated CommitLogSegments will not allow CDC
data until a consumer parses and removes data from the destination archival directory.
Configuration¶
Enabling or disable CDC on a table¶
CDC is enable or disable through the cdc table property, for instance:
CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
ALTER TABLE foo WITH cdc=true;
ALTER TABLE foo WITH cdc=false;
cassandra.yaml parameters¶
The following cassandra.yaml are available for CDC:
cdc_enabled
(default: false)- Enable or disable CDC operations node-wide.
cdc_raw_directory
(default:$CASSANDRA_HOME/data/cdc_raw
)- Destination for CommitLogSegments to be moved after all corresponding memtables are flushed.
cdc_free_space_in_mb
: (default: min of 4096 and 1/8th volume space)- Calculated as sum of all active CommitLogSegments that permit CDC + all flushed CDC segments in
cdc_raw_directory
. cdc_free_space_check_interval_ms
(default: 250)- When at capacity, we limit the frequency with which we re-calculate the space taken up by
cdc_raw_directory
to prevent burning CPU cycles unnecessarily. Default is to check 4 times per second.
Reading CommitLogSegments¶
This implementation included a refactor of CommitLogReplayer into CommitLogReader.java. Usage is fairly straightforward with a variety of signatures available for use. In order to handle mutations read from disk, implement CommitLogReadHandler.
Warnings¶
Do not enable CDC without some kind of consumption process in-place.
The initial implementation of Change Data Capture does not include a parser (see Reading CommitLogSegments above)
so, if CDC is enabled on a node and then on a table, the cdc_free_space_in_mb
will fill up and then writes to
CDC-enabled tables will be rejected unless some consumption process is in place.