I recently went through the exciting task of decommissioning a Cassandra data center. Although the documentation on the web is pretty good for this, I thought it was worth a post on a couple of the technicalities and "what to expect" questions I had before starting the process. The basic steps to remove a DC in Cassandra are outlined in the Cassandra 1.2 documentation. To recap, the basic steps are to:
- Turn off all clients writing to the DC (don't forget Datastax-Agents)
- Run a repair to ensure your data is replicated out
- Change all Keyspaces so they don't reference the datacenter
- Run nodetool decommission on each node
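For the keyspace step, the change boils down to re-issuing each keyspace's replication settings without the outgoing data center. Here is a minimal Python sketch that only builds the CQL statement, assuming the keyspaces use NetworkTopologyStrategy. The helper name `alter_keyspace_without_dc` and the keyspace and data center names in the example are made up for illustration; you would still run the resulting statement yourself in cqlsh.

```python
def alter_keyspace_without_dc(keyspace, replication, dc_to_remove):
    """Build an ALTER KEYSPACE statement that omits one data center.

    `replication` maps data center name -> replica count, the way
    NetworkTopologyStrategy options are written.
    """
    remaining = {dc: rf for dc, rf in replication.items() if dc != dc_to_remove}
    if not remaining:
        # Dropping the only data center would leave the keyspace with no replicas.
        raise ValueError("keyspace %r would have no replicas left" % keyspace)
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in sorted(remaining.items())]
    return 'ALTER KEYSPACE "%s" WITH replication = {%s};' % (keyspace, ", ".join(options))


if __name__ == "__main__":
    # Hypothetical keyspace and data center names, for illustration only.
    print(alter_keyspace_without_dc("my_keyspace", {"DC_MAIN": 3, "DC_BYE_BYE": 3}, "DC_BYE_BYE"))
```

Generating the statement from the current settings, rather than typing it by hand, makes it harder to accidentally change the replica counts of the data centers you are keeping.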
Of note: the decommission process is actually pretty fast. According to the decommission documentation, the node will be "streaming its data to the next node on the ring", but in practice there is not much data to stream, since you did step 3 and no Keyspaces are assigned to this DC. The command took around a minute to run for me. I also had a few nodes that were no longer up in the ring because they had been cannibalized for other tasks; for those, nodetool removenode had the same end result as decommission.
One thing to note is that decommission and removenode both create quite a bit of load on the other data centers in the cluster. It wasn't enough to noticeably move the read/write response times, but it was enough that I wouldn't do a bunch of nodes all at once. I put them in a loop with a 10-minute sleep in between. Sure, the script takes a long time, but safety first in production.
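A loop like that can be sketched in a few lines of Python. This is not the script I actually ran, just an illustration under some assumptions: `nodetool` is on the PATH of the machine running the script, it can reach each node with `-h`, and the host addresses are placeholders. The `run` and `sleep` parameters are only there so the loop can be dry-run without touching a cluster.

```python
import subprocess
import time


def nodetool_decommission(host):
    """Run `nodetool -h <host> decommission` and raise if it exits non-zero."""
    subprocess.run(["nodetool", "-h", host, "decommission"], check=True)


def decommission_all(hosts, pause_seconds=600, run=nodetool_decommission, sleep=time.sleep):
    """Decommission hosts one at a time, pausing between consecutive nodes."""
    done = []
    for index, host in enumerate(hosts):
        if index > 0:
            # Give the remaining data centers time to settle before the next node.
            sleep(pause_seconds)
        run(host)
        done.append(host)
    return done


if __name__ == "__main__":
    # Dry run with placeholder addresses: prints what would happen, runs nothing.
    decommission_all(
        ["10.0.0.11", "10.0.0.12"],
        run=lambda host: print("would decommission", host),
        sleep=lambda seconds: print("would sleep", seconds, "seconds"),
    )
```

Because `check=True` raises on the first failed command, the loop stops instead of ploughing on through the rest of the nodes, which is the behaviour you want in production.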
Finally, a neat little piece of info: if you want to see what you did, check out nodetool gossipinfo. It will still show all your removed nodes, along with which method was used to remove each one. I'm unsure how long this data stays in gossip; I might update this post once it disappears from my cluster. Possibly it's related to the gc_grace time. As a command, gossipinfo is super useful for seeing what your cluster is actually doing: it will show nodes that are bootstrapping but not fully up yet, before they show up in
nodetool status or
The first of these nodes was removed with decommission, the second with removenode:

    /10.x.x.x
      HOST_ID:25a22f35-5084-431d-9333-988749e80c1a
      DC:DC_BYE_BYE
      STATUS:LEFT,1401043794383,55045677001916516148487068849256622533
    /10.x.x.x
      REMOVAL_COORDINATOR:REMOVER,76765f8d-07bc-495a-9e61-2a5ebbe58eb3
      HOST_ID:48c08179-685d-41cb-89b8-8d09d4ae5b26
      DC:DC_BYE_BYE
      STATUS:removed,48c08179-685d-41cb-89b8-8d09d4ae5b26,1401036719006
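If you have a lot of nodes to check, the same information can be pulled out of the gossipinfo text with a small parser. The sketch below rests on what the output above shows and nothing more: endpoint lines start with "/", each field sits on its own line, `STATUS:LEFT` means the node was decommissioned and `STATUS:removed` means it went through removenode. The function name `removal_methods` is my own, and other Cassandra versions may print things differently.

```python
def removal_methods(gossipinfo_text):
    """Return (endpoint, method) pairs for nodes that have left the ring.

    Assumes one field per line, with endpoint lines starting with "/".
    """
    results = []
    endpoint = None
    for raw_line in gossipinfo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("/"):
            endpoint = line
        elif line.startswith("STATUS:") and endpoint is not None:
            # The state name is the first comma-separated value after "STATUS:".
            state = line[len("STATUS:"):].split(",")[0]
            if state == "LEFT":
                results.append((endpoint, "decommission"))
            elif state == "removed":
                results.append((endpoint, "removenode"))
    return results
```

Feed it the captured output of nodetool gossipinfo as a string. Nodes in any other state are simply skipped, so it is safe to run against a cluster that still has live nodes.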