etcd is a distributed, highly available key-value datastore for state within a cluster. Kubernetes uses etcd as its cluster store to maintain configuration and other state. etcd is designed for large-scale distributed systems that cannot tolerate split-brain operation during network partitions and are willing to sacrifice availability to avoid it. In terms of the CAP theorem, etcd prefers consistency over availability. etcd can be used as a consistent key-value store for configuration management, service discovery, and coordinating distributed work.
etcd resolves write consistency issues using the Raft consensus algorithm.
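Raft commits a write only after a majority of cluster members have acknowledged it, which is why the minority side of a partition stops accepting writes. The following is a minimal sketch of that majority rule only, not of Raft itself; the function names `quorum` and `can_commit` are hypothetical and are not part of any etcd API.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def can_commit(members: int, reachable: int) -> bool:
    """A write can commit only if a majority of members can acknowledge it."""
    return reachable >= quorum(members)


# A 5-member cluster split 3/2 by a network partition:
# only the 3-member side still holds a majority.
majority_side = can_commit(5, 3)
minority_side = can_commit(5, 2)
```

Under this rule at most one side of any partition can hold a majority, so two sides can never both accept writes.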
etcd stores the physical data as key-value pairs in a persistent B+ tree. Each revision of the store's state contains only the delta from its previous revision. A single revision may correspond to multiple keys in the tree. The key of each key-value pair is a 3-tuple (major, sub, type): major is the store revision holding the key, sub differentiates among keys within the same revision, and type is an optional suffix for special values. The value of the key-value pair contains the modification from the previous revision. The B+ tree is ordered by key in lexical byte order. Ranged lookups over revision deltas are fast, which makes it quick to find the modifications from one specific revision to another. Compaction removes out-of-date key-value pairs.
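To illustrate why a byte-ordered tree makes revision range lookups cheap, here is a minimal sketch. It assumes the (major, sub) pair is encoded as fixed-width big-endian integers so that byte order matches numeric order, omits the optional type suffix, and uses a sorted Python list as a stand-in for the persistent B+ tree. The names `encode_key` and `RevisionLog` are hypothetical and do not come from etcd's code.

```python
import bisect
import struct


def encode_key(major: int, sub: int) -> bytes:
    # Fixed-width big-endian integers: comparing the bytes lexically
    # gives the same order as comparing (major, sub) numerically.
    return struct.pack(">Q", major) + b"_" + struct.pack(">Q", sub)


class RevisionLog:
    """Sorted list standing in for the persistent B+ tree (illustration only)."""

    def __init__(self):
        self._keys = []    # encoded revision keys, kept sorted
        self._values = []  # the delta stored under each key

    def put(self, major, sub, delta):
        key = encode_key(major, sub)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, delta)

    def changes_between(self, start_rev, end_rev):
        """Return the deltas with start_rev <= major < end_rev."""
        lo = bisect.bisect_left(self._keys, encode_key(start_rev, 0))
        hi = bisect.bisect_left(self._keys, encode_key(end_rev, 0))
        return self._values[lo:hi]


log = RevisionLog()
log.put(2, 0, ("put", "foo", "a"))
log.put(3, 0, ("put", "bar", "b"))
log.put(3, 1, ("put", "foo", "c"))  # second change within revision 3
log.put(4, 0, ("delete", "bar", None))

rev3_changes = log.changes_between(3, 4)
```

Because all entries of one revision sit next to each other in key order, finding the changes between two revisions takes two binary searches and one contiguous slice.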
etcd also keeps a secondary in-memory B-tree index to speed up range queries over keys. The keys in the B-tree index are the keys of the store exposed to the user. The value is a pointer to the modification in the persistent B+ tree. Compaction removes dead pointers.
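The secondary index can be pictured as a sorted map from user keys to revision pointers. The sketch below makes several assumptions: a sorted list plus a dictionary stands in for the in-memory B-tree, each pointer is a (major, sub) tuple, pointers are added in ascending revision order, and a "dead" pointer is one superseded by a newer pointer at or before the compaction revision. Deletions are not modeled. The class `KeyIndex` and its methods are hypothetical names, not etcd's API.

```python
import bisect


class KeyIndex:
    """Sorted user keys mapped to revision pointers (illustration only)."""

    def __init__(self):
        self._keys = []  # sorted user keys
        self._revs = {}  # user key -> list of (major, sub), ascending

    def put(self, key, rev):
        if key not in self._revs:
            bisect.insort(self._keys, key)
            self._revs[key] = []
        self._revs[key].append(rev)

    def range(self, start, end, at_rev):
        """For keys in [start, end), return (key, newest pointer <= at_rev)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        out = []
        for key in self._keys[lo:hi]:
            visible = [r for r in self._revs[key] if r[0] <= at_rev]
            if visible:
                out.append((key, visible[-1]))
        return out

    def compact(self, rev):
        """Drop pointers superseded at or before rev; keep the newest of them."""
        for key in self._keys:
            revs = self._revs[key]
            old = [r for r in revs if r[0] <= rev]
            if len(old) > 1:
                self._revs[key] = old[-1:] + [r for r in revs if r[0] > rev]


index = KeyIndex()
index.put(b"foo", (2, 0))
index.put(b"bar", (3, 0))
index.put(b"foo", (3, 1))
index.put(b"zed", (4, 0))

before = index.range(b"a", b"g", at_rev=2)  # foo is visible at revision 2
index.compact(3)                            # pointer (2, 0) for foo is now dead
after = index.range(b"a", b"g", at_rev=2)   # history before revision 3 is gone
latest = index.range(b"a", b"g", at_rev=4)
```

Each pointer returned by `range` would then be used as a key into the persistent tree to fetch the actual value.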