rqlite Design
rqlite has been in development since 2014, and its design and implementation have evolved substantially during that time. The distributed consensus system has changed, the API has improved enormously, and support for automatic clustering and node discovery was introduced along the way.
High-level design
The diagram below shows a high-level view of an rqlite node, as it’s currently implemented.
Design presentations
There has also been a series of presentations to various groups – both industry and academic.
- Build your own Distributed System using Go, given at GopherCon 2023. While not specifically about rqlite, it explains the key principles behind building a system such as rqlite. You can also view a recording of this talk on YouTube.
- Presentation given to Hacker Nights NYC, March 2022.
- Presentation given to the Carnegie Mellon Database Group, September 2021. There is also a video recording of the talk.
- Presentation given to the University of Pittsburgh, April 2018.
- Presentation given at the GoSF April 2016 Meetup.
Blog posts
The most important design articles, linked below, show how the database has evolved through the years:
- Introduction to replicating SQLite with Raft (2014)
- Moving to an upgraded Raft consensus system (2016)
- Building an rqlite discovery service using AWS Lambda (2017)
- Moving from JSON to Protocol Buffers for internal data structures (2020)
- Comparing disk usage across database releases (2021)
- 7 years of open-source database development - lessons learned (2021)
- The evolution of a distributed database design (2021)
- Static linking and smaller Docker images (2021)
- Designing node discovery and automatic clustering (2022)
- Evaluating rqlite consistency with Jepsen-style testing (2022)
- Trading durability for write performance (2022)
- How rqlite exposed a bug in SQLite (2022)
- 9 years of open-source database development (2023)
- Adding large data set support (2023)
You can find many other details on rqlite from the rqlite blog.
Other Design Details
Raft
The Raft layer always creates a file: the Raft log. This log stores the set of committed SQLite commands, in the order in which they were executed. The log is the authoritative record of every change that has happened to the system. It may also contain some read-only queries as entries, depending on read-consistency choices. Since every node in an rqlite cluster applies the log entries in exactly the same way, the SQLite database is guaranteed to be the same on every node.
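The guarantee described above follows from deterministic replay: the same entries, applied in the same order, always produce the same state. The following is a minimal sketch of that idea only, not rqlite's implementation. The `entry` type, the `replay` function, and the use of a map in place of a SQLite database are all hypothetical simplifications.

```go
package main

import "fmt"

// entry is a hypothetical log record. A key/value write stands in
// for a committed SQL statement.
type entry struct {
	Key, Value string
}

// replay builds a fresh state (a map standing in for the SQLite
// database) by applying every entry strictly in log order.
func replay(entries []entry) map[string]string {
	state := make(map[string]string)
	for _, e := range entries {
		state[e.Key] = e.Value
	}
	return state
}

func main() {
	entries := []entry{{"a", "1"}, {"b", "2"}, {"a", "3"}}

	// Two "nodes" replaying the same log end up with identical state.
	node1 := replay(entries)
	node2 := replay(entries)
	fmt.Println(node1, node2)
}
```

Because `replay` depends only on its input, any node holding the same log reaches the same result, which is the property the Raft log provides.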
Log Compaction and Truncation
rqlite automatically performs log compaction, so that disk usage due to the log remains bounded. After a configurable number of changes, rqlite snapshots the SQLite database and truncates the Raft log. This is a technical feature of the Raft consensus system, and most users of rqlite need not be concerned with it.
SQLite
SQLite runs in WAL mode and with SYNCHRONOUS=off. In normal operation this configuration risks database corruption in the event of a crash, but does provide substantially better write performance. However, since the SQLite database is completely recreated every time rqlited starts, using the information stored in the Raft log, corruption is a non-issue.
Autoclustering
When using Automatic Bootstrapping, each node notifies all other nodes of its existence. The first node to have been contacted by enough other nodes (set by -bootstrap-expect) bootstraps the cluster. Only one node can bootstrap a cluster, so any other node that attempts to do so later will fail, and instead become a Follower in the new cluster.
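The threshold behaviour can be sketched as follows. This is a toy model only: the `node` and `cluster` types are hypothetical, the exact way rqlite counts contacts against the expected value is an assumption, and in a real cluster there is no shared `cluster` object (the single-bootstrap rule is enforced by the consensus system itself).

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyBootstrapped = errors.New("cluster already bootstrapped")

// cluster is a toy stand-in for "has a cluster been formed yet?".
type cluster struct{ bootstrappedBy string }

// bootstrap succeeds for the first caller only.
func (c *cluster) bootstrap(id string) error {
	if c.bootstrappedBy != "" {
		return errAlreadyBootstrapped
	}
	c.bootstrappedBy = id
	return nil
}

// node tracks which peers have announced themselves to it.
type node struct {
	id       string
	expect   int             // hypothetical stand-in for the expected-node setting
	contacts map[string]bool // distinct peers seen so far
}

func newNode(id string, expect int) *node {
	return &node{id: id, expect: expect, contacts: make(map[string]bool)}
}

// notify records a contact from peer and reports whether enough
// distinct peers have now been seen to attempt a bootstrap.
func (n *node) notify(peer string) bool {
	n.contacts[peer] = true
	return len(n.contacts) >= n.expect
}

func main() {
	c := &cluster{}
	a, b := newNode("a", 2), newNode("b", 2)

	a.notify("b")
	if a.notify("c") {
		fmt.Println("a bootstrap error:", c.bootstrap(a.id))
	}

	b.notify("a")
	if b.notify("c") {
		// b reaches its threshold later, so its attempt fails and it
		// would instead become a Follower.
		fmt.Println("b bootstrap error:", c.bootstrap(b.id))
	}
}
```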
When using either Consul or etcd for automatic clustering, rqlite uses the key-value store of those systems. Each node attempts to atomically set a special key, writing its HTTP and Raft network addresses as the value for that key. Only one node will succeed in doing this. That node then declares itself Leader, and the other nodes join it. To prevent multiple nodes updating the Leader key at once, each node uses a check-and-set operation, only updating the special key if its value has not changed since the node last read it. See this blog post for more details on the design.
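The check-and-set pattern described above can be sketched with an in-memory store. This is an illustration only: the `kv` type and `tryLead` function are hypothetical, the version counter is one common way to detect change since the last read, and the real Consul and etcd clients expose their own, different APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is an in-memory stand-in for a single key in Consul or etcd.
type kv struct {
	mu      sync.Mutex
	value   string
	version int // incremented on every successful write
}

// get returns the current value and the version it was read at.
func (s *kv) get() (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value, s.version
}

// checkAndSet writes value only if the key has not changed since it
// was read at readVersion. It reports whether the write happened.
func (s *kv) checkAndSet(value string, readVersion int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != readVersion {
		return false
	}
	s.value = value
	s.version++
	return true
}

// tryLead attempts to claim the key for addr. It returns the address
// stored under the key and whether this caller won.
func tryLead(s *kv, addr string) (leader string, won bool) {
	cur, ver := s.get()
	if cur != "" {
		return cur, false // someone already claimed the key
	}
	if s.checkAndSet(addr, ver) {
		return addr, true
	}
	cur, _ = s.get() // lost the race: read the winner's address
	return cur, false
}

func main() {
	s := &kv{}
	addrs := []string{"node1:4001", "node2:4001", "node3:4001"}

	var wg sync.WaitGroup
	winners := make(chan string, len(addrs))
	for _, a := range addrs {
		wg.Add(1)
		go func(addr string) {
			defer wg.Done()
			if _, won := tryLead(s, addr); won {
				winners <- addr
			}
		}(a)
	}
	wg.Wait()
	close(winners)

	fmt.Println("number of winners:", len(winners))
}
```

However the goroutines interleave, at most one `checkAndSet` call can observe the initial version, so exactly one node wins and the rest learn the winner's address.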
For DNS-based discovery, the rqlite nodes resolve the hostname. Once the number of returned addresses is at least as great as the -bootstrap-expect value, the nodes will attempt a bootstrap. Bootstrapping proceeds as though the network addresses were passed at the command line via -join.