Skip to content

[Config] Document 'snapshot' and 'wal' configuration settings #4023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions doc/book/admin/backups.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,15 +37,15 @@ that are made after the last snapshot are incremental backups. Therefore taking
a backup is a matter of copying the snapshot and WAL files.

1. Use ``tar`` to make a (possibly compressed) copy of the latest .snap and .xlog
files on the :ref:`memtx_dir <cfg_basic-memtx_dir>` and
:ref:`wal_dir <cfg_basic-wal_dir>` directories.
files on the :ref:`snapshot.dir <configuration_reference_snapshot_dir>` and
:ref:`wal.dir <configuration_reference_wal_dir>` directories.

2. If there is a security policy, encrypt the .tar file.

3. Copy the .tar file to a safe place.

Later, restoring the database is a matter of taking the .tar file and putting
its contents back in the ``memtx_dir`` and ``wal_dir`` directories.
its contents back in the ``snapshot.dir`` and ``wal.dir`` directories.

.. _admin-backups-hot_backup_vinyl_memtx:

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
snapshot:
dir: 'var/lib/{{ instance_name }}/snapshots'
count: 3
by:
interval: 7200
wal_size: 1000000000000000000
iproto:
listen:
- uri: '127.0.0.1:3301'
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
instance001:
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
wal:
dir: 'var/lib/{{ instance_name }}/wals'
mode: 'write'
dir_rescan_delay: 3
cleanup_delay: 18000
max_size: 268435456
ext:
new: true
old: true
spaces:
bands:
old: false
iproto:
listen:
- uri: '127.0.0.1:3301'
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
instance001:
2 changes: 1 addition & 1 deletion doc/concepts/atomic/thread_model.rst
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ There are also several supplementary threads that serve additional capabilities:
Separate threads are required because each replica can point to a different position in the log and can run at different speeds.

* There is a thread pool for ad hoc asynchronous tasks,
such as a DNS resolver or :ref:`fsync <cfg_binary_logging_snapshots-wal_mode>`.
such as a DNS resolver or :ref:`fsync <configuration_reference_wal_mode>`.

* There are OpenMP threads used to parallelize sorting
(hence, to parallelize building :ref:`indexes <concepts-data_model_indexes>`).
Expand Down
5 changes: 4 additions & 1 deletion doc/concepts/configuration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,8 @@ When the limit is reached, ``INSERT`` or ``UPDATE`` requests fail with :ref:`ER_
Snapshots and write-ahead logs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``snapshot.dir`` and ``wal.dir`` options can be used to configure directories for storing snapshots and write-ahead logs.
The :ref:`snapshot.dir <configuration_reference_snapshot_dir>` and :ref:`wal.dir <configuration_reference_wal_dir>`
options can be used to configure directories for storing snapshots and write-ahead logs.
For example, you can place snapshots and write-ahead logs on different hard drives for better reliability.

.. code-block:: yaml
Expand All @@ -438,6 +439,7 @@ For example, you can place snapshots and write-ahead logs on different hard driv
dir: '/media/drive2/wals'

To learn more about the persistence mechanism in Tarantool, see the :ref:`Persistence <concepts-data_model-persistence>` section.
Read more about snapshot and WAL configuration: :ref:`Persistence <configuration_persistence>`.



Expand All @@ -447,6 +449,7 @@ To learn more about the persistence mechanism in Tarantool, see the :ref:`Persis

configuration/configuration_etcd
configuration/configuration_code
configuration/configuration_persistence
configuration/configuration_connections
configuration/configuration_credentials
configuration/configuration_authentication
Expand Down
284 changes: 284 additions & 0 deletions doc/concepts/configuration/configuration_persistence.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,284 @@
.. _configuration_persistence:

Persistence
===========

To ensure data persistence, Tarantool provides the abilities to:

* Record each data change request into a :ref:`write-ahead log <internals-wal>` (WAL) file (``.xlog`` files).
* Take :ref:`snapshots <internals-snapshot>` that contain an on-disk copy of the entire data set for a given moment
(``.snap`` files). It is possible to set automatic snapshot creation using the :ref:`checkpoint daemon <configuration_persistence_checkpoint_daemon>`.

During the recovery process, Tarantool can load the latest snapshot file and then read the requests from the WAL files,
produced after this snapshot was made.
This topic describes how to configure:

* the snapshot creation in the :ref:`snapshot <configuration_reference_snapshot>` section of a :ref:`YAML configuration <configuration>`.
* the recording to the write-ahead log in the :ref:`wal <configuration_reference_wal>` section of a YAML configuration.

To learn more about the persistence mechanism in Tarantool, see the :ref:`Persistence <concepts-data_model-persistence>` section.
The formats of WAL and snapshot files are described in detail in the :ref:`File formats <internals-data_persistence>` section.

.. _configuration_persistence_snapshot:

Configure the snapshots
-----------------------

**Example on GitHub**: `snapshot <https://github.com/tarantool/doc/tree/latest/doc/code_snippets/snippets/config/instances.enabled/persistence_snapshot>`_

This section describes how to define snapshot settings in the :ref:`snapshot <configuration_reference_snapshot>` section of a YAML configuration.

.. _configuration_persistence_snapshot_creation:

Set up automatic snapshot creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Tarantool, it is possible to automate the :ref:`snapshot creation </reference/reference_lua/box_snapshot>`.
Automatic creation is enabled by default and can be configured in two ways:

* A new snapshot is taken once in a given period (see :ref:`snapshot.by.interval <configuration_reference_snapshot_by_interval>`).
* A new snapshot is taken once the size of all WAL files created since the last snapshot exceeds a given limit
(see :ref:`snapshot.by.wal_size <configuration_reference_snapshot_by_wal_size>`).

The ``snapshot.by.interval`` option sets up the :ref:`checkpoint daemon <configuration_persistence_checkpoint_daemon>`
that takes a new snapshot every ``snapshot.by.interval`` seconds.
If the ``snapshot.by.interval`` option is set to zero, the checkpoint daemon is disabled.

The ``snapshot.by.wal_size`` option defines the maximum size in bytes for all WAL files created since the last snapshot taken.
Once this size is exceeded, the checkpoint daemon takes a snapshot. Then, :ref:`Tarantool garbage collector <configuration_persistence_garbage_collector>`
deletes the old WAL files.

The example shows how to specify the ``snapshot.by.interval`` and the ``snapshot.by.wal_size`` options:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_snapshot/config.yaml
:language: yaml
:start-at: by:
:end-at: 1000000000000000000
:dedent:

In the example, a new snapshot is created in two cases:

* every 2 hours (every 7200 seconds)
* when the size for all WAL files created since the last snapshot reaches the size of 1e18 (1000000000000000000) bytes.

.. _configuration_persistence_snapshot_dir:

Specify a directory for snapshot files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To configure a directory where the snapshot files are stored, use the :ref:`snapshot.dir <configuration_reference_snapshot_dir>`
configuration option.
The example below shows how to specify a snapshot directory for ``instance001`` explicitly:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_snapshot/config.yaml
:language: yaml
:start-at: instance001:
:end-at: 'var/lib/{{ instance_name }}/snapshots'
:dedent:

By default, WAL files and snapshot files are stored in the same directory ``var/lib/{{ instance_name }}``.
However, you can specify different directories for them.
For example, you can place snapshots and write-ahead logs on different hard drives for better reliability:

.. code-block:: yaml

instance001:
snapshot:
dir: '/media/drive1/snapshots'
wal:
dir: '/media/drive2/wals'

.. _configuration_persistence_snapshot_count:

Configure a maximum number of stored snapshots
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can set a limit on the number of snapshots stored in the :ref:`snapshot.dir <configuration_reference_snapshot_dir>`
directory using the :ref:`snapshot.count <configuration_reference_snapshot_count>` option.
Once the number of snapshots reaches the given limit, :ref:`Tarantool garbage collector <configuration_persistence_garbage_collector>`
deletes the oldest snapshot file and any associated WAL files after the new snapshot is taken.

In the example below, the snapshot is created every two hours (every 7200 seconds) until there are three snapshots in the
``snapshot.dir`` directory.
After creating a new snapshot (the fourth one), the oldest snapshot and the corresponding WALs are deleted.

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_snapshot/config.yaml
:language: yaml
:start-at: count:
:end-at: 7200
:dedent:

.. _configuration_persistence_wal:

Configure the write-ahead log
-----------------------------

**Example on GitHub**: `wal <https://github.com/tarantool/doc/tree/latest/doc/code_snippets/snippets/config/instances.enabled/persistence_wal>`_

This section describes how to define WAL settings in the :ref:`wal <configuration_reference_wal>` section of a YAML configuration.

.. _configuration_persistence_wal_mode:

Set the WAL mode
~~~~~~~~~~~~~~~~

The recording to the write-ahead log is enabled by default.
It means that if an instance restart occurs, the data will be recovered.
The recording to the WAL can be configured using the :ref:`wal.mode <configuration_reference_wal_mode>` configuration option.

There are two modes that enable writing to the WAL:

* ``write`` (default) -- enable WAL and write the data without waiting for the data to be flushed to the storage device.
* ``fsync`` -- enable WAL and ensure that the record is written to the storage device.

The example below shows how to specify the ``write`` WAL mode:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_wal/config.yaml
:language: yaml
:start-at: mode:
:end-at: 'write'
:dedent:

To turn the WAL writer off, set the ``wal.mode`` option to ``none``.

.. _configuration_persistence_wal_dir:

Specify a directory for WAL files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To configure a directory where the WAL files are stored, use the :ref:`wal.dir <configuration_reference_wal_dir>` configuration option.
The example below shows how to specify a directory for ``instance001`` explicitly:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_wal/config.yaml
:language: yaml
:start-at: instance001:
:end-at: 'var/lib/{{ instance_name }}/wals'
:dedent:


.. _configuration_persistence_wal_rescan:

Set an interval between scans
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In case of :ref:`replication <replication>` or :ref:`hot standby <configuration_reference_database_hot_standby>` mode,
Tarantool scans for changes in the WAL files every :ref:`wal.dir_rescan_delay <configuration_reference_wal_dir_rescan_delay>`
seconds. The example below shows how to specify the interval between scans:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_wal/config.yaml
:language: yaml
:start-at: dir_rescan_delay
:end-before: cleanup_delay
:dedent:

.. _configuration_persistence_wal_maxsize:

Set a maximum size for the WAL file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A new WAL file is created when the current one reaches the :ref:`wal.max_size <configuration_reference_wal_max_size>`
size. The configuration for this option might look as follows:

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_wal/config.yaml
:language: yaml
:start-at: max_size
:end-at: 268435456
:dedent:

.. _configuration_persistence_wal_rescan:

Set a delay for the garbage collector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Tarantool, the :ref:`checkpoint daemon <configuration_persistence_checkpoint_daemon>`
takes new snapshots at the given interval (see :ref:`snapshot.by.interval <configuration_reference_snapshot_by_interval>`).
After an instance restart, the Tarantool garbage collector deletes the old WAL files.

To delay the immediate deletion of WAL files, use the :ref:`wal.cleanup_delay <configuration_reference_wal_cleanup_delay>`
configuration option. The delay eliminates possible erroneous situations when the master deletes WALs
needed by :ref:`replicas <replication-roles>` after restart.
As a consequence, replicas sync with the master faster after its restart and
don't need to download all the data again.

In the example, the delay is set to 5 hours (18000 seconds):

.. literalinclude:: /code_snippets/snippets/config/instances.enabled/persistence_wal/config.yaml
:language: yaml
:start-at: cleanup_delay
:end-at: 18000
:dedent:

.. _configuration_persistence_wal_ext:

Specify the WAL extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~

In Tarantool Enterprise, you can store an old and new tuple for each CRUD operation performed.
A detailed description and examples of the WAL extensions are provided in the :ref:`WAL extensions <wal_extensions>` section.

See also: :ref:`wal.ext.* <configuration_reference_wal_ext>` configuration options.

.. _configuration_persistence_checkpoint_daemon:

Checkpoint daemon
-----------------

The checkpoint daemon (snapshot daemon) is a constantly running :ref:`fiber <app-fibers>`.
The checkpoint daemon creates a schedule for the periodic snapshot creation based on
the :ref:`configuration options <configuration_reference_snapshot_by>` and the speed of file size growth.
If enabled, the daemon makes new :ref:`snapshot <concepts-data_model-persistence>` (``.snap``) files according to this schedule.

The work of the checkpoint daemon is based on the following configuration options:

* :ref:`snapshot.by.interval <configuration_reference_snapshot_by_interval>` -- a new snapshot is taken once in a given period.
* :ref:`snapshot.by.wal_size <configuration_reference_snapshot_by_wal_size>` -- a new snapshot is taken once the size
of all WAL files created since the last snapshot exceeds a given limit.

If necessary, the checkpoint daemon also activates the :ref:`Tarantool garbage collector <configuration_persistence_garbage_collector>`
that deletes old snapshots and WAL files.

.. _configuration_persistence_garbage_collector:

Tarantool garbage collector
---------------------------

Tarantool garbage collector can be activated by the :ref:`checkpoint daemon <configuration_persistence_checkpoint_daemon>`.
The garbage collector tracks the snapshots that are to be :ref:`relayed to a replica <memtx-replication>` or needed
by other consumers. When the files are no longer needed, Tarantool garbage collector deletes them.

.. NOTE::

The garbage collector called by the checkpoint daemon is distinct from the `Lua garbage collector <https://www.lua.org/manual/5.1/manual.html#2.10>`_
which is for Lua objects, and distinct from the Tarantool garbage collector that specializes in :ref:`handling shard buckets <vshard-gc>`.

This garbage collector is called as follows:

* When the number of snapshots reaches the limit of :ref:`snapshot.count <configuration_reference_snapshot_count>` size.
After a new snapshot is taken, Tarantool garbage collector deletes the oldest snapshot file and any associated WAL files.

* When the size of all WAL files created since the last snapshot reaches the limit of :ref:`snapshot.by.wal_size <configuration_reference_snapshot_by_wal_size>`.
Once this size is exceeded, the checkpoint daemon takes a snapshot, then the garbage collector deletes the old WAL files.

If an old snapshot file is deleted, the Tarantool garbage collector also deletes
any :ref:`write-ahead log (.xlog) <internals-wal>` files that meet the following conditions:

* The WAL files are older than the snapshot file.
* The WAL files contain information present in the snapshot file.

Tarantool garbage collector also deletes obsolete vinyl ``.run`` files.

Tarantool garbage collector doesn't delete a file in the following cases:

* A backup is running, and the file has not been backed up
(see :ref:`Hot backup <admin-backups-hot_backup_vinyl_memtx>`).

* Replication is running, and the file has not been relayed to a replica
(see :ref:`Replication architecture <replication-architecture>`),

* A replica is connecting.

* A replica has fallen behind.
The progress of each replica is tracked; if a replica's position is far
from being up to date, then the server stops to give it a chance to catch up.
If an administrator concludes that a replica is permanently down, then the
correct procedure is to restart the server, or (preferably) :ref:`remove the replica from the cluster <replication-remove_instances>`.
2 changes: 1 addition & 1 deletion doc/concepts/data_model/operations.rst
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ resource usage of each function.
important than the others.
* - WAL settings
- The important setting for the write-ahead log is
:ref:`wal_mode <cfg_binary_logging_snapshots-wal_mode>`.
:ref:`wal.mode <configuration_reference_wal_mode>`.
If the setting causes no writing or
delayed writing, this factor is unimportant. If the
setting causes every data-change request to wait
Expand Down
Loading