| 1 | +# Encryption at Rest in Dgraph and Badger |
| 2 | + |
| 3 | +Badger provides encryption at rest using AES, which supports compliance with security standards
| 4 | +such as HIPAA and PCI DSS. This feature was introduced in Badger v2 and is available to all systems |
| 5 | +built on Badger, including Dgraph. |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +Badger implements encryption at the storage layer, allowing systems like Dgraph to inherit |
| 10 | +encryption capabilities without additional implementation. This separation of concerns means: |
| 11 | + |
| 12 | +- Badger manages data security and encryption at the disk level |
| 13 | +- Higher-level systems like Dgraph focus on distributed operations and graph semantics |
| 14 | +- All Badger-based systems benefit from encryption improvements |
| 15 | + |
| 16 | +## Encryption Algorithm |
| 17 | + |
| 18 | +Badger uses the |
| 19 | +[Advanced Encryption Standard (AES)](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard), |
| 20 | +standardized by NIST and widely adopted across databases including MongoDB, SQLite, and RocksDB. AES |
| 21 | +is a symmetric encryption algorithm: the same key encrypts and decrypts data. |
| 22 | + |
| 23 | +AES supports key sizes of 128, 192, or 256 bits. All three provide strong security; even a 128-bit
| 24 | +key is computationally infeasible to brute force.
| 25 | + |
| 26 | +## Key Management |
| 27 | + |
| 28 | +Badger uses a two-tier key system: |
| 29 | + |
| 30 | +### Master Key |
| 31 | + |
| 32 | +The master key is the user-provided AES key that encrypts data keys. Its length determines the AES
| 33 | +variant:
| 34 | + |
| 35 | +- 16 bytes: AES-128 |
| 36 | +- 24 bytes: AES-192 |
| 37 | +- 32 bytes: AES-256 |
| 38 | + |
| 39 | +**Important:** Use a cryptographically secure random key. Never use predictable strings. Generate |
| 40 | +keys using a password manager or secure random generator. |
| 41 | + |
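As a minimal sketch of the advice above, the following Go program draws a master key from the operating system's secure random source and maps its length to the AES variant listed earlier. The helper names `aesVariant` and `newMasterKey` are hypothetical and are not part of Badger's API.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// aesVariant maps a master key length in bytes to the AES variant it selects.
func aesVariant(keyLen int) (string, error) {
	switch keyLen {
	case 16:
		return "AES-128", nil
	case 24:
		return "AES-192", nil
	case 32:
		return "AES-256", nil
	default:
		return "", fmt.Errorf("invalid master key length %d: want 16, 24, or 32 bytes", keyLen)
	}
}

// newMasterKey returns n bytes from the operating system's CSPRNG,
// rejecting lengths that do not correspond to an AES variant.
func newMasterKey(n int) ([]byte, error) {
	if _, err := aesVariant(n); err != nil {
		return nil, err
	}
	key := make([]byte, n)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	return key, nil
}

func main() {
	key, err := newMasterKey(32)
	if err != nil {
		panic(err)
	}
	variant, _ := aesVariant(len(key))
	// Printed for demonstration only; a real key belongs in a vault or key file.
	fmt.Printf("%s key: %x\n", variant, key)
}
```

The resulting byte slice is what would be passed to `WithEncryptionKey`, or written to the key file that the `badger` CLI reads.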
| 42 | +### Data Keys |
| 43 | + |
| 44 | +Data keys are auto-generated keys that encrypt the actual data on disk. Each data key is encrypted
| 45 | +with the master key and stored alongside the encrypted data; the master key never encrypts data directly.
| 46 | + |
| 47 | +**Benefits:** |
| 48 | + |
| 49 | +- Master key rotation only requires re-encrypting data keys (small, fast operation) |
| 50 | +- Data keys rotate automatically without re-encrypting all data |
| 51 | +- Minimal performance impact during key rotation |
| 52 | + |
| 53 | +## Key Rotation |
| 54 | + |
| 55 | +### Data Key Rotation |
| 56 | + |
| 57 | +Badger automatically rotates data keys every 10 days by default. Configure the rotation interval |
| 58 | +using `Options.WithEncryptionKeyRotationDuration`. |
| 59 | + |
| 60 | +All historical data keys are retained to decrypt older data. Each data key is 32 bytes; 1000 keys |
| 61 | +consume 32KB. At 10-day intervals, this represents approximately 27 years of keys. |
| 62 | + |
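The storage estimate above is simple arithmetic, which the following sketch reproduces. The constants come from the text (32-byte data keys, 10-day default rotation); the function names are hypothetical.

```go
package main

import "fmt"

const (
	dataKeyBytes = 32 // size of one data key, per the text above
	rotationDays = 10 // default rotation interval in days
)

// registryBytes returns the space needed to retain numKeys data keys.
func registryBytes(numKeys int) int {
	return numKeys * dataKeyBytes
}

// yearsCovered returns how many years of rotations numKeys keys span.
func yearsCovered(numKeys int) float64 {
	return float64(numKeys*rotationDays) / 365.25
}

// summary formats both figures for a given number of retained keys.
func summary(numKeys int) string {
	return fmt.Sprintf("%d keys: %d bytes, about %.1f years of rotations",
		numKeys, registryBytes(numKeys), yearsCovered(numKeys))
}

func main() {
	fmt.Println(summary(1000))
}
```

For 1000 keys this gives 32,000 bytes and about 27.4 years, matching the figures quoted above.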
| 63 | +### Master Key Rotation |
| 64 | + |
| 65 | +Users must manually rotate master keys. Use the `rotate` command: |
| 66 | + |
| 67 | +```shell |
| 68 | +badger rotate --dir=badger_dir --old-key-path=old/path --new-key-path=new/path |
| 69 | +``` |
| 70 | + |
| 71 | +**Notes:**
| 72 | + |
| 73 | +- Database must be offline during master key rotation |
| 74 | +- Only data keys are re-encrypted (fast operation) |
| 75 | +- Future versions may support online rotation |
| 76 | + |
| 77 | +## Initialization Vectors |
| 78 | + |
| 79 | +To prevent identical plaintext from producing identical ciphertext, Badger uses Initialization |
| 80 | +Vectors (IVs). |
| 81 | + |
| 82 | +### SSTable Encryption |
| 83 | + |
| 84 | +Each 4KB block in SSTables uses a unique 16-byte IV stored in plaintext at the end of the encrypted |
| 85 | +block. Storage overhead is about 0.4% (16 bytes per 4,096-byte block).
| 86 | + |
| 87 | +**Security:** IVs can safely be stored in plaintext. Decryption requires the data key, which in turn
| 88 | +can only be recovered with the master key, so knowledge of the IV alone is insufficient.
| 89 | + |
| 90 | +### Value Log Encryption |
| 91 | + |
| 92 | +Value log entries are encrypted individually to match access patterns. To minimize storage overhead, |
| 93 | +Badger uses a 12-byte file-level IV combined with a 4-byte value offset to form the 16-byte IV. |
| 94 | + |
| 95 | +**Benefits:** |
| 96 | + |
| 97 | +- Avoids storing a 16-byte IV with every value entry
| 98 | +- Adds only 12 bytes of IV overhead per vlog file
| 99 | +- For 10,000 entries in one vlog file: 12 bytes total vs 160,000 bytes with per-value IVs
| 100 | + |
| 101 | +## Enabling Encryption |
| 102 | + |
| 103 | +### New Database |
| 104 | + |
| 105 | +Enable encryption when creating a new database: |
| 106 | + |
| 107 | +```go |
| 108 | +opts := badger.DefaultOptions("/tmp/badger"). |
| 109 | + WithEncryptionKey(masterKey). |
| 110 | + WithEncryptionKeyRotationDuration(dataKeyRotationDuration) // defaults to 10 days |
| 111 | +``` |
| 112 | + |
| 113 | +### Existing Database |
| 114 | + |
| 115 | +Enable encryption on an unencrypted database: |
| 116 | + |
| 117 | +```shell |
| 118 | +badger rotate --dir=badger_dir --new-key-path=new/path |
| 119 | +``` |
| 120 | + |
| 120 | +**Note:** This enables encryption for new files only. Existing data is encrypted gradually, as
| 121 | +compaction rewrites it into new files. Until then, Badger runs in hybrid mode and tracks encryption status per file.
| 123 | + |
| 124 | +### Immediate Full Encryption |
| 125 | + |
| 126 | +To encrypt all existing data immediately: |
| 127 | + |
| 128 | +1. Export the database: `badger backup --dir=badger_dir -f backup.bak` |
| 129 | +2. Create a new encrypted database instance |
| 130 | +3. Restore the data: `badger restore --dir=new_badger_dir -f backup.bak` |
| 131 | + |
| 132 | +Alternatively, use the Stream Framework and StreamWriter interface for in-place encryption with high |
| 133 | +throughput. |
| 134 | + |
| 135 | +## Security Considerations |
| 136 | + |
| 137 | +### Key Security |
| 138 | + |
| 139 | +- Store master keys securely (key management service, secure vault) |
| 140 | +- Rotate master keys regularly |
| 141 | +- Use strong, randomly generated keys |
| 142 | +- Protect physical access to systems performing encryption |
| 143 | + |
| 144 | +### Key Leakage |
| 145 | + |
| 146 | +Key security is more critical than key size. Threats include: |
| 147 | + |
| 148 | +- Side-channel attacks (electromagnetic radiation analysis) |
| 149 | +- Key reuse patterns enabling cryptanalysis |
| 150 | +- Physical access to encryption systems |
| 151 | + |
| 152 | +Regular key rotation mitigates these risks. |
| 153 | + |
| 154 | +## Terminology |
| 155 | + |
| 155 | +In this document, "key" can refer to either of the following:
| 157 | + |
| 158 | +- **Database key**: The key in a key-value pair stored in Badger |
| 159 | +- **Encryption key**: The cryptographic key used for encryption/decryption (master key or data key) |
| 160 | + |
| 161 | +When ambiguous, this document uses "encryption key" for cryptographic keys. |