|
| 1 | +# File Name Encryption using AES-SIV-512 + Base64-URL-Encoding |
| 2 | + |
| 3 | +## Directory Metadata |
| 4 | + |
| 5 | +Every directory requires certain metadata that affects the file name encryption of its direct children: |
| 6 | + |
| 7 | +* the seed used to derive keys |
| 8 | +* the directory ID |
| 9 | + |
| 10 | +This data is immutable and therefore linked with a directory eternally, surviving renames/moves. This data is stored in a file called `dir.uvf`, which is stored in two places: |
| 11 | +1. Within the parent dir (except for root), where it serves as a link to the child dir |
| 12 | +2. In the child dir itself (allowing disaster recovery without the parent). |
| 13 | + |
| 14 | + |
| 15 | +> [!NOTE] Disaster recovery without the parent |
| 16 | +> Imagine cleartext folder structure `a/b/c/` but sync fails and `a/b/` gets lost. With this information, `?/?/c/` and |
| 17 | +> all its children can still be recovered. The `dirId` required for name decryption would otherwise only be available |
| 18 | +> within the lost parent dir. |
| 19 | +
|
| 20 | +The exact file structure of `dir.uvf` will be discussed in more detail [below](#format-of-diruvf-and-symlinkuvf). |
| 21 | + |
| 22 | +### Directory Seed |
| 23 | + |
| 24 | +At creation time, a directory's seed is always the `latestSeed` as defined in the [vault metadata file](../vault%20metadata/README.md#encrypted-content). In the case of the root directory, this is also the `initialSeed`. |
| 25 | + |
| 26 | +When navigating into a directory, the seed is read from the corresponding `dir.uvf`'s file header. More precisely, the header contains the seed ID, and the seed can then be looked up in the `seeds` of the [vault metadata file](../vault%20metadata/README.md#encrypted-content). |
| 27 | + |
| 28 | +> [!NOTE] |
| 29 | +> A directory's seed is used to derive file name encryption keys used to encrypt the directory's direct children's names. Since the seed is immutable, new child nodes added at a later time will use the same file name encryption keys as old children. |
| 30 | +> |
| 31 | +> Consequently, key rotation only affects file names of nodes added to newly created directories. Key rotation is ineffective for children added to preexisting directories. |
| 32 | +
|
| 33 | +### Directory ID |
| 34 | + |
| 35 | +The directory ID is a unique sequence of 32 random bytes (256 bits; by the birthday bound, a collision only becomes likely after roughly 2^128 generated IDs). |
| 36 | + |
| 37 | +```ts |
| 38 | +let dirId = csprng(len: 32) |
| 39 | +``` |
| 40 | + |
| 41 | +The only exception to this is the root directory ID, which is deterministically derived from the `initialSeed` using the [KDF](../kdf/README.md): |
| 42 | + |
| 43 | +```ts |
| 44 | +let rootDirId = kdf(secret: initialSeed, len: 32, context: "rootDirId") |
| 45 | +``` |
| 46 | + |
| 47 | +## Deriving Encryption Keys |
| 48 | + |
| 49 | +All file names are encrypted using AES-SIV, which requires a 512 bit key (which is internally split into two 256 bit AES keys). Furthermore we need a 256 bit key for HMAC computations. We use the directory-specific seed from `dir.uvf` and feed it into the [KDF](../kdf/README.md): |
| 50 | + |
| 51 | +```ts |
| 52 | +let sivKey = kdf(secret: seed, len: 64, context: "siv") |
| 53 | +let hmacKey = kdf(secret: seed, len: 32, context: "hmac") |
| 54 | +``` |
| 55 | + |
| 56 | +## Mapping Directory IDs to Paths |
| 57 | + |
| 58 | +When traversing directories, the directory ID of a given subdirectory is processed in four steps to determine the storage path inside the vault: |
| 59 | + |
| 60 | +1. Compute the HMAC of the `dirId` using SHA-256 and the `hmacKey` |
| 61 | +1. Truncate the result. Keep the leftmost 160 bits, discard the remaining 96 bits |
| 62 | +1. Encode the truncated hash with Base32 to get a string of printable chars |
| 63 | +1. Construct the directory path by resolving substrings of the encoded hash relative to `{vaultRoot}/d/` |
| 64 | + * split off the first two characters of the encoded hash (allowing for a total of 1024 directories within the base directory) |
| 65 | + * use the remaining 30 characters of the encoded hash as the second level directory |
| 66 | + |
| 67 | +```ts |
| 68 | +let dirIdHash = hmacSha256(data: dirId, key: hmacKey) |
| 69 | +let truncatedHash = dirIdHash[0..20] |
| 70 | +let dirIdString = base32(truncatedHash) |
| 71 | +let dirPath = vaultRoot + '/d/' + dirIdString[0..2] + '/' + dirIdString[2..32] |
| 72 | +``` |
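The steps above can be sketched in runnable TypeScript. This is a minimal illustration, assuming Node.js and its built-in `node:crypto` module; the helper names `base32` and `dirPath` are hypothetical, and the unpadded RFC 4648 alphabet is an assumption consistent with the example paths in this document (a 160-bit input encodes to exactly 32 characters, so padding never arises).

```ts
import { createHmac } from "node:crypto";

// RFC 4648 Base32 alphabet (assumed; matches the characters in the example paths)
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Hypothetical helper: Base32-encode a byte sequence without padding.
function base32(data: Uint8Array): string {
  let buffer = 0; // holds bits that have not been emitted yet
  let bits = 0;   // number of valid bits in `buffer`
  let out = "";
  for (let i = 0; i < data.length; i++) {
    buffer = (buffer << 8) | data[i];
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET.charAt((buffer >>> (bits - 5)) & 31);
      bits -= 5;
    }
    buffer &= (1 << bits) - 1; // keep only the leftover bits
  }
  if (bits > 0) {
    // left-align the remaining bits into a final 5-bit group
    out += BASE32_ALPHABET.charAt((buffer << (5 - bits)) & 31);
  }
  return out;
}

// Hypothetical helper: map a dirId to its ciphertext directory path.
function dirPath(vaultRoot: string, dirId: Uint8Array, hmacKey: Uint8Array): string {
  const dirIdHash = createHmac("sha256", hmacKey).update(dirId).digest();
  const truncatedHash = dirIdHash.subarray(0, 20); // leftmost 160 bits
  const dirIdString = base32(truncatedHash);       // 32 characters
  return vaultRoot + "/d/" + dirIdString.slice(0, 2) + "/" + dirIdString.slice(2);
}
```

Calling `dirPath("vault", dirId, hmacKey)` yields a path of the form `vault/d/XX/` followed by 30 further Base32 characters. How `hmacKey` is derived from the seed is defined by the KDF and is not reproduced here.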
| 73 | + |
| 74 | +> [!NOTE] |
| 75 | +> Due to the nature of hierarchical data structures, traversing file trees is an inherently top-down process, allowing the use of one-way hash functions. |
| 76 | +> |
| 77 | +> Base32 is used to get an encoding that works with case insensitive file systems and limits the number of nodes within `d/` to `32^2`. |
| 78 | +> |
| 79 | +> The truncation of the hash is done to balance sufficient collision resistance and output length (to accommodate systems that have path length limitations). |
| 80 | +
|
| 81 | +> [!TIP] |
| 82 | +> Splitting the `dirIdString` into a path like `d/AB/CDEFGHIJKLMNOPQRSTUVWXYZ234567` is inspired by Cryptomator's former vault formats and serves two purposes: |
| 83 | +> 1. Gather all data within a single data dir, uncluttering the root dir |
| 84 | +> 2. Having at most `32^2` subdirectories within `d` |
| 85 | +
|
| 86 | +Regardless of the hierarchy of cleartext paths, ciphertext directories are always stored in a flattened structure. All directories will therefore effectively be siblings (or cousins, to be precise). |
| 87 | + |
| 88 | + |
| 89 | +## Encryption of Node Names |
| 90 | + |
| 91 | +The cleartext name of a node gets encoded using UTF-8 in [Normalization Form C](https://unicode.org/reports/tr15/#Norm_Forms) to get a unique binary representation. |
| 92 | + |
| 93 | +The byte sequence is then encrypted using AES-SIV as defined in [RFC 5297](https://tools.ietf.org/html/rfc5297). In order to bind the node to the containing directory, preventing undetected manipulation of the folder structure, the directory ID of the parent folder is used as associated data. |
| 94 | + |
| 95 | +Lastly, the ciphertext is encoded with unpadded base64url and a file extension is added. |
| 96 | + |
| 97 | +```ts |
| 98 | +let ciphertext = aesSiv(secret: cleartextName, ad: parentDirId, key: sivKey) |
| 99 | +let ciphertextName = base64url(data: ciphertext) + '.uvf' |
| 100 | +``` |
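The encoding steps around the cipher can be sketched as follows. This is an illustration only, assuming Node.js (for `Buffer` and its `base64url` encoding, which emits no padding). The AES-SIV primitive itself is passed in as a hypothetical `sivEncrypt` callback, so no particular crypto library is assumed; `encodeName` and `encryptName` are likewise illustrative names.

```ts
import { Buffer } from "node:buffer";

// Hypothetical signature for an RFC 5297 AES-SIV encryption function.
type SivEncrypt = (
  plaintext: Uint8Array,
  associatedData: Uint8Array,
  key: Uint8Array
) => Uint8Array;

// Normalize to NFC first, then encode as UTF-8, so that visually identical
// names always produce the same byte sequence.
function encodeName(cleartextName: string): Buffer {
  return Buffer.from(cleartextName.normalize("NFC"), "utf8");
}

// Encrypt the name bytes with the parent's dirId as associated data,
// then encode with unpadded base64url and append the file extension.
function encryptName(
  cleartextName: string,
  parentDirId: Uint8Array,
  sivKey: Uint8Array,
  sivEncrypt: SivEncrypt
): string {
  const ciphertext = sivEncrypt(encodeName(cleartextName), parentDirId, sivKey);
  return Buffer.from(ciphertext).toString("base64url") + ".uvf";
}
```

Normalizing before encoding matters because a name typed as `e` followed by a combining acute accent (U+0301) and the precomposed `é` (U+00E9) would otherwise encrypt to different ciphertext names.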
| 101 | + |
| 102 | +## Ciphertext Directory Structure |
| 103 | + |
| 104 | +### Node Types |
| 105 | + |
| 106 | +Depending on the kind of a cleartext node, the encrypted name is then either used to create a file or a directory: |
| 107 | + |
| 108 | +| cleartext node type | ciphertext structure | |
| 109 | +|---------------------|-------------------------------------| |
| 110 | +| file | file | |
| 111 | +| directory | directory containing `dir.uvf` | |
| 112 | +| symlink | directory containing `symlink.uvf` | |
| 113 | + |
| 114 | +### Format of `dir.uvf` and `symlink.uvf` |
| 115 | + |
| 116 | +Both `dir.uvf` and `symlink.uvf` files are encrypted using the [content encryption mechanism](../file%20content%20encryption/README.md) configured for the vault. |
| 117 | + |
| 118 | +The cleartext content of `dir.uvf` is the 32 byte dirId. The seed referenced by this file's header doubles as the seed of the child directory. |
| 119 | + |
| 120 | +The cleartext content of `symlink.uvf` is a UTF-8 string in Normalization Form C, denoting the cleartext target of the symlink. |
| 121 | + |
| 122 | +> [!CAUTION] |
| 123 | +> Every `*.uvf` file MUST be encrypted independently, particularly the two `dir.uvf` copies that contain the same dirId. This is required for indistinguishable ciphertexts, avoiding the leakage of the nested dir structure. |
| 124 | +
|
| 125 | +### Example Directory Structure |
| 126 | + |
| 127 | +Thus, for a given cleartext directory structure like this... |
| 128 | + |
| 129 | +```txt |
| 130 | +. |
| 131 | +├─ File.txt |
| 132 | +├─ Symlink |
| 133 | +├─ Subdirectory |
| 134 | +│ └─ ... |
| 135 | +└─ ... |
| 136 | +``` |
| 137 | + |
| 138 | +...the corresponding ciphertext directory structure will be: |
| 139 | + |
| 140 | +```txt |
| 141 | +. |
| 142 | +├─ vault.uvf |
| 143 | +└─ d |
| 144 | + ├─ BZ |
| 145 | + │ └─ R4VZSS5PEF7TU3PMFIMON5GJRNBDWA # Root Directory |
| 146 | + │ ├─ dir.uvf # Root Directory's metadata |
| 147 | + │ ├─ 5TyvCyF255sRtfrIv83ucADQ.uvf # File.txt |
| 148 | + │ ├─ FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf # Subdirectory |
| 149 | + │ │ └─ dir.uvf # Subdirectory's metadata |
| 150 | + │ └─ gLeOGMCN358UBf2Qk9cWCQl.uvf # Symlink |
| 151 | + │ └─ symlink.uvf # Symlink's target |
| 152 | + ├─ FC |
| 153 | + │ └─ ZKZRLZUODUUYTYA4457CSBPZXB5A77 # Subdirectory |
| 154 | + │ ├─ dir.uvf # Subdirectory's metadata |
| 155 | + │ └─ ... # Subdirectory's children |
| 156 | + └─ ... |
| 157 | +``` |
| 158 | + |
| 159 | +### Traversing the Example Directory Structure |
| 160 | + |
| 161 | +#### List contents of `/`: |
| 162 | + |
| 163 | +1. Use `initialSeed` as a seed for the root directory |
| 164 | +1. compute `rootDirId` and corresponding ciphertext dir path -> `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA` |
| 165 | +1. list direct children within `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA` |
| 166 | + * `dir.uvf` (file) |
| 167 | + * `5TyvCyF255sRtfrIv83ucADQ.uvf` (file) |
| 168 | + * `FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf` (dir) |
| 169 | + * `gLeOGMCN358UBf2Qk9cWCQl.uvf` (dir) |
| 170 | +1. for each subdirectory, determine node type |
| 171 | + * `FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf` denotes a dir (contains `dir.uvf`) |
| 172 | + * `gLeOGMCN358UBf2Qk9cWCQl.uvf` denotes a symlink (contains `symlink.uvf`) |
| 173 | +1. strip file extension and decrypt file names |
| 174 | + * `File.txt` |
| 175 | + * `Subdirectory` |
| 176 | + * `Symlink` |
| 177 | + |
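The node type decision in the listing procedure above can be sketched as a small pure function. The `CiphertextEntry` shape and the `classify` name are hypothetical, introduced only for illustration. Treating an entry that contains both marker files as a directory, and ignoring entries that match no row of the node type table, are assumptions not specified in this document.

```ts
type NodeType = "file" | "directory" | "symlink";

// Hypothetical description of one entry found in a ciphertext directory.
interface CiphertextEntry {
  name: string;         // e.g. "5TyvCyF255sRtfrIv83ucADQ.uvf"
  isDirectory: boolean; // whether the entry is a directory on the storage backend
  childNames: string[]; // names directly inside it (empty for files)
}

// Returns the cleartext node type, or undefined for entries that do not
// represent a child node (such as the directory's own dir.uvf).
function classify(entry: CiphertextEntry): NodeType | undefined {
  if (entry.name.slice(-4) !== ".uvf" || entry.name === "dir.uvf") {
    return undefined;
  }
  if (!entry.isDirectory) {
    return "file";
  }
  if (entry.childNames.indexOf("dir.uvf") >= 0) {
    return "directory"; // assumption: dir.uvf takes precedence if both exist
  }
  if (entry.childNames.indexOf("symlink.uvf") >= 0) {
    return "symlink";
  }
  return undefined; // assumption: unrecognized directories are skipped
}
```

Applied to the root listing from the example, `dir.uvf` is skipped, the first `.uvf` file is classified as a file, and the two `.uvf` directories are classified by the marker file they contain.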
| 178 | +#### List contents of `/Subdirectory/`: |
| 179 | + |
| 180 | +1. read seed and decrypt `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf/dir.uvf` |
| 181 | +1. read `dirId` from said file and compute ciphertext path -> `d/FC/ZKZRLZUODUUYTYA4457CSBPZXB5A77` |
| 182 | +1. Repeat dir listing procedure for `d/FC/ZKZRLZUODUUYTYA4457CSBPZXB5A77` |
| 183 | + |
| 184 | +#### Read target of `/Symlink`: |
| 185 | + |
| 186 | +1. decrypt file `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/gLeOGMCN358UBf2Qk9cWCQl.uvf/symlink.uvf` |
| 187 | + |
| 188 | +#### Read content of `/File.txt`: |
| 189 | + |
| 190 | +1. decrypt file `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/5TyvCyF255sRtfrIv83ucADQ.uvf` |
| 191 | + |
| 192 | +## Overview |
| 193 | + |
| 194 | +### Derivation of Directory Seed and Directory ID |
| 195 | +```mermaid |
| 196 | +--- |
| 197 | +title: Choosing a Seed and Directory ID |
| 198 | +--- |
| 199 | +flowchart TD |
| 200 | + subgraph "Which Seed?" |
| 201 | + isRoot1{is root dir?} |
| 202 | + isRoot1 -->|y| initialSeed |
| 203 | + isRoot1 -->|n| isExistingDir1 |
| 204 | +
|
| 205 | + isExistingDir1{is existing dir?} |
| 206 | + isExistingDir1 -->|y| readSeed |
| 207 | + isExistingDir1 -->|n| latestSeed |
| 208 | +
|
| 209 | + readSeed{{"read seed from `dir.uvf`"}} |
| 210 | + readSeed --> dirFileSeed[seed from dir.uvf] |
| 211 | + end |
| 212 | +
|
| 213 | + subgraph "Which dirId?" |
| 214 | + isRoot2{is root dir?} |
| 215 | + isRoot2 -->|n| isExistingDir2 |
| 216 | + isRoot2 -->|y| kdfRootDirId |
| 217 | +
|
| 218 | + isExistingDir2{is existing dir?} |
| 219 | + isExistingDir2 -->|y| readDirId |
| 220 | + isExistingDir2 -->|n| csprng32 |
| 221 | + |
| 222 | + csprng32 --> randomDirId[random dirId] |
| 223 | + csprng32{{"csprng(32)"}} |
| 224 | +
|
| 225 | + initialSeed -->|secret:| kdfRootDirId |
| 226 | + kdfRootDirId{{"kdf(secret,32,'rootDirId')"}} |
| 227 | + kdfRootDirId --> rootDirId[root dirId] |
| 228 | +
|
| 229 | + readDirId{{"read dirId from `dir.uvf`"}} |
| 230 | + readDirId --> dirFileId[dirId from dir.uvf] |
| 231 | + end |
| 232 | +``` |
| 233 | + |
| 234 | +### Mapping Directory IDs to Paths, Encryption of Directory Names and Directory Metadata |
| 235 | +```mermaid |
| 236 | +flowchart TD |
| 237 | + directorySeed -->|secret:| kdfSiv |
| 238 | + kdfSiv{{"kdf(secret,64,'siv')"}} |
| 239 | + kdfSiv --> sivKey |
| 240 | + directorySeed -->|secret:| kdfHmac |
| 241 | + kdfHmac{{"kdf(secret,32,'hmac')"}} |
| 242 | + kdfHmac --> hmacKey |
| 243 | + hmacKey -->|key:| hmacSha256 |
| 244 | + hmacSha256{{hmacSha256}} |
| 245 | + hmacSha256 --> dirIdHash |
| 246 | + dirIdHash --> truncate |
| 247 | + truncate{{"_[0..20]"}} |
| 248 | + truncate --> base32 |
| 249 | + base32{{base32}} |
| 250 | + base32 --> dirIdString |
| 251 | + dirIdString --> head |
| 252 | + head{{"_[0..2]"}} |
| 253 | + dirIdString --> tail |
| 254 | + tail{{"_[2..32]"}} |
| 255 | + head -->|$0:| pattern |
| 256 | + tail -->|$1:| pattern |
| 257 | + pattern{{"d/$0/$1"}} |
| 258 | + pattern --> dirPath |
| 259 | + sivKey -->|sivKey:| aesSiv |
| 260 | + aesSiv{{aesSiv}} |
| 261 | + parentDirId -->|ad:| aesSiv |
| 262 | + clearTextName -->|secret:| aesSiv |
| 263 | + aesSiv --> base64Url |
| 264 | + base64Url{{base64url}} |
| 265 | + base64Url --> appendFileExt |
| 266 | + appendFileExt{{"append '.uvf'"}} |
| 267 | + appendFileExt --> ciphertextName |
| 268 | + dirId -->|cleartextBlocks:| fileContentEncryption |
| 269 | + fileContentEncryption{{file content encryption}} |
| 270 | + directorySeed -->|seed:| fileContentEncryption |
| 271 | + fileContentEncryption --> dirUvf |
| 272 | + dirUvf["directory metadata dir.uvf content"] |
| 273 | +``` |