Commit 8d6fd67

Merge pull request #24 from encryption-alliance/feature/filenames
Suggest filename encryption with AES-SIV-512-B64URL
2 parents f071ec4 + 0a7993c commit 8d6fd67

4 files changed

+357
-14
lines changed

file content encryption/AES-256-GCM.md

Lines changed: 60 additions & 2 deletions
@@ -1,6 +1,7 @@
11
# File Content Encryption using AES-256-GCM
22

33
## Format-specific file header fields
4+
45
Following the [_general header_ fields](README.md), this format requires 60 additional bytes for its _format-specific header_ fields:
56

67
* 12 byte nonce
@@ -10,13 +11,33 @@ Following the [_general header_ fields](README.md), this format requires 60 addi
1011
The header needs to be encrypted using a 256 bit key derived from the seed using the KDF defined in the [vault metadata file](../vault%20metadata/README.md).
1112

1213
```txt
13-
headerKey := kdf(secret: latestFileKey, length: 32, context: "fileHeader")
14+
headerKey := kdf(secret: latestSeed, length: 32, context: "fileHeader")
1415
headerNonce := csprng(bytes: 12)
1516
fileKey := csprng(bytes: 32)
1617
encryptedFileKey, tag := aesGcm(cleartext: fileKey, key: headerKey, nonce: headerNonce, ad: generalHeaderFields)
1718
header := generalHeaderFields . headerNonce . encryptedFileKey . tag
1819
```
1920

21+
```mermaid
22+
---
23+
title: Derivation of Encrypted File Content Key for AES-256-GCM-XXk format
24+
---
25+
flowchart TD
26+
seed -->|secret:| kdf0
27+
kdf0{{"kdf(secret,32,'fileHeader')"}}
28+
kdf0 --> headerKey
29+
headerKey -->|key:| aesGcm
30+
aesGcm{{aesGcm}}
31+
aesGcm --> encryptedFileKey
32+
csprng32{{"csprng(32)"}}
33+
csprng32 --> fileKey
34+
fileKey -->|cleartext:| aesGcm
35+
csprng12{{"csprng(12)"}}
36+
csprng12 --> headerNonce
37+
headerNonce -->|nonce:| aesGcm
38+
generalHeaderFields -->|ad:| aesGcm
39+
```
40+
2041
## File Body Encryption
2142

2243
The body is split up into chunks. Each chunk consists of:
@@ -38,4 +59,41 @@ body := join(ciphertextBlocks[])
3859

3960
### 32k
4061

41-
This variant uses 32740 payload bytes per block (resulting in 32768 encrypted bytes per chunk).
62+
This variant uses 32740 payload bytes per block (resulting in 32768 encrypted bytes per chunk).
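
The size arithmetic for this variant can be checked with a short sketch. It assumes, based on the block layout described in this document, that every chunk (including a shorter final one) carries a 12 byte nonce and a 16 byte tag, and that an empty cleartext produces no chunks. The constant and function names are illustrative and not part of the specification.

```python
NONCE_LEN = 12        # per-chunk nonce, in bytes
TAG_LEN = 16          # per-chunk GCM tag, in bytes
PAYLOAD_LEN = 32740   # cleartext bytes per full chunk in the 32k variant
CHUNK_LEN = NONCE_LEN + PAYLOAD_LEN + TAG_LEN  # bytes per full encrypted chunk


def encrypted_body_size(cleartext_len: int) -> int:
    """Return the encrypted body size in bytes for a given cleartext length.

    Assumptions (not stated in the spec text): an empty cleartext yields no
    chunks, and a final partial chunk is stored without padding.
    """
    if cleartext_len < 0:
        raise ValueError("cleartext_len must not be negative")
    # Integer ceiling division: number of chunks needed for the payload.
    chunks = (cleartext_len + PAYLOAD_LEN - 1) // PAYLOAD_LEN
    return cleartext_len + chunks * (NONCE_LEN + TAG_LEN)
```

A full chunk comes out at exactly 32 KiB, which is where the `32k` name comes from.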
63+
64+
## Overview
65+
66+
```mermaid
67+
---
68+
title: File Content Encryption for AES-256-GCM-XXk format
69+
---
70+
erDiagram
71+
FILE["encrypted file"]
72+
FILE_HEADER["file header"]
73+
FILE_BODY["encrypted body"]
74+
FILE ||--|| FILE_HEADER: has
75+
FILE ||--|| FILE_BODY: has
76+
77+
FILE_HEADER ||--|| GENERALHEADERFIELDS: has
78+
FILE_HEADER ||--|| CUSTOMHEADERFIELDS: has
79+
80+
FILE_BODY ||--|{ CIPHERTEXTBLOCK: "consists of"
81+
82+
GENERALHEADERFIELDS["general header fields"] {
83+
byte(3) fileSignature "ASCII `uvf` (big-endian) magic bytes"
84+
byte(1) spec "uvf spec version (0-255)"
85+
byte(4) seedId "ID of the seed used to derive the header key"
86+
}
87+
88+
CUSTOMHEADERFIELDS["custom header fields"] {
89+
byte(12) headerNonce "header nonce"
90+
byte(32) encryptedFileKey "encrypted file content key"
91+
byte(16) tag "tag for verification"
92+
}
93+
94+
CIPHERTEXTBLOCK["cipherTextBlock[i]"] {
95+
byte(12) blockNonce "block nonce"
96+
byte(n) encryptedPayload "n bytes encrypted payload"
97+
byte(16) tag "tag"
98+
}
99+
```
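
The field layout in the diagram above can be sketched as a parser that slices the first 68 bytes of an encrypted file. This only splits the header into its fields and does not decrypt anything. The `FileHeader` type and `parse_header` function are hypothetical names, and `seedId` is kept as raw bytes because its integer encoding is not specified here.

```python
from dataclasses import dataclass

GENERAL_HEADER_LEN = 3 + 1 + 4     # fileSignature + spec + seedId
CUSTOM_HEADER_LEN = 12 + 32 + 16   # headerNonce + encryptedFileKey + tag


@dataclass(frozen=True)
class FileHeader:
    spec: int
    seed_id: bytes
    header_nonce: bytes
    encrypted_file_key: bytes
    tag: bytes


def parse_header(raw: bytes) -> FileHeader:
    """Split the leading header bytes of a file into their fields."""
    total = GENERAL_HEADER_LEN + CUSTOM_HEADER_LEN
    if len(raw) < total:
        raise ValueError(f"header needs {total} bytes, got {len(raw)}")
    if raw[0:3] != b"uvf":
        raise ValueError("missing 'uvf' file signature")
    return FileHeader(
        spec=raw[3],                      # single byte, 0-255
        seed_id=raw[4:8],                 # raw bytes, encoding left open
        header_nonce=raw[8:20],
        encrypted_file_key=raw[20:52],
        tag=raw[52:68],
    )
```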
Lines changed: 273 additions & 0 deletions
@@ -0,0 +1,273 @@
1+
# File Name Encryption using AES-SIV-512 + Base64-URL-Encoding
2+
3+
## Directory Metadata
4+
5+
Every directory requires certain metadata that affects the file name encryption of its direct children:
6+
7+
* the seed used to derive keys
8+
* the directory ID
9+
10+
This data is immutable and stays linked to its directory permanently, surviving renames and moves. It is stored in a file called `dir.uvf`, which is kept in two places:
11+
1. Within the parent dir (except for root), where it serves as a link to the child dir
12+
2. In the child dir itself (allowing disaster recovery without the parent).
13+
14+
15+
> [!NOTE] Disaster recovery without the parent
16+
> Imagine the cleartext folder structure `a/b/c/`, where a failed sync causes `a/b/` to be lost. Thanks to the `dir.uvf` copy stored inside `c/` itself, `?/?/c/` and
17+
> all its children can still be recovered. The `dirId` required for name decryption would otherwise only be available
18+
> within the lost parent dir.
19+
20+
The exact file structure of `dir.uvf` will be discussed in more detail [below](#format-of-diruvf-and-symlinkuvf).
21+
22+
### Directory Seed
23+
24+
At creation time, a directory's seed is always the `latestSeed` as defined in the [vault metadata file](../vault%20metadata/README.md#encrypted-content). In case of the root directory this happens to also be the `initialSeed`.
25+
26+
When navigating into a directory, the seed is read from the corresponding `dir.uvf`'s file header. More precisely, the header contains the seed ID, and the seed can then be looked up in the `seeds` of the [vault metadata file](../vault%20metadata/README.md#encrypted-content).
27+
28+
> [!NOTE]
29+
> A directory's seed is used to derive the keys that encrypt the names of the directory's direct children. Since the seed is immutable, child nodes added later use the same file name encryption keys as existing children.
30+
>
31+
> Consequently, key rotation only affects file names of nodes added to newly created directories. Key rotation is ineffective for children added to preexisting directories.
32+
33+
### Directory ID
34+
35+
The directory ID is a unique sequence of 32 random bytes. With 256 bits of randomness, the birthday bound implies that a collision only becomes likely after roughly 2^128 generated IDs.
36+
37+
```ts
38+
let dirId = csprng(len: 32)
39+
```
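
As a minimal sketch, the ID can be drawn from the operating system's CSPRNG, and the collision risk can be estimated with the standard union bound `n(n-1)/2 / 2^bits`. The function names are illustrative and not defined by the specification.

```python
import secrets

DIR_ID_LEN = 32  # bytes, i.e. 256 bits


def new_dir_id() -> bytes:
    """Generate a random directory ID from the OS-provided CSPRNG."""
    return secrets.token_bytes(DIR_ID_LEN)


def collision_probability_bound(num_ids: int, bits: int = 8 * DIR_ID_LEN) -> float:
    """Upper bound on the chance that any two of `num_ids` random IDs collide."""
    pairs = num_ids * (num_ids - 1) / 2   # number of distinct ID pairs
    return pairs / 2.0 ** bits
```

Even for `2**64` directories the bound stays below `2**-128`, so collisions are not a practical concern.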
40+
41+
The only exception to this is the root directory ID, which is deterministically derived from the `initialSeed` using the [KDF](../kdf/README.md):
42+
43+
```ts
44+
let rootDirId = kdf(secret: initialSeed, len: 32, context: "rootDirId")
45+
```
46+
47+
## Deriving Encryption Keys
48+
49+
All file names are encrypted using AES-SIV, which requires a 512 bit key (internally split into two 256 bit AES keys). In addition, a 256 bit key is needed for HMAC computations. Both keys are derived by feeding the directory-specific seed from `dir.uvf` into the [KDF](../kdf/README.md):
50+
51+
```ts
52+
let sivKey = kdf(secret: seed, len: 64, context: "siv")
53+
let hmacKey = kdf(secret: seed, len: 32, context: "hmac")
54+
```
55+
56+
## Mapping Directory IDs to Paths
57+
58+
When traversing directories, the directory ID of a given subdirectory is processed in four steps to determine the storage path inside the vault:
59+
60+
1. Compute the HMAC of the `dirId` using SHA-256 and the `hmacKey`
61+
1. Truncate the result. Keep the leftmost 160 bits, discard the remaining 96 bits
62+
1. Encode the truncated hash with Base32 to get a string of printable chars
63+
1. Construct the directory path by resolving substrings of the encoded hash relative to `{vaultRoot}/d/`
64+
* split off the first two characters of the encoded hash (allowing for a total of 1024 directories within the base directory)
65+
* use the remaining 30 characters of the encoded hash as the second level directory
66+
67+
```ts
68+
let dirIdHash = hmacSha256(data: dirId, key: hmacKey)
69+
let truncatedHash = dirIdHash[0..20]
70+
let dirIdString = base32(truncatedHash)
71+
let dirPath = vaultRoot + '/d/' + dirIdString[0..2] + '/' + dirIdString[2..32]
72+
```
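
The pseudocode above translates fairly directly to Python's standard library. This is a sketch, not a reference implementation: it assumes the RFC 4648 Base32 alphabet (which matches the example paths in this document), `/` as path separator, and an `hmac_key` that has already been derived via the KDF.

```python
import base64
import hashlib
import hmac


def dir_path(vault_root: str, dir_id: bytes, hmac_key: bytes) -> str:
    """Map a directory ID to its storage path below `{vault_root}/d/`."""
    # Step 1: HMAC-SHA256 of the dirId (32 byte result).
    dir_id_hash = hmac.new(hmac_key, dir_id, hashlib.sha256).digest()
    # Step 2: keep the leftmost 160 bits (20 bytes).
    truncated = dir_id_hash[:20]
    # Step 3: Base32-encode; 20 bytes yield exactly 32 chars without padding.
    dir_id_string = base64.b32encode(truncated).decode("ascii")
    # Step 4: first two chars form the first level, the other 30 the second.
    return f"{vault_root}/d/{dir_id_string[:2]}/{dir_id_string[2:]}"
```

Because 160 bits is a multiple of Base32's 40 bit group size, no `=` padding ever appears in the path.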
73+
74+
> [!NOTE]
75+
> Due to the nature of hierarchical data structures, traversing file trees is an inherently top-down process, allowing the use of one-way hash functions.
76+
>
77+
> Base32 is used to get an encoding that works with case insensitive file systems and limits the number of nodes within `d/` to `32^2`.
78+
>
79+
> The hash is truncated to balance sufficient collision resistance against output length (to accommodate systems that have path length limitations).
80+
81+
> [!TIP]
82+
> Splitting the `dirIdString` into a path like `d/AB/CDEFGHIJKLMNOPQRSTUVWXYZ234567` is inspired by Cryptomator's former vault formats and serves two purposes:
83+
> 1. Gather all data within a single data dir, uncluttering the root dir
84+
> 2. Having at most `32^2` subdirectories within `d`
85+
86+
Regardless of the hierarchy of cleartext paths, ciphertext directories are always stored in a flattened structure. All directories will therefore effectively be siblings (or cousins, to be precise).
87+
88+
89+
## Encryption of Node Names
90+
91+
The cleartext name of a node gets encoded using UTF-8 in [Normalization Form C](https://unicode.org/reports/tr15/#Norm_Forms) to get a unique binary representation.
92+
93+
The byte sequence is then encrypted using AES-SIV as defined in [RFC 5297](https://tools.ietf.org/html/rfc5297). In order to bind the node to the containing directory, preventing undetected manipulation of the folder structure, the directory ID of the parent folder is used as associated data.
94+
95+
Lastly, the ciphertext is encoded with unpadded base64url and a file extension is added.
96+
97+
```ts
98+
let ciphertext = aesSiv(secret: cleartextName, ad: parentDirId, key: sivKey)
99+
let ciphertextName = base64url(data: ciphertext) + '.uvf'
100+
```
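
The steps around the cipher can be sketched with Python's standard library. AES-SIV itself is left out here because the standard library does not provide it, so these hypothetical helpers only cover normalization, encoding, and the resulting name length. The length estimate relies on AES-SIV output being the cleartext length plus a 16 byte synthetic IV, as defined in RFC 5297.

```python
import base64
import unicodedata

SIV_TAG_LEN = 16  # AES-SIV prepends a 16 byte synthetic IV to the ciphertext


def name_to_bytes(cleartext_name: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8 (the input to AES-SIV)."""
    return unicodedata.normalize("NFC", cleartext_name).encode("utf-8")


def ciphertext_to_name(ciphertext: bytes) -> str:
    """Encode AES-SIV output as unpadded base64url and append the extension."""
    encoded = base64.urlsafe_b64encode(ciphertext).rstrip(b"=")
    return encoded.decode("ascii") + ".uvf"


def ciphertext_name_length(cleartext_name: str) -> int:
    """Predict the encrypted name's length in characters, extension included."""
    n = len(name_to_bytes(cleartext_name)) + SIV_TAG_LEN
    return (4 * n + 2) // 3 + len(".uvf")  # ceil(4n/3) unpadded base64 chars
```

Normalizing first means that a precomposed `é` and an `e` followed by a combining accent produce the same bytes, and therefore the same encrypted name.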
101+
102+
## Ciphertext Directory Structure
103+
104+
### Node Types
105+
106+
Depending on the kind of a cleartext node, the encrypted name is then either used to create a file or a directory:
107+
108+
| cleartext node type | ciphertext structure |
109+
|---------------------|-------------------------------------|
110+
| file | file |
111+
| directory | directory containing `dir.uvf` |
112+
| symlink | directory containing `symlink.uvf` |
113+
114+
### Format of `dir.uvf` and `symlink.uvf`
115+
116+
Both `dir.uvf` and `symlink.uvf` files are encrypted using the [content encryption mechanism](../file%20content%20encryption/README.md) configured for the vault.
117+
118+
The cleartext content of `dir.uvf` is the 32 byte dirId. The seed referenced by this file's header doubles as the seed of the child directory.
119+
120+
The cleartext content of `symlink.uvf` is a UTF-8 string in Normalization Form C, denoting the cleartext target of the symlink.
121+
122+
> [!CAUTION]
123+
> Every `*.uvf` file MUST be encrypted independently, particularly the two `dir.uvf` copies that contain the same dirId. This is required for indistinguishable ciphertexts, avoiding the leakage of the nested dir structure.
124+
125+
### Example Directory Structure
126+
127+
Thus, for a given cleartext directory structure like this...
128+
129+
```txt
130+
.
131+
├─ File.txt
132+
├─ Symlink
133+
├─ Subdirectory
134+
│ └─ ...
135+
└─ ...
136+
```
137+
138+
...the corresponding ciphertext directory structure will be:
139+
140+
```txt
141+
.
142+
├─ vault.uvf
143+
└─ d
144+
├─ BZ
145+
│ └─ R4VZSS5PEF7TU3PMFIMON5GJRNBDWA # Root Directory
146+
│ ├─ dir.uvf # Root Directory's metadata
147+
│ ├─ 5TyvCyF255sRtfrIv83ucADQ.uvf # File.txt
148+
│ ├─ FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf # Subdirectory
149+
│ │ └─ dir.uvf # Subdirectory's metadata
150+
│ └─ gLeOGMCN358UBf2Qk9cWCQl.uvf # Symlink
151+
│ └─ symlink.uvf # Symlink's target
152+
├─ FC
153+
│ └─ ZKZRLZUODUUYTYA4457CSBPZXB5A77 # Subdirectory
154+
│ ├─ dir.uvf # Subdirectory's metadata
155+
│ └─ ... # Subdirectory's children
156+
└─ ...
157+
```
158+
159+
### Traversing the Example Directory Structure
160+
161+
#### List contents of `/`:
162+
163+
1. Use `initialSeed` as a seed for the root directory
164+
1. compute `rootDirId` and corresponding ciphertext dir path -> `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA`
165+
1. list direct children within `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA`
166+
* `dir.uvf` (file)
167+
* `5TyvCyF255sRtfrIv83ucADQ.uvf` (file)
168+
* `FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf` (dir)
169+
* `gLeOGMCN358UBf2Qk9cWCQl.uvf` (dir)
170+
1. for each subdirectory, determine node type
171+
* `FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf` denotes a dir (contains `dir.uvf`)
172+
* `gLeOGMCN358UBf2Qk9cWCQl.uvf` denotes a symlink (contains `symlink.uvf`)
173+
1. strip file extension and decrypt file names
174+
* `File.txt`
175+
* `Subdirectory`
176+
* `Symlink`
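
The node type detection in the listing steps above can be sketched as follows. The function names are illustrative, and skipping the directory's own `dir.uvf` when listing children is an assumption about how an implementation would treat that metadata file. Name decryption is omitted.

```python
from pathlib import Path


def node_type(entry: Path) -> str:
    """Classify a ciphertext entry as 'file', 'directory', or 'symlink'."""
    if entry.is_file():
        return "file"
    if (entry / "dir.uvf").is_file():
        return "directory"
    if (entry / "symlink.uvf").is_file():
        return "symlink"
    raise ValueError(f"unrecognized ciphertext node: {entry}")


def list_nodes(ciphertext_dir: Path) -> dict[str, str]:
    """Map each encrypted child name to its node type."""
    return {
        child.name: node_type(child)
        for child in sorted(ciphertext_dir.iterdir())
        if child.name != "dir.uvf"  # the directory's own metadata, not a child
    }
```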
177+
178+
#### List contents of `/Subdirectory/`:
179+
180+
1. read seed and decrypt `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/FHTa55bHsUfVDbEb0gTL9hZ8nho.uvf/dir.uvf`
181+
1. read `dirId` from said file and compute ciphertext path -> `d/FC/ZKZRLZUODUUYTYA4457CSBPZXB5A77`
182+
1. Repeat dir listing procedure for `d/FC/ZKZRLZUODUUYTYA4457CSBPZXB5A77`
183+
184+
#### Read target of `/Symlink`:
185+
186+
1. decrypt file `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/gLeOGMCN358UBf2Qk9cWCQl.uvf/symlink.uvf`
187+
188+
#### Read content of `/File.txt`:
189+
190+
1. decrypt file `d/BZ/R4VZSS5PEF7TU3PMFIMON5GJRNBDWA/5TyvCyF255sRtfrIv83ucADQ.uvf`
191+
192+
## Overview
193+
194+
### Derivation of Directory Seed and Directory ID
195+
```mermaid
196+
---
197+
title: Choosing a Seed and Directory ID
198+
---
199+
flowchart TD
200+
subgraph "Which Seed?"
201+
isRoot1{is root dir?}
202+
isRoot1 -->|y| initialSeed
203+
isRoot1 -->|n| isExistingDir1
204+
205+
isExistingDir1{is existing dir?}
206+
isExistingDir1 -->|y| readSeed
207+
isExistingDir1 -->|n| latestSeed
208+
209+
readSeed{{"read seed from `dir.uvf`"}}
210+
readSeed --> dirFileSeed[seed from dir.uvf]
211+
end
212+
213+
subgraph "Which dirId?"
214+
isRoot2{is root dir?}
215+
isRoot2 -->|n| isExistingDir2
216+
isRoot2 -->|y| kdfRootDirId
217+
218+
isExistingDir2{is existing dir?}
219+
isExistingDir2 -->|y| readDirId
220+
isExistingDir2 -->|n| csprng32
221+
222+
csprng32 --> randomDirId[random dirId]
223+
csprng32{{"csprng(32)"}}
224+
225+
initialSeed -->|secret:| kdfRootDirId
226+
kdfRootDirId{{"kdf(secret,32,'rootDirId')"}}
227+
kdfRootDirId --> rootDirId[root dirId]
228+
229+
readDirId{{"read dirId from `dir.uvf`"}}
230+
readDirId --> dirFileId[dirId from dir.uvf]
231+
end
232+
```
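
The "Which Seed?" branch of the flowchart reduces to a small decision function. This is a sketch with hypothetical parameter names; passing `None` for `seed_from_dir_file` stands for a directory that does not exist yet.

```python
from typing import Optional


def choose_seed(
    is_root: bool,
    initial_seed: bytes,
    latest_seed: bytes,
    seed_from_dir_file: Optional[bytes] = None,
) -> bytes:
    """Pick the seed for a directory, following the flowchart's branches."""
    if is_root:
        return initial_seed
    if seed_from_dir_file is not None:
        # Existing directory: the seed is immutable, so reuse the stored one.
        return seed_from_dir_file
    # Newly created directory: always use the latest seed.
    return latest_seed
```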
233+
234+
### Mapping Directory IDs to Paths, Encryption of Directory Names and Directory Metadata
235+
```mermaid
236+
flowchart TD
237+
directorySeed -->|secret:| kdfSiv
238+
kdfSiv{{"kdf(secret,64,'siv')"}}
239+
kdfSiv --> sivKey
240+
directorySeed -->|secret:| kdfHmac
241+
kdfHmac{{"kdf(secret,32,'hmac')"}}
242+
kdfHmac --> hmacKey
243+
hmacKey -->|key:| hmacSha256
244+
hmacSha256{{hmacSha256}}
245+
hmacSha256 --> dirIdHash
246+
dirIdHash --> truncate
247+
truncate{{"_[0..20]"}}
248+
truncate --> base32
249+
base32{{base32}}
250+
base32 --> dirIdString
251+
dirIdString --> head
252+
head{{"_[0..2]"}}
253+
dirIdString --> tail
254+
tail{{"_[2..32]"}}
255+
head -->|$0:| pattern
256+
tail -->|$1:| pattern
257+
pattern{{"d/$0/$1"}}
258+
pattern --> dirPath
259+
sivKey -->|sivKey:| aesSiv
260+
aesSiv{{aesSiv}}
261+
parentDirId -->|ad:| aesSiv
262+
clearTextName -->|secret:| aesSiv
263+
aesSiv --> base64Url
264+
base64Url{{base64url}}
265+
base64Url --> appendFileExt
266+
appendFileExt{{"append '.uvf'"}}
267+
appendFileExt --> ciphertextName
268+
dirId -->|cleartextBlocks:| fileContentEncryption
269+
fileContentEncryption{{file content encryption}}
270+
directorySeed -->|seed:| fileContentEncryption
271+
fileContentEncryption --> dirUvf
272+
dirUvf["directory metadata dir.uvf content"]
273+
```

file name encryption/README.md

Lines changed: 18 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,23 @@
11
# File Name Encryption
22

3-
:warning: this is a working draft
3+
## Approved File Name Formats
44

5-
| Format ID | Description | Pros | Cons |
5+
File name formats are specified by the [vault metadata file](../vault%20metadata/README.md) in its `nameFormat` field.
6+
7+
This is an exhaustive list of file name formats that have been defined in this version of the specification and MUST be supported by conforming applications:
8+
9+
| Format ID | Description | Properties | Restrictions |
10+
|---|---|---|---|
11+
| [AES-SIV-512-B64URL](AES-SIV-512-B64URL.md) | encrypt using AES-SIV, then base64url-encode file name, case-sensitive | just ASCII characters in ciphertext; case-sensitive | 16 byte overhead<br>4/3 expansion |
12+
13+
## Possible Future Formats
14+
15+
> [!NOTE]
16+
> Future versions of this standard might add further formats or deprecate existing ones. Existing formats MUST NOT be changed while keeping the same ID, though.
17+
18+
| Format ID | Description | Properties | Restrictions |
619
|---|---|---|---|
7-
| NONE | Don't encrypt file names, just append a file extension | no issues | :warning: no privacy |
8-
| AES-SIV-BASE64URL | Encrypt using AES-SIV, then base64url-encode | no fancy characters | 16 byte overhead<br>4/3 expansion |
9-
| AES-SIV-BASE32-CI | Encrypt using AES-SIV, then base32-encode, apply case information on encoded ciphertext | no fancy characters<br>case-insensitive | :warning: leaks case information <br>16 byte overhead<br>8/5 expansion |
10-
| AES-SIV-BASE4K | Encrypt using AES-SIV, then base4k-encode | short file names (in terms of chars) | 16 bytes overhead<br>unicode required |
20+
| NONE | No file name encryption, just file extensions | - | no confidentiality |
21+
| AES-SIV-512-B32-CI | encrypt using AES-SIV, then base32-encode, apply case information on encoded ciphertext | just ASCII characters in ciphertext; case-insensitive | :warning: leaks case information <br>16 byte overhead<br>8/5 expansion |
22+
| AES-SIV-512-B4K | encrypt using AES-SIV, then base4k-encode | short file names (using multi-byte chars); case-sensitive | 16 byte overhead<br>Unicode required |
1123
| ... | ... | ... | ... |
