Commit 87f38e0

Copilot and kwannoel committed
docs: add dedicated Iceberg compactor deployment documentation
Co-authored-by: kwannoel <47273164+kwannoel@users.noreply.github.com>
1 parent b16f0a5 commit 87f38e0

4 files changed: +201 -2 lines changed

docs.json

Lines changed: 2 additions & 1 deletion
@@ -592,7 +592,8 @@
   "group": "Internal Iceberg tables",
   "pages": [
     "iceberg/ov-internal",
-    "iceberg/internal-iceberg-tables"
+    "iceberg/internal-iceberg-tables",
+    "iceberg/deploy-iceberg-compactor"
   ]
 },
 {
Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
---
title: "Deploy a dedicated Iceberg compactor"
sidebarTitle: "Deploy Iceberg compactor"
description: "Learn how to deploy and size a dedicated compactor node for RisingWave's built-in Iceberg maintenance when using internal Iceberg tables (ENGINE = iceberg)."
---

RisingWave's built-in Iceberg maintenance, including automatic compaction and snapshot expiration, runs on the compactor node. When you set `enable_compaction = true` on an internal Iceberg table or Iceberg sink, the compactor node executes those background maintenance tasks.

<Warning>
**Dedicated compactor required for automatic Iceberg maintenance**

Before setting `enable_compaction = true`, ensure your cluster has at least one compactor node deployed. Without a compactor, automatic Iceberg maintenance will not run, small files will accumulate, and query performance will degrade over time.
</Warning>

## Why a dedicated compactor is needed

When RisingWave writes to Iceberg, it produces many small data files and frequent snapshots. Without compaction:

- Query performance degrades due to excessive file scanning.
- Storage costs increase from accumulated small files and stale snapshots.
- Metadata overhead grows with each new snapshot, slowing down catalog operations.

RisingWave's compactor node handles this by periodically merging small files and expiring old snapshots. It uses an embedded Rust/DataFusion engine that can outperform a single-node Apache Spark setup for Iceberg compaction tasks. See the [compaction benchmark](/iceberg/compaction-benchmark) for details.

The compactor node is separate from the compute node and can be scaled independently, so it will not interfere with your streaming workloads.

## Deploy a compactor node

### Kubernetes (Helm)

If you deployed RisingWave using the Helm chart, add or update the `compactorComponent` section in your `values.yaml` file.

#### Minimal configuration

```yaml values.yaml
compactorComponent:
  replicas: 1
  resources:
    limits:
      cpu: "2"
      memory: 4Gi
    requests:
      cpu: "1"
      memory: 2Gi
```

Apply the change:

```bash
helm upgrade -n risingwave <my-risingwave> risingwavelabs/risingwave -f values.yaml
```

#### Production configuration

For production workloads with frequent writes or large data volumes, allocate more CPU and memory:

```yaml values.yaml
compactorComponent:
  replicas: 1
  resources:
    limits:
      cpu: "8"
      memory: 16Gi
    requests:
      cpu: "4"
      memory: 8Gi
```

See [Helm chart configuration](https://github.com/risingwavelabs/helm-charts/blob/main/docs/CONFIGURATION.md#customize-pods-of-different-components) for the full list of supported `compactorComponent` fields.

### Kubernetes (Operator)

If you deployed RisingWave using the Kubernetes Operator, add or update the `compactor` section under `spec.components` in your `RisingWave` custom resource.

#### Minimal configuration

```yaml risingwave.yaml
apiVersion: risingwave.risingwavelabs.com/v1alpha1
kind: RisingWave
metadata:
  name: risingwave
spec:
  # ... other fields ...
  components:
    compactor:
      nodeGroups:
        - name: ""
          replicas: 1
          template:
            spec:
              resources:
                limits:
                  cpu: "2"
                  memory: 4Gi
                requests:
                  cpu: "1"
                  memory: 2Gi
```

Apply the change:

```bash
kubectl apply -f risingwave.yaml
```

#### Production configuration

```yaml risingwave.yaml
apiVersion: risingwave.risingwavelabs.com/v1alpha1
kind: RisingWave
metadata:
  name: risingwave
spec:
  # ... other fields ...
  components:
    compactor:
      nodeGroups:
        - name: ""
          replicas: 1
          template:
            spec:
              resources:
                limits:
                  cpu: "8"
                  memory: 16Gi
                requests:
                  cpu: "4"
                  memory: 8Gi
```

## Verify the compactor is running

After applying the configuration, check that the compactor Pod is running:

```bash
# Helm deployment
kubectl -n risingwave get pods -l app.kubernetes.io/component=compactor

# Operator deployment
kubectl get pods -l risingwave/component=compactor
```

The output should show a compactor Pod with status `Running`:

```
NAME                                   READY   STATUS    RESTARTS   AGE
risingwave-compactor-8dd799db6-hdjjz   1/1     Running   0          2m
```
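
The same check can be scripted. The following is a minimal sketch, not part of RisingWave or kubectl: `all_pods_running` is a hypothetical helper that reads `kubectl get pods --no-headers` style output on stdin and assumes the default column layout, in which `STATUS` is the third column.

```bash
# Hypothetical helper (an assumption, not a RisingWave or kubectl command).
# Reads pod rows on stdin and succeeds only if at least one Pod is listed
# and every Pod's STATUS (third column) is Running.
all_pods_running() {
  awk 'NF { total++; if ($3 != "Running") bad++ }
       END { exit !(total > 0 && bad == 0) }'
}

# Example using the sample row shown above instead of a live cluster
printf '%s\n' 'risingwave-compactor-8dd799db6-hdjjz 1/1 Running 0 2m' \
  | all_pods_running && echo "compactor is running"
```

Against a live cluster, pipe the output of one of the `kubectl get pods` commands above into the function, with `--no-headers` added so that the header row is not counted as a Pod.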

## Sizing guidelines

The right compactor size depends on your write volume and compaction frequency. Use the following guidelines as a starting point.

### Minimum requirements

| Resource | Value |
|:--|:--|
| CPU | 1 core |
| Memory | 2 GB |

This is sufficient for small workloads with infrequent writes (for example, test environments or low-volume pipelines).

### Recommended sizing by workload

| Workload | Write volume | Compaction frequency | CPU | Memory |
|:--|:--|:--|:--|:--|
| Light | < 10 GB/day | Hourly (default) | 2 cores | 4 GB |
| Medium | 10–100 GB/day | Hourly or more frequent | 4 cores | 8 GB |
| Heavy | > 100 GB/day | Sub-hourly | 8+ cores | 16+ GB |
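
As an illustration, the table can be turned into a small lookup. This is a sketch only: `recommend_compactor_size` is a hypothetical helper, it takes the write volume as a whole number of GB per day, and it returns nothing more than the starting-point values from the table.

```bash
# Hypothetical helper (an assumption, not part of RisingWave).
# Maps a daily write volume in whole GB/day to the starting-point tier
# from the table above: < 10 is Light, 10-100 is Medium, > 100 is Heavy.
recommend_compactor_size() {
  local gb_per_day="$1"
  if [ "$gb_per_day" -lt 10 ]; then
    echo "Light: 2 cores, 4 GB"
  elif [ "$gb_per_day" -le 100 ]; then
    echo "Medium: 4 cores, 8 GB"
  else
    echo "Heavy: 8+ cores, 16+ GB"
  fi
}

recommend_compactor_size 50   # prints "Medium: 4 cores, 8 GB"
```

Treat the result as a first guess and adjust it based on observed compactor load.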

### Sizing considerations

- **CPU**: Compaction is CPU-intensive due to file reading, sorting, and writing. Allocate more CPU for high write volumes or shorter compaction intervals.
- **Memory**: The compactor buffers file data in memory during compaction. For large target file sizes (for example, `compaction.target_file_size_mb = 512`), increase memory proportionally.
- **Replicas**: In most cases, a single compactor replica is sufficient. Consider adding a second replica if the compactor consistently becomes a bottleneck (observable via the [RisingWave monitoring dashboard](/operate/monitor-risingwave-cluster)).

<Tip>
The [compaction benchmark](/iceberg/compaction-benchmark) tested RisingWave's compaction engine on a 16-core, 64 GB machine against ~193 GB of data (17,000+ small files). For reference, that configuration compacted the dataset significantly faster than a single-node Apache Spark setup.
</Tip>

### Adjusting compaction frequency

Reducing `compaction_interval_sec` increases how often compaction runs, which keeps tables healthier but increases compactor load. Increase CPU and memory if you lower the interval significantly.

```sql
-- Run compaction every 30 minutes instead of the default 1 hour
CREATE TABLE my_table (id INT PRIMARY KEY, name VARCHAR)
WITH (
    enable_compaction = true,
    compaction_interval_sec = 1800
) ENGINE = iceberg;
```

For complete maintenance configuration options, see [Iceberg table maintenance](/iceberg/maintenance).

iceberg/maintenance.mdx

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ You can enable automatic maintenance to run periodically in the background for y
 <Warning>
 **Dedicated compactor required**

-Automatic Iceberg maintenance requires a dedicated compactor service. Please contact us via the [RisingWave Slack workspace](https://www.risingwave.com/slack) to have the necessary resources allocated for your cluster.
+Automatic Iceberg maintenance requires a dedicated compactor node. Before enabling `enable_compaction = true`, ensure your cluster has at least one compactor node deployed. For deployment instructions and sizing guidelines, see [Deploy a dedicated Iceberg compactor](/iceberg/deploy-iceberg-compactor).
 </Warning>

 ### Compaction types

iceberg/ov-internal.mdx

Lines changed: 4 additions & 0 deletions
@@ -47,6 +47,10 @@ RisingWave provides a managed compaction service that helps maintain table healt

 You can enable automatic maintenance to run periodically or trigger it manually using the `VACUUM` command. Using RisingWave's service is optional, and you can also connect an external compactor from providers like Amazon EMR, or use a self-hosted Spark job.

+<Note>
+Automatic Iceberg maintenance requires a dedicated compactor node in your cluster. Before enabling `enable_compaction = true`, see [Deploy a dedicated Iceberg compactor](/iceberg/deploy-iceberg-compactor) for deployment and sizing instructions.
+</Note>
+
 For complete details on configuration, see the [Iceberg table maintenance](/iceberg/maintenance).

 ## Catalog and compaction summary
