---
title: "Deploy a dedicated Iceberg compactor"
sidebarTitle: "Deploy Iceberg compactor"
description: "Learn how to deploy and size a dedicated compactor node for RisingWave's built-in Iceberg maintenance when using internal Iceberg tables (ENGINE = iceberg)."
---

RisingWave's built-in Iceberg maintenance, including automatic compaction and snapshot expiration, runs on the compactor node. When you enable `enable_compaction = true` on an internal Iceberg table or Iceberg sink, the compactor node executes those background maintenance tasks.

<Warning>
**Dedicated compactor required for automatic Iceberg maintenance**

Before enabling `enable_compaction = true`, ensure your cluster has at least one compactor node deployed. Without a compactor, automatic Iceberg maintenance will not run, small files will accumulate, and query performance will degrade over time.
</Warning>
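
If you are unsure whether your cluster already has a compactor registered, you can query RisingWave's system catalog. This is a sketch that assumes the `rw_catalog.rw_worker_nodes` system table available in recent RisingWave versions; column names may vary slightly between releases:

```sql
-- Inspect cluster workers; look for a row whose type is COMPACTOR.
SELECT * FROM rw_catalog.rw_worker_nodes;
```

If no compactor row appears, deploy one using the steps below before enabling compaction.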

## Why a dedicated compactor is needed

When RisingWave writes to Iceberg, it produces many small data files and frequent snapshots. Without compaction:

- Query performance degrades due to excessive file scanning.
- Storage costs increase from accumulated small files and stale snapshots.
- Metadata overhead grows with each new snapshot, slowing down catalog operations.

RisingWave's compactor node handles this by periodically merging small files and expiring old snapshots. It uses an embedded Rust/DataFusion engine that can outperform a single-node Apache Spark setup for Iceberg compaction tasks. See the [compaction benchmark](/iceberg/compaction-benchmark) for details.

The compactor node is separate from the compute node and can be scaled independently, so it will not interfere with your streaming workloads.

## Deploy a compactor node

### Kubernetes (Helm)

If you deployed RisingWave using the Helm chart, add or update the `compactorComponent` section in your `values.yaml` file.

#### Minimal configuration

```yaml values.yaml
compactorComponent:
  replicas: 1
  resources:
    limits:
      cpu: "2"
      memory: 4Gi
    requests:
      cpu: "1"
      memory: 2Gi
```

Apply the change:

```bash
helm upgrade -n risingwave <my-risingwave> risingwavelabs/risingwave -f values.yaml
```
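
Helm performs a rolling update, so you can wait for the compactor workload to become ready before proceeding. The Deployment name below assumes the chart's default naming and may differ in your release:

```bash
# Wait up to 2 minutes for the compactor Deployment to finish rolling out.
# Adjust the Deployment name to match your Helm release.
kubectl -n risingwave rollout status deployment/risingwave-compactor --timeout=120s
```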

#### Production configuration

For production workloads with frequent writes or large data volumes, allocate more CPU and memory:

```yaml values.yaml
compactorComponent:
  replicas: 1
  resources:
    limits:
      cpu: "8"
      memory: 16Gi
    requests:
      cpu: "4"
      memory: 8Gi
```

See [Helm chart configuration](https://github.com/risingwavelabs/helm-charts/blob/main/docs/CONFIGURATION.md#customize-pods-of-different-components) for the full list of supported `compactorComponent` fields.

### Kubernetes (Operator)

If you deployed RisingWave using the Kubernetes Operator, add or update the `compactor` section under `spec.components` in your `RisingWave` custom resource.

#### Minimal configuration

```yaml risingwave.yaml
apiVersion: risingwave.risingwavelabs.com/v1alpha1
kind: RisingWave
metadata:
  name: risingwave
spec:
  # ... other fields ...
  components:
    compactor:
      nodeGroups:
        - name: ""
          replicas: 1
          template:
            spec:
              resources:
                limits:
                  cpu: "2"
                  memory: 4Gi
                requests:
                  cpu: "1"
                  memory: 2Gi
```

Apply the change:

```bash
kubectl apply -f risingwave.yaml
```

#### Production configuration

```yaml risingwave.yaml
apiVersion: risingwave.risingwavelabs.com/v1alpha1
kind: RisingWave
metadata:
  name: risingwave
spec:
  # ... other fields ...
  components:
    compactor:
      nodeGroups:
        - name: ""
          replicas: 1
          template:
            spec:
              resources:
                limits:
                  cpu: "8"
                  memory: 16Gi
                requests:
                  cpu: "4"
                  memory: 8Gi
```

## Verify the compactor is running

After applying the configuration, check that the compactor Pod is running:

```bash
# Helm deployment
kubectl -n risingwave get pods -l app.kubernetes.io/component=compactor

# Operator deployment
kubectl get pods -l risingwave/component=compactor
```

The output should show a compactor Pod with status `Running`:

```
NAME                                   READY   STATUS    RESTARTS   AGE
risingwave-compactor-8dd799db6-hdjjz   1/1     Running   0          2m
```
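
If the Pod is running but you want to confirm that maintenance tasks are actually being picked up, inspect the compactor logs. The exact log wording varies by RisingWave version, so treat the filter pattern below as a starting point rather than a fixed interface:

```bash
# Tail recent compactor logs and filter for compaction-related entries.
# Adjust the Deployment name to match your release.
kubectl -n risingwave logs deploy/risingwave-compactor --tail=200 | grep -i compact
```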

## Sizing guidelines

The right compactor size depends on your write volume and compaction frequency. Use the following guidelines as a starting point.

### Minimum requirements

| Resource | Value |
|:--|:--|
| CPU | 1 core |
| Memory | 2 GB |

This is sufficient for small workloads with infrequent writes (for example, test environments or low-volume pipelines).

### Recommended sizing by workload

| Workload | Write volume | Compaction frequency | CPU | Memory |
|:--|:--|:--|:--|:--|
| Light | < 10 GB/day | Hourly (default) | 2 cores | 4 GB |
| Medium | 10–100 GB/day | Hourly or more frequent | 4 cores | 8 GB |
| Heavy | > 100 GB/day | Sub-hourly | 8+ cores | 16+ GB |

### Sizing considerations

- **CPU**: Compaction is CPU-intensive due to file reading, sorting, and writing. Allocate more CPU for high write volumes or shorter compaction intervals.
- **Memory**: The compactor buffers file data in memory during compaction. For large target file sizes (for example, `compaction.target_file_size_mb = 512`), increase memory proportionally.
- **Replicas**: In most cases, a single compactor replica is sufficient. Consider adding a second replica if the compactor consistently becomes a bottleneck (observable via the [RisingWave monitoring dashboard](/operate/monitor-risingwave-cluster)).
<Tip>
The [compaction benchmark](/iceberg/compaction-benchmark) tested RisingWave's compaction engine on a 16-core, 64 GB machine against ~193 GB of data (17,000+ small files). For reference, that configuration compacted the dataset significantly faster than a single-node Apache Spark setup.
</Tip>

### Adjusting compaction frequency

Reducing `compaction_interval_sec` increases how often compaction runs, which keeps tables healthier but increases compactor load. Increase CPU and memory if you lower the interval significantly.

```sql
-- Run compaction every 30 minutes instead of the default 1 hour
CREATE TABLE my_table (id INT PRIMARY KEY, name VARCHAR)
WITH (
  enable_compaction = true,
  compaction_interval_sec = 1800
) ENGINE = iceberg;
```

For complete maintenance configuration options, see [Iceberg table maintenance](/iceberg/maintenance).