Commit 99f77c2

Doc changes for compression on continuous aggregates feature (#666)
API changes Add a page that describes compression on continuous aggregates under the How-To guides (for continuous aggregates).
1 parent 7110c66 commit 99f77c2

7 files changed

+136
-32
lines changed

api/add_compression_policy.md

Lines changed: 23 additions & 7 deletions
@@ -2,18 +2,21 @@
 Allows you to set a policy by which the system compresses a chunk
 automatically in the background after it reaches a given age.

-Note that compression policies can only be created on hypertables that already
-have compression enabled, e.g., via the [`ALTER TABLE`][compression_alter-table] command
-to set `timescaledb.compress` and other configuration parameters.
+Note that compression policies can only be created on hypertables or continuous
+aggregates that already have compression enabled. Use the [`ALTER TABLE`][compression_alter-table] command
+to set `timescaledb.compress` and other configuration parameters for hypertables.
+Use the [`ALTER MATERIALIZED VIEW`][compression_continuous-aggregate] command to
+enable compression on continuous aggregates.

-### Required Arguments
+### Required arguments

 |Name|Type|Description|
 |---|---|---|
-| `hypertable` |REGCLASS| Name of the hypertable|
+| `hypertable` |REGCLASS| Name of the hypertable or continuous aggregate|
 | `compress_after` | INTERVAL or INTEGER | The age after which the policy job compresses chunks|

-The `compress_after` parameter should be specified differently depending on the type of the time column of the hypertable:
+The `compress_after` parameter should be specified differently depending
+on the type of the time column of the hypertable or continuous aggregate:
 - For hypertables with TIMESTAMP, TIMESTAMPTZ, and DATE time columns: the time interval should be an INTERVAL type.
 - For hypertables with integer-based timestamps: the time interval should be an integer type (this requires
 the [integer_now_func][set_integer_now_func] to be set).
@@ -24,7 +27,14 @@ the [integer_now_func][set_integer_now_func] to be set).
 |---|---|---|
 | `if_not_exists` | BOOLEAN | Setting to true causes the command to fail with a warning instead of an error if a compression policy already exists on the hypertable. Defaults to false.|

-### Sample Usage
+<highlight type="important">
+Compression policies on continuous aggregates should be set up so that they do
+not overlap with refresh policies on continuous aggregates. This is because
+TimescaleDB currently cannot refresh compressed regions of continuous
+aggregates.
+</highlight>
+
+### Sample usage
 Add a policy to compress chunks older than 60 days on the 'cpu' hypertable.

 ``` sql
@@ -37,6 +47,12 @@ Add a compress chunks policy to a hypertable with an integer-based time column:
 SELECT add_compression_policy('table_with_bigint_time', BIGINT '600000');
 ```

+Add a policy to compress chunks of a continuous aggregate called `cpu_weekly` that are
+older than eight weeks:
+``` sql
+SELECT add_compression_policy('cpu_weekly', INTERVAL '8 weeks');
+```

 [compression_alter-table]: /api/:currentVersion:/compression/alter_table_compression/
+[compression_continuous-aggregate]: /api/:currentVersion:/continuous-aggregates/alter_materialized_view/
 [set_integer_now_func]: /hypertable/set_integer_now_func
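As a hedged illustration of the `if_not_exists` optional argument together with the new continuous aggregate support, the following sketch adds a policy to the `cpu_weekly` continuous aggregate from the sample usage without raising an error if a policy already exists. It assumes `cpu_weekly` already has compression enabled and has a timestamp-based time column, so `compress_after` is given as an INTERVAL.

```sql
-- Sketch: assumes cpu_weekly exists and has timescaledb.compress enabled.
-- compress_after is an INTERVAL because the time column is assumed to be TIMESTAMPTZ.
SELECT add_compression_policy(
    'cpu_weekly',
    compress_after => INTERVAL '8 weeks',
    -- If a compression policy already exists, warn instead of raising an error.
    if_not_exists  => true
);
```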

api/alter_materialized_view.md

Lines changed: 14 additions & 3 deletions
@@ -18,17 +18,28 @@ ALTER MATERIALIZED VIEW <view_name> SET ( timescaledb.<option> = <value> [, ...
 |---|---|---|
 | `<view_name>` | TEXT | Name (optionally schema-qualified) of continuous aggregate view to be created.|

-### Sample Usage
+### Options
+|Name|Description|
+|-|-|
+|timescaledb.materialized_only|Enable and disable real-time aggregation|
+|timescaledb.compress|Enable and disable compression|

+### Sample usage
 To disable real-time aggregates for a
 continuous aggregate:

 ```sql
 ALTER MATERIALIZED VIEW contagg_view SET (timescaledb.materialized_only = true);
 ```

-The only option that currently can be modified with `ALTER
-MATERIALIZED VIEW` is `materialized_only`. The other options
+To enable compression for a continuous aggregate:
+
+```sql
+ALTER MATERIALIZED VIEW contagg_view SET (timescaledb.compress = true);
+```
+
+The only options that currently can be modified with `ALTER
+MATERIALIZED VIEW` are `materialized_only` and `compress`. The other options
 `continuous` and `create_group_indexes` can only be set when creating
 the continuous aggregate.
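A minimal sketch of turning compression back off on the same example view. According to the how-to guide in this commit, disabling compression fails while the continuous aggregate still has compressed chunks, and any compression policy should be removed as well. The sketch therefore removes the policy and decompresses the chunks first. `contagg_view` is the example name used above.

```sql
-- Sketch: contagg_view is the example continuous aggregate from the sample usage.
-- Remove the compression policy, if any, so it cannot recompress chunks.
SELECT remove_compression_policy('contagg_view', if_exists => true);
-- Decompress every chunk; true skips chunks that are not compressed.
SELECT decompress_chunk(c, true) FROM show_chunks('contagg_view') c;
-- With no compressed chunks left, compression can be disabled.
ALTER MATERIALIZED VIEW contagg_view SET (timescaledb.compress = false);
```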

api/continuous_aggregates.md

Lines changed: 4 additions & 2 deletions
@@ -11,7 +11,8 @@ Get metadata and settings information for continuous aggregates.
 |`view_schema` | TEXT | Schema for continuous aggregate view |
 |`view_name` | TEXT | User supplied name for continuous aggregate view |
 |`view_owner` | TEXT | Owner of the continuous aggregate view|
-|`materialized_only` | BOOLEAN | Return only materialized data when querying the continuous aggregate view. |
+|`materialized_only` | BOOLEAN | Return only materialized data when querying the continuous aggregate view|
+|`compression_enabled` | BOOLEAN | Whether compression is enabled for the continuous aggregate view|
 |`materialization_hypertable_schema` | TEXT | Schema of the underlying materialization table|
 |`materialization_hypertable_name` | TEXT | Name of the underlying materialization table|
 |`view_definition` | TEXT | `SELECT` query for continuous aggregate view|
@@ -27,11 +28,12 @@ view_schema | public
 view_name | contagg_view
 view_owner | postgres
 materialized_only | f
+compression_enabled | f
 materialization_hypertable_schema | _timescaledb_internal
 materialization_hypertable_name | _materialized_hypertable_2
 view_definition | SELECT foo.a, +
 | COUNT(foo.b) AS countb +
 | FROM foo +
 | GROUP BY (time_bucket('1 day', foo.a)), foo.a;

-```
+```
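The commit adds a `compression_enabled` column to this view. A query like the following sketch could list the continuous aggregates that currently have compression enabled. It assumes a TimescaleDB version that already includes this column.

```sql
-- Sketch: list continuous aggregates with compression enabled, together with
-- the materialization hypertable that stores their (possibly compressed) data.
SELECT view_schema,
       view_name,
       materialization_hypertable_schema,
       materialization_hypertable_name
FROM timescaledb_information.continuous_aggregates
WHERE compression_enabled;
```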

api/remove_compression_policy.md

Lines changed: 8 additions & 4 deletions
@@ -4,11 +4,10 @@ If you need to remove the compression policy. To re-start policy-based compressi
 ### Required Arguments

 |Name|Type|Description|
-|---|---|---|
-| `hypertable` | REGCLASS | Name of the hypertable the policy should be removed from.|
-
-### Optional Arguments
+|-|-|-|
+|`hypertable`|REGCLASS|Name of the hypertable or continuous aggregate the policy should be removed from|

+### Optional arguments
 |Name|Type|Description|
 |---|---|---|
 | `if_exists` | BOOLEAN | Setting to true causes the command to fail with a notice instead of an error if a compression policy does not exist on the hypertable. Defaults to false.|
@@ -18,3 +17,8 @@ Remove the compression policy from the 'cpu' table:
 ``` sql
 SELECT remove_compression_policy('cpu');
 ```
+
+Remove the compression policy from the 'cpu_weekly' continuous aggregate:
+``` sql
+SELECT remove_compression_policy('cpu_weekly');
+```
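The following is a hedged variant of the new example. Passing `if_exists => true`, which is documented under the optional arguments, makes the call emit a notice instead of an error when `cpu_weekly` has no compression policy. This can be convenient in scripts that may run more than once.

```sql
-- Sketch: cpu_weekly is the example continuous aggregate name used above.
-- With if_exists => true, a missing policy produces a notice rather than an error.
SELECT remove_compression_policy('cpu_weekly', if_exists => true);
```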
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# Compression on continuous aggregates
+Continuous aggregates are often used to store downsampled historical data.
+The historical data is almost never modified or recomputed and is only used
+for serving analytic queries. For this use case, it is often beneficial to
+store the materialized data in compressed form to save on storage costs.
+You can get these cost savings by enabling compression on continuous
+aggregates.
+
+Currently, TimescaleDB does not support refreshing compressed regions of a
+continuous aggregate. To refresh such a region, you have to manually decompress
+the compressed chunks and then call `refresh_continuous_aggregate`.
+
+## Enable compression on continuous aggregates
+You can enable and disable compression on continuous aggregates by setting the
+`timescaledb.compress` parameter when you alter the view.
+
+<procedure>
+
+### Enabling and disabling compression on continuous aggregates
+1. For an existing continuous aggregate, at the `psql` prompt, enable
+compression:
+```sql
+ALTER MATERIALIZED VIEW cagg_name SET (timescaledb.compress = true);
+```
+1. Disable compression:
+```sql
+ALTER MATERIALIZED VIEW cagg_name SET (timescaledb.compress = false);
+```
+</procedure>
+Disabling compression fails if there are compressed chunks associated with the
+continuous aggregate. In this case, you need to remove any compression policy on
+the continuous aggregate and decompress its chunks before you disable
+compression. For more detailed information, see the
+[decompress chunks][decompress-chunks] section:
+```sql
+SELECT decompress_chunk(c, true) FROM show_chunks('cagg_name') c;
+```
+
+## Compression policies on continuous aggregates
+Before setting up a compression policy on a continuous aggregate, you should
+set up a refresh policy. Set the compression policy's `compress_after` value so
+that actively refreshed regions are not compressed. This prevents the refresh
+policy from failing. For example, consider a refresh policy like this:
+
+```sql
+SELECT add_continuous_aggregate_policy('cagg_name', start_offset => INTERVAL '30 days', end_offset => INTERVAL '1 day', schedule_interval => INTERVAL '1 hour');
+```
+
+With this refresh policy, the compression policy's `compress_after` parameter
+must be greater than the refresh policy's `start_offset` of 30 days:
+```sql
+SELECT add_compression_policy('cagg_name', compress_after=>'45 days'::interval);
+```
+
+After a chunk is compressed, manual refresh calls that attempt to refresh the
+continuous aggregate's compressed region fail with an error like this:
+
+```sql
+CALL refresh_continuous_aggregate('cagg_name', NULL, now() - '30 days'::interval);
+ERROR: cannot update/delete rows from chunk "_hyper_3_3_chunk" as it is compressed
+```
+
+[decompress-chunks]: /how-to-guides/compression/decompress-chunks
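The guide states that a compressed region must be decompressed before it can be refreshed. A minimal sketch of that sequence is shown below. It assumes the `cagg_name` aggregate and the 30-day refresh window from the examples above, and it decompresses all chunks for simplicity. `show_chunks` also accepts `older_than` and `newer_than` arguments if you want to limit which chunks are decompressed. Under these assumptions, the 45-day compression policy should compress the eligible chunks again on a later run.

```sql
-- Sketch: cagg_name and the refresh window are the example values from this guide.
-- 1. Decompress the continuous aggregate's chunks (true skips uncompressed chunks).
SELECT decompress_chunk(c, true) FROM show_chunks('cagg_name') c;
-- 2. Refresh the region that was previously compressed.
CALL refresh_continuous_aggregate('cagg_name', NULL, now() - INTERVAL '30 days');
```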

timescaledb/how-to-guides/continuous-aggregates/index.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ only the data that has changed needs to be computed, not the entire dataset.
 * [Drop data][cagg-drop] from your continuous aggregates.
 * [Manage materialized hypertables][cagg-mat-hypertables].
 * [Use real-time aggregates][cagg-realtime].
+* [Compression with continuous aggregates][cagg-compression].
 * [Troubleshoot][cagg-tshoot] continuous aggregates.


@@ -24,4 +25,5 @@ only the data that has changed needs to be computed, not the entire dataset.
 [cagg-drop]: /how-to-guides/continuous-aggregates/drop-data
 [cagg-mat-hypertables]: /how-to-guides/continuous-aggregates/materialized-hypertables
 [cagg-realtime]: /how-to-guides/continuous-aggregates/real-time-aggregates
+[cagg-compression]: /how-to-guides/continuous-aggregates/compression-on-continuous-aggregates
 [cagg-tshoot]: /how-to-guides/continuous-aggregates/troubleshooting

timescaledb/how-to-guides/continuous-aggregates/troubleshooting.md

Lines changed: 22 additions & 16 deletions
@@ -12,13 +12,13 @@ with continuous aggregates.
 * Copy this comment at the top of every troubleshooting page
 -->

-## Compression policies
+## Retention policies
 If you have hypertables that use a different retention policy to your continuous
 aggregates, the retention policies are applied separately. The retention policy
 on a hypertable determines how long the raw data is kept for. The retention
 policy on a continuous aggregate determines how long the continuous aggregate is
 kept for. For example, if you have a hypertable with a retention policy of a
-week, but a continuous aggregate with a retention policy of a month, the raw
+week and a continuous aggregate with a retention policy of a month, the raw
 data is kept for a week, and the continuous aggregate is kept for a month.

 ## Insert irregular data into a continuous aggregate
@@ -50,13 +50,14 @@ be hard to refresh and would make more sense to isolate these columns in another
 hypertable. Alternatively, you might create one hypertable per metric and
 refresh them independently.

-### New data is not shown in real-time aggregates
+### Updates to previously materialized regions are not shown in real-time aggregates
 If you have a time bucket that has already been materialized, the real-time
-aggregate won't show the data that has been inserted, updated, or deleted. In
-this worked example, `refresh_continuous_aggregate()` is called for the data
-that is not going to change. When you need to change data that has already been
-materialized, use `refresh_continuous_aggregate()` for the corresponding
-buckets.
+aggregate does not show data that has been inserted, updated, or deleted
+in that bucket until the next `refresh_continuous_aggregate` call is executed.
+The continuous aggregate is refreshed either when you manually call
+`refresh_continuous_aggregate` or when a continuous aggregate policy is executed.
+This worked example shows the expected behavior of continuous aggregates when
+real-time aggregation is enabled.

 Create and fill the hypertable:
 ```sql
@@ -87,7 +88,8 @@ INSERT INTO conditions (day, city, temperature) VALUES
 ('2021-06-27', 'Moscow', 31);
 ```

-Create a real-time aggregate, but don't refresh the data:
+Create a continuous aggregate but do not materialize any data. Note that real-time
+aggregation is enabled by default:
 ```sql
 CREATE MATERIALIZED VIEW conditions_summary
 WITH (timescaledb.continuous) AS
@@ -99,18 +101,21 @@ FROM conditions
 GROUP BY city, bucket
 WITH NO DATA;

+-- The SELECT query returns data because real-time aggregation is enabled. The query on
+-- the continuous aggregate fetches data directly from the hypertable:
 SELECT * FROM conditions_summary ORDER BY bucket;
 city | bucket | min | max
 --------+------------+-----+-----
 Moscow | 2021-06-14 | 22 | 30
 Moscow | 2021-06-21 | 31 | 34
 ```

-Refresh the data:
+Materialize data into the continuous aggregate:
 ```
 CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');

--- The CAGG didn't change, that's expected
+-- The SELECT query returns the same data, as expected, but this time the data is
+-- fetched from the underlying materialized table:
 SELECT * FROM conditions_summary ORDER BY bucket;
 city | bucket | min | max
 --------+------------+-----+-----
@@ -125,8 +130,9 @@ SET temperature = 35
 WHERE day = '2021-06-14' and city = 'Moscow';
 ```

-The updated data is not yet visible in the continuous aggregate. Additionally,
-INSERT and DELETE are not visible:
+The updated data is not yet visible when you query the continuous aggregate. This
+is because these changes have not been materialized. (Similarly, any
+inserts or deletes would also not be visible.)
 ```sql
 SELECT * FROM conditions_summary ORDER BY bucket;
 city | bucket | min | max
@@ -135,7 +141,7 @@ SELECT * FROM conditions_summary ORDER BY bucket;
 Moscow | 2021-06-21 | 31 | 34
 ```

-Refresh the data again to see the updates:
+Refresh the data again to update the previously materialized region:
 ```sql
 CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');

@@ -159,8 +165,8 @@ aggregates like `SUM` and `AVG`. You can also use more complex expressions on
 top of the aggregate functions, for example `max(temperature)-min(temperature)`.

 However, aggregates using `ORDER BY` and `DISTINCT` cannot be used with
-continuous aggregates since they are not possible to parallelize with
-PostgreSQL. TimescaleDB does not currently support `FILTER` or `JOIN` clauses,
+continuous aggregates since they cannot be parallelized with
+PostgreSQL. TimescaleDB does not support `FILTER` or `JOIN` clauses,
 or window functions in continuous aggregates.

 [postgres-parallel-agg]: https://www.postgresql.org/docs/current/parallel-plans.html#PARALLEL-AGGREGATION
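To compare materialized results with real-time results in the worked example, one option is to temporarily switch the example `conditions_summary` aggregate to return only materialized data using the `timescaledb.materialized_only` option. This is a sketch and not part of the original walkthrough.

```sql
-- Sketch: conditions_summary is the continuous aggregate from the worked example.
-- Return only materialized data; buckets that have not been refreshed are omitted.
ALTER MATERIALIZED VIEW conditions_summary SET (timescaledb.materialized_only = true);
SELECT * FROM conditions_summary ORDER BY bucket;
-- Turn real-time aggregation back on.
ALTER MATERIALIZED VIEW conditions_summary SET (timescaledb.materialized_only = false);
```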
