Doc changes for compression on continuous aggregates feature (#666)

gayyappan · web-flow · commit 99f77c265b3e · 2021-12-23T08:59:46.000-05:00
API changes
Add a page that describes compression on continuous aggregates
under the How-To guides (for continuous aggregates).
diff --git a/api/add_compression_policy.md b/api/add_compression_policy.md
@@ -2,18 +2,21 @@
 Allows you to set a policy by which the system compresses a chunk
 automatically in the background after it reaches a given age.
 
-Note that compression policies can only be created on hypertables that already
-have compression enabled, e.g., via the [`ALTER TABLE`][compression_alter-table] command
-to set `timescaledb.compress` and other configuration parameters.
+Note that compression policies can only be created on hypertables or continuous
+aggregates that already have compression enabled. Use the [`ALTER TABLE`][compression_alter-table] command
+to set `timescaledb.compress` and other configuration parameters for hypertables.
+Use the [`ALTER MATERIALIZED VIEW`][compression_continuous-aggregate] command to
+enable compression on continuous aggregates.
 
-### Required Arguments
+### Required arguments
 
 |Name|Type|Description|
 |---|---|---|
-| `hypertable` |REGCLASS| Name of the hypertable|
+| `hypertable` |REGCLASS| Name of the hypertable or continuous aggregate|
 | `compress_after` | INTERVAL or INTEGER | The age after which the policy job compresses chunks|
 
-The `compress_after` parameter should be specified differently depending on the type of the time column of the hypertable:
+The `compress_after` parameter should be specified differently depending 
+on the type of the time column of the hypertable or continuous aggregate:
 - For hypertables with TIMESTAMP, TIMESTAMPTZ, and DATE time columns: the time interval should be an INTERVAL type.
 - For hypertables with integer-based timestamps: the time interval should be an integer type (this requires
 the [integer_now_func][set_integer_now_func] to be set).
@@ -24,7 +27,14 @@ the [integer_now_func][set_integer_now_func] to be set).
 |---|---|---|
 | `if_not_exists` | BOOLEAN | Setting to true causes the command to fail with a warning instead of an error if a compression policy already exists on the hypertable. Defaults to false.|
 
-### Sample Usage
+<highlight type="important">
+Set up compression policies on continuous aggregates so that the region they
+compress does not overlap with the region refreshed by refresh policies. This is
+because TimescaleDB does not currently support refreshing compressed regions of
+continuous aggregates.
+</highlight>
+
+### Sample usage
 Add a policy to compress chunks older than 60 days on the 'cpu' hypertable.
 
 ``` sql
@@ -37,6 +47,12 @@ Add a compress chunks policy to a hypertable with an integer-based time column:
 SELECT add_compression_policy('table_with_bigint_time', BIGINT '600000');
 ```
 
+Add a policy to compress chunks of a continuous aggregate called `cpu_weekly` that are
+older than eight weeks:
+
+``` sql
+SELECT add_compression_policy('cpu_weekly', INTERVAL '8 weeks');
+```
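+
+If the call might run more than once, for example in a setup script, you can pass
+the optional `if_not_exists` argument so that an existing policy produces a warning
+instead of an error. This sketch reuses the `cpu_weekly` continuous aggregate from
+the previous example:
+``` sql
+-- If cpu_weekly already has a compression policy, this issues a warning instead of an error
+SELECT add_compression_policy('cpu_weekly', INTERVAL '8 weeks', if_not_exists => true);
+```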
 
 [compression_alter-table]: /api/:currentVersion:/compression/alter_table_compression/
+[compression_continuous-aggregate]: /api/:currentVersion:/continuous-aggregates/alter_materialized_view/
 [set_integer_now_func]: /hypertable/set_integer_now_func
diff --git a/api/alter_materialized_view.md b/api/alter_materialized_view.md
@@ -18,17 +18,28 @@ ALTER MATERIALIZED VIEW <view_name> SET ( timescaledb.<option> =  <value> [, ...
 |---|---|---|
 | `<view_name>` | TEXT | Name (optionally schema-qualified) of continuous aggregate view to be created.|
 
-### Sample Usage
+### Options
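+|Name|Description|
+|-|-|
+|`timescaledb.materialized_only`|Set to `true` to return only materialized data, which disables real-time aggregation; set to `false` to enable real-time aggregation|
+|`timescaledb.compress`|Set to `true` to enable compression on the continuous aggregate; set to `false` to disable it|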
 
+### Sample usage
 To disable real-time aggregates for a
 continuous aggregate:
 
 ```sql
 ALTER MATERIALIZED VIEW contagg_view SET (timescaledb.materialized_only = true);
 ```
 
-The only option that currently can be modified with `ALTER
-MATERIALIZED VIEW` is `materialized_only`. The other options
+To enable compression for a continuous aggregate:
+
+```sql
+ALTER MATERIALIZED VIEW contagg_view SET (timescaledb.compress = true);
+```
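+
+To check whether compression is enabled, you can query the `compression_enabled`
+column of the `timescaledb_information.continuous_aggregates` view. A minimal
+sketch, assuming the `contagg_view` continuous aggregate from the previous example:
+
+```sql
+-- Returns t if compression is enabled on contagg_view, f otherwise
+SELECT view_name, compression_enabled
+FROM timescaledb_information.continuous_aggregates
+WHERE view_name = 'contagg_view';
+```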
+
+The only options that currently can be modified with `ALTER
+MATERIALIZED VIEW` are `materialized_only` and `compress`. The other options
 `continuous` and `create_group_indexes` can only be set when creating
 the continuous aggregate.
 
diff --git a/api/continuous_aggregates.md b/api/continuous_aggregates.md
@@ -11,7 +11,8 @@ Get metadata and settings information for continuous aggregates.
 |`view_schema` | TEXT | Schema for continuous aggregate view |
 |`view_name` | TEXT | User supplied name for continuous aggregate view |
 |`view_owner` | TEXT | Owner of the continuous aggregate view|
-|`materialized_only` | BOOLEAN | Return only materialized data when querying the continuous aggregate view. |
+|`materialized_only` | BOOLEAN | Return only materialized data when querying the continuous aggregate view|
+|`compression_enabled` | BOOLEAN | Whether compression is enabled on the continuous aggregate|
 |`materialization_hypertable_schema` | TEXT | Schema of the underlying materialization table|
 |`materialization_hypertable_name` | TEXT | Name of the underlying materialization table|
 |`view_definition` | TEXT | `SELECT` query for continuous aggregate view|
@@ -27,11 +28,12 @@ view_schema                       | public
 view_name                         | contagg_view
 view_owner                        | postgres
 materialized_only                 | f
+compression_enabled               | f
 materialization_hypertable_schema | _timescaledb_internal
 materialization_hypertable_name   | _materialized_hypertable_2
 view_definition                   |  SELECT foo.a,                                  +
                                   |     COUNT(foo.b) AS countb                      +
                                   |    FROM foo                                     +
                                   |   GROUP BY (time_bucket('1 day', foo.a)), foo.a;
 
-```
+```
diff --git a/api/remove_compression_policy.md b/api/remove_compression_policy.md
@@ -4,11 +4,10 @@ If you need to remove the compression policy. To re-start policy-based compressi
 ### Required Arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `hypertable` | REGCLASS | Name of the hypertable the policy should be removed from.|
-
-### Optional Arguments
+|-|-|-|
+|`hypertable`|REGCLASS|Name of the hypertable or continuous aggregate the policy should be removed from|
 
+### Optional arguments
 |Name|Type|Description|
 |---|---|---|
 | `if_exists` | BOOLEAN | Setting to true causes the command to fail with a notice instead of an error if a compression policy does not exist on the hypertable. Defaults to false.|
@@ -18,3 +17,8 @@ Remove the compression policy from the 'cpu' table:
 ``` sql
 SELECT remove_compression_policy('cpu');
 ```
+
+Remove the compression policy from the 'cpu_weekly' continuous aggregate:
+``` sql
+SELECT remove_compression_policy('cpu_weekly');
+```
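+
+If the continuous aggregate might not have a compression policy, you can pass the
+optional `if_exists` argument so that the call issues a notice instead of an
+error. This sketch assumes the same `cpu_weekly` continuous aggregate:
+``` sql
+-- Issues a notice, not an error, if cpu_weekly has no compression policy
+SELECT remove_compression_policy('cpu_weekly', if_exists => true);
+```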
diff --git a/timescaledb/how-to-guides/continuous-aggregates/compression-on-continuous-aggregates.md b/timescaledb/how-to-guides/continuous-aggregates/compression-on-continuous-aggregates.md
@@ -0,0 +1,63 @@
+# Compression on continuous aggregates
+Continuous aggregates are often used to store downsampled historical data.
+The historical data is almost never modified or recomputed and is only used 
+for serving analytic queries. For this use case, it is often beneficial to 
+store the materialized data in compressed form to save on storage costs. 
+You can get these cost savings by enabling compression on continuous 
+aggregates.
+
+Currently, TimescaleDB does not support refreshing compressed regions of a
+continuous aggregate. To refresh a compressed region, you must first manually
+decompress the affected chunks, and then call `refresh_continuous_aggregate`.
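+
+A minimal sketch of that workflow follows. It assumes a continuous aggregate named
+`cagg_name` with a `TIMESTAMPTZ` time column, and uses an illustrative refresh
+window of 60 to 30 days ago. For simplicity it decompresses every chunk; on a large
+continuous aggregate you would probably restrict this to the affected chunks.
+```sql
+-- Decompress all chunks of the continuous aggregate; passing true as the
+-- second argument skips chunks that are not compressed instead of erroring
+SELECT decompress_chunk(c, true) FROM show_chunks('cagg_name') c;
+
+-- Refresh the previously compressed region
+CALL refresh_continuous_aggregate('cagg_name', now() - INTERVAL '60 days', now() - INTERVAL '30 days');
+```
+
+If a compression policy is set on the continuous aggregate, it compresses any
+decompressed chunks that are older than `compress_after` again the next time it runs.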
+
+## Enable compression on continuous aggregates
+You can enable and disable compression on continuous aggregates by setting the
+`timescaledb.compress` parameter when you alter the view.
+
+<procedure>
+
+### Enabling and disabling compression on continuous aggregates
+1.  For an existing continuous aggregate, at the `psql` prompt, enable
+    compression:
+    ```sql
+    ALTER MATERIALIZED VIEW cagg_name SET (timescaledb.compress = true);
+    ```
+1.  Disable compression:
+    ```sql
+    ALTER MATERIALIZED VIEW cagg_name SET (timescaledb.compress = false);
+    ```
+</procedure>
+Disabling compression fails if there are compressed chunks associated with the
+continuous aggregate. In this case, remove any compression policy on the
+continuous aggregate and then decompress its chunks before you disable
+compression. For more detailed information, see the
+[decompress chunks][decompress-chunks] section:
+```sql
+-- Decompress every chunk of the continuous aggregate; passing true as the
+-- second argument skips chunks that are not compressed
+SELECT decompress_chunk(c, true) FROM show_chunks('cagg_name') c;
+```
+
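+The complete sequence for disabling compression can be sketched as follows,
+assuming a continuous aggregate named `cagg_name`. Removing the policy first
+prevents it from compressing chunks again while you decompress them:
+```sql
+-- 1. Remove the compression policy, if there is one
+SELECT remove_compression_policy('cagg_name', if_exists => true);
+
+-- 2. Decompress all chunks; true skips chunks that are not compressed
+SELECT decompress_chunk(c, true) FROM show_chunks('cagg_name') c;
+
+-- 3. Disable compression on the continuous aggregate
+ALTER MATERIALIZED VIEW cagg_name SET (timescaledb.compress = false);
+```
+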
+## Compression policies on continuous aggregates
+Before you set up a compression policy on a continuous aggregate, set up a
+refresh policy. Set the compression policy interval so that actively refreshed
+regions are not compressed; otherwise, the refresh policy fails when it tries to
+refresh a compressed region. For example, consider this refresh policy:
+
+```sql
+SELECT add_continuous_aggregate_policy('cagg_name', start_offset => INTERVAL '30 days', end_offset => INTERVAL '1 day', schedule_interval => INTERVAL '1 hour');
+```
+
+With this refresh policy, set the `compress_after` parameter of the compression
+policy to a value greater than the `start_offset` of the refresh policy:
+
+```sql
+SELECT add_compression_policy('cagg_name', compress_after=>'45 days'::interval);
+```
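+
+To compare the two windows after both policies exist, you can inspect the policy
+jobs. This sketch assumes the TimescaleDB 2.x `timescaledb_information.jobs` view,
+where refresh and compression policy jobs have the `proc_name` values shown below,
+and the `config` column holds `start_offset` and `compress_after` respectively:
+```sql
+-- List refresh and compression policy jobs with their JSON configuration
+SELECT job_id, proc_name, config
+FROM timescaledb_information.jobs
+WHERE proc_name IN ('policy_refresh_continuous_aggregate', 'policy_compression');
+```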
+
+After a chunk is compressed, manual refresh calls that attempt to refresh the
+compressed region of the continuous aggregate fail with an error like this:
+
+```sql
+CALL refresh_continuous_aggregate('cagg_name', NULL, now() - '30 days'::interval );
+ERROR:  cannot update/delete rows from chunk "_hyper_3_3_chunk" as it is compressed
+```
+
+[decompress-chunks]: /how-to-guides/compression/decompress-chunks
diff --git a/timescaledb/how-to-guides/continuous-aggregates/index.md b/timescaledb/how-to-guides/continuous-aggregates/index.md
@@ -13,6 +13,7 @@ only the data that has changed needs to be computed, not the entire dataset.
 *   [Drop data][cagg-drop] from your continuous aggregates.
 *   [Manage materialized hypertables][cagg-mat-hypertables].
 *   [Use real-time aggregates][cagg-realtime].
+*   Use [compression][cagg-compression] with continuous aggregates.
 *   [Troubleshoot][cagg-tshoot] continuous aggregates.
 
 
@@ -24,4 +25,5 @@ only the data that has changed needs to be computed, not the entire dataset.
 [cagg-drop]: /how-to-guides/continuous-aggregates/drop-data
 [cagg-mat-hypertables]: /how-to-guides/continuous-aggregates/materialized-hypertables
 [cagg-realtime]: /how-to-guides/continuous-aggregates/real-time-aggregates
+[cagg-compression]: /how-to-guides/continuous-aggregates/compression-on-continuous-aggregates
 [cagg-tshoot]: /how-to-guides/continuous-aggregates/troubleshooting
diff --git a/timescaledb/how-to-guides/continuous-aggregates/troubleshooting.md b/timescaledb/how-to-guides/continuous-aggregates/troubleshooting.md
@@ -12,13 +12,13 @@ with continuous aggregates.
 * Copy this comment at the top of every troubleshooting page
 -->
 
-## Compression policies
+## Retention policies
 If you have hypertables that use a different retention policy to your continuous
 aggregates, the retention policies are applied separately.  The retention policy
 on a hypertable determines how long the raw data is kept for. The retention
 policy on a continuous aggregate determines how long the continuous aggregate is
 kept for. For  example, if you have a hypertable with a retention policy of a
-week, but a continuous aggregate with a retention policy of a month, the raw
+week and a continuous aggregate with a retention policy of a month, the raw
 data is kept for a week, and the continuous aggregate is kept for a month.
 
 ## Insert irregular data into a continuous aggregate
@@ -50,13 +50,14 @@ be hard to refresh and would make more sense to isolate these columns in another
 hypertable. Alternatively, you might create one hypertable per metric and
 refresh them independently.
 
-### New data is not shown in real-time aggregates
+### Updates to previously materialized regions are not shown in real-time aggregates
 If you have a time bucket that has already been materialized, the real-time
-aggregate won't show the data that has been inserted, updated, or deleted. In
-this worked example, `refresh_continuous_aggregate()` is called for the data
-that is not going to change. When you need to change data that has already been
-materialized, use `refresh_continuous_aggregate()` for the corresponding
-buckets.
+aggregate does not show data that has been inserted into, updated in, or deleted
+from that bucket until the next `refresh_continuous_aggregate` call is executed.
+The continuous aggregate is refreshed either when you manually call
+`refresh_continuous_aggregate` or when a continuous aggregate policy is executed.
+This worked example shows the expected behavior of continuous aggregates when
+real-time aggregation is enabled.
 
 Create and fill the hypertable:
 ```sql
@@ -87,7 +88,8 @@ INSERT INTO conditions (day, city, temperature) VALUES
   ('2021-06-27', 'Moscow', 31);
 ```
 
-Create a real-time aggregate, but don't refresh the data:
+Create a continuous aggregate but do not materialize any data. Note that real-time
+aggregation is enabled by default:
 ```sql
 CREATE MATERIALIZED VIEW conditions_summary
 WITH (timescaledb.continuous) AS
@@ -99,18 +101,21 @@ FROM conditions
 GROUP BY city, bucket
 WITH NO DATA;
 
+-- The SELECT query returns data because real-time aggregation is enabled: the
+-- query on the continuous aggregate fetches data directly from the hypertable
 SELECT * FROM conditions_summary ORDER BY bucket;
   city  |   bucket   | min | max
 --------+------------+-----+-----
  Moscow | 2021-06-14 |  22 |  30
  Moscow | 2021-06-21 |  31 |  34
  ```
 
-Refresh the data:
+Materialize data into the continuous aggregate:
 ```
 CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');
 
--- The CAGG didn't change, that's expected
+-- The SELECT query returns the same data, as expected, but this time the data
+-- is fetched from the underlying materialization hypertable
 SELECT * FROM conditions_summary ORDER BY bucket;
   city  |   bucket   | min | max
 --------+------------+-----+-----
@@ -125,8 +130,9 @@ SET temperature = 35
 WHERE day = '2021-06-14' and city = 'Moscow';
 ```
 
-The updated data is not yet visible in the continuous aggregate. Additionally,
-INSERT and DELETE are not visible:
+The updated data is not yet visible when you query the continuous aggregate,
+because the changes have not been materialized. Similarly, any `INSERT` or
+`DELETE` statements affecting that bucket would not be visible:
 ```sql
 SELECT * FROM conditions_summary ORDER BY bucket;
   city  |   bucket   | min | max
@@ -135,7 +141,7 @@ SELECT * FROM conditions_summary ORDER BY bucket;
  Moscow | 2021-06-21 |  31 |  34
 ```
 
-Refresh the data again to see the updates:
+Refresh the data again to update the previously materialized region:
 ```sql
 CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');
 
@@ -159,8 +165,8 @@ aggregates like `SUM` and `AVG`. You can also use more complex expressions on
 top of the aggregate functions, for example `max(temperature)-min(temperature)`.
 
 However, aggregates using `ORDER BY` and `DISTINCT` cannot be used with
-continuous aggregates since they are not possible to parallelize with
-PostgreSQL. TimescaleDB does not currently support `FILTER` or `JOIN` clauses,
+continuous aggregates because PostgreSQL cannot parallelize them. TimescaleDB
+does not support `FILTER` or `JOIN` clauses,
 or window functions in continuous aggregates.
 
 [postgres-parallel-agg]: https://www.postgresql.org/docs/current/parallel-plans.html#PARALLEL-AGGREGATION