stackabletech · fhennig · Mar 27, 2023 · Mar 27, 2023 · Mar 28, 2023 · Mar 28, 2023
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,11 +12,13 @@
 - `operator-rs` `0.31.0` -> `0.35.0` ([#322], [#326]).
 - Bumped stackable image versions to "23.4.0-rc2" ([#322], [#326]).
 - Fragmented `SupersetConfig` ([#323]).
+- Restructured documentation ([#344]).
 
 [#322]: https://github.com/stackabletech/superset-operator/pull/322
 [#323]: https://github.com/stackabletech/superset-operator/pull/323
 [#326]: https://github.com/stackabletech/superset-operator/pull/326
 [#337]: https://github.com/stackabletech/superset-operator/pull/337
+[#344]: https://github.com/stackabletech/superset-operator/pull/344
 
 ## [23.1.0] - 2023-01-23
 

diff --git a/docs/modules/superset/images/superset_overview.drawio.svg b/docs/modules/superset/images/superset_overview.drawio.svg
diff --git a/docs/modules/superset/pages/getting_started/first_steps.adoc b/docs/modules/superset/pages/getting_started/first_steps.adoc
@@ -106,4 +106,4 @@ Great! You have set up a Superset instance and connected to it!
 
 == What's next
 
-Look at the xref:usage.adoc[Usage page] to find out more about configuring your Superset instance or have a look at the Superset documentation to link:https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard[create your first dashboard].
+Look at the xref:usage-guide/index.adoc[] to find out more about configuring your Superset instance or have a look at the Superset documentation to https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard[create your first dashboard].
diff --git a/docs/modules/superset/pages/index.adoc b/docs/modules/superset/pages/index.adoc
@@ -1,20 +1,59 @@
 = Stackable Operator for Apache Superset
+:description: The Stackable Operator for Apache Superset is a Kubernetes operator that can manage Apache Superset clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Superset versions.
+:keywords: Stackable Operator, Apache Superset, Kubernetes, operator, data science, data exploration, SQL, engineer, big data, CRD, StatefulSet, ConfigMap, Service, Druid, Trino, S3, demo, version
 
-This is an operator for Kubernetes that can manage https://superset.apache.org/[Apache Superset]
-clusters.
+The Stackable Operator for Apache Superset is an operator that can deploy and manage https://superset.apache.org/[Apache Superset] clusters on Kubernetes. Superset is a data exploration and visualization tool that connects to data sources via SQL. Store your data in Apache Druid or Trino, and manage your Druid and Trino instances with the Stackable Operators for xref:druid:index.adoc[Apache Druid] or xref:trino:index.adoc[Trino]. This operator helps you manage your Superset instances on Kubernetes efficiently.
 
-WARNING: This operator is part of the Stackable Data Platform and only works with images from the
-https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fsuperset[Stackable] repository.
+== Getting started
+
+Get started using Superset with the Stackable Operator by following the xref:getting_started/index.adoc[]. It guides you through installing the Operator alongside a PostgreSQL database, connecting to your Superset instance, and analyzing some preloaded example data.
+
+== Resources
+
+The Operator manages three https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/[custom resources]: the _SupersetCluster_, _SupersetDB_ and _DruidConnection_. Based on these custom resources, it creates a number of Kubernetes resources.
+
+=== Custom resources
+
+The SupersetCluster is the main resource for the configuration of the Superset instance. The resource defines only one xref:concepts:roles-and-role-groups.adoc[role], the `node`. The various configuration options are explained in the xref:usage-guide/index.adoc[]. It helps you tune your cluster to your needs by configuring xref:usage-guide/storage-resource-configuration.adoc[resource usage], xref:usage-guide/security.adoc[security], xref:usage-guide/logging.adoc[logging] and more.
+
+When a SupersetCluster is first deployed, a SupersetDB resource is created. The SupersetDB resource is a wrapper resource for the SQL database that is used by Superset for its metadata. The resource contains some configuration but also keeps track of whether the database has been initialized or not. It is not deleted automatically if a SupersetCluster is deleted, and so can be reused.
+
+A DruidConnection resource links a Superset instance and a Druid instance. It lets you define this connection in the familiar way of deploying a resource, instead of configuring the connection via the Superset UI or API. The operator then configures the connection between Druid and the Superset instance.
+
+=== Kubernetes resources
+
+Based on the custom resources you define, the Operator creates ConfigMaps, StatefulSets and Services.
+
+image::superset_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]
+
+The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The Jobs created for the SupersetDB and DruidConnection resources are not shown.
+
+For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a StatefulSet with the number of replicas defined in the role group. Every Pod in the StatefulSet has two containers: the main container running Superset and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The Operator creates a Service for the `node` role as well as a single Service per role group.
+
+The Operator creates one ConfigMap per role group and one for the SupersetDB. Each ConfigMap contains two files: `log_config.py` and `superset_config.py`, which contain logging and general Superset configuration respectively.
+
+== Dependencies
+
+Superset requires an SQL database in which to store its metadata, dashboards and users. The Stackable platform does not have its own Operator for an SQL database but the xref:getting_started/index.adoc[] guides you through installing an example database with a Superset instance that you can use to get started.
+
+== Connecting to data sources
+
+Superset does not store its own data; instead, it connects to other products where data is stored. On the Stackable Platform, the two commonly used choices are xref:druid:index.adoc[Apache Druid] and xref:trino:index.adoc[Trino]. For Druid, you can xref:usage-guide/connecting-druid.adoc[connect a Druid instance declaratively] with a custom resource. For Trino, this is on the roadmap. Have a look at the demos linked <<demos, below>> for examples of using Superset with Druid or Trino.
+
+== [[demos]]Demos
+
+Many of the Stackable xref:stackablectl::demos/index.adoc[demos] use Superset in the stack for data visualization and exploration. The demos come in two main variants.
+
+=== With Druid
+
+The xref:stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc[] and xref:stackablectl::demos/nifi-kafka-druid-water-level-data.adoc[] demos show Superset connected to xref:druid:index.adoc[Druid], exploring earthquake and water level data respectively.
+
+=== With Trino
+
+The xref:stackablectl::demos/spark-k8s-anomaly-detection-taxi-data.adoc[], xref:stackablectl::demos/trino-taxi-data.adoc[], xref:stackablectl::demos/trino-iceberg.adoc[] and xref:stackablectl::demos/data-lakehouse-iceberg-trino-spark.adoc[] demos all use a xref:trino:index.adoc[Trino] instance on top of S3 storage that holds the data to analyze. Superset is connected to Trino to analyze a variety of datasets.
 
 == Supported Versions
 
 The Stackable Operator for Apache Superset currently supports the following versions of Superset:
 
 include::partial$supported-versions.adoc[]
-
-== Getting the Docker image
-
-[source]
-----
-docker pull docker.stackable.tech/stackable/superset:<version>-stackable<stackable-version>
-----
diff --git a/docs/modules/superset/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/superset/pages/usage-guide/configuration-environment-overrides.adoc
@@ -0,0 +1,88 @@
+= Configuration & Environment Overrides
+
+The cluster definition also supports overriding configuration properties and environment variables,
+either per role or per role group, where the more specific override (role group) has precedence over
+the less specific one (role).
+
+IMPORTANT: Overriding certain properties which are set by the operator (such as the `STATS_LOGGER`)
+can interfere with the operator and can lead to problems.
+
+== Configuration Properties
+
+For a role or role group, at the same level of `config`, you can specify `configOverrides` for the
+`superset_config.py`. For example, if you want to set the CSV export encoding and the preferred
+databases, adapt the `nodes` section of the cluster resource as follows:
+
+[source,yaml]
+----
+nodes:
+  roleGroups:
+    default:
+      config: {}
+      configOverrides:
+        superset_config.py:
+          CSV_EXPORT: "{'encoding': 'utf-8'}"
+          PREFERRED_DATABASES: |-
+            [
+                'PostgreSQL',
+                'Presto',
+                'MySQL',
+                'SQLite',
+                # etc.
+            ]
+----
+
+Just as for the `config`, it is possible to specify this at the role level as well:
+
+[source,yaml]
+----
+nodes:
+  configOverrides:
+    superset_config.py:
+      CSV_EXPORT: "{'encoding': 'utf-8'}"
+      PREFERRED_DATABASES: |-
+        [
+            'PostgreSQL',
+            'Presto',
+            'MySQL',
+            'SQLite',
+            # etc.
+        ]
+  roleGroups:
+    default:
+      config: {}
+----
+
+All override property values must be strings. They are treated as Python expressions, so take care
+to produce a valid configuration.
+
+For a full list of configuration options, refer to the
+https://github.com/apache/superset/blob/master/superset/config.py[main config file for Superset].
+
+== Environment Variables
+
+In a similar fashion, environment variables can be (over)written. For example per role group:
+
+[source,yaml]
+----
+nodes:
+  roleGroups:
+    default:
+      config: {}
+      envOverrides:
+        FLASK_ENV: development
+----
+
+or per role:
+
+[source,yaml]
+----
+nodes:
+  envOverrides:
+    FLASK_ENV: development
+  roleGroups:
+    default:
+      config: {}
+----
+
+// cliOverrides don't make sense for this operator, so the feature is omitted for now
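Review note on the overrides page above: the sketch below is a minimal, hypothetical Python illustration (not the operator's code) of the two rules the page states: a role group override takes precedence over a role override, and every value is a string that is treated as a Python expression. The helper names `merge_overrides` and `render_config`, the `KEY = value` rendering, and the sample `ROW_LIMIT` and `latin-1` values are assumptions made for illustration only.

```python
# Hypothetical sketch; the real operator may merge and render overrides differently.


def merge_overrides(role: dict, role_group: dict) -> dict:
    """Combine overrides so that role group entries win over role entries."""
    return {**role, **role_group}


def render_config(overrides: dict) -> str:
    """Render each override verbatim as a `KEY = <expression>` line (assumed format)."""
    return "\n".join(f"{key} = {value}" for key, value in overrides.items())


# Assumed sample values: the role sets two properties, the role group overrides one.
role = {"CSV_EXPORT": "{'encoding': 'latin-1'}", "ROW_LIMIT": "5000"}
role_group = {"CSV_EXPORT": "{'encoding': 'utf-8'}"}

rendered = render_config(merge_overrides(role, role_group))

# Check that the rendered text is valid Python before evaluating it.
compile(rendered, "superset_config.py", "exec")

# Evaluate the rendered lines to see which Python values they produce.
settings = {}
exec(rendered, settings)
```

Under the same assumption, a YAML value such as `utf-8` without its own Python quotes would be rendered as the expression `utf - 8` and fail when the configuration is loaded, which is why string settings need quotes inside the YAML string.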
diff --git a/docs/modules/superset/pages/usage-guide/connecting-druid.adoc b/docs/modules/superset/pages/usage-guide/connecting-druid.adoc
@@ -0,0 +1,33 @@
+= Connecting Apache Druid clusters
+
+The operator can automatically connect Superset to Apache Druid clusters managed by the https://docs.stackable.tech/druid/index.html[Stackable Operator for Apache Druid].
+
+To do so, create a _DruidConnection_ resource:
+
+[source,yaml]
+----
+apiVersion: superset.stackable.tech/v1alpha1
+kind: DruidConnection
+metadata:
+  name: superset-druid-connection
+spec:
+  superset:
+    name: superset
+    namespace: default
+  druid:
+    name: my-druid-cluster
+    namespace: default
+
+----
+
+The `name` and `namespace` in `spec.superset` refer to the Superset cluster that you want to connect. Following our example above, the name is `superset`.
+
+In `spec.druid` you specify the `name` and `namespace` of your Druid cluster.
+
+The `namespace` is optional for both the Superset and the Druid cluster.
+
+If it is omitted, the Operator assumes that the cluster is in the same namespace as the DruidConnection.
+
+Once the database is initialized, the connection will be added to the cluster by the operator. You can see it in the user interface under Data > Databases:
+
+image::superset-databases.png[Superset databases showing the connected Druid cluster]
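Review note on the namespace defaulting described above: the following is a minimal Python sketch of that rule, assuming a hypothetical `ClusterRef` type and `resolve_namespace` helper (neither is part of the operator). It only illustrates that an omitted `namespace` falls back to the namespace of the DruidConnection itself; the `analytics` namespace is an invented sample value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRef:
    """Hypothetical stand-in for the `spec.superset` / `spec.druid` entries."""

    name: str
    namespace: Optional[str] = None


def resolve_namespace(ref: ClusterRef, connection_namespace: str) -> str:
    """Return the explicit namespace, or the DruidConnection's namespace if omitted."""
    return ref.namespace if ref.namespace is not None else connection_namespace


# The Druid reference omits the namespace; the Superset reference sets one explicitly.
druid = ClusterRef(name="my-druid-cluster")
superset = ClusterRef(name="superset", namespace="analytics")
```

With a DruidConnection deployed in `default`, `resolve_namespace(druid, "default")` yields `default`, while the Superset reference keeps its explicit `analytics` namespace.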
diff --git a/docs/modules/superset/pages/usage-guide/index.adoc b/docs/modules/superset/pages/usage-guide/index.adoc
@@ -0,0 +1,8 @@
+= Usage guide
+:page-aliases: usage.adoc
+
+The usage guide covers various aspects of configuring Superset and connecting it to other tools.
+
+Learn about defining the amount of xref:usage-guide/storage-resource-configuration.adoc[resources] Superset uses and how to configure xref:usage-guide/pod-placement.adoc[]. Learn how to xref:usage-guide/connecting-druid.adoc[connect to Apache Druid] operated by the xref:druid:index.adoc[].
+
+Configure xref:usage-guide/security.adoc#authentication[authentication] with LDAP and xref:usage-guide/security.adoc#authorization[authorization]. Observe your Superset instance with xref:usage-guide/logging.adoc[log aggregation] and xref:usage-guide/monitoring.adoc[monitoring].
diff --git a/docs/modules/superset/pages/usage-guide/logging.adoc b/docs/modules/superset/pages/usage-guide/logging.adoc
@@ -0,0 +1,20 @@
+= Log aggregation
+
+The logs can be forwarded to a Vector log aggregator by providing a discovery
+ConfigMap for the aggregator and by enabling the log agent:
+
+[source,yaml]
+----
+spec:
+  vectorAggregatorConfigMapName: vector-aggregator-discovery
+  nodes:
+    config:
+      logging:
+        enableVectorAgent: true
+  databaseInitialization:
+    logging:
+      enableVectorAgent: true
+----
+
+Further information on how to configure logging can be found in
+xref:home:concepts:logging.adoc[].
diff --git a/docs/modules/superset/pages/usage-guide/monitoring.adoc b/docs/modules/superset/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
+= Monitoring
+
+The managed Superset instances are automatically configured to export Prometheus metrics. See
+xref:home:operators:monitoring.adoc[] for more details.
diff --git a/docs/modules/superset/pages/usage-guide/pod-placement.adoc b/docs/modules/superset/pages/usage-guide/pod-placement.adoc
@@ -0,0 +1,7 @@
+= Pod Placement
+
+You can configure the placement of the Superset Pods as described in xref:concepts:pod_placement.adoc[].
+
+The default affinities created by the operator are:
+
+1. Distribute all the Superset Pods (weight 70)
diff --git a/docs/modules/superset/pages/usage-guide/security.adoc b/docs/modules/superset/pages/usage-guide/security.adoc
@@ -0,0 +1,61 @@
+= Security
+
+== [[authentication]]Authentication
+Every user has to be authenticated before using Superset. There are several ways to set this up.
+
+=== Web interface
+By default, users are set up manually via the web interface.
+
+=== LDAP
+
+Superset supports xref:nightly@home:concepts:authentication.adoc[authentication] of users against an LDAP server. This requires setting up an xref:nightly@home:concepts:authentication.adoc#authenticationclass[AuthenticationClass] for the LDAP server.
+The AuthenticationClass is then referenced in the SupersetCluster resource as follows:
+
+[source,yaml]
+----
+apiVersion: superset.stackable.tech/v1alpha1
+kind: SupersetCluster
+metadata:
+  name: superset-with-ldap-server
+spec:
+  image:
+    productVersion: 1.5.1
+    stackableVersion: 23.4.0-rc2
+  [...]
+  authenticationConfig:
+    authenticationClass: ldap    # <1>
+    userRegistrationRole: Admin  # <2>
+----
+
+<1> The reference to an AuthenticationClass called `ldap`
+<2> The default role to which all users are assigned
+
+Users that log in with LDAP are assigned to a default https://superset.apache.org/docs/security/#roles[Role] which is specified with the `userRegistrationRole` property.
+
+Follow the xref:nightly@home:tutorials:authentication_with_openldap.adoc[] tutorial to learn how to set up an AuthenticationClass for an LDAP server, and consult the xref:nightly@home:reference:authenticationclass.adoc[] reference for details.
+
+== [[authorization]]Authorization
+Superset has a concept called `Roles`, which lets you grant permissions to users based on their roles.
+For details, have a look at the https://superset.apache.org/docs/security[Superset documentation on Security].
+
+=== Web interface
+You can view all the available roles in the Superset web interface and also assign users to these roles.
+
+=== LDAP
+Superset supports assigning https://superset.apache.org/docs/security/#roles[Roles] to users based on their LDAP group membership, though this is not yet supported by the Stackable operator.
+All users logging in via LDAP are assigned to the same role, which you can configure via the `authenticationConfig.userRegistrationRole` attribute on the `SupersetCluster` object:
+
+[source,yaml]
+----
+apiVersion: superset.stackable.tech/v1alpha1
+kind: SupersetCluster
+metadata:
+  name: superset-with-ldap-server
+spec:
+  [...]
+  authenticationConfig:
+    authenticationClass: ldap
+    userRegistrationRole: Admin  # <1>
+----
+
+<1> All users are assigned to the `Admin` role
diff --git a/docs/modules/superset/pages/usage-guide/storage-resource-configuration.adoc b/docs/modules/superset/pages/usage-guide/storage-resource-configuration.adoc
@@ -0,0 +1,23 @@
+= Storage and resource configuration
+
+== Resource Requests
+
+include::concepts:stackable_resource_requests.adoc[]
+
+If no resource requests are configured explicitly, the Superset operator uses the following defaults:
+
+[source,yaml]
+----
+nodes:
+  roleGroups:
+    default:
+      config:
+        resources:
+          cpu:
+            min: '200m'
+            max: "4"
+          memory:
+            limit: '2Gi'
+----
+
+WARNING: The default values are _most likely_ not sufficient to run a production cluster. Please adapt according to your requirements.