[Merged by Bors] - restructured usage guide & new landing page #344

Closed
wants to merge 17 commits into from
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -12,11 +12,13 @@
- `operator-rs` `0.31.0` -> `0.35.0` ([#322], [#326]).
- Bumped stackable image versions to "23.4.0-rc2" ([#322], [#326]).
- Fragmented `SupersetConfig` ([#323]).
- Restructured documentation ([#344]).

[#322]: https://github.com/stackabletech/superset-operator/pull/322
[#323]: https://github.com/stackabletech/superset-operator/pull/323
[#326]: https://github.com/stackabletech/superset-operator/pull/326
[#337]: https://github.com/stackabletech/superset-operator/pull/337
[#344]: https://github.com/stackabletech/superset-operator/pull/344

## [23.1.0] - 2023-01-23

4 changes: 4 additions & 0 deletions docs/modules/superset/images/superset_overview.drawio.svg
Original file line number Diff line number Diff line change
@@ -106,4 +106,4 @@ Great! You have set up a Superset instance and connected to it!

== What's next

Look at the xref:usage.adoc[Usage page] to find out more about configuring your Superset instance or have a look at the Superset documentation to link:https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard[create your first dashboard].
Look at the xref:usage-guide/index.adoc[] to find out more about configuring your Superset instance or have a look at the Superset documentation to https://superset.apache.org/docs/creating-charts-dashboards/creating-your-first-dashboard[create your first dashboard].
61 changes: 50 additions & 11 deletions docs/modules/superset/pages/index.adoc
@@ -1,20 +1,59 @@
= Stackable Operator for Apache Superset
:description: The Stackable Operator for Apache Superset is a Kubernetes operator that can manage Apache Superset clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Superset versions.
:keywords: Stackable Operator, Apache Superset, Kubernetes, operator, data science, data exploration, SQL, engineer, big data, CRD, StatefulSet, ConfigMap, Service, Druid, Trino, S3, demo, version

This is an operator for Kubernetes that can manage https://superset.apache.org/[Apache Superset]
clusters.
The Stackable Operator for Apache Superset is an operator that can deploy and manage https://superset.apache.org/[Apache Superset] clusters on Kubernetes. Superset is a data exploration and visualization tool that connects to data sources via SQL. Store your data in Apache Druid or Trino, and manage your Druid and Trino instances with the Stackable Operators for xref:druid:index.adoc[Apache Druid] or xref:trino:index.adoc[Trino]. This operator helps you manage your Superset instances on Kubernetes efficiently.

WARNING: This operator is part of the Stackable Data Platform and only works with images from the
https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fsuperset[Stackable] repository.
== Getting started

Get started using Superset with the Stackable Operator by following the xref:getting_started/index.adoc[]. It guides you through installing the Operator alongside a PostgreSQL database, connecting to your Superset instance and analyzing some preloaded example data.

== Resources

The Operator manages three https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/[custom resources]: the _SupersetCluster_, _SupersetDB_ and _DruidConnection_. It creates a number of different Kubernetes resources based on the custom resources.

=== Custom resources

The SupersetCluster is the main resource for the configuration of the Superset instance. The resource defines only one xref:concepts:roles-and-role-groups.adoc[role], the `node`. The various configuration options are explained in the xref:usage-guide/index.adoc[]. It helps you tune your cluster to your needs by configuring xref:usage-guide/storage-resource-configuration.adoc[resource usage], xref:usage-guide/security.adoc[security], xref:usage-guide/logging.adoc[logging] and more.

When a SupersetCluster is first deployed, a SupersetDB resource is created. The SupersetDB resource is a wrapper resource for the SQL database that is used by Superset for its metadata. The resource contains some configuration but also keeps track of whether the database has been initialized or not. It is not deleted automatically if a SupersetCluster is deleted, and so can be reused.

A DruidConnection resource links a Superset instance and a Druid instance. It lets you define this connection in the familiar way of deploying a resource (instead of configuring the connection via the Superset UI or API). The operator then configures the connection between Druid and the Superset instance.

=== Kubernetes resources

Based on the custom resources you define, the Operator creates ConfigMaps, StatefulSets and Services.

image::superset_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The Jobs created for the SupersetDB and DruidConnection resources are not shown.

For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a StatefulSet with the number of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the main container running Superset and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The Operator creates a Service for the `node` role as well as a single Service per role group.

The Operator creates one ConfigMap per RoleGroup and one for the SupersetDB. Each ConfigMap contains two files, `log_config.py` and `superset_config.py`, which contain the logging and general Superset configuration respectively.

== Dependencies

Superset requires an SQL database in which to store its metadata, dashboards and users. The Stackable platform does not have its own Operator for an SQL database but the xref:getting_started/index.adoc[] guides you through installing an example database with a Superset instance that you can use to get started.

== Connecting to data sources

Superset does not store its own data; instead, it connects to other products where data is stored. On the Stackable Platform, the two commonly used choices are xref:druid:index.adoc[Apache Druid] and xref:trino:index.adoc[Trino]. For Druid there is a way to xref:usage-guide/connecting-druid.adoc[connect a Druid instance declaratively] with a custom resource. For Trino this is on the roadmap. Have a look at the demos linked <<demos, below>> for examples of using Superset with Druid or Trino.

== [[demos]]Demos

Many of the Stackable xref:stackablectl::demos/index.adoc[demos] use Superset in the stack for data visualization and exploration. The demos come in two main variants.

=== With Druid

The xref:stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc[] and xref:stackablectl::demos/nifi-kafka-druid-water-level-data.adoc[] demos show Superset connected to xref:druid:index.adoc[Druid], exploring earthquake and water level data respectively.

=== With Trino

The xref:stackablectl::demos/spark-k8s-anomaly-detection-taxi-data.adoc[], xref:stackablectl::demos/trino-taxi-data.adoc[], xref:stackablectl::demos/trino-iceberg.adoc[] and xref:stackablectl::demos/data-lakehouse-iceberg-trino-spark.adoc[] demos all use a xref:trino:index.adoc[Trino] instance on top of S3 storage that holds the data to analyze. Superset is connected to Trino to analyze a variety of datasets.

== Supported Versions

The Stackable Operator for Apache Superset currently supports the following versions of Superset:

include::partial$supported-versions.adoc[]

== Getting the Docker image
[source]
----
docker pull docker.stackable.tech/stackable/superset:<version>-stackable<stackable-version>
----
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
= Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables,
either per role or per role group, where the more specific override (role group) has precedence over
the less specific one (role).

IMPORTANT: Overriding certain properties which are set by the operator (such as the `STATS_LOGGER`)
can interfere with the operator and can lead to problems.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the
`superset_config.py`. For example, to set the CSV export encoding and the preferred
databases, adapt the `nodes` section of the cluster resource as follows:

[source,yaml]
----
nodes:
  roleGroups:
    default:
      config: {}
      configOverrides:
        superset_config.py:
          CSV_EXPORT: "{'encoding': 'utf-8'}"
          PREFERRED_DATABASES: |-
            [
                'PostgreSQL',
                'Presto',
                'MySQL',
                'SQLite',
                # etc.
            ]
----

Just as for the `config`, it is possible to specify this at the role level as well:

[source,yaml]
----
nodes:
  configOverrides:
    superset_config.py:
      CSV_EXPORT: "{'encoding': 'utf-8'}"
      PREFERRED_DATABASES: |-
        [
            'PostgreSQL',
            'Presto',
            'MySQL',
            'SQLite',
            # etc.
        ]
  roleGroups:
    default:
      config: {}
----

All override property values must be strings. They are treated as Python expressions, so take care
to produce a valid configuration.
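
As a rough illustration of what "a string treated as a Python expression" means in practice, the sketch below sets one numeric and one string option. It assumes the operator writes each value verbatim as the right-hand side of an assignment in `superset_config.py`, as the `CSV_EXPORT` example above suggests; `ROW_LIMIT` and `APP_NAME` are chosen only for illustration.

[source,yaml]
----
nodes:
  roleGroups:
    default:
      config: {}
      configOverrides:
        superset_config.py:
          # the YAML string "5000" is read as the Python integer 5000
          ROW_LIMIT: "5000"
          # a Python string needs its own quotes inside the YAML string
          APP_NAME: "'My Superset'"
----

Without the inner quotes, `My Superset` would not be a valid Python expression.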

For a full list of configuration options, refer to the
https://github.com/apache/superset/blob/master/superset/config.py[main config file for Superset].

== Environment Variables

In a similar fashion, environment variables can be (over)written. For example per role group:

[source,yaml]
----
nodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        FLASK_ENV: development
----

or per role:

[source,yaml]
----
nodes:
  envOverrides:
    FLASK_ENV: development
  roleGroups:
    default:
      config: {}
----
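
To illustrate the precedence rule stated at the top of this page, the following sketch sets the same variable at both levels. The role group name `debug` is a hypothetical second group added for the example; with these overrides, Pods in `default` would be expected to see `production`, while Pods in `debug` see `development`.

[source,yaml]
----
nodes:
  envOverrides:
    FLASK_ENV: production # role level, applies to all role groups
  roleGroups:
    default:
      config: {}
    debug: # hypothetical role group name
      config: {}
      envOverrides:
        FLASK_ENV: development # role group level, takes precedence
----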

// cliOverrides don't make sense for this operator, so the feature is omitted for now
33 changes: 33 additions & 0 deletions docs/modules/superset/pages/usage-guide/connecting-druid.adoc
@@ -0,0 +1,33 @@
= Connecting Apache Druid clusters

The operator can automatically connect Superset to Apache Druid clusters managed by the https://docs.stackable.tech/druid/index.html[Stackable Operator for Apache Druid].

To do so, create a _DruidConnection_ resource:

[source,yaml]
----
apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
spec:
  superset:
    name: superset
    namespace: default
  druid:
    name: my-druid-cluster
    namespace: default
----

The `name` and `namespace` in `spec.superset` refer to the Superset cluster that you want to connect. Following our example above, the name is `superset`.

In `spec.druid` you specify the `name` and `namespace` of your Druid cluster.

The `namespace` is optional for both the Superset and the Druid cluster; if it is omitted, it defaults to the namespace of the DruidConnection.
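
A minimal sketch of the same connection with both namespaces omitted, assuming the Superset cluster, the Druid cluster and the DruidConnection all live in one namespace (the names are reused from the example above):

[source,yaml]
----
apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
spec:
  superset:
    name: superset # namespace defaults to that of the DruidConnection
  druid:
    name: my-druid-cluster # namespace defaults to that of the DruidConnection
----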

Once the database is initialized, the connection will be added to the cluster by the operator. You can see it in the user interface under Data > Databases:

image::superset-databases.png[Superset databases showing the connected Druid cluster]
8 changes: 8 additions & 0 deletions docs/modules/superset/pages/usage-guide/index.adoc
@@ -0,0 +1,8 @@
= Usage guide
:page-aliases: usage.adoc

The usage guide covers various aspects of configuring Superset and connecting it to other tools.

Learn about defining the amount of xref:usage-guide/storage-resource-configuration.adoc[resources] Superset uses and how to configure xref:usage-guide/pod-placement.adoc[]. Learn how to xref:usage-guide/connecting-druid.adoc[connect to Apache Druid] operated by the xref:druid:index.adoc[].

Configure xref:usage-guide/security.adoc#authentication[authentication] with LDAP and xref:usage-guide/security.adoc#authorization[authorization]. Observe your Superset instance with xref:usage-guide/logging.adoc[log aggregation] and xref:usage-guide/monitoring.adoc[monitoring].
20 changes: 20 additions & 0 deletions docs/modules/superset/pages/usage-guide/logging.adoc
@@ -0,0 +1,20 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  vectorAggregatorConfigMapName: vector-aggregator-discovery
  nodes:
    config:
      logging:
        enableVectorAgent: true
  databaseInitialization:
    logging:
      enableVectorAgent: true
----

Further information on how to configure logging can be found in
xref:home:concepts:logging.adoc[].
4 changes: 4 additions & 0 deletions docs/modules/superset/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
= Monitoring

The managed Superset instances are automatically configured to export Prometheus metrics. See
xref:home:operators:monitoring.adoc[] for more details.
7 changes: 7 additions & 0 deletions docs/modules/superset/pages/usage-guide/pod-placement.adoc
@@ -0,0 +1,7 @@
= Pod Placement

You can configure the placement of the Superset Pods as described in xref:concepts:pod_placement.adoc[].

The default affinities created by the operator are:

1. Distribute all the Superset Pods (weight 70)
61 changes: 61 additions & 0 deletions docs/modules/superset/pages/usage-guide/security.adoc
@@ -0,0 +1,61 @@
= Security

== [[authentication]]Authentication
Every user has to be authenticated before using Superset: there are several ways in which this can be set up.

=== Web interface
The default setting is to manually set up users via the web interface.

=== LDAP

Superset supports xref:nightly@home:concepts:authentication.adoc[authentication] of users against an LDAP server. This requires setting up an xref:nightly@home:concepts:authentication.adoc#authenticationclass[AuthenticationClass] for the LDAP server.
The AuthenticationClass is then referenced in the SupersetCluster resource as follows:

[source,yaml]
----
apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: superset-with-ldap-server
spec:
  image:
    productVersion: 1.5.1
    stackableVersion: 23.4.0-rc2
  [...]
  authenticationConfig:
    authenticationClass: ldap # <1>
    userRegistrationRole: Admin # <2>
----

<1> The reference to an AuthenticationClass called `ldap`
<2> The default role to which all users are assigned

Users that log in with LDAP are assigned to a default https://superset.apache.org/docs/security/#roles[Role] which is specified with the `userRegistrationRole` property.

Follow the xref:nightly@home:tutorials:authentication_with_openldap.adoc[] tutorial to learn how to set up an AuthenticationClass for an LDAP server, and consult the xref:nightly@home:reference:authenticationclass.adoc[] reference for details.

== [[authorization]]Authorization
Superset has a concept called `Roles` which allows you to grant user permissions based on roles.
Have a look at the https://superset.apache.org/docs/security[Superset documentation on Security].

=== Web interface
You can view all the available roles in the web interface of Superset and can also assign users to these roles.

=== LDAP
Superset supports assigning https://superset.apache.org/docs/security/#roles[Roles] to users based on their LDAP group membership, though this is not yet supported by the Stackable operator.
All the users logging in via LDAP get assigned to the same role which you can configure via the attribute `authenticationConfig.userRegistrationRole` on the `SupersetCluster` object:

[source,yaml]
----
apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: superset-with-ldap-server
spec:
  [...]
  authenticationConfig:
    authenticationClass: ldap
    userRegistrationRole: Admin # <1>
----

<1> All users are assigned to the `Admin` role
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
= Storage and resource configuration

== Resource Requests

include::concepts:stackable_resource_requests.adoc[]

If no resource requests are configured explicitly, the Superset operator uses the following defaults:

[source,yaml]
----
nodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '200m'
            max: "4"
          memory:
            limit: '2Gi'
----

WARNING: The default values are _most likely_ not sufficient to run a production cluster. Please adapt according to your requirements.
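
A sketch of overriding these defaults for a role group follows; the numbers are illustrative assumptions, not sizing recommendations.

[source,yaml]
----
nodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "1" # illustrative value
            max: "4"
          memory:
            limit: "4Gi" # illustrative value
----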