Getting started working with Apache Superset – the open source data exploration and visualization platform

Apache Superset is an open source platform for data exploration and visualization. It can be described as an open, free alternative to Microsoft Power BI, Tableau, Qlik and Oracle Analytics Desktop. Superset connects (through the SQLAlchemy framework) to dozens of SQL-compliant databases and can work with CSV and JSON data sets. This article briefly introduces Superset and then invites you to start working with it right away in a Gitpod workspace. This is a free, ephemeral, cloud-based quickstart environment: you click a link, and the workspace opens in the cloud with a freshly installed Superset instance, ready to use.
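
When you add a database connection, Superset asks for a SQLAlchemy URI. The following is a minimal sketch of building one for PostgreSQL with only the Python standard library. The user, password, host and the helper name postgres_sqlalchemy_uri are hypothetical placeholders. Special characters in credentials are percent-encoded with quote_plus, which is the approach the SQLAlchemy documentation suggests for passwords containing characters such as @ or :.

```python
from urllib.parse import quote_plus


def postgres_sqlalchemy_uri(user: str, password: str, host: str,
                            port: int, database: str) -> str:
    """Build a URI of the form postgresql://user:password@host:port/database.

    Credentials are percent-encoded so that characters such as '@' or ':'
    in a password do not break the URI structure.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Hypothetical credentials and host name, for illustration only.
    print(postgres_sqlalchemy_uri("superset", "p@ss:word", "db", 5432, "examples"))
```

The resulting string is what you would paste into the SQLAlchemy URI field of Superset's database connection form.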

Superset provides a SQL IDE for preparing data for visualization (defining calculated attributes and setting formats and other characteristics for columns), including a rich metadata browser. Note: Superset queries data from its source and holds the results in memory for analysis and presentation. A live connection to the source data set is therefore required, because that is where the data is queried. Superset has a lightweight semantic layer that lets data analysts quickly define custom dimensions and metrics. Superset also has its own datastore for the definitions of datasets, charts, dashboards and additional metadata. This datastore is shared by all users who have access to a specific Superset instance.
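
To make the semantic layer and the live-connection note more concrete: calculated columns and metrics are SQL expressions stored with a dataset, and a chart is turned into a query that runs at the source. The sketch below mimics that idea with an in-memory SQLite database. The DATASET dictionary, the orders table, its rows and the helper names are all hypothetical. This illustrates the principle and does not reproduce Superset's internal format or query generator.

```python
import sqlite3

# Hypothetical dataset definition in the spirit of Superset's semantic layer:
# calculated columns and metrics are plain SQL expressions over one table.
# This is NOT Superset's internal format, just an illustration.
DATASET = {
    "table": "orders",
    "calculated_columns": {
        "price_band": "CASE WHEN unit_price >= 10 THEN 'high' ELSE 'low' END",
    },
    "metrics": {"total_revenue": "SUM(quantity * unit_price)"},
}


def build_query(dataset: dict, dimension: str, metric: str) -> str:
    """Compose the SQL a chart grouped by one dimension would run at the source."""
    # A dimension is either a calculated column (expanded here) or a physical column.
    dim_expr = dataset["calculated_columns"].get(dimension, dimension)
    metric_expr = dataset["metrics"][metric]
    return (
        f"SELECT {dim_expr} AS {dimension}, {metric_expr} AS {metric} "
        f"FROM {dataset['table']} GROUP BY {dim_expr} ORDER BY {dimension}"
    )


def run_demo(dimension: str) -> list:
    """Create a throwaway source table, then run the generated query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, quantity INTEGER, unit_price REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("EMEA", 2, 10.0), ("EMEA", 1, 5.0), ("APAC", 3, 4.0)],  # made-up rows
    )
    sql = build_query(DATASET, dimension, "total_revenue")
    # The query executes in the source database; only the small result set comes back.
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo("price_band"))
    print(run_demo("region"))
```

Because the dimension and metric are only expressions, the same dataset definition can serve charts grouped by a physical column (region) or by a calculated one (price_band) without copying any data.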

Superset makes it easy to assemble rich visualizations, offering many dozens of chart types. Charts can be annotated, for example to point out specific events that help explain the data or to describe conclusions drawn from it. Charts can be collected in dashboards. Dashboards can be published, with role-based access control determining who is allowed to see which dashboard.

Gitpod Workspace for Apache Superset

Gitpod is an open source project and a cloud service that provides ephemeral development environments. You can host Gitpod yourself or use the cloud service, which offers 50 hours of free workspace usage. I have written about Gitpod in an earlier article (Introducing Gitpod, listed under Resources). The Gitpod workspace I have prepared for Superset is available at https://gitpod.io/#https://github.com/lucasjellema/gitpod-apache-superset (simply click the URL, and a workspace will open with Superset installed and running). Open port 8088 to access the Superset web UI, and log in with user admin and password admin.
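
Besides the web UI, Superset exposes a REST API. In the following sketch, the admin/admin account obtains an access token from the /api/v1/security/login endpoint. It assumes the instance is reachable at http://localhost:8088; in Gitpod you would use the workspace's forwarded URL for port 8088 instead. Only the Python standard library is used, and the helper names login_request and fetch_access_token are hypothetical.

```python
import json
import urllib.request

# Assumption: Superset is reachable on port 8088 of the local machine.
BASE_URL = "http://localhost:8088"


def login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a POST request for Superset's /api/v1/security/login endpoint."""
    body = json.dumps({
        "username": username,
        "password": password,
        "provider": "db",  # authenticate against Superset's own user database
        "refresh": True,   # also ask for a refresh token
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/security/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(base_url: str = BASE_URL) -> str:
    """Log in with the workspace's admin/admin account and return the access token."""
    req = login_request(base_url, "admin", "admin")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["access_token"]


if __name__ == "__main__":
    token = fetch_access_token()
    print(token[:20], "...")
```

The token is then sent as an Authorization: Bearer header on subsequent API calls.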

You will enter the main page, where you can start adding database connections, datasets and charts.

A PostgreSQL database named examples is included and preconfigured in Superset.

Quite a few datasets, derived from this examples database, are predefined in the workspace environment.
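
The predefined datasets can also be listed through the REST API. This sketch makes three assumptions: you already have an access token from POST /api/v1/security/login, the instance is at base_url, and the response body has the shape {"count": ..., "result": [{"table_name": ...}, ...]}. The helper names are hypothetical. The list endpoint is paginated, so on an instance with many datasets a single call may return only the first page.

```python
import json
import urllib.request


def dataset_names(payload: dict) -> list:
    """Extract sorted table names from a /api/v1/dataset/ response body.

    Assumes the shape {"count": ..., "result": [{"table_name": ...}, ...]}.
    """
    return sorted(item["table_name"] for item in payload.get("result", []))


def list_datasets(base_url: str, access_token: str) -> list:
    """Call GET /api/v1/dataset/ with a bearer token and return the table names."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/dataset/",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return dataset_names(json.load(resp))
```

Keeping the parsing in dataset_names separate from the HTTP call makes it easy to check the extraction logic without a running Superset instance.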

To quickly create a visualization for one of these datasets, start the Explore workflow from the Datasets tab by clicking the name of the dataset that will power your chart, in this case cleaned_sales_data.

This opens the no-code, drag-and-drop visualization editor.

By clicking, dragging and dropping dataset fields, you can quickly compose a stacked bar chart visualization, as follows.

Click the View all charts link to select the desired chart type.

Then select the Stacked category and the Time Series Bar Chart type.

Drag sales to the Metrics field and select SUM as the aggregate to apply.

Set Time Grain to Quarter.
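
Conceptually, setting the time grain to Quarter together with the SUM aggregate does two things. Each row is assigned to the first day of its quarter, and sales are added up per quarter. On PostgreSQL, Superset typically expresses this truncation with a DATE_TRUNC('quarter', ...) style expression. The sketch below reproduces the bucketing in plain Python. It assumes the rows are (order_date, sales) pairs, and the sample values and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter_start(d: date) -> date:
    """Truncate a date to the first day of its quarter (Jan 1, Apr 1, Jul 1 or Oct 1)."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)


def sum_by_quarter(rows):
    """Aggregate hypothetical (order_date, sales) pairs into SUM(sales) per quarter."""
    totals = defaultdict(float)
    for order_date, sales in rows:
        totals[quarter_start(order_date)] += sales
    # Sort by quarter so the result reads like a time series.
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = [(date(2004, 1, 5), 100.0), (date(2004, 3, 30), 50.0), (date(2004, 4, 1), 25.0)]
    print(sum_by_quarter(rows))
```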

Finally, click Customize and check the Stack Series box.
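
With Stack Series checked, each series is drawn on top of the previous one instead of side by side. Each segment therefore starts where the segment below it ends. The following is a minimal sketch of that offset calculation for a single bar. It assumes non-negative values (charting libraries typically stack negative values separately), and the series names (deal sizes) and numbers are hypothetical.

```python
def stack_series(values_by_series: dict) -> dict:
    """Compute (bottom, top) extents for each series segment of one stacked bar.

    values_by_series maps a series name to its value for a single time bucket.
    Segments are stacked in the dict's insertion order; values are assumed
    to be non-negative.
    """
    extents = {}
    running_total = 0.0
    for name, value in values_by_series.items():
        extents[name] = (running_total, running_total + value)
        running_total += value
    return extents


if __name__ == "__main__":
    # Hypothetical quarterly sales per deal size for one quarter.
    print(stack_series({"Small": 10.0, "Medium": 25.0, "Large": 5.0}))
```

The top of the last segment equals the total for the bucket. This is why a stacked chart shows both the overall quarterly sales and each series' contribution in one bar.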

The result is a stacked bar chart of total sales per quarter.

The chart can be added to one or more dashboards. It can also be exported, both as a data summary and as an image.

Note: you can easily add datasets by uploading CSV or Excel files.

The Gitpod workspace ships with many sample datasets, charts and dashboards to give you a taste of what can be done with Superset. One such dashboard is shown in the next figure.

Gitpod Workspace composition

The Gitpod workspace uses docker-compose and follows the installation described in the Superset documentation. In addition to Superset, Redis and PostgreSQL are installed. PostgreSQL provides the metadata store, which can double as a database for data sets to analyze and visualize.
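
A TCP port check offers a quick way to confirm that the composed services are reachable. The sketch assumes that the default ports (8088 for Superset, 5432 for PostgreSQL, 6379 for Redis) are reachable on localhost inside the workspace. If the compose file maps them differently, adjust the hypothetical SERVICES table. An open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Assumption: default ports for the services in the docker-compose setup.
SERVICES = {"superset": 8088, "postgres": 5432, "redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or host unreachable.
        return False


def check_services(host: str = "localhost", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'up' if up else 'down'}")
```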

Resources

Apache Superset homepage: https://superset.apache.org/

Apache Superset on GitHub: https://github.com/apache/superset

The Superset documentation is a little sparse. However, Preset provides rich documentation about its managed Superset cloud service that is largely applicable to any Superset environment: https://docs.preset.io/v1/en (Preset is a cloud-hosted data exploration and visualization platform built on top of the popular open source project Apache Superset.)

Article, 9 new chart types in Superset: https://preset.io/blog/2021-6-14-superset-nine-new-charts/

Tutorial for starting to work with Superset: https://censius.ai/blogs/apache-superset-tutorial#blogpost-toc-6

Introducing Gitpod: https://lucasjellema.medium.com/first-steps-with-gitpod-great-for-try-out-quick-open-source-contributions-and-for-workshops-9590c322c18e
