Course recordings on DaDesktop for Training platform
Course outline: Getting Started with Apache Superset (Course code: superset)
Categories: Apache Superset
Summary
Overview
This course session provides a hands-on tutorial on customizing and extending Apache Superset, focusing on adding country map visualizations, configuring Mapbox API keys, managing feature flags for user roles, creating virtual datasets via SQL joins, and building interactive dashboards. The instructor guides participants through modifying plugin code to support custom country mappings, integrating geographic data, and resolving data aggregation and filtering challenges in Superset’s dashboard environment. The session concludes with a practical exercise on joining datasets and visualizing multi-dimensional data using bubble charts.
Topic (Timeline)
1. Custom Country Map Plugin Configuration [00:00:05.500 - 00:03:14.460]
- Participants are guided to locate the Superset repository root and navigate to the country map plugin source, superset-frontend/plugins/legacy-plugin-chart-country-map/src/.
- Two key modifications are required in the source file:
- An import statement linking the country identifier to its GeoJSON file.
- Addition of custom country names (e.g., “France” and “France_Région”) in the country list.
- Country identifiers must follow a specific format: lowercase and underscore-separated (e.g., france_region).
- Geospatial data (GeoJSON files) must be placed in the countries/ directory, with filenames matching the exact country identifier used in the source file.
- The instructor emphasizes strict naming consistency between code and data files; any mismatch prevents the map from rendering.
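A mismatch between the identifier in the source file and the GeoJSON filename is the usual cause of a blank map, so a small pre-build check can catch it early. The following is a minimal sketch. It assumes the files use a .geojson extension and that identifiers are ASCII-only. check_country_files is a hypothetical helper and does not inspect the plugin's import statements.

```python
import re
from pathlib import Path

# Naming rule from the session: lowercase words joined by underscores.
# Assumption: ASCII letters and digits only, so accented identifiers such as
# "france_région" are reported as invalid.
IDENTIFIER_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def check_country_files(identifiers, countries_dir):
    """Return human-readable problems for the given country identifiers.

    An identifier is reported if it breaks the naming rule, or if
    countries_dir has no file named exactly <identifier>.geojson.
    """
    problems = []
    for ident in identifiers:
        if not IDENTIFIER_RE.fullmatch(ident):
            problems.append(f"{ident}: not lowercase and underscore-separated")
        if not (Path(countries_dir) / f"{ident}.geojson").is_file():
            problems.append(f"{ident}: missing {ident}.geojson")
    return problems


if __name__ == "__main__":
    # Hypothetical path to the GeoJSON directory; adjust it to your checkout.
    countries = "superset-frontend/plugins/legacy-plugin-chart-country-map/src/countries"
    for problem in check_country_files(["france", "france_region"], countries):
        print(problem)
```

An empty result means every identifier is well-formed and has a matching file. The import in the plugin source still has to be checked by hand.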
2. Mapbox API Key Setup and Documentation [00:03:17.020 - 00:04:50.840]
- A Mapbox API key must be configured to render interactive maps (MapBox and country maps).
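- The key is set through the MAPBOX_API_KEY setting. Where it is defined depends on the deployment method, but it is typically set in Superset's configuration file (superset_config.py) or supplied as an environment variable.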
- The instructor notes that Superset's official documentation covers all of the preceding steps (plugin modification, data integration) in detail and is publicly available.
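A common pattern is to read the key from the environment inside superset_config.py rather than committing it to source control. The following is a hedged sketch. It assumes Superset is pointed at this file, for example via the SUPERSET_CONFIG_PATH environment variable. The warning message is illustrative, not something Superset itself emits.

```python
# superset_config.py (sketch): Superset imports this module to override defaults.
import logging
import os

logger = logging.getLogger(__name__)

# MAPBOX_API_KEY is the setting Superset reads for Mapbox-based charts.
# Falling back to an empty string keeps the module importable without a key.
MAPBOX_API_KEY = os.environ.get("MAPBOX_API_KEY", "")

if not MAPBOX_API_KEY:
    # Without a key, charts that rely on Mapbox tiles cannot draw their base map.
    logger.warning("MAPBOX_API_KEY is not set; Mapbox-based maps will not render a base map.")
```

In container deployments the same variable is usually passed through the container's environment, so the config file itself never contains the secret.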
3. Feature Flags and User Role Management [00:04:50.360 - 00:06:33.420]
- Instructions for giving public (anonymous) users the permissions of the “Gamma” role are provided in a separate slide deck.
- The slide deck, titled “Unlocked User,” is stored on a shared drive and includes step-by-step guidance for role assignment and iframe embedding examples.
- Participants are encouraged to use the shared drive for reference, with a note that examples may be removed during future cleanup.
- Contact information for support (Romain Gauthier: romain.arobazdatatata.io) is provided for follow-up questions.
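Granting anonymous users Gamma-level access and toggling feature flags are both done in Superset's configuration file. The following is a hedged sketch of superset_config.py. It assumes a recent Superset version in which the PUBLIC_ROLE_LIKE setting exists. The summary does not say which feature flags the slide deck enables, so EMBEDDED_SUPERSET appears here only as an example of a real embedding-related flag.

```python
# superset_config.py (sketch): role and feature-flag settings.

# Give the Public role (anonymous visitors) the same permissions as the
# built-in Gamma role, so dashboards can be viewed without logging in.
PUBLIC_ROLE_LIKE = "Gamma"

# Feature flags are switched on or off through this dictionary.
# EMBEDDED_SUPERSET is an example only; use the flags your setup requires.
FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
}
```

Permissions are copied to the Public role when Superset initializes its roles (for example when running superset init). Restart the application after editing the file.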
4. Hands-On TP: Creating and Customizing Charts [00:06:34.000 - 00:25:59.850]
- Participants begin a practical exercise (TP) using a provided drive link.
- Key tasks include:
- Creating a country map chart using custom GeoJSON data.
- Troubleshooting missing legend display: legend is not auto-generated for country maps but can be enabled via the “Customize” panel in the chart configuration.
- Confirming that tooltips are available but legends must be manually toggled on.
- Verifying that zoom/pan functionality is limited for world maps; no native zoom controls exist.
- Clarification: Legend visibility depends on chart type and configuration, not data source.
5. Data Field Mapping and Country Name Resolution [00:25:59.850 - 00:29:10.570]
- Critical step: When using full country names (e.g., “France”, “Nigeria”) instead of ISO codes, the “Country Field Type” in the chart configuration must be set to “Full Name”.
- Failure to set this correctly results in blank or misrendered maps.
- Participants confirm successful creation of country maps and proceed to the next TP step.
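Before building the chart, it can help to check which kind of value the country column actually holds. The following is a minimal heuristic sketch. guess_country_field_type is a hypothetical helper, and its return strings only echo the option labels (cca2 and cca3 for ISO 3166-1 alpha-2 and alpha-3 codes, name for Full Name). They are not a Superset API.

```python
import re


def guess_country_field_type(values):
    """Guess which "Country Field Type" a column of country values needs.

    Returns "cca2" if every non-empty value is a two-letter upper-case code,
    "cca3" if every value is a three-letter upper-case code, and "name"
    (Full Name) otherwise.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "name"
    if all(re.fullmatch(r"[A-Z]{2}", v) for v in cleaned):
        return "cca2"
    if all(re.fullmatch(r"[A-Z]{3}", v) for v in cleaned):
        return "cca3"
    return "name"


print(guess_country_field_type(["France", "Nigeria"]))
print(guess_country_field_type(["FR", "NG"]))
print(guess_country_field_type(["FRA", "NGA"]))
```

Three-letter codes are ambiguous. IOC codes such as GER differ from ISO alpha-3 codes such as DEU, so a "cca3" result still needs a manual check.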
6. Virtual Dataset Creation via SQL Join [00:29:12.590 - 00:43:45.730]
- Participants join two datasets, transactions_full and world_data, on the country field.
- SQL example (world_data.country is selected so the resulting dataset has a country column to chart):
SELECT world_data.country, SUM(transactions_full.amount) AS amount, world_data.birthrate, world_data.life_expectancy, COUNT(transactions_full.id) AS transaction_count
FROM transactions_full
JOIN world_data ON transactions_full.country = world_data.country
GROUP BY world_data.country, world_data.birthrate, world_data.life_expectancy
- Important: column names containing spaces or accents must be wrapped in double quotes (e.g., "birth rate").
- Saving the query: use “Save dataset” to create a virtual dataset; “Save” only stores the SQL as a saved query, not as a dataset.
- Virtual datasets appear in blue in the Datasets list and can be reused for visualization.
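The join can be reproduced outside Superset to sanity-check its result before saving it as a virtual dataset. The following is a minimal sketch. It assumes toy versions of transactions_full and world_data in an in-memory SQLite database. The column types and all rows are invented for illustration, and the ORDER BY clause is added only to make the output deterministic.

```python
import sqlite3

# Toy stand-ins for the course tables; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_full (id INTEGER PRIMARY KEY, country TEXT, amount REAL);
CREATE TABLE world_data (country TEXT PRIMARY KEY, birthrate REAL, life_expectancy REAL);
INSERT INTO transactions_full (id, country, amount) VALUES
    (1, 'France', 100.0), (2, 'France', 50.0), (3, 'Nigeria', 20.0);
INSERT INTO world_data (country, birthrate, life_expectancy) VALUES
    ('France', 10.0, 80.0), ('Nigeria', 35.0, 55.0);
""")

# Same shape as the virtual dataset query: one row per country.
query = """
SELECT world_data.country,
       SUM(transactions_full.amount) AS amount,
       world_data.birthrate,
       world_data.life_expectancy,
       COUNT(transactions_full.id) AS transaction_count
FROM transactions_full
JOIN world_data ON transactions_full.country = world_data.country
GROUP BY world_data.country, world_data.birthrate, world_data.life_expectancy
ORDER BY world_data.country
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each country now appears exactly once, with its summed amount and transaction count next to the world_data attributes. This is the pre-aggregated shape the bubble chart consumes.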
7. Bubble Chart Visualization and Aggregation Logic [00:43:49.530 - 00:53:52.610]
- Participants create a bubble chart using the virtual dataset.
- Dimensions: Country (X), Life Expectancy (Y), Birth Rate (Size), Transaction Count (Color).
- Clarification on aggregation:
- Since the dataset is already aggregated by country, each country has exactly one row, so SUM, MAX, MIN and AVG all return that single value; the choice between them is arbitrary.
- COUNT is the exception: applied to transaction_count in the chart, it yields 1 per country, whereas SUM of amount (or of transaction_count) returns the precomputed value.
- Alternative: avoid aggregation in the virtual dataset and apply an aggregate such as AVG in the chart to the non-aggregated raw data.
- Emphasis: Bubble charts are useful for multi-variable exploration, but tabular views may be more appropriate for single-value displays.
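The aggregation point can be checked with plain Python. The following sketch assumes two invented pre-aggregated rows shaped like the virtual dataset's output. aggregate is a hypothetical helper that mimics a chart metric grouped by country.

```python
# One pre-aggregated row per country, as the virtual dataset produces
# (values are invented for illustration).
virtual_rows = [
    {"country": "France", "amount": 150.0, "transaction_count": 2},
    {"country": "Nigeria", "amount": 20.0, "transaction_count": 1},
]


def aggregate(rows, column, func):
    """Group rows by country and apply func to each group's values."""
    groups = {}
    for row in rows:
        groups.setdefault(row["country"], []).append(row[column])
    return {country: func(values) for country, values in groups.items()}


aggregations = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
    "COUNT": len,
}
for name, func in aggregations.items():
    print(name, aggregate(virtual_rows, "transaction_count", func))
```

SUM, MAX and AVG all return the stored transaction_count. COUNT returns 1 for every country because each group holds a single row.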
8. Dashboard Filtering Limitations and Workarounds [00:53:57.350 - 00:55:58.330]
- Challenge: Filters in Superset are tied to specific datasets.
- Issue: a filter on “Country” built on transactions_full does not apply to charts that use the virtual dataset (whose country column comes from world_data.country), and vice versa.
- Result: a single country filter cannot be applied across all charts in a dashboard when they originate from different datasets.
- Workaround: Ensure all charts use the same virtual dataset or synchronize field names and data sources.
- Instructor confirms this is a known limitation and advises consistency in data modeling for dashboard interoperability.
Appendix
Key Principles
- Naming Consistency: Country identifiers in code, GeoJSON filenames, and dataset fields must match exactly.
- Field Type Selection: Always set “Country Field Type” to “Full Name” when using country names instead of ISO codes.
- Aggregation Awareness: Superset charts generally expect aggregated metrics, so use SUM, MAX, etc., even on pre-aggregated data. With one value per group, SUM, MIN, MAX and AVG are interchangeable, but COUNT returns 1.
- Column Naming: use double quotes for column names containing spaces or accents in SQL queries.
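The quoting rule can be seen in a few lines. The following sketch uses SQLite so it runs anywhere. SQLite follows the SQL standard of double-quoted identifiers, as PostgreSQL does; MySQL uses backticks unless ANSI_QUOTES is enabled. The table and its single row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose column name contains a space.
conn.execute('CREATE TABLE world_data (country TEXT, "birth rate" REAL)')
conn.execute("INSERT INTO world_data VALUES ('France', 10.0)")

# Double quotes mark "birth rate" as a single identifier
# (single quotes would turn it into a string literal instead).
row = conn.execute('SELECT country, "birth rate" FROM world_data').fetchone()
print(row)

# Unquoted, the parser reads 'birth' as a column and 'rate' as its alias,
# so the query fails because no column named birth exists.
try:
    conn.execute("SELECT birth rate FROM world_data")
except sqlite3.OperationalError as exc:
    print("error:", exc)
```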
Tools Used
- Apache Superset (version not specified)
- GeoJSON files for country boundaries
- Mapbox API for interactive map rendering
- SQL Lab for virtual dataset creation
- Superset’s Chart Configuration UI for customization
Common Pitfalls
- Missing legends on country maps: the legend must be enabled manually in the “Customize” panel.
- Incorrect “Country Field Type” causing blank maps.
- Forgetting to wrap non-standard column names in double quotes in SQL.
- Assuming filters apply universally across datasets — filters are dataset-bound.
Practice Suggestions
- Create a custom country map for a new country (e.g., new_zealand, following the lowercase, underscore-separated naming rule) and validate its rendering.
- Build a dashboard with charts from both raw and virtual datasets, then attempt to unify filters — observe limitations.
- Experiment with different aggregation functions (SUM, AVG, COUNT) on the same virtual dataset to understand their impact.
- Export static charts and verify that the legend is included; use “Customize” > “Legend” > “Show” to ensure it is visible.