7 videos 📅 2024-11-21 09:00:00 Africa/Ceuta

  • 2024-11-21 09:42:06 (1:38:36)
  • 2024-11-21 11:49:40 (18:30)
  • 2024-11-21 13:02:19 (2:57:14)
  • 2024-11-22 09:10:59 (2:10:52)
  • 2024-11-22 09:32:15 (1:51:09)
  • 2024-11-22 13:05:24 (56:00)
  • 2024-11-22 14:24:08 (1:24:05)

Course recordings on the DaDesktop training platform

Visit NobleProg websites for related courses

Summary

Overview

This course session provides a comprehensive, hands-on tutorial on building and securing interactive dashboards using Apache Superset, with a focus on data analysis for fraud detection and HR metrics. The trainer guides participants through SQL-based data manipulation (including type casting for division operations), dashboard creation with time-series and geographic visualizations, advanced analytics (rolling windows, time shifts), filter management (native and cross-filters), annotation layers, role-based access control, and row-level security. The session concludes with practical exercises on creating restricted user roles and sharing dashboards, emphasizing real-world use cases for internal reporting and public-facing analytics.

Topic (Timeline)

1. SQL Type Handling and Data Casting [00:00:03 - 00:05:35]

The session begins with troubleshooting SQL division errors in integer fields, where dividing two integers returns an integer result (e.g., 1/2 = 0). The trainer demonstrates solutions: casting to FLOAT (CAST(column AS FLOAT)), multiplying by 1.0, or adding +0.0 to force floating-point arithmetic. This ensures accurate percentage calculations in metrics. The same TP1 dataset is confirmed as the basis for all exercises. Participants validate results in SQL and confirm correct formatting of outputs (e.g., displaying “19.0%” instead of “19%”).
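The integer-division fix can be verified outside Superset; a minimal sketch using Python's built-in sqlite3 module (the same rule holds in most SQL dialects):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Integer division truncates toward zero: 1 / 2 -> 0
print(cur.execute("SELECT 1 / 2").fetchone()[0])  # 0

# Any of the three fixes forces floating-point arithmetic
print(cur.execute("SELECT CAST(1 AS FLOAT) / 2").fetchone()[0])  # 0.5
print(cur.execute("SELECT 1 * 1.0 / 2").fetchone()[0])           # 0.5
print(cur.execute("SELECT (1 + 0.0) / 2").fetchone()[0])         # 0.5
```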

2. Dashboard Construction: Key Metrics and Formatting [00:05:41 - 00:13:04]

Participants build a dashboard with three key performance indicators (KPIs): total transactions, total fraud amount, and fraud rate. The trainer shows how to use aggregated metrics (SUM of amount) with filters (isFraud = True) and explains the difference between saved metrics, simple column aggregations, and ad-hoc Custom SQL. Formatting options are explored: currency ($), percentage and SI-prefix number formats (d3 syntax such as .2s), and custom prefixes/suffixes. Conditional formatting rules (e.g., highlighting values >100B in dark red) are introduced, along with date format customization and forcing a cache refresh to update visualizations.
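The three KPIs can be expressed as a single query; a minimal sketch against a hypothetical miniature of the transactions table (the table name `transactions` is an assumption, while the `amount` and `isFraud` columns follow the session's example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical miniature of the TP1 transactions table
cur.execute("CREATE TABLE transactions (amount REAL, isFraud INTEGER)")
cur.executemany("INSERT INTO transactions VALUES (?, ?)",
                [(100.0, 0), (250.0, 1), (40.0, 0), (310.0, 1), (75.0, 0)])

row = cur.execute("""
    SELECT COUNT(*)                                   AS total_transactions,
           SUM(CASE WHEN isFraud = 1 THEN amount END) AS total_fraud_amount,
           SUM(isFraud) * 1.0 / COUNT(*)              AS fraud_rate
    FROM transactions
""").fetchone()
print(row)  # (5, 560.0, 0.4)
```

Note the `* 1.0` in the fraud-rate metric: without it, the integer division from section 1 would silently yield 0.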

3. Temporal Analysis: Time-Series and Rolling Windows [00:13:17 - 00:26:35]

The trainer demonstrates time-series analysis using line charts with timestamp as the dimension. Participants learn to create rolling window averages (e.g., 3-hour moving average of fraud amount) using Advanced Analytics. The difference between Rolling Windows (local smoothing) and Time Shift (overlaying past/future data) is clarified. The trainer explains that a 3-period rolling window with minimum 3 periods ensures no value is shown if fewer than 3 data points exist. The example uses hourly granularity, and the rolling mean at 13:00 includes data from 11:00, 12:00, and 13:00.
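The min-periods behavior can be sketched in plain Python (the hourly figures below are illustrative, not from the course dataset):

```python
# A 3-period rolling mean with "min periods = 3": the first two positions
# have fewer than 3 data points, so no value is shown (None).
def rolling_mean(values, window=3, min_periods=3):
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk) if len(chunk) >= min_periods else None)
    return out

# Illustrative hourly fraud amounts from 09:00 to 13:00
hourly = [120.0, 90.0, 150.0, 60.0, 90.0]
print(rolling_mean(hourly))
# -> [None, None, 120.0, 100.0, 100.0]
```

The last value (13:00) averages the 11:00, 12:00, and 13:00 points, exactly as described above.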

4. Geographic Visualization: Heatmaps and Map Types [00:28:38 - 00:31:29]

A heatmap is created to visualize fraud rate by country and transaction type. The trainer distinguishes between the modern Heatmap component and the deprecated legacy (D3/NVD3) version, advising against legacy chart types. The importance of using normalized values (0–1 scale) to improve visual contrast is emphasized. Participants learn to select the correct chart type (i.e., "Heatmap" rather than "Heatmap (legacy)") and apply filters to refine the view.
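The 0–1 normalization mentioned here is ordinary min-max scaling; a minimal sketch (the fraud-rate values are illustrative):

```python
# Min-max scaling: map the smallest value to 0.0 and the largest to 1.0,
# stretching the color scale across the full range for better contrast.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero on flat data
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

rates = [0.02, 0.19, 0.05, 0.11]
print(min_max_normalize(rates))       # first entry 0.0, second entry 1.0
```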

5. Dashboard Layout, Caching, and Native Filters [00:33:18 - 00:42:58]

The dashboard is organized using layout mode. Caching behavior is discussed: slow reloads are due to unconfigured cache settings. Native filters are added (e.g., time range, country) and their scope explained: they apply to all dashboard components using the same dataset. Participants encounter issues with filters auto-applying incorrect dimensions (e.g., “transaction_hour” instead of “timestamp”) and learn to remove them via the filter panel, not the chart. The importance of using “Save and Go to Dashboard” is stressed to avoid losing changes.

6. Cross-Filters, Scoping, and Dashboard Interactions [00:42:58 - 00:44:23]

Cross-filters are introduced: clicking on a data point in one chart filters all other charts on the same dashboard that share a common dimension (e.g., department, status). The trainer shows how to enable/disable cross-filtering and use “scoping” to restrict which charts respond to a filter. This allows selective filtering (e.g., only compensation charts respond to department clicks, not time-series charts).

7. User Roles, Permissions, and Row-Level Security [00:44:23 - 00:54:05]

The trainer details Superset’s three-tier security model: database-level permissions, role-based access (Admin, Alpha, Gamma, Public), and row-level security (RLS). RLS is demonstrated as a WHERE clause applied to datasets (e.g., department != 'Direction') to hide sensitive rows from specific roles. The trainer explains why column-level security is not supported (it breaks chart dependencies). Participants create a new “RH Standard” role and apply RLS rules to prevent HR users from seeing salary data from the Direction department.
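Conceptually, an RLS rule is a predicate that Superset appends to the dataset's query whenever a restricted role runs it; a sketch of the effect using SQLite (the table, names, and salaries are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical HR table; rows and values are illustrative
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [("Alice", "Direction", 9000),
                 ("Bob", "RH", 3200),
                 ("Carole", "IT", 4100)])

base_query = "SELECT name, department FROM employees"
rls_clause = "department != 'Direction'"  # the rule attached to the restricted role

# For a restricted role, the clause is effectively appended as a WHERE filter,
# so Direction rows never reach that user's charts:
restricted = cur.execute(f"{base_query} WHERE {rls_clause}").fetchall()
print(restricted)
```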

8. Annotation Layers and Data-Driven Context [00:54:05 - 00:57:25]

Annotation layers are created to mark events (e.g., “Covid period”) on time-series charts. Annotations can be based on a table (e.g., Google Sheets) or defined directly in Superset. The trainer shows how to link an annotation layer to a chart and use it to add context (e.g., policy changes, holidays) without modifying the underlying data. The importance of saving annotations before use is emphasized.

9. Geospatial Data: GeoJSON vs. Deck.gl [00:57:27 - 01:31:56]

The trainer compares three geospatial methods: country-level choropleths (using ISO codes), GeoJSON polygon maps (for regions/communes), and Deck.gl (GPU-accelerated 3D/point maps). GeoJSON is powerful but complex to load (large files, need for ID matching). Deck.gl is simpler, using latitude/longitude and Mapbox tiles, but requires a Mapbox API key (free tier with credit card). The group discusses trade-offs: GeoJSON for static administrative boundaries, Deck.gl for dynamic, high-resolution point data (e.g., fraud incidents by location).
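For reference, a GeoJSON polygon feature has roughly the following shape; the `ISO` property used for ID matching and the coordinates are illustrative, not from the course files:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "ISO": "FRA", "name": "France" },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[2.0, 48.0], [3.0, 48.0], [3.0, 49.0], [2.0, 49.0], [2.0, 48.0]]
        ]
      }
    }
  ]
}
```

The ID-matching difficulty mentioned above comes from joining a dataset column to such a property (here `ISO`); if the values do not match exactly, the region simply renders empty.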

10. Dashboard Sharing, Export, and Public Access [01:31:56 - 01:40:16]

The session covers sharing options: permalinks (with filter state), PDF export, scheduled email reports, and Slack alerts. The trainer distinguishes between public dashboards (accessible via URL without login) and private ones (requiring authentication). A key discussion arises: how to share dashboards with non-Superset users (e.g., executives) without requiring login. The group explores embedding dashboards in internal web pages with SSO or using public links with restricted access via network/VPN.

11. HR Dashboard Case Study: Permissions and Row-Level Security [01:42:17 - 02:33:33]

Participants build an HR dashboard showing compensation by department and status using box plots. The trainer demonstrates how to use double quotes for column names with special characters (e.g., "Département") and single quotes for string values (e.g., 'Direction'). A row-level security rule is applied: Département != 'Direction' to hide salary data from HR users. Participants test the rule by logging in as a restricted user and confirm the Direction department is filtered out.
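The quoting rule can be demonstrated in SQLite, which also accepts double-quoted identifiers (the table name `rh` and the rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Double quotes delimit identifiers (needed for accents or special characters);
# single quotes delimit string literals.
cur.execute('CREATE TABLE rh ("Département" TEXT, salaire REAL)')
cur.executemany("INSERT INTO rh VALUES (?, ?)",
                [("Direction", 9000), ("RH", 3200)])

rows = cur.execute(
    """SELECT * FROM rh WHERE "Département" != 'Direction'"""
).fetchall()
print(rows)  # [('RH', 3200.0)]
```

Writing `'Département'` (single quotes) in the WHERE clause would compare two string literals instead of filtering on the column, which is exactly the mistake the quoting convention prevents.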

12. Tabs, Filter Dependencies, and Advanced UI [02:33:33 - 02:44:36]

The trainer introduces dashboard tabs to organize related charts (e.g., one tab for compensation, another for absence rates). Tabs allow shared filters across groups of charts. Filter dependencies are configured: selecting a department auto-filters available statuses (e.g., no “Stagiaire” option appears if “Direction” is selected). Cross-filtering is shown to work across different tables if they share a common column (e.g., “Département” in both HR and fraud tables).

13. Export, Email Reports, and Integration [02:44:36 - 02:56:20]

The session concludes with export options: PDF, CSV, and scheduled email reports. Email recipients can include non-Superset users via CC. Slack integration is mentioned for alerting. The trainer reiterates that public dashboards (without login) are possible and gives examples from public Superset instances. The group discusses embedding dashboards in internal web portals as a solution for non-technical users who need access without logging into Superset.

Appendix

Key Principles

  • Type Casting in SQL: Always cast integers to float (CAST(x AS FLOAT), x * 1.0, or x + 0.0) when computing percentages to avoid integer division truncation.
  • Dashboard Design: Use clear, non-overlapping visualizations. Avoid mixing incompatible metrics (e.g., count and amount) on the same chart.
  • Row-Level Security (RLS): Use RLS (WHERE clauses) to hide sensitive data at the dataset level. This is the primary method in Superset to restrict data access, as column-level security is not supported.
  • Filter Scope: Native filters apply to all charts on the same dataset; cross-filters apply to charts sharing a common dimension, even across different tables.

Tools Used

  • Apache Superset: Dashboarding and analytics platform.
  • SQL: For data transformation and metric creation.
  • GeoJSON: For custom geographic boundaries (e.g., French communes).
  • Deck.gl: For high-performance 3D/point-based geospatial visualizations.
  • Mapbox: Required for Deck.gl to render base maps (requires API key).
  • Google Sheets: Recommended for managing annotation event tables.

Common Pitfalls

  • Forgetting to save changes before navigating away from a chart or annotation.
  • Using deprecated legacy chart types (e.g., "Heatmap (legacy)").
  • Not using double quotes around column names with spaces or accents (e.g., "Département").
  • Misunderstanding cross-filter scope — filters apply automatically unless explicitly excluded.
  • Assuming public dashboards are secure — they are accessible to anyone with the link; use network/SSO restrictions for sensitive data.

Practice Suggestions

  • Recreate the fraud dashboard using a different dataset (e.g., sales data).
  • Build an RLS rule to hide data for a specific department or user group.
  • Create a time-series chart with a 7-day rolling average and overlay a time shift of -7 days.
  • Use GeoJSON to map fraud incidents by French department and compare with a Deck.gl point map.
  • Set up a scheduled email report to send the HR dashboard to management every Monday.