Course recordings on DaDesktop for Training platform
Visit outline: Talend Data Stewardship (Course code: talenddatastewardship)
Categories: Talend
Summary
Overview
This course session provides a hands-on demonstration of data cataloging and metadata management using a data governance tool (likely Talend Data Catalog or similar). The instructor guides participants through configuring data imports, creating semantic data mappings between transactional and data warehouse models, applying business terminology and classifications, and adding documentation to metadata. The session emphasizes establishing lineage, semantic consistency, and business context for data assets, with practical exercises on column mapping, business domain tagging, and business documentation (business metadata) for technical assets.
Topic (Timeline)
1. Data Import and Model Initialization [00:00:30 - 00:02:50]
- Instructor guides user to import data into a target model (ACME Web Data Warehouse) using the import tool.
- Configures data sampling, data profiling, and enables data classification for later use.
- Confirms successful import by verifying the number of tables loaded: `din_country`, `din_customer`, `din_date`, `fa_core`, and related dimension/fact tables.
- Emphasizes that the imported model structure forms the foundation for downstream mapping and lineage.
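The verification step above amounts to comparing the tables you expected with the tables the import reported. A minimal sketch, assuming the imported table names have been copied out of the tool into a list (`missing_tables` and the `imported` list are hypothetical, not part of the catalog tool):

```python
# Tables named in the session; the real model contains more dimension/fact tables.
EXPECTED_TABLES = {"din_country", "din_customer", "din_date", "fa_core"}


def missing_tables(expected, imported):
    """Return the expected table names that are absent from the import, sorted."""
    return sorted(set(expected) - set(imported))


# Hypothetical list of tables reported by the import.
imported = ["din_country", "din_customer", "din_date", "fa_core", "din_order"]
print(missing_tables(EXPECTED_TABLES, imported))
```

An empty result means every expected table arrived; any name in the list points at a table to investigate before mapping starts.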
2. Creating a Data Mapping Configuration [00:02:55 - 00:04:46]
- Creates a new data mapping object named “Mapeo Transaccional Decisional” (“Transactional-to-Decisional Mapping”) with description “Mapeo de la base de datos ACMEware hacia el Data Warehouse” (“Mapping of the ACMEware database to the Data Warehouse”).
- Explains the purpose: to establish semantic lineage between source (transactional) and target (data warehouse) models when ETL jobs are not available.
- Clarifies that by default, only models are auto-added; other elements (tables, columns) must be manually included.
- Opens the mapping configuration and navigates to the “Create” function to begin defining relationships.
3. Bulk vs. Custom Mapping Selection [00:04:48 - 00:05:43]
- Explains two mapping modes: “Bulk” (for identical source/target schemas) and “Custom” (for differing schemas).
- Selects “Custom” mapping since source (ACME Sporting Web) and target (Data Warehouse) table/column names do not match.
- Specifies the target (Data Warehouse) as the output model and selects target tables: `din_country`, `din_customer`, `din_date`, `din_order`.
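The choice between the two modes can be pictured as a simple schema comparison. The sketch below is an illustration only, not the tool's own logic: `choose_mapping_mode` is a hypothetical helper, and each schema is assumed to be a dictionary of table name to column names.

```python
def choose_mapping_mode(source_schema, target_schema):
    """Return "bulk" when both schemas have identical tables and columns, else "custom"."""
    def normalize(schema):
        # Column order is ignored; only the names matter for this comparison.
        return {table: frozenset(columns) for table, columns in schema.items()}

    if normalize(source_schema) == normalize(target_schema):
        return "bulk"
    return "custom"


# Table and column names from the session: the columns match, the table names do not.
source = {"ref_country": ["country", "country_name"]}
target = {"din_country": ["country", "country_name"]}
print(choose_mapping_mode(source, target))  # prints "custom"
```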
4. Defining Column-Level Mappings [00:05:46 - 00:10:35]
- Initiates query mapping for `din_country`:
  - Source: `ref_country` table from ACME Sporting Web.
  - Target: `din_country` in Data Warehouse.
  - Maps `country` (source) → `country` (target) and `country_name` (source) → `country_name` (target).
- Saves mapping and confirms lineage appears in the parent mapping object.
- Repeats process for `din_customer`, linking to `order_eu` and `order_us` tables via `customer_id`.
- Maps `session_time` from both `order_eu` and `order_us` to `di_session_time` in the target.
- Completes mapping for `din_order` by linking to `red_product`, `customer`, and `order` tables.
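Conceptually, each column-level mapping is a source column paired with a target column, and lineage is a lookup over those pairs. The following is a minimal sketch of that idea, not the catalog's internal model; `ColumnMapping` and `upstream_columns` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    source_table: str
    source_column: str
    target_table: str
    target_column: str


# Mappings taken from the walkthrough. Placing di_session_time in din_customer
# is an assumption; the session only says "in the target".
MAPPINGS = [
    ColumnMapping("ref_country", "country", "din_country", "country"),
    ColumnMapping("ref_country", "country_name", "din_country", "country_name"),
    ColumnMapping("order_eu", "session_time", "din_customer", "di_session_time"),
    ColumnMapping("order_us", "session_time", "din_customer", "di_session_time"),
]


def upstream_columns(mappings, target_table, target_column):
    """Return the sorted (table, column) sources that feed one target column."""
    return sorted(
        (m.source_table, m.source_column)
        for m in mappings
        if m.target_table == target_table and m.target_column == target_column
    )


print(upstream_columns(MAPPINGS, "din_customer", "di_session_time"))
```

Asking for the upstream columns of a target column is the same question the lineage view answers visually.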
5. Reviewing and Troubleshooting Mappings [00:10:36 - 00:16:45]
- Identifies missing table (`rep_customer`) and acknowledges incomplete mapping due to data model limitations in the demo.
- Clarifies that the goal is not to build a full ETL pipeline but to establish visual semantic lineage for governance purposes.
- Navigates to enterprise architecture view to confirm the semantic mapping between “ACME Web” and “Data Warehouse” is recognized.
- Notes that the tool supports integration with external modeling tools (e.g., ER diagrams) to import metadata.
6. Metadata Editing: Column Renaming and Business Domains [00:17:09 - 00:25:24]
- Uses search to find columns containing “email” or “mail” across databases.
- Demonstrates renaming metadata: right-clicks on `email` column in `customer` → “Edit Attributes” → “Name” → “Replace” with “correo” (Spanish for “mail”).
- Applies business domain tagging: selects all `email_address` columns → “Business Domain” → assigns to “Personal” domain.
- Notes that available business domains are predefined and cannot be modified within the tool (as observed during demo).
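The search-then-tag workflow above can be sketched outside the tool as a keyword filter followed by a bulk assignment. Everything here is hypothetical: the `catalog` dictionary, the table placement of the columns, and the helper names. Only the “Personal” domain comes from the session.

```python
# Hypothetical catalog: (table, column) -> attributes.
catalog = {
    ("customer", "email"): {},
    ("customer", "name"): {},
    ("order_eu", "email_address"): {},
}

# Assumed set of predefined domains; only "Personal" appears in the session.
PREDEFINED_DOMAINS = {"Personal"}


def find_columns(catalog, keywords):
    """Return sorted (table, column) keys whose column name contains any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    return sorted(
        key for key in catalog
        if any(keyword in key[1].lower() for keyword in lowered)
    )


def tag_business_domain(catalog, keys, domain):
    """Assign a predefined business domain to every selected column."""
    if domain not in PREDEFINED_DOMAINS:
        raise ValueError(f"unknown business domain: {domain}")
    for key in keys:
        catalog[key]["business_domain"] = domain


matches = find_columns(catalog, ["email", "mail"])
tag_business_domain(catalog, matches, "Personal")
print(matches)
```

Rejecting unknown domains mirrors the behavior observed in the demo, where only predefined domains could be assigned.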
7. Business Documentation (Metadata Enrichment) [00:25:28 - 00:32:36]
- Introduces “Business Documentation” as a way to add business context to technical metadata.
- Edits documentation for `email_address` term: sets name to “Correo Electrónico” (“Email”), description to “Correo electrónico empresarial de los clientes” (“Customers' business email address”).
- Applies similar documentation to `número de póliza` (policy number) in a CSV file: description = “Número que identifica de forma única una póliza de seguro” (“Number that uniquely identifies an insurance policy”).
- Emphasizes that this documentation is separate from technical metadata and is intended for business users to understand meaning and usage.
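One way to picture the separation between technical metadata and business documentation is as two records attached to the same asset. This is a sketch under stated assumptions, not the tool's data model: the class names, the `customer` table placement, and the data types are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TechnicalMetadata:
    table: str
    column: str
    data_type: str


@dataclass
class BusinessDocumentation:
    name: str
    description: str


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    business: Optional[BusinessDocumentation] = None  # may be added later


def undocumented(entries):
    """Return "table.column" for every entry that has no business documentation."""
    return [
        f"{entry.technical.table}.{entry.technical.column}"
        for entry in entries
        if entry.business is None
    ]


entries = [
    CatalogEntry(
        TechnicalMetadata("customer", "email_address", "VARCHAR"),
        BusinessDocumentation(
            "Correo Electrónico",
            "Correo electrónico empresarial de los clientes",
        ),
    ),
    CatalogEntry(TechnicalMetadata("customer", "customer_id", "INT")),
]
print(undocumented(entries))
```

Keeping the business record optional makes it easy to list the assets that still need documentation.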
8. Database Commenting and Re-Importing Models [00:33:13 - 00:36:39]
- Demonstrates adding comments directly in MySQL database (table/column comments) as a best practice for metadata capture.
- Encounters an error during comment execution (likely due to syntax or encoding issue with “ondresco”).
- Initiates a new import into the ACME Web Data Warehouse model to reflect a newly added table (`tientes`).
- Confirms system generates a new model version upon import, capturing updated schema.
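Database comments are set in MySQL with `ALTER TABLE ... COMMENT = '...'` for tables and `ALTER TABLE ... MODIFY COLUMN ... COMMENT '...'` for columns. The sketch below builds such statements with basic quoting so that apostrophes and backslashes in the comment text do not break the statement. The helper names, the `VARCHAR(255)` definition, and the comment texts are assumptions for illustration; the cause of the error seen in the session is not known.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(text):
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def table_comment_sql(table, comment):
    """Build a statement that sets a table-level comment."""
    return f"ALTER TABLE {quote_identifier(table)} COMMENT = {quote_string(comment)};"


def column_comment_sql(table, column, column_definition, comment):
    """Build a statement that sets a column-level comment.

    MODIFY COLUMN needs the full column definition restated; the caller
    supplies it (for example "VARCHAR(255) NOT NULL").
    """
    return (
        f"ALTER TABLE {quote_identifier(table)} "
        f"MODIFY COLUMN {quote_identifier(column)} {column_definition} "
        f"COMMENT {quote_string(comment)};"
    )


print(table_comment_sql("customer", "Customer's master data"))
print(column_comment_sql("customer", "email", "VARCHAR(255)", "Customer business email"))
```

After running the generated statements against the database, a re-import should produce a new model version that carries the comments as descriptions.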
Appendix
Key Principles
- Semantic Lineage: Use data mapping to visually trace data from source to target when ETL jobs are not available or documented.
- Metadata Enrichment: Separate technical metadata (schema, data types) from business metadata (descriptions, domains, terms) for cross-functional understanding.
- Governance by Documentation: Business documentation should be added to all key columns and tables to ensure clarity for non-technical stakeholders.
Tools Used
- Data cataloging/governance platform (likely Talend Data Catalog or similar).
- MySQL database for source metadata and commenting.
- Data import and profiling module within the catalog tool.
Common Pitfalls
- Assuming that source and target schemas match: in most real cases they differ, so custom (manual) mapping is required.
- Expecting to define custom business domains: in the demo, only the predefined domains could be assigned.
- Syntax errors in database comments (e.g., encoding issues with special characters).
Practice Suggestions
- Practice mapping columns between mismatched schemas using custom mapping mode.
- Add business documentation to at least 3 key columns in your own data assets.
- Use database comments as a source of truth for metadata and re-import to validate capture.
- Search for and tag sensitive columns (e.g., email, phone) with business domains for compliance tracking.