12 videos 📅 2024-09-30 09:00:00 America/Bahia_Banderas
48:57
2024-09-30 11:03:24
30:21
2024-09-30 12:02:29
3:29
2024-09-30 12:34:41
1:04
2024-09-30 12:43:50
1:22
2024-09-30 13:06:34
5:52
2024-09-30 13:08:57
5:02
2024-09-30 13:30:59
12:33
2024-09-30 13:46:10
49:55
2024-09-30 15:00:01
6:21
2024-10-08 11:02:02
8:16
2024-10-08 12:05:49
3:03
2024-10-08 12:26:15

Course recordings on DaDesktop for Training platform

Visit NobleProg websites for related course

Visit outline: Talend Open Studio for ESB (Course code: talendesb)

Categories: SOA ESB Integration · Talend

Summary

Overview

This course session provides a hands-on tutorial on using Talend (or a similar ETL tool) to build data integration jobs, focusing on configuring components for reading and writing data in various formats (CSV, JSON, Excel), managing schemas, troubleshooting execution errors, and establishing database connections. The instructor demonstrates workflows for data transformation, schema editing, JSONPAT query construction, file output configuration, and resolving Java-related path separator issues. Emphasis is placed on practical component interaction, job structure (subjobs), and best practices for naming, saving, and debugging.

Topic (Timeline)

1. Component Configuration and Schema Editing [00:00:01.760 - 00:03:49.950]

  • Demonstrated how to configure components via double-click, similar to Mesa Boss interface.
  • Introduced the “Edit Schema” function, explaining its role in modifying column names, data types, and structure—changes affect the job, not the metadata source.
  • Clarified the distinction between “View Schema” and “Change Schema” options; users must select “Change Schema” to modify structure.
  • Showed how to execute a job using the play button, and identified a common tool bug where the system runs the previous job instead of the current one.
  • Resolved the job execution issue by closing and reopening the job to refresh the job name in the run configuration.
  • Explained that job execution internally runs generated Java code from connected components.

2. Output Component Configuration and Data Visualization [00:03:50.010 - 00:05:34.850]

  • Configured the tFileOutputDelimited component to change the field separator from default to pipe (|) and later to a custom string (e.g., “jock”).
  • Switched display mode from “Basic” to “Table” to improve readability of output in the console.
  • Highlighted that connected components (e.g., input → output) form a single subjob, indicated by a blue background in the canvas.
  • Emphasized that unconnected components create separate subjobs, affecting execution flow and isolation.

3. JSON Data Input with JSONPath [00:06:14.190 - 00:12:46.030]

  • Added a tFileInputJSON component to read a JSON file named artista.json.
  • Explained JSONPath syntax: $ denotes root, data[*] indicates an array, and data[*].artistId and data[*].name extract nested fields.
  • Demonstrated incorrect JSONPath usage that returned the entire array as a single line, then corrected it by dragging the data array node into the JSONPath field and selecting individual fields (artistId, name).
  • Used “Refresh Preview” to validate data extraction.
  • Noted that complex JSON structures (nested arrays, objects) require deeper JSONPath knowledge and may become challenging.
  • Confirmed successful schema definition and completed setup with “Finish.”

4. Data Flow: JSON to Excel Conversion [00:12:46.030 - 00:19:09.440]

  • Connected the JSON input to a tFileOutputExcel component to convert data to Excel format.
  • Configured output component: enabled “Include Header,” set output path to dataset/out.xls.
  • Explained that the tool’s version limited path customization, forcing use of default installation path.
  • Clarified that each connected group of components forms a subjob, visually indicated by blue background; subjobs execute sequentially in this version.
  • Executed the job and observed performance metrics: 25 rows processed in 0.03s (JSON read), 235 rows written in 1.16s (Excel output).
  • Confirmed file output location: default Talend workspace directory (e.g., Program Files (x86)/Talend/Studio/workspace).

5. Error Debugging and Java Path Separator Issues [00:20:24.640 - 00:23:16.120]

  • Encountered a Java runtime error due to path separator mismatch: Java uses forward slash (/) while Windows uses backslash (\).
  • Demonstrated that using a single backslash in file paths causes parsing errors in Java-based tools.
  • Solved by replacing single backslashes with double backslashes (\\) in the file path configuration.
  • Guided users to use the “Designer” panel to locate the exact component and error (indicated by a red dot).
  • Emphasized that this is a Java environment issue, not a Talend-specific bug.

6. Job Saving, Auto-Save, and Version Control [00:23:16.120 - 00:25:02.820]

  • Advised saving the job using the disk icon or via File > Save to persist changes.
  • Highlighted the asterisk (*) indicator next to the job name, signaling unsaved modifications.
  • Confirmed that after saving, the asterisk disappears and the red error indicator is resolved.
  • Re-executed the job successfully after correction, verifying both subjobs completed.

7. Database Connection Setup [00:26:09.800 - 00:30:20.840]

  • Created a new job named “job conexión base datos.”
  • Introduced tDBConnection component for connecting to relational databases.
  • Demonstrated creating a new database connection via right-click on tDBConnection in metadata.
  • Named the connection following a standard: Chinook (MySQL) to avoid confusion across multiple database types.
  • Explained the importance of selecting the correct JDBC driver (e.g., MySQL Connector/J) for the target database engine.
  • Noted that if the required driver is missing, the issue lies in the Java environment, not Talend—users must manually install the correct JDBC driver.

Appendix

Key Principles

  • Subjob Structure: Connected components form a single subjob (blue background); unconnected components are separate subjobs.
  • Schema Editing: Modifying schema affects only the job, not the source metadata.
  • JSONPath: Use $ for root, [*] for array elements, and field for nested objects. Test with “Refresh Preview.”
  • File Paths: In Java-based tools, use \\ (double backslash) or / (forward slash) to avoid path separator errors on Windows.
  • Naming Conventions: Use descriptive names for connections (e.g., DBName (Engine)) to prevent confusion across environments.

Tools Used

  • tFileInputDelimited – for reading CSV/text files
  • tFileOutputDelimited – for writing delimited files
  • tFileInputJSON – for reading JSON files with JSONPath
  • tFileOutputExcel – for writing Excel files
  • tDBConnection – for establishing database connections
  • JSONPath syntax for querying nested JSON structures

Common Pitfalls

  • Running the wrong job due to tool caching (resolved by closing/reopening job).
  • Using single backslashes (\) in file paths on Windows → causes Java errors.
  • Misusing “View Schema” instead of “Change Schema” when editing structure.
  • Incorrect JSONPath syntax leading to malformed or incomplete data extraction.
  • Not saving jobs before execution → loss of changes.

Practice Suggestions

  • Practice building jobs that convert between formats: CSV → JSON → Excel.
  • Experiment with complex JSON structures (arrays within arrays, nested objects).
  • Manually install JDBC drivers for unsupported databases to understand dependency management.
  • Always use “Refresh Preview” after modifying JSONPath or schema.
  • Use consistent naming and save frequently to avoid versioning issues.