Pentaho Data Integration Fundamentals (PDI1000S)


(16 hrs) Self-paced, interactive online course with virtual lab environment for hands-on practice.



This self-paced course introduces the Pentaho Data Integration (PDI) platform. It covers the platform's basic functions, explains its capabilities, and describes best practices for using it successfully. Course demonstrations and practice sessions prepare you to apply PDI to real-world cases.

Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. As a complete data integration platform, it delivers accurate, analytics-ready data from any source to end users. With visual tools that eliminate coding and complexity, Pentaho puts big data and all other data sources at the fingertips of business and IT users alike.

Students benefit from flexible, self-paced training that also includes hands-on practice using a full implementation of Pentaho in a virtual lab environment.


This course will help students to:

  • Describe the Pentaho Data Integration (PDI) platform, its components, and their common uses.
  • List the components of a transformation and explain how transformations execute.
  • Create, preview, run, and troubleshoot a transformation using best practices and modular design principles.
  • Read and write data to and from various file formats.
  • Perform calculations, merges, and lookups.
  • Use PDI’s enterprise repository, scheduling, and monitoring capabilities.
  • Log execution metrics to database tables.



Prior experience with Pentaho is not required; however, some experience using ETL (Extract, Transform, and Load) to build data pipelines is preferred.


This course includes:

Part 1

  • Introduction to Pentaho Data Integration
    • Overview of PDI
    • Transformation best practices
    • Log files
  • Fundamental Concepts
    • Data types
    • Data conversion and formatting
    • PDI’s home directory
    • Variables and parameters
    • Data movement types
    • Step copies
    • Data inspection
    • Introduction to the repository
  • Working with Files
    • Inputting data from a file
    • Filtering data
    • Working with structured files

Part 2

  • Working with Databases
    • Connecting to a database
    • Inputting from a database
    • Outputting to databases
  • Calculations and scripting
    • Performing calculations
    • Useful scripting steps
  • Data enrichment
    • Merging data
    • Lookups
  • Metadata Injection
  • Introduction to Jobs
  • Data Integration Repository
    • The Repository explorer
    • Importing and exporting Repository objects
  • Scheduling
  • Logging execution metrics



16 hours

