Pentaho Data Integration Advanced (PDI2000L)


(2 days / 2 credits) Scheduled, instructor-led training with virtual lab environment for hands-on practice.


This instructor-led course builds on fundamental knowledge of Pentaho Data Integration (PDI). Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios. You will add PDI as a data source for a variety of visualization options, use PDI’s streaming data processing capabilities, build transformations with metadata injection, and scale and tune the performance of a PDI solution.

Students will benefit from engaging with and learning from an experienced instructor, coupled with hands-on practice using a full implementation of Pentaho in a virtual lab environment. The course can be run as a physical class at one of our training sites or as an online session.


This course will help students to:

  • Reduce manual tasks by harnessing the power of metadata injection.
  • Use PDI as a data source for CDA, Data Services, Snowflake, Google BigQuery, and machine learning applications.
  • Utilize PDI’s streaming data processing capabilities with MQTT, Kafka and Amazon Kinesis data streams.
  • Scale PDI by using Carte clustering, monitoring, and partitioning.
  • Tune PDI with checkpoints and logging.



Prerequisites: completion of the course “Pentaho Data Integration Fundamentals” (PDI1000S self-paced, or PDI1000L instructor-led), plus some experience using PDI.


This course includes:


  • Metadata Injection
    • Static Metadata Injection
    • Standard Metadata Injection
    • Metadata Injection (Push-Pull Modes)
    • 2-Phase Metadata Injection
    • Using Filters in Metadata Injection
  • PDI as an Enterprise Data Hub
    • CDA Datasource
    • Data services
    • Connect to a Snowflake database
    • Pentaho Data Integration and Google BigQuery
    • Credit card fraud use case
  • Data Streaming
    • MQTT – Mosquitto Service
    • MQTT – sensor data (IoT)
    • Services – Zookeeper and Kafka
    • Kafka – sensor data
    • Amazon Kinesis data streams
  • Scaling your enterprise solution
    • Master and slave servers
    • Clustering and group by
    • Stream partitioning
    • Checkpoints


2 Days

Upcoming Classes


Instructor-led online training

  • Virtual - EMEA: Nov 4 – Nov 5, 2021; Jan 27 – Jan 28, 2022
  • Virtual - Germany (German language): Nov 10 – Nov 11, 2021
  • Virtual - Americas: Jan 20 – Jan 21, 2022

Classes in bold are guaranteed to run!

Onsite Training

For groups of three or more

