Pentaho Data Integration Advanced (PDI2000L)



(2 days / 2 credits) Scheduled, instructor-led training with a virtual lab environment for hands-on practice.


This instructor-led course builds on fundamental knowledge of Pentaho Data Integration (PDI). Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios: you will add PDI as a data source for a variety of visualization options, use PDI’s streaming data processing capabilities, build transformations with metadata injection, and scale and performance-tune your PDI solution.

Students learn from an experienced instructor and get hands-on practice with a full implementation of Pentaho in a virtual lab environment. The course can be run as a physical class at one of our training sites or as an online session.


This course will help students to:

  • Reduce manual tasks by harnessing the power of metadata injection.
  • Use PDI as a data source for CDA, Data Services, Snowflake, Google BigQuery and Machine Learning applications.
  • Utilize PDI’s streaming data processing capabilities with MQTT, Kafka and Amazon Kinesis data streams.
  • Scale PDI by using Carte clustering, monitoring, and partitioning.
  • Tune PDI with checkpoints and logging.



Prerequisites: completion of the course “Pentaho Data Integration Fundamentals” (PDI1000S self-paced, or PDI1000L instructor-led) plus some experience using PDI.


This course includes:


  • Metadata Injection
    • Static Metadata Injection
    • Standard Metadata Injection
    • Metadata Injection (Push-Pull Modes)
    • 2-Phase Metadata Injection
    • Using Filters in Metadata Injection
  • PDI as an Enterprise Data Hub
    • CDA Datasource
    • Data Services
    • Connect to a Snowflake database
    • Pentaho Data Integration and Google BigQuery
    • Credit card fraud use case
  • Data Streaming
    • MQTT – Mosquitto Service
    • MQTT – sensor data (IoT)
    • Services – Zookeeper and Kafka
    • Kafka – sensor data
    • Amazon Kinesis data streams
  • Scaling your enterprise solution
    • Master and slave servers
    • Clustering and group by
    • Stream partitioning
    • Checkpoints


2 Days

Upcoming Classes


Instructor-led online training

Scheduled dates, June to December 2021:

  • Virtual - EMEA: Jul 22 – Jul 23; Sep 16 – Sep 17
  • Virtual - Americas: Jul 22 – Jul 23; Aug 19 – Aug 20; Sep 16 – Sep 17
  • Fulda (German language): Sep 30 – Oct 1

Classes in bold are guaranteed to run!

Onsite Training

For groups of three or more


