Pentaho Data Integration with Hadoop - clickstream example (PDI2002S)


Please click the button below to enroll in this training.



(15 mins) Self-paced eLearning.


In this eLearning module, instructor James O’Reilly walks through the process of uploading data into HDFS using Pentaho Data Integration (PDI). With PDI installed on an edge node in your Cloudera Hadoop cluster, the next stage is to onboard the data. Once a shim has been configured, sample retail clickstream data is onboarded and enriched, then written out as a Parquet file. Some of the more common PDI/Hadoop transformation steps are highlighted along the way.