In the coming months Pentaho will be releasing Pentaho Kettle 5.0. I thought I would download the community edition from their continuous integration server and see what Kettle 5.0 is all about. Obviously, Pentaho is the arbiter of which features are stable enough to ship, so some of the in-progress features described below may not be in the GA version when it is released.
1. Continuous Preview
In previous versions, when building a transformation you had to run a separate preview for each step whose output you wanted to see. With continuous preview you can now run the transformation once and click on each step to see the output of that step.
2. Carte JDBC
You have long been able to use Pentaho’s Report Designer to build a report from the output of a Kettle transformation. The Carte JDBC driver will allow you to hook Mondrian up to the output of a Kettle transformation. You will be able to use Mondrian on your unstructured data without a database; however, memory constraints may limit performance and data sizes.
3. Sub-Jobs in Transformations
Kettle 5.0 will make looping through data much easier. No longer will you have to pass rows to the result, then have the parent job execute a sub-job for every input row. Instead, you will be able to loop data through a sub-job directly from the transformation. This also adds the capability to launch multiple copies of a sub-job and iterate through data more quickly.
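As a rough illustration of the idea (not Kettle's actual API), here is a minimal Python sketch of running a sub-job once per input row, with several copies working in parallel. The names `sub_job`, `run_sub_job_per_row`, and `copies` are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_job(row):
    # Hypothetical stand-in for a sub-job that handles one input row.
    return {"file": row["file"], "status": "processed"}


def run_sub_job_per_row(rows, copies=1):
    # Execute the sub-job once per row, with up to `copies` running at a time.
    # pool.map returns results in the same order as the input rows.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(sub_job, rows))


rows = [{"file": "a.csv"}, {"file": "b.csv"}, {"file": "c.csv"}]
results = run_sub_job_per_row(rows, copies=2)
```

Raising `copies` is the analogue of launching multiple copies of the sub-job: the rows are the same, but more of them are in flight at once.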
4. Job Checkpoints
A checkpoint table has been added to Pentaho logging that will allow you to specify restart checkpoints in jobs. This will make it easier to restart a failed job from the point where it failed, without having to re-run all of the earlier steps too.
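The restart behavior can be sketched in a few lines of Python. This is only a conceptual model, not how Kettle implements it: the list `checkpoint_log` stands in for the checkpoint table, and `run_job`, the step names, and `flaky_aggregate` are all hypothetical.

```python
def run_job(steps, checkpoint_log):
    # steps: ordered list of (name, action, is_checkpoint) tuples.
    # checkpoint_log: names of checkpointed steps that already completed;
    # a stand-in for a checkpoint table in a logging database.
    names = [name for name, _, _ in steps]
    start = 0
    if checkpoint_log:
        # Resume right after the most recent checkpoint.
        start = names.index(checkpoint_log[-1]) + 1
    executed = []
    for name, action, is_checkpoint in steps[start:]:
        action()
        executed.append(name)
        if is_checkpoint:
            checkpoint_log.append(name)
    return executed


attempts = {"aggregate": 0}


def flaky_aggregate():
    # Fails on the first attempt only, to simulate a job that needs a restart.
    attempts["aggregate"] += 1
    if attempts["aggregate"] == 1:
        raise RuntimeError("simulated failure")


steps = [
    ("extract", lambda: None, False),
    ("load", lambda: None, True),   # checkpoint after this step
    ("aggregate", flaky_aggregate, False),
]

log = []
try:
    run_job(steps, log)      # first run fails in "aggregate"
except RuntimeError:
    pass
rerun = run_job(steps, log)  # second run resumes after "load"
```

On the second run only `aggregate` executes, because the log already records that the job got past the `load` checkpoint.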
5. Execute Multiple Jobs and Transformations in the Same Database Transaction
It will be possible to execute multiple jobs and transformations as part of the same database transaction. So if one transformation fails, the database can roll back all of the transactions for a series of jobs and transformations, rather than just the transactions for that transformation.
There are many other features not mentioned here including pluggable datatypes, more detailed metrics, Carte repository integration, load balancing and more. Watch for Pentaho Kettle 5.0 to take advantage of all of these great features.
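For readers who want to see the shared-transaction idea from item 5 in miniature, here is a small Python sketch using the standard `sqlite3` module. It is an analogy rather than Kettle code: `run_in_one_transaction`, `load_sales`, and `failing_step` are hypothetical names, and each function plays the role of one job or transformation sharing a single connection.

```python
import sqlite3


def run_in_one_transaction(conn, units):
    # Run every unit of work on the same connection and commit only if
    # all of them succeed; otherwise roll back everything done so far.
    try:
        for unit in units:
            unit(conn)
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.commit()


def load_sales(c):
    # Stand-in for a transformation that writes rows.
    c.execute("INSERT INTO sales (amount) VALUES (10.0)")


def failing_step(c):
    # Stand-in for a later transformation that fails.
    raise RuntimeError("simulated transformation failure")


ok = run_in_one_transaction(conn, [load_sales, failing_step])
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Because both units share one transaction, the row inserted by `load_sales` is rolled back when `failing_step` fails, leaving the table empty.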