Tuesday, September 13, 2011

KETTLE 4.2.0 stable is out

Download it from SourceForge...

More information from the Pentaho Kettle forum:

Here are some of the new things in this version:

  • The Excel Writer step offers advanced Excel output functionality to control the look and feel of your spreadsheets.
  • Graphical performance and progress feedback for transformations
  • The Google Analytics step downloads statistics from your Google Analytics account
  • The Pentaho Reporting Output step makes it possible for you to run your (parameterized) Pentaho reports in a transformation. It allows for easy report bursting of personalized reports.
  • The Automatic Documentation step generates (simple) documentation of your transformations and jobs using the Pentaho Reporting API.
  • The Get repository names step retrieves job and transformation information from your repositories.
  • The LDAP Writer step
  • The Ingres VectorWise (streaming) bulk loader step
  • The Greenplum (streaming) bulk loader step (for gpload)
  • The Talend Job Execution job entry
  • Health Level 7: HL7 Input step, HL7 MLLP Input and HL7 MLLP Acknowledge job entries
  • The PGP File Encryption, Decryption and Validation job entries facilitate encryption and decryption of files using PGP.
  • The Single Threader step for parallel performance tuning of large transformations
  • Allow a job to be started at a job entry of your choice (continue after fixing an error)
  • The MongoDB Input step (including authentication)
  • The ElasticSearch bulk loader
  • The XML Input Stream (StAX) step to read huge XML files at optimal performance and flat memory usage by flattening the structure of the data.
  • The Get ID from Slave Server step allows multi-host or clustered transformations to get globally unique integer IDs from a slave server.
  • Carte improvements:
    • reserve next value range from a slave sequence service
    • allow parallel (simultaneous) runs of clustered transformations
    • list (reserved and free) socket reservations service
    • new options in XML for configuring slave sequences
    • allow time-out of stale objects using environment variable KETTLE_CARTE_OBJECT_TIMEOUT_MINUTES
    • Memory tuning of the logging back-end with KETTLE_MAX_LOGGING_REGISTRY_SIZE, KETTLE_MAX_JOB_ENTRIES_LOGGED and KETTLE_MAX_JOB_TRACKER_SIZE, allowing for flat memory usage for never-ending ETL in general and jobs specifically.
  • Repository Import/Export
    • Export at the repository folder level
    • Export and Import with optional rule-based validations
    • The import command-line utility allows (optional) rule-based import of lists of transformations, jobs and repository export files.
  • ETL Metadata Injection:
    • Retrieval of rows of data from a step to the “metadata injection” step
    • Support for injection into the “Excel Input” step
    • Support for injection into the “Row normaliser” step
    • Support for injection into the “Row Denormaliser” step
  • The Multiway Merge Join step (experimental) allows for any number of data sources to be joined using one or more keys using an inner or a full outer join algorithm.
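
The Get ID from Slave Server step and the Carte "reserve next value range" service listed above suggest a block-allocation scheme: each host reserves a whole range of IDs from one central sequence and then hands them out locally, so it only needs to contact the server once per range. Here is a minimal Python sketch of that general idea. It is not Kettle or Carte code: the names SequenceService, IdClient and reserve_range, and the increment parameter, are hypothetical and only serve the illustration.

```python
import threading


class SequenceService:
    """Hypothetical central counter handing out non-overlapping ID ranges."""

    def __init__(self, start=1, increment=1000):
        self._next = start
        self._increment = increment
        self._lock = threading.Lock()

    def reserve_range(self):
        """Reserve the next range and return it as (first, last), inclusive."""
        with self._lock:
            first = self._next
            self._next += self._increment
            return first, first + self._increment - 1


class IdClient:
    """Hypothetical client: serves IDs from its local range, refills when empty."""

    def __init__(self, service):
        self._service = service
        self._current = 0
        self._last = -1  # empty range, forces a reservation on first use

    def next_id(self):
        if self._current > self._last:
            self._current, self._last = self._service.reserve_range()
        value = self._current
        self._current += 1
        return value


# Two hosts share one sequence but never receive the same ID.
service = SequenceService(start=1, increment=3)
host_a = IdClient(service)
host_b = IdClient(service)

ids_a = [host_a.next_id() for _ in range(4)]
ids_b = [host_b.next_id() for _ in range(2)]
print(ids_a, ids_b)
```

The trade-off in such a scheme is that IDs are unique across hosts but not gap-free: a host that stops early leaves the rest of its reserved range unused.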

I like the Talend Job Execution job entry very much... it's so much fun to use Talend inside Kettle ;-)