More information from the Pentaho Kettle forum:
Here are some of the new things in this version:
- The Excel Writer step offers advanced Excel output functionality to control the look and feel of your spreadsheets.
- Graphical performance and progress feedback for transformations
- The Google Analytics step allows download of statistics from your Google Analytics account
- The Pentaho Reporting Output step makes it possible for you to run your (parameterized) Pentaho reports in a transformation. It allows for easy report bursting of personalized reports.
- The Automatic Documentation step generates (simple) documentation of your transformations and jobs using the Pentaho Reporting API.
- The Get repository names step retrieves job and transformation information from your repositories.
- The LDAP Writer step
- The Ingres VectorWise (streaming) bulk loader step
- The Greenplum (streaming) bulk loader step (for gpload)
- The Talend Job Execution job entry
- Health Level 7 (HL7): the HL7 Input step, plus the HL7 MLLP Input and HL7 MLLP Acknowledge job entries (see the MLLP framing sketch after this list)
- The PGP File Encryption, Decryption & Validation job entries facilitate encryption and decryption of files using PGP (a GnuPG-based sketch follows this list)
- The Single Threader step for parallel performance tuning of large transformations
- Allow a job to be started at a job entry of your choice (continue after fixing an error)
- The MongoDB Input step (including authentication)
- The ElasticSearch bulk loader
- The XML Input Stream (StAX) step reads huge XML files at optimal performance and flat memory usage by flattening the structure of the data (see the StAX sketch after this list)
- The Get ID from Slave Server step allows multi-host or clustered transformations to get globally unique integer IDs from a slave server: http://wiki.pentaho.com/display/EAI/...m+Slave+Server
- Carte improvements:
  - reserve next value range from a slave sequence service
  - allow parallel (simultaneous) runs of clustered transformations
  - list (reserved and free) socket reservations service
  - new options in XML for configuring slave sequences
  - allow time-out of stale objects using the environment variable KETTLE_CARTE_OBJECT_TIMEOUT_MINUTES
- Memory tuning of the logging back-end with KETTLE_MAX_LOGGING_REGISTRY_SIZE, KETTLE_MAX_JOB_ENTRIES_LOGGED, and KETTLE_MAX_JOB_TRACKER_SIZE, allowing flat memory usage for never-ending ETL in general and jobs specifically.
- Repository Import/Export:
  - Export at the repository folder level
  - Export and import with optional rule-based validations
  - The import command-line utility allows for rule-based (optional) import of lists of transformations, jobs, and repository export files: http://wiki.pentaho.com/display/EAI/...+Documentation
- ETL Metadata Injection:
  - Retrieval of rows of data from a step to the “metadata injection” step
  - Support for injection into the “Excel Input” step
  - Support for injection into the “Row normaliser” step
  - Support for injection into the “Row Denormaliser” step
- The Multiway Merge Join step (experimental) allows any number of data sources to be joined on one or more keys using an inner or full outer join algorithm (a k-way merge sketch follows this list).
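A quick aside on the HL7 MLLP entries mentioned above: MLLP is just a thin TCP framing around HL7 v2 messages, a start byte (0x0B) followed by the payload and an end sequence (0x1C 0x0D). The following is only a minimal Java sketch of that framing, not Kettle's code; the host, port, and sample message are placeholders.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MllpSendSketch {
    private static final int START = 0x0b; // <VT> frame start
    private static final int END_1 = 0x1c; // <FS> frame end, byte 1
    private static final int END_2 = 0x0d; // <CR> frame end, byte 2

    public static void main(String[] args) throws Exception {
        // Sample ADT message; HL7 v2 segments end with a carriage return.
        String hl7 = "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|20110101120000||ADT^A01|MSG00001|P|2.3\r";
        try (Socket socket = new Socket("localhost", 2575)) { // placeholder host/port
            OutputStream out = socket.getOutputStream();
            out.write(START);                                     // frame start
            out.write(hl7.getBytes(StandardCharsets.ISO_8859_1)); // payload
            out.write(END_1);                                     // frame end...
            out.write(END_2);                                     // ...sequence
            out.flush();

            // Read the MLLP-framed HL7 ACK back until <FS><CR>.
            InputStream in = socket.getInputStream();
            StringBuilder ack = new StringBuilder();
            int prev = -1, b;
            while ((b = != -1) {
                if (prev == END_1 && b == END_2) break;          // saw <FS><CR>
                if (b != START && b != END_1) ack.append((char) b);
                prev = b;
            }
            System.out.println("ACK: " + ack);
        }
    }
}
```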
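On the PGP job entries: a common way to handle PGP files from Java-based tools is to call out to GnuPG. Here is a minimal sketch of that approach, not necessarily what Kettle does internally; it assumes a gpg binary on the PATH, and the recipient key and file names are placeholders.

```java
import java.io.IOException;

public class GpgEncryptSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Encrypt report.csv for a recipient key; all names are placeholders.
        ProcessBuilder pb = new ProcessBuilder(
                "gpg", "--batch", "--yes",
                "--recipient", "",
                "--output", "report.csv.gpg",
                "--encrypt", "report.csv");
        pb.inheritIO();                        // surface gpg's own messages
        int exit = pb.start().waitFor();
        if (exit != 0) {
            throw new IOException("gpg exited with code " + exit);
        }
    }
}
```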
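And to see why the StAX-based XML Input Stream step can keep memory flat: a StAX parser pulls one event at a time instead of building a whole DOM tree, so each event can be turned into a single flat row. The sketch below uses the standard javax.xml.stream API, not the step itself; "huge.xml" is a placeholder file name.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class StaxFlattenSketch {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new FileInputStream("huge.xml"));
        // Pull events one by one; memory use stays flat regardless of file size.
        while (reader.hasNext()) {
            switch ( {
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("START_ELEMENT\t" + reader.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!reader.isWhiteSpace()) {
                        System.out.println("CHARACTERS\t" + reader.getText().trim());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    System.out.println("END_ELEMENT\t" + reader.getLocalName());
                    break;
            }
        }
        reader.close();
    }
}
```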
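Finally, the idea behind the Multiway Merge Join is a classic k-way merge over key-sorted inputs. The sketch below is my own illustration of the inner-join case with unique keys per input, not the step's implementation; duplicate keys would need row buffering, and the full outer case would also emit unmatched rows padded with nulls.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiwayMergeJoinSketch {
    public static void main(String[] args) {
        // Three key-sorted inputs; each element is {joinKey, payload}.
        int[][][] inputs = {
            {{1, 10}, {3, 30}, {5, 50}},
            {{1, 11}, {2, 21}, {3, 31}, {5, 51}},
            {{3, 32}, {4, 42}, {5, 52}},
        };
        int[] pos = new int[inputs.length];    // one cursor per input

        while (true) {
            // Find the largest key among current rows; an inner join is done
            // as soon as any input runs out.
            int maxKey = Integer.MIN_VALUE;
            boolean exhausted = false;
            for (int i = 0; i < inputs.length; i++) {
                if (pos[i] >= inputs[i].length) { exhausted = true; break; }
                maxKey = Math.max(maxKey, inputs[i][pos[i]][0]);
            }
            if (exhausted) break;

            // Advance every input whose current key is behind maxKey.
            boolean allMatch = true;
            for (int i = 0; i < inputs.length; i++) {
                while (pos[i] < inputs[i].length && inputs[i][pos[i]][0] < maxKey) {
                    pos[i]++;
                }
                if (pos[i] >= inputs[i].length || inputs[i][pos[i]][0] != maxKey) {
                    allMatch = false;
                }
            }

            if (allMatch) {
                // All k inputs sit on the same key: emit one joined row.
                List<Integer> joined = new ArrayList<>();
                joined.add(maxKey);
                for (int i = 0; i < inputs.length; i++) {
                    joined.add(inputs[i][pos[i]][1]);
                    pos[i]++;
                }
                System.out.println(joined);    // e.g. [3, 30, 31, 32]
            }
        }
    }
}
```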
I like the Talend Job Execution job entry very much... it's fun to use Talend inside Kettle ;-)