New Features for Pentaho Data Integration

Data Federation

The Thin Kettle JDBC Driver

Enterprise Deployment Enhancements

Job Restartability
  • Set checkpoints
  • Restart jobs from the last successful checkpoint
Transactional Job Execution
  • Provides the ability to 'roll back' job execution on failure
Security for Database Connections
  • Full permissions: read, write, and delete
  • Securely share connections with other users or groups
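
The checkpoint-and-restart behavior listed under Job Restartability can be sketched roughly as follows. This is an illustration of the general idea only, not how the product implements it: the run_job function, the JSON checkpoint file, and the entry names are all hypothetical.

```python
import json
import os
import tempfile


def run_job(entries, checkpoint_path):
    """Run job entries in order, skipping entries completed by an earlier run.

    `entries` is a list of (name, callable) pairs. The index of the next
    entry to run is saved after each success, so a rerun resumes there.
    (Hypothetical helper; the checkpoint file format is an assumption.)
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_entry"]

    executed = []
    for index in range(start, len(entries)):
        name, action = entries[index]
        action()  # an exception here leaves the checkpoint at the failed entry
        executed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump({"next_entry": index + 1}, fh)

    # The job finished, so the next run should start from the first entry.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed


def demo():
    """Fail once at "load", then rerun and resume from the checkpoint."""
    state = {"load_should_fail": True}

    def load():
        if state["load_should_fail"]:
            raise RuntimeError("target database unavailable")

    entries = [("extract", lambda: None), ("load", load), ("report", lambda: None)]
    path = os.path.join(tempfile.mkdtemp(), "job.checkpoint")

    try:
        run_job(entries, path)
    except RuntimeError:
        pass  # first run fails at "load"; "extract" is already checkpointed

    state["load_should_fail"] = False
    return run_job(entries, path)  # resumes at "load" and skips "extract"
```

In this sketch, the second run executes only "load" and "report", because "extract" completed before the failure.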

Data Movement Load Balancing

Marketplace

The Marketplace gives you the ability to share and/or download new plugins.
  • Supports steps, job entries, database types, perspectives, and more
  • Open in Spoon under Help > Marketplace

Marketplace Highlight: DataCleaner

Data profiling by Human Inference
  • Analyze tables and columns in preparation for ETL
  • Profile step output in an ETL transformation to analyze prepared data
  • Broad set of profiling analytics
    • Numeric: cardinality, min/max values, nulls, and more
    • String: min/max characters, case, whitespace, and more
    • Other: analytic options such as pattern matching, de-duplication, character set distribution, data gap, date/time, and more
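
To make the numeric and string profiling metrics above concrete, here is a minimal sketch of how such metrics could be computed over a single column. It does not use DataCleaner; the function names and result keys are assumptions chosen for illustration.

```python
def profile_numeric(values):
    """Return null count, cardinality, and min/max for a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "cardinality": len(set(present)),  # number of distinct non-null values
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_string(values):
    """Return null count, length range, case counts, and whitespace count."""
    present = [v for v in values if v is not None]
    lengths = [len(v) for v in present]
    return {
        "nulls": len(values) - len(present),
        "min_chars": min(lengths) if lengths else None,
        "max_chars": max(lengths) if lengths else None,
        "uppercase": sum(1 for v in present if v.isupper()),
        "lowercase": sum(1 for v in present if v.islower()),
        # values with leading or trailing whitespace
        "with_outer_whitespace": sum(1 for v in present if v != v.strip()),
    }
```

For example, profile_numeric([3, 1, None, 3, 7]) reports one null, a cardinality of 3, a minimum of 1, and a maximum of 7.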

Additional Data Integration Highlights

  • WebSphere MQ/MQSeries Integration
  • Inline help links on transformation steps and job entries
  • Simplified looping - call jobs and transformations within a transformation and loop through the rows
  • Detailed timing metrics for low-level operations, which help analyze bottlenecks in more detail (for example, database connect and query time vs. transformation time)
  • Extended monitoring of sub-jobs and transformations in Carte and the DI Server (Expand remote job option)
  • Pluggable and new data types, e.g. Internet Address, Timestamp
  • Introduction of REST services (Carte and DI Server)
  • Several new Transformation Steps and Job Entries including: Splunk input/output, table compare, zip file, OpenERP input/output, Telnet
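
The value of per-operation timing metrics (database connect and query time vs. transformation time) can be illustrated with a small sketch. The Metrics class and the operation names below are hypothetical and are not part of the product; the sleeps stand in for real work.

```python
import time
from contextlib import contextmanager


class Metrics:
    """Accumulate wall-clock durations per named operation (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def timed(self, operation):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Add to any time already recorded for this operation.
            self.durations[operation] = self.durations.get(operation, 0.0) + elapsed

    def slowest(self):
        """Return the name of the operation with the largest total duration."""
        return max(self.durations, key=self.durations.get)


def demo():
    metrics = Metrics()
    with metrics.timed("database connect"):
        time.sleep(0.01)  # stand-in for opening a connection
    with metrics.timed("query"):
        time.sleep(0.05)  # stand-in for running a query
    with metrics.timed("transformation"):
        time.sleep(0.01)  # stand-in for row processing
    return metrics
```

Separating the durations this way shows whether the time is going to the database or to the transformation itself.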

Expanded Partner Ecosystem Support

Enhanced Functionality
  • New InstaView use case templates for Hadoop and Splunk
  • Expanded NoSQL Integration
Expanded Ecosystem Support
  • Hadoop high availability support
  • New integrations: Redshift, Impala, Splunk
  • New Hadoop certifications: Intel, Hortonworks, DataStax
  • Support for latest versions of CDH, MapR, MongoDB, and Cassandra

Enhanced NoSQL Support

10gen (MongoDB)
  • Added support for Aggregation Framework, Replica Sets, and Tag Sets
  • Metadata discovery: samples documents to automatically determine available fields and data types
  • Enhanced control for inserting/upserting data into MongoDB collections
  • Enhanced InstaView template for MongoDB analytics
  • All enhancements available for Reporting and Data Integration
DataStax (Cassandra)
  • Added support for CQL-3 (continuing support for CQL-2)
  • Enhanced capabilities for writing data
  • Composite key support
  • Ability to set Time To Live
  • Query preview
  • Support for non-textual column names
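
The metadata discovery item in the MongoDB list above (sampling documents to determine available fields and data types) can be sketched as follows. This is a simplified illustration over plain Python dictionaries rather than a real MongoDB connection; the discover_fields function and its sample_size parameter are assumptions, not the product's API.

```python
from collections import defaultdict
from itertools import islice


def discover_fields(documents, sample_size=100):
    """Infer field names and observed value types from a sample of documents.

    Only the first `sample_size` documents are inspected, so fields that
    appear only later in the collection are not discovered.
    """
    seen = defaultdict(set)
    for doc in islice(documents, sample_size):
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    # Sort the type names so the result is stable across runs.
    return {field: sorted(types) for field, types in seen.items()}


docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "age": 30},
    {"_id": 3, "age": "unknown"},
]
```

Here discover_fields(docs) reports that "age" holds both int and str values, which is the kind of schema variation that sampling is meant to surface. A smaller sample is faster but can miss fields or types.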

Adaptive Big Data Layer

Transparent Access to and Integration of Big Data
  • Insulates customers from changing versions, vendors, and data stores
  • Gives customers broad flexibility of choice, rapid time to value, and reduced risk
  • Provides native integration into big data ecosystem
  • Broadest, deepest Big Data Support

Documentation Improvements

  • Screenshots match the latest graphical interface for the new Crystal and Onyx themes
  • Streamlined installation approach, with overviews of each installation type and decision tables that describe options
  • Moved Get Support to the top-level and aggregated all learning resources on that page
  • Added a Make a Plan category, containing overviews and workflows (InfoCenter only at this time)
    • Made Supported Technologies its own top-level item so you can have a PDF of this article
  • Added Configure Work Environment category, containing all the needed configuration instructions
  • Added Create Initial Data Model article that explains in detail how to use our wizard to create data models
  • Added a category for Create and Refine Advanced Data Models to show how you would approach refining the data models created by our wizard or those you have created on your own
  • Added PDFs for the current release to the Archives so that they can be easily downloaded as a whole set
  • Added the 4.8 InfoCenter to the Archives so that it can be searched online
  • Ease-of-use
    • Changed to task-oriented information architecture
    • Added standardized introductions for content sequences that include
      • Prerequisites
      • Expertise
      • Tools
      • Login Credentials
      • And other related information
    • Added navigational aids to keep readers within the Happy Path
      • Guide posts in procedures that span multiple articles
      • Article table of contents for longer articles
      • Next step topics
    • Added UI tour topics
    • Added decisions topics with choice tables