Release Notes - Pentaho Data Integration - Kettle - Version 5.2.0 GA

Bug

  • [PDI-2805] - Local Re-partitioning from N copies to N partitions fails
  • [PDI-3064] - Mapping / Sub-transformation doesn't pick up named parameters defined in parent if they are also defined in the subtrans.
  • [PDI-3723] - Repository Explorer - Missing "Are you sure" warning dialog when deleting an empty Folder from the Repository
  • [PDI-3874] - Repository Configuration Dialog is wrong
  • [PDI-4667] - Executing a transformation on carte as a slave server is failing (setting the WebAppName to non-empty for use with a standard carte servlet produces a really bad error message)
  • [PDI-5601] - ability to create duplicate sub folders in PDI EE Repository
  • [PDI-5617] - User with read and write permissions can not save transformation in home folder
  • [PDI-6380] - Kettle remote execution: all requests are sent twice because authentication header is missing at 1st attempt
  • [PDI-6619] - Kitchen.bat - PENTAHO_DI_JAVA_OPTIONS env variable does not support multiple parameters
  • [PDI-6807] - Missing .exe after javaw in Spoon.bat
  • [PDI-7139] - Importing from enterprise repository changes the transformation path when "specify by name and directory" is used when using / as the import directory
  • [PDI-7243] - JSON Output Step throws NullPointerException when previewing no rows
  • [PDI-7516] - Specifying large limit in Cassandra input step causes timeout errors
  • [PDI-7734] - EE Repository export with Rules: When it fails, no UI feedback is given and the file is incomplete
  • [PDI-7800] - Cassandra Input Hangs on Preview if no active hop for the output stream
  • [PDI-8958] - In Directory Selection dialog can not delete or rename a directory
  • [PDI-9170] - Mysql Bulk loader step minor corrections
  • [PDI-9461] - SSH2 Get job entry fails when job is saved to the Enterprise Repository
  • [PDI-9509] - Clicking on restore in the PUR explorer when a version is not selected causes an exception
  • [PDI-9514] - Inconsistency within .sh startup scripts regarding the use of launcher can lead to different behavior between Spoon and Pan/Kitchen/Carte
  • [PDI-10366] - Spoon does Not Show a Transformation as Changed After Changing the Type in the Get System Info Step
  • [PDI-10386] - Spoon Help/About needs to be aligned with UX requests
  • [PDI-10401] - DI Repository Explorer: clicking on the Security tab shows an error dialog: Unable to initialize security tab [...] Unable to get users
  • [PDI-10432] - pentaho-cassandra-plugin-1.0.0.zip has a spurious .gitignore in the .zip
  • [PDI-10729] - Reducer with single thread unchecked hangs
  • [PDI-10770] - Kettle Insert/Update performance degrades after upgrading Postgres JDBC from 8.4 to 9.x
  • [PDI-10847] - Operations Marts transformations needed when switching to Oracle and MySQL are missing
  • [PDI-10923] - errors when running the pdi data migrator
  • [PDI-11033] - Unable to remove job finished with errors from Carte Monitor page
  • [PDI-11066] - Should not be able to create a partition on a step without a field
  • [PDI-11344] - Blocking step poor performance because of not setting BaseStep.first to false
  • [PDI-11350] - PDI 5.x doesn't allow "specify by reference" option for transformations using a DATABASE based repository
  • [PDI-11523] - Enterprise Repository: User with pound-sign in name cannot view files in their Trash
  • [PDI-11590] - Transformation Executor Step Result Rows Table has 2 Length Columns
  • [PDI-11616] - Encrypted passwords with Carte Web Services (DI-Server) are not working
  • [PDI-11912] - An empty note causes a fatal error in Spoon when trying to edit the note after a restart
  • [PDI-11929] - PMR jobs failing with hdp20 shim on Windows using 5.0.4 with "Connection refused to job history server" error
  • [PDI-11938] - XML Output not well-formed when '&' char in attribute
  • [PDI-11959] - Hadoop Job Executor: Jobs using org.apache.hadoop.mapreduce API don't work
  • [PDI-12124] - Hive JDBC Connection in PRD & PUC fails when using prepared statement
  • [PDI-12137] - Multiway merge step fails with java.lang.IllegalArgumentException
  • [PDI-12140] - IndexOutOfBoundsException when transitioning between different partition schemas
  • [PDI-12155] - Error when saving a transformation with MongoDB Output step into Kettle Database Repository
  • [PDI-12204] - Transformation Executor issue with main output and result rows
  • [PDI-12211] - A single step copying data to multiple partitioned steps does not work
  • [PDI-12215] - S3 File Output - Variables not working in AWS Credentials fields
  • [PDI-12257] - Generate Row Step Does not take parameter when connected to repository
  • [PDI-12279] - Generate Row step throws exception when saved in repository and using variable for row limit.
  • [PDI-12318] - NullPointerException running job with massive parallel Job/Transformation loads
  • [PDI-12337] - Cassandra Output CQL3 - data is not loaded into newly added column when "Update Metadata" is used
  • [PDI-12348] - Cassandra Output - data is not loaded into database when "Use Compression" and CQL mode is used
  • [PDI-12361] - Cassandra Output - Time To Live (TTL) is not working with Thrift mode
  • [PDI-12362] - When running in restricted environments, a slave server tries and fails to create a metastore folder
  • [PDI-12370] - Spoon on linux ubuntu unity 14.04 doesn't show the menu bar
  • [PDI-12373] - Cassandra Output - state of 'Insert unlogged batches' checkbox is not saved
  • [PDI-12375] - Spoon.bat code could not find "find" command in line 54 due to missing environmental path.
  • [PDI-12378] - Cassandra Input CQL2 - in some case data in column is missing
  • [PDI-12409] - Cassandra Output/Input CQL3 - key fields metadata is displayed twice
  • [PDI-12427] - Field Type Selection box has a dubious blank entry at the end of the list, Copy/Paste is no longer working
  • [PDI-12440] - Cannot connect to an EE repo in spoon
  • [PDI-12453] - Have PDI R sort factor levels before creating factor columns in a data frame
  • [PDI-12455] - PDI Master Deadlocks when used in high volume cluster
  • [PDI-12456] - SpoonDebug: Messages on stderr output are not logged
  • [PDI-12486] - In case AES password encryption fails with "Unable to AES encrypt password", shared.xml is blown away: Unexpected problem reading shared objects from XML file : null
  • [PDI-12504] - Scheduling seems to schedule twice when an error in initializing of a step happens
  • [PDI-12521] - Google Analytics Data Input Step missing collections library
  • [PDI-12525] - PDI YARN clustering does not work if PDI was installed using the installer
  • [PDI-12539] - Move Files Job Entry Deletes Wrong File
  • [PDI-12545] - Edit the Partitioning method: Remainder of Division throws NPE
  • [PDI-12546] - Partitioned Stream lookup step does not honor the "Copy data to next steps" data movement method from the info step
  • [PDI-12552] - "Start or Restart Job From Selected Job Entry Copy" button disabled in 5.1.0
  • [PDI-12560] - HTTP step: "add date and time to file name" options do not work properly
  • [PDI-12562] - Excel Input step modifies the files it opens (Apache POI)
  • [PDI-12564] - Thin JDBC driver mishandles ThinStatement.getMoreResults()
  • [PDI-12565] - Thin JDBC Driver - ThinDatabaseMetaData.getTables does not handle the tableNamePattern as expected.
  • [PDI-12575] - YARN Kettle Cluster won't start if a non-"official" Hadoop hostname is used
  • [PDI-12577] - Hadoop Job Executor doesn't work with variables provided in path to jar
  • [PDI-12582] - Sqoop Import/Export dialogs' database selection "Use Advanced Options" causes null pointer exception
  • [PDI-12589] - Kitchen does not print Pig Script Executor Logs
  • [PDI-12599] - MongoDB Output step has a race condition in batch insert retries that can cause errors to be logged.
  • [PDI-12600] - Execute SQL Script: Incorrect parsing of SQL statement as SQL Comments
  • [PDI-12601] - Fix for the poor PMR reduce performance with the normal engine reducer
  • [PDI-12606] - Change column header "Label" text, to "Version"
  • [PDI-12622] - Japanese Localization in spoon seems to be incorrect
  • [PDI-12629] - Error in YARN clustering step on HDP2.1
  • [PDI-12631] - YARN clustering step doesn't work on windows
  • [PDI-12639] - Hive/Hive2: Unable to pre-load connection to the connection pool.
  • [PDI-12642] - PDF GettingStart does not exist
  • [PDI-12654] - JobExecutor Result Rows not being loaded from repo
  • [PDI-12658] - KTR file fails to open when "Table Output" step is configured to use variable for Commit size
  • [PDI-12659] - Fuzzy Match uses matching field instead of lookup fields
  • [PDI-12666] - Incorrect logging level at JobEntryCopyMoveResultFilenames line 583
  • [PDI-12671] - Switch / Case does not handle null values correctly
  • [PDI-12716] - Duplicated i18n resource's keys in the MongoDb plugin
  • [PDI-12729] - Unable to save kjb to database repository with "JobRestart" attribute present.
  • [PDI-12734] - Export repository to XML file step doesn't work properly on 5.1
  • [PDI-12736] - Unable to save transformation to EE repository after creating a Data Service
  • [PDI-12737] - No confirmation is requested when multiple transformations/jobs are deleted in the repository explorer
  • [PDI-12738] - Using multiple Enterprise Repositories of different versions of PDI can cause all jobs to fail if a transformation using an unsupported database type is imported
  • [PDI-12739] - The command line import utility in 5.0.0 is over three times slower than 4.4.0 GA.
  • [PDI-12740] - A corrupt repository object leaves the repository explorer in an unusable state.
  • [PDI-12741] - Renaming a file in Kettle's repository explorer does not set comments in the commit
  • [PDI-12742] - In Kettle's Repository Explorer, version history is not refreshed when an object is renamed
  • [PDI-12743] - No warning message or UI refresh when trying to create a new folder when it already exists
  • [PDI-12745] - Exporting/Importing Jobs breaks Transformation specification when using "Specify by reference"
  • [PDI-12746] - Error when switching from a repository; however, this only happens in an active-active clustered configuration suggested by Engineering
  • [PDI-12747] - Metastore MemoryMetaStore uses synchronized methods instead of a concurrent collection
  • [PDI-12748] - NPE and exposed exception on the DI-Server/Carte status page when the repository is not correctly defined (or repositories.xml is not accessible) - exception is not logged in the pentaho.log nor any tomcat\logs
  • [PDI-12755] - REG: HDP21:Hive: if we change the db-scheme, then it won't be used until pooling connection has closed; the original will be used instead.
  • [PDI-12756] - PentahoSystem.ERROR_0014 when starting DI Server with oracle-11.2.2.jar or ojdbc6 driver
  • [PDI-12761] - EE: HDP21: EE license error.
  • [PDI-12762] - Object with invalid name cannot be edited/removed in EE repository
  • [PDI-12770] - generateClusterSchema.sh and run_kettle_cluster_example.bat call classes that do not exist
  • [PDI-12775] - CLONE - Step Logging in Database when executing from Command Line doesn't work
  • [PDI-12788] - CLONE - Spoon does Not Show a Transformation as Changed After Changing the Type in the Get System Info Step
  • [PDI-12797] - Duplicated i18n resources' keys in the Kettle project
  • [PDI-12805] - KTR file fails to open when "Sort Rows" step is configured to use variable
  • [PDI-12809] - HDP21:Kerberos:Sqoop Import: Not able to import the RDBMS-table into the Hbase table using Sqoop Import step.
  • [PDI-12859] - DI server failed to start after turning repository versioning OFF and then ON.
  • [PDI-12869] - Translator2 does not handle the new source code format very well
  • [PDI-12887] - PDI Visualize Perspective - Reload Model Button Not Working
  • [PDI-12898] - SDR does not work if you use a connection with a space in the name
  • [PDI-12900] - Model created by Build Model step does not use the correct case for the tables
  • [PDI-12904] - Cannot create a model from a postgres output step
  • [PDI-12927] - Filter Rows Performance
  • [PDI-12961] - Repository Name and Description are messed up
  • [PDI-12974] - Spoon: Data Refinery folder needs to be renamed and reordered
  • [PDI-13068] - 5.2: Missing "Configuring Hadoop" documentation
  • [PDI-13192] - Logging error in Pentaho Reporting Output step
  • [PDI-14464] - Malformed HTML Documents

Epic

  • [PDI-9763] - As an Administrator, I want to clean up the EE Repository
  • [PDI-11564] - As a PDI and Hadoop user, I want PDI to support Kerberos Authentication (Full Shim Non MapR)
  • [PDI-12189] - EPIC: PDI-R v2

Improvement

  • [PDI-2960] - Allow pan to determine classpath independent of working directory
  • [PDI-7071] - Would like to see support for metadata injection in additional steps
  • [PDI-7523] - Table metadata should be provided by DatabaseMeta objects, not Database
  • [PDI-7630] - The junit jar should be removed from Spoon and the DI Server.
  • [PDI-7773] - Add metadata injection to text file output
  • [PDI-9291] - As a Hadoop User, I want to be able to specify the driver class in a Hadoop Job Executor in Simple Mode
  • [PDI-9857] - Hadoop Job Executor Step seems to require old org.apache.hadoop.mapred.* interface
  • [PDI-11049] - PDI fails to connect to different Hive databases other than "default"
  • [PDI-11283] - Metadata injection for User Defined Java Expression step
  • [PDI-11830] - As a plugin developer I want to have a dialog to edit rows of data
  • [PDI-12360] - The wrong parameter is passed in the TransformationFinish extension point
  • [PDI-12390] - As a Metadata Injection User, I want support for the JSON Output Step
  • [PDI-12570] - Pentaho MapReduce: Set Kettle Properties

New Feature

  • [PDI-10799] - As an ETL administrator, I would like to limit the number of versions kept in the repository or have a utility for clearing older versions.
  • [PDI-12183] - Ubuntu 14.04 LTS Support

Story

  • [PDI-11582] - HDP 2.1 Linux Shim Support (Unsecure)
  • [PDI-12313] - Verify support for CDH 4.7
  • [PDI-12621] - Highlight "PDI 5.x to 5.x Functionality Change" article in the "Create Upgrade Plan"
