Release Notes - Pentaho Data Integration - Kettle - Version 4.3.0 GA (4.5.0 GA Suite Release)

Bug

  • [PDI-1614] - Get a file via FTP dialog has unreadable options
  • [PDI-2273] - Enabling performance monitoring fails on Mac OS X 10.5
  • [PDI-2436] - Can't test for a variable being not set in the Simple Evaluation job-step
  • [PDI-2480] - User Defined Java Expression (aka Janino) sometimes fails on nulls
  • [PDI-3036] - java.util.ConcurrentModificationException - at org.pentaho.di.trans.Trans.endProcessing (Trans.java:1563) - kettle-3.2.3-r11505
  • [PDI-3703] - Issues with logging and database schema
  • [PDI-4313] - Attempting to delete a non-empty directory in HDFS fails with no error or warning message
  • [PDI-4328] - Cannot load the 3.2 job in 4.0
  • [PDI-4422] - Write to log job entry with an empty log message throws: org.eclipse.swt.SWTException: Failed to execute runnable (java.lang.IllegalArgumentException: Argument not valid)
  • [PDI-4457] - Connect to Sybase IQ database (patch)
  • [PDI-4780] - Shapefilereader error
  • [PDI-4812] - PDI issues with MySQL JDBC connector 5.1.x and behaves differently than the 3.1.x version
  • [PDI-4935] - Stack trace not printed when Spoon.java shows the "invalid userid or password" message even though the code caught a Throwable.
  • [PDI-5069] - -version command line option for Pan and Kitchen is no longer printing the version information
  • [PDI-5096] - Transformation overwrite prompt when importing a repository doesn't use the "Don't ask again?" checkbox value.
  • [PDI-5162] - Duplicating User Defined Java Class does not do Deep Copy
  • [PDI-5177] - VFS File chooser dialog is throwing an exception when an S3 folder is deleted
  • [PDI-5191] - Hadoop Trans Job Executor - Invalid number of reducers does nothing
  • [PDI-5611] - Excel Writer Step to clean up the white space on the left of the step dialog.
  • [PDI-6375] - PDI - Twitter Example does not work: Cannot compare types "long" and "java.math.BigInteger"
  • [PDI-6396] - Vectorwise Bulk Loader throws error if stream is empty
  • [PDI-6408] - Logging on the master server should not say OK when there is an error running a job on the slave
  • [PDI-6444] - Fails to open a job/transformation if a variable is used as the path in the repository
  • [PDI-6478] - The Process Files step is unable to move files from a remote share to the local file system
  • [PDI-6507] - Unusual behavior compared to 4.1.x when closing connections after the transformation is stopped: Error committing connection / Streaming result set [...] is still active
  • [PDI-6518] - Two transformations sharing the same connection with connection pooling causes the connection to be closed
  • [PDI-6575] - PHD EE distribution Hive jars are missing JSON dependencies
  • [PDI-6602] - Blanking the logging interval field in the Trans Job Executor and Hadoop Job Executor defaults to the previously set logging interval (not 60 seconds)
  • [PDI-6671] - Changing "Output processor" in the "Pentaho Reporting Output" step will not flag the transformation as having unsaved changes.
  • [PDI-6677] - MySQL: Error getting row information from database / Streaming result set [...] is still active
  • [PDI-6687] - Jar issue: "Get files via SFTP" job entry in 4.2.0 GA not compatible with jsch-0.1.44.jar
  • [PDI-6691] - RSS Output step does not produce custom RSS output
  • [PDI-6709] - Executing a Hive query via hive-jdbc-0.7.0-pentaho-1.0.0.jar with CDH3 returns an error on both preview and result set
  • [PDI-6720] - User Defined Java Expression: Strange SimpleDateFormat bug in Janino when first value of the row is null
  • [PDI-6724] - RegExp step fails if regular expression contains groups but no Capture Group Fields are specified
  • [PDI-6742] - Splash screen does not disappear automatically.
  • [PDI-6746] - The DI Server's Carte UI and Enterprise Console are not displaying the logs of Jobs/Transformations
  • [PDI-6765] - PostgreSQL Bulk Loader step does not save parameters to DB repository
  • [PDI-6778] - Log entries from all the transformations in a job are mixed together in the individual log files when 'Launch next entries in parallel' is activated.
  • [PDI-6780] - Hadoop Copy Files overrides default dfs.replication
  • [PDI-6787] - libext\JDBC contains two versions of the Oracle JDBC drivers, one of which is obsolete and has errors (ojdbc14.jar)
  • [PDI-6808] - Switching from a Native database connection to a JNDI connection leaves username and password information from the Native connection.
  • [PDI-6831] - Log Table fails to get updated when "Enable Connection Pooling" is chosen on a connection
  • [PDI-6839] - PGP Encryption Failing When Connected to Repository
  • [PDI-6840] - Concurrency Issue with LogWriter
  • [PDI-6851] - The "Visualize" perspective can't find the Oracle table that is defined in the model.
  • [PDI-6869] - Exception saving some migrated transformations to database repository
  • [PDI-6872] - "Unique rows (HashSet)" step incorrectly rejects unique occurrences
  • [PDI-6889] - User defined Java Step does not duplicate correctly
  • [PDI-6892] - Check DB Connections not saving connections when connected to Enterprise Repository
  • [PDI-6942] - Doc: Minimize impact of corrupted nodes in Jackrabbit.
  • [PDI-6944] - "log" is misspelled in the Clear Log icon's tooltip in the Execution Results panel
  • [PDI-6945] - Process Files step (Copy/Move files) - If badly configured (in this case the target was given as a folder and not a file), it can delete nearly all files (not folders) from any path you have permission on
  • [PDI-6947] - Problem with the storage path of mail content and attachment files in the Get POP job entry
  • [PDI-6987] - Kitchen doesn't close LogAppender
  • [PDI-6990] - Import.sh on PDI EE fails to connect to repository
  • [PDI-6991] - Incorrect error message returned from import.sh
  • [PDI-7000] - LDAPInput error when using dynamic filter
  • [PDI-7008] - Inconsistent jar files between BI Server and PDI (FTP job entry)
  • [PDI-7012] - Dragging and Dropping a Measure/Dimension Level/Category to the bottom of a list causes an exception and usually model corruption
  • [PDI-7021] - Jar issue: Google Analytic step fails when run from DI-Server
  • [PDI-7030] - Autopopulate does not work
  • [PDI-7033] - "Evaluate rows number in a table" or "Evaluate Table content ..."?
  • [PDI-7042] - InputStream not closed in org.pentaho.di.i18n.GlobalMessages
  • [PDI-7044] - Visualize option does not work in PDI
  • [PDI-7075] - JSON Rownum Issue
  • [PDI-7097] - Deletes from the database repository do not always get committed
  • [PDI-7100] - Fix for PDI-6872: "Unique rows (HashSet)" step incorrectly rejects unique occurrences
  • [PDI-7114] - As a PDI User, I want to be able to connect to MapR-FS
  • [PDI-7117] - JVM crashes when trying to read a file out of MapR in version 1.2.0
  • [PDI-7119] - "KettleStepException: Unable to find field" error with clustered partitioning and a Group By step when one partition does not get any data rows.
  • [PDI-7121] - Create RPM and Deb packages for Pentaho Hadoop Distribution
  • [PDI-7122] - A successfully completed job is reported as failed if a portion of the job had an error
  • [PDI-7124] - As an HBase user, I want to be able to specify the ZooKeeper client port without providing a configuration file
  • [PDI-7127] - VW Bulk Loader sets OutputDone() before data is committed
  • [PDI-7128] - As a Hadoop Administrator, I want the Pentaho for Hadoop Distribution to live outside the Hadoop lib/ directory
  • [PDI-7131] - Hadoop Trans Job Executor fails with a cryptic error when the .ktr used has its input step hop disabled
  • [PDI-7132] - Hadoop File Input will not output any rows when executed
  • [PDI-7134] - Vertica connector: 3 bugs
  • [PDI-7137] - As a Hadoop user, I want to be able to set the KETTLE_HOME based on a Job configuration property
  • [PDI-7140] - CassandraOutput doesn't encode key values correctly
  • [PDI-7142] - ArrayIndexOutOfBounds error when configuring replace based on value types in If Field Value is Null step
  • [PDI-7143] - MapRHadoopConfigurer doesn't handle named cluster and direct IP:port configuration correctly
  • [PDI-7145] - Mixing lazily converted data and non-lazy data can cause a binary string conversion error
  • [PDI-7146] - Pentaho MapReduce provides no feedback when a job fails to start because of permission issues
  • [PDI-7151] - PdiAction stops waiting for a job to finish after 5000000 ms.
  • [PDI-7167] - Webservice step - readStringFromInputStream doesn't correctly handle encodings
  • [PDI-7168] - Agile BI currency format differs from Analyzer currency format
  • [PDI-7181] - HBase Output Step does not work when executed in a Pentaho MapReduce job
  • [PDI-7195] - JSON Output Step does not return proper JSON structure
  • [PDI-7202] - After importing .KTR, "The key(s) to look up the value(s)" section loses the last field mapped in Insert/Update step
  • [PDI-7204] - Strings cut step: Lazy conversion causes an error when converting to string (java.lang.ClassCastException: [B cannot be cast to java.lang.String)
  • [PDI-7205] - Cannot browse for a transformation in the Pentaho MapReduce Job Entry
  • [PDI-7207] - PDIFTPClient does not honor log level
  • [PDI-7209] - Support for multi-character string as separator in the CSV Input step
  • [PDI-7218] - Carte cannot access db-based repository when re-executing an already published job
  • [PDI-7221] - Nullpointer exception on invalid output step name for Pentaho Map/Reduce
  • [PDI-7222] - Map/Reduce job completes even though wrong input field name has been referenced in java UDF
  • [PDI-7226] - When logging into a DB repository with a user that does not exist, the message says so.
  • [PDI-7227] - A scheduled transformation monitored from the DI Server (or Carte) shows the details of an earlier instance of the transformation on refresh.
  • [PDI-7228] - DB-Based repository does not commit deletions
  • [PDI-7239] - Mapping step migration (parsing step XML data) from PDI 4.1.0 to PDI 4.2.0 is buggy
  • [PDI-7240] - Carte logging lost for sub-jobs
  • [PDI-7251] - Logging in Spoon is sometimes incorrect or misses log lines of children (sub-jobs or transformations)
  • [PDI-7259] - PDI not resolving DB connection variables in Table Output
  • [PDI-7262] - When referencing MapReduce we should not use a forward slash between Map and Reduce (Map/Reduce)
  • [PDI-7265] - Including the Send date or Received Date in the fields of the Email Messages Input Step causes NPE
  • [PDI-7266] - Cannot use variable on port in Get Mail Job Entry
  • [PDI-7270] - Config property KETTLE_MAX_LOGGING_REGISTRY_SIZE is not picked up
  • [PDI-7273] - Error in Sample: User Defined Java Class - Concatenate firstname and lastname
  • [PDI-7279] - HBaseInput step swallows root exception when creating a MappingAdmin connection
  • [PDI-7283] - On editing connection strings in a File Repository, the changes do not take effect on transformations saved in the repository.
  • [PDI-7292] - HBase Steps Lose Configuration After Saving to Enterprise Repository
  • [PDI-7299] - PGP Encryption in Linux, "No public key" or "public key not found"
  • [PDI-7302] - Google moved their API documentation and the Google Analytics step points to now invalid reference locations
  • [PDI-7304] - NullPointerException in the Hadoop Copy Files Job Entry when the destination is HDFS and you cannot connect
  • [PDI-7309] - Properties Output - append
  • [PDI-7350] - NPE in Kettle Logging when Mapper Input/Output Step Names are not correct
  • [PDI-7365] - Hadoop Copy Files Step Does not Work with CDH3u3
  • [PDI-7366] - Pentaho MapReduce Gives Class not Found Error with Cloudera CDH3u3
  • [PDI-7372] - Null Pointer Exception when the Number Range step reads in a null value
  • [PDI-7385] - HBase Input step tries to connect and fails forever
  • [PDI-7388] - NPE When No Rows Passed to HBase Output
  • [PDI-7389] - Remedy Action Request System Database Connection does not work.
  • [PDI-7405] - AgileBI doesn't work on Mac OS X 10.7
  • [PDI-7411] - Standardize HBase Jars across projects
  • [PDI-7421] - Hadoop Job Executor reports failure when it is successful
  • [PDI-7430] - PDI 4.3+ installer should not install Hadoop licenses
  • [PDI-7435] - Update step: Problem with "Use batch updates?" option for Pentaho stable version 4.2.1
  • [PDI-7444] - Transformation does not ask to save when running in clustered mode
  • [PDI-7469] - TransGraph and JobGraph memory leak, XulDomContainer does not release them
  • [PDI-7478] - Cannot open .xanalyzer files that were created in PDI
  • [PDI-7482] - Slave server monitor does not show different logs for different jobs on carte instance
  • [PDI-7489] - Subdirectory "administration-console" is shown in the archive-based install (pdi-ee-4.3.0-RC1.zip) but it is empty and unused
  • [PDI-7490] - NPE when I connect with a 4.3.0 RC Spoon client to a 4.2.1 GA DI server (EE repository)
  • [PDI-7497] - SSH step doesn't decrypt encrypted kettle properties passwords
  • [PDI-7499] - Get System Info step is not returning "Kettle build date"
  • [PDI-7501] - Jersey jars not loading into PDI
  • [PDI-7519] - PreparedStatementMetadataRetrieval flag for databases is defective
  • [PDI-7529] - Saxon jars are out of date in projects that depend on kettle
  • [PDI-7536] - PHD/distributed cache is missing jars necessary for using HBase and logging to database tables within MR
  • [PDI-7565] - Pentaho MapReduce reducer drops first record if value is null
  • [PDI-7566] - Pentaho MapReduce reducer will not perform any type conversion if the first record's value is null
  • [PDI-7575] - Java Filter - outputs flipped when connecting to other components.
  • [PDI-7577] - Stopping a job while the Hadoop Job executor (in advanced mode with blocking) is executing does not stop the job on the Hadoop Cluster
  • [PDI-7581] - Excel input step not reading all the fields from a .XLSX worksheet
  • [PDI-7582] - Pentaho-Geo Map showing error when trying to load cities with a global view
  • [PDI-7596] - JSON Input Step - Rownum not produced if input is not from a file.
  • [PDI-7603] - Cannot run Pentaho MapReduce on Cloudera CDH3u3
  • [PDI-7695] - PMR: Suppressing output of reducer key or value doesn't work when a combiner transformation is used
  • [PDI-7705] - Generated Javadoc is incomplete
  • [PDI-7717] - In PDI, unable to create an Analyzer report from a model because the Analyzer UI does not load properly.
  • [PDI-7725] - The current license reference on the splash screen doesn't mention Apache
  • [PDI-7728] - Corrupt repositories.xml - Spoon will not open
  • [PDI-7729] - A data type error occurs when reading integer column values using Cassandra input
  • [PDI-7738] - NPE when HBase Input is not fully configured
  • [PDI-7742] - Cannot run a Transformation from a DB repository on DIS
  • [PDI-7751] - Items scheduled to execute when DIS is offline will cause exception upon server startup
  • [PDI-7760] - PDI-4654 causes backwards compatibility issues and should be reverted in PDI 4.3.0 GA
  • [PDI-7823] - Calculator step adds a space when concatenating a string and an integer using the "A+B" calculation.

Improvement

  • [PDI-2375] - In filter rows step, make it more obvious you can specify 2 targets or none
  • [PDI-4526] - Force Excel output to enforce data formatting such as percentages
  • [PDI-4550] - Confusing log entries on the console: SetElements on listbox called: collection has x rows, could not parse [vertical] as Align value, Cannot overlay element with id
  • [PDI-4954] - Number of rows limit in Excel
  • [PDI-5259] - As a Hadoop User, I want to be able to specify different output key/value classes for mappers and reducers
  • [PDI-5260] - As a Hadoop User, I don't want to have to know the internal Hadoop data type classes when using a TJE
  • [PDI-5417] - Implement Hadoop's NullWritable class as output
  • [PDI-5500] - Incorrect return code by pan.sh (when the importer code is called, e.g. from a JavaScript)
  • [PDI-6797] - "Specify by reference" links should not take variables
  • [PDI-6924] - Group By - Concatenating strings with "," has poor performance
  • [PDI-6961] - The Split Field to Rows step should support splitting by regular expression
  • [PDI-6965] - Relocate and group all Hadoop related jars into a single directory for ease of maintenance
  • [PDI-7203] - As a Hadoop User, I want to be able to delete the output directory as part of the Pentaho MapReduce Job Entry
  • [PDI-7242] - Integrate the Single Threading transformation engine as a default option for a Reducer in the MapReduce execution job entry.
  • [PDI-7247] - Review JDBC drivers being included in DIS and PDI ... and JDBC as a dependency of Plugin Actions
  • [PDI-7248] - Create a group called "Big Data" and move the Hadoop and NoSQL databases there
  • [PDI-7258] - Integrate the SingleThreadedTransformationEngine as an option for the Combiner of a Pentaho MapReduce Job
  • [PDI-7293] - The current Hadoop conversion factory doesn't take into account the PDI conversion rules
  • [PDI-7296] - Incorrect return code from import.sh/.bat (return code 2, "Unexpected error during repository import", is not set correctly)
  • [PDI-7303] - As a PDI user, I'd like samples of how to use the HL7 steps and job entries
  • [PDI-7443] - The repo list does not show the selected repo
  • [PDI-7557] - Make sure all jars in PDI client also appear in the DI Server
  • [PDI-7560] - Implement a recursive delete when users attempt to delete a non-empty folder in VFS browser (File -> Open URL)
  • [PDI-7576] - Java Filter - Add some samples showing how to use it.
  • [PDI-7578] - Switch / Case - Search inside the incoming string for a token to switch on (example attached)
  • [PDI-7583] - Update name of hbase-comparators.jar
  • [PDI-7718] - Updated Community Wiki to reflect Pentaho Big Data Plugin changes

New Feature

  • [PDI-4952] - Feature Request: In Excel output using a template, add a new option to output to named ranges
  • [PDI-6361] - As a Hadoop User, I want to be able to supply multiple Input Paths for the TJE
  • [PDI-7104] - MongoDB Query Input Step
  • [PDI-7211] - As Pentaho we want Analyzer's new heat grid visualization to work in PDI
  • [PDI-7285] - As an ETL developer I want out of the box database support for Exasol 4.0
  • [PDI-7295] - As a Hadoop user, I do not want to deploy and maintain the Kettle engine on every node
  • [PDI-7297] - The ability for the PDI Hadoop File Input step to read and decompress HDFS files that have been compressed with the "Snappy" compression library
  • [PDI-7384] - Create a MapR specific assembly for Kettle CE
  • [PDI-7397] - As an ETL-Developer, I want to use Human Inference DQ steps (EasyDQ)

Task

  • [PDI-6734] - Create unit tests that will test the functionality of the various repository types
  • [PDI-7144] - Verify (and fix) all hadoop jobs and steps that do not properly support the different Hadoop distros
  • [PDI-7147] - Change Kettle license to Apache
  • [PDI-7148] - Change the name of the Hadoop Trans Job Executor to Pentaho Map Reduce or Pentaho MR
  • [PDI-7420] - PDI CE and EE packages should contain samples from the Pentaho Big Data Plugin

Sub-task

  • [PDI-6877] - Validate in 4.3.0 - PDI-6839 - PGP Encryption Failing When Connected to Repository
  • [PDI-7311] - Validate in 4.3.0 - Log Table fails to get updated when "Enable Connection Pooling" is chosen on a connection.
  • [PDI-7314] - Validate 4.2.2 / 4.3.0 - -version command line option for Pan and Kitchen is no longer printing the version information
  • [PDI-7316] - Validate in 4.3.0 - Check DB Connections not saving connections when connected to Enterprise Repository
  • [PDI-7319] - Validate 4.2.2 / 4.3.0 - Problem with the storage path of mail content and attachment files in the Get POP job entry
  • [PDI-7323] - Validate 4.2.2 / 4.3.0
  • [PDI-7325] - Validate 4.2.2 / 4.3.0 - PDI-6177 - When using the Dimension Lookup/Update step with a technical key set to "use table maximum + 1", a conversion error occurs
  • [PDI-7327] - Validate in 4.3.0 - Log entries from all the transformations in a job are mixed together in the individual log files when 'Launch next entries in parallel' is activated.
  • [PDI-7329] - Validate in 4.3.0 - Kettle: Transformation with Set Variables stored in Enterprise Repository doesn't show up
  • [PDI-7331] - Validate 4.2.2 / 4.3.0
  • [PDI-7333] - Validate 4.3.0
  • [PDI-7335] - Validate in 4.3.0 - Transformation overwrite prompt when importing a repository doesn't use the "Don't ask again?" checkbox value.
  • [PDI-7337] - Validate 4.2.2 / 4.3.0
  • [PDI-7340] - Validate in 4.3.0 - When an error occurs during login, regardless of cause, the error message says "Repository login failed. Invalid userid or password."
  • [PDI-7342] - Validate in 4.3.0 - Process Files step (Copy/Move files) - If badly configured (in this case the target was given as a folder and not a file), it can delete nearly all files (not folders) from any path you have permission on
  • [PDI-7345] - Validate in 4.3.0 - "Evaluate rows number in a table" or "Evaluate Table content ..."?
  • [PDI-7348] - Validate in 4.3.0 - Connect to Sybase IQ database (patch)
  • [PDI-7458] - Validate in 4.5 RC
  • [PDI-7462] - Validate in 4.5 RC
  • [PDI-7463] - Validate in 4.5 RC
  • [PDI-7465] - Validate in 4.3 RC - Excel Writer Step to clean up the white space on the left of the step dialog.
  • [PDI-7550] - Remove PHD project
  • [PDI-7601] - Remove PHD from installer
  • [PDI-9588] - Test PDI-7167 in 4.3.0 (4.5)
