Pentaho Data Integration - Kettle
PDI-18159

Text File Input "Get Fields" fails for S3 Location/Source


    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Fixed
    • Affects Version/s: 8.2.0 GA, 8.3.0 GA
    • Fix Version/s: 9.0.0 GA
    • Story Points:
      8
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Tatooine (Maint)
    • Operating System/s:
      Windows 10
    • Steps to Reproduce:

      Prerequisites:  These steps assume the ability to connect to S3 and deposit a sample file in a bucket, then connect from Spoon to retrieve its data.

      1)  Create a simple text file with delimited data and place it on S3.  (A minimal sketch for creating and uploading such a file follows these steps.)

      2)  Create a transformation and add a Text File Input step.

      3)  On the File tab, click the Browse button to navigate to an input file.

      4)  On the Open File dialog box, change the location to S3.  Your local S3 credentials for your bucket should be used to automatically connect to your S3 instance.

       

      5)  Select your sample file on S3.  Add the selected file to the step.

      6)  Go to the Fields tab and select the Get Fields button.

      7)  Choose a sample lines value.

      8)  An error is raised and the fields are not returned correctly.  In Support testing, the fields were all retrieved on a single row of the fields grid.  (A minimal sketch reproducing the underlying closed-stream error follows these steps.)

       

      java.lang.reflect.InvocationTargetException: Problem encountered scanning the CSV file in row #11 (get first line) : org.pentaho.di.core.exception.KettleFileException:
      Exception reading line: java.io.IOException: Attempted read on closed stream.Attempted read on closed stream.
      Attempted read on closed stream.
        at org.pentaho.di.ui.trans.steps.fileinput.text.TextFileCSVImportProgressDialog$1.run(TextFileCSVImportProgressDialog.java:135)
        at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:113)
      Caused by: org.pentaho.di.core.exception.KettleFileException:
      Exception reading line: java.io.IOException: Attempted read on closed stream.Attempted read on closed stream.
      Attempted read on closed stream.
        at org.pentaho.di.trans.steps.fileinput.text.TextFileInputUtils.getLine(TextFileInputUtils.java:350)
        at org.pentaho.di.ui.trans.steps.fileinput.text.TextFileCSVImportProgressDialog.doScan(TextFileCSVImportProgressDialog.java:372)
        at org.pentaho.di.ui.trans.steps.fileinput.text.TextFileCSVImportProgressDialog.access$100(TextFileCSVImportProgressDialog.java:69)
        at org.pentaho.di.ui.trans.steps.fileinput.text.TextFileCSVImportProgressDialog$1.run(TextFileCSVImportProgressDialog.java:132)
        ... 1 more
      Caused by: java.io.IOException: Attempted read on closed stream.
        at org.apache.http.conn.EofSensorInputStream.isReadAllowed(EofSensorInputStream.java:107)
        at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:133)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
        at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.commons.vfs2.util.MonitorInputStream.read(MonitorInputStream.java:91)
        at org.pentaho.di.core.compress.CompressionInputStream.read(CompressionInputStream.java:68)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
        at java.io.InputStreamReader.read(InputStreamReader.java:168)
        at org.pentaho.di.trans.steps.fileinput.text.TextFileInputUtils.getLine(TextFileInputUtils.java:299)
        ... 4 more

      9)  After dismissing the error stack, a further error window is raised:  ERROR - Unable to show results of the document scan.
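
      For step 1, a minimal sketch of creating and uploading a sample delimited file with the AWS SDK for Java v1 (the com.amazonaws classes seen in the stack trace above).  The bucket and key names are hypothetical; substitute your own test bucket:

      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3ClientBuilder;

      public class UploadSample {
          public static void main(String[] args) {
              // Hypothetical bucket/key; credentials come from the SDK's default
              // provider chain (~/.aws/credentials, environment variables, etc.).
              AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
              String data = "id;name;amount\n"
                          + "1;alpha;10.50\n"
                          + "2;beta;20.00\n"
                          + "3;gamma;30.25\n";
              s3.putObject("my-test-bucket", "sample-delimited.txt", data);
          }
      }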

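      The "Attempted read on closed stream" in the stack trace above is the generic symptom of reading from an S3 object stream after it has been closed.  A minimal sketch, again assuming the AWS SDK for Java v1 and the hypothetical bucket/key from the upload sketch, that raises the same IOException outside of Spoon:

      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3ClientBuilder;
      import com.amazonaws.services.s3.model.S3Object;
      import com.amazonaws.services.s3.model.S3ObjectInputStream;

      public class ClosedStreamRepro {
          public static void main(String[] args) throws Exception {
              AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
              S3Object object = s3.getObject("my-test-bucket", "sample-delimited.txt");
              S3ObjectInputStream in = object.getObjectContent();

              System.out.println("first byte: " + in.read()); // succeeds

              // Closing the S3 object also closes its content stream.
              object.close();

              // The next read fails the same way the Get Fields scan does:
              // java.io.IOException: Attempted read on closed stream.
              in.read();
          }
      }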

      Description

      The Text File Input step's "Get Fields" fails for an S3 Location/Source.  When adding a file from S3 and then retrieving the fields using the Get Fields button, the fields are not correctly retrieved and an error stack is raised.

      If the configured delimiter differs from the source file's, the fields are all returned on the second row and the first row is blank.  If the delimiter is set correctly, the fields are returned starting on the second row.

      Exiting and re-entering the step removes the blank first row, but the Get Fields button returns an error regardless of whether the correct delimiter is chosen.
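
      The MonitorInputStream frame in the stack trace shows the scan reading the S3 file through Apache Commons VFS.  A minimal sketch of what a working sample-lines scan looks like at that layer, assuming commons-vfs2 and a hypothetical s3:// URL (the s3 scheme itself is supplied by Pentaho's S3 VFS support); the point is that the stream must stay open for the whole scan, which is what fails here:

      import org.apache.commons.vfs2.FileObject;
      import org.apache.commons.vfs2.VFS;

      import java.io.BufferedReader;
      import java.io.InputStreamReader;

      public class SampleLinesScan {
          public static void main(String[] args) throws Exception {
              // Hypothetical URL; the s3 scheme comes from Pentaho's VFS provider.
              FileObject file = VFS.getManager()
                  .resolveFile("s3://my-test-bucket/sample-delimited.txt");
              try (BufferedReader reader = new BufferedReader(
                      new InputStreamReader(file.getContent().getInputStream()))) {
                  // Read a handful of sample lines, as Get Fields does; the
                  // underlying stream must remain open until the loop finishes.
                  for (int i = 0; i < 5; i++) {
                      String line = reader.readLine();
                      if (line == null) break;
                      System.out.println(line);
                  }
              } finally {
                  file.close();
              }
          }
      }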


              People

              Assignee:
              rrosinha Ricardo Rosinha
              Reporter:
              criccardi Christopher Riccardi
               Votes:
               1
               Watchers:
               9
