Uploaded image for project: 'Pentaho Data Integration - Kettle'
  1. Pentaho Data Integration - Kettle
  2. PDI-18979

Microsoft Excel Input mishandling empty strings and null values when using streaming

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Fixed
    • Affects Version/s: 8.3.0.10 GA, 9.0.0.5 GA
    • Fix Version/s: 9.2.0 GA
    • Component/s: Step
    • Labels:
    • Story Points:
      0
    • PDI Sub-component:
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Tatooine (Maint)
    • Steps to Reproduce:
      Hide

      SCENARIO 1

      1: Run write_excel_file_1.ktr write_excel_file_1.ktr
      2. Run read_excel_file.ktr read_excel_file.ktr

       SCENARIO 2

      1: Run write_excel_file_2.ktr write_excel_file_2.ktr
      2: Run read_excel_file.ktr read_excel_file.ktr

       

      Show
      SCENARIO 1 1: Run write_excel_file_1.ktr  write_excel_file_1.ktr 2. Run read_excel_file.ktr  read_excel_file.ktr   SCENARIO 2 1: Run write_excel_file_2.ktr  write_excel_file_2.ktr 2: Run read_excel_file.ktr  read_excel_file.ktr  

      Description

      Microsoft Excel Input mishandling empty strings and null values when using streaming.

      This issue occurs both in the PDI Client and on the Pentaho Server.

      These Transformations use the Microsoft Excel Input step.

      First we write an xlsx file with the Microsoft Excel Writer step using Streaming. This file is readable by Microsoft Excel



      Then we read the generated file with an Microsoft Excel Input step using Streaming.



      We have observed some issues related to cells that contain an:

      • Empty String
      • Null

      The observations are summarized below in the following two scenarios:

      SCENARIO 1

      1.1: A file is created containing an:

      • Empty String (BOXED in BLUE)
      • Null (BOXED in ORANGE)

      neither of these cells are the last cell in their row.



      1.2: After the file has been read the following can be seen:

      • The cell that contains the Empty String has its value changed to the value of the next cell in the row. (BOXED in RED)
      • The next cell in the row has its value changed to Null. (BOXED in ORANGE)
      • The Microsoft Excel Input step does not continue reading rows after it reaches the row containing the Null value. (BOXED in BLUE)


      SCENARIO 2

      2.1: A file is created containing an:

      • Empty String (BOXED in BLUE)
      • Null (BOXED in ORANGE)

      the cell with the Empty String is the last cell in its row.



      2.2: After the file has been read the following can be seen:

      • The cell that contains the Empty String has its value replaced by the value of the first cell of the next row. (BOXED in RED)
      • The next row values are replaced by the values of the row coming after. (BOXED in ORANGE)
      • The row coming after is repeated (BOXED in BLUE)
      • The Microsoft Excel Input step does not continue reading rows after it reaches the row containing the Null value. (BOXED in PURPLE)



      NOTE: These issues DO NOT occur if he following change is made: 

      Microsoft Excel Input step NOT using Streaming. (BOXED in GREEN)


        Attachments

        1. 01 Stream Ticked.png
          01 Stream Ticked.png
          24 kB
        2. 02 Spreadsheet Type Streaming.png
          02 Spreadsheet Type Streaming.png
          30 kB
        3. 03 Stream Not Ticked.png
          03 Stream Not Ticked.png
          36 kB
        4. 04 Spreadsheet Type Not Streaming.png
          04 Spreadsheet Type Not Streaming.png
          30 kB
        5. PDI-18978.ktr
          24 kB
        6. read_excel_file_no_stream.ktr
          15 kB
        7. read_excel_file.ktr
          15 kB
        8. S101 Spreadsheet Format.png
          S101 Spreadsheet Format.png
          19 kB
        9. S102 Results.png
          S102 Results.png
          20 kB
        10. S201 Spreadsheet Format.png
          S201 Spreadsheet Format.png
          24 kB
        11. S202 Results.png
          S202 Results.png
          29 kB
        12. write_excel_file_1_no_stream.ktr
          40 kB
        13. write_excel_file_1.ktr
          40 kB
        14. write_excel_file_2_no_stream.ktr
          40 kB
        15. write_excel_file_2.ktr
          40 kB

          Issue Links

            Activity

              People

              Assignee:
              ana.mendes Ana Rita Clemente Mendes
              Reporter:
              singram Stuart Ingram
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: