Uploaded image for project: 'Pentaho Data Integration - Kettle'
  1. Pentaho Data Integration - Kettle
  2. PDI-17684

Excel Input step mishandles nulls using Excel 2007 XLSX (Apache POI Streaming)

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Duplicate
    • Affects Version/s: 7.1.0.13 GA, 8.1.0.1 GA
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
      None
    • Story Points:
      0
    • PDI Sub-component:
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Tatooine (Maint)
    • Steps to Reproduce:
      Hide

      I have provided a sample KTR and Excel sheet in the linked ESR, but to recreate:

      1) Create a KTR with an Excel input using the Apache POI Streaming engine
      2) Create an Excel sheet with several columns and at least one row with one null field (not the last field).
      3) Modify the null cell's format (change it from general to text or change the alignment, etc)
      4) Read in that Excel sheet with the KTR and preview the data

      The null field will shift the data in that row down a field, duplicating the null in the process.

      Show
      I have provided a sample KTR and Excel sheet in the linked ESR, but to recreate: 1) Create a KTR with an Excel input using the Apache POI Streaming engine 2) Create an Excel sheet with several columns and at least one row with one null field (not the last field). 3) Modify the null cell's format (change it from general to text or change the alignment, etc) 4) Read in that Excel sheet with the KTR and preview the data The null field will shift the data in that row down a field, duplicating the null in the process.

      Description

      The Excel Input step using the Apache POI Streaming engine will shift the data in a row over a column when it encounters a null, duplicating the null as well.

      This only happens when cells have their format modified, such as a number category change, text alignment, font, etc.

      Also, setting a cell back to it's default formatting either manually or by Clear -> Clear Formats does not correct this. You can only resolve it by selecting the entire sheet and then selecting Clear -> Clear Formats.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              scosta Sandra Costa
              Reporter:
              bowens Brittany Owens
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: