Pentaho Data Integration - Kettle: PDI-18489

Using the Excel Input step results in an exception when the compression ratio of an xlsx file is too high

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Low
    • Resolution: Fixed
    • Affects Version/s: Master, 8.3.0.5 GA
    • Fix Version/s: 9.1.0 GA
    • Component/s: None
    • Environment:
      Ubuntu 16.04; Windows 10
    • Story Points:
      5
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      BB-8
    • Operating System/s:
      Windows 10
    • Steps to Reproduce:
      1. Download the .ktr and Excel files
      2. Run the transformation

      Description

      In Pentaho version 8.3.0.5 and above, using the Excel Input step can result in an exception when the compression ratio of an xlsx file is too high. The cause is a security feature that tries to determine whether the xlsx file is a zip bomb (Apache POI's ZipSecureFile, according to the stack trace).
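      The check can be illustrated with the counters reported in the full log. This is a minimal sketch, not Apache POI's code: the class and method names are hypothetical, and it assumes the ratio is the number of compressed bytes read divided by the number of uncompressed bytes produced, with the file rejected when that ratio falls below MIN_INFLATE_RATIO (0.01 in the log).

```java
// Hypothetical sketch of an inflate-ratio check; it mirrors the idea behind
// POI's zip bomb detection but is NOT POI's implementation.
public class InflateRatioCheck {

    // Limit reported in the log: "Limits: MIN_INFLATE_RATIO: 0.01"
    static final double MIN_INFLATE_RATIO = 0.01;

    /** Compressed bytes read divided by uncompressed bytes produced. */
    static double ratio(long compressedBytes, long uncompressedBytes) {
        return (double) compressedBytes / uncompressedBytes;
    }

    /** True when the data expands more than the limit allows. */
    static boolean looksLikeZipBomb(long compressedBytes, long uncompressedBytes) {
        return ratio(compressedBytes, uncompressedBytes) < MIN_INFLATE_RATIO;
    }

    public static void main(String[] args) {
        long uncompressed = 4_916_901L; // "Counter" in the log
        long compressed = 49_152L;      // "cis.counter" in the log
        System.out.println("ratio    = " + ratio(compressed, uncompressed));
        System.out.println("rejected = " + looksLikeZipBomb(compressed, uncompressed));
    }
}
```

      With these numbers the ratio is only slightly below 0.01, so a legitimate but very repetitive part is enough to trip the check; the StylesTable frames in the trace suggest the styles part was the one being read when it failed.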

      Error:

      Caused by: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
      This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
      You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
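      To see which parts of a given xlsx file are likely to be affected, the whole-entry sizes stored in the zip directory can be compared against the same limit. This is a hypothetical diagnostic, not part of PDI or POI, and only an approximation: it assumes POI's check behaves like a comparison of total compressed and uncompressed entry sizes, which may not hold exactly, since POI checks while streaming and may treat small entries differently.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic (not part of PDI or POI): lists the parts of an
// xlsx (a zip archive) whose compressed/uncompressed size ratio is below a limit.
public class XlsxRatioScan {

    static List<String> entriesBelow(File xlsx, double minRatio) throws IOException {
        List<String> flagged = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                long size = e.getSize();            // uncompressed size, -1 if unknown
                long csize = e.getCompressedSize(); // compressed size, -1 if unknown
                if (size <= 0 || csize < 0) {
                    continue; // skip directories, empty entries and unknown sizes
                }
                if ((double) csize / size < minRatio) {
                    flagged.add(e.getName());
                }
            }
        }
        return flagged;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxRatioScan <file.xlsx>");
            return;
        }
        double limit = 0.01; // the MIN_INFLATE_RATIO value reported in the log
        for (String name : entriesBelow(new File(args[0]), limit)) {
            System.out.println("below " + limit + ": " + name);
        }
    }
}
```

      For example, running java XlsxRatioScan Balance_Type_Codes.xlsx prints every part whose overall ratio is under 0.01; empty output suggests no part falls below the limit by this measure.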

      Full Log:

      2019/11/29 14:31:24 - Microsoft Excel Input.0 - ERROR (version 9.0.0.0-335, build 9.0.0.0-335 from 2019-11-28 11.22.16 by buildguy) : Error processing row from Excel file [C:\tc001\PDI-16942\Balance_Type_Codes.xlsx] : org.pentaho.di.core.exception.KettleException:
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - ERROR (version 9.0.0.0-335, build 9.0.0.0-335 from 2019-11-28 11.22.16 by buildguy) : org.pentaho.di.core.exception.KettleException:
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 -
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.steps.excelinput.poi.PoiWorkbook.<init>(PoiWorkbook.java:89)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.steps.excelinput.WorkbookFactory.getWorkbook(WorkbookFactory.java:46)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.steps.excelinput.ExcelInput.getRowFromWorkbooks(ExcelInput.java:546)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.steps.excelinput.ExcelInput.processRow(ExcelInput.java:426)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at java.lang.Thread.run(Thread.java:748)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:63)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:580)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:165)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:270)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:266)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:226)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.pentaho.di.trans.steps.excelinput.poi.PoiWorkbook.<init>(PoiWorkbook.java:78)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - ... 5 more
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - Caused by: java.lang.reflect.InvocationTargetException
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:56)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:60)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - ... 11 more
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - Caused by: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
      This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
      You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
      Counter: 4916901, cis.counter: 49152, ratio: 0.009996540503866155
      Limits: MIN_INFLATE_RATIO: 0.01
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.advance(ZipSecureFile.java:268)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:222)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:163)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.openxmlformats.schemas.spreadsheetml.x2006.main.StyleSheetDocument$Factory.parse(Unknown Source)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.xssf.model.StylesTable.readFrom(StylesTable.java:192)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - at org.apache.poi.xssf.model.StylesTable.<init>(StylesTable.java:141)
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - ... 17 more
      2019/11/29 14:31:24 - Microsoft Excel Input.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1)
      2019/11/29 14:31:24 - test - ERROR (version 9.0.0.0-335, build 9.0.0.0-335 from 2019-11-28 11.22.16 by buildguy) : Errors detected!
      2019/11/29 14:31:24 - Spoon - The transformation has finished!
      2019/11/29 14:31:24 - test - ERROR (version 9.0.0.0-335, build 9.0.0.0-335 from 2019-11-28 11.22.16 by buildguy) : Errors detected!
      2019/11/29 14:31:24 - test - ERROR (version 9.0.0.0-335, build 9.0.0.0-335 from 2019-11-28 11.22.16 by buildguy) : Errors detected!
      2019/11/29 14:31:24 - test - Transformation detected one or more steps with errors.
      2019/11/29 14:31:24 - test - Transformation is killing the other steps
      

              People

              Assignee:
              Unassigned
              Reporter:
              rsarreira Ronye Sarreira
              Votes:
              0
              Watchers:
              8
