Pentaho Data Integration - Kettle
PDI-19373

Avro Output Step: generates a corrupt Avro file when connected to the repository from Spoon.


    Details

    • Type: Bug
    • Status: Open
    • Severity: Urgent
    • Resolution: Unresolved
    • Affects Version/s: 9.1.0 GA, 8.3.0.0, 9.2.0 GA
    • Fix Version/s: Backlog
    • Component/s: Step
    • Labels:
      None
    • Story Points:
      0
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      ROVER
    • Steps to Reproduce:
      1. Do a fresh installation of Pentaho 9.2 Server and Spoon.
      2. Connect to the repository from Spoon and upload the attached sample transformation AvroExportSample.ktr.
        NOTE: Update the connection used in the Table Input step and the paths used in the Avro Output step.
      3. Run the transformation.
      4. Try to create a new table by uploading the Avro file to Google BigQuery.

      Description

      We are trying to export the data from a table to an Avro file using a transformation stored in the repository. The Avro file is then used to load a table in Google BigQuery.
      When we try to upload the Avro file into the Google BigQuery table, we get the following error:

      Error while reading data, error message: The Apache Avro library failed to parse the header with the following error: 
      Unexpected type for default value. Expected null, but found string: ""
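
      The error is consistent with the Avro specification: the default value of a union-typed field must conform to the first branch of the union, so a field typed ["null","string"] may only declare "default": null, never "default": "". As a minimal sketch (assuming the Apache Avro Java library, org.apache.avro, on the classpath; the record and field names are taken from the attached schema fragments), Avro's own parser rejects such a schema for the same reason BigQuery does:

        import org.apache.avro.AvroTypeException;
        import org.apache.avro.Schema;

        public class DefaultValidationDemo {
          // Field definition as emitted by the repository-connected run:
          // the default "" does not match the first union branch ("null").
          private static final String BAD = "{\"type\":\"record\",\"name\":\"details\",\"fields\":["
              + "{\"name\":\"inst_id\",\"type\":[\"null\",\"string\"],\"default\":\"\"}]}";

          // The same field with a spec-compliant default.
          private static final String GOOD = "{\"type\":\"record\",\"name\":\"details\",\"fields\":["
              + "{\"name\":\"inst_id\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

          public static void main(String[] args) {
            Schema ok = new Schema.Parser().setValidateDefaults(true).parse(GOOD);
            System.out.println("Parsed: " + ok.getName());
            try {
              new Schema.Parser().setValidateDefaults(true).parse(BAD);
            } catch (AvroTypeException e) {
              // Rejected for the same reason BigQuery rejects the file header.
              System.out.println("Rejected: " + e.getMessage());
            }
          }
        }

      In recent Avro Java releases default validation is already enabled during parsing; setValidateDefaults(true) only makes the check explicit.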
      

      We tried uploading the data to Google BigQuery using both the bq utility and the BigQuery console, and we get the same error in both cases.
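
      For completeness, the same load failure can be reproduced programmatically. A minimal sketch with the google-cloud-bigquery Java client, where the dataset, table, and GCS path are hypothetical and the Avro file is assumed to have been staged in a GCS bucket first:

        import com.google.cloud.bigquery.BigQuery;
        import com.google.cloud.bigquery.BigQueryOptions;
        import com.google.cloud.bigquery.FormatOptions;
        import com.google.cloud.bigquery.Job;
        import com.google.cloud.bigquery.JobInfo;
        import com.google.cloud.bigquery.LoadJobConfiguration;
        import com.google.cloud.bigquery.TableId;

        public class LoadAvroToBigQuery {
          public static void main(String[] args) throws Exception {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
            // Hypothetical destination table and staged source file.
            TableId table = TableId.of("my_dataset", "avro_export_test");
            LoadJobConfiguration config = LoadJobConfiguration
                .newBuilder(table, "gs://my-bucket/avroexport_from_server.avro")
                .setFormatOptions(FormatOptions.avro())
                .build();
            Job job = bigquery.create(JobInfo.of(config)).waitFor();
            if (job.getStatus().getError() != null) {
              // Surfaces the same "Unexpected type for default value" message.
              System.out.println(job.getStatus().getError().getMessage());
            } else {
              System.out.println("Load succeeded");
            }
          }
        }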

      However, when we run the same transformation without connecting to the repository, we are able to upload the Avro file to the Google BigQuery tables successfully.

      I am attaching both versions of the files for review.
      The Avro file generated from a local Spoon execution (avroexport_from_client.avro) shows the field definitions as below:

      details","fields":[{"name":"inst_id","type":["null","string"]},{"name":"obj_type","type":["null","string"]},{"name":"actor","type":["null","string"]}]
      

      The Avro file generated from Spoon when connected to the repository (avroexport_from_server.avro) shows the same fields with an added default attribute:

      details","fields":[{"name":"inst_id","type":["null","string"],"default":""},{"name":"obj_type","type":["null","string"],"default":""},{"name":"actor","type":["null","string"],"default":""}]
      

        Attachments

        1. avroexport_from_client.avro
          0.8 kB
        2. avroexport_from_client.avsc
          0.3 kB
        3. avroexport_from_server.avro
          0.8 kB
        4. avroexport_from_server.avsc
          0.4 kB
        5. AvroExportSample.ktr
          18 kB
        6. AvroFile_Error.PNG
          114 kB


            People

            Assignee:
            Unassigned
            Reporter:
            gdev Gurudev
            Votes:
            1
            Watchers:
            5

              Dates

              Created:
              Updated: