Pentaho Data Integration - Kettle / PDI-2607

Text File Input - 1 extra line read when using wrapped lines and 1 header line

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Urgent
    • Resolution: Fixed
    • Affects Version/s: 3.2.0 GA
    • Fix Version/s: 6.0.0 GA
    • Component/s: Step
    • Labels:
      None
    • PDI Sub-component:
    • Sprint Team:
      Maintenance
    • Operating System/s:
      Windows XP

      Description

      I am attempting to combine multiple lines of a file containing genomic data in the "fasta" format and load the result into a Postgres database. The data looks like the following:

      >chr1
      taaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
      accctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaac
      cctaacccaaccctaaccctaaccctaaccctaaccctaaccctaacccc
      taaccctaaccctaaccctaaccctaacctaaccctaaccctaaccctaa
      ccctaaccctaaccctaaccctaaccctaacccctaaccctaaccctaaa

      etc. for many thousands of lines. The first line is skipped as a header line. I have tried setting the wrap count to various arbitrary numbers, and in every case one extra line is read and added to the first record. If the field length specified for the data on the "Fields" tab is smaller than this result, the data is truncated. For example, if I set "Number of times wrapped" to 20 and the field length to 1050, the first record added to the database actually contains data from 21 lines (lines 2 to 22 of the input file). Subsequent inserts into the database contain data from only 20 lines, as expected (at the expected length of 1000 characters).
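
The expected behaviour described above can be sketched outside of PDI. The following is only an illustration of the grouping the report expects, not the Text File Input step's actual implementation; the function name `wrap_records`, its parameters, and the synthetic sample data are all hypothetical and assume 50-character data lines as in the excerpt.

```python
from itertools import islice


def wrap_records(lines, header_lines=1, lines_per_record=20):
    """Skip the header, then join every `lines_per_record` lines into one record.

    Hypothetical helper for illustration only; it mirrors the behaviour the
    reporter expects, not the code inside PDI.
    """
    body = iter(lines)
    # Consume (and discard) the header lines.
    for _ in islice(body, header_lines):
        pass
    while True:
        group = list(islice(body, lines_per_record))
        if not group:
            break
        # Strip line endings and concatenate the wrapped lines.
        yield "".join(line.rstrip("\r\n") for line in group)


# Synthetic input: one header line followed by 45 data lines of 50 characters.
sample = [">chr1\n"] + ["a" * 50 + "\n" for _ in range(45)]
records = list(wrap_records(sample))
lengths = [len(r) for r in records]
print(lengths)
```

With this grouping every full record, including the first, is 1000 characters long; the behaviour reported in this issue would instead correspond to a first record of 1050 characters (21 lines).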

            People

            Assignee:
            aliaksandr Aliaksandr Bialkevich (Inactive)
            Reporter:
            pdh Peter Hunsberger
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved:

                Time Tracking

                Estimated:
                0h
                Remaining:
                0h
                Logged:
                6.5h