Pentaho Data Integration - Kettle / PDI-8971

Pentaho MapReduce Input Path split on comma breaks set globs in org.apache.hadoop.fs.GlobFilter


    Details

    • Operating System/s:
      CentOS 5.x

      Description

      The input path string is split on commas here:
      https://github.com/pentaho/big-data-plugin/blob/master/src/org/pentaho/di/job/entries/hadooptransjobexecutor/JobEntryHadoopTransJobExecutor.java#L689

      An input path whose glob set contains a comma, e.g.:

      /input/{201210,201211}

      is split on that comma as well, producing two paths with incomplete glob patterns (/input/{201210 and 201211}), and org.apache.hadoop.fs.GlobFilter then throws an IOException:

      ERROR 27-11 23:19:54,573 - PriviledgedActionException as:tlynch (auth:SIMPLE) cause:java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {201210 at 7
      java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {201210 at 7
      at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1234)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1219)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:1137)
      at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1057)
      at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1015)
      at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:174)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
      at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
      at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
      at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
      at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:874)
      at org.pentaho.di.job.Job.execute(Job.java:528)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:393)
      at org.pentaho.di.job.Job.run(Job.java:313)
      ERROR 27-11 23:19:54,577 - Pentaho Map NO Reduce 2 - Illegal file pattern: Expecting set closure character or end of range, or } for glob {201210 at 7
      ERROR 27-11 23:19:54,578 - Pentaho Map NO Reduce 2 - java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {201210 at 7
      at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1234)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1219)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:1137)
      at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1057)
      at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1015)
      at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:174)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
      at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
      at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
      at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
      at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:874)
      at org.pentaho.di.job.Job.execute(Job.java:528)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:667)
      at org.pentaho.di.job.Job.execute(Job.java:393)
      at org.pentaho.di.job.Job.run(Job.java:313)
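      A brace-depth-aware split would avoid the problem. The sketch below is a hypothetical illustration, not the actual Pentaho fix: the class name and method are invented for demonstration. It shows how a naive String.split(",") cuts /input/{201210,201211} into the two incomplete patterns /input/{201210 and 201211}, while a splitter that only treats commas outside {...} as separators keeps the glob set intact.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      // Hypothetical sketch of a glob-aware path splitter (names are illustrative).
      public class GlobAwareSplit {

          // Split a comma-separated list of paths, ignoring commas nested in { }.
          public static List<String> splitPaths(String paths) {
              List<String> result = new ArrayList<>();
              StringBuilder current = new StringBuilder();
              int depth = 0; // nesting depth of { } glob sets
              for (char c : paths.toCharArray()) {
                  if (c == '{') {
                      depth++;
                  } else if (c == '}' && depth > 0) {
                      depth--;
                  }
                  if (c == ',' && depth == 0) {
                      // a real separator: finish the current path
                      result.add(current.toString());
                      current.setLength(0);
                  } else {
                      current.append(c);
                  }
              }
              result.add(current.toString());
              return result;
          }

          public static void main(String[] args) {
              String input = "/input/{201210,201211},/other";
              // Naive split breaks the glob set into incomplete patterns:
              System.out.println(Arrays.asList(input.split(",")));
              // Depth-aware split keeps the set whole:
              System.out.println(splitPaths(input));
          }
      }
      ```

      Running main shows the naive split yielding three fragments, two of them unparseable as globs, while the depth-aware split yields the two intended paths.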


            People

            Assignee: Unassigned
            Reporter: tlynchpin (Tim Lynch)
            Votes: 0
            Watchers: 1
