Pentaho Data Integration - Kettle
PDI-18430

Parallel job entry execution can result in NPE


    Details

    • Type: Bug
    • Status: Closed
    • Severity: High
    • Resolution: Fixed
    • Affects Version/s: 8.3.0.3 GA
    • Fix Version/s: 9.1.0 GA
    • Component/s: Core (Engine), Job
    • Labels:
    • Story Points:
      0
    • Notice:
      When an issue is open, the "Fix Version/s" field conveys a target, not necessarily a commitment. When an issue is closed, the "Fix Version/s" field conveys the version that the issue was fixed in.
    • Sprint Team:
      Tatooine (Maint)

      Description

      Attaching a job and transformation that can be used for repro. The transformation runs the job repeatedly, writing any failures to the file 'concurrency-results.txt'. In my environment, between 1 in 250 and 1 in 1000 executions result in the NPE, with a stack trace similar to the one below.

      2019/10/22 14:32:11 - concurrencybug - ERROR (version 9.0.0.0-SNAPSHOT, build 9.0.0.0-SNAPSHOT from 2019-10-02 03.44.37 by buildguy) : java.lang.NullPointerException
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.osgi.blueprint.collection.utils.ServiceMap.getItem(ServiceMap.java:35)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.osgi.metastore.locator.api.impl.MetastoreLocatorImpl.getExplicitMetastore(MetastoreLocatorImpl.java:39)
      2019/10/22 14:32:11 - concurrencybug -  at Proxy5d98fb64_2967_47c1_94cc_a0f856d65dbe.getExplicitMetastore(Unknown Source)
      2019/10/22 14:32:11 - concurrencybug -  at Proxya80a7770_0316_4cb3_a27f_206c4aeba54e.getExplicitMetastore(Unknown Source)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.big.data.impl.vfs.hdfs.nc.NamedClusterProvider.closeFileSystem(NamedClusterProvider.java:196)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.closeEmbeddedFileSystem(ConcurrentFileSystemManager.java:192)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.core.vfs.KettleVFS.closeEmbeddedFileSystem(KettleVFS.java:566)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.base.AbstractMeta.disposeEmbeddedMetastoreProvider(AbstractMeta.java:2054)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job.execute(Job.java:632)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job.access$000(Job.java:121)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job$1.run(Job.java:804)
      2019/10/22 14:32:11 - concurrencybug -  at java.lang.Thread.run(Thread.java:748)
      2019/10/22 14:32:11 - Hadoop copy files 6 2 2 - Processing row source File/folder source : [file:///tmp/file1.txt] ... destination file/folder : [file:///tmp/file2.txt]... wildcard : [null]
      2019/10/22 14:32:11 - concurrencybug - ERROR (version 9.0.0.0-SNAPSHOT, build 9.0.0.0-SNAPSHOT from 2019-10-02 03.44.37 by buildguy) : file:///Users/matcampbell/Downloads/concurrencybug.kjb : concurrencybug
      2019/10/22 14:32:11 - concurrencybug - ERROR (version 9.0.0.0-SNAPSHOT, build 9.0.0.0-SNAPSHOT from 2019-10-02 03.44.37 by buildguy) : org.pentaho.di.core.exception.KettleException: 
      2019/10/22 14:32:11 - concurrencybug - Unexpected error occurred while launching entry [Create a folder 2.0]
      2019/10/22 14:32:11 - concurrencybug -  at java.lang.Thread.run (Thread.java:748)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job$1.run (Job.java:804)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job.access$000 (Job.java:121)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.job.Job.execute (Job.java:632)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.base.AbstractMeta.disposeEmbeddedMetastoreProvider (AbstractMeta.java:2054)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.core.vfs.KettleVFS.closeEmbeddedFileSystem (KettleVFS.java:566)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.closeEmbeddedFileSystem (ConcurrentFileSystemManager.java:192)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.big.data.impl.vfs.hdfs.nc.NamedClusterProvider.closeFileSystem (NamedClusterProvider.java:196)
      2019/10/22 14:32:11 - concurrencybug -  at Proxya80a7770_0316_4cb3_a27f_206c4aeba54e.getExplicitMetastore (null:-1)
      2019/10/22 14:32:11 - concurrencybug -  at Proxy5d98fb64_2967_47c1_94cc_a0f856d65dbe.getExplicitMetastore (null:-1)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.osgi.metastore.locator.api.impl.MetastoreLocatorImpl.getExplicitMetastore (MetastoreLocatorImpl.java:39)
      2019/10/22 14:32:11 - concurrencybug -  at org.pentaho.osgi.blueprint.collection.utils.ServiceMap.getItem (ServiceMap.java:35)
      2019/10/22 14:32:11 - concurrencybug - 
      

      Jobs running entries with the "Run next entries in parallel" option can hit a threading issue. Parallel job entries run as separate threads, and each thread independently disposes of the metastore provider and then immediately sets a new embedded metastore provider key (here).

      This causes an NPE when one thread clears a metastore provider key while another thread is in the process of retrieving it, on this line: MetastoreLocatorImpl

      That line is outside of any lock and performs a non-atomic read: the entry can be removed by another thread between the lookup and the dereference, which is exactly what produces the NullPointerException above.
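      For illustration only (class and method names below are simplified assumptions, not the actual ServiceMap source), a minimal sketch of the unsynchronized get-then-dereference pattern that throws when another thread removes the key mid-lookup:

      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical, simplified sketch -- NOT the real
      // org.pentaho.osgi.blueprint.collection.utils.ServiceMap source.
      class ServiceMapSketch {

        static class ServiceHolder {
          private final Object service;
          ServiceHolder( Object service ) { this.service = service; }
          Object getService() { return service; }
        }

        private final Map<String, ServiceHolder> serviceMap = new HashMap<>();

        public synchronized void addService( String key, Object service ) {
          serviceMap.put( key, new ServiceHolder( service ) );
        }

        public synchronized void removeService( String key ) {
          // Thread A (one parallel job entry) disposes its provider key here...
          serviceMap.remove( key );
        }

        public Object getItem( String key ) {
          // ...while thread B (another parallel entry) performs this lookup.
          // The read is outside any lock, and get-then-dereference is not
          // atomic: if the key was just removed, get( key ) returns null and
          // the call below throws the NullPointerException from the trace.
          return serviceMap.get( key ).getService();
        }
      }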

      We should consider reworking the ServiceMap class to use a ConcurrentHashMap for a cleaner and more reliable design. It is also worth rethinking how the embedded metastore is managed in Jobs, given that parallel job entry execution causes a set of different provider keys to be assigned to the same Job.
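      A minimal sketch of that direction, assuming the registry only needs atomic get/put/remove semantics (names are illustrative, not the actual Pentaho API): back the map with a ConcurrentHashMap and make the lookup null-safe, so a concurrent removal surfaces as an empty result instead of an NPE.

      import java.util.Map;
      import java.util.Optional;
      import java.util.concurrent.ConcurrentHashMap;

      // Hypothetical sketch only; not the actual Pentaho API.
      class ConcurrentServiceMapSketch {

        private final Map<String, Object> serviceMap = new ConcurrentHashMap<>();

        public void addService( String key, Object service ) {
          serviceMap.put( key, service );
        }

        public void removeService( String key ) {
          serviceMap.remove( key );
        }

        public Optional<Object> getItem( String key ) {
          // ConcurrentHashMap makes individual get/put/remove calls
          // thread-safe without explicit locking, and returning Optional
          // forces callers to handle the "entry was removed by another
          // thread" case instead of dereferencing a possible null.
          return Optional.ofNullable( serviceMap.get( key ) );
        }
      }

      This alone does not address the second concern (parallel entry threads still assigning different provider keys to the same Job), but it would keep ServiceMap.getItem from throwing a raw NPE.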

              People

              Assignee:
               Juvenal Henrique Dias Ribeiro (Juvenal.Ribeiro)
              Reporter:
               Matt Campbell (mcampbell)
              Votes:
               0
              Watchers:
               13
