  1. Pentaho Data Mining - Weka
  2. DATAMINING-783

HierarchicalClustering FilteredDistance option crashing for data close to MaxDouble


    Details

    • Type: Bug
    • Status: Open
    • Severity: Unknown
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Weka packages
    • Labels:
      None
    • Environment:
      weka version: 3.9.4
    • Story Points:
      0
    • Steps to Reproduce:

      Load [this dataset|https://user.informatik.uni-goettingen.de/~sherbol/MaxDouble.arff].

      Train clusterer with this dataset and the following parameters via buildClusterer:

      -P -A weka.core.FilteredDistance -R first-last -F "weka.filters.unsupervised.attribute.RandomProjection -N 10 -R 42 -D Sparse1" -D "weka.core.EuclideanDistance -R first-last" -L SINGLE -N 2

      Cluster all instances of the dataset with the trained clusterer using clusterInstance and distributionForInstance.


      Description

      HierarchicalClusterer seems to have a problem clustering data close to Double.MAX_VALUE (values > 10^306). When the instances from the training data are clustered again after training, apparently no cluster is ever selected as the closest one, so the best-match index keeps its initial value of -1. The lookup of the cluster for that index then crashes with an ArrayIndexOutOfBoundsException:

      "java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 100
      at weka.clusterers.HierarchicalClusterer.clusterInstance(HierarchicalClusterer.java:846)
      at weka.clusterers.WEKA_HIERARCHICALCLUSTERER_AtomlTest.test_MaxDouble(WEKA_HIERARCHICALCLUSTERER_AtomlTest.java:4614)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
      at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.lang.Thread.run(Thread.java:834)
      "
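One possible explanation, not confirmed against the Weka source, is that the distances themselves overflow. The sketch below uses only the Java standard library and is not Weka code: `euclidean`, `nearestIndex`, and `nearestIndexGuarded` are hypothetical names chosen for illustration. It assumes a nearest-neighbour search that starts with a best distance of Double.MAX_VALUE and an index of -1, and shows that such a search never updates the index when every distance is Infinity (or NaN), which would match the "Index -1" in the trace above.

```java
public class NearestIndexSketch {

    // Plain Euclidean distance. The squared difference overflows to
    // Infinity once a coordinate difference exceeds roughly 1.3e154.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Hypothetical search pattern (an assumption, not the Weka implementation):
    // the index stays at -1 if no distance is strictly below Double.MAX_VALUE.
    static int nearestIndex(double[] query, double[][] points) {
        double best = Double.MAX_VALUE;
        int bestIndex = -1;
        for (int i = 0; i < points.length; i++) {
            double dist = euclidean(query, points[i]);
            if (dist < best) { // false for Infinity and for NaN
                best = dist;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    // One possible guard: always accept the first candidate, so a valid
    // index is returned even when all distances are Infinity or NaN.
    static int nearestIndexGuarded(double[] query, double[][] points) {
        double best = Double.POSITIVE_INFINITY;
        int bestIndex = -1;
        for (int i = 0; i < points.length; i++) {
            double dist = euclidean(query, points[i]);
            if (bestIndex < 0 || dist < best) {
                best = dist;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        // Made-up values in the range mentioned in the report (> 10^306).
        double[][] points = { { 1e307, -1e307 }, { -1e307, 1e307 } };
        double[] query = { 1e307, 1e307 };

        System.out.println(nearestIndex(query, points));        // prints -1
        System.out.println(nearestIndexGuarded(query, points)); // prints 0
    }
}
```

Using the unguarded result as an array index reproduces the same kind of exception as in the report. Whether the overflow in the reported case happens in the RandomProjection filter, in the distance normalization, or in the distance sum itself is not established here.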

       


            People

            Assignee:
            project admin Triage
            Reporter:
            thaar Tobias Haar