API

scConstruct

  \scInstance <- [
                    \scInstance           : handle,
                    {[#:key, $:value]}    : data,
                    $                     : linkage,
                    $                     : distance,
                    {[$:event, &:weight]} : eventWeights
                 ]: scConstruct;
	

This function initializes the clustering plugin if called with an empty handle parameter (scNoInstance). If called with a non-empty handle parameter, the plugin will be initialized from an on-disk cache.

The data to be clustered must be passed as a set of numbered strings, e.g.:

  {[1, "123321"], [2, "122311"]}
	

Note that the numbers (keys) must be consecutive and start at 1.
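The key requirement can be checked before the data is handed to scConstruct. The following is a minimal Python sketch, not part of the plugin; the helper name check_keys and the representation of the data set as a list of (key, value) tuples are assumptions made for illustration.

```python
def check_keys(data):
    """Return True if the keys of the (key, value) pairs are exactly 1..n."""
    keys = sorted(key for key, _ in data)
    return keys == list(range(1, len(keys) + 1))


# Hypothetical data sets mirroring the format shown above.
valid = [(1, "123321"), (2, "122311")]
gap = [(1, "123321"), (3, "122311")]       # key 2 is missing
zero_based = [(0, "123321"), (1, "122311")]  # keys must start at 1

print(check_keys(valid), check_keys(gap), check_keys(zero_based))
```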

The linkage parameter describes the type of linkage used by the algorithm. The allowed values are: "single", "maximum", "average".
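To illustrate what the linkage types usually mean in hierarchical clustering, here is a short Python sketch. It assumes the textbook definitions (and that "maximum" corresponds to what is often called complete linkage); the plugin's actual implementation is not documented here, and the function name cluster_distance is hypothetical.

```python
def cluster_distance(a, b, dist, linkage):
    """Distance between clusters a and b under the given linkage type.

    Assumes the standard definitions; not taken from the plugin itself.
    """
    pairs = [dist(x, y) for x in a for y in b]
    if linkage == "single":    # closest pair of elements
        return min(pairs)
    if linkage == "maximum":   # farthest pair of elements
        return max(pairs)
    if linkage == "average":   # mean over all pairs
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage: " + linkage)


def absolute_difference(x, y):
    return abs(x - y)


left, right = [0, 1], [4, 6]
for name in ("single", "maximum", "average"):
    print(name, cluster_distance(left, right, absolute_difference, name))
```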

The distance parameter describes the distance function to be used. The allowed values are: "levenshtein", "damereau", "journey", "event_histogram".
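As an illustration of the first of these measures, the sketch below computes the classic (unweighted) Levenshtein edit distance between two event strings in Python. It shows the standard dynamic-programming algorithm only; how the plugin applies event weights to this measure is not specified here, so weights are left out.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# The two sample strings from the data example differ in two positions.
print(levenshtein("123321", "122311"))
```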

The eventWeights parameter contains the event weights, which affect some of the distance measures. Each event type should be coded as a single-letter string, and weights should be real numbers. For example:

  {["1", 1.0], ["2", 0.5], ["3", 10.0]}
	

You can also pass an empty set ({}) as event weights. In this case, all events will have weight 1.0.

scGetClusters

  {[#:cluster, #:key]} <- [\scInstance, #:nCluster] : scGetClusters;
	

Cluster the data into nCluster clusters and return the cluster number for each data element (key). If nCluster has the special value scBestNClusters, the algorithm chooses the best number of clusters for this dataset, based on silhouette analysis.
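For readers unfamiliar with silhouette analysis, the following Python sketch computes the mean silhouette score of a given clustering using the textbook definition; a higher score indicates a better separated clustering. The function name mean_silhouette and the input shapes are assumptions, and the plugin's exact selection procedure may differ.

```python
def mean_silhouette(labels, dist):
    """labels: dict key -> cluster number; dist: function (key, key) -> number."""
    clusters = {}
    for key, cluster in labels.items():
        clusters.setdefault(cluster, []).append(key)
    scores = []
    for key, own in labels.items():
        same = [k for k in clusters[own] if k != key]
        if not same or len(clusters) < 2:
            scores.append(0.0)  # convention for singletons / a single cluster
            continue
        # a: mean distance to the other members of the element's own cluster
        a = sum(dist(key, k) for k in same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(key, k) for k in members) / len(members)
                for c, members in clusters.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical one-dimensional data: two obvious groups.
points = {1: 0.0, 2: 1.0, 3: 10.0, 4: 11.0}


def point_distance(i, j):
    return abs(points[i] - points[j])


good = mean_silhouette({1: 1, 2: 1, 3: 2, 4: 2}, point_distance)
bad = mean_silhouette({1: 1, 2: 2, 3: 2, 4: 2}, point_distance)
print(good > bad)
```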

scGetMedoids

  {[#:cluster, #:key]} <- [\scInstance, #:nCluster] : scGetMedoids;
	

Cluster the data into nCluster clusters and return the key of the medoid of each cluster. If nCluster has the special value scBestNClusters, the algorithm chooses the best number of clusters for this dataset, based on silhouette analysis.
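A medoid is commonly defined as the cluster member whose total distance to all other members is smallest. The Python sketch below illustrates that definition on a small hand-made distance table; the names medoid and table are hypothetical, and the plugin's tie-breaking behaviour is not documented here.

```python
def medoid(keys, dist):
    """Return the key with the smallest summed distance to all other keys."""
    return min(keys, key=lambda k: sum(dist(k, other) for other in keys))


# Hypothetical symmetric distances between three elements of one cluster.
table = {(1, 2): 1.0, (1, 3): 4.0, (2, 3): 2.0}


def lookup(i, j):
    if i == j:
        return 0.0
    return table[(min(i, j), max(i, j))]


# Summed distances: key 1 -> 5.0, key 2 -> 3.0, key 3 -> 6.0
print(medoid([1, 2, 3], lookup))
```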

scGetBestNClusters

  {#} <- \scInstance : scGetBestNClusters;
	

Return the four best cluster counts for this dataset. The first number in the result is always the best one.

scDrillDown

  \scInstance <- [\scInstance, #:nCluster, #:cluster] : scDrillDown;
	

Cluster the data into nCluster clusters and then drill down into the cluster whose number is given by the cluster parameter. From then on, all of the functions above operate only on the data from the selected cluster.

scUndrill

  \scInstance <- \scInstance : scUndrill;
	

Reset the drill-down. From then on, all of the functions above operate on the complete dataset again.

scGetDistanceTab

  {[#: key1, #: key2, &: dist]} <- \scInstance : scGetDistanceTab;
	

Return the distance table for the current dataset and distance measure.
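A table of [key1, key2, dist] rows is often easier to work with once converted to a lookup keyed by a pair of keys. The Python sketch below shows one way to do that. Whether the returned table contains both orderings of each pair is not stated here, so the sketch mirrors every row as a precaution; the name to_lookup and the sample rows are hypothetical.

```python
def to_lookup(rows):
    """Convert (key1, key2, dist) rows into a symmetric dictionary."""
    lookup = {}
    for key1, key2, dist in rows:
        lookup[(key1, key2)] = dist
        lookup[(key2, key1)] = dist  # mirror, in case only one order is present
    return lookup


# Hypothetical rows in the shape described by the signature above.
rows = [(1, 2, 2.0), (1, 3, 5.0), (2, 3, 4.0)]
distances = to_lookup(rows)
print(distances[(3, 1)])
```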

scDestruct

  \scInstance <- \scInstance : scDestruct;
	

Destroy the clustering instance. This function should be called at the end of any script that uses the library.