mclcm(1)                        USER COMMANDS                         mclcm(1)



  NAME
      mclcm - hierarchical clustering of graphs with mcl

  SYNOPSIS
      mclcm <-|fname> [mclcm-options] [-- "mcl options"*]

      mclcm <-|fname> -a "-I 4 --shadow=vl"
      mclcm <-|fname> -a "-I 3" -- "-I 5"
      mclcm <-|fname> -a "-I 3" -b1 "" -- "-ph 3 --shadow=vl -I 5"

      mclcm  <-fname>  [--contract  (contraction  mode)] [--dispatch (dispatch
      mode)] [--integrate (integrate mode)] [--subcluster  (subcluster  mode)]
      [-a  <opts>  (shared  mcl  options)]  [-b1  <opts> (dedicated base 1 mcl
      options)] [-b2 <opts> (dedicated base 2 mcl options)] [-c <fname> (input
      clustering)]  [-n <num> (iteration limit)] [--root (ensure universe root
      clustering)] [-cone <fname> (nested cluster stack file)] [-stack <fname>
      (expanded cluster stack file)] [-coarse <fname> (coarsened graphs file)]
      [-write  (stack,cone,coarse,steps)]  [-write-base  <fname>  (write  base
      matrix)]  [-stem  <str>  (prefix  for  all outputs)] [--mplex=y/n (write
      clusterings separately)] [-annot  str  (dummy  annotation  option)]  [-h
      (print  synopsis,  exit)]  [--apropos (print synopsis, exit)] [--version
      (print version, exit)] [-q spec (log levels)] [-z (show  default  shared
      options)] [-- "mcl options"*]

  DESCRIPTION
      The  mclcm  options  may  be followed by a number of trailing arguments.
      The trailing arguments should be separated from the mclcm options by the
      separator  --.   Normally each trailing argument should consist of a set
      of zero, one, or more mcl arguments enclosed in quotes or double  quotes
      to  group  them  together.  These arguments are passed to the successive
      stages of hierarchical clustering. They are combined  with  the  default
      options.  If an option is specified both in the default options list and
      in a trailing options list the latter specification overrides  the  for-
      mer.   When  the  --integrate option is specified the trailing arguments
      must be names of files containing mcl clusterings;  see  further  below.
      mclcm  has  four major modes of operation, namely contraction (default),
      integration, dispatch, and subcluster. Each mode is described  a  little
      below.  Note  though  that dispatch mode is not the best mode to use for
      hierarchical clustering. It is mostly useful to  generate  multiple  mcl
      clusterings in a single run.

      In  all  modes  mclcm  will generate a file, by default called mcl.cone.
      This is a representation of a hierarchical clustering that is particular
      to mcl. It can be converted to newick format like this:

               mcxdump -imx-tree mcl.cone --newick -o NEWICKFILE
         OR    mcxdump -imx-tree mcl.cone --newick -o NEWICKFILE -tab TABFILE

      In  the last example, TABFILE should be a file containing a mapping from
      mcl labels to application labels. Refer to mcxio(5) for more information
      about tab files and mcl input/ouput formats.

  OPTIONS
      --contract (repeated contraction mode)
         This  is  the  default mode of operation.  At each successive step of
         constructing the hierarchy on top of the first level mcl  clustering,
         mclcm  uses  a matrix derived from the input matrix and the last com-
         puted clustering to compute a contracted graph.  The contracted graph
         is  a  graph where the nodes represent the clusters of the last clus-
         tering. The matrix derived from the input graph that is used to  con-
         struct  the  contracted  graph  is  called  the base matrix. The base
         matrix can be either the start matrix or the expansion  matrix.   The
         start  matrix  is  the  input  matrix after transformations have been
         applied to it (if any).  The expansion matrix is the  first  expanded
         matrix of some mcl process applied to the input graph.

         By  default  the  base  matrix  is  constructed from either the start
         matrix or the expansion matrix obtained from the first  mcl  process.
         It is possible to use a start matrix derived from special purpose mcl
         transformation parameters (such as  -ph  and  -tf)  or  an  expansion
         matrix  derived  from a special purpose mcl process.  The -b1 and -b2
         parameters provide the interfaces to this functionality.

         You are advised to start with a high inflation value  for  the  input
         graph  and to use shadowing, e.g. include --shadow=vl in the -a argu-
         ment.  This generally leads to hierarchies that are better  balanced.
         Shadowing  is  a  transformation  where nodes are added to the graph,
         preventing relatively distant nodes from unwanted chaining.  For more
         information refer to the mcl manual.  The invocations in SYNOPSIS are
         a good starting point.

      --dispatch (different mcl processes)
         In this mode each trailing argument is specified as a set of  options
         to  pass to an mcl process. For each trailing argument an mcl process
         is thus computed. The set of resulting clusterings is integrated into
         a hierarchy.

      --integrate (existing clusterings)
         This  mode  is  similar to dispatch mode. The difference is that with
         this option mclcm simply integrates a set of already  existing  clus-
         terings.   Each trailing argument must be the name of a file contain-
         ing a clustering.  The set of clusterings  thus  specified  is  inte-
         grated into a hierarchy.

      --subcluster (repeated sub-clustering)
         In  this  mode  each  trailing argument specifies a set of options to
         pass to an mcl process. The second clustering process is  applied  to
         the graph of components induced by the first clustering, resulting in
         a further subdivision of  the  first  clustering.  This  approach  is
         repeated with each further trailing argument. With this approach, the
         first clustering will be the most coarse  clustering.  Hence,  subse-
         quent  trailing  arguments will typically specify increasingly higher
         inflation values, pre-inflation values, and optionally more stringent
         transformation parameters in order to achieve further subdivsions.

      -a <opts> (shared mcl options)
         Use  this to change and/or set the default mcl options for all itera-
         tions. Use quotes if necessary.  Example of usage: -a "-I 5".

      -b1 <opts> (dedicated base 1 mcl options)
         This will apply the mcl options opts to the input matrix. The result-
         ing  start  matrix  is  used  as the base matrix for constucting con-
         tracted graphs.

      -b2 <opts> (dedicated base 2 mcl options)
         This will apply the mcl options opts to the input matrix and  compute
         the  first  iterand  of  the  corresponding  mcl  process.  The first
         iterand, aka the expansion matrix, is used as  the  base  matrix  for
         constucting contracted graphs.

      -c <fname> (input clustering)
         The  hierarchical  clustering process will be kicked off by the clus-
         tering found in <fname>.

      -n <num> (iteration limit)
         This puts an upper bound to the number of contractions that  will  be
         performed.

      --root (ensure universe root clustering)
         In  case  the  graph  consists of different connected components, the
         last clustering computed by the mclcm process  will  correspond  with
         those  connected  components.  This  option simply adds an artificial
         clustering where all nodes have been joined into a single cluster.

      -cone <fname> (nested cluster stack file)
         File to write the nested cluster stack to.  The nested cluster  stack
         contains  a  sequence  of clusterings, each written as an MCL matrix.
         The first (bottom) clustering is a clustering of  the  nodes  in  the
         input  graph.  Each  subsequent  clustering is a clustering where the
         nodes are the clusters of the previous clustering.  mcxdump can  dump
         this  format  if the file name is given as the -imx-stack option. The
         explanation for the cone/stack discrepancy is simple. To mcxdump  the
         contents are simply a stack of matrices. It does not care whether the
         stack is cone shaped, cylindrical, or yet another shape.

      -stack <fname> (expanded cluster stack file)
         File to write the expanded cluster stack  to.  The  expanded  cluster
         stack is similar to the nested cluster stack except that each cluster
         lists all the nodes in the input graph it contains.  mcxdump can dump
         this format if the file name is given as the -imx-stack option.

      -coarse <fname> (coarsened graphs file)
         File  to  write  the sequence of coarsened graphs to. Each clustering
         induces a coarsened graph where the nodes represent the clusters  and
         an  edge  between  two  nodes represents the connectivity between the
         corresponding two clusters.  The  computation  of  this  connectivity
         takes into account all edges between the two clusters in in the orig-
         inal graph.

      -write <tag> (select output modes)
         Use this option to explictly specify all of the output types you want
         written  in  a  comma-separated  string. <tag> may contain any of the
         strings stack, cone, coarse, steps.  The current default is to  write
         all of these except coarse.  The latter dumps the intermediate coars-
         ened (aka contracted) graphs to a single file.

      -write-base <fname> (write base matrix)
         Write the base matrix to file.  This  can  be  useful  for  debugging
         expectations.

      -stem <str> (prefix for all outputs)
         All output files share the same prefix. The default is mcl and can be
         changed with this option.

      --mplex=y/n (write clusterings separately)
         If turned on each clustering is written in a separate file. The first
         clustering is written to the file <stem>.3 where <stem> is determined
         by the -stem option. For each  subsequent  clustering  the  index  is
         incremented by two, so clusterings are written to files for which the
         name ends with an odd index.

      -annot str (dummy annotation option)
         mclcm writes the command line with which it was invoked to the output
         file  (either of the cone or stack files). Use this option to include
         any additional information.  mclcm  does  nothing  with  this  option
         except copying it as just described.

      -q spec (log levels)
         Set the quiet level. Read tingea.log(7) for syntax and semantics.

      -z (show default shared options)
         Show  the default mcl options. These are used for each mcl invocation
         as successively applied to the input graph and succeeding  contracted
         graphs.

  AUTHOR
      Stijn van Dongen.

  SEE ALSO
      mclfamily(7)  for an overview of all the documentation and the utilities
      in the mcl family.



  mclcm 1.008, 08-312               7 Nov 2008                          mclcm(1)
