VHIST 1.84.0

Vhistify Tutorial

Introduction

In the field of medical image processing and evaluation, most workflows consist of several individual tools and scripts that are combined into one complex pipeline. Changing even a single parameter (e.g. the filter method, the Matlab version or the co-registration template file) can have a large influence on the result of the workflow, so documenting all workflow steps is essential (Good Scientific Practice). Manual documentation is error-prone and cumbersome. In some cases it is outright impossible, e.g. because of opaque relationships within and between complex software packages.

Vhistify is our attempt to document workflows automatically. It is based on VHIST, a file format specifically designed to document workflows. VHIST files are self-contained and PDF-compatible: all information stored in a VHIST file is accessible from any PDF viewer. In addition, VHIST files contain structured information on each workflow step (embedded XML) that is suitable for automated processing.

Vhistify executes and monitors another program and gathers detailed information about the monitored process. This information includes:

  • The command line of the executed command
  • The hostname of the computer on which the program was executed
  • The name of the user who executed the command
  • The time the program took to run
  • The initial working directory of the command
  • The return value of the program as well as the reason for termination (killed, segmentation fault, etc.)
  • The standard output and standard error of the program
  • A list of files read and written by the program, including paths, file sizes and MD5 fingerprints
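To make the list above more concrete, the following sketch collects a few of the same items by hand for a single command, using standard Linux tools. It is only an illustration. The function names describe_run and file_fingerprint are hypothetical. Vhistify itself does not work this way: it gathers this information by tracing the program with strace, and it also records standard output and error, which this sketch does not capture.

```shell
#!/usr/bin/env bash
# Illustration only: records by hand some of the items vhistify collects.
# describe_run and file_fingerprint are hypothetical helpers, not part of vhistify.

# Run a command and print its command line, host, user, working
# directory, run time (whole seconds) and exit status.
describe_run() {
    local start end status
    echo "command:     $*"
    echo "host:        $(uname -n)"
    echo "user:        $(id -un)"
    echo "working dir: $(pwd)"
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "runtime (s): $((end - start))"
    echo "exit status: $status"
    return "$status"
}

# Print the path, size in bytes and MD5 fingerprint of a file.
file_fingerprint() {
    local size sum
    size=$(wc -c < "$1" | tr -d ' ')
    sum=$(md5sum < "$1" | cut -d ' ' -f 1)
    echo "$1 $size $sum"
}

# Example:
#   describe_run cp input.txt output.txt
#   file_fingerprint input.txt
#   file_fingerprint output.txt
```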

With the help of plugins, vhistify can also:

  • Infer version numbers of tools used
  • Collect meta-information for files of known file formats
  • Generate zip archives containing the source code of a script
  • Gather more detailed information about the machine used (version of Linux, versions of core libraries and programs, etc.)
  • Create preview images for plots or graphics
  • etc.

Prerequisites

Currently, vhistify only supports Linux systems. It requires strace and Python 2.6 or 2.7 to be installed on your system. In some situations you will need a recent version of strace. For example, Matlab with the Parallel Computing Toolbox does not work correctly with strace 4.5, while strace 4.8 works well.
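You can check whether a machine meets these prerequisites from the shell. The sketch below makes two assumptions about the version strings. First, that "python -V" prints something like "Python 2.7.18" (Python 2 writes it to standard error). Second, that "strace -V" prints a line like "strace -- version 4.8". The 4.8 threshold is only the version this tutorial reports as working with the Parallel Computing Toolbox, not a general minimum. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check the prerequisites of vhistify from their version strings.
# python_supported and strace_recent are hypothetical helper names.

# Succeeds if the "python -V" output in $1 reports Python 2.6 or 2.7.
python_supported() {
    case "$1" in
        "Python 2.6"*|"Python 2.7"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the "strace -V" output in $1 reports version 4.8 or newer.
strace_recent() {
    local ver major minor
    ver=$(printf '%s\n' "$1" |
          sed -n 's/.*version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' |
          head -n 1)
    [ -n "$ver" ] || return 1          # no version number found
    read -r major minor <<< "$ver"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 8 ]; }
}

# Usage on a real system (2>&1 also captures versions printed to stderr):
#   python_supported "$(python -V 2>&1)" || echo "vhistify needs Python 2.6 or 2.7"
#   command -v strace >/dev/null || echo "strace is not installed"
#   strace_recent "$(strace -V 2>&1)" || echo "consider a newer strace (4.8 works well)"
```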

Typographic Notations

All examples in this tutorial are shell commands and are displayed in the following form:

# change into directory "some/directory"
$ cd some/directory
$ echo "hello world"
   hello world

The $-sign at the beginning of a line marks the shell prompt and is not part of the command. Do not copy it when you type the command into the terminal! Lines starting with the #-sign are comments and can be left out when executing the commands. Where appropriate, we show the output of a command underneath the command itself, just like the line "hello world" in the example above. Output lines are indented.

We assume that you use the bash shell. To find out which shell you use, open a terminal and enter

$ echo $SHELL
   /bin/bash

If you use another shell, you might have to adjust some of the commands.

Installation

On Debian, Ubuntu and derived Linux distributions, you can download the .deb installer and install it on your system. vhistify is installed into the /opt/ directory, and the installer creates a symbolic link to the vhistify executable in /opt/bin/. On other Linux distributions, download and unpack the tar archive. Make sure that the directory containing the vhistify executable is in your PATH:

# add the vhistify directory to your search path
$ export PATH="$PATH:/path/to/vhistify/"
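A PATH change made with export only lasts for the current terminal session. A common way to make it permanent is to add it to your ~/.bashrc. The sketch below is a slightly more careful variant that only appends a directory if it is not already listed, so it can safely run every time a shell starts. The function name path_append is hypothetical. /opt/bin is where the .deb installer places its symbolic link, and /path/to/vhistify stands for wherever you unpacked the tar archive.

```shell
#!/usr/bin/env bash
# Sketch: append a directory to PATH unless it is already listed.
# path_append is a hypothetical helper; put it (and the call) into ~/.bashrc
# to make the setting permanent.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present: nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;    # append, avoiding a leading ":"
    esac
    export PATH
}

# .deb installation: the installer links vhistify into /opt/bin
path_append /opt/bin
# tar archive: use the directory you unpacked it into, e.g.
#   path_append /path/to/vhistify

# Check that the shell now finds vhistify:
#   command -v vhistify
```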

If you want vhistify to use a custom version of strace, create a symbolic link to that strace executable in the vhistify/ directory (next to the vhistify executable). Vhistify will prefer this version of strace over the strace binary found in your PATH.
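In the simplest case this is a single "ln -s" command. The sketch wraps it in a hypothetical helper that first checks that the target really is an executable file. Both paths in the usage comment are placeholders for your own installation.

```shell
#!/usr/bin/env bash
# Sketch: make vhistify use a specific strace binary by placing a symbolic
# link named "strace" next to the vhistify executable.
# link_custom_strace is a hypothetical helper, not part of vhistify.
link_custom_strace() {
    local strace_bin=$1 vhistify_dir=$2
    if [ ! -f "$strace_bin" ] || [ ! -x "$strace_bin" ]; then
        echo "not an executable file: $strace_bin" >&2
        return 1
    fi
    # -f replaces an existing link; -n does not follow an existing link
    ln -sfn "$strace_bin" "$vhistify_dir/strace"
}

# Usage (placeholder paths):
#   link_custom_strace /usr/local/bin/strace /path/to/vhistify
#   ls -l /path/to/vhistify/strace
```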

First Steps

This section shows how to "vhistify" a simple command-line call. On Linux, to copy a file input.txt to the file output.txt, we can write

# create a file input.txt, which contains the text "hello world"
$ echo "hello world" > input.txt 

# copy the file to the destination
$ cp input.txt output.txt

on the command-line. We can now document this operation with vhistify. First, remove the file again with the following command:

$ rm output.txt

Afterwards, repeat the cp command but insert the word vhistify at the beginning of the command-line:

# copy the file and monitor the copying with vhistify
$ vhistify cp input.txt output.txt

Vhistify will create a file output.txt.vhist in the same directory as output.txt. You can open this file with any PDF viewer. (Preview on Mac OS X refuses to open files with an extension other than .pdf, so you have to rename the file to output.txt.pdf first.) The VHIST file contains detailed information about the copying process. The "title" entry, however, is empty, because vhistify cannot derive sensible titles on its own (it does not know what the command cp does). You can add a title with the --title option:

# remove previous outputs
$ rm output.txt
$ rm output.txt.vhist

# rerun the copying process
$ vhistify --title="Copy a File" cp input.txt output.txt

We have to remove the old VHIST file because vhistify will not overwrite existing VHIST files. Most often, this behaviour is what you want. You can, however, change it by setting the environment variable VHISTIFY_RENAME_OLD_VHISTFILES to true; vhistify then renames old VHIST files instead. This option is very convenient when testing vhistify calls, as you do not have to remove or rename old VHIST files by hand.

$ export VHISTIFY_RENAME_OLD_VHISTFILES=true
$ vhistify --title="Copy a File" cp input.txt output.txt
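Exporting the variable enables renaming for the whole shell session. In bash, you can also set it for a single command by writing the assignment directly in front of that command:

VHISTIFY_RENAME_OLD_VHISTFILES=true vhistify --title="Copy a File" cp input.txt output.txt

The sketch below shows the difference in scope. It uses "sh -c" as a stand-in for vhistify that simply prints the value it finds in its environment.

```shell
#!/usr/bin/env bash
# Sketch: scope of VHISTIFY_RENAME_OLD_VHISTFILES. "sh -c" stands in for
# vhistify and prints the value it sees in its environment.
print_setting='echo "${VHISTIFY_RENAME_OLD_VHISTFILES:-unset}"'

unset VHISTIFY_RENAME_OLD_VHISTFILES

# Assignment in front of a command: only that command sees the variable.
VHISTIFY_RENAME_OLD_VHISTFILES=true sh -c "$print_setting"   # prints: true
sh -c "$print_setting"                                       # prints: unset

# export: every later command started from this shell sees the variable.
export VHISTIFY_RENAME_OLD_VHISTFILES=true
sh -c "$print_setting"                                       # prints: true
```

The per-command form is handy in scripts where only some vhistify calls should rename old VHIST files.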

If we run several vhistified commands in a row and an output file of one step is the input file of the next call, vhistify will use the VHIST file of the input as the base for the newly created VHIST file:

$ export VHISTIFY_RENAME_OLD_VHISTFILES=true
$ vhistify --title="Copy a File" cp input.txt intermediate.txt
$ vhistify --title="Copy a File" cp intermediate.txt output.txt

The VHIST file output.txt.vhist contains two sections, one for each of the two cp commands. If more than one input file has an associated VHIST file, vhistify appends to one of them and embeds the others inside the newly created VHIST file. If a command creates several output files, vhistify creates one VHIST file for each of them. This way, the complete workflow is always documented within one file, and you can inspect the creation history of every file by looking at its accompanying VHIST file.
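Every output file gets its own VHIST file, named after the output with the suffix .vhist and stored in the same directory. After a longer workflow you can therefore easily check whether any output lacks its documentation, for example because one step was run without vhistify. The helper name missing_vhist in the sketch below is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report every given file that has no VHIST file next to it.
# missing_vhist is a hypothetical helper; it relies only on the naming
# convention <file>.vhist in the same directory as <file>.
missing_vhist() {
    local f status=0
    for f in "$@"; do
        if [ ! -e "$f.vhist" ]; then
            echo "no VHIST file for: $f"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   missing_vhist intermediate.txt output.txt && echo "all outputs documented"
```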

There is much more to vhistify and VHIST than we can explain in this "first steps" section. You can find more information in the following sections and on http://www.nf.mpg.de/vhist. To get started with vhistify, the information in this section should get you pretty far.

Examples

Vhistify contains a directory examples/, which includes a number of vhistify demos. Most demos will run on any Linux system. Some demos require special software, such as FSL, Matlab or SPM. Each example aborts with an error if it cannot find one of the required tools.

To run the demos, perform the following steps:

Command-Line Options

You can view a short summary of all command line options by typing:

$ vhistify --help

You can also have a look at the vhistify Manual Page for a more detailed explanation of all options and environment variables.

Plugins

Vhistify supplies a large number of plugins, which you can use to enhance and augment vhistify's output. There are plugins for different scripting languages, plugins for tools or software suites, plugins that control how files are listed in or embedded into VHIST files, and so on. Use the --list-plugins option to get a list of all available plugins. The --plugin-help option displays a detailed description of a given plugin.

$ vhistify --list-plugins
   This is a list of all installed plugins:

   * spm
       Enhances the output of SPM jobs.

   * largefile
       Disables MD5 checksums for very large files and reads MD5 sums from
       .md5 files on the file system.

   * matlab
       Enhances the output of Matlab jobs.
   ....

$ vhistify --plugin-help largefile
   Documentation of plugin largefile:
       The "largefile" plugin disables the generation of MD5 checksums for
       files larger than 100 MB. If a file with the same name as the
   ....
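The listing format can also be processed by a script, for instance to print the detailed help of every installed plugin in one go. The sketch below assumes that each plugin name appears after a "* " at the start of its own line, as in the example output of --list-plugins. Anything following the name on the same line is ignored. The function name plugin_names is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract plugin names from the output of "vhistify --list-plugins".
# Assumes each name follows "* " at the start of a line (leading spaces allowed);
# plugin_names is a hypothetical helper.
plugin_names() {
    sed -n 's/^[[:space:]]*\*[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*/\1/p'
}

# Show the detailed help of every installed plugin:
#   for p in $(vhistify --list-plugins | plugin_names); do
#       vhistify --plugin-help "$p"
#   done
```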

To enable plugins, pass a comma-separated list of plugin names to the --plugins option:

$ vhistify --plugins=matlab,largefile ....

A number of plugins are active by default. These plugins are marked as [default] in the listing generated by --list-plugins.

Internal Structure of vhistify

For many purposes, prepending vhistify to your command line and activating the plugins that suit your needs is enough to generate adequate VHIST files. In some cases, however, it helps to understand the internal structure of vhistify in order to debug problems and further improve the generated VHIST files.

Info: Per-plugin operations are executed in the order in which the plugins were specified on the command line or in the environment variables VHISTIFY_PLUGINS and VHISTIFY_DEFAULT_PLUGINS. Plugins in VHISTIFY_DEFAULT_PLUGINS are handled first.
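A small shell function can illustrate this ordering rule. It is not vhistify's implementation, and the function name is hypothetical. The sketch makes two assumptions: that the environment variables hold comma-separated lists like the --plugins option, and that a plugin listed twice runs only once, at its first position.

```shell
#!/usr/bin/env bash
# Illustration of the plugin ordering rule (not vhistify's code).
# $1: default plugins (e.g. $VHISTIFY_DEFAULT_PLUGINS), handled first
# $2: plugins from --plugins or VHISTIFY_PLUGINS, in the order given
# Assumptions: comma-separated lists; duplicates keep their first position.
plugin_order() {
    local default_list=$1 user_list=$2 p seen=","
    local IFS=','                         # split the unquoted lists on commas
    for p in $default_list $user_list; do
        [ -n "$p" ] || continue           # skip empty entries such as "a,,b"
        case "$seen" in
            *",$p,"*) ;;                  # already listed earlier
            *) echo "$p"; seen="$seen$p," ;;
        esac
    done
}

# Usage:
#   plugin_order "$VHISTIFY_DEFAULT_PLUGINS" "matlab,largefile"
```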

Caveats and Things to Keep in Mind

(C) 2005-2013 Max Planck Institute for Neurological Research Cologne, Germany