About
The VHIST project defines a file format designed to document data-processing workflows. VHIST allows you to store all information about individual steps of your workflow within one file – the complete history of your processed data (including all branches in the workflow) is stored at one single location. VHIST has several distinct features:
- PDF Compatible
The VHIST file format is compatible with the PDF file format. Therefore, you can view your VHIST files on any platform that ships a PDF viewer: your Windows PC, your Mac laptop, your Unix workstation or even your smartphone or tablet. Most PDF viewers support extracting embedded data stored in a VHIST file.
Sending a VHIST file to somebody who is not familiar with VHIST is trouble-free. Simply change the file extension to .pdf and the recipient will immediately know what to do with the file. There is no need to download and install additional software if you just want to view a VHIST file.
- Machine Readable
In addition to the human-readable PDF representation, VHIST files contain all user-supplied information as structured XML data. It is easy to automatically extract this information from a VHIST file for automated processing or validation.
- Can Store Arbitrary Binary Data with Meta-Data
VHIST allows you to store any kind of data or meta-data, either in the form of key-value pairs or as files embedded inside the VHIST file. You can embed any type of file in a VHIST file: raw data, images, log files, configuration files or even other VHIST files. The choice of format is entirely up to you and your needs.
You can store files in compressed form to reduce the required space. If you do not want to embed a file in a VHIST document for whatever reason, you can still let VHIST record meta-information about the file (such as location, file size, MD5 fingerprint and date of last modification).
- Incremental History
You can create VHIST files incrementally. If your workflow is composed of three individual steps, you can create a new VHIST document for the first workflow step and extend it twice for the following two steps.
Each workflow step is embedded as an individual section within one VHIST document. New sections are strictly appended to the end of the document. Old data is never modified, so there is no risk of accidentally altering or deleting preceding steps.
- Easy to Integrate into Existing Workflows
VHIST files are created using the command line program vhistadd. You can specify arguments, options and files either on the command line or as platform-independent "argument files". If your workflow allows you to add command line calls in between individual workflow steps, you can integrate VHIST into your workflow.
We also provide libraries to easily create VHIST files from within C++, Python or Matlab programs without the need to use the command line directly. More interfaces for other programming languages will follow in the future.
Last but not least, we provide a command line tool, vhistify, that automates the creation of VHIST files for a wide range of applications and scripts. Documenting the Python call
$ python myscript.py
can be as simple as
$ vhistify --plugins=python python myscript.py
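If your workflow is driven by a Python script rather than a shell, the same vhistify call can be assembled and launched programmatically. The following is only a minimal sketch: it assumes that vhistify is installed and on your PATH, and it uses nothing beyond the --plugins option shown above. The helper names vhistify_command and run_documented are hypothetical and are not part of the VHIST libraries.

```python
import shlex
import subprocess


def vhistify_command(script, plugins="python", interpreter="python"):
    """Build the argument list for documenting a script call with vhistify.

    Only the --plugins option shown in the text is used here; no other
    vhistify options are assumed.
    """
    return ["vhistify", f"--plugins={plugins}", interpreter, script]


def run_documented(script):
    """Run the script through vhistify (assumes vhistify is on PATH)."""
    return subprocess.run(vhistify_command(script), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch also works
    # on machines where VHIST is not installed.
    print(shlex.join(vhistify_command("myscript.py")))
```

Passing the command as an argument list avoids shell quoting problems when script names contain spaces. Call run_documented("myscript.py") in place of the print statement to actually execute the documented call.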
- Self-Describing and Simple to Parse
VHIST files contain special markers, which can be used to extract embedded files and meta-information without the need to know anything about the PDF file format. A small Python program to extract all data from a VHIST file is included at the beginning of each VHIST file and can be extracted using any plain text editor.
- Validation at Several Levels
VHIST files contain checksums for each embedded file and each section. You can verify the validity of a VHIST file and the embedded files with these checksums. Each file and each section is independent of all other files and sections: one corrupt section does not invalidate the VHIST file as a whole.
- Cross-Platform
You can view VHIST files on all platforms that provide a PDF viewer. Moreover, our tools for creating, finding, viewing and validating VHIST files are available on all major desktop platforms (Windows, Mac OS X and Linux).
- Open Source
The VHIST documentation as well as the reference implementation are subject to the GNU Lesser General Public License (Version 2.1). You can use and distribute VHIST free of charge. Since the source code of the reference implementation is freely available, you can adjust the tools to your liking, integrate VHIST into your own programs or even contribute bug fixes or new features back to the VHIST project.
- Installers for Different Platforms