THE RAZORBACK UTILITY LIBRARY


by András Aszódi
Novartis Forschungsinstitut GmbH
Brunnerstrasse 59
A-1235 Vienna, Austria, Europe

Version 2.0


Introduction

The Razorback library is hardly more than a haphazard collection of useful software modules that I have accumulated over the years. It has grown organically, and its composition reflects my scientific interests and the projects I have been working on at the Novartis Forschungsinstitut in Vienna since 1996. I do not pretend that the routines you will find here are optimal in any sense, but they do their job reasonably well. Feel free to use and modify them.
 

The Razorback collection consists of C and C++ sublibraries. Although you can use the C library in your C++ programs, most of the C routines have already been ported to the C++ library. These ports often contain significant enhancements. My recommendation is not to program in C unless absolutely necessary.

The sublibraries are documented separately. Follow the links below to get to the detailed documentation (generated by Doxygen).

C Libraries Overview

  1. Linear algebra: provides vector and matrix structures that attempt to make C a bit object-oriented. Only rectangular and lower triangular (symmetric) matrices are provided. All linear algebra routines operate on double-precision numbers. Routines for solving linear equations are provided.
  2. Statistics: estimation of distributions via histograms, simple one-way and two-way statistics, statistical tests, linear and nonlinear parameter estimation.
  3. Bioinformatics: this is just a buzzword here because no sequence-processing routines are offered. It is rather structural biology (another pompous buzzword): a PDB read/write module, a DSSP reader and the GOR-III secondary structure prediction algorithm are provided.
  4. Miscellaneous: everything else. File and directory manipulation, time stamps, command-line processing.

C++ Libraries Overview

  1. Linear algebra: vector and matrix classes. These are implemented as templates so that you can work e.g. with complex matrices if you wish. Most people would just use the double-precision instantiations. This is a very small collection of linear algebra routines; you would expect much more from a real package. In any case, you get linear equation solvers, SVD, and real symmetric matrix diagonalisers (both QL all-eigenvalues/all-eigenvectors and a specialised class that can be used to get only subsets of eigenvalues and/or eigenvectors).
  2. Statistics: offers a fairly complete class hierarchy for representing distributions (both analytically known and empirically estimated). The parameter estimation classes enhance the capabilities of the C version considerably; the SVD-based linear regression and the orthogonal polynomial fit are particularly worth mentioning. A simple one-way ANOVA class is also provided.
  3. Bioinformatics: manipulation of macromolecular sequences. Can read and write a variety of formats, but there are no guarantees :-). A class representing multiple sequence alignments is also provided.
  4. POSIX thread wrappers: makes life with multithreaded programs easier. Provides thread launchers and a job queue that links "producer" and "consumer" threads. All POSIX synchronisation primitives with the exception of read/write locks are implemented. This is the least portable part of the Razorback library: first, the architecture must support POSIX threads, and second, the thread launcher checks the number of available CPUs on your system, which is machine-dependent.
  5. Exceptions: these get thrown by the other sub-libraries if something goes wrong. The Razorback library has its own exception hierarchy which is independent of the hierarchy defined by the C++ standard (see <stdexcept>). To catch Razorback exceptions, put a Utilsexc_& object reference in the catch argument.
  6. Miscellaneous: useful bits (such as a bit vector class, no pun intended :-) ) and pieces.


Licensing

This software is distributed as Novartis Open Source, which essentially amounts to granting a license similar to the GPL. IMPORTANT: refer to the LICENSE file for the precise legal terms and conditions.

Implementation

Source

The library contains C and C++ modules. The C part is written in ANSI C; the C++ part is in something that is intended to be as close to ANSI C++ as possible. Given the generic nature of the libraries, no machine-dependent features are used, with the exception of the POSIX thread wrappers, whose thread launcher checks the number of available CPUs.
 

Supported platforms


SGI

The Razorback Library was mostly developed on SGI machines. There is a bewildering variety of ABI and MIPS instruction set combinations: the Makefiles are configured for the -n32 -mips3 ("irix32") and the -64 -mips4 ("irix64") ABIs. You will want at least an R4000 running IRIX 6.2 (IRIX 6.5 recommended). Compiling for lowlier architectures or older IRIX versions is theoretically possible but not recommended. The installation script requires IRIX 6.2 or later.

PC Linux

I have been developing actively for Linux since 1998. That was the year when the G++ 2.8.1 compiler became available, and it made the reliable and simple instantiation of C++ templates possible at last. That release also fixed a number of annoying bugs that had made C++ development under Linux anything but enjoyable before. The Razorback libraries can be compiled under any 2.x kernel, but the GNU compiler must be at least version 2.95.

Alpha Tru64 UNIX

The Razorback library was ported to Tru64 UNIX V5.1, with the Compaq C++ compiler V6.3. This is the only C++ compiler I have access to that more or less manages the ANSI C++ standard. It is also quite picky and has often helped me find bugs that went unnoticed under IRIX or Linux.
 

Other architectures

Porting the Razorback Library to another platform running a UNIX variant should be straightforward, especially if the GNU C/C++ compiler supports the platform. Ideally, only the platform-dependent Makefile includes have to be changed to cope with the idiosyncrasies of compiler and linker command-line arguments. The platform must support shared libraries and POSIX threads.

I do not know if the library could be ported to non-UNIX architectures (e.g. Windows or MacOS) because I have never had to do professional work on these platforms.


Installation

  1. Prerequisites: you will need a C and a C++ compiler.
  2. Obtain the Razorback source archive razorback_x.y.tar.gz where x and y are the major and minor version numbers. Create a razorback top directory (for example, "razorback") and copy the tarfile there. Extract the contents: I am not telling you how to do this :-) Your directory will now contain a directory called "razorback_x.y". Change to this directory "razorback/razorback_x.y" now. It should contain the following subdirectories:- admin, doc, include, cc, c and the files INSTALL, LICENSE, Makefile.
  3. Invoke the script admin/configure.sh. This is a Bourne shell script that figures out your architecture and asks you a few questions. In particular, you have to specify an already existing directory <ARCHDIR>. In this directory a new subdirectory <ARCHDIR>/<ARCH>/lib will be created (where <ARCH> is your architecture such as "irix64") that contains the static libraries. The dynamic libraries are put into <ARCHDIR>/<ARCH>/lib/shared. The configure script generates a platform-specific file Makefile.defs that contains the macros needed for compilation and/or installation. Have a look at these to get an impression of how difficult it is to write portable software...
  4. Next, compile the library by typing 'make compile'. On multiprocessor architectures that have a parallel make utility, the compilation will be done in parallel.
  5. Install the static libraries and dynamic shared objects (DSOs) to their final location defined in the configuration step by invoking 'make install'.  This step actually performs the compilation, too, if it was not done before.
  6. When compiling against the Razorback libraries, you should specify the header directory "razorback/razorback_x.y/include": this is not moved during installation. Additionally, specify the appropriate directory for the linker using the -L option.
  7. If you wish to link your programs against the Razorback DSOs, you also have to set your LD_LIBRARY_PATH environment variable accordingly. The installation script generates a C shell script <ARCH>/ldpath.csh that you can use for setting LD_LIBRARY_PATH: just source it from your .login file. Some combinations of the TC-shell and the KDE desktop environment under Linux do not invoke .login, which is positively silly: source the script from your .cshrc file instead as a workaround.
  8. The Razorback user guide is located under "razorback/razorback_x.y/doc/index.html". Bookmark it now!
You are done! Enjoy programming with the Razorback.


Programming

Header Files

Symbolic links to all header files can be found in the "razorback_x.y/include" directory. The C header files have all-lowercase names and a "*.h" extension (e.g. "svd.h"). The C++ header files always begin with an uppercase letter and have a "*.hh" extension (e.g. "Svd.hh"). Both the C and C++ libraries are wrapped into the namespace 'RazorBack', so if you link them against C++ code, do not forget to use either the 'using namespace RazorBack' directive or prefix each name from the library with 'RazorBack::'. When compiling a program against the Razorback library, tell the compiler about the header file location using the "-Irazorback_x.y/include" switch.

Linking

Link statically against the C library by specifying the options "-L<ARCHDIR>/<ARCH>/lib -lrazorback" to the compiler. Similarly, link statically against the C++ library by using the switches "-L<ARCHDIR>/<ARCH>/lib -lRazorBack" (note the spelling!). If you want to link against the shared libraries, specify "-L<ARCHDIR>/<ARCH>/lib/shared -lrazorback" and "-L<ARCHDIR>/<ARCH>/lib/shared -lRazorBack" for the C and C++ libraries, respectively. On some architectures you might need other compiler options for dynamic linking; please consult the relevant manpages.

ANSI C++ issues

C++ template instantiation

This is a thorny issue as no compiler writer seems to get it right. General recommendation:- try to compile your programs as if you were using the "Borland model". The Razorback template declarations are in *.hh files, and the template definitions are in *.cc files. Both are available in the include directory. The template headers should include the template definition files: ALWAYS define the macro INCLUDE_TMPL_DEFS on the compiler command line. Additionally, switch off automatic template instantiation: on the SGI platform, use the switch -no_auto_include, on the Alpha use -noimplicit_include. Wonderfully enough, G++ under Linux needs no additional flags.

SGI warning:- the linker may have problems with templates instantiated more than once and may issue warnings that certain functions with horrible mangled names were defined twice. Ignore these.

Alpha C++ warning:- this compiler uses repositories for template instantiation. The Razorback C++ library has its repository under <ARCHDIR>/alpha/lib/cxx_repository. When linking, you have to tell the compiler where to look for these files: you have to specify the switches '-ptr ./cxx_repository -ptr <ARCHDIR>/alpha/lib/cxx_repository'. The first repository will contain the instantiation records for your program and must be writable, the second is the Razorback repository. If you link against other C++ libraries with templates, you have to list those repositories as well. See the Compaq C++ manual for the gory details :-(

Thread safety

The libraries are compiled so that they use the reentrant system routines and are POSIX thread-aware. One needs to set some mysterious flags before using the threads; here is the list without any explanations. You just have to take my word for it that it works :-)
  If you use these flags, it may happen that some not strictly ANSI functions disappear from scope; routines from the math library are the usual victims. You then have to play around with those: reading /usr/include/math.h and standards.h is recommended :-).

Cutting corners with Makefile.defs

All these horrendous things described above are taken care of by the macro settings in Makefile.defs that we used for compiling the libraries. Permission is hereby granted to steal most of the macro settings from this file :-). However: NEVER define -DRAZORBACK_LIB_COMPILE when linking against the Razorback. That flag is reserved for some internal tricks, most importantly for taking care of template instantiations within the library, and if it is defined then most probably you'll get linking errors.

Why Razorback?

Wild boar are cute and intelligent animals, and (despite their bad reputation) they are quite friendly, too. If you meet them in the forest, have an apple ready for them. Avoid the sows with young piglets though -- they can get real paranoid.