2015-01-21

LFI data analysis with FPC

In this post I describe my attempts to write a program for doing scientific data analysis on a High-Performance Cluster (HPC) using Free Pascal 2.6.4. The code is parallelized using MPI and is ~5 kLOC (thousands of lines of code) long.

Why Free Pascal?

This summer I had to implement a data analysis software for the forthcoming Planck/LFI polarization papers (due to appear in 2015). Since the tasks was quite orthogonal to what I had wrote before (in the sense that nobody else was going to help me in writing it, and I was not foreseeing the possibility of re-using old code), I felt free to decide to pick the language I wanted, instead of sticking with C++ as usual. My first thoughts went to Nim (see my post Experiments with Nimrod — meanwhile, the name has been changed into Nim), but at the time I feared it was too unstable for the task. (I must say that the Nim team has released version 0.10.2, which feels much more mature than any previous release, and they have announced that version 1.0 will soon be delivered. I’ll talk more about Nim in a post soon to be due.)

At the end I decided to give a try to Free Pascal, while waiting for Nim 1.0 to appear. The Pascal language has always been close to my heart: I started doing "serious" programming using Turbo Pascal 3.0, and I followed its evolution till Borland Pascal 7.0, which I never forgot (while I’m writing this, I’m gazing at a couple of Borland Pascal 7.0 manuals that are still standing on my bookshelf). What I most vividly remember of Borland Pascal 7.0 were two characteristics that C++ still lacks in 2015:

  1. A good module system. It’s incredible how Turbo Pascal got it right already in Turbo Pascal 4.0 (1987)!

  2. Amazing compilation times.

Free Pascal provides a very good level of compatibility with Borland Pascal, and it shares the two advantages I just mentioned above. Moreover, Free Pascal’s dialect is an improvement over Turbo Pascal, because it provides some modern language constructs (e.g., generics, classes, …: these were first introduced in the Pascal language by Borland Pascal’s successor, Delphi). Another big plus is its standard library, which is larger than C++ and includes a few useful functions:

  1. There is a proper Format function which is similar to Python’s format: it is incredible how plain C++ forces you either to install Boost Format or rely on weird stream functions like setw, setfill, setprecision, and so on.

  2. There is a class for parsing INI files (these are usually used for keeping configuration parameters, and I needed something similar for my project); a JSON library is included as well.

  3. I found particularly handy to have a set of SQLite bindings bundled with the compiler, because part of the data I needed for my calculations were kept in this format.

  4. There is also an open-source clone of the Turbo Vision library bundled with Free Pascal! (However, I did not use this.)

Characteristics of the code

Since the 2015 Planck data release hasn’t been published yet, I cannot reveal too many details about the code. Here is a short description of what it does:

  1. It must process a huge quantity of data (of the order of a terabyte, stored in thousands of files);

  2. It must be parallelized using MPI (because it is going to be run in a distributed memory environment);

  3. The mathematics behind it is not too complicated, it just uses the basic arithmetic operations.

  4. It must produce FITS maps in the Healpix format.

Bindings

Before starting writing the code, I had first to implement the following units:

  1. A wrapper to the CFITSIO library (the standard library used to read/write FITS files). This has not been hard, since it is extremely easy to write bindings to C libraries using Free Pascal;

  2. A set of functions to handle Healpix maps (it was not too difficult to write these functions in Pascal, rather than create a set of bindings);

  3. A wrapper to the MPI functions. This was the toughest part, as I discovered that different MPI libraries implement a few basic data structures using widely different memory representations (see this answer of mine on Stack Overflow). The compatibility with the MPI standard is typically granted by set of C macros. Therefore, instead of having straight bindings to C functions, I had to implement an intermediate layer of C code that abstracted the interface to MPI itself. (Free Pascal provides bindings to MPICH, which is an implementation of MPI. Unfortunately, the clusters I am using for support other MPI libraries that are too different for these bindings to be easily ported.)

Writing wrappers to C libraries is not difficult. Free Pascal implements the keyword external that directly binds a function in a static library to a Pascal name. For example, you just have to use the following definition to bind the ffopen name to a function with the same name in the library cfitsio (under Unix, the name of the library file will of course be libcfitsio.a):

function ffopen(out Ptr : Pointer;
                FileName : PChar;
                IoMode : CInt;
                var Status : Cint) : Cint; cdecl; external 'cfitsio';

Creating bindings to CFITSIO was one of the most boring parts. Since a FITS files can contain tables of data of various types (integers, single/double precision floating point numbers, booleans, strings…), for every function that accesses the rows of a table the CFITSIO library (a pure C library) provides a generic implementation of the same function that accept a void pointer plus an integer variable that specifies how to cast the pointer to something meaningful. For instance, the ffgcv function reads a number of rows from a file, and its prototype is the following:

int ffgcv(fitsfile * f,
          int datatype,
          int colnum,
          LONGLONG firstrow,
          LONGLONG firstelem,
          LONGLONG nelems,
          void * nullval,
          void * dest,
          int * anynull,
          int * status);

The interesting parts in this definition are the parameters datatype, nullval, and dest. The first one must be equal to one of the constants CSHORT, CLONG, CSTRING`…, and it specifies how to interpret `nullval (a pointer to a value to be used as default for those rows in the table which do not provide a value) and dest (which must point to an array of nelems elements that will be filled by ffgcv).

This kind of prototype is quite representative of the CFITSIO’s API: it is quite easy to make mistakes (e.g., by passing datatype=CLONG and making dest point to an array of short values), so I decided to make my bindings less error-prone. I implemented a number of ReadColumn functions that accepted as an input parameter different types of arrays (note: Free Pascal’s extended mode allows function overloading):

procedure ReadColumn(var F : TFitsFile;
                     ColumnNumber : Integer;
                     FirstRow : Int64;
                     FirstElement : Int64;
                     var Elements : Array of String);
procedure ReadColumn(var F : TFitsFile;
                     ColumnNumber : Integer;
                     FirstRow : Int64;
                     FirstElement : Int64;
                     var Elements : Array of Boolean);
procedure ReadColumn(var F : TFitsFile;
                     ColumnNumber : Integer;
                     FirstRow : Int64;
                     FirstElement : Int64;
                     var Elements : Array of Uint8);
{ etc. }

Of course, this has been really boring to do! I am not sure if I could have used Free Pascal’s preprocessor to ease this implementation, but I preferred not to dig into this. Here I wished Free Pascal had metaprogramming capabilities: this is something Lisp has provided since ages (see Seibel’s chapter about Common Lisp macros) and which new languages like Julia and Nim have wisely adopted. (To be fair, I think that allowing metaprogramming constructs in an existing language like Pascal is much harder that designing a new language with such constructs built-in.)

The code

Writing the code was really a smooth process. The compilation times were so fast that whenever I was unsure about a language feature/function syntax, I tried many variants of the code by quickly running many write/compile cycles instead of Googling around. It’s incredible how much your productivity can be boosted by negligible compilation times! (See below for some quantitative arguments.)

The code runs fast, probably as fast as C++, although I cannot provide a quantitative comparison as I do not have a C++ program which runs the same analysis for making measurements.

What I like about FP

Compilation times

As I mentioned before, the time required to compile code with Free Pascal is incredible, especially if you’re used with the speed of a typical C++ compiler (I use GCC). For instance, recompiling 5838 LOC requires 0.4 seconds on my laptop. (I used FPC’s -B flag to force the recompilation of everything: it is similar to the -B parameter in GNU Make.)

As a comparison, recompiling the legacy C++ code I was referring to above (10623 lines of code) requires 83 seconds. I’ll repeat it: 83 seconds. In this case the compilation speed is 127 LOC/sec, while Free Pascal in the example above was able to reach a speed of 14.6 kLOC/sec. Therefore, Free Pascal is two orders of magnitude faster than GCC in compiling my codes!

To understand why this happens, you should consider that my C++ code uses the C++ Standard Template Library and a few of the Boost libraries. Both make an heavy use of templates, so that they cannot be compiled separately but instead literally copied into each of my source file by the C++ preprocessor (the details are not exact, but this should provide a fair representation of what’s really happening). So the actual number of LOCs processed by GCC is far higher than the 10623 number I provided above, because I should add many thousands of lines taken from my /usr/include path. However, my point still holds. If I compile a program consisting of ~10000 LOCs, I can expect to wait ~100 times more if I use GCC than if I use Free Pascal.

You might object that GCC takes this huge amount of time only when it needs to recompile everything, and that this is not always the case. However, if I need to modify some header file that is included by many other source files in the project, this will trigger a recompilation of any of them, and the compilation times will be not much different. (The same would happen if I need to change some compilation switch.) I checked this by touching just one file in my C++ project and triggering a make (which therefore recompiled just the touched source file and rebuilt the executable). There are 34 .cpp files and 38 .hpp (header) files, and I iterated this touch-and-recompile loop over all of them. The results are the following:

  1. If one .cpp file is modified, my computer takes on average 3.7 seconds to rebuild the executable, with a maximum of 7.8 seconds and a minimum of 0.8 seconds.

  2. If one .hpp file is modified, things get obviously worse. On average the compilation time is 10.7 seconds, and the extremes are 2.8 seconds and 21 seconds.

Given these numbers, the 0.4 seconds required by FPC to recompile from scratch a 5 kLOC project are even more amazing!

Units

Part of the huge speed achieved by Free Pascal is due to its sound module system, which is directly derived from Turbo Pascal (which in turns inherited it from Modula-2). It may feel a bit outdated for Python users, but it is still much better than C++ archaic reliance on the cpp preprocessor.

Free Pascal programs are divided in source files called "units". The Free Pascal compiler is able to create a dependency tree of the units used by a program or by another unit, and it triggers recompilations only if needed. Thus, GNU Make is not strictly needed for producing an executable out of a set of source files: you just run fpc main.pas and the compiler will automatically recompile every module (unit) used by main.pas. Moreover, FPC is smart enough to recomple only those source files that are newer than the object file. (I still use a Makefile because I need to compile the MPI middleware I mentioned above, and I want to pass a number of flags to the fpc compiler; but it is an extremely simple and short Makefile.)

What I don’t like about FP

FPC is really great, but like everything else it has its shortcomings. In this section I will describe a couple of them.

Bad segmentation faults can still happen

The first thing I discovered while programming in Free Pascal is that "bad" segmentation faults can still happen! (By "bad" segmentation fault I mean one of those crashes that leave you with a SIGSEGV message on the terminal and no other clue about what triggered it.)

There are many reasons for this. One that is particularly subtle is the way static and dynamic arrays are implemented. A "static array" is a plain sequence of values that lie on adjacent memory locations and whose size is specified at compilation time, like the following:

var
    arr : Array[1..3] of Integer;

On the other side, the size of a "dynamic array" is decided at runtime. (It does not need to change during the execution of the program: the important point here is that before the program starts, the compiler has still no idea of which size will have). Here is an example:

var
    arr : Array of Integer;

begin
    SetLength(arr, 3);
    { ... }
end;

The former type of array is useful e.g. when you want to define a 3-D vector, as you already know that your vector will have 3 elements. But if you want to read a set of numbers from a file provided by the user, you surely need dynamic arrays.

Here comes the problem with my segmentation fault. I had some code which set the elements of an array to zero:

var
    arr : Array[1..10] of Integer;

begin
    FillChar(arr, 0, SizeOf(Integer) * Length(arr));
    { Now use "arr" }
end;

Then I changed the code so that the number of elements was decided at runtime:

var
    arr : Array of Integer;

begin
    SetLength(arr, getNumOfElementsFromSomewhereElse);
    FillChar(arr, 0, SizeOf(Integer) * Length(arr));
    { Bang! }
end;

After this change, the code crashed. The problem, I learned, is due to the fact that the internal representation of a dynamic array is a record (i.e., the analogous of a C struct type) with two elements, like in the following example:

type
    DynamicArray = record
       numOfElements : Integer;
       ptr : Pointer;
    end;

So, when I call FillChar on a static array, the address of the actual array is passed to the procedure. But if the array is dynamic, FillChar gets the address to the record and overwrites numOfElements, ptr, plus a number of bytes that follow them!

Once I discovered the cause of this bug, it was easy to fix the code. Just pass the first element of the array to FillChar:

{ This works both for static and dynamic arrays }
FillChar(arr[0], 0, SizeOf(Integer) * Length(arr));

The point is that the compiler does not produce any warning/hint if you forget the [0] and pass a dynamic array as its first parameter. And the program will crash without saying you what line caused the problem…

No advanced programming features

The language has improved a lot since Borland Pascal 7, but it is still lacking some important features. One of the C++ features I missed most is the lack of generic procedures/functions/records: Free Pascal 2.6.4 still supports only generic classes. (A class is not very different from a record, but the latter can be allocated on the stack, while a class always requires to be allocated on the heap — and properly freed when it is no longer needed.)

In the code I wrote I had these two structures:

type
   IntRange = record
       begin, end : Integer;
   end;

   DoubleRange = record
       begin, end : Double;
   end;

They are equivalent, yet there in FPC 2.6.4 is no mechanism to define them using one generic construct.

Another thing I missed is the lack of metaprogramming facilities. (See above.)

Languages like C++ allow to define variables wherever you want, and the newest ones also have some form of type-inference. Even Ada (which is clearly derived from Pascal) has a limited type-inference in the variables used in for loops; in the following example, the Elem variable is declared within the body of a for loop, and the variable i must not be declared in advance because the compiler already infers its type:

-- This is written in Ada
procedure LoopOverArr is
begin
    for i in Arr'Range loop
        declare
            Elem : Arr'Type := Arr (i);
        begin
           -- "Elem" is the i-th element of the array Arr
        end;
    end loop;
end LoopOverArr;

Unfortunately, Free Pascal does not provide any of these, and the loop above must be implemented in the following way:

procedure LoopOverArr;
var
    i : Integer;
    Elem : ArrType;
begin
    for i := Low(Arr) to High(Arr) do
    begin
        { ... }
    end;
end;

This does not look particularly annoying for short procedures like the one above. But if your procedure body is longer than a few lines, having the declaration of a variable widely separated from the place you’re actually using it affects readability.

Weirdness in the standard library

One thing that struck me is the lack of a sorting function in the standard library. The same applies to other handy algorithms as well (binary searches, looking for the minimum/maximum value in an array of arbitrary type…). I figure the motivation is the lack of support for generic procedures, but still it is quite amazing that in 2015 a language does not provide such facilities and forces you to roll your own implementation.

Another minor point is that a few low-level functions do not define their arguments as you would expect. One nice feature of Free Pascal is the ability to qualify a procedure argument using in, var, or out:

  1. A in parameter is an input parameter which cannot be modified by the procedure.

  2. A var parameter is a parameter that is modified by the procedure (and it is therefore internally passed as a reference, much like C++ & parameters);

  3. Finally, a out parameter is like a var parameter, but there is no need to initialize it before calling the procedure, because it is assumed that the initialization will be done by the procedure itself.

The problem I faced is the fact that a few low-level functions like fillchar, move, and other similar procedures do not make proper use of such qualifiers, leading the compiler to produce strange messages. Consider this example:

program FillCharDemo;

{$mode objfpc}

procedure ZeroArray(out arr : array of Double);
begin
    FillChar(arr[Low(arr)], 0, Length(arr) * SizeOf(Double));
end;

var
    arr : Array[0..10] of Double;

begin
    ZeroArray(arr);
end.

Compiling this code using the -vh flag (show hints) produces the following hint:

$ fpc -vh fillchar_demo.pas
Hint: Start of reading config file /etc/fpc.cfg
Hint: End of reading config file /etc/fpc.cfg
Free Pascal Compiler version 2.6.4+dfsg-3 [2014/07/13] for x86_64
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling fillchar_demo.pas
fillchar_demo.pas(7,14) Hint: Variable "arr" does not seem to be initialized
Linking fillchar_demo
/usr/bin/ld.bfd: warning: link.res contains output sections; did you forget -T?
16 lines compiled, 0.1 sec
3 hint(s) issued

The message Variable "arr" does not seem to be initialized refers to the fact that within the ZeroArray we have used the arr variable without it being properly initialized before. But the purpose of FillChar is to initialize every byte of the first parameter to some value! The problem is that the prototype of the FillChar procedure marks the first argument as var instead of out (had been the latter, no hint would have been produced). There are motivation for this, but the basic fact is that almost every time you use functions like FillChar and Move you will get useless hints.

Possible workarounds are either to ignore hints (by avoiding the -vh flag) or to suppress hints within the body of ZeroArray:

procedure ZeroArray(out arr : array of Double);
begin
    {$hints off}
    FillChar(arr[Low(arr)], 0, Length(arr) * SizeOf(Double));
    {$hints on}
end;

None of these is completely satisfying, IMHO. But, as I said before, this is not a huge problem, once you know a workaround.

Conclusions

My overall experience with Free Pascal is positive. Its amazing compilation times and good quality of the code are a clear winner over GCC. I don’t regret having used it for this new project of mine.

There are a few quirks in the language and in its standard library that might well be improved. But I understand that it is difficult to implement such changes in the code when one of its selling features is a very high compatibility with legacy systems (Free Pascal is able to compile painlessly code written for Borland Turbo Pascal for DOS!).

The number of libraries available does not match C, and it couldn’t have been so. However, the amount of work required to create bindings to C libraries is quite limited, as Pascal is not that much different from C in terms of data type representation and program logic. (It is surely easier than writing bindings for languages like, e.g., Ocaml or Haskell.)

It is interesting to see what the next releases will bring us. Support for Android and for the JVM is being prepared.

Edit: As explained by TimJYoung on HackerNews, my explanation of the reason why FillChar crashed when I passed a dynamic array to it is wrong. What really happens is that the dynamic array is truly a pointer to its first element. However, FillChar does not follow the pointer, but instead sets it and all the following bytes to zero. Also, I would like to mention that generic procedures and functions are going to be incorporated into Free Pascal (unfortunately, they have not been incorporated in the new 3.0.0 release).

Edit: I released the source code here: https://github.com/ziotom78/phi_d_phi_sky. Enjoy!