2014-01-17

Experiments with Nimrod

In the last months I have studied with great interest three new (imperative) languages: Nimrod, Go, and Rust. I find them interesting because they promise to reach the same speeds as C++, but without some of its most annoying features. They all are less verbose than Ada, but at the same time retain some of its clarity and strictness. At the moment I am trying to use them in some small/medium-sized projects, in order to better understand what are their advantages and their limits.

In this post I am going to describe my experience with Nimrod. The language is still in development (the latest release is 0.9.2), but it seems it has reached a pretty stable state. It has a number of interesting features:

  • Its syntax is similar to C and Pascal, but it uses indentation as a way to delimit blocks of code (in the same way as Python does).
  • The performance of Nimrod programs is usually the same as their C counterparts.
  • It is a strongly typed language (i.e., you cannot sum integers and floats without using casting).
  • Large programs can be split into packages, which work pretty much like Python.
  • The language supports garbage collection, so that memory management is easier than in C.
  • It features a LISP-like macro system!
  • Like my old friend Chicken Scheme, it compiles sources to C and then invokes a C compiler. This means that writing bindings to C libraries is extremely easy.

I decided to use Nimrod to write a small program (~500 lines of code) to perform a simple task: determine at which time a celestial source in the sky was observed by the Planck/LFI radiometers. The process is quite easy to understand:

  1. The program reads the so-called "pointing information", which tell which sky direction each LFI horn was pointing at a certain time. It is a very large table (tens of GB), with four columns: the time of the observation, the ecliptic latitude, the ecliptic longitude, and the rotation of the antenna around the observing direction.
  2. Pointing information is divided into many files. There is one file per day, and I only need to know in which days an object was observed. Therefore, for all the observing directions contained in each file X, the program determines how far each of them is from the celestial source. If at least one of them is smaller than the angular resolution of the instrument, then the name of file X is saved in a list.
  3. When all the files have been read, print the list built during the process.

Let's see how I implemented this using Nimrod.

Basic trigonometry

Implementing the trigonometric functions used to calculate the angular distance between two positions was fairly trivial. To compute the angular distance between two directions I first converted each direction into a vector and then calculated their dot product. So I implemented two types: the first one represents an angular direction in ecliptic coordinates, and the other one represents a vector in a 3D space:

import math

type
  TAngularPosition* = tuple[colatitude, longitude: float] # In radians
  TVector* = tuple[x, y, z: float]

Then I overloaded the multiplication operator * in order to compute the dot product between two vectors:

proc `*` (v1, v2: TVector): float {. inline .} =
  return (v1.x * v2.x) + (v1.y * v2.y) + (v1.z * v2.z)

Note the use of {. inline .}, which avoids the cost of a function call.

Then I implemented a function to convert a TAngularPosition object into a TVector:

proc AngToVec*(pos: TAngularPosition): TVector {. inline .} =
  let sin_theta = math.sin(pos.colatitude)
  result = (sin_theta * math.cos(pos.longitude),
            sin_theta * math.sin(pos.longitude),
            math.cos(pos.colatitude))

(The purpose of let is to introduce a constant whose value is defined at runtime. Changing the value of sin_theta after its initial assignment is forbidden.)

Finally, I implemented the function which computes the angular distance (in radians) between two TAngularPositions (it implicitly uses the fact that vectors returned by AngToVec have length one):

proc AngularDistance*(pos1, pos2 : TAngularPosition): float =
  let vec1 = AngToVec(pos1)
  let vec2 = AngToVec(pos2)

  result = math.arccos(vec1 * vec2)

At the end of the file I implemented a number of unit tests to check the validity of my implementation:

when isMainModule:
  const TOLERANCE = 1.0e-7

  let ang1 = (colatitude: 0.1, longitude: 0.2)
  let ang2 = (colatitude: 0.4, longitude: 0.6)

  let vec1 = AngToVec(ang1)
  let vec2 = AngToVec(ang2)

  assert(abs(vec1.x - 0.09784340) < TOLERANCE)
  assert(abs(vec1.y - 0.01983384) < TOLERANCE)
  assert(abs(vec1.z - 0.99500417) < TOLERANCE)

  assert(abs(vec2.x - 0.32140083) < TOLERANCE)
  assert(abs(vec2.y - 0.21988214) < TOLERANCE)
  assert(abs(vec2.z - 0.92106099) < TOLERANCE)

  let dist = AngularDistance(ang1, ang2)
  assert(abs(dist - 0.31021624) < TOLERANCE)

Once I saved these definitions into positions.nim, I was able to run the unit tests simply by running

$ nimrod compile --run positions.nim

On the other hand, including the string

import positions

in another Nimrod source file imports these definitions. This is almost the same as in Python, with the difference that Nimrod does not include the unit tests in the compilation if positions.nim is imported into another file. (It is able to do this because when isMainModule is computed at compile time. It is like C's #if, but much less error-prone.)

Binding the program to CFITSIO

Since LFI's pointing information are saved in FITS files, I had to write a set of bindings for the CFITSIO library, However, as I said above, in Nimrod it is extremely easy to write bindings to C libraries.

First of all, I defined the name of the dynamic library for CFITSIO. I used the where construct we already used above (although I used this code only on a Linux machine):

when defined(windows):
  const LibraryName = "cfitsio.dll"
elif defined(macosx):
  const LibraryName = "libcfitsio(|.0).dylib"
else:
  const LibraryName = "libcfitsio.so"

(const declares a constant whose value must be fixed at compilation time.)

Then I defined a data type for CFITSIO's fitsptr, the basic data structure that represents a FITS file:

type
  TInternalFitsStruct = pointer
  PInternalFitsStruct = ptr TInternalFitsStruct

  TFitsFile* {. final .} = object
    file: TInternalFitsStruct

The type TInternalFitsStruct is a raw pointer to a fitsptr object, while TFitsFile is an high-level type which is going to be exported outside the module (hence the * at the end of the name).

Next step is to define how errors must be handled. Each of CFITSIO's functions returns an integer error code, but we want to take advantage of Nimrod's exception mechanism:

type
  EFitsException* = object of E_Base
    code: int
    message: string


proc fitsGetErrStatus(statusCode: int,
                      errorText: array[0..30, char]) {. cdecl,
                                                        dynlib: LibraryName,
                                                        importc: "ffgerr" .}

proc raiseFitsException(statusCode: int) {. noinline, noreturn .} =
  var errorText: array[0..30, char]
  fitsGetErrStatus(statusCode, errorText)

  raise newException(EFitsException, $errorText)

This code shows how to bind to a C function (in this case, ffgerr, which fills errorText with a string which describes the error represented by statusCode): as you can see, it is enough to include {. cdecl, importc: "ffgerr" .} at the end of the definition. Note that fitsGetErrorStatus is not a wrapper to ffgerr, but it is ffgerr. (Unlike let, var declares a true variable, i.e., one whose value can change during the execution of the program.)

At this point we can write a binding to the ffopen function:

type
  TIOMode* = enum
    ReadOnly = 0, ReadWrite = 1

proc fitsOpenFile(filePtr: PInternalFitsStruct, 
                  fileName: cstring, 
                  ioMode: TIOMode,
                  status: ptr int): int {. cdecl, 
                                           dynlib: LibraryName,
                                           importc: "ffopen" .}

proc OpenFile*(fileName: string, ioMode: TIOMode): TFitsFile =
  var status = 0
  result.file = nil
  if fitsOpenFile(addr(result.file), fileName, ioMode, addr(status)) != 0:
    raiseFitsException(status)

Function OpenFile is the high-level equivalent of ffopen, since it uses native Nimrod's data types string and enum, and it raises exceptions when error occur. As above, OpenFile is the only one to be exported, because of the *.

I wrote bindings for many other CFITSIO functions, but what I've shown here should already give the idea of how easy is to interface with C.

Reading data from FITS files

It is quite easy to read the FITS files that contain Planck/LFI pointings using CFITSIO: the files contain one large table with the time and the ecliptic coordinates in separate columns, and CFITSIO provides a number of functions to read columns into C arrays.

It is however quite tricky to read multiple columns efficiently: CFITSIO provides functions to read one column of data (or a subset of rows in that column), but FITS tables are saved in rows. Therefore, in order to read N columns CFITSIO must traverse the file from the beginning to the end N times: this is clearly inefficient.

CFITSIO provides a trick to overcome this limitation: since it reads data in chunks of a few KB, many elements from contiguous columns are present at the same time in its cache. CFITSIO is able to tell how many rows fill its cache (using the ffgrsz function, which I bound to GetOptimalNumberOfRowsForIO), so I implemented the following strategy:

type TPositions = seq[positions.TAngularPosition]

proc ReadAnglesFromFile(fileName: string): TPositions =
  var f = OpenTable(fileName, ReadOnly)

  let numOfRows = GetNumberOfRows(f)
  newSeq(result, numOfRows)

  let rowsPerChunk = GetOptimalNumberOfRowsForIO(f)
  var chunkOfData: seq[float]
  newSeq(chunkOfData, rowsPerChunk)

  var firstRow = 1
  var rowsToRead = numOfRows
  var done = false
  while not done:
    if rowsToRead < rowsPerChunk:
      setLen(chunkOfData, rowsToRead)

    ReadColumnOfFloats(f,
                       colNum = 3, 
                       firstRow = firstRow,
                       firstElem = 1,
                       destArray = chunkOfData)

    for idx in 0..len(chunkOfData)-1:
      result[firstRow - 1 + idx].colatitude = chunkOfData[idx]

    ReadColumnOfFloats(f,
                       colNum = 4, 
                       firstRow = firstRow,
                       firstElem = 1,
                       destArray = chunkOfData)

    for idx in 0..len(chunkOfData)-1:
      result[firstRow - 1 + idx].longitude = chunkOfData[idx]

    firstRow += len(chunkOfData)
    rowsToRead -= len(chunkOfData)

    if rowsToRead <= 0:
      break

  CloseFile(f)

(The functions OpenTable, GetNumberOfRows, GetOptimalNumberOfRowsForIO, ReadColumnOfFloats, and CloseFile are my bindings to analogous functions in CFITSIO.) The logic of the code should be easy to understand: the number of rows passed to the function ReadColumnOfFloats (rowsToRead) is always smaller or equal to the value returned by GetOptimalNumberOfRowsForIO, so that each call to ReadColumnOfFloats either reads just one chunk and stores it into the cache, or it reuses all the information already present in the cache without further I/O instructions. In each iteration of the while not done:... cycle, the function ReadColumnOfFloats is called twice, first to read the ecliptic latitude (column 3) and then the longitude (column 4). (There is also a column containing times, but you might recall that I am not interested in reading it for the purpose of my analysis.)

To store the arrays I used a seq type, which is basically a dynamic array. Unfortunately there does not seem to be a realiable way to use array slices in Nimrod. It would have been very useful to rewrite this code:

for idx in 0..len(chunkOfData)-1:
  result[firstRow - 1 + idx].colatitude = chunkOfData[idx]

into something like this:

result[firstRow:(firstRow + len(chunkOfData))] = chunkOfData

(this would have allowed Nimrod to easily vectorize such expressions in multicore CPUs). Such features are present in Ada and Fortran, and they are handy in number-crunching programs like this.

The last thing to do is to sew things together and build the main loop of the program.

Building the list of matches

Here is the (abridged) main loop of the program. I am skipping the definitions at the beginning of the file (I simply initialize a few variables like objectPosition and tolerance according to the values specified on the command line).

var matchFound = false
var matchFileNames: seq[string] = @[]

for paramIndex in 4..os.ParamCount():
  let curFile = os.ParamStr(paramIndex)

  echo("Reading file ",  curFile)
  var angles: seq[TAngularPosition]
  try:
    angles = ReadAnglesFromFile(curFile)
  except EFitsException:
    let msg = getCurrentExceptionMsg()
    echo "Error reading file " & curFile & ", reason: ", msg
    continue

  var minPos: positions.TAngularPosition
  var minDist = 2 * math.pi
  for idx in 0..len(angles)-1:
    let dist = positions.AngularDistance(angles[idx], objectPosition)
    if dist < minDist:
      minPos = angles[idx]
      minDist = dist

  if minDist < tolerance:
    matchFound = true
    add(matchFileNames, curFile)

if matchFound:
  echo "Matches found:"

  for i in 0..len(matchFileNames)-1:
    echo "  " & matchFileNames[i]

Nimrod: the good parts

Overally my experience with Nimrod was nice. I was able to write this program in one afternoon, which is not bad considering that I had no previous experience with the language and had to implement a set of bindings for CFITSIO. The features that mostly impressed me are the following:

  • It is extremely easy to create bindings to C libraries. (I wonder how much difficult would be to produce a Nimrod executable which uses many processes communicating via MPI?)

  • Its syntax is clean and elegant: I found that typing Nimrod code is a very quick process, because you do not have to bother with a number of details (e.g., declaring the type of each variable, putting { and } everywhere, …).

  • Nimrod programs are very fast. I did some quick tests writing short programs in C and Nimrod, and I found that the difference in speed between them is negligible (< 1%).

Nimrod: what I do not like

However, the language has some minor problems too. First of all, Nimrod's documentation is quite poor. At the moment the best introduction to the language is given in two tutorials, which however do not tell everything. There is also a manual, which is however considered still a "draft" and is not as readable as the tutorials.

Given the lack of good documentation, I would have expected Nimrod to follow the "principle of least surprise" in entity naming. Yet there are a number of small things that are different from other languages:

  • You print to the terminal using echo (only shell languages use this verb, while the rest of the world uses something like print or put). The existence of echo is however well explained in the tutorial from its very beginning.

  • You append elements to a dynamic array using add. No other language I know uses this verb: the most used solution is append, with the exception of C++'s push_back. This is annoying, as the tutorial does not say a word about this: I had to search in the documentation of the system module to discover the existence of add. (I discovered some days later that this is briefly mentioned in the manual.)

Another thing that I miss in Nimrod is an easy way to use format strings, similar to C's printf and Python's format. Apparently, when you want to use such facility you have to bind to the libc's implementation of printf. This is helpful if you only want to write very basic types (e.g., integers and floating point numbers), since of course printf is ignorant about Nimrod's datatypes. (Update: I discovered the existence of the strfmt module in the standard library. However, such a fundamental module does not seem to be advertised in the tutorials.)

The compiler shows a couple of weird behaviors too:

  • Why does the compiler emit error message using parentheses to identify the line number and column, like fits.nim(54, 2) Error: ...?

    The problem is that the majority of the compiler and tools around (e.g., GCC, Clang, Go, Rust, …) use a different format for error messages, namely filename:line:column: message. This means that a few handy macros in Emacs (e.g. next-error) cannot work without some tweaking, as well as some handy scripts I regularly use (e.g. to highlight errors and warnings using different colors). This is a minor problem, but it is nevertheless annoying, expecially considering that Rust and Go follow the more widely used standard. (I voluntarized to add a command-line flag to the compiler for switching to the more standard syntax, but people weren't interested.)

    Update: Too bad for my scripts, but at least I was able to make Emacs understand this new syntax. Unfortunately, my tweak only works when I run nimrod directly: if I call make, it doesn't. I've spent too much time trying to fix it, so at the end I gave up. Anyway, I added the following to my .emacs file:

(setq compilation-error-regexp-alist
    (push '("^\\([a-zA-Z0-9\\.]+\\)(\\([0-9]+\\),\\([0-9]+\\))\s\\(.*$\\)" 
       1 2 3)
     compilation-error-regexp-alist))
  • Compile error messages might be improved a bit. When I was implementing ReadColumnOfFloats I missed one parameter (it has six parameters, and it is a wrapper to a CFITSIO function with nine parameters!). The error message was not very helpful, since it did not point out which parameter I forgot:
fits.nim(339, 22) Error: type mismatch: got \
  (TInternalFitsStruct, int, int64, int64, \
  float, ptr float, ptr int, ptr int)
but expected one of: 
fits.fitsReadDblColumn(filePtr: TInternalFitsStruct, \
  colNum: int, firstRow: int64, firstElem: int64, \
  numOfElems: int64, nullValue: float, \
  destArray: ptr float, anyNull: ptr int, \
  status: ptr int): int

Conclusions

My overall impression of Nimrod is very good. The language is extremely easy to use, yet it has some powerful features. The ability to easily create bindings for C libraries is a big plus. It still shows some minor problems, which are probably due either to its early state (no version 1.0 has been release yet) or to the small number of people contributing to it.

2 comments:

  1. How is errors and exception handling resolved in Nimrod? Do they think exceptions are the answer, whereas in Go Language by google they implement errors as a separate function result...

    I've written a few articles on the problems with exceptions..

    I'm not experienced with Nimrod and don't know about the error system in it. Do you have any experience with handling errors or using exceptions? Is it still an intraprocedural goto system (That's what exceptions are)..

    Articles I've written on the topic relating to FPC:
    http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Suggestion-for-Modern-Pascal-Error-Contracts

    http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Improve-or-Ditch-The-Rude-Exceptions

    http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Try-Finally-Is-a-GOTO-That-Is-Too-Limited

    ReplyDelete
    Replies
    1. Hi Lars, your idea about the fact of declaring an error code in a manner that makes it clear to the compiler is extremely interesting. This would indeed make coding a lot easier!

      From what I understand, Nim (the new name for Nimrod) uses setjmp to implement exceptions.

      I have tried Go and must say I am not impressed with the way errors are handled: having to explicitly check the return value can be boresome in those cases where you have to do complex validation of the user's input.

      Delete