In the last months I have studied with great interest three new (imperative) languages: Nimrod, Go, and Rust. I find them interesting because they promise to reach the same speeds as C++, but without some of its most annoying features. They all are less verbose than Ada, but at the same time retain some of its clarity and strictness. At the moment I am trying to use them in some small/medium-sized projects, in order to better understand what are their advantages and their limits.
In this post I am going to describe my experience with Nimrod. The language is still in development (the latest release is 0.9.2), but it seems it has reached a pretty stable state. It has a number of interesting features:
- Its syntax is similar to C and Pascal, but it uses indentation as a way to delimit blocks of code (in the same way as Python does).
- The performance of Nimrod programs is usually the same as their C counterparts.
- It is a strongly typed language (i.e., you cannot sum integers and floats without using casting).
- Large programs can be split into packages, which work pretty much like Python.
- The language supports garbage collection, so that memory management is easier than in C.
- It features a LISP-like macro system!
- Like my old friend Chicken Scheme, it compiles sources to C and then invokes a C compiler. This means that writing bindings to C libraries is extremely easy.
I decided to use Nimrod to write a small program (~500 lines of code) to perform a simple task: determine at which time a celestial source in the sky was observed by the Planck/LFI radiometers. The process is quite easy to understand:
- The program reads the so-called "pointing information", which tell which sky direction each LFI horn was pointing at a certain time. It is a very large table (tens of GB), with four columns: the time of the observation, the ecliptic latitude, the ecliptic longitude, and the rotation of the antenna around the observing direction.
- Pointing information is divided into many files. There is one file per day, and I only need to know in which days an object was observed. Therefore, for all the observing directions contained in each file X, the program determines how far each of them is from the celestial source. If at least one of them is smaller than the angular resolution of the instrument, then the name of file X is saved in a list.
- When all the files have been read, print the list built during the process.
Let's see how I implemented this using Nimrod.
Basic trigonometry
Implementing the trigonometric functions used to calculate the angular distance between two positions was fairly trivial. To compute the angular distance between two directions I first converted each direction into a vector and then calculated their dot product. So I implemented two types: the first one represents an angular direction in ecliptic coordinates, and the other one represents a vector in a 3D space:
import math
type
TAngularPosition* = tuple[colatitude, longitude: float] # In radians
TVector* = tuple[x, y, z: float]
Then I overloaded the multiplication operator *
in order to compute the dot product between two vectors:
proc `*` (v1, v2: TVector): float {. inline .} =
return (v1.x * v2.x) + (v1.y * v2.y) + (v1.z * v2.z)
Note the use of {. inline .}
, which avoids the cost of a function call.
Then I implemented a function to convert a TAngularPosition
object into a TVector
:
proc AngToVec*(pos: TAngularPosition): TVector {. inline .} =
let sin_theta = math.sin(pos.colatitude)
result = (sin_theta * math.cos(pos.longitude),
sin_theta * math.sin(pos.longitude),
math.cos(pos.colatitude))
(The purpose of let
is to introduce a constant whose value is defined at runtime. Changing the value of sin_theta
after its initial assignment is forbidden.)
Finally, I implemented the function which computes the angular distance (in radians) between two TAngularPosition
s (it implicitly uses the fact that vectors returned by AngToVec
have length one):
proc AngularDistance*(pos1, pos2 : TAngularPosition): float =
let vec1 = AngToVec(pos1)
let vec2 = AngToVec(pos2)
result = math.arccos(vec1 * vec2)
At the end of the file I implemented a number of unit tests to check the validity of my implementation:
when isMainModule:
const TOLERANCE = 1.0e-7
let ang1 = (colatitude: 0.1, longitude: 0.2)
let ang2 = (colatitude: 0.4, longitude: 0.6)
let vec1 = AngToVec(ang1)
let vec2 = AngToVec(ang2)
assert(abs(vec1.x - 0.09784340) < TOLERANCE)
assert(abs(vec1.y - 0.01983384) < TOLERANCE)
assert(abs(vec1.z - 0.99500417) < TOLERANCE)
assert(abs(vec2.x - 0.32140083) < TOLERANCE)
assert(abs(vec2.y - 0.21988214) < TOLERANCE)
assert(abs(vec2.z - 0.92106099) < TOLERANCE)
let dist = AngularDistance(ang1, ang2)
assert(abs(dist - 0.31021624) < TOLERANCE)
Once I saved these definitions into positions.nim
, I was able to run the unit tests simply by running
$ nimrod compile --run positions.nim
On the other hand, including the string
import positions
in another Nimrod source file imports these definitions. This is almost the same as in Python, with the difference that Nimrod does not include the unit tests in the compilation if positions.nim
is imported into another file. (It is able to do this because when isMainModule
is computed at compile time. It is like C's #if
, but much less error-prone.)
Binding the program to CFITSIO
Since LFI's pointing information are saved in FITS files, I had to write a set of bindings for the CFITSIO library, However, as I said above, in Nimrod it is extremely easy to write bindings to C libraries.
First of all, I defined the name of the dynamic library for CFITSIO. I used the where
construct we already used above (although I used this code only on a Linux machine):
when defined(windows):
const LibraryName = "cfitsio.dll"
elif defined(macosx):
const LibraryName = "libcfitsio(|.0).dylib"
else:
const LibraryName = "libcfitsio.so"
(const
declares a constant whose value must be fixed at compilation time.)
Then I defined a data type for CFITSIO's fitsptr
, the basic data structure that represents a FITS file:
type
TInternalFitsStruct = pointer
PInternalFitsStruct = ptr TInternalFitsStruct
TFitsFile* {. final .} = object
file: TInternalFitsStruct
The type TInternalFitsStruct
is a raw pointer to a fitsptr
object, while TFitsFile
is an high-level type which is going to be exported outside the module (hence the *
at the end of the name).
Next step is to define how errors must be handled. Each of CFITSIO's functions returns an integer error code, but we want to take advantage of Nimrod's exception mechanism:
type
EFitsException* = object of E_Base
code: int
message: string
proc fitsGetErrStatus(statusCode: int,
errorText: array[0..30, char]) {. cdecl,
dynlib: LibraryName,
importc: "ffgerr" .}
proc raiseFitsException(statusCode: int) {. noinline, noreturn .} =
var errorText: array[0..30, char]
fitsGetErrStatus(statusCode, errorText)
raise newException(EFitsException, $errorText)
This code shows how to bind to a C function (in this case, ffgerr
, which fills errorText
with a string which describes the error represented by statusCode
): as you can see, it is enough to include {. cdecl, importc: "ffgerr" .}
at the end of the definition. Note that fitsGetErrorStatus
is not a wrapper to ffgerr
, but it is ffgerr
. (Unlike let
, var
declares a true variable, i.e., one whose value can change during the execution of the program.)
At this point we can write a binding to the ffopen
function:
type
TIOMode* = enum
ReadOnly = 0, ReadWrite = 1
proc fitsOpenFile(filePtr: PInternalFitsStruct,
fileName: cstring,
ioMode: TIOMode,
status: ptr int): int {. cdecl,
dynlib: LibraryName,
importc: "ffopen" .}
proc OpenFile*(fileName: string, ioMode: TIOMode): TFitsFile =
var status = 0
result.file = nil
if fitsOpenFile(addr(result.file), fileName, ioMode, addr(status)) != 0:
raiseFitsException(status)
Function OpenFile
is the high-level equivalent of ffopen
, since it uses native Nimrod's data types string
and enum
, and it raises exceptions when error occur. As above, OpenFile
is the only one to be exported, because of the *
.
I wrote bindings for many other CFITSIO functions, but what I've shown here should already give the idea of how easy is to interface with C.
Reading data from FITS files
It is quite easy to read the FITS files that contain Planck/LFI pointings using CFITSIO: the files contain one large table with the time and the ecliptic coordinates in separate columns, and CFITSIO provides a number of functions to read columns into C arrays.
It is however quite tricky to read multiple columns efficiently: CFITSIO provides functions to read one column of data (or a subset of rows in that column), but FITS tables are saved in rows. Therefore, in order to read N columns CFITSIO must traverse the file from the beginning to the end N times: this is clearly inefficient.
CFITSIO provides a trick to overcome this limitation: since it reads data in chunks of a few KB, many elements from contiguous columns are present at the same time in its cache. CFITSIO is able to tell how many rows fill its cache (using the ffgrsz
function, which I bound to GetOptimalNumberOfRowsForIO
), so I implemented the following strategy:
type TPositions = seq[positions.TAngularPosition]
proc ReadAnglesFromFile(fileName: string): TPositions =
var f = OpenTable(fileName, ReadOnly)
let numOfRows = GetNumberOfRows(f)
newSeq(result, numOfRows)
let rowsPerChunk = GetOptimalNumberOfRowsForIO(f)
var chunkOfData: seq[float]
newSeq(chunkOfData, rowsPerChunk)
var firstRow = 1
var rowsToRead = numOfRows
var done = false
while not done:
if rowsToRead < rowsPerChunk:
setLen(chunkOfData, rowsToRead)
ReadColumnOfFloats(f,
colNum = 3,
firstRow = firstRow,
firstElem = 1,
destArray = chunkOfData)
for idx in 0..len(chunkOfData)-1:
result[firstRow - 1 + idx].colatitude = chunkOfData[idx]
ReadColumnOfFloats(f,
colNum = 4,
firstRow = firstRow,
firstElem = 1,
destArray = chunkOfData)
for idx in 0..len(chunkOfData)-1:
result[firstRow - 1 + idx].longitude = chunkOfData[idx]
firstRow += len(chunkOfData)
rowsToRead -= len(chunkOfData)
if rowsToRead <= 0:
break
CloseFile(f)
(The functions OpenTable
, GetNumberOfRows
, GetOptimalNumberOfRowsForIO
, ReadColumnOfFloats
, and CloseFile
are my bindings to analogous functions in CFITSIO.) The logic of the code should be easy to understand: the number of rows passed to the function ReadColumnOfFloats
(rowsToRead
) is always smaller or equal to the value returned by GetOptimalNumberOfRowsForIO
, so that each call to ReadColumnOfFloats
either reads just one chunk and stores it into the cache, or it reuses all the information already present in the cache without further I/O instructions. In each iteration of the while not done:...
cycle, the function ReadColumnOfFloats
is called twice, first to read the ecliptic latitude (column 3) and then the longitude (column 4). (There is also a column containing times, but you might recall that I am not interested in reading it for the purpose of my analysis.)
To store the arrays I used a seq
type, which is basically a dynamic array. Unfortunately there does not seem to be a realiable way to use array slices in Nimrod. It would have been very useful to rewrite this code:
for idx in 0..len(chunkOfData)-1:
result[firstRow - 1 + idx].colatitude = chunkOfData[idx]
into something like this:
result[firstRow:(firstRow + len(chunkOfData))] = chunkOfData
(this would have allowed Nimrod to easily vectorize such expressions in multicore CPUs). Such features are present in Ada and Fortran, and they are handy in number-crunching programs like this.
The last thing to do is to sew things together and build the main loop of the program.
Building the list of matches
Here is the (abridged) main loop of the program. I am skipping the definitions at the beginning of the file (I simply initialize a few variables like objectPosition
and tolerance
according to the values specified on the command line).
var matchFound = false
var matchFileNames: seq[string] = @[]
for paramIndex in 4..os.ParamCount():
let curFile = os.ParamStr(paramIndex)
echo("Reading file ", curFile)
var angles: seq[TAngularPosition]
try:
angles = ReadAnglesFromFile(curFile)
except EFitsException:
let msg = getCurrentExceptionMsg()
echo "Error reading file " & curFile & ", reason: ", msg
continue
var minPos: positions.TAngularPosition
var minDist = 2 * math.pi
for idx in 0..len(angles)-1:
let dist = positions.AngularDistance(angles[idx], objectPosition)
if dist < minDist:
minPos = angles[idx]
minDist = dist
if minDist < tolerance:
matchFound = true
add(matchFileNames, curFile)
if matchFound:
echo "Matches found:"
for i in 0..len(matchFileNames)-1:
echo " " & matchFileNames[i]
Nimrod: the good parts
Overally my experience with Nimrod was nice. I was able to write this program in one afternoon, which is not bad considering that I had no previous experience with the language and had to implement a set of bindings for CFITSIO. The features that mostly impressed me are the following:
It is extremely easy to create bindings to C libraries. (I wonder how much difficult would be to produce a Nimrod executable which uses many processes communicating via MPI?)
Its syntax is clean and elegant: I found that typing Nimrod code is a very quick process, because you do not have to bother with a number of details (e.g., declaring the type of each variable, putting
{
and}
everywhere, …).Nimrod programs are very fast. I did some quick tests writing short programs in C and Nimrod, and I found that the difference in speed between them is negligible (< 1%).
Nimrod: what I do not like
However, the language has some minor problems too. First of all, Nimrod's documentation is quite poor. At the moment the best introduction to the language is given in two tutorials, which however do not tell everything. There is also a manual, which is however considered still a "draft" and is not as readable as the tutorials.
Given the lack of good documentation, I would have expected Nimrod to follow the "principle of least surprise" in entity naming. Yet there are a number of small things that are different from other languages:
You print to the terminal using
echo
(only shell languages use this verb, while the rest of the world uses something likeprint
orput
). The existence ofecho
is however well explained in the tutorial from its very beginning.You append elements to a dynamic array using
add
. No other language I know uses this verb: the most used solution isappend
, with the exception of C++'spush_back
. This is annoying, as the tutorial does not say a word about this: I had to search in the documentation of thesystem
module to discover the existence ofadd
. (I discovered some days later that this is briefly mentioned in the manual.)
Another thing that I miss in Nimrod is an easy way to use format strings, similar to C's printf
and Python's format
. Apparently, when you want to use such facility you have to bind to the libc's implementation of printf
. This is helpful if you only want to write very basic types (e.g., integers and floating point numbers), since of course printf
is ignorant about Nimrod's datatypes. (Update: I discovered the existence of the strfmt module in the standard library. However, such a fundamental module does not seem to be advertised in the tutorials.)
The compiler shows a couple of weird behaviors too:
Why does the compiler emit error message using parentheses to identify the line number and column, like
fits.nim(54, 2) Error: ...
?The problem is that the majority of the compiler and tools around (e.g., GCC, Clang, Go, Rust, …) use a different format for error messages, namely
filename:line:column: message
. This means that a few handy macros in Emacs (e.g.next-error
) cannot work without some tweaking, as well as some handy scripts I regularly use (e.g. to highlight errors and warnings using different colors). This is a minor problem, but it is nevertheless annoying, expecially considering that Rust and Go follow the more widely used standard. (I voluntarized to add a command-line flag to the compiler for switching to the more standard syntax, but people weren't interested.)Update: Too bad for my scripts, but at least I was able to make Emacs understand this new syntax. Unfortunately, my tweak only works when I run
nimrod
directly: if I callmake
, it doesn't. I've spent too much time trying to fix it, so at the end I gave up. Anyway, I added the following to my.emacs
file:
(setq compilation-error-regexp-alist
(push '("^\\([a-zA-Z0-9\\.]+\\)(\\([0-9]+\\),\\([0-9]+\\))\s\\(.*$\\)"
1 2 3)
compilation-error-regexp-alist))
- Compile error messages might be improved a bit. When I was implementing
ReadColumnOfFloats
I missed one parameter (it has six parameters, and it is a wrapper to a CFITSIO function with nine parameters!). The error message was not very helpful, since it did not point out which parameter I forgot:
fits.nim(339, 22) Error: type mismatch: got \
(TInternalFitsStruct, int, int64, int64, \
float, ptr float, ptr int, ptr int)
but expected one of:
fits.fitsReadDblColumn(filePtr: TInternalFitsStruct, \
colNum: int, firstRow: int64, firstElem: int64, \
numOfElems: int64, nullValue: float, \
destArray: ptr float, anyNull: ptr int, \
status: ptr int): int
Conclusions
My overall impression of Nimrod is very good. The language is extremely easy to use, yet it has some powerful features. The ability to easily create bindings for C libraries is a big plus. It still shows some minor problems, which are probably due either to its early state (no version 1.0 has been release yet) or to the small number of people contributing to it.
How is errors and exception handling resolved in Nimrod? Do they think exceptions are the answer, whereas in Go Language by google they implement errors as a separate function result...
ReplyDeleteI've written a few articles on the problems with exceptions..
I'm not experienced with Nimrod and don't know about the error system in it. Do you have any experience with handling errors or using exceptions? Is it still an intraprocedural goto system (That's what exceptions are)..
Articles I've written on the topic relating to FPC:
http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Suggestion-for-Modern-Pascal-Error-Contracts
http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Improve-or-Ditch-The-Rude-Exceptions
http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Try-Finally-Is-a-GOTO-That-Is-Too-Limited
Hi Lars, your idea about the fact of declaring an error code in a manner that makes it clear to the compiler is extremely interesting. This would indeed make coding a lot easier!
DeleteFrom what I understand, Nim (the new name for Nimrod) uses setjmp to implement exceptions.
I have tried Go and must say I am not impressed with the way errors are handled: having to explicitly check the return value can be boresome in those cases where you have to do complex validation of the user's input.