2010-11-18

The Scheme of Things/1

In one of my previous posts I anticipated that I was going to study some functional programming languages in order to better understand how to write parallel programs for multi-core and multi-CPU architectures. During my (very slow!) quest I re-discovered a old language I used to be quite fond of: Scheme, a LISP-like language which dates back to the 1970s. In this post and the following I shall provide some information about my relationship with this wonderful small language, as well as some insights about how it might be useful for scientific computations.

Music Typesetting and Scheme

My first experience with Scheme dates back to about ten years ago. I am a passionate music lover, and I was looking for a good program to typeset music. I found the marvellous GNU Lilypond, which was well-suited for what I wanted to do at the time. I contributed a lot of pieces to the Mutopia project (mostly scores by Mozart and other classical composers, see e.g. Mozart's String Quartets from Op. 10), and at the same time I became quite good in extending Lilypond using its scripting language. Guess which language it was? Of course Scheme! Lilypond used GNU Guile, a Scheme interpreter sponsored by the Free Software Foundation (yes, the guys behind Gnu Emacs and GCC). The intention of the Guile coders was to develop an interpreter that could be used by other projects to provide a "plugin" language usable for extending large applications. (Unfortunately, the project did not suceed in gaining enough popularity and it has developed at a quite slow pace: just have a look at these dates and version numbers.)

Partial Differential Equations and Scheme

During 2001-2002 I worked on my master thesis. The main objective of the work was to study the solution of the heat equation given some well-defined boundary conditions, in order to better understand some properties of the reference loads of the Planck/LFI instrument. One of the tools I developed for my study was a small C program which used the Crank-Nicholson implicit method to solve the one-dimensional heat equation numerically.

Since the objective of the program was to simulate how the temperature of a body changes with time, the output was a quite large set of numbers: for each time step, there were tens of temperatures (sampled on a regular grid) that the program printed. According to the type of analysis I was doing, I had to throw away irrelevent temperatures and keep the ones I was interested: at the beginning I was using GNU Awk for the job, but this was not the optimal solution. At some time I realized that what I needed was some way to attach "plugins" to my program: each plugin would have been a small program which would have driven the program and instructed it to print only the information needed by the ongoing analysis.

This of course called for an embedded interpreter. At the time the one I knew best was of course Guile (see above), so I decided to implement Guile bindings in my program. All worked like a charm: once the C program was properly debugged and produce reasonable results, I could start writing Scheme snippets to perform every analysis I wanted and attach them to the program without the need to recompile it. It was like a dream! (Actually, one of the appendices of my thesis contains the source code for some of those Scheme plugins.)

Thermal Modeling and Scheme

After having defended my thesis, I continued studying the thermal performances of Planck. We used a number of closed-source thermal modeling tools based on Fortran 77 for doing the most complex analysis. Such tools required the user to create huge text files (called model files) containing details about how the object under study was to be discretized and what were its characteristics (e.g. density, specific heat…) at each point. Such files were interwoven with Fortran 77 code which calculated and extracted the information required for the analysis (much like Scheme plugins did for my program).

Managing such files was a pain, so I created a couple of programs to help me in the task. The first program, Thermal Scheme, looked for a specific signature in the contents of model files: every text between #{ and #} was interpreted as Scheme code, which was run and its result replaced the Scheme code in the model file. Therefore, giving a text file like

Hello #{(display "world")#}!

Thermal Scheme would translate it into

Hello world!

Of course, Thermal Scheme provided a number of Scheme functions to easily define and discretize complex structures.

Thermal Scheme was able to act like a "data provider": when some command-line flag was used, the program recorded every Scheme definition and built a text file containing a 3-D representation of the object. A companion program, Thermal Viewer, was able to connect to "Thermal Scheme", get this text file and create an interactive 3-D view of the object using OpenGL.

Things Go Awry!

After a few months working with Thermal Scheme, I was supposed to move to the Rutherford Appleton Laboratory (Oxford, UK) to continue working on Planck thermal analyses. I even arranged a demo of the capabilities of Thermal Scheme! But then in Spring 2003 the Italian Government obliged me to do ten months of civilian service, so I was forced to decline that job offer. Ten months after (May 2004) my duties were over and I got a temporary position at the National Institute of Astrophysics, where my first duty was to build a medium-sized software tool for Planck/LFI using RSI IDL (a Fortran-like language for data analysis). So I started studying IDL and completely forgot about Scheme… till a few weeks ago!

In one of my next posts I shall tell you how Scheme evolved in the meantimes, and what I discovered about the performances of some of the currently developed Scheme interpreters/compilers. Stay tuned!

2010-11-08

Authenticating NIS users with PAM - Quirks

Last friday I had a large headache in trying to understand why the account of my friend Simona was not allowing her to authenticate on the computers of our institute. The rule here is to manage all the authentications using a NIS server, and keeping the home directories distributed among a number of servers. It seemed that Simona was not able to authenticate from any of the computers used by our team, but the sysadmin had no clues why this was happening.

The error messages written by our Debian machine (in /var/log/auth.log) clearly indicated that the computer was able to recognize Simona as one of its users, but it did not accepted her password. To understand why this was happening, I had to browse the documentation (and the source code!) of PAM. A really interesting read, but this did not explain why every user (including myself) but Simona was able to authenticate.

After hours of fruitless attempts, I realized there was something wrong when I used ypmatch passwd to compare Simona's key in the NIS map with mine: her encrypted password was much longer than mine. This hinted the nature of the problem: a mismatch in the authentication mechanism.

Remote authentication works by encripting password attempts and comparing them with the encripted password memorized by the server. (The comparison is not done on the passwords directly as they need to travel through the web, making this process insecure.) Of course, for the comparison to be effective the encription algorithm must be the same between the attempt and the password stored in the server!

Some time ago the sysadmin upgraded the operating system of the NIS server, and this apparently changed the default encription algorithm for new users to be Blowfish instead of MD5. (Note that old users like me kept the old MD5 encrypted password.) But our computers (running Debian Lenny 5.0) implemented the plain PAM module, which is only able to use MD5. Therefore, the comparison was made between the MD5-encrypted password entered by Simona (and encripted by our computer) and the Blowfish-encrypted password stored in the NIS server: this obviously failed! This also explained why I was still able to authenticate even after the sysadmin's upgrade.

The solution was to install the Debian package libpam_unix2, which provides a PAM module which recognizes both MD5 and Blowfish, and replace every occurrency of pam_unix.so with pam_unix2.so in the PAM configuration files (in /etc/pam.d/common-*). Phew!

2010-11-02

Clojure and FP - First impressions

This blog entry is to report my achievements with Clojure and Functional Programming (see my previous post).

What I do not like

The first thing that one faces when trying to use Clojure is the difficulty in installing it. The downloads page only lists a few zip files, without saying anything about how to use them. (Sure, there is a Readme file, but I would have like to know something more before downloading anything.) In comparison, the Downloads page of Scala is much better in shape, and a number of pre-built packages are provided. Moreover, Scala is included in Macports, while Clojure is not. Once the ZIP file has been uncompressed, the Readme file is extremely laconic: it just reports the command required to run/build Clojure. Since the REPL has no editing facilities (i.e. you cannot use Backspace/delete to modify what you have written, and there is no history), I would have expected some reference to jLine or rlwrap as these can really save you the day.

In general, Clojure shows its young age in a number of other spots. The documentation is quite sparse, and you often have to look at the source code of some function in order to understand its behavior. Many tools I regularly use for source code (e.g. Vim, Source-highlight and others) do not have native support for Clojure (although you can usually find something in the Web - I find VimClojure one of the best Vim plugins I have ever seen, the "rainbow parentheses" mode is incredibly useful!).

What I like

Being a LISP dialect, Clojure can be extremely elegant. Here is some code I wrote to solve problem 34 from Project Euler.

A few things worth to note:

  • Clojure implements a large number of high-level data structures, like dictionaries. I used the latters in fast-fact, a dictionary whose keys are the 10 digits and whose values are the corresponding factorials.
  • It is really easy to call Java functions like toString: just prepend them with a dot. No need to import modules to do this!
  • Although Clojure is a dynamically typed language (like e.g. Python), it allows to specify the type of the input parameters for functions and let bindings. I have verified that this allows to dramatically increase the speed of the code (3x or 4x times).
  • Like Python, Clojure allows to use docstrings. They can be accessed from the REPL through the doc function.

Python vs. Clojure

Since I am an avid Python user, I immediately rewrote the program in Python:

I tried to follow the same logic used in the Clojure program, i.e. using a dictionary for the first 10 factorials and getting the list of digits for a number by first converting it to a string (an alternative would be to repeatedly apply division and modulus 10). Then, using the Clojure time function and the IPython %time command, I was able to record the times used by each program. In order to get sounder values, I repeated each measurement till the timing did not change significatively any more.

Results are shown in the plot on the right. They look quite impressive: as shown by the following graph, type hints allow Clojure to be more than three times faster than Python. Even not using type hints would have allowed Clojure to be approximately 20% faster than Python.

Preliminary conclusions

Despite its young age and a few rough edges, my overall impression is extremely positive: Clojure is really nice to use and the number of features it provides are extremely interesting. Its speed is also remarkable.