2012-09-07

Discovering Ada

Almost a year after my last post, I am going to write something more about "exotic" languages. During the last year I was able to investigate the characteristics of a well-known but underused language: Ada.

I was interested in it because I found two references to it in the realm of astrophysics: the first one is a spectral line synthesis code for magnetic stellar atmospheres, the second one is a list of satellites which use software (partly) written in Ada. Also, I had always heard that Ada's type system is exceptionally safe, and I was truly interested in giving it a try.

So here is a list of features of Ada that caught my attention:

  • It is a statically typed language (like C/C++/Fortran/Haskell, unlike Python/Ruby/Scheme). However, unlike C and (to a lesser degree) C++, its type system is strong. This means that the compiler enforces type correctness and will not silently convert e.g. floats to integers.
  • Ada code looks verbose, but it is very readable. (As I understand it, this was one of the driving requirements in the development of the language.)
  • It has a number of interesting features that C and C++ lack, like keyword arguments (called named parameters), nested functions, and some primitive type inference (e.g. you do not have to declare the type of a variable used in a for loop).
  • It allows code to be split into packages (more on this later).
  • Ada's versatile type system allows the programmer to have the compiler perform dimensional checks. So, for example, it will signal an inconsistency in code like "if obj_speed < field_size", provided that obj_speed and field_size have been properly declared. Apparently, this feature was considerably extended in the latest version of the language (Ada 2012).
  • It has native support for multitasking. (As far as I know, Ada, Erlang and Go are the only non-academic languages that were designed from the ground up with this capability.)
  • Ada is compiled to machine code: the reference open-source implementation is GNAT, which is part of GCC (the GNU Compiler Collection).
  • Since GNAT is a tightly integrated component of GCC, it is extremely easy to develop bindings to C/C++ libraries (there is even a tool to do this automatically).
  • Interestingly enough, GNAT is developed by AdaCore, a commercial company which seems to have quite a large user base. It develops both the open-source compiler and a commercial version.
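The first point in the list, about C and C++ silently converting floats to integers, can be illustrated with a small C++ sketch. The function names (double_samples, truncation_demo) are made up for this example. It only shows the kind of implicit conversion that a strongly typed language refuses to perform without an explicit request.

```cpp
#include <cassert>

// Hypothetical helper: expects an integer number of samples.
int double_samples(int n) { return 2 * n; }

int truncation_demo() {
    double measured = 2.9;
    // Implicit double -> int conversion: the fractional part is dropped
    // (2.9 becomes 2) and the compiler is not required to complain.
    return double_samples(measured);
}
```

Calling truncation_demo() returns 4 rather than 5 or 6, because the argument has already been truncated to 2 by the time double_samples runs.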

Is Ada outdated?

Before digging into Ada, I had the idea that it was an outdated language with virtually no users today. But I was wrong on both fronts:

  • Ada was born more or less in the same years as C++: work on Ada began in 1976, while Stroustrup's "C with classes" toy language dates back to 1979.
  • There are a lot of Ada users. It is just that Ada programmers do not seem to work in the contexts I'm used to.

Is Ada verbose?

I do not like verbosity in general. I have always been amazed by the conciseness of languages like Haskell: in my opinion, the shorter a program is, the quicker you're able to find problems in it. And, undoubtedly, Ada is quite verbose. A lot of grammar constructs are not strictly necessary for the compiler to understand what the code should do: e.g. the "is" keyword at the end of a procedure/function declaration.

However, I must admit that more than once I have found source code so condensed that it was difficult to understand how it worked, or why it was not working. Readability is probably as important as conciseness. Ada's designers put a lot of effort into making the language easy to read, even for people who have never learned Ada.

Types, types and types

Ada's strongest advantage is probably its versatile type system. You can define "subtypes" which optionally limit the range of values of a primitive type (e.g. an integer which can hold values between 18 and 28): Ada will automatically add range-checking code. (If you do not limit the range, your subtype works like a typedef in C.)
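To give an idea of what such range checking does for you, here is a minimal C++ sketch of a hand-written equivalent. The names Ranged, Temperature and rejects are hypothetical, and the sketch assumes that throwing an exception on an out-of-range value is an acceptable stand-in for Ada's run-time check.

```cpp
#include <cassert>
#include <stdexcept>

// An integer that can only hold values in [Lo, Hi].
template <int Lo, int Hi>
class Ranged {
    static_assert(Lo <= Hi, "empty range");
public:
    // Every construction goes through the range check.
    Ranged(int v) : value_(check(v)) {}
    int get() const { return value_; }
private:
    static int check(int v) {
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside declared range");
        return v;
    }
    int value_;
};

// Hypothetical type holding values between 18 and 28, as in the text.
using Temperature = Ranged<18, 28>;

// Returns true if v is refused by the range check.
bool rejects(int v) {
    try {
        Temperature t(v);
        (void)t;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With these definitions Temperature(20) is accepted, while Temperature(30) throws. All of this has to be written and maintained by hand here, which is the part a range-limited subtype makes unnecessary.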

However, the most interesting feature is the ability to define new types from primitives. In this case you are not allowed to mix the new type with its primitive, unless you explicitly tell the compiler to allow it. So you can, for example, create three new types distance, time, and speed from the primitive type float: you will not be allowed to combine (e.g. add or multiply) variables of different types, unless you manually override the compiler's checks or redefine operators (as in C++). This makes it possible to check the consistency of measurement units in the code. (Such a feature is possible in C++, but at the expense of writing a lot of boilerplate code: basically, you have to define a class which wraps a float and manually implement every operation you need on it. Ada, on the contrary, already knows how to sum two distance variables, as they behave the same as floats.)
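The C++ boilerplate mentioned above might look like the following sketch. The type names Distance, Time and Speed mirror the example in the text, average_speed is a made-up helper, and only a handful of operators is shown: a real program would need many more.

```cpp
#include <cassert>

// One wrapper per physical quantity, each holding a plain double.
struct Distance { double value; };
struct Time     { double value; };
struct Speed    { double value; };

// Every permitted operation has to be spelled out by hand.
inline Distance operator+(Distance a, Distance b) { return {a.value + b.value}; }
inline Speed    operator/(Distance d, Time t)     { return {d.value / t.value}; }
inline bool     operator<(Speed a, Speed b)       { return a.value < b.value; }

// Hypothetical helper combining the types above.
inline Speed average_speed(Distance d, Time t) { return d / t; }

// An expression such as Distance{1.0} + Time{2.0} does not compile,
// because no operator+ taking a Distance and a Time has been defined.
```

Only the combinations that were explicitly declared are accepted, so comparing a Speed with a Distance is a compile-time error. The price is one operator definition per allowed operation.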

Packages

I am not happy about how C/C++ programs are split into multiple files. You usually separate the classes and functions into different .cpp files, and in each of them you include a .h file which provides the class/function declarations. However, the compiler is unaware of the difference between .h and .cpp files: the difference is relevant only to the C preprocessor.

Compare this with Ada packages. You have to write two files, as in C++: one with extension .ads (the specification file, analogous to .h files) and one with extension .adb (the implementation file, analogous to .cpp files). However, the Ada language allows you to specify which parts of the package are exported and which are meant to be private. This is similar to the concept of private/public methods in C++ classes, but it works at file level. It is very similar to units in Turbo Pascal, and it is much more effective than the C/C++ approach. (Also, it allows the GNAT compiler to recompile outdated dependencies without the need for a Makefile.)

So, how fast is it?

I was particularly intrigued by the fact that the GNAT Ada compiler is integrated into GCC. This allows Ada code to be optimized by the same machinery GCC employs for C/C++/Fortran/ObjC code, and it can in principle guarantee the same performance. I was however puzzled by the results of the Computer Language Benchmarks Game, which clearly showed that on average C++ code required less memory and in some cases much less time to run (here is the GNAT/g++ comparison, and here the GNAT/gcc comparison, which is even worse for GNAT).

So I investigated the reason for this difference a bit. I picked the k-nucleotide example, for which the fastest C code is available at this link. I found that, even if Ada is considerably slower than C, it still ranks third. In my opinion, this indicates that the C program has been dramatically optimized, not that Ada is inefficient per se.

As a side note, I found that C/C++ programs in the Shootout are sometimes so optimized that it is difficult to understand what they are doing: look at this implementation of mandelbrot, especially the function calcrow. Do you know what __builtin_ia32_cmplepd and __builtin_ia32_movmskpd are supposed to do without reading the comments? Given the large number of C/C++ programmers, I bet that it's easier to find a relatively large subset who are good at low-level optimizations and assembly coding: this might explain why C/C++ Shootout programs often perform better than Fortran and Ada.