2012-11-09

What Ada and D taught me about C++

In the last few weeks I needed to write some C++ code and immediately felt a strong dislike for the language. This surprised me a bit, since I have used C++ for almost 20 years. (My first C++ compiler was Borland Turbo C++ 3.0 for Windows.)

This reaction was probably triggered by two facts: (1) reading Coders at Work revealed that C++ is strongly disliked by many language designers, and (2) in the past two years I have learned many new languages whose features have often made me think, "Oh, this approach is way better than C++'s!"

So I began looking for a viable alternative to C++ for my "big" projects, i.e., projects that are going to be more than a few hundred SLOC and would therefore benefit from a statically typed language, or projects that need to be really fast and therefore need a compiled language.

Two interesting languages that satisfy both requirements are Ada and D. In the next paragraphs I'll show what studying Ada and D taught me about C++'s limitations.

C++ and its compatibility with C

I've realized that perhaps the biggest problem with C++ is Stroustrup's decision to preserve compatibility with C as much as possible. Although this decision probably explains part of the success of C++, it has led programmers to accept without question limitations and strange features that other modern languages do not have.

The C language was invented "between 1969 and 1973", as Wikipedia says. This means that it is about as old as Wirth's Pascal, and some of its characteristics are showing their age. Let's see a few examples.

Separation between the interface and the implementation of classes

Consider the declaration of this C++ class (file foo.hpp):

// File foo.hpp
#ifndef FOO_HPP_INCLUDED
#define FOO_HPP_INCLUDED

class Foo {
public:
    int value;
    
    Foo(int aValue) : value(aValue) {}
    void doSomethingFancy();
};

#endif

Any code that needs to use the Foo class should include this header file using #include "foo.hpp". Unless the implementation of Foo::doSomethingFancy is only a few lines long, it should go in a separate .cpp file (foo.cpp):

// File foo.cpp
#include "foo.hpp"

void Foo::doSomethingFancy()
{
    // Do something fancy!
}

What is the problem with this approach? Let's reimplement class Foo in Python:

class Foo:
    value = 0

    def __init__(self, val):
        self.value = val
    
    def getValue(self):
        return self.value
    
    def doSomethingFancy(self):
        ... # Do something fancy!

As you can see, in Python you only need one file, which defines both the interface and the implementation of the class. In C++, you need both the header (interface) and the .cpp file (implementation), so you have to keep the two files in sync. Moreover, when you look for the implementation of a function, you cannot know in advance whether it has been inlined in the .hpp file or lives in the .cpp file.

The problem is that #include is handled by the preprocessor (the /usr/bin/cpp executable, a C heritage), and the C++ compiler is, in principle, completely oblivious to any inclusion when it parses the files. This leads to the well-known fact that if, after having compiled your program, you modify foo.hpp without recompiling all the files that depend on it, the compiler will not notice and strange things will happen. You must work around this by relying on other software (see the section of the GNU Make manual about automatic prerequisites).

Of course, this example is not completely fair, as Python is interpreted while C++ is compiled. But consider that the guys who designed D (again, a compiled language) were able to devise a module mechanism similar to Python's, at the expense of breaking compatibility with C/C++, of course. Here is file foo.d:

// Written in the D language

class Foo {
public:
    int value;
    this(int aValue) { value = aValue; }
    void doSomethingFancy()
    {
        // Do something fancy!
    }
}

To use this class in a program, put import foo; at the beginning of your source code. If the test program is testfoo.d, you can build the program using the command dmd -oftestfoo testfoo.d foo.d. This is much like C++, where you would write g++ -o testfoo testfoo.cpp foo.cpp, but in this case you do not need any header file at all! (D does allow .di files, which are similar to C++ header files, but their usage is limited, as explained in this forum post. Moreover, .di files can be generated automatically by the compiler.)

What is the situation with Ada? Its approach is somewhere in the middle between C++ and D: modules (in Ada terminology, packages) are split into two files, one (with extension .ads) providing the interface, the other (with extension .adb) providing the implementation. But, unlike C++, Ada compilers are able to decide automatically which packages are up to date and which ones need to be rebuilt, without resorting to any GNU Make magic. (This feature was present in Borland Turbo Pascal's units as well: this is one of the reasons why I am still in love with Pascal.) Moreover, GNAT includes a handy tool, gnatstub, which automatically generates an .adb file from an .ads file. (This is the opposite of what D does with .di files, but it fits the way programs are usually developed in Ada: first the interface, then the implementation.)

Multiple declaration of variables

In section 4.9.2 of The C++ Programming Language (third edition), Stroustrup warns the reader about a potentially confusing way of declaring variables. Consider this example (taken from Stroustrup's book):

int* p, y;  // int* p; int y;

This is potentially confusing, because it is not clear whether y is a pointer to int like p or not (it is not: the * binds to p only). Stroustrup says that such constructs should be avoided. But then, one might ask, why are they accepted by the C++ standard? The answer is simple: to preserve compatibility with C.

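On the other hand, neither D nor Ada allows this ambiguity. In D the type applies to every name in the declaration (int* p, y; declares two pointers, and mixing different types in one declaration is an error), and in Ada all the names listed in a single declaration share the same type.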

Implicit conversion between 0 and NULL

A particularly nasty problem with C++ is the use of NULL. Consider what happens if you want to initialize a string to "false", but forget the double quotes:

#include <iostream>
#include <string>

int main(void)
{
    std::string s = false; // What the programmer meant is "false";
    std::cout << s << std::endl;
    return 0;
}

(This is a real example; see this thread on Stack Overflow.)

Compiling the program using g++ 4.4.6 does not produce any warning, yet the program crashes:

$ g++ -o stringtest stringtest.cpp
$ ./stringtest
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid
Aborted (core dumped)

(Note however that g++ 4.7.2 correctly produces the following warning: "warning: converting ‘false’ to pointer type for argument 1 of ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ [-Wconversion-null]". Although it suffers from GCC's usual verbosity, at least the compiler is now able to spot the problem.)

The root cause is that the std::string type is not native to C++ but provided by a library. The only string-like type known to the raw C++ language is a null-terminated array of char. Therefore, std::string provides a constructor that implicitly converts a const char * into a std::string. Unfortunately, false is equivalent to 0, which is in turn a valid null pointer constant, so it is silently converted to a null const char * pointer. But std::string cannot be initialized from a null pointer, hence the std::logic_error exception.

Since D has native string types, this kind of error is less likely to occur. Consider a straightforward translation of the C++ code above into D:

import std.stdio;

void main()
{
    string s = false;
    writeln(s);
}

Compiling it using DMD 2.060 will produce the following error message:

$ dmd stringtest.d
stringtest.d(5): Error: cannot implicitly convert expression (false) of type bool to string

Regarding Ada, this kind of error is impossible, as the language avoids almost every implicit type conversion. This can be annoying sometimes, but it makes me feel confident about what I'm writing. (It is often said that if your Ada program compiles without errors, then it is probably correct.)