Carlos Fenollosa — Blog

Thoughts on science and tips for researchers who use computers

Which is the best programming language?

September 23, 2011 — Carlos Fenollosa

This classic question from beginners who start coding their own tools for research has only one correct answer: it depends.

If there is no language which is clearly better than the others, why do I have this very simple table in my Unix section?

  1. Bash, awk are your first choice
  2. PHP if you require more power
  3. C only if you know why
  4. Use Java & Eclipse

Don't get me wrong: this table has been compiled from quite a few years of experience, and there are huge assumptions behind it. The first one is that you still don't know what language you should use. If there is no doubt, either because the project has specific requirements or because there is a language designed specifically for that task (e.g. CLIPS for declarative programming), just use that language. Otherwise, you need to ask yourself some questions.

I'll start by enumerating the most popular language choices and some of their features.

Scripting languages

Scripting languages are good for small or medium projects, because they fit very well with the line of thought of the programmer. This means you can program while you think, which isn't the best for cleanliness, but it gets the work done quickly.

  • bash is always your first choice. You already know how to run stuff in the command line, right? So this is basically the same. bash can handle functions and arrays, but that's pretty much all it can do. However, that's usually good enough for small routines, and you can always call other Unix tools. It also avoids the overhead of running another binary (perl, php) as it is already in memory.
  • perl is great for parsing text, but slow for anything else. If you need to parse text and do math, use php, which has a faster math engine. perl's object support is also rudimentary. However, there are very good scientific libraries for perl, so you might be forced to use this language anyway.
  • php has nice libraries to connect to databases and in general do web stuff. It is also object oriented, so it is a suitable candidate for small-medium projects which can benefit from object orientation but don't need all the infrastructure from java or C++. In general, unless you are tied to perl, php is a better choice.
  • python is, well, another scripting language. It's way better than perl, and functionally similar to php, so you might want to use it if you like its clean syntax or need to call other python libraries.
    Edit: after having used python for a long time, it is the first language I'd recommend for most use cases
  • ruby is so painfully slow that you should avoid it at all costs. I am including it here only to warn other people against using it.
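To give a flavor of what these small scripts look like, here is a minimal sketch in python (following the edit above) of a typical parse-text-and-do-math job: averaging one column of a whitespace-separated file. The function name column_mean, the sample data and the decision to skip "#" comment lines are my own assumptions for illustration.

```python
import statistics


def column_mean(text, column):
    """Average one whitespace-separated column, skipping blanks and # comments."""
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # ignore empty lines and comment lines
        fields = stripped.split()
        values.append(float(fields[column]))
    return statistics.mean(values)


# Made-up sample data: one label and one measurement per line
sample = """# sample value
a 2.0
b 4.0
c 6.0
"""
print(column_mean(sample, 1))
```

This is the kind of job awk does in one line. A script starts to pay off once you need error handling or a second pass over the data.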

Compiled languages

Once you start compiling code, things get complicated. However, the results are usually great, fast, and very maintainable. Let's discuss the alternatives.

  • java: why java first? Because it's the most appropriate. It has great development tools (Eclipse), it checks a lot of stuff at compile time, it does not require the programmer to use pointers (it uses them internally, but transparently), and in general it is a modern, object-oriented language which doesn't require legacy stuff like headers. Yes, it is a bit slower than pure C, but the latest versions of the java virtual machine compile to machine code at runtime and achieve great performance. Most computer scientists have mastered it and it is widely used. It is versatile: it can be used for anything from simple routines to web pages with JSP to CRMs. Yes, I like java.
  • C is the mother of all programming languages, but this does not mean that it's the best one. It's old, doesn't have objects, and for every byte optimization which earns 1 second of execution time, the programmer needs to waste ten minutes. Optimizations should be done at the compiler level, not the code level. However, C has great compilers, from the good-enough gcc to the awesome icc.
    My recommendation is that you use it only if you know what you're doing. It's awful to parse strings in C, it lacks many scientific libraries compared to perl or java (except math functions, but that's what R is for), and segmentation faults can make you waste several days looking into the code because you declared a variable incorrectly and ended up with an invalid memory access.
    Some might argue that C is as good as the programmer is, but honestly, it makes good programmers waste a lot of time because of small issues.
  • C++ is the alternative if you need to use C in an object-oriented environment. The compiler is also able to run more checks at compile time, so you'll waste less time, but I'd go for java anyway. There is no reason to choose C++ a priori other than execution speed.
  • objective-C: Apple users are sometimes tempted to write obj-C code, because of the excellent development tools on a Mac, but keep in mind that there is probably nobody else who can look at that code afterwards and understand it, because almost nobody uses obj-C. So I'd suggest not using it unless you're in a hardcore Mac environment or are planning to develop a GUI for a Mac afterwards.
  • fortran: only two kinds of people use it, physicists and the poor fellows who have to maintain their code afterwards. It was designed for 1950s computers, which means that using it nowadays is like fitting steering wheels from Henry Ford's day on today's cars. It is easier to understand f2c generated code than the original one. There is not a single reason to use fortran. If you need raw speed use C. If you want to write unmaintainable code, well, use obfuscated perl.

Choosing a language

Now for the difficult task of choosing a language. If you look again at the four items on top of this post, having read the language descriptions above, you might start to see what's going on. This is basically a matter of choosing the right language for your specific task, with some decisions.

What are my time constraints? Beginner programmers often forget that, for homemade software, the total time constraint is the time you spend programming plus the time you spend running it. If the routine is expected to take 10 minutes, don't waste two hours writing a C program with pointers; write a simple script.
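The arithmetic is simple enough to sketch. The snippet below adds programming time to running time and works out after how many runs the slower-to-write, faster-to-run version starts to pay off. All the numbers are hypothetical (a script that takes 15 minutes to write and 10 to run, a C program that takes 120 minutes to write and 1 to run), and the function names are mine.

```python
def total_minutes(dev_minutes, run_minutes, runs=1):
    """Total cost of a homemade tool: time to write it plus time to run it."""
    return dev_minutes + run_minutes * runs


def break_even_runs(quick, optimized):
    """Smallest number of runs at which the optimized version is strictly cheaper.

    Each argument is a (dev_minutes, run_minutes) pair. Returns None when the
    optimized version never catches up.
    """
    extra_dev = optimized[0] - quick[0]
    saved_per_run = quick[1] - optimized[1]
    if saved_per_run <= 0:
        return None  # the "optimized" version is not faster to run
    if extra_dev < 0:
        return 1  # cheaper to write and faster to run: wins from the first run
    return int(extra_dev // saved_per_run) + 1


# Hypothetical figures, in minutes: (time to write, time per run)
script = (15, 10)
c_program = (120, 1)

print(total_minutes(*script))               # one run of the script
print(total_minutes(*c_program))            # one run of the C program
print(break_even_runs(script, c_program))   # runs needed before C pays off
```

With these made-up figures the script wins easily for a single run, and the C version only becomes cheaper from the twelfth run onwards.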

Can I solve it with a simple script? If the answer is yes, use bash. It's a great scripting language, and you can build on top of other Unix tools, like awk, sed, etc.

However, keep in mind that every time you call an external program, the system needs to fork() and, for large loops, this can be a huge overhead. Be rational, and think again of the execution time. Instead of launching sed 10,000 times to parse lines, it might be better to write a php script: php is more powerful than bash, it won't need to fork(), and the code will probably be simpler.
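Here is a rough sketch of that overhead, written in python rather than php. One function does the substitution inside the interpreter; the other spawns a child process per line, the way a shell loop around sed would. As an assumption to keep the example runnable on machines without sed, the child process is a python one-liner launched through sys.executable; the function names and the sample lines are also made up.

```python
import re
import subprocess
import sys
import time


def replace_in_process(lines, pattern, repl):
    """Apply one compiled regex to every line without leaving the interpreter."""
    regex = re.compile(pattern)
    return [regex.sub(repl, line) for line in lines]


def replace_with_subprocess(lines, pattern, repl):
    """Spawn one child process per line, like calling sed inside a shell loop."""
    # Stand-in for sed (an assumption of this sketch): a tiny python one-liner
    child = "import re, sys; print(re.sub(sys.argv[1], sys.argv[2], sys.argv[3]))"
    out = []
    for line in lines:
        result = subprocess.run(
            [sys.executable, "-c", child, pattern, repl, line],
            capture_output=True, text=True, check=True,
        )
        out.append(result.stdout.rstrip("\n"))
    return out


lines = [f"foo {i}" for i in range(20)]

start = time.perf_counter()
fast = replace_in_process(lines, "foo", "bar")
in_process_seconds = time.perf_counter() - start

start = time.perf_counter()
slow = replace_with_subprocess(lines, "foo", "bar")
subprocess_seconds = time.perf_counter() - start

print(fast == slow)
print(f"in process: {in_process_seconds:.4f}s, one process per line: {subprocess_seconds:.4f}s")
```

Both versions produce the same lines. The timings depend on your machine, so I am not quoting any, but the per-line process cost grows with every iteration of the loop.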

Will I need to maintain it or reuse the code? Will the code grow? If you think this code can be reused as a library, or integrated into other modules, think of making it into an actual C library or java class. Running scripts within scripts within a big project is generally a bad idea. And please keep in mind that, in a research environment, at some point another person will need to look at your code, so besides writing clean and understandable code, try not to use obscure languages or tools which only you know of.
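The same idea applies if you stay in a scripting language. As a sketch (using a python module instead of the C library or java class mentioned above), keep the real work in a function that other code can import, and make the command-line part a thin wrapper around it. The summarize task and all the names here are hypothetical.

```python
import sys


def summarize(text):
    """Return simple statistics about a piece of text (hypothetical example task)."""
    words = text.split()
    return {"chars": len(text), "words": len(words)}


def main(args):
    """Thin command-line wrapper: all the real work stays in summarize()."""
    for arg in args:
        stats = summarize(arg)
        print(f"{stats['words']} words, {stats['chars']} chars: {arg!r}")


if __name__ == "__main__":
    # Only runs when the file is executed directly, not when it is imported
    main(sys.argv[1:])
```

Another module can then import summarize and call it directly, instead of launching this file as a script and parsing its output.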

Do I need to achieve the maximum speed and/or optimizations? Keeping in mind that the latest versions of the java virtual machine are pretty fast, yes, the winner here is C. But we're talking about software which can take you two weeks to code, and which would take months to run if written in perl, but only takes three hours when coded in C. When this happens, choose C.

Will it need to run on different platforms? java and the scripting languages are the only ones which guarantee perfect execution in every environment: Windows, Mac, Linux, Solaris, BSD and others. C can be compiled for different architectures, but it's sometimes hard to replace mmap calls on Windows or to compile against different versions of the libc in different flavors of Linux.

Summary

Let's review the four initial points again.

  1. bash is a great initial choice for small projects which will take about 20 minutes to run, when you don't want to waste three hours programming them.
  2. php is appropriate for medium projects, which use objects, parse text and do math. perl is another good choice at this point.
  3. C is better left for experts or people who need hardcore optimizations. The rest of us will leave optimizations for the compiler/interpreter and just try to write good code which runs in O(n) if possible.
  4. java is the king of tools and libraries, multi-platform, scales great for big projects, is surprisingly fast and very friendly to novices. Its only drawback is the need for a java virtual machine, but hey, if you use perl you will need its libraries installed, too.
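As an illustration of what "runs in O(n)" means in point 3, here is a minimal sketch in python of the same check written twice: once comparing every pair of items, once remembering what has already been seen. The task (finding duplicates) and the function names are my own choice, and the linear version assumes the items are hashable.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember seen items in a set: one pass, O(n) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


samples = ["s1", "s2", "s3", "s2"]
print(has_duplicates_quadratic(samples), has_duplicates_linear(samples))
```

Both give the same answer. The difference only shows with large inputs, where picking the right algorithm matters far more than any byte-level trick.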

In the end, everyone has their preferred languages, which is fine. It is far more important to write good code than it is to choose the language which fits best for a task. However, failing to foresee the importance of a math routine and writing it in perl can lead to the whole research group wasting time until somebody else writes it in C and makes it 1000x faster. Yep, true story. So choose wisely.

Tags: programming
