Parallel Programming: Terrible Tooling

In a previous life, I worked on a UNIX kernel on a team of a few tens of developers. This smallish team size resulted in a very smallish set of development tools. To see why, imagine a tool that required one developer one year to create, and that would provide a 1% improvement in everyone’s productivity. Given 50 developers, it would take two years to recoup the time spent developing this tool. In contrast, consider the Linux kernel, which usually has thousands of developers. Assuming 1,000 developers, it would take less than six weeks to recoup the time spent developing this tool.
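To make the arithmetic explicit, here is a minimal sketch of the payback calculation in C, using the team sizes from the example above (the 1% figure and the one-developer-year cost are the example's assumptions, not measurements):

```c
/* Payback time for a hypothetical tool costing one developer-year
 * and saving 1% of every developer's time. */
#include <stdio.h>

int main(void)
{
    const double tool_cost_dev_years = 1.0;   /* one developer for one year */
    const double productivity_gain   = 0.01;  /* 1% of each developer's time */
    const int team_sizes[] = { 50, 1000 };

    for (int i = 0; i < 2; i++) {
        int n = team_sizes[i];

        /* Developer-years saved per calendar year: n * gain.
         * Payback time is cost divided by the savings rate. */
        double payback_years = tool_cost_dev_years / (n * productivity_gain);

        printf("%4d developers: payback in %.1f years (about %.1f weeks)\n",
               n, payback_years, payback_years * 52.0);
    }
    return 0;
}
```

This prints a payback of two years for the 50-developer team and a bit over five weeks for the 1,000-developer case, matching the figures above.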

This example does much to explain the large quantity of tooling available for the Linux kernel. For example, the UNIX kernel I worked on did not have anything like lockdep, powertop, latencytop, or perf for in-kernel use, nor anything like valgrind or perf for user-mode use. Given the relatively small number of developers on that project, it almost always made more sense to find and fix problems manually.

It is important to note that this effect is due to the number of Linux-kernel developers, not to the FOSS nature of the Linux kernel — except insofar as the FOSS nature of the Linux kernel has enabled it to accumulate a large number of developers.

So what does this have to do with the perceived difficulty of parallel programming?


6 Responses to Parallel Programming: Terrible Tooling

  1. Anonymous says:

    Where are the tools

    In the good old days, programming was left to the programming language (C) and parallelism to the OS (UNIX).

    I do not think there were many languages available that enabled parallelism in the language itself. This is caused by the programming paradigms on which those languages are based (structured, functional). The paradigm most suited to parallel programming (dataflow programming) lends itself better to graphical programming than to text, and the important implementations of languages based on this paradigm number exactly one: LabVIEW.

    But we are going to need a language that does more than ‘parallel’: we need to be able to specify the real-time behavior *in the language*, and we are going to need to program *redundancy* in the language as well.

    This need for parallel, real-time, redundant systems arises from the fact that more code goes into embedded devices than into typical office systems.

    And the language to support all of this still needs to be invented.

    • paulmckrcu says:

      Those who fail to learn from history…

      There were quite a few large parallel programs written in the good old days, so I do not believe that it is accurate to say that parallelism was left to the OS. And parallelism is finally being added to C and C++, and is being refined in Java and in quite a few other languages. A few decades late, perhaps, but hopefully better late than never.

      I am not convinced by your baldly asserted set of requirements. You might well be correct, but as noted earlier, history does not appear to corroborate your assertion that a new language is the answer.

      Maybe things will be different this time. But if so, the burden of proof lies squarely on your shoulders, not mine.

      • Anonymous says:

        Re: Those who fail to learn from history…

        Well, I did not mean to say no parallel programs have been written in the past. Of course not.

        But few languages support parallel programming natively. Generally, calls are made to the OS to support parallelism. Consider spawning processes under UNIX/C, or under an embedded RTOS in C. In all cases you use library functions that are not native to the language. I admit that preemptive multitasking is a special case of parallelism, and that languages do exist that support parallelism.

        A dataflow language like LabVIEW does this intrinsically (a rough textual analogue is sketched after this list) by:
        – polymorphic operators allowing calculations on whole arrays at a time, which can be done in parallel
        – scheduling functions (‘vi’s) to execute when their input data becomes available, so scheduling becomes automatic, and so does parallelism
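
        For readers who prefer text, a rough C/OpenMP analogue of the array-at-a-time point might look like the sketch below. OpenMP is a hint-based approach rather than a dataflow language, and scale_array is a made-up name, so this is only an illustration:

        ```c
        /* Rough analogue only: an array-at-a-time operation that the
         * implementation is free to run in parallel.  scale_array is a
         * made-up name; OpenMP stands in for a polymorphic dataflow operator. */
        #include <stddef.h>

        void scale_array(double *out, const double *in, double factor, size_t n)
        {
        #pragma omp parallel for                /* the elements are independent */
                for (size_t i = 0; i < n; i++)
                        out[i] = in[i] * factor;  /* each element computed independently */
        }
        ```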

        However, in my experience real-time programming is not well supported by any language I know (well, maybe VHDL would be).

        If you know of a language that does it all natively, I am certainly interested.

        Ferry

        • paulmckrcu says:

          Languages supporting parallelism

          OK, glad to hear that you agree that parallel programs have in fact been written. 🙂

          Here are but a few examples of languages natively supporting parallelism:

          SQL, as in database kernels (parallelism hidden from developer)
          CodeSourcery VSIPL++, Matlab*p, and Rapidmind (parallelism largely hidden from developer)
          OpenMP (parallelism hints supplied by developer, but parallelism controlled by implementation)
          Ada (task/rendezvous model—certainly not my favorite, but an example nonetheless)
          Java (though you would probably argue that task creation is in task libraries, and I would then ask why that should matter to anyone)
          Cilk (spawn/sync model, though I give these guys a big black mark for promotion of the flawed Fibonacci benchmark)
          C/C++ (pthreads, but upcoming versions of these standards are integrating concurrency as a first-class portion of the language)

          I believe that you will find that spawning processes involves system calls regardless. These system calls might be hidden from the developer (as in SQL), wrapped in a language keyword or hint (as in OpenMP and Cilk), or implemented in a library (as in pthreads, Java, and others).
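
          As a purely illustrative sketch of two of these styles, here is a made-up array-summing example in C, written once with explicit pthreads library calls and once as an OpenMP hint. It assumes compilation with something like gcc -fopenmp -pthread, and all the names are invented for this example:

          ```c
          /* Illustrative only: the same array sum written in "library" style
           * (pthreads) and in "hint" style (OpenMP).  All names are made up. */
          #include <pthread.h>
          #include <stddef.h>
          #include <stdio.h>

          #define N    1000000
          #define NTHR 4

          static double data[N];

          struct chunk {
                  size_t lo, hi;
                  double partial;
          };

          static void *sum_chunk(void *arg)   /* library style: an ordinary function */
          {
                  struct chunk *c = arg;

                  c->partial = 0.0;
                  for (size_t i = c->lo; i < c->hi; i++)
                          c->partial += data[i];
                  return NULL;
          }

          int main(void)
          {
                  pthread_t tid[NTHR];
                  struct chunk c[NTHR];
                  double sum = 0.0;

                  for (size_t i = 0; i < N; i++)
                          data[i] = 1.0;

                  /* Library style: thread creation is an explicit function call. */
                  for (int t = 0; t < NTHR; t++) {
                          c[t].lo = (size_t)t * N / NTHR;
                          c[t].hi = (size_t)(t + 1) * N / NTHR;
                          pthread_create(&tid[t], NULL, sum_chunk, &c[t]);
                  }
                  for (int t = 0; t < NTHR; t++) {
                          pthread_join(tid[t], NULL);
                          sum += c[t].partial;
                  }
                  printf("pthreads sum: %f\n", sum);

                  /* Hint style: the same work expressed as a single OpenMP pragma. */
                  double omp_sum = 0.0;
          #pragma omp parallel for reduction(+:omp_sum)
                  for (size_t i = 0; i < N; i++)
                          omp_sum += data[i];
                  printf("OpenMP sum:   %f\n", omp_sum);

                  return 0;
          }
          ```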

          Some language support is of course absolutely required. But there are many ways to provide support for parallelism, and it is not yet clear which is best, or even that a single approach can be best for all situations.

          Given your interest in array-at-a-time calculations, I recommend you look at OpenMP, CodeSourcery, RapidMind, and Matlab*p. Given your interest in graphic representation of parallelism, I recommend you look at GEDAE. And given that more than 200 different parallel languages/environments were put forward in the 1990s alone, this list doesn’t even scratch the surface.

          • Anonymous says:

            Re: Languages supporting parallelism

            I have never tried to implement a Fourier Transform in SQL, or a user interface. As far as I know, being a language doesn’t make it a programming language.

            I know that Ada was supposed to do everything, including parallelism, but I haven’t had the chance to try it yet. But I doubt that it does what I am trying to explain.

            And yes, I like graphical programming, but that is not the point. The point is that a dataflow language executes things in parallel by nature. Of course there would be a scheduler working behind the scenes, whether from the OS or from the runtime, it does not matter. In the real world it could be necessary to have multiple tasks, and to have them rendezvous, but this is a matter of synchronizing parallel code, not of parallelism.

            The above all exist, in one form or another. But how would you design a hard real-time program, with concurrent code, and know, or even prove, before running it that it will not miss deadlines?

            I still think there is a lack of methodology and tools to do these things *by design*.

            Ferry

            • paulmckrcu says:

              Performance, Productivity, Generality: Pick Any Two

              SQL is most certainly not intended to be a general-purpose programming language, and its specialized nature is its great strength from both a productivity and a parallel-performance viewpoint. I would by no means recommend using it for either Fourier Transforms or graphical user interfaces. Of course, even within its area of specialty, SQL is by no means perfect, as this Verity Stob article points out in that inimitable Verity Stob fashion.

              I believe we can agree that the best software environment for a given person is the environment that enables that person to get the job done quickly, easily, and cost-effectively. We should therefore expect software environments to be as varied as people are: one size will not fit all. Graphical interfaces, with or without dataflow, might well be great for some people and not so much for others.

              And I most certainly do agree that the area of concurrent real-time programming is in its infancy, so there should be no shortage of opportunities for improvement in this area. Perhaps some of these opportunities are just the right size for your graphical dataflow environment, or perhaps not. This is for neither you nor me to decide, but rather for those people who have concurrent real-time programming jobs that they need to get done. And even among this group, it is quite likely that one size will not fit all, given the incredible variety of concurrent real-time systems.
