Stupid RCU Tricks: What if I Knew Then What I Know Now?

During my keynote at the 2017 Multicore World, Mark Moir asked what I would have done differently if I knew then what I know now, with the “then” presumably being the beginning of the RCU effort back in the early 1990s. Because I got the feeling that my admittedly glib response did not fully satisfy Mark, I figured I should try again. So imagine that you traveled back in time to the very end of the year 1993, not long after Jack Slingwine and I came up with read-copy lock (now read-copy update, or just RCU), and tried to pass on a few facts about my younger self’s future. The conversation might have gone something like this:

You   By the year 2017, RCU will be part of the concurrency curriculum at numerous universities and will be very well-regarded in some circles.
Me   Nice! That must mean that DYNIX/ptx will also be doing well!

You   Well, no. DYNIX/ptx will disappear by 2005, being replaced by the combination of IBM’s AIX and another operating system kernel started as a hobby.
Me   AIX??? Surely you mean Solaris, HP-UX or Ultrix! And I wouldn’t say that BSD started as a hobby! It was after all fully funded research.

You   No, Sun Microsystems was acquired by Oracle in 2010, and Solaris was already in decline by that time. IBM’s AIX was by then the last proprietary UNIX operating system standing. A new open-source kernel called “Linux” became the dominant OS.
Me   IBM??? But they are currently laying off more people each month than Sequent employs worldwide!!! Why would they even still be in business in 2010?

You   True. But their new CEO, Louis Gerstner, will turn IBM around.
Me   Well, yes, he did just become IBM’s CEO, but before that he was CEO of RJR Nabisco. That should work about as well as John Sculley’s tenure as CEO of Apple. What does Gerstner know about computers, anyway?

You   He apparently knew enough to get IBM back on its feet. In fact, IBM will buy Sequent, so that you will become an IBM employee on April 1, 2000.
Me   April Fools day? Now I know you are joking!!!

You   No joke. You will become an IBM employee on April 1, 2000, seven years to the day after Louis Gerstner became an IBM employee.
Me   OK, I guess that explains why DYNIX/ptx doesn’t make it past 2005. That is really annoying! So the teaching of RCU in universities is some sort of pity play, then?

You   No. Dipankar Sarma will get RCU accepted into Linux in 2002.
Me   I could easily believe that—he is very capable. So what do I do instead?

You   You will take over maintainership of RCU in 2005.
Me   Is Dipankar going to be OK?

You   Of course! He will just move on to other projects. It is just that there will be a lot more work needed on RCU, which you will take on.
Me   What more work could there be? It is a pretty simple mechanism, way simpler than a memory allocator, for example.

You   Well, there will be quite a bit of scalability work needed. For example, you will receive a scalability bug report involving a 512-CPU shared-memory system.
Me   Hmmm… It took Sequent from 1985 to 1997 to get from 30 to 64 CPUs, so that is doubling every 12 years, so I am guessing that I received this bug report somewhere near the year 2019. So what did I do in the meantime?

You   No, you will receive this bug report in 2004.
Me   512-CPU system in 2004??? Well, suspending disbelief, this must be why I will start maintaining RCU in 2005.

You   No, a quick fix will be supplied by a guy named Manfred Spraul, who writes concurrent Linux-kernel code as a hobby. So you didn’t do the scalability work until 2008.
Me   Concurrent Linux-kernel coding as a hobby? That sounds unlikely. But never mind. So what did I do between 2005 and 2008? Surely it didn’t take me three years to create a highly scalable RCU implementation!

You   You will work with a large group of people adding real-time capabilities to the Linux kernel. You will create an RCU implementation that allows readers to be preempted.
Me   That makes absolutely no sense! A context switch is a quiescent state, so preempting an RCU read-side critical section would result in a too-short grace period. That most certainly isn’t going to help anything, given that a crashed kernel isn’t going to offer much in the way of real-time response!

You   I don’t know the details, but you will make it work. And this work will be absolutely necessary for the Linux kernel to achieve 20-microsecond interrupt and scheduling latencies.
Me   Given that this is a general-purpose OS, you obviously meant 20 milliseconds!!! But what could RCU possibly be doing that would contribute significantly to a 20-millisecond interrupt/scheduling delay???

You   No, I really did mean sub-20-microsecond latencies. By 2010 or so, even the vanilla non-realtime Linux kernel will easily meet 20-millisecond latencies, assuming the hardware and software are properly configured.
Me   Ah, got it! CPU core clock rates should be somewhere around 50GHz by 2010, which might well make those sorts of latencies achievable.

You   No, power-consumption and heat-dissipation constraints will cap CPU core clock frequencies at about 5GHz in 2003. Most systems will run in the 1-3GHz range even as late as in 2017.
Me   Then I don’t see how a general-purpose OS could possibly achieve sub-20-microsecond latencies, even on a single-CPU system, which wouldn’t have all that much use for RCU.

You   No, this will be on SMP systems. In fact, in 2012, you will receive a bug report complaining of excessively long 200-microsecond latencies on a system running 4096 CPUs.
Me   Come on! I believe that Amdahl’s Law has something to say about lock contention on such large systems, which would rule out reasonable latencies, let alone 200-microsecond latencies! And there would be horrible reliability problems with that many CPUs! You wouldn’t be able to keep the system running long enough to measure the latency!!!

You   Hey, I am just telling you what will happen.
Me   OK, so after I get RCU to handle insane scalability and real-time response, there cannot be anything left to do, right?

You   Actually, wrong. Energy efficiency becomes extremely important, and you will rewrite the energy-efficiency RCU code more than eight times before you get it right.
Me   Eight times??? You must be joking!!! Seems like it would be better to just waste a little energy. After all, computers don’t consume all that much energy, especially compared to industrial and transportation systems.

You   No, that would not work. By 2005, there are quite a few datacenters that are limited by electrical power rather than by floor space. So much so that large data centers open in Eastern Oregon, on the sites of the old aluminum smelters. When you have that many servers, even a few percent of energy savings translates to millions of dollars a year, which is well worth spending some development effort on.
Me   That is an insanely large number of servers!!! How many Linux instances are running by that time, anyway?

You   By the mid-2010s, the number of Linux instances is well in excess of one billion, but no one knows the exact number.
Me   One billion??? That is almost one server for every family in the world! No way!!!

You   Well, most of the Linux instances are not servers. There are a lot of household appliances running Linux, to say nothing of battery-powered hand-held smartphones. By 2017, most of the smartphones will have multiple CPUs.
Me   Why on earth would you need multiple CPUs to make a phone call? And how would you fit multiple CPUs into a hand-held device? And where do you put the battery, in a large backpack or something???

You   No, the entire device, batteries, CPUs and all, will fit easily into your shirt pocket. And these smartphones can take pictures, record video, do video conference calls, find precise locations using GPS, translate among multiple languages, and much else besides. They are really full-fledged computers that fit in your pocket.
Me   A pocket-sized supercomputer??? And how would I possibly go about testing RCU code sufficiently for your claimed billion instances???

You   Interesting question. You will give a keynote at the 2017 Multicore World in February 2017 at Wellington, New Zealand describing some of your plans. These plans include the use of formal verification in your regression test suite.
Me   Formal verification of highly concurrent code in a regression test suite??? OK, now I know for sure that you are pulling my leg! It has been an interesting conversation, but I must get back to reality!!!

My 1993 self did not have a very accurate view of 2017, did he? As the old saying goes, predictions are hard, especially about the future! So it is quite wise to take such predictions with a considerable supply of salt.
