Can the Kernel Concurrency Sanitizer Own Rust Code?

Given the data-race-freedom guarantees of Rust’s non-unsafe code, one might reasonably argue that there is no point in the Kernel Concurrency Sanitizer (KCSAN) analyzing such code. However, the Linux kernel is going to need unsafe Rust code, and although there has been significant progress in formal verification of such code, one of the leading researchers in this area says “we hope to eventually develop formal methods that can be put directly in the hands of programmers”. We would be wise to take careful note of the word “eventually”. Furthermore, even given unanticipated universal acclamation of Rust within the Linux kernel community combined with equally unanticipated advances in C-to-Rust translation capabilities, a significant fraction of the existing tens of millions of lines of Linux-kernel C code will persist for some time to come. Both the unsafe Rust code and the C code can interfere with Rust non-unsafe code, and furthermore there are special cases where safe code can violate unsafe code’s assumptions. Therefore, run-time analysis of Rust code (safe code included) is likely to be able to find issues that compile-time analysis cannot.

As a result, Rust seems unlikely to render KCSAN obsolete any time soon. This suggests that Rust code should include KCSAN instrumentation, both via the compiler and via atomic and volatile primitives, as is currently the case for the C compilers and primitives in use. The public KCSAN interface is as follows:

  1. __kcsan_check_access(): This function makes a relevant memory access known to KCSAN. It takes a pointer to the object being accessed, the size of the object, and an access type (for example, zero for read, KCSAN_ACCESS_WRITE for write, and KCSAN_ACCESS_ATOMIC for atomic read-modify-write).
  2. The __tsan_{read,write}{1,2,4,8}() family of functions serves the same purpose as __kcsan_check_access(), but the size and access type are implicit in the function name instead of being passed as arguments, which can provide better performance. This family of functions is used by KCSAN-enabled compilers.
  3. ASSERT_EXCLUSIVE_ACCESS() is invoked from C code and causes KCSAN to complain if there is a concurrent access to the specified object. A companion ASSERT_EXCLUSIVE_ACCESS_SCOPED() expands the scope of the concurrent-access complaint to the full extent of the compound statement invoking this macro.
  4. ASSERT_EXCLUSIVE_WRITER() is invoked from C code and causes KCSAN to complain if there is a concurrent write to the specified object. A companion ASSERT_EXCLUSIVE_WRITER_SCOPED() expands the scope of the concurrent-writer complaint to the full extent of the compound statement invoking this macro.
  5. ASSERT_EXCLUSIVE_BITS() is similar to ASSERT_EXCLUSIVE_WRITER(), but focuses KCSAN’s attention on changes to bits set in the mask passed as this macro’s second argument.

These last three categories of interface members are designed to be directly invoked in order to check concurrency designs. See for example the direct call to ASSERT_EXCLUSIVE_WRITER() from the rcu_cpu_starting() function in Linux-kernel RCU. It would of course be quite useful for these interface members to be available from within Rust code.
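
For example, a direct call to ASSERT_EXCLUSIVE_WRITER() might look something like the following hedged sketch, in which mylock, flag, and set_flag() are hypothetical names rather than anything currently in the kernel:

static DEFINE_SPINLOCK(mylock);
static int flag;

void set_flag(int newval)
{
  spin_lock(&mylock);
  ASSERT_EXCLUSIVE_WRITER(flag); /* KCSAN complains about concurrent writers. */
  WRITE_ONCE(flag, newval);      /* Concurrent readers remain permitted. */
  spin_unlock(&mylock);
}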

KCSAN also provides a data_race() primitive that suppresses data-race diagnostics, but allows the compiler free rein on optimization. This is used in the Linux kernel for things like infrequent diagnostics that are not part of the overall concurrency design.
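
For example, a diagnostic might look something like the following one-line sketch, in which reads_done is a hypothetical counter that other code updates concurrently:

/* Diagnostics only: the race is intentional and the compiler is unconstrained. */
pr_info("%lu reads completed so far.\n", data_race(reads_done));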

KCSAN has helped me fix a number of embarrassing bugs in Linux-kernel RCU that might otherwise have inconvenienced end users, so it just might be helpful to people introducing Rust code into the Linux kernel. It is therefore quite encouraging that Marco Elver (KCSAN lead developer and maintainer) points out off-list that the rustc compiler already has sanitizer support, which should suffice not only for KCSAN, but also for the equally helpful Kernel Address Sanitizer (KASAN). He also notes that some care is required to pass the relevant -mllvm options to rustc and to ensure that it is not attempting to link against compiler-rt, because doing so is not appropriate when building the Linux kernel. Marco also noted that KCSAN relies on the front end to properly handle volatile accesses, and Miguel Ojeda kindly confirmed that it does. So there is hope!

For more information on KCSAN and its reason for being:

  1. “Concurrency bugs should fear the big bad data-race detector” (Part 1 and Part 2).
  2. Who’s afraid of a big bad optimizing compiler?
  3. Calibrating your fear of big bad optimizing compilers.
  4. Sections 4.3.4 (“Accessing Shared Variables”), 15.2 (“Tricks and Traps”), and 15.3 (“Compile-Time Consternation”) of perfbook.

History

October 7, 2021: Add current state of KCSAN/Rust integration. Clarify Rust module per Gary Guo email.
October 12, 2021: Self-review.
October 28, 2021: Add data_race() primitive.


Will Your Rust Code Survive the Attack of the Zombie Pointers?

Some of the previous posts in this series have been said to be quite difficult, so I figured I owed you all an easy one. And the zombie-pointer problem really does have a trivial solution, at least in the context of the Linux kernel. In other environments, all bets are off.

But first, what on earth is a zombie pointer?

To answer that question, let’s start with a last-in-first-out (LIFO) stack. We don’t know when this algorithm was invented or who invented it, but a very similar algorithm was described in R. J. Treiber’s classic 1986 technical report (IBM Almaden Research Center RJ 5118), and it was referred to as prior art in (expired) US Patent 3,886,525, which was filed in 1973 (hat tip to Maged Michael). Fun though it might be to analyze this code’s original IBM assembly-language implementation, let’s instead look at a C11 implementation:

 1 struct node_t* _Atomic top;
 2
 3 void list_push(value_t v)
 4 {
 5   struct node_t *newnode = (struct node_t *) malloc(sizeof(*newnode));
 6
 7   set_value(newnode, v);
 8   newnode->next = atomic_load(&top);
 9   do {
10     // newnode->next may have become invalid
11   } while (!atomic_compare_exchange_weak(&top, &newnode->next, newnode));
12 }
13
14 void list_pop_all()
15 {
16   struct node_t *p = atomic_exchange(&top, NULL);
17
18   while (p) {
19     struct node_t *next = p->next;
20      
21     foo(p);
22     free(p);
23     p = next;
24   }
25 }

Those used to the Linux-kernel cmpxchg() atomic operation should keep in mind that when the C11 atomic_compare_exchange_weak() fails, it writes the current value referenced by its first argument to the pointer passed as its second argument. In other words, instead of passing atomic_compare_exchange_weak() the expected old value, you instead pass it a pointer to that value. Each time atomic_compare_exchange_weak() fails, it updates the pointed-to expected value. This sometimes allows this function to be used in a tight loop, as in line 11 above.
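
For example, a C11 atomic increment can be written as follows, a hedged sketch assuming <stdatomic.h> and a hypothetical global x:

_Atomic int x;

void inc_x(void)
{
  int old = atomic_load(&x);

  while (!atomic_compare_exchange_weak(&x, &old, old + 1)) {
    /* The failed operation refreshed old from x, so simply retry. */
  }
}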

With these atomic_compare_exchange_weak() semantics in mind, let’s step through execution of list_push(C) on a stack initially containing objects A and B:

  1. Line 5 allocates memory for the new object C.
  2. Line 7 initializes this newly allocated memory.
  3. Line 8 sets the new object’s ->next pointer to the current top-of-stack pointer.
  4. Line 11 invokes atomic_compare_exchange_weak(), which atomically compares the value of newnode->next to the value of top, and if they are equal, stores the pointer newnode into top and returns true. (As noted before, if the comparison is instead not-equal, atomic_compare_exchange_weak() instead stores the value of top into newnode->next and returns false, in which case the loop repeats until the pointers compare equal.)
  5. One way or another, the stack ends up containing objects C, A, and B, ordered from the top of stack down.

Oddly enough, this code would still work even if line 8 were omitted; however, that would result in a high probability that the first call to atomic_compare_exchange_weak() would fail, which would not be so good for common-case performance. It would also result in compile-time complaints about uninitialized variables, so line 8 is doubly unlikely to be omitted in real life.

While we are at it, let’s step through list_pop_all():

  1. Line 16 atomically stores NULL into top and returns its previous value. After this line executes, the stack is empty and its previous contents are referenced by local variable p.
  2. Lines 18-24 execute for each object on list p:
    1. Line 19 prefetches a pointer to the next object.
    2. Line 21 passes the current object to function foo().
    3. Line 22 frees the current object.
    4. Line 23 advances to the next object.

Thus far, we have seen no zombie pointers. Let’s therefore consider the following sequence of events, with the stack initially containing objects A and B:

  1. Thread 1 starts pushing object C, but is interrupted, preempted, or otherwise impeded just after executing line 8.
  2. Thread 2 pops the entire list and frees all of the objects. Object C’s ->next pointer is now invalid because it points to now-freed memory formerly known as object A.
  3. Thread 2 pushes an object, and happens to allocate memory for it at the same address as the old object A, so let’s call it A’.
  4. Object C’s ->next pointer is still invalid as far as an omniscient compiler is concerned, but from an assembly language viewpoint happens to reference the type-compatible object A’. This pointer is now a “zombie pointer” that has come back from the dead, or that has at least come to have a more entertaining form of invalidity.
  5. Thread 1 resumes and executes the atomic_compare_exchange_weak() on line 11. Now this primitive, like its Linux-kernel counterpart cmpxchg(), operates not on the compiler’s idea of what the pointer is, but rather on the actual bits making up that pointer. Therefore, that atomic_compare_exchange_weak() succeeds despite the fact that object C’s ->next pointer is invalid.
  6. Thread 1 has thus completed its push of object C, and the resulting stack looks great from an assembly-language viewpoint. But the pointer from object C to A’ is a zombie pointer, and compilers are therefore within their rights to do arbitrarily strange things with it.

As noted earlier, this has a trivial solution within the Linux kernel. Please do not peek too early. 😉

But what can be done outside of the Linux kernel?

The traditional solution has been to hide the allocator from the compiler. After all, if the compiler cannot see the calls to malloc() and free(), it cannot figure out that a given pointer is invalid (or, in C-standard parlance, indeterminate). Creating your own allocator with its own function names suffices, and back in the day you often needed to create your own allocator if performance and scalability were important to you.

However, this would also deprive Rust of information that it needs to check pointer operations. So maybe Rust needs to know about allocation and deallocation, but needs to be very careful not to pass this information on to the optimizer, assuming that this even makes sense in the context of the implementation of Rust.

Perhaps code working with invalid and zombie pointers needs to be carefully written in C. And perhaps there is some way to write it safely in unsafe Rust.

The final approach is to wait until backend support for safe handling of invalid/zombie pointers arrives, and then add Rust syntax as appropriate. Here is some documentation of current efforts towards adding such support to C++:

  1. CPPCON 2020 presentation
  2. Pointer lifetime-end zap (informational/historical)
  3. Pointer lifetime-end zap proposed solutions

So there is good progress, but it might still be some time before this is supported by either the standard or by compilers.

History

October 6, 2021: Expand description per feedback from Jonathan Corbet.
October 12, 2021: Self-review.


How Much of the Kernel Can Rust Own?

Rust concurrency makes heavy use of ownership and borrowing. The purpose of this post is not to give an exposition of Rust’s capabilities and limitations in this area, but rather to give a series of examples of ownership in the Linux kernel.

The first example involves Linux-kernel per-CPU variables. In some cases, such variables are protected by per-CPU locks. For example, a number of fields in the per-CPU rcu_data structure are used by the kernel threads that manage grace periods for offloaded callbacks, and these fields are protected by the ->nocb_gp_lock field in the same instance of that same structure. In other cases, access to a given per-CPU variable is permitted only by the corresponding CPU, and even then only if that CPU has disabled preemption. For example, the per-CPU rcu_data structure’s ->ticks_this_gp field may be updated only from the corresponding CPU, and only when preemption is disabled. In this particular case, preemption is disabled as a side-effect of having disabled interrupts.

The second example builds on the first. In kernels built with CONFIG_RCU_NOCB_CPU=n, the per-CPU rcu_data structure’s ->cblist field may be updated only from the corresponding CPU, and only when preemption is disabled. However, updates are also allowed from some other CPU when the corresponding CPU has been taken offline, but only if that other CPU is the one orchestrating the offlining of the corresponding CPU.

(What about kernels built with CONFIG_RCU_NOCB_CPU=y? They must also acquire a ->nocb_lock that is also contained within the per-CPU rcu_data structure.)

The third example moves to the non-per-CPU rcu_node structures, which are arranged into a combining tree that processes events from an arbitrarily large number of CPUs while maintaining bounded lock contention. Each rcu_node structure has a ->qsmask bitmask that tracks which of its children need to report a quiescent state for the current grace period, and a ->gp_tasks pointer that, when non-NULL, references a list of tasks that blocked while in an RCU read-side critical section that blocks the current grace period. The children of each leaf rcu_node structure are the rcu_data structures feeding into that rcu_node structure. Finally, there is a singleton rcu_state structure that contains a ->gp_seq field that numbers the current grace period and also indicates whether or not it is in progress.

We now have enough information to state the ownership rule for the rcu_state structure’s ->gp_seq field. This field may be updated only if all of the following hold:

  1. The current CPU holds the root rcu_node structure’s ->lock.
  2. All rcu_node structures’ ->qsmask fields are zero.
  3. All rcu_node structures’ ->gp_tasks pointers are NULL.

This might seem excessively ornate, but it is the natural consequence of RCU’s semantics and design:

  1. The next grace period is not permitted to start until after the previous one has ended. This is a design choice that allows RCU to function well in the face of update-side overload from large numbers of CPUs.
  2. A grace period is not allowed to end until all CPUs have been observed in a quiescent state, that is, until all rcu_node structures’ ->qsmask fields have become zero.
  3. A grace period is not allowed to end until all tasks that were preempted while executing within a critical section blocking that grace period have resumed and exited their critical sections, that is, until all rcu_node structures’ ->gp_tasks pointers have become NULL.
  4. If multiple CPUs would like to start a grace period at the same time, there has to be something that works out which CPU is actually going to do the starting, and the last part of that something is the root rcu_node structure’s ->lock.
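
In C, such a rule maps naturally onto assertions. The following hedged sketch is not the actual Linux-kernel code, and the rcu_gp_seq_start_check() function is hypothetical, but lockdep_assert_held(), WARN_ON_ONCE(), and the fields being checked are real:

static void rcu_gp_seq_start_check(struct rcu_node *rnp_root)
{
  struct rcu_node *rnp;

  lockdep_assert_held(&rnp_root->lock);  /* Rule 1. */
  rcu_for_each_node_breadth_first(rnp) {
    WARN_ON_ONCE(rnp->qsmask);           /* Rule 2. */
    WARN_ON_ONCE(rnp->gp_tasks);         /* Rule 3. */
  }
}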

Trust me, there are far more complex ownership models in the Linux kernel!

The fourth example involves per-CPU data that is protected by a reader-writer lock. A CPU is permitted to both read and update its own data only if it read-holds the lock. A CPU is permitted to read other CPUs’ data only if it write-holds the lock. That is right, CPUs are allowed to write when read-holding the lock and are allowed to read while write-holding the lock. Perhaps reader-writer locks should instead have been called exclusive-shared locks, but it is some decades too late now!
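
A hedged sketch of this pattern follows, with count_rwlock, counter, and both functions being hypothetical names:

static DEFINE_RWLOCK(count_rwlock);
static DEFINE_PER_CPU(unsigned long, counter);

void inc_counter(void) /* Update own CPU's data while read-holding. */
{
  read_lock(&count_rwlock);
  this_cpu_inc(counter);
  read_unlock(&count_rwlock);
}

unsigned long read_counters(void) /* Read all CPUs' data while write-holding. */
{
  unsigned long sum = 0;
  int cpu;

  write_lock(&count_rwlock);
  for_each_online_cpu(cpu)
    sum += per_cpu(counter, cpu);
  write_unlock(&count_rwlock);
  return sum;
}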

The fifth and final example involves multiple locks, for example, when some readers must traverse the data structure from an interrupt handler and others must sleep while traversing that same structure. One way to implement this is to have one interrupt-disabled spinlock (for readers in interrupt handlers) and a mutex (for readers that must sleep during the traversal). Updaters must then hold both locks.
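
A hedged sketch of the update side, again using hypothetical names:

static DEFINE_SPINLOCK(irq_reader_lock);    /* Readers in interrupt handlers. */
static DEFINE_MUTEX(sleeping_reader_mutex); /* Readers that might sleep. */

void update_the_structure(void)
{
  mutex_lock(&sleeping_reader_mutex);
  spin_lock_irq(&irq_reader_lock);
  do_the_update(); /* Hypothetical: all readers are now excluded. */
  spin_unlock_irq(&irq_reader_lock);
  mutex_unlock(&sleeping_reader_mutex);
}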

So how on earth does the current C-language Linux kernel code keep all this straight???

One indispensable tool is the assertion, which in the Linux kernel includes WARN(), BUG(), and friends. For example, RCU uses these to verify the values of the aforementioned ->qsmask and ->gp_tasks fields during between-grace-periods traversals of the rcu_node combining tree.

Another indispensable tool is lockdep, which, in addition to checking for deadlocks, allows code to verify that conditions are right. A few of the many available lockdep assertions are shown below:

  1. lockdep_assert_held(): Complains if the specified lock is not held by the current CPU/task.
  2. lockdep_assert_held_write(): Complains if the specified reader-writer lock is not write-held by the current CPU/task.
  3. lockdep_assert_held_read(): Complains if the specified reader-writer lock is not read-held by the current CPU/task.
  4. lockdep_assert_none_held_once(): Complains if the current CPU/task holds any locks.
  5. lockdep_assert_irqs_disabled(): Complains if the current CPU has interrupts enabled.
  6. rcu_read_lock_held(): Complains if the current CPU/task is not within an RCU read-side critical section.

Of course, none of these assertions are going to do anything for you unless you make use of them. On the other hand, some of the more complex ownership criteria are going to be quite difficult for compilers or other tooling to intuit without significant help from the developer.

A more entertaining ownership scenario is described in Section 5.4.6 (“Applying Exact Limit Counters”) of perfbook. Chapter 8 (“Data Ownership”) describes several other ownership scenarios.

History

October 12, 2021: Self-review.


Can Rust Code Own RCU?

Read-copy update (RCU) vaguely resembles a reader-writer lock [1], but one in which readers do not exclude writers. This change in semantics permits RCU readers to be exceedingly fast and scalable. In fact, in the most aggressive case, rcu_read_lock() and rcu_read_unlock() generate no code, and rcu_dereference() emits but a single load instruction. This most aggressive case is achieved in production via Linux-kernel CONFIG_PREEMPT_NONE=y builds. Additional information on RCU is presented at the end of this post.

Can Rust Ownership be Adapted to RCU?

The fact that RCU readers do not exclude writers opens the door to data races between RCU readers and updaters. This in turn suggests that at least some RCU use cases are likely to pose challenges to Rust’s ownership semantics. However, discussions with Wedson Almeida Filho at Linux Plumbers Conference suggested that these semantics might be flexed a little bit, perhaps in a manner similar to the way that Linux-kernel RCU flexes lockdep’s semantics. In addition, the data races between the read-side rcu_dereference() primitive and the update-side rcu_assign_pointer() primitive might reasonably be resolved by having the Rust implementation of these two primitives use unsafe mode. Or, perhaps better yet, the Rust implementation of these two primitives might simply wrapper the Linux-kernel primitives in order to get the benefit of existing tools such as the aforementioned lockdep and the sparse static-analysis checker. One downside of this approach is potential performance degradation due to the extra function call implied by the wrappering. Longer term, LTO might eliminate this function call, but if the initial uses of Rust are confined to performance-insensitive device drivers, the extra overhead should not be a problem.

Linux-Kernel Tooling and RCU

Here are a few examples of these Linux-kernel tools:

  1. A pointer can be marked __rcu, which will cause the sparse static-analysis checker to complain if that pointer is accessed directly, instead of via rcu_dereference() and rcu_assign_pointer(). This checking can help developers avoid accidental destruction of the address and data dependencies on which many RCU use cases depend.
  2. In most RCU use cases, the rcu_dereference() primitive that accesses RCU-protected pointers must itself be protected by rcu_read_lock(). The Linux-kernel lockdep facility has therefore been adapted to complain about unprotected uses of rcu_dereference(). (A sketch illustrating this item and the preceding one appears just after this list.)
  3. Passing the same pointer twice in quick succession to a pair of call_rcu() invocations is just as bad as doing the same with a pair of kfree() invocations: Both situations are a double free. The Linux kernel’s debug-objects facility checks for this sort of abuse.
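
For example, the first two of these tools act on code resembling the following hedged sketch, in which struct foo, gp, and get_foo_a() are hypothetical:

struct foo {
  int a;
};
static struct foo __rcu *gp; /* sparse complains about direct access. */

int get_foo_a(void)
{
  struct foo *p;
  int ret = -1;

  rcu_read_lock();
  p = rcu_dereference(gp); /* lockdep complains if unprotected. */
  if (p)
    ret = p->a;
  rcu_read_unlock();
  return ret;
}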

This is by no means a complete list. Although I am by no means claiming that these sorts of tools provide the same degree of safety as does Rust safe mode, it is important to understand that Rust unsafe mode is not to be compared to standard C, but rather to the dialect of C used in the Linux kernel, augmented by the associated tools, processes, and coding guidelines.

RCU Use Cases

There are in fact Rust implementations of variants of RCU, including:

  1. crossbeam-rs
  2. ArcSwapAny

I currently have no opinion on whether or not these could be used within the Linux kernel, and any such opinion is irrelevant. You see, Rust’s use of RCU within the Linux kernel would need to access RCU-protected data structures provided by existing C code. In this case, it will not help for Rust code to define its own RCU. It must instead interoperate with the existing Linux-kernel RCU implementations.

But even interoperating with Linux-kernel RCU is not trivial. To help with this task, this section presents a few RCU use cases within the Linux kernel.

The first use case is the textbook example where once an RCU-protected object is exposed to readers, its data fields remain constant. This approach allows Rust’s normal read-only mode of ownership to operate as intended. The pointers will of course change as objects are inserted or deleted, but these are handled by rcu_dereference() and rcu_assign_pointer() and friends. These special primitives might be implemented using unsafe Rust code or C code, as the case might be.

In-place updates of RCU-protected objects are sometimes handled using the list_replace_rcu() primitive. In this use case, a new version of the object is allocated, the old version is copied to the new version, any required updates are carried out on the new version, and then list_replace_rcu() is used to make the new version available to RCU readers. Readers then see either the old version or the new version, but either way they see an object with constant data fields, again allowing Rust ownership to work as intended.
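
A hedged sketch of this copy-update pattern follows, with struct foo, its ->list and ->a fields, and update_foo() all being hypothetical, and with the caller assumed to hold the update-side lock:

void update_foo(struct foo *oldp, int new_a)
{
  struct foo *newp = kmalloc(sizeof(*newp), GFP_KERNEL);

  if (!newp)
    return;        /* Error handling elided. */
  *newp = *oldp;   /* Copy the old version... */
  newp->a = new_a; /* ...then update the copy. */
  list_replace_rcu(&oldp->list, &newp->list);
  synchronize_rcu(); /* Wait for pre-existing readers. */
  kfree(oldp);
}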

However, one complication arises for objects removed from an RCU-protected data structure. Such objects are not owned by the updater until after a grace period elapses, and they can in fact be accessed by readers in the meantime. It is not clear to me how Rust would handle this. For purposes of comparison, within the Linux kernel, the Kernel Concurrency Sanitizer (KCSAN) handles this situation using dynamic runtime checking. With this checking enabled, when an updater takes full ownership of a structure before readers are done with it, KCSAN reports a data race. These issues should be most immediately apparent to anyone attempting to create a Rust wrapper for the call_rcu() function.

Because RCU readers do not exclude RCU updaters, it is possible for an RCU reader to upgrade itself to an updater while traversing an RCU-protected data structure. This is usually handled by a lock embedded in the object itself. From what I understand, the easiest way to apply Rust ownership to these situations is to make the RCU-protected object contain a structure that in turn contains the data protected by the lock. People wishing to apply Rust to these sorts of situations are invited to review the Linux-kernel source code to check for possible complications, for example, cases where readers locklessly read fields that are updated under the protection of the lock, and cases where multiple locks provide one type of protection or another to overlapping subsets of fields in the RCU-protected object.

Again because RCU readers do not exclude RCU updaters, there can also be lockless in-place updates (either from readers or updaters) to RCU-protected objects, for example, to set flags, update statistics, or to provide type-safe memory for any number of lockless algorithms (for example, using SLAB_TYPESAFE_BY_RCU). Rust could presumably use appropriate atomic or volatile operations in this case.

One important special case of RCU readers locklessly reading updated-in-place data is the case where readers cannot be allowed to operate on an object that has been removed from the RCU-protected data structure. These situations normally involve a per-object flag and lock. Updaters acquire the object’s lock, remove the object from the structure, set the object’s flag, and release the lock. Readers check the flag, and if the flag is set, pretend that they did not find the corresponding object. This class of use cases illustrates how segregating read-side and update-side fields of an RCU-protected object can be non-trivial.
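
A hedged sketch of such a reader follows, with struct foo, foo_list, and the ->deleted flag all being hypothetical. Note that this reader returns with the object's lock held, which also permits it to upgrade itself to an updater as described above:

struct foo *find_and_lock_foo(int key)
{
  struct foo *p;

  rcu_read_lock();
  list_for_each_entry_rcu(p, &foo_list, list) {
    if (p->key != key)
      continue;
    spin_lock(&p->lock);
    if (!p->deleted) {
      rcu_read_unlock();
      return p; /* Caller must release p->lock. */
    }
    spin_unlock(&p->lock);
    break; /* Deleted, so pretend not to have found it. */
  }
  rcu_read_unlock();
  return NULL;
}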

RCU is sometimes used to protect combinations of data structures, sometimes involving nested RCU readers. In some cases, a given RCU read-side critical section might span a great many functions across several translation units.

It is not unusual for RCU use cases to include a search function that is invoked both by readers and updaters. This function will therefore sometimes be within an RCU read-side critical section and other times be protected by the update-side lock (or by some other update-side synchronization mechanism).

Sequence locking is sometimes used in conjunction with RCU, so that RCU protects traversal of the data structure and sequence locking detects updates profound enough to cause problems for the RCU traversals. The poster boy for this use case is in the Linux kernel’s directory-entry cache. In this case, certain sequences of rename operations could fool readers into traversing pathnames that never actually existed. Using the sequence lock named rename_lock to protect such traversals allows readers to reject such bogus pathnames. More information on the directory-entry cache is available in Neil Brown’s excellent Linux Weekly News series (part 1, part 2, part 3). One key take-away of this use case is that sequence locking is sometimes used to protect arbitrarily large data structures.

Sequence locking can be used to easily provide readers with a consistent view of a data structure, for a very wide range of definitions of “consistent”. However, all of these techniques allow updaters to block readers. There are a number of other approaches to read-side consistency both within and across data structures, including the Issaquah Challenge.

One final use case is phased state change, where RCU readers check a state variable and take different actions depending on the current state. Updaters can change the state variable, but must wait for the completion of all readers that might have seen the old value before taking further action. The rcu-sync functionality (rcu_sync_init() and friends) implements a variant of RCU-protected phased state change. This use case is notable in that there are no linked data structures and nothing is ever freed.
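
A hedged sketch of a hand-rolled phased state change follows (the rcu-sync functionality is more elaborate and more efficient), with state, state_mutex, and the do_it_*() functions all being hypothetical:

static int state; /* Updated only while holding state_mutex. */

void a_reader(void)
{
  rcu_read_lock();
  if (READ_ONCE(state))
    do_it_the_new_way();
  else
    do_it_the_old_way();
  rcu_read_unlock();
}

void change_state(int new_state)
{
  mutex_lock(&state_mutex);
  WRITE_ONCE(state, new_state);
  synchronize_rcu(); /* Wait for readers that might have seen the old state. */
  /* Only now take actions incompatible with the old state. */
  mutex_unlock(&state_mutex);
}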

This is by no means an exhaustive list of RCU use cases. RCU has been in the Linux kernel for almost 20 years, and during that time kernel developers have come up with any number of new ways of applying RCU to concurrency problems.

Rust RCU Options

The most straightforward approach is to provide Rust wrappers for the relevant C-language RCU API members. As noted earlier, wrappering the low-overhead read-side API members will introduce function-call overhead in non-LTO kernels, but this should not be a problem for performance-insensitive device drivers. However, fitting RCU into Rust’s ownership model might not be as pretty as one might hope.

A more utopian approach would be to extend Rust’s ownership model, one way or another, to understand RCU. One approach would be for Rust to gain “type safe” and “guaranteed existence” ownership modes, which would allow Rust to better diagnose abuses of the RCU API. For example, this might help with pointers to RCU-protected objects being erroneously leaked from RCU read-side critical sections.

So how might a pointer to an RCU-protected object be non-erroneously leaked from an RCU read-side critical section, you ask? One way is to acquire a reference via a reference count within that object before exiting that critical section. Another way is to acquire a lock within that object before exiting that critical section.

So what does the Linux kernel currently do to find this type of pointer leak? One approach is to enable KCSAN and build the kernel with CONFIG_RCU_STRICT_GRACE_PERIOD=y, and to run this on a small system. Nevertheless, this might be one area where Rust could improve Linux-kernel diagnostics, albeit not without effort.

For More Information

Additional information on RCU may be found here:

  1. Linux-kernel documentation, with special emphasis on Linux-kernel RCU’s requirements.
  2. The following sections of perfbook discuss other aspects of RCU:
    1. Section 9.5 (“Read-Copy Update”).
    2. Section 10.3 (“Read-Mostly Data Structures”).
    3. Section 13.5 (“RCU Rescues”).
    4. Section 15.4.2 (“RCU [memory-ordering properties]”).
  3. The RCU API, 2019 Edition.
  4. Userspace RCU.
  5. Folly-library RCU.
  6. Section 9.6.3.3 of perfbook lists several other RCU implementations.
  7. Proposed Wording for Concurrent Data Structures: Read-Copy-Update (RCU) (C++ Working Paper).

RCU’s semantics ordered by increasing formality:

  1. The “Fundamental Requirements” section of the aforementioned Linux-kernel RCU requirements. Extremely informal, but you have to start somewhere!
  2. RCU Semantics: A First Attempt. Also quite informal, but with a bit more math involved.
  3. Verifying Highly Concurrent Algorithms with Grace (extended version). Formal semantics expressed in separation logic.
  4. Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel. Executable formal semantics expressed in Cat language. This paper also includes a proof that the traditional “wait for all pre-existing readers” semantic is equivalent to these executable formal semantics within the context of the Linux-kernel memory model.
  5. Linux-kernel memory model (LKMM), see especially the Sync-rcu and Sync-srcu relations in file linux-kernel.cat.

Furthermore, significant portions of Linux-kernel RCU have been subjected to mechanical proofs of correctness:

  1. Lihao Liang applied the C Bounded Model Checker (CBMC) to Tree RCU (paper).
  2. Michalis Kokologiannakis applied Nidhugg to Tree RCU (paper).
  3. Lance Roy applied CBMC to Classic SRCU, represented by Linux-kernel commit 418b2977b343 (“rcutorture: Add CBMC-based formal verification for SRCU”). Sadly, the scripting has not aged well.

For more information, see Verification Challenge 6.

For those interested in the userspace RCU library, there have been both manual and mechanical proofs of correctness. A manual proof of correctness of userspace RCU in the context of a linked list has also been carried out.

Endnotes

[1]   At its core, RCU is nothing more nor less than a way of waiting for already-started things to finish. Within the Linux kernel, something to be waited on is delimited by rcu_read_lock() and rcu_read_unlock(). For their part, synchronize_rcu() and call_rcu() do the waiting, synchronously and asynchronously, respectively.

It turns out that you can do quite a lot with this simple capability, including approximating some reader-writer-lock use cases, emulating a number of reference-counting use cases, providing type-safe memory, and providing existence guarantees, this last sometimes being pressed into service as a poor-man’s garbage collector. Section 9.3.3 (“RCU Usage”) of perfbook provides more detail on these and other RCU use cases.

History

October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add RCU-protected phased state change use case.
October 18, 2021: Add read/update common code and Tassarotti citation.


Can Rust Code Own Sequence Locks?

What Are Sequence Locks?

Sequence locks vaguely resemble reader-writer locks except that readers cannot block writers. If a writer runs concurrently with a reader, that reader is forced to retry its critical section. If a reader successfully completes its critical section (as in there were no concurrent writers), then its accesses will have been data-race free. Of course, an incessant stream of writers could starve all readers, and the Linux-kernel sequence-lock implementation provides extensions to deal with this in cases where incessant writing is a possibility. The usual approach is to instead arrange things so that writers are never incessant.

In the simple case where readers loop until they succeed, with no provisions for incessant writers, a reader looks like this:

do {
  seq = read_seqbegin(&test_seqlock);
  /* read-side access. */
} while (read_seqretry(&test_seqlock, seq));

A writer looks like this:

write_seqlock(&test_seqlock);
  /* Update */
write_sequnlock(&test_seqlock);

The read-side access can run concurrently with the writer’s update, which is not consistent with the data-race-free nature of non-unsafe Rust. How can this be handled?

Linux-Kernel Sequence-Lock Use Cases

First, please note that if Rust-language sequence-locking readers are to interoperate with C-language sequence-locking updaters (or vice versa), then the Rust-language and C-language implementations must be compatible. One easy way to achieve such compatibility is to provide Rust code wrappers around the C-language implementation, or perhaps better yet, around higher-level C-language functions making use of sequence locking. Of course, if Rust-language use of sequence locks only ever accesses data that is never accessed by C-language code, then Rust could provide its own implementation of sequence locking. As always, choose carefully!

Next, let’s look at a few classes of sequence-locking use cases within the Linux kernel.

One common use case is the textbook case where the read-side loop picks up a few values, all within a single function. For example, sequence locking is sometimes used to fetch a 64-bit value on a 32-bit system (see the sock_read_timestamp() function in include/net/sock.h, and also the sketch following the list below). The remainder of this section looks at some additional use cases that receive significant Linux-kernel use:

  1. Readers can gather data from multiple sources, including functions in other translation units that are invoked via function pointers. See for example the timer_cs_read() function in arch/sparc/kernel/time_32.c.
  2. Readers often perform significant computations. See for example the badblocks_check() function in block/badblocks.c.
  3. Readers can contain sequence-lock writers for that same sequence lock, as is done in the sdma_flush() function in drivers/infiniband/hw/hfi1/sdma.c. Although one could reasonably argue that this could be structured in other ways, this approach does avoid an added check, which could be valuable on fastpaths.
  4. The call to read_seqretry() is sometimes buried in a helper function, for example, in the sdma_progress() function in drivers/infiniband/hw/hfi1/sdma.h and the follow_dotdot_rcu() function in fs/namei.c.
  5. Readers sometimes use memcpy(), as might be expected by those suggesting that sequence-lock readers should clone/copy the data. See for example the neigh_hh_output() function in include/net/neighbour.h.
  6. Sequence-locking readers are often used in conjunction with RCU readers, as is described in the Can Rust Code Own RCU? posting.
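
Returning to the textbook use case, a hedged sketch of a sock_read_timestamp()-style 64-bit read on a 32-bit system might look as follows, with stamp_seq and stamp being hypothetical stand-ins for the real fields:

u64 read_stamp(void)
{
  unsigned int seq;
  u64 v;

  do {
    seq = read_seqbegin(&stamp_seq);
    v = stamp; /* Compiles to two 32-bit loads on a 32-bit system. */
  } while (read_seqretry(&stamp_seq, seq));
  return v;
}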

Some of these use cases impose significant constraints on sequence-locking implementations. If the constraints rule out all reasonable Rust implementations, one approach would be to provide different Rust implementations for different use cases. However, it would be far better to avoid further API explosion!

Rust Sequence-Lock Options

One approach would be for all sequence-lock critical sections to be marked unsafe, but this removes at least some of the rationale for switching from C to Rust. In addition, some believe that this approach can pose severe Rust-syntax challenges in some use cases.

Another approach is to require all sequence-lock critical sections to use marked accesses. This avoids the data races and presumably also the need for unsafe, but it would also prevent the Kernel Concurrency Sanitizer (KCSAN) from locating data races involving sequence-lock write-side critical sections, were it not for the use of kcsan_flat_atomic_begin() and kcsan_flat_atomic_end() in sequence-locking read-side primitives and the use of kcsan_nestable_atomic_begin() and kcsan_nestable_atomic_end() in sequence-locking write-side primitives. So perhaps Rust could make use of similar hooks.
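
Under the marked-access approach, the reader from the beginning of this post might look like the following hedged sketch, with shared_a and shared_b being hypothetical:

do {
  seq = read_seqbegin(&test_seqlock);
  a = READ_ONCE(shared_a); /* Marked access: no data race... */
  b = READ_ONCE(shared_b); /* ...even if a writer runs concurrently. */
} while (read_seqretry(&test_seqlock, seq));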

Yet another approach is to place the data referenced by readers into a separate structure and to switch back and forth between a pair of instances of such structures. This has been attempted in the Linux kernel with unsatisfactory results. In fact, in those cases where it is feasible to use multiple instances of a separate structure, Linux-kernel RCU is usually a better choice.

One more approach would be creating something like a “tentatively readable” Rust ownership class that could directly handle sequence-locking read-side critical sections. For example, if a quantity computed from a read within a failed critical section were used subsequently, Rust could emit a warning. Hopefully without much in the way of false positives. (We can dream, can’t we?)

One might instead search the Linux-kernel source tree for uses of sequence locking and note that it is used by only a very few device drivers:

     16 drivers/dma-buf/
      1 drivers/firmware/efi/
      5 drivers/gpu/drm/
      2 drivers/gpu/drm/amd/amdgpu/
      2 drivers/gpu/drm/i915/gem/
     15 drivers/gpu/drm/i915/gt/
      5 drivers/hwmon/
     72 drivers/infiniband/hw/hfi1/
     11 drivers/md/
     15 drivers/net/ethernet/mellanox/mlx4/
     19 drivers/net/ethernet/mellanox/mlx5/core/lib/

These device drivers (or at least those portions of them that make use of sequence locking) could then be relegated to C code.

However, the use of sequence locking has been increasing, and it is likely that it will continue to increase, including within device drivers. Longer term, Rust’s ownership model should therefore be extended to cover sequence locking in a natural and safe manner.

For More Information

For more on sequence locking, see the seqlock.rst documentation. Sequence locking is also covered in Sections 9.4 (“Sequence Locking”) and 13.4 (“Sequence-Locking Specials”) of perfbook.

History

October 6, 2021: Updated to add sectioning, Linux-kernel use cases, and updated KCSAN commentary.
October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add a note discussing possible interoperability between Rust-language and C-language sequence locking.


Rusting the Linux Kernel: Compiler Writers Hate Dependencies (OOTA)

Out-of-thin-air (OOTA) values are a troubling theoretical problem, but thus far do not actually occur in practice. In the words of the Bard: “It is a tale told by an idiot, full of sound and fury, signifying nothing.”

But what is life without a little sound and fury? Besides, “mere theory” can loom large for those creating the tools required to analyze concurrent software. This post therefore gives a brief introduction to OOTA, and then provides some simple practical solutions, along with some difficult ones.

Here is the canonical OOTA example, involving two threads each having but one statement:

T1: x.store(y.load(mo_relaxed), mo_relaxed);
T2: y.store(x.load(mo_relaxed), mo_relaxed);

Believe it or not, according to the C/C++ memory model, even if both x and y are initially 0, after both threads complete it might be that x==y==42!

LKMM avoids OOTA by respecting dependencies, at least when headed by marked accesses. Plus the dependent access must either be marked or be free of data races. For example:

T1: WRITE_ONCE(x, READ_ONCE(y));
T2: WRITE_ONCE(y, READ_ONCE(x));

If x and y are both initially zero, then they are both guaranteed to stay that way!

Great quantities of ink have been spilled on the OOTA problem, but these C++ working papers give a reasonable overview and cite a few of the more prominent papers:

  1. P0422R0: Out-of-Thin-Air Execution is Vacuous. This presents a fixed-point analysis that to the best of my knowledge is the first scheme capable of differentiating OOTA scenarios from straightforward reordering scenarios. Unfortunately, this analysis cannot reasonably be applied to automated tooling. This paper also includes a number of references to other work, which unfortunately either impose unnecessary overhead on weakly ordered systems on the one hand or are well outside of what compiler developers are willing to countenance on the other.
  2. P1916R0: There might not be an elegant OOTA fix. This diabolically clever paper shows how back-propagation of undefined behavior can further complicate the OOTA question.
  3. P2215R0: “Undefined behavior” and the concurrency memory model. This paper looks at ways of addressing the issues raised in P1916R0.
  4. P2055R0: A Relaxed Guide to memory_order_relaxed. This paper presents memory_order_relaxed use cases that are believed to avoid the OOTA problem.

As with address, data, and control dependencies, the trivial solution is for Rust to simply promote all READ_ONCE() operations to smp_load_acquire(), but presumably while leaving the C-language READ_ONCE() untouched. This approach increases overhead, but keeps the people building analysis tools happy, or at least happier. Unfortunately, this trivial solution incurs overhead on weakly ordered systems, overhead that is justified only in theory, never in practice.

But if analysis tools are not an issue, then Rust could simply use volatile loads (perhaps even directly using an appropriately Rust-wrapped instance of the Linux kernel’s READ_ONCE() primitive), just as the Linux kernel currently does (but keeping architecture-specific requirements firmly in mind). The problem is after all strictly theoretical.

History

October 12, 2021: Self-review. Note that the comments are specific to earlier versions of this blog post.
November 12, 2021: Add Hans Boehm’s P2215R0 C++ working paper.


Rusting the Linux Kernel: Compiler Writers Hate Dependencies (Address/Data)

An address dependency involves a load whose return value directly or indirectly determines the address of a later load or store, which results in the earlier load being ordered before the later load or store. A data dependency involves a load whose return value directly or indirectly determines the value stored by a later store, which results in the load being ordered before the store. These are used heavily by RCU. Although they are not quite as fragile as control dependencies, compilers still do not know about them. Therefore, care is still required, as can be seen in the rcu_dereference.rst Linux-kernel coding guidelines. As with control dependencies, address and data dependencies enjoy very low overheads, but unlike control dependencies, they are used heavily in the Linux kernel via rcu_dereference() and friends.
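
For reference, here is a hedged sketch of both kinds of dependency, with gp, a, and b being hypothetical variables:

/* Address dependency: the first load determines the second load's address. */
rcu_read_lock();
p = rcu_dereference(gp);
if (p)
  r1 = p->a; /* Ordered after the load from gp. */
rcu_read_unlock();

/* Data dependency: the loaded value feeds into the stored value. */
r2 = READ_ONCE(a);
WRITE_ONCE(b, r2 + 1); /* Ordered after the load from a. */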

  1. As with control dependencies, the trivial solution is to promote READ_ONCE() to smp_load_acquire(). Unlike with control dependencies, there are only a few such READ_ONCE() instances, and almost all of them are conveniently located in definitions of the rcu_dereference() family of functions. Because the rest of the kernel might not be happy with the increase in overhead due to such a promotion, it would likely be necessary to provide Rust-specific implementations of rcu_dereference() and friends.
  2. Again as with control dependencies, an even more trivial solution is to classify code containing address and data dependencies as core Linux-kernel code that is outside of Rust’s scope. However, given that there are some thousands of instances of rcu_dereference() scattered across the Linux kernel, this solution might be a bit more constraining than many Rust advocates might hope for.
  3. Provide Rust wrappers for the rcu_dereference() family of primitives. This does incur additional function-call overhead, but on the other hand, if initial use of Rust is confined to performance-insensitive device drivers, this added overhead is unlikely to be a problem.
  4. But if wrapper overhead nevertheless proves problematic, provide higher-level C-language functions that encapsulate the required address and data dependencies, and that do enough work that the overhead of the wrappering for Rust-language use is insignificant.
  5. And yet again as with control dependencies, the best approach from the Linux-kernel-in-Rust developer’s viewpoint is for Rust to enforce the address/data-dependency code-style restrictions documented in the aforementioned rcu_dereference.rst. There is some reason to hope that this enforcement would be significantly easier than for control dependencies.
  6. Wait for compiler backends to learn about address and data dependencies. This might take some time, but there is ongoing work along these lines that is described below.

One might hope that C/C++’s memory_order_consume would correctly handle address and data dependencies, and in fact it does. Unfortunately, in all known compilers, it does so by promoting memory_order_consume to memory_order_acquire, which adds overhead just as surely as does smp_load_acquire(). There has been considerable work done over a period of some years towards remedying this situation, including these working papers:

  1. P0371R1: Temporarily discourage memory_order_consume (for some definition of “temporarily”).
  2. P0098R1: Towards Implementation and Use of memory_order_consume, which reviews a number of potential remedies.
  3. P0190R4: Proposal for New memory_order_consume Definition, which selects a solution involving marking pointers carrying dependencies. Note that many Linux-kernel developers would likely demand a compiler command-line argument that caused the compiler to act as if all pointers had been so marked. Akshat Garg prototyped marked dependency-carrying pointers in gcc as part of a Google Summer of Code project.
  4. P0750R1: Consume, which proposes carrying dependencies in an extra bit associated with the pointer. This approach is much more attractive to some compiler developers, but many committee members did not love the resulting doubling of pointer sizes.
  5. P0735R1: Interaction of memory_order_consume with release sequences, which addresses an obscure C/C++ memory-model corner case.

There appears to be a recent uptick in interest in a solution to this problem, so there is some hope for progress. However, much more work is needed.

More information on address and data dependencies may be found in Section 15.3.2 (“Address- and Data-Dependency Difficulties”) of perfbook.

History

October 12, 2021: Self-review.


Rusting the Linux Kernel: Compiler Writers Hate Dependencies (Control)

At the assembly language level on many weakly ordered architectures, a conditional branch acts as a very weak, very cheap, but very useful memory-barrier instruction. It orders any load whose return value feeds into the condition codes before all stores that execute after the branch instruction completes, whether the branch is taken or not. ARMv8 also has a conditional-move instruction (CSEL) that provides similar ordering.

Because the ordering properties of the conditional branch involve dependencies from the load to the branch and from the branch to the store, and because the branch is a control-flow instruction, this ordering is said to be due to a control dependency.

Because compilers do not understand them, control dependencies are quite fragile, as documented by the many cautionary tales in the Linux kernel’s memory-barriers.txt documentation (search for the “CONTROL DEPENDENCIES” heading). But they are very low cost, so they are used on a few critically important fastpaths in the Linux kernel.
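
For reference, the canonical control-dependency pattern looks like the following hedged sketch, with a and b being hypothetical variables (and with memory-barriers.txt listing the many ways that compilers can break variations on this theme):

q = READ_ONCE(a);
if (q)
  WRITE_ONCE(b, 1); /* Store ordered after the load from a. */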

Rust could deal with control dependencies in a number of ways:

  1. The trivial solution is to promote the loads heading the control dependencies to smp_load_acquire(). This works, but adds instruction overhead on some architectures and needlessly limits compiler optimizations on all architectures (but to be fair, ARMv8 does exactly this when built with link-time optimizations). Another difficulty is identifying (whether manually or automatically) exactly which READ_ONCE() calls need to be promoted.
  2. An even more trivial solution is to classify code containing control dependencies as core Linux-kernel code that is outside of Rust’s scope. Because there are very few uses of control dependencies in the Linux kernel, Rust would not lose much by taking this approach. In addition, there is the possibility of creating higher-level C-language primitives containing the needed control dependencies which are then wrappered for Rust-language use.
  3. The best approach from the Linux-kernel-in-Rust developer’s viewpoint is for Rust to enforce the code style restrictions documented in memory-barriers.txt. However, there is some chance that this approach might prove to be non-trivial.
  4. Wait for compiler backends to learn about control dependencies. This might be a bit of a wait, especially given the difficulty even defining control dependencies within the current nomenclature of the C/C++ standards.

More information on control dependencies may be found in Section 15.3.3 (“Control-Dependency Calamities”) of perfbook.

History

October 12, 2021: Self-review.


Rusting the Linux Kernel: Atomics and Barriers and Locks, Oh My!

One way to reduce the number of occurrences of unsafe in Rust code in Linux is to push the unsafety down into atomic operations, memory barriers, and locking primitives, which are the topic of this post. But first, here are some materials describing LKMM:

  1. P0124R7: Linux-Kernel Memory Model: A C++ standards-committee working paper comparing the C/C++ memory model to LKMM.
  2. Linux Weekly News series on LKMM (Part 1 and Part 2).
  3. The infamous ASPLOS’18 paper entitled Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel (non-paywalled), with a title-based tip of the hat to the irrepressible Mel Gorman.
  4. Chapter 15 of perfbook (“Advanced Synchronization: Memory Ordering”).
  5. The Linux kernel’s tools/memory-model directory, featuring an executable version of LKMM.

For all of these references, I give a big “Thank You!!!” to my co-authors.

LKMM is not the most complex memory model out there, but neither is it the simplest. In addition, it is in some ways more strict than the C/C++ memory models, which means that strict adherence to coding guidelines is required in order to prevent compiler optimizations from breaking Linux-kernel code. Many of these optimizations are not localized, but are instead scattered hither and yon throughout the compilers, including throughout the compiler backends. The optimizations in the backends are a special challenge to Rust, which seems to take the approach of layering safety on top of (or perhaps within) the compiler frontend. Later posts in this series will look at several pragmatic options available to Rust Linux-kernel code.

There is one piece of good news: Compilers are forbidden from introducing data races into code, at least into code that is free of undefined behavior.

With all of that out of the way, let’s look at Rust’s options for dealing with Linux-kernel atomics and barriers and locks.

The first approach is to carefully read the P0124R7: Linux-Kernel Memory Model working paper and even more carefully follow its advice in selecting C/C++ primitives that best match Linux-kernel atomics, barriers, and locks. This approach works well for data whose definition and use is confined to Rust code, and with sufficient care and ongoing attention can also work for atomic operations and memory barriers involving data shared with C code. However, expecting Rust locking primitives to interoperate with Linux-kernel locking primitives might not be a strategy to win. It seems wise to make direct use of the existing Linux-kernel locking primitives, keeping in mind that this means properly wrappering them in order to make Rust ownership work properly. Those who doubt the wisdom of wrappering the C-language Linux-kernel locking primitives should consider the following:

  1. Linux-kernel locks are complex and highly optimized. Keeping two implementations is an excellent way to inject profound bugs into the Linux kernel.
  2. Linux-kernel locks are deeply entwined with the lockdep lock dependency checker. The data structures implementing each lock class would need to be shared between C and Rust code, which is another excellent way to inject bugs.
  3. On some architectures, Linux-kernel locks must interact with memory-mapped I/O (MMIO) accesses. Any Rust-language implementation of Linux-kernel locks must therefore be architecture-dependent and must know quite a bit about Linux-kernel MMIO.

As described in later sections, it might be useful to promote READ_ONCE() to smp_load_acquire() instead of implementing it as a volatile load. It might also be useful to promote WRITE_ONCE() to smp_store_release() instead of implementing it as a volatile store, depending on what sort of data-race analysis Rust provides for unsafe code. There is some C/C++ work in flight towards providing better definitions for volatile operations, but it is still early days for this work.

If READ_ONCE() and WRITE_ONCE() are instead to be implemented as volatile operations in Rust, please take care to check the individual architectures that are affected. DEC Alpha requires a full memory-barrier instruction at the end of READ_ONCE(), Itanium requires promotion of volatile loads to acquire loads (but this is carried out by the compiler), and ARMv8 requires READ_ONCE() to be promoted to acquire (but only in CONFIG_LTO=y builds).

Device drivers make heavy use of volatile accesses and memory barriers for MMIO accesses, and Linux-kernel device drivers are no exception. As noted earlier, some architectures require that these accesses interact with locking primitives. Furthermore, there are many device-specific special cases surrounding device control in general and MMIO in particular. Therefore, Rust-language device drivers should access the existing Linux-kernel C-language primitives rather than creating their own, especially to start with. There might well be exceptions to this rule, for example, Rust might be applied to a device driver that is only used by architectures that do not require interaction with locking primitives. But if you write a driver containing Rust-language MMIO primitives, please carefully and prominently document the resulting architecture restrictions.

This suggests another approach, namely not bothering to implement any of these primitives in Rust, but rather making direct use of the Linux-kernel implementations, as suggested earlier for locking and MMIO primitives. And again, this requires wrappering them for use by Rust code. However, such wrappering introduces another level of function call, potentially for very small functions. Although it is expected that LTO will successfully inline tiny functions, not all of the world is yet ready for LTO. In the meantime, where feasible, developers should avoid invoking tiny C functions from Rust-language fastpaths.
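
The following hypothetical wrapper, with a made-up shim around smp_load_acquire(), illustrates the overhead in question:

    // Hypothetical out-of-line C shim, for example:
    //   unsigned long shim_smp_load_acquire(unsigned long *p)
    //   { return smp_load_acquire(p); }
    extern "C" {
        fn shim_smp_load_acquire(p: *const usize) -> usize;
    }

    // SAFETY: p must be valid, properly aligned, and legitimately
    // subject to concurrent access.
    #[inline]
    pub unsafe fn load_acquire(p: *const usize) -> usize {
        // Absent cross-language LTO, this wrapper plus its shim turn
        // what is a single instruction on many architectures into two
        // function calls, hence the fastpath advice above.
        shim_smp_load_acquire(p)
    }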

This being the real world, we should expect that the Rust/C determination will need to be made on a case-by-case basis, with many devils in the details.

History

October 12, 2021: Self-review changes.
October 13, 2021: Add explicit justification for wrappering the Linux kernel’s C-language locks and add a few observations about MMIO accesses.


Rust Concurrency Philosophy: A Historical Perspective

At first glance, Rust’s concurrency philosophy resembles that of Sequent’s DYNIX and DYNIX/ptx in the 1980s and early 1990s: “Lock data, not code” (see Jack Inman’s classic USENIX’85 paper “Implementing Loosely Coupled Functions on Tightly Coupled Engines”, sadly invisible to search engines). Of course, Sequent lacked Rust’s automatic checking, and Sequent’s software engineers made much less disciplined use of ownership than Rust fans recommend. Nevertheless, this resemblance has prompted some comparisons of Rust’s concurrency philosophy with that of DYNIX.

Interestingly enough, DYNIX and early versions of DYNIX/ptx used compile-time-allocated arrays for almost all of their data structures. You want your kernel to support up to N tasks? Very well, build your kernel to have an array of N task structures. This worked surprisingly well, perhaps because the important concurrent applications of that time had very predictable resource requirements, including numbers of tasks. Nevertheless, as you might expect, this did become quite the configuration nightmare. So why were arrays used in the first place?

To the best of my knowledge, the earliest published complete articulation of the reason appeared in Gamsa et al.’s landmark paper “Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System”. The key point is that you cannot protect a dynamically allocated object with a lock located within that object, because that object (and therefore its lock) might be freed at just the wrong time, for example, just as some other CPU was attempting to acquire that lock. The DYNIX arrays avoided deallocation (or, alternatively, provided a straightforward implementation of type-safe memory), thus allowing these objects to be protected by their internal locks. Avoiding the need for global locks or reference counters was an important key to the performance and scalability prized by Sequent’s customers.

This strategy worked less well when Sequent added a distributed lock manager because the required number of locks was not predictable, nor was there a useful upper bound. This problem was solved in part by the addition of RCU, which provided a high-performance and scalable means of resolving races between acquiring a given object’s lock and deletion (and subsequent freeing) of that same object. Given that DEC Alpha famously had difficulty with RCU, it is only reasonable to ask how Rust will do with it. Or must the concurrency designs of those portions of the Linux kernel that are to be written in Rust be “fitted” to a Rust-language ProcRustean bed [1]? Those who prize Rust’s fearless-concurrency goal above all else might reasonably argue that this ProcRustean bed is in fact a most excellent thing. However, some Linux-kernel maintainers (including this one) might in their turn reasonably argue that within the context of some portions of the Linux kernel, a proper level of fear is a very healthy thing. As is the ability to do one’s job in a reasonably straightforward manner!

One way to avoid this ProcRustean bed is to use the Rust unsafe facility, and in fact “unsafe” has been the answer to a disturbingly large number of my questions about Rust [2]. However, use of this facility introduces the possibility of data races, which in turn raises the question of Rust’s memory model. Within the Linux kernel, the answer to this question is of course LKMM, or perhaps some reasonable subset of LKMM.

However, in my personal experience, I have most frequently seen Rust being used to rewrite scripts that became performance problems upon being more widely deployed than expected. In some cases, these rewrites greatly improved user experience as well as performance. This means that Rust is heavily used outside of the Linux kernel, which in turn means that LKMM might not be the right answer for Rust in general, though some in the Rust community have come out strongly in favor of extending the ProcRustean bed. But this blog series is focused on Rust for the Linux kernel, so the question of the memory model for Rust in general is out of scope. Again, for Rust in the Linux kernel, some subset of LKMM is clearly the correct memory model.

Many of the following posts in this series cover ways that Rust might work with specific aspects of LKMM, including some wild speculation about how Rust’s ownership model might be generalized in a manner similar to the way that the Linux kernel’s lockdep checking has been generalized for cross-released locks and for RCU. Readers wishing to learn more about non-ProcRustean concurrency designs are invited to peruse “Is Parallel Programming Hard, And, If So, What Can You Do About It?”, hereinafter called “perfbook”. Specific chapters and sections of the Second Edition of this book will be cited as appropriate by later posts in this series.

Endnotes

[1]   Making this 1990s-style concurrency scale usually involves hashed arrays of locks. These are often deadlock-prone, but there are heavily used techniques that (mostly) avoid the deadlocks. See Section 7.1.1.6 (“Acquire Needed Locks First”) in perfbook for one such technique. However, hashed arrays of locks are prone to scalability problems, especially on multi-socket systems, due to poor locality of reference. See for example Section 10.2.3 (“Hash-Table Performance”) for performance results on hash tables, also in perfbook. As a result, most attempts to apply hashed arrays of locks to the Linux kernel resulted in RCU being used instead. The performance and scalability benefits of RCU (and hazard pointers) are shown in Section 10.3 (“Read-Mostly Data Structures”), again in perfbook.
 
[2]   But please note that Rust’s unsafe code has only five “unsafe superpowers”, that is, only five additional capabilities whose misuse can result in undefined behavior (see the toy example following this list):

  1. Dereferencing a raw pointer (and it is the programmer’s responsibility to avoid destructive wild-pointer dereferences).
  2. Calling an unsafe function or method.
  3. Accessing or modifying a mutable static variable (and it is the programmer’s responsibility to avoid destructive data races).
  4. Implementing an unsafe trait.
  5. Accessing fields of a union (and it is the programmer’s responsibility to avoid accesses that invoke undefined behavior, or, alternatively, understand how the compiler at hand reacts to any undefined behavior that might be invoked).
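
For instance, the first and third superpowers look like this in practice (a toy user-space example, not kernel code):

    static mut COUNTER: usize = 0; // a mutable static

    fn main() {
        let x = 42usize;
        let p = &x as *const usize; // creating a raw pointer is safe...

        unsafe {
            // ...but dereferencing it requires unsafe (superpower 1).
            let v = *p;
            // So does accessing a mutable static (superpower 3), and
            // avoiding data races on COUNTER is the programmer's job.
            COUNTER += v;
        }
    }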

I do find the Rust community’s sharp focus on undefined-behavior-induced bugs over other types of bugs to be rather surprising. Perhaps this is because I have not had so much trouble with undefined-behavior-induced bugs. Conspiracy theorists might imagine an unholy alliance between UB-happy developers of compiler optimizations on the one hand and old-school parallel programmers wishing to turn the clock back to the 1990s on the other hand. I am happy to let such theorists imagine such things. 😉

Besides, more recent discussions have focused on memory safety rather than the full gamut of undefined behavior.

History

October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add note on memory safety specifically rather than undefined behavior in general.
