Rust Parallel stream processing with Rayon
https://morestina.net/blog/1432/parallel-stream-processing-with-rayon
Rayon delivers on its promise and produces a parallel version that runs ~1.5x faster, as tested on my machine using a vector of a billion integers. This is fairly impressive given that the individual map and reduce operations in this example boil down to just a CPU instruction each. Rayon’s implementation uses a technique called work stealing to ensure efficient distribution of tasks among threads. The idea behind work stealing is that each thread maintains a local queue of tasks submitted by code running in that thread. By default the thread runs local tasks, and only when it runs out of those does it go ahead and “steal” tasks from queues of other threads. If all threads are equally busy, each just services its own tasks, maximizing data locality and reducing the number of context switches.
The Rayon API provides two distinct abstractions: fork-join and parallel iterators. Fork-join is the lower-level primitive that is deceptively simple: it consists of a single function, join(), which accepts two closures and executes them, potentially in parallel. rayon::join(A, B) pushes B to the thread-local queue and starts executing A – in the same thread. By the time A returns, it’s possible that B had already been stolen by another thread, in which case the current thread waits for it to finish, stealing tasks from other threads while waiting. If B had not been stolen, join() just executes it as well, being no worse off than if the code had been sequential.
Comments
Post a Comment