Work stealing

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors (or cores). It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

In a work stealing scheduler, each processor in a computer system has a queue of work items (computational tasks, threads) to perform. Each work item consists of a series of instructions, to be executed sequentially, but in the course of its execution, a work item may also spawn new work items that can feasibly be executed in parallel with its other work. These new items are initially put on the queue of the processor executing the work item. When a processor runs out of work, it looks at the queues of the other processors and "steals" their work items. In effect, work stealing distributes the scheduling work over idle processors, and as long as all processors have work to do, no scheduling overhead occurs.^[1]

Work stealing contrasts with work sharing, another popular scheduling approach for dynamic multithreading, where each work item is scheduled onto a processor when it is spawned. Compared to this approach, work stealing reduces the amount of process migration between processors, because no such migration occurs when all processors have work to do.^[2]

The idea of work stealing goes back to the implementation of the Multilisp programming language and work on parallel functional programming languages in the 1980s.^[2] It is employed in the scheduler for the Cilk programming language,^[3] the Java fork/join framework,^[4] the .NET Task Parallel Library,^[5] and the Rust Tokio runtime.^[6]^[7]

^ Cite error: The named reference dfs was invoked but never defined (see the help page).
^ ^a ^b Blumofe, Robert D.; Leiserson, Charles E. (1999). "Scheduling multithreaded computations by work stealing" (PDF). J ACM. 46 (5): 720–748. doi:10.1145/324133.324234. S2CID 5428476.
^ Blumofe, Robert D.; Joerg, Christopher F.; Kuszmaul, Bradley C.; Leiserson, Charles E.; Randall, Keith H.; Zhou, Yuli (1996). "Cilk: An efficient multithreaded runtime system". Journal of Parallel and Distributed Computing. 37 (1): 55–69. doi:10.1006/jpdc.1996.0107.
^ Doug Lea (2000). A Java fork/join framework (PDF). ACM Conf. on Java.
^ Leijen, Daan; Schulte, Wolfram; Burckhardt, Sebastian (2009). "The Design of a Task Parallel Library". ACM SIGPLAN Notices. 44 (10): 227. CiteSeerX 10.1.1.146.4197. doi:10.1145/1639949.1640106.
^ "What is Tokio? · Tokio". tokio.rs. Retrieved 2020-05-27.
^ Krill, Paul (2021-01-08). "Tokio Rust runtime reaches 1.0 status". InfoWorld. Retrieved 2021-12-26.

[dfs-1] Cite error: The named reference dfs was invoked but never defined (see the help page).

[jacm-2] Blumofe, Robert D.; Leiserson, Charles E. (1999). "Scheduling multithreaded computations by work stealing" (PDF). J ACM. 46 (5): 720–748. doi:10.1145/324133.324234. S2CID 5428476.

[3] Blumofe, Robert D.; Joerg, Christopher F.; Kuszmaul, Bradley C.; Leiserson, Charles E.; Randall, Keith H.; Zhou, Yuli (1996). "Cilk: An efficient multithreaded runtime system". Journal of Parallel and Distributed Computing. 37 (1): 55–69. doi:10.1006/jpdc.1996.0107.

[lea-4] Doug Lea (2000). A Java fork/join framework (PDF). ACM Conf. on Java.

[5] Leijen, Daan; Schulte, Wolfram; Burckhardt, Sebastian (2009). "The Design of a Task Parallel Library". ACM SIGPLAN Notices. 44 (10): 227. CiteSeerX 10.1.1.146.4197. doi:10.1145/1639949.1640106.

[6] "What is Tokio? · Tokio". tokio.rs. Retrieved 2020-05-27.

[7] Krill, Paul (2021-01-08). "Tokio Rust runtime reaches 1.0 status". InfoWorld. Retrieved 2021-12-26.

[1]

[2]

[3]

[4]

[5]

[6]

[7]