Implicit threading means that the language runtime, a library, or the OS creates and manages threads behind the scenes. Several approaches exist; five of them are covered below.
Thread Pools
A fixed number of worker threads is created up front and waits for tasks. This is faster and more memory-efficient than creating a thread per task, because threads are reused instead of being re-created.
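The idea can be sketched in standard C++ as follows. This is a minimal illustration and not a production pool: the class name `ThreadPool`, its `submit` method, and the single shared queue are assumptions made for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers pulls tasks from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins every worker.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a callable and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

A caller would construct the pool once, for example `ThreadPool pool(4);`, and then call `pool.submit([] { return 42; })` for each task. The same four threads serve every task, which is where the savings over one thread per task come from.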
Fork–Join
Tasks are recursively split into subtasks (fork), and the subtask results are combined (join).
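As a rough sketch of the pattern, the function below sums a range by splitting it in half until the pieces are small. The name `fork_join_sum` and the `cutoff` parameter are hypothetical, and `std::async` with `std::launch::async` is used here only as a simple stand-in for a fork-join runtime: it starts a new thread per fork, whereas real fork-join frameworks usually run forked tasks on a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums data[lo, hi) by recursive fork-join.
long long fork_join_sum(const std::vector<int>& data, std::size_t lo, std::size_t hi,
                        std::size_t cutoff = 10000) {
    assert(cutoff > 0 && lo <= hi && hi <= data.size());
    if (hi - lo <= cutoff) {
        // Small enough: solve directly instead of forking further.
        return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(lo),
                               data.begin() + static_cast<std::ptrdiff_t>(hi), 0LL);
    }
    std::size_t mid = lo + (hi - lo) / 2;

    // Fork: the left half runs as a separate asynchronous task.
    std::future<long long> left = std::async(std::launch::async, [&data, lo, mid, cutoff] {
        return fork_join_sum(data, lo, mid, cutoff);
    });

    // The current thread handles the right half itself.
    long long right = fork_join_sum(data, mid, hi, cutoff);

    // Join: wait for the forked task and combine both results.
    return left.get() + right;
}
```

Calling `fork_join_sum(data, 0, data.size())` sums the whole vector. The cutoff keeps the number of forks small, since forking all the way down to single elements would cost far more than the additions themselves.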
OpenMP
A set of compiler directives plus a runtime library. Directives mark the parallel regions, the compiler translates them into runtime library calls, and the threads are created automatically.
```cpp
#pragma omp parallel
{
    // parallel code
}
```

At run time, the block above is executed in parallel by a team of threads. By default, implementations typically create as many threads as there are processing cores.
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <omp.h>

int main(int argc, char** argv) {
    // Vector length; can be overridden by the first command-line argument.
    long n = 1000000;
    if (argc > 1) n = std::stol(argv[1]);

    std::vector<double> a(n), b(n), c(n);
    for (long i = 0; i < n; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = static_cast<double>(n - i);
    }

    double t0 = omp_get_wtime();

    // The for loop below is parallelized: its iterations are divided among threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }

    double t1 = omp_get_wtime();

    std::cout << "Added " << n << " elements in " << (t1 - t0) << " seconds\n";

    // Print the first few results as a sanity check.
    long m = std::min(n, 10L);
    for (long i = 0; i < m; ++i) {
        std::cout << "c[" << i << "] = " << c[i] << '\n';
    }

    return 0;
}
```

Grand Central Dispatch
Also known as GCD. Apple’s system-level concurrency framework, built on task-based dispatch queues. Tasks (blocks, written as ^{ ... }, or functions) are submitted to queues, and GCD decides how to schedule them on the available CPU cores.
There are two types of queues:
- Serial
  Blocks are removed in FIFO order, and each block must finish before the next one is removed. Each process has its own serial queue, known as the main queue.
- Concurrent
  Blocks are removed in FIFO order, but several may be removed at a time, so they can run in parallel.

There are four system-wide concurrent queues, divided by quality of service:
- QOS_CLASS_USER_INTERACTIVE
- QOS_CLASS_USER_INITIATED
- QOS_CLASS_UTILITY
- QOS_CLASS_BACKGROUND

More queues of either type can be created programmatically within a process.
Used in macOS and iOS.
Intel TBB
Short for Intel Threading Building Blocks. A C++ template library with its own runtime. TBB schedules tasks automatically based on the available cores. It provides high-level parallel constructs such as parallel_for and parallel_reduce, as well as task-based decomposition. The runtime system manages the thread pool, load balancing, and work stealing.