Concurrent (Parallel) Programming

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Overview

Concurrency, or concurrent programming, is the management of multiple tasks that exist, and possibly execute, at the same time. "asynchronous" and "threaded" are other terms that are also used to refer to concurrent programming. The difference between concurrency and parallelism is explained in the Concurrecy vs. Parallelism section.

Concurrent programming is becoming more and more important as the hardware executing code comes in form of multiple cores that share the system memory. Moore Law stated that the transistor density on the chip doubles every two years. This observation was true for a while, but processor engineering has reached the limits of what is physically possible when it comes to shrinking transistors. The difficulties are mainly related to power consumptions and temperature dissipation, generically known as the “power wall”.

The relationship between dynamic power consumed by a transistor and percentage of time spent switching (α), capacitance (C), clock frequency (F) and voltage differential (V) is given by the following equation:

P = αCFV2

Smaller transistors have a lower capacitance, so C goes down as the transistor shrinks. Also, Dennard scaling law says that as transistors get smaller, both voltage and current scale downward, so the V2 term scales down in the equation. For constant α, whose maximum and desired value is 1, the dynamic power consumption equation would allow increasing the clock frequency, hence the performance of the processor, as C and V decrease.

However, transistors cannot get smaller than a certain size. One of the reasons is that the voltage has to stay above a specific limit, otherwise the noise would become unmanageable. In consequence, the voltage swings cannot get infinitely small. Thinner insulator layers increase static power leakage. Increasing the clock frequency under these circumstances would mean that the dynamic power consumption goes up, the temperature of the circuit goes up, as not sufficient heat can be dissipated fast enough, and the circuit simply melts.

There is another factor limits the overall system's performance even if the processor frequency could be increased further, and that his the Von Neumann bottleneck: the main memory access is orders of magnitude slower than the processor, so even if the processor is fast, it'll idle waiting for memory access. People work around this by onboarding fast caches on processors.

All these factors determined hardware designers to implement a different scaling strategy: not so much shrinking the transistors in a single but packing more cores per chip. Suddenly, there’s not just one single processor that executes instructions serially, one at a time, but more of them, which can execute instruction in a truly parallel fashion.

This development is what makes concurrent programming very relevant. A core is essentially sequential, so in order to use the system effectively, concurrent programming is necessary. Programs must take advantage of parallel hardware, otherwise the majority of the cores in the processor would idle, so concurrent programming support in the language is very important. Even though there is research going on into parallel compilers, attempts to automatically parallelize a program are not going very well, which makes the programmer's job to decompose the task so it renders itself to parallel execution.

The performance of a program that executes on multiple core is governed by Amdahl's Law, which says that the gains in performance of such a program are bounded by how much of the program must be written in sequential manner.

Concurrency vs. Parallelism

"Concurrent execution" of two tasks means that the tasks execution overlaps in time. This does not necessarily means that both tasks are executing at the same time. Concurrency can be achieved with serial hardware, such as a processor that executes one instructions at the time. Concurrency can be of course achieved on parallel hardware too.

Concurrent Execution.png

"Parallel execution" means executing two tasks at the same time. Parallelism is only achievable with replicated hardware, usually a multi-core processor. Determining which hardware performs which task is called hardware mapping. Hardware mapping is usually not under the control of the programmer. The completion time for parallel execution is usually shorter than for concurrent execution.

Parallel Execution.png

In concurrent programming, the programmer usually determines which tasks can be executed in parallel, and implements communication and synchronization between tasks, using programming languages primitives and patterns. The programmer does not decide which tasks will be performed in parallel and they don't specify the hardware mapping. That is left to the programming language runtime scheduler and to the O/S. Internally, concurrency is implemented by interleaving instructions from different threads of executions. Note that the interleaving of the instructions happens at machine code level, not at the programming language statement level.

Concurrency improves performance even without parallelism. This is because tasks usually wait for some event (over network, for example) and concurrent programming helps to preempt that task and use the processor for other tasks that are ready for compute. Concurrency can help to "hide latency". Also, done correctly, concurrency provides ways to structure the program to make it easy to understand and scalable.

For problems that can be resolved in concurrent manner, it is recommended to write the application so that it can scale horizontally, which means running instances of the program, or subroutines, on more cores in parallel. If done correctly, this increases resource utilization and the performance of the program.

Atomicity

When something is considered atomic, or has the property of atomicity, this means that within the context that it is operating, it is indivisible or uninterruptible. These terms mean that within the defined context, something that is atomic will happen in its entirety without anything else happening in that context simultaneously.

The context is important. Something that can be atomic in the context of a thread may not be atomic in the context of a process, which is executed on multiple threads. In other words, the atomicity of an operation ca change depending on the currently defined scope.

Combining atomic operations does not necessarily produce a larger atomic operation. If something is atomic, it is implicitly safe within concurrent contexts. Programming languages provide atomicity primitives, such as atomic data types, or atomic access to memory, where only one thread is guaranteed exclusive access to shared data.

Critical Section

A critical section is a section of a program that needs exclusive access to a shared resource. Exclusive access is also referred to as synchronized access. Mutual exclusion locks can be used to programmatically guard access to a program's critical sections.

Concurrent Programming is Hard

Writing concurrent code is hard, because it is difficult to maintain a mental model of the state of the program at any particular time. The problems a programmer typically runs into while writing concurrent code belong to several classes of problems: race conditions, deadlocks, livelocks and starvation.

Race Condition

A race condition is a class of concurrency problems that are caused by non-deterministic interleavings of machine code executed in different threads. The usually non-deterministic order in which the threads and instructions within a thread are scheduled for execution leads to different, unwelcome results of the program for subsequent executions. In other words, the outcome of the program depends on the interleaving, or scheduling details. The race conditions that involve access to shared data are called data races, but data races are not the only existing race conditions.

Races are introduced because developers are thinking about the problem sequentially: they assume that because a line of code falls before another in the text of the program, that line will be executed first. This assumption true if those lines of code are executed on the same thread, but false if they are executed on different threads. In those situation, a helpful technique is to imagine that large amounts of time pass between different operations, and think about how the program would behave if those operations occur in different order. One naive approach to address race conditions is to sprinkle sleep statements throughout the code, in an attempt to "break the race". This does not solve the problem, it just modifies the timing. The program does not become logical correct, and the downside is that this approach usually degrades the performance.

Race conditions are an insidious type of concurrency bug, because they may not show up until years after the code has been placed in production. They are usually precipitated by an environmental change that modifies the execution timing.

Data Race

Most of the time, race condition occur due to the fact that the threads share data. A very common example is a concurrent operation attempts to read a variable while at some undetermined time another concurrent operation is attempting to write the same variable. This is called a data race. If threads shared no data, then different interleavings should have no effect on the effect of the instructions on their particular threads.

Deadlock

Deadlock appears when there are circular dependencies between threads, and the threads acquire shared resources needed by their dependencies.

Livelock

Starvation

Concurrency Programming Models

The majority of languages address concurrency by providing a representation of O/S threads or green threads at language level, and exposing memory access synchronization primitives to protect shared data in presence of concurrent access. An alternative model is based on Communicating Sequential Processes (CSP), a model introduced by A. Hoare in the "Communicating Sequential Processes" 1978 ACM paper. This concurrency programming model is centered around passing data between independent sequential processes.

Memory Access Synchronization Primitives

Synchronization is the opposite of concurrency. The advantages of concurrency and parallelism is that the instruction interleavings are arbitrary, and that gives the scheduler options that lead to efficient utilization of the processor, which ultimately speeds up the execution of the code. With synchronization, we restrict the scheduler's options and we force scheduling to happen in a certain way. Every time we use synchronization, we are reducing the efficiency, the amount of possible concurrency. Also, using synchronization does not automatically solve race conditions. However, there are cases when synchronization is necessary, when certain things in different threads of execution have to happen in a certain order.

A way to address concurrent updates of shared state in concurrent programming is to use locks. Locks are problematic because they rely of the capacity of the programer to identify problem areas that require concurrent access protection. As the complexity of the program increases, the reliance of this capacity may become problematic.

An alternative approach is to use an actor model.

Mutual Exclusion (Mutex) Locks

Mutual exclusion locks are programming primitives that ensure exclusive (synchronized) access to memory for the thread that holds the lock. The locks must be acquired and released. Go has sync.Mutex and sync.RWMutex.

Semaphore

Binary Semaphore

Counting Semaphore

Communicating Sequential Processes (CSP)

Communicating Sequential Processes (CSP) is a theoretical concurrency model introduced by A. Hoare in the "Communicating Sequential Processes" 1978 ACM paper.

Atomic Data Types

Immutable Data

Confinement

TO REFACTOR

O/S Level Concurrency

Process

A process is an instance of a running program. Each process has a context: its own memory, within an address space. It has code, stack, heap, registers, program counter, stack pointer, data registers. It may share libraries with other processes.

Process Context Switch

When the operating system switches from one process to another, that is called a context switch. The process context must be swapped: saved and replaced with another's.

O/S Threads

An O/S thread is similar conceptually with a process, but it has less context, and it shares some of its context with other threads in the process. A process can have multiple threads inside it. A thread's context include unique stacks, data registers, code. The context that is shared between threads include the virtual memory, the file descriptors.

Threads should be mostly independent of each other. The whole reasons there are different threads is because you feel that the tasks executed by the threads can be executed concurrently at the same time. However, they're not completely independent, they're part of the same process and they share information - heap memory space among other things. When threads communicate inappropriately, it is when race conditions occur.

Go implements threads as goroutines.

Thread Context Switch

When the operating system switches from one thread to another, that is called a thread context switch.

Green Threads

Concurrency in Programming Languages

Go

Go Concurrency

Python

Python Threads and Concurrency