Sunday, February 9, 2014

Determinism in real-time systems

Determinism measures the consistency of the specified time interval between events. It is essential to real-time applications: if an event does not happen when expected, the consequences can be extremely serious, for example an airbag that fails to deploy in time during a car accident. This article introduces the concepts of jitter and determinism, then describes what needs to be considered when implementing a real-time application.

1: Jitter and determinism

Jitter is the difference between the expected and the actual timing responses. Determinism measures the consistency of the specified time interval between events.
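As a concrete illustration, jitter can be measured by timestamping a periodic task and comparing each wakeup against its expected time. Below is a minimal C++ sketch of such a measurement loop; the 10 ms period and the iteration count are arbitrary assumptions, not tied to any particular RTOS:

  #include <algorithm>
  #include <chrono>
  #include <cstdio>
  #include <thread>

  int main() {
      using clock = std::chrono::steady_clock;
      const auto period = std::chrono::milliseconds(10); // expected interval between events
      auto next = clock::now() + period;
      auto worst = clock::duration::zero();
      for (int i = 0; i < 1000; ++i) {
          std::this_thread::sleep_until(next);
          // Jitter: difference between the actual and the expected wakeup time.
          worst = std::max(worst, clock::now() - next);
          next += period;
      }
      std::printf("worst-case jitter: %lld us\n",
                  static_cast<long long>(
                      std::chrono::duration_cast<std::chrono::microseconds>(worst).count()));
  }

The smaller and more stable the measured deviation, the more deterministic the system.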

2: Characteristics of a real-time operating system

To achieve high determinism, a real-time operating system supports the features below:
1): Minimal feature set
A real-time operating system focuses on necessary features, such as thread scheduling and file system management. It drops unnecessary or resource-hungry features, such as a GUI or a security module.
2): Strict priority-based thread scheduler
Threads are scheduled based on priority: preemptive scheduling is performed among threads of different priorities, and round-robin scheduling among threads of equal priority.
3): Minimized interrupt and thread-switching latencies

3: Design guidelines for real-time applications

3.1 Memory allocation

3.1.1 Steady state allocation

The component should have no memory allocations or de-allocations in the steady state, even in non-time-critical threads. Steady state is defined as the stable operating state of the component. Allocations and de-allocations may occur during initialization and shutdown, which must be phases with a defined beginning and end that happen at times predictable to the user.
Thrashing the memory manager can cause memory to become fragmented, which degrades the performance of the allocator over time. This applies to explicit calls to malloc/free and new/delete as well as to “hidden” calls, such as those made by STL containers.
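One common way to satisfy this rule is to pre-allocate a fixed pool during initialization and recycle objects through a free list in the steady state. A minimal sketch, in which the Message type and the pool size are hypothetical:

  #include <array>

  struct Message { char payload[256]; Message* next; };

  class MessagePool {
  public:
      MessagePool() {                      // initialization phase: build the free list once
          for (auto& m : storage_) { m.next = free_; free_ = &m; }
      }
      Message* acquire() {                 // steady state: O(1), never calls malloc/new
          Message* m = free_;
          if (m) free_ = m->next;
          return m;                        // nullptr signals pool exhaustion
      }
      void release(Message* m) {           // steady state: O(1), never calls free/delete
          m->next = free_;
          free_ = m;
      }
  private:
      std::array<Message, 1024> storage_;  // fixed backing store, sized at build time
      Message* free_ = nullptr;
  };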

3.1.2 STL

While the use of STL is encouraged in C++ code, it is important to be careful about how STL implementations handle memory management. In many cases, STL provides no guarantees about which operations may cause an “under-the-hood” allocation. In general, if an operation is said to “invalidate iterators,” then it can be assumed that this operation is allowed to cause an allocation under the hood. But the reverse is not true: some operations do not invalidate iterators, but do cause allocations, such as std::list::push_back().
The only completely safe assumption is that anything that adds or removes elements from an STL container may cause a memory management operation. One way to mitigate this problem is to use the “pool” feature of the Boost libraries. This provides a pooled allocator that complies with the STL allocator interface. Using this, you are guaranteed that no memory operations will take place at the OS level unless the container grows beyond the maximum size that it has ever been. However, keep in mind that the pool still uses a mutex, which can lead to “priority inversion” (see the Threading section below).
There are some exceptional cases where certain STL containers provide memory management facilities. The notable example is the std::vector::reserve() method. Using this method, you are guaranteed never to hit the memory allocator unless the container exceeds the reserved size. Few STL containers provide similar facilities, however, which makes vector one of the few containers that is completely safe from a memory management standpoint.
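For example, reserving the worst-case capacity during initialization keeps later push_back() calls allocation-free in the steady state; the capacity and the Sample type below are hypothetical:

  #include <vector>

  struct Sample { double value; long long timestamp; };

  std::vector<Sample> samples;

  void init() {
      samples.reserve(4096);                   // the one allocation, at a predictable time
  }

  void on_sample(const Sample& s) {            // steady-state path
      if (samples.size() < samples.capacity())
          samples.push_back(s);                // guaranteed not to allocate
      // else: handle overflow explicitly instead of letting the vector grow
  }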

3.2 Threading

3.2.1 Priority inversion

Priority inversion can happen any time a high-priority thread depends on a low-priority thread to make progress. The simplest example is when a high-priority thread and a low-priority thread compete for the same mutex, and a medium-priority thread preempts the low-priority one. Since the high-priority thread cannot wake up before the low-priority thread lets go of the mutex, and the low-priority thread cannot wake up until the medium-priority thread yields, the high- and medium-priority threads have inverted priority.
Most RTOSes (including the ones we use: VxWorks and Phar Lap) counter priority inversion with a technique called “priority inheritance.” In this protocol, any thread that owns a mutex “inherits” the priority of the highest-priority thread currently waiting for that mutex. In the above example, that would elevate the low-priority thread above the medium-priority thread long enough to get its work done and let the high-priority thread run. However, it is important to note that this does not eliminate priority inversion; it only establishes a provable worst-case bound. That bound is the summation of the time that all mutexes shared by threads of differing priority are held. If you hold mutexes for a long time, this can still be very bad.
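On RTOSes that provide a POSIX layer, priority inheritance is typically opt-in per mutex. A sketch using the standard pthread attribute (VxWorks' native equivalent is a mutex semaphore created with the SEM_INVERSION_SAFE option):

  #include <pthread.h>

  pthread_mutex_t shared_lock;

  void init_lock() {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      // Whoever holds this mutex inherits the priority of the
      // highest-priority waiter, bounding the inversion window.
      pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
      pthread_mutex_init(&shared_lock, &attr);
      pthread_mutexattr_destroy(&attr);
  }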
In order to minimize mutex-based priority inversion, observe the following guidelines:
  • Mutexes should be held for as little time as possible. Note that this is a trade-off: overly fine-grained locking can hurt system performance due to repeated acquire/release overhead.
  • To combat the overhead of fine-grained mutexes, eliminate the need for mutexes where possible. A common technique is to create copies of shared data that each thread owns and can operate on independently. Then, when it is time to do something with that data, you grab a mutex for a short amount of time and “apply” it (by writing it to hardware, for instance); see the sketch after this list. Note this is not the same as “thread-local storage,” which is a facility provided by the OS directly. TLS implementations vary widely from OS to OS, and its use is not recommended in portable software.
  • If it is possible to eliminate the mutex by using atomic operations (such as atomic increment/decrement) or wait-free data structures, then do so. Unfortunately, no standard implementations of wait-free queues and the like are available.
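A sketch of the copy-and-apply technique from the second bullet: the thread works on a private copy without holding any lock, and the mutex is taken only for the brief copy-out and publish steps. The State type and its fields are hypothetical:

  #include <mutex>

  struct State { int gain; int offset; };

  std::mutex publish_lock;
  State shared_state;                        // the only data the mutex protects

  void worker_iteration() {
      State local;
      {                                      // short critical section: copy out
          std::lock_guard<std::mutex> g(publish_lock);
          local = shared_state;
      }

      local.gain += 1;                       // long computation on the private copy,
      local.offset -= 1;                     // no lock held here

      {                                      // short critical section: publish
          std::lock_guard<std::mutex> g(publish_lock);
          shared_state = local;
      }
      // apply_to_hardware(local);           // hypothetical: write it out, outside the lock
  }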

3.2.2 Event-based priority inversion

Event-based priority inversion occurs when a high-priority thread waits on an event or signal from a low-priority thread, and that low-priority thread becomes preempted and cannot progress. Priority inversion from events or signals is often worse than mutex-based priority inversion, since OSes typically have no built-in countermeasure for it. Unlike mutexes, which have a concept of “ownership,” no one owns an event, so the OS cannot tell whose priority to elevate.
A common example is a high-priority thread that consumes data coming in off the network. The network is serviced by a low-priority thread that sets an event when traffic comes in. The high-priority thread waits on that event. If the low-priority thread is preempted, it stalls the high-priority thread.

The simple guideline is: do not design systems that do this. In a real-time system, high-priority threads must be able to make progress without depending on progress by lower-priority threads.
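The anti-pattern is easy to write without noticing. In the sketch below (the Packet type is hypothetical), nothing “owns” the condition variable while the consumer waits, so the OS has no thread whose priority it could boost; the fix is structural, for example running the producing thread at a priority at least as high as its consumer's:

  #include <condition_variable>
  #include <mutex>
  #include <queue>

  struct Packet { int data; };

  std::mutex m;
  std::condition_variable cv;
  std::queue<Packet> inbox;

  void network_thread(Packet p) {                // low priority: producer
      { std::lock_guard<std::mutex> g(m); inbox.push(p); }
      cv.notify_one();                           // if this thread is preempted before
  }                                              // running, the consumer simply stalls

  void consumer_thread() {                       // high priority: consumer
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, []{ return !inbox.empty(); }); // no ownership here, so the OS cannot
      Packet p = inbox.front(); inbox.pop();     // tell whose priority to elevate
      // ... process p ...
  }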

3.2.3 Round robin

Most RTOSes implement a scheme known as “round robin” for multiple threads at the same priority. Under this scheme, a thread is allowed to run for a certain period of time (usually a handful of milliseconds), then it is swapped out with the next thread in line at the same priority, and so on in a circular fashion. This often leads to the decision to run several “unimportant” threads at the same priority, under the assumption that they will get roughly equal time. However, this practice is bad for two reasons:
  • Thread context switch overhead is usually significant, even at a round robin period of 10 msec.
  • More importantly, the assumption of equal time for round-robin threads is incorrect! If a thread yields its time slice prematurely for ANY reason (mutex acquire, sleep, explicit yield), then its time slice is gone until its turn comes around again. For example, say you have threads A, B, and C. A and B compete for the same mutex, and they acquire and release it every 10 microseconds. C runs wide open. What will happen is that A will run for 10 microseconds and go to sleep on the mutex. B will do the same. Then C gets a full time slice of 10 milliseconds. A and B each get 1/1000th the time of C!
The second point above can lead to a “quasi-starvation” condition if a thread yields very frequently (and remember, a yield can mean a simple mutex acquire) compared to the round robin period of the system and other threads at the same priority do not. You should be able to demonstrate that threads scheduled at identical priority to other threads will not undergo quasi-starvation.
The guideline for this situation is this: if you have a thread in your driver that does any significant amount of work, it should run at a unique priority, or at least only share a priority with other threads in your driver that you can control. Either it’s a background thread that runs lower than everything else, or you knowingly allow it to preempt everything below it. Simply relying on the RTOS scheduler to “do the right thing” almost always produces unwanted results.
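A sketch of giving a thread its own priority level through a POSIX layer; the SCHED_FIFO policy avoids time slicing entirely, and the priority value is a hypothetical entry from your system-wide priority map:

  #include <pthread.h>
  #include <sched.h>

  // Start a worker at a unique priority so it never shares a
  // round-robin slot with threads outside our control.
  pthread_t start_worker(void* (*entry)(void*), int unique_priority) {
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
      pthread_attr_setschedpolicy(&attr, SCHED_FIFO);  // strict priority, no time slicing
      sched_param sp;
      sp.sched_priority = unique_priority;
      pthread_attr_setschedparam(&attr, &sp);

      pthread_t t;
      pthread_create(&t, &attr, entry, nullptr);
      pthread_attr_destroy(&attr);
      return t;
  }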

3.2.4 Deadlocks

Deadlocks occur when two or more threads are waiting on each other to do something, and neither can progress until that happens. It is difficult to address this situation with a coding convention, apart from “always acquire locks in a consistent order and release them in LIFO order.” I am including it here because the working group has reported that deadlocks are the most prevalent problem in RT software.

A common cause of deadlocks is the “hidden mutex.” Your component uses a library that acquires a mutex internally. That library also issues callbacks to your component, and the same mutex is held by the library when the callback is issued (this is a lot more common than you’d hope). Of course, if your component owns a mutex when it calls into the library, and then also tries to take the same mutex from within a callback issued by the library, you have a deadlock. The only solutions to this are:
  • Contact the developer of the library for details on mutexes owned when callbacks are issued (not always possible).
  • Never acquire a mutex in your callback that your component owns when calling into the library. This may require you to defer the actual processing of the callback to one of your driver’s own threads; see the sketch after this list.
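A sketch of the second approach: the callback only enqueues work under a small private lock that is never held while calling into the library, and a thread the driver owns does the real processing later. The Event type and process() are hypothetical:

  #include <mutex>
  #include <queue>

  struct Event { int code; };

  std::mutex queue_lock;          // private lock: never held while calling into the library
  std::queue<Event> pending;

  // Called by the library, possibly while the library's hidden mutex is held.
  void on_library_callback(const Event& e) {
      std::lock_guard<std::mutex> g(queue_lock);
      pending.push(e);            // do no real work here; just hand the event off
  }

  // Runs on a thread the driver owns, outside any library callback.
  void drain_events() {
      for (;;) {
          Event e;
          {
              std::lock_guard<std::mutex> g(queue_lock);
              if (pending.empty()) return;
              e = pending.front();
              pending.pop();
          }
          // process(e);          // hypothetical: here it is safe to take our own
      }                           // mutexes and to call back into the library
  }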


