Circular Buffer

C99 · Data Structure · Linux

A circular buffer is a really cool fixed-size FIFO data-structure, useful for passing a stream of data from a 'producer' thread to a 'consumer' thread.

1. What is it?

Buffer
Fig.1 Repeating buffer

A circular buffer (a.k.a. ring buffer) is a fixed-size data-structure that loops back to the beginning when the end is reached. It is continuous, repeating and usually used to buffer streams.

Two positions are tracked:

  • the write position (w) which moves as data is written to the buffer, and
  • the read position (r) which moves as data is read from the buffer.
Circular Buffer
Fig.2 Circular buffer representation

Both move clockwise around the ring and each points to the starting position in the buffer for its respective operation.

Circular buffers are essentially Queues with fixed maximum sizes. Their content is consumed using a FIFO method (first-in, first-out).

Things to keep in mind

  • The read position (r) will always be conceptually 'behind' or equal to the write position (w), even if its actual index can end up ahead after wrapping around.
  • The read position should never read past the write position.
  • When the read and write positions (r, w) are the same it can mean one of two possible states for the buffer: completely empty or completely full.

2. Design considerations

The simplest way of making a circular buffer is with an array (a minimal struct sketch is given just after the list below). The primary issues to contend with are:

  1. the behaviour of the fill/drain operations with respect to the availability of space/data in the buffer,
  2. how to keep track of whether the array is full or empty when the r and w positions are the same,
  3. circling back to the beginning when reaching the end of the array, and
  4. avoiding race conditions with threads.
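
To make the snippets that follow concrete, they will assume a minimal buffer type roughly along these lines. This is only a sketch: the type and field names are assumptions for illustration and the actual struct in the repository may differ.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t * buffer;    //backing array (or virtual mapping, see 2.4)
    size_t    size;      //capacity of the backing array
    size_t    read_pos;  //r: where the next drain starts
    size_t    write_pos; //w: where the next fill starts
    bool      empty;     //disambiguates the r == w case (see 2.2)
} CircularBuffer_t;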

2.1 Fill/Drain behaviour

The behaviour of these operations comes down to what the circular buffer will be used for and how. When the requested size of data for an operation is not currently available (data for reading, free space for writing), there are 3 possible approaches:

  1. All-or-nothing (i.e. fail and return straight away when resources are not enough),
  2. Process what can be processed given the available resources (see fig.3),
  3. Wait until the resources needed are available.
Try my best operation
Fig.3 Try-my-best operations

All-or-nothing means that the operation won't block the calling thread but will need to be tried again later on failure.

Processing what can be processed at the time means the calling thread is not blocked waiting until enough resources are available. Though, like the all-or-nothing approach, there will need to be another call to the operation later to process the remaining data.

Waiting means the call will block the calling thread until the resources required are available and processed. At least, with this, there won't be another call needed from the calling thread to try again or resume processing whatever was left over.

All of these come with their own pros and cons and each is suited to particular scenarios.
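
As an illustration (the names are assumptions, not the repository's API), the three behaviours could surface as three different signatures for the same fill operation:

//all-or-nothing: fails straight away if there isn't space for all n bytes
bool   fill_all_or_nothing( CircularBuffer_t * cbuff, uint8_t * src, size_t n );

//try-my-best: writes whatever fits and returns the number of bytes actually written
size_t fill_try_my_best( CircularBuffer_t * cbuff, uint8_t * src, size_t n );

//blocking: waits until n bytes of space are free, then writes them all
void   fill_blocking( CircularBuffer_t * cbuff, uint8_t * src, size_t n );

The same three variants apply to the drain side.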

2.2 Dealing with Full/Empty states

To keep track of the empty/full state, a simple flag can be set to signal whether the buffer has some data or is completely empty.

Sequential empty to full
Fig.4 Sequential fill/drain buffer data

For example (see fig.4):

  • A buffer of size s is initialised and its empty flag is set to true.
  • The producer process writes n bytes of data (w += n). This sets the empty flag to false.
  • Next, n bytes of data are read by the consumer process (r += n). This last event makes w equal to r and so, since there's no more data left to read, the empty flag is set back to true.

When implementing a flag system to keep track of the empty/full states, it's important to be careful, especially if the read/write operations are non-blocking, as it can lead to race conditions.
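
A minimal sketch of how the two ambiguous states can be told apart with the flag (assuming the struct sketched in section 2):

//r == w and the flag says nothing was left unread: completely empty
bool cbuff_is_empty( const CircularBuffer_t * cbuff ) {
    return cbuff->empty;
}

//r == w but data has been written and not yet read: completely full
bool cbuff_is_full( const CircularBuffer_t * cbuff ) {
    return !cbuff->empty && ( cbuff->read_pos == cbuff->write_pos );
}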

2.3 Circling back

When advancing the write or read positions, some care must be taken in the event that adding n makes the position equal to or larger than the underlying array's size. This is when the position needs to circle back to the beginning. To handle it, the modulo operator can be used to make sure the calculated index is always valid...

Calculations

position = ( position + n ) % array_size

free_space = ( r + array_size - w ) % array_size

Note that the second formula yields 0 when r == w, so the empty/full flag described in 2.2 is needed to decide whether the buffer is actually completely free or completely full.

2.3.1 Copy byte-per-byte

A 'naive' implementation of the circular buffer's fill or drain operations would use a 'byte-by-byte' approach.

bool fill( CircularBuffer_t * cbuff, uint8_t * src, size_t n ) {
    //when r == w the empty flag decides between 'all free' and 'no space'
    const size_t free_space = cbuff->empty
        ? cbuff->size
        : ( cbuff->read_pos + cbuff->size - cbuff->write_pos ) % cbuff->size;

    if( free_space < n )
        return false;

    for( size_t i = 0; i < n; ++i ) {
        cbuff->buffer[ ( cbuff->write_pos + i ) % cbuff->size ] = src[ i ];
    }

    if( n ) {
        cbuff->write_pos = ( cbuff->write_pos + n ) % cbuff->size;
        cbuff->empty     = false;
    }

    return true;
}
Byte-per-byte buffer fill (NOT thread-safe)

This is not exactly efficient by any means as a modulo operation is performed for every single byte, dragging performance down. A better way would be to do a 'copy-by-chunk' instead.

2.3.2 Copy by chunks

Copying entire sections requires using memcpy(..) or an equivalent if using a language other than C.

For buffer operations, when the starting position is < the end position, no circling back is needed as the free space (or data) is a whole contiguous block (see fig.5a). For writing that means w < r and, for reading, r < w.

Chunking states
Fig.5 Chunking states

When the free space is not a single contiguous physical block some allowances need to be made. There are 2 cases that require "chunking" calculations:

  1. Where the starting position is > the end position (see fig.5b), and
  2. Where the starting position is == the end position (see fig.5c).

Both may very well lead to no actual chunking if the amount to copy is less than or equal to the size of the first chunk of free space before the circling back occurs.

In any case, processing can be done with just 2 modulo operations instead of the n + 2 used by the 'byte-per-byte' approach: one modulo to find the amount of free space in the buffer and the other for moving the position up. Both are already a fixture in the previous approach anyway.

//needs <string.h> for memcpy(..)
bool fill( CircularBuffer_t * cbuff, uint8_t * src, size_t n ) {
    //when r == w the empty flag decides between 'all free' and 'no space'
    const size_t free_space = cbuff->empty
        ? cbuff->size
        : ( cbuff->read_pos + cbuff->size - cbuff->write_pos ) % cbuff->size;

    if( free_space < n )
        return false;

    if( cbuff->read_pos > cbuff->write_pos ) { //no circling back
        memcpy( &cbuff->buffer[ cbuff->write_pos ], src, n );

    } else { //circling back (i.e. 1-2 x chunks)
        const size_t free_chunk1 = cbuff->size - cbuff->write_pos;
        const size_t chunk1_size = ( n <= free_chunk1 ? n : free_chunk1 );

        memcpy( &cbuff->buffer[ cbuff->write_pos ], src, chunk1_size );

        if( n > free_chunk1 )
            memcpy( &cbuff->buffer[ 0 ], &src[ chunk1_size ], ( n - free_chunk1 ) );
    }

    if( n ) {
        cbuff->write_pos = ( cbuff->write_pos + n ) % cbuff->size;
        cbuff->empty     = false;
    }

    return true;
}
By-chunk buffer fill (NOT thread-safe)

This is already pretty nice but there is another pretty neat approach to the fill/drain operations...

2.4 Using the memory mapping trick

Instead of dealing with the physical array directly, a 'virtual' mapping can be created as a proxy. A second virtual map of the array can then be appended to the end of the first, thus allowing copy operations to happen in one chunk even when there's a circle-back.

Virtual memory map
Fig.6 Virtual memory map

Depending on the platform and the virtual memory mapping library being used, some details may differ. That being said, the way this works in Linux is by:

  • creating a memory file descriptor for the physical array and adjusting its size to the array's:
    int fd = syscall( __NR_memfd_create, "circular_buffer", 0 );
    ftruncate( fd, size );
  • initialising a blank anonymous memory map with 2 times the size of the array:
    uint8_t * buffer = mmap( NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
  • assigning the file descriptor to the map at the beginning and, again, at the size offset:
    mmap( buffer, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0 );
    mmap( ( buffer + size ), size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0 );

Now the resulting map allows for the array to be doubly accessible (see fig.6).

In Linux, the memory mapping trick requires[*] that the array's size be a multiple of the system's memory page size (getpagesize()) so that the second mapping lines up exactly with the end of the first. Otherwise the contiguous, back-to-back access to the array won't work.

When passing a size to the circular buffer initialiser it can be treated as a suggestion. The actual workable size, aligned to the memory page, can be calculated from the suggestion using the following:

const size_t whole_pages = ( size / getpagesize() ) + ( size % getpagesize() > 0 ? 1 : 0 );
const size_t actual_size = whole_pages * getpagesize();

[*] GNU libC (13.8 Memory-mapped I/O): ...addresses for mapping must be page-aligned, and length values will be rounded up.
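
Putting the steps above together, a minimal initialisation sketch could look something like the following. This is only a sketch: cbuff_init(..) and its error handling are assumptions, and the struct is the one sketched back in section 2.

#define _GNU_SOURCE //for syscall(..)
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

bool cbuff_init( CircularBuffer_t * cbuff, size_t size_hint ) {
    //round the size hint up to a whole number of memory pages
    const size_t whole_pages = ( size_hint / getpagesize() ) + ( size_hint % getpagesize() > 0 ? 1 : 0 );
    const size_t size        = whole_pages * getpagesize();

    int fd = syscall( __NR_memfd_create, "circular_buffer", 0 );
    if( fd < 0 )
        return false;
    if( ftruncate( fd, size ) != 0 ) {
        close( fd );
        return false;
    }

    //reserve 2x the size, then map the same memory file twice, back-to-back
    uint8_t * buffer = mmap( NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    if( buffer == MAP_FAILED ) {
        close( fd );
        return false;
    }
    mmap( buffer,        size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0 );
    mmap( buffer + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0 );
    close( fd ); //the mappings keep the backing memory alive

    cbuff->buffer    = buffer;
    cbuff->size      = size;
    cbuff->read_pos  = 0;
    cbuff->write_pos = 0;
    cbuff->empty     = true;
    return true;
}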

2.5 Threading

A circular buffer is typically used to stream data from a 'producer' thread to a 'consumer' thread, so it needs to have some thread-safety baked in. If the execution of 2 operations overlaps, typically a fill and a drain, there will be a race condition which can lead to corruption of the buffer and its data.

There are a couple of things to consider before adding any thread-related operations to the implementation:

  • Is the producer or consumer a "passive" participant with an automated loop?
  • Is the throughput balanced between the producer and consumer thread?
  • Is one thread latency sensitive (i.e. will blocking it be a problem)?

Depending on the answers, one of several implementation approaches may be chosen. Here are some of these...

2.5.1 One mutex to rule them all

Single-mutex threading
Fig.7 Single mutex threading

This is the easiest to implement as it just needs 1 mutex. Each operation tries to lock it, does its thing, and it's first come, first served (see fig.7).

If the mutex is already locked then the operation requesting the lock will wait[1] until it's free.

[1] pthread_mutex_lock: ...if the mutex is already locked, the calling thread shall block until the mutex becomes available.
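
As a rough sketch, assuming a pthread_mutex_t member (here called lock) is added to the struct, it's enough to wrap the existing non-thread-safe operations:

#include <pthread.h>

bool fill_locked( CircularBuffer_t * cbuff, uint8_t * src, size_t n ) {
    pthread_mutex_lock( &cbuff->lock );
    const bool ok = fill( cbuff, src, n ); //the non-thread-safe fill from 2.3.2
    pthread_mutex_unlock( &cbuff->lock );
    return ok;
}

The drain side is wrapped the same way with the same mutex.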

2.5.2 One-sided signalling

Single-condition threading
Fig.8 Single condition threading (fill-triggered)

One-sided signalling is useful when the operation receiving the signal is used by a thread running a dumb loop (i.e. a worker thread just continuously processing data as it receives it). Something to keep in mind is that, since the signal-receiving operation waits on the signal, it blocks[2] execution, and the calling thread by extension.

The condition can either be setup so that it blocks the producer thread until space is available or the consumer thread until data is available (see fig.8).

For implementation, a condition variable (pthread_cond_t) is required for signalling. The mutex must also be locked prior to waiting on the signal (pthread_cond_wait)[2]. The conditional wait will unlock the mutex while the condition (signal) is not satisfied and then lock it again once it is.

[2] pthread_cond_wait ...shall be called with mutex locked by the calling thread or undefined behavior results. These functions atomically release mutex and cause the calling thread to block on the condition variable...
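
A minimal sketch of the fill-triggered variant shown in fig.8 could look like this. It's only a sketch: it assumes the struct gains a lock mutex and a not_empty condition, and that the drain is of the 'try-my-best' kind.

#include <pthread.h>

//producer side: write, then signal the consumer that data is available
bool fill_signalling( CircularBuffer_t * cbuff, uint8_t * src, size_t n ) {
    pthread_mutex_lock( &cbuff->lock );
    const bool ok = fill( cbuff, src, n ); //the non-thread-safe fill from 2.3.2
    if( ok && n )
        pthread_cond_signal( &cbuff->not_empty );
    pthread_mutex_unlock( &cbuff->lock );
    return ok;
}

//consumer side: block until there is data, then read whatever is there (up to n bytes)
size_t drain_blocking( CircularBuffer_t * cbuff, uint8_t * dst, size_t n ) {
    pthread_mutex_lock( &cbuff->lock );

    while( cbuff->empty ) //the loop guards against spurious wake-ups
        pthread_cond_wait( &cbuff->not_empty, &cbuff->lock );

    size_t available = ( cbuff->write_pos + cbuff->size - cbuff->read_pos ) % cbuff->size;
    if( available == 0 ) //r == w but not empty: the buffer is completely full
        available = cbuff->size;

    const size_t to_read = ( n <= available ? n : available );
    for( size_t i = 0; i < to_read; ++i )
        dst[ i ] = cbuff->buffer[ ( cbuff->read_pos + i ) % cbuff->size ];

    cbuff->read_pos = ( cbuff->read_pos + to_read ) % cbuff->size;
    cbuff->empty    = ( cbuff->read_pos == cbuff->write_pos );

    pthread_mutex_unlock( &cbuff->lock );
    return to_read;
}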

2.5.3 Two-sided signalling

Double-condition threading
Fig.9 Double condition threading

Double conditional signalling would only really be useful in the case where both threads are passive workers and sporadic data is passed through, with the circular buffer acting like a proxy between them.

With this setup both threads' execution is blocked until there are resources available for them to resume their work. The producer thread blocks when there isn't enough space in the buffer for its data to be written. The consumer thread blocks when there is nothing in the buffer. When there aren't enough resources to do what needs to be done, CPU cycles are not wasted spinning in a while loop since execution is simply blocked (see fig.9).

Something to be aware of is that blocking and resuming does have a small overhead which becomes not so insignificant when compounded in a loop. Whether it is worth it depends entirely on what the circular buffer will be used for.
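
As a sketch of the producer side only (the consumer side mirrors it by waiting on not_empty and signalling not_full; lock, not_full and not_empty are all assumed additions to the struct):

#include <pthread.h>

//producer side: block until n bytes of space are free, write, then wake the consumer
void fill_blocking( CircularBuffer_t * cbuff, uint8_t * src, size_t n ) {
    pthread_mutex_lock( &cbuff->lock );

    for( ;; ) {
        const size_t free_space = cbuff->empty
            ? cbuff->size
            : ( cbuff->read_pos + cbuff->size - cbuff->write_pos ) % cbuff->size;

        if( free_space >= n )
            break;

        pthread_cond_wait( &cbuff->not_full, &cbuff->lock ); //woken by the drain side
    }

    fill( cbuff, src, n ); //the non-thread-safe fill from 2.3.2

    pthread_cond_signal( &cbuff->not_empty );
    pthread_mutex_unlock( &cbuff->lock );
}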

3. Implementation

The example implementation uses a producer and a consumer thread, and has one-sided signalling with a 'try-my-best' drain operation and an all-or-nothing fill operation.

There's a simple repeatable test based on randomised data that compares the generated source data with the target buffer's content after the transfer. The size hint, read/write and source data sizes are all editable via the macros at the top of the main.c file.

Github Repository