Skip to content
Josip Bakić edited this page Dec 4, 2015 · 6 revisions

This text will go through some of the internal details of the Shielded library. Hopefully, this will help clarify how the code works, since some of it is maybe a bit convoluted.

The Shielded lib is an STM implementation which tries to be as obstruction-free as possible, without complicating. The idea was to simply wrap any piece of data with Shielded<>, and make it thread-safe.

The Shielded* classes have members which allow read-only and read-write access to their contents. As a general, albeit somewhat arbitrary principle, read operations are allowed outside of transactions, but writes are not. A notable exception are collection enumerators, which must be in a transaction to avoid weird behaviour in the face of concurrent updates.

The Shielded<> class is the basic protector for a single piece of data. For value-types it always produces a copy on read, which is obvious and safe. For references, the copy points to the same object, so it should be immutable (or Shielded). Writing creates an internal container, thread-local, in which it saves your changes. Anything you do is invisible to other transactions.

During a transaction you can only see a snapshot of the state of any Shielded object you access - you see the version whose number is <= than your read version, assigned to you on transaction start. Time stands still, so to speak. Writes are just new items in a list of versions, and the old versions are kept as long as there is an open transaction capable of reading them. The process of releasing old versions is called trimming, and is done in parallel with transactions, no locks required. * A small exception exists in the ShieldedDict, when removing a key from the underlying dictionary, since Mono does not support atomic compare-and-remove on a ConcurrentDictionary.

Version numbers are managed by the VersionList internal class. It issues tickets for reading and writing. The tickets form a singly-linked list, in which every following member has a +1 version number (a 64bit integer value, BTW). Write transactions generate new entries in this list when they pass their commit checks, and every new transaction run takes a reading ticket by simply reading the last element from the list. Since only write transactions add to the list, then starting a transaction does not have to compete with threads that are committing at the same time. Due to this, a read-only transaction is sure to proceed without any significant interruptions.

Shielded does commit-time locking - acquiring a global lock before commit, while checking whether each of the accessed IShielded objects allows it. Basically, they have to be free from locks by other transactions, and they must be unchanged from the time your transaction opened, i.e. their latest version is <= your read version. As they are checked, the IShieldeds will, if they have changes to write, lock themselves. The system ensures that upon exiting the global lock, either all the fields with changes will be locked and the write can proceed, or they will all be unlocked, changes rolled back, and the transaction can start again from the beginning.

The time spent under global lock should be relatively short, proportional to the number of accessed Shielded fields. This is obviously very sensitive to IShielded implementation details, which is why this interface is internal.

While an IShielded is locked, no other thread can commit into it (i.e. their IShielded.CanCommit calls return false), and any reads or writes into it get blocked, but only if the other transaction's read version is >= than the version being written. This means that they should be able to read the new data, so they must wait for it to be written. In that case they will be blocked until the write completes. Since version numbers keep incrementing, this blocking can only happen at the beginning of a transaction, and gets less likely with time.

The partial locking of the fields is implemented as a combination of spin-waiting and Monitor.Wait. Initially, the waiting thread will spin-wait for a while, but if the field still remains locked, it will use Monitor.Wait to put itself to sleep until the lock is released. Most of the time the locks are released quickly, so we avoid the cost of a context switch by spin-waiting, but in some situations the waiting time can be long (notably, when Shield.WhenCommitting or Shield.RunToCommit mechanisms are used!), and long periods of spin-waiting are very bad for performance. This is implemented in the WriteStamp class.

Versions of data in a field are kept in a singly linked list which goes backwards in time, so a write is done by simply swapping the head with a new element, whose Older reference points to the old head. Any concurrent reader with a read version smaller than our write stamp is indifferent to this - even if they would see our element, they would simply proceed to his Older. On the other hand, concurrent readers with a larger stamp will block. After all writes complete, a transaction releases it's lock. The blocked threads are released, and able to read this latest write.

An interesting thing to note is a race between a reader checking the lock of a field, and a CanCommit call on another thread that is just about to set it. Note that the thread calling CanCommit is still checking it's fields, so it has not yet acquired a version number for writing. It's write stamp will have a null-version, and readers will safely ignore it. We know that even if the commit check passes, whatever version number gets assigned to that thread for writing will be > than our reader's version, because a writer always gets a new, incremented version number.

The use of commit-time locking for checks, and partial field locking during writing, means that very little time is spent under locks, that the locking is more granular, and most importantly, that during a transaction run you are more or less free to do whatever you please. A transaction that is still running does not take any locks, and cannot block other threads.

The Shield.WhenCommitting and Shield.RunToCommit mechanisms can cause a write stamp lock to be held for an arbitrarily long time. It is important to note, however, that they do not hold any global lock. Only the fields that their transactions plan to write into are locked. Thus, they block only those other threads that are in direct conflict with them, which have to wait for their outcome. This blocking is why these mechanisms should be carefully considered, and used only when appropriate.

A transaction run can fail even before the check - when you access a read-write field which has been written into since you started, then we know you will not be able to commit. A TransException is thrown, and caught in the Shield class, which then rolls back and retries your transaction from the start. Although exceptions are known to be expensive, your thread is wasting time no matter what it does next, so it's desirable to stop it immediately.

A particularly complicated part of the story are commutes. Commutable operations get delayed and performed in a sub-transaction which runs just before the commit check. They read what is at that moment the latest data (i.e. they get a new read version), reducing their risk of conflict. After them is the commit check, where they are checked first, against their higher read version, and if they conflict only they get rolled back and retried. If they pass the check, the main transaction fields are checked against their older read version, and depending on them the commit proceeds. The global lock is left and either all the modified fields are locked, or none are.

There are two implementations of IShielded, apart from the basic Shielded<> generic class. One is ShieldedDict<>. It internally uses a .NET ConcurrentDictionary and implements key-based transactional protection, which means that only transactions which access the same keys can conflict with each other. The other collections are implemented using Shielded<> containers as building blocks. Their capability for parallelism depends on their structure. For example, the ShieldedTree uses a Shielded container for every node, which means you may have parallel writes on it if they do not have to, during tree rebalancing, change the same nodes.

The remaining IShielded implementation is the class which implements Conditional and PreCommit functionalities. This was originally implemented with transactional collection types from the Shielded library itself, but it now uses a little trickery to make it faster and minimize it's impact on transaction run times. Unlike WhenCommitting, which is relatively low-level and not optimized for creating a large number of subscriptions, the Conditional and PreCommit subscriptions can be created in large numbers safely.

From a user perspective, the library guarantees stable snapshot reading, protection against conflicts, and as carefree use as possible. I hope the library will be useful in dealing with complex concurrent systems, allowing simpler reasoning about your code, and making development easier and less error-prone.

Clone this wiki locally