/*
* @(#)Exchanger.java 1.13 06/03/30
*
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
*/
package java.util.concurrent;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.LockSupport;
/**
* A synchronization point at which threads can pair and swap elements
* within pairs. Each thread presents some object on entry to the
* {@link #exchange exchange} method, matches with a partner thread,
* and receives its partner's object on return. An Exchanger may be
* viewed as a bidirectional form of a {@link SynchronousQueue}.
* Exchangers may be useful in applications such as genetic algorithms
* and pipeline designs.
*
* <p><b>Sample Usage:</b>
* Here are the highlights of a class that uses an {@code Exchanger}
* to swap buffers between threads so that the thread filling the
* buffer gets a freshly emptied one when it needs it, handing off the
* filled one to the thread emptying the buffer.
* <pre>{@code
* class FillAndEmpty {
* Exchanger<DataBuffer> exchanger = new Exchanger<DataBuffer>();
* DataBuffer initialEmptyBuffer = ... a made-up type
* DataBuffer initialFullBuffer = ...
*
* class FillingLoop implements Runnable {
* public void run() {
* DataBuffer currentBuffer = initialEmptyBuffer;
* try {
* while (currentBuffer != null) {
* addToBuffer(currentBuffer);
* if (currentBuffer.isFull())
* currentBuffer = exchanger.exchange(currentBuffer);
* }
* } catch (InterruptedException ex) { ... handle ... }
* }
* }
*
* class EmptyingLoop implements Runnable {
* public void run() {
* DataBuffer currentBuffer = initialFullBuffer;
* try {
* while (currentBuffer != null) {
* takeFromBuffer(currentBuffer);
* if (currentBuffer.isEmpty())
* currentBuffer = exchanger.exchange(currentBuffer);
* }
* } catch (InterruptedException ex) { ... handle ...}
* }
* }
*
* void start() {
* new Thread(new FillingLoop()).start();
* new Thread(new EmptyingLoop()).start();
* }
* }
* }</pre>
*
* <p>Memory consistency effects: For each pair of threads that
* successfully exchange objects via an {@code Exchanger}, actions
* prior to the {@code exchange()} in each thread
* <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
* those subsequent to a return from the corresponding {@code exchange()}
* in the other thread.
*
* @since 1.5
* @author Doug Lea and Bill Scherer and Michael Scott
* @param <V> The type of objects that may be exchanged
*/
public class Exchanger<V> {
/*
* Algorithm Description:
*
* The basic idea is to maintain a "slot", which is a reference to
* a Node containing both an Item to offer and a "hole" waiting to
* get filled in. If an incoming "occupying" thread sees that the
* slot is null, it CAS'es (compareAndSets) a Node there and waits
* for another to invoke exchange. That second "fulfilling" thread
* sees that the slot is non-null, and so CASes it back to null,
* also exchanging items by CASing the hole, plus waking up the
* occupying thread if it is blocked. In each case CAS'es may
* fail because a slot at first appears non-null but is null upon
* CAS, or vice-versa. So threads may need to retry these
* actions.
*
* This simple approach works great when there are only a few
* threads using an Exchanger, but performance rapidly
* deteriorates due to CAS contention on the single slot when
* there are lots of threads using an exchanger. So instead we use
* an "arena"; basically a kind of hash table with a dynamically
* varying number of slots, any one of which can be used by
* threads performing an exchange. Incoming threads pick slots
* based on a hash of their Thread ids. If an incoming thread
* fails to CAS in its chosen slot, it picks an alternative slot
* instead. And similarly from there. If a thread successfully
* CASes into a slot but no other thread arrives, it tries
* another, heading toward the zero slot, which always exists even
* if the table shrinks. The particular mechanics controlling this
* are as follows:
*
* Waiting: Slot zero is special in that it is the only slot that
* exists when there is no contention. A thread occupying slot
* zero will block if no thread fulfills it after a short spin.
* In other cases, occupying threads eventually give up and try
* another slot. Waiting threads spin for a while (a period that
* should be a little less than a typical context-switch time)
* before either blocking (if slot zero) or giving up (if other
* slots) and restarting. There is no reason for threads to block
* unless there are unlikely to be any other threads present.
* Occupants are mainly avoiding memory contention so sit there
* quietly polling for a shorter period than it would take to
* block and then unblock them. Non-slot-zero waits that elapse
* because of lack of other threads waste around one extra
* context-switch time per try, which is still on average much
* faster than alternative approaches.
*
* Sizing: Usually, using only a few slots suffices to reduce
* contention. Especially with small numbers of threads, using
* too many slots can lead to just as poor performance as using
* too few of them, and there's not much room for error. The
* variable "max" maintains the number of slots actually in
* use. It is increased when a thread sees too many CAS
* failures. (This is analogous to resizing a regular hash table
* based on a target load factor, except here, growth steps are
* just one-by-one rather than proportional.) Growth requires
* contention failures in each of three tried slots. Requiring
* multiple failures for expansion copes with the fact that some
* failed CASes are not due to contention but instead to simple
* races between two threads or thread pre-emptions occurring
* between reading and CASing. Also, very transient peak
* contention can be much higher than the average sustainable
* levels. The max limit is decreased on average 50% of the times
* that a non-slot-zero wait elapses without being fulfilled.
* Threads experiencing elapsed waits move closer to zero, so
* eventually find existing (or future) threads even if the table
* has been shrunk due to inactivity. The chosen mechanics and
* thresholds for growing and shrinking are intrinsically
* entangled with indexing and hashing inside the exchange code,
* and can't be nicely abstracted out.
*
* Hashing: Each thread picks its initial slot to use in accord
* with a simple hashcode. The sequence is the same on each
* encounter by any given thread, but effectively random across
* threads. Using arenas encounters the classic cost vs quality
* tradeoffs of all hash tables. Here, we use a one-step FNV-1a
* hash code based on the current thread's Thread.getId(), along
* with a cheap approximation to a mod operation to select an
* index. The downside of optimizing index selection in this way
* is that the code is hardwired to use a maximum table size of
* 32. But this value more than suffices for known platforms and
* applications.
*
* Probing: On sensed contention of a selected slot, we probe
* sequentially through the table, analogously to linear probing
* after collision in a hash table. (We move circularly, in
* reverse order, to mesh best with table growth and shrinkage
* rules.) Except that to minimize the effects of false-alarms
* and cache thrashing, we try the first selected slot twice
* before moving.
*
* Padding: Even with contention management, slots are heavily
* contended, so use cache-padding to avoid poor memory
* performance. Because of this, slots are lazily constructed
* only when used, to avoid wasting this space unnecessarily.
* While isolation of locations is not much of an issue at first
* in an application, as time goes on and garbage-collectors
* perform compaction, slots are very likely to be moved adjacent
* to each other, which can cause much thrashing of cache lines on
* MPs unless padding is employed.
*
* This is an improvement of the algorithm described in the paper
* "A Scalable Elimination-based Exchange Channel" by William
* Scherer, Doug Lea, and Michael Scott in Proceedings of SCOOL05
* workshop. Available at: http://hdl.handle.net/1802/2104
*/
/** The number of CPUs, for sizing and spin control */
private static final int NCPU = Runtime.getRuntime().availableProcessors();
/**
* The capacity of the arena. Set to a value that provides more
* than enough space to handle contention. On small machines
* most slots won't be used, but it is still not wasted because
* the extra space provides some machine-level address padding
* to minimize interference with heavily CAS'ed Slot locations.
* And on very large machines, performance eventually becomes
* bounded by memory bandwidth, not numbers of threads/CPUs.
* This constant cannot be changed without also modifying
* indexing and hashing algorithms.
*/
private static final int CAPACITY = 32;
/**
* The value of "max" that will hold all threads without
* contention. When this value is less than CAPACITY, some
* otherwise wasted expansion can be avoided.
*/
private static final int FULL =
Math.max(0, Math.min(CAPACITY, NCPU / 2) - 1);
/**
* The number of times to spin (doing nothing except polling a
* memory location) before blocking or giving up while waiting to
* be fulfilled. Should be zero on uniprocessors. On
* multiprocessors, this value should be large enough so that two
* threads exchanging items as fast as possible block only when
* one of them is stalled (due to GC or preemption), but not much
* longer, to avoid wasting CPU resources. Seen differently, this
* value is a little over half the number of cycles of an average
* context switch time on most systems. The value here is
* approximately the average of those across a range of tested
* systems.
*/
private static final int SPINS = (NCPU == 1) ? 0 : 2000;
/**
* The number of times to spin before blocking in timed waits.
* Timed waits spin more slowly because checking the time takes
* time. The best value relies mainly on the relative rate of
* System.nanoTime vs memory accesses. The value is empirically
* derived to work well across a variety of systems.
*/
private static final int TIMED_SPINS = SPINS / 20;
/**
* Sentinel item representing cancellation of a wait due to
* interruption, timeout, or elapsed spin-waits. This value is
* placed in holes on cancellation, and used as a return value
* from waiting methods to indicate failure to set or get hole.
*/
private static final Object CANCEL = new Object();
/**
* Value representing null arguments/returns from public
* methods. This disambiguates from internal requirement that
* holes start out as null to mean they are not yet set.
*/
private static final Object NULL_ITEM = new Object();
/**
* Nodes hold partially exchanged data. This class
* opportunistically subclasses AtomicReference to represent the
* hole. So get() returns hole, and compareAndSet CAS'es value
* into hole. This class cannot be parameterized as "V" because
* of the use of non-V CANCEL sentinels.
*/
private static final class Node extends AtomicReference<Object> {
/** The element offered by the Thread creating this node. */
public final Object item;
/** The Thread waiting to be signalled; null until waiting. */
public volatile Thread waiter;
/**
* Creates node with given item and empty hole.
* @param item the item
*/
public Node(Object item) {
this.item = item;
}
}
/**
* A Slot is an AtomicReference with heuristic padding to lessen
* cache effects of this heavily CAS'ed location. While the
* padding adds noticeable space, all slots are created only on
* demand, and there will be more than one of them only when it
* would improve throughput more than enough to outweigh using
* extra space.
*/
private static final class Slot extends AtomicReference<Object> {
// Improve likelihood of isolation on <= 64 byte cache lines
long q0, q1, q2, q3, q4, q5, q6, q7, q8, q9, qa, qb, qc, qd, qe;
}
/**
* Slot array. Elements are lazily initialized when needed.
* Declared volatile to enable double-checked lazy construction.
*/
private volatile Slot[] arena = new Slot[CAPACITY];
/**
* The maximum slot index being used. The value sometimes
* increases when a thread experiences too many CAS contentions,
* and sometimes decreases when a spin-wait elapses. Changes
* are performed only via compareAndSet, to avoid stale values
* when a thread happens to stall right before setting.
*/
private final AtomicInteger max = new AtomicInteger();
/**
* Main exchange function, handling the different policy variants.
* Uses Object, not "V" as argument and return value to simplify
* handling of sentinel values. Callers from public methods decode
* and cast accordingly.
*
* @param item the (non-null) item to exchange
* @param timed true if the wait is timed
* @param nanos if timed, the maximum wait time
* @return the other thread's item, or CANCEL if interrupted or timed out
*/
private Object doExchange(Object item, boolean timed, long nanos) {
Node me = new Node(item); // Create in case occupying
int index = hashIndex(); // Index of current slot
int fails = 0; // Number of CAS failures
for (;;) {
Object y; // Contents of current slot
Slot slot = arena[index];
if (slot == null) // Lazily initialize slots
createSlot(index); // Continue loop to reread
else if ((y = slot.get()) != null && // Try to fulfill
slot.compareAndSet(y, null)) {
Node you = (Node)y; // Transfer item
if (you.compareAndSet(null, item)) {
LockSupport.unpark(you.waiter);
return you.item;
} // Else cancelled; continue
}
else if (y == null && // Try to occupy
slot.compareAndSet(null, me)) {
if (index == 0) // Blocking wait for slot 0
return timed? awaitNanos(me, slot, nanos): await(me, slot);
Object v = spinWait(me, slot); // Spin wait for non-0
if (v != CANCEL)
return v;
me = new Node(item); // Throw away cancelled node
int m = max.get();
if (m > (index >>>= 1)) // Decrease index
max.compareAndSet(m, m - 1); // Maybe shrink table
}
else if (++fails > 1) { // Allow 2 fails on 1st slot
int m = max.get();
if (fails > 3 && m < FULL && max.compareAndSet(m, m + 1))
index = m + 1; // Grow on 3rd failed slot
else if (--index < 0)
index = m; // Circularly traverse
}
}
}
/**
* Returns a hash index for the current thread. Uses a one-step
* FNV-1a hash code (http://www.isthe.com/chongo/tech/comp/fnv/)
* based on the current thread's Thread.getId(). These hash codes
* have more uniform distribution properties with respect to small
* moduli (here 1-31) than do other simple hashing functions.
*
* <p>To return an index between 0 and max, we use a cheap
* approximation to a mod operation, that also corrects for bias
* due to non-power-of-2 remaindering (see {@link
* java.util.Random#nextInt}). Bits of the hashcode are masked
* with "nbits", the ceiling power of two of table size (looked up
* in a table packed into three ints). If too large, this is
* retried after rotating the hash by nbits bits, while forcing new
* top bit to 0, which guarantees eventual termination (although
* with a non-random-bias). This requires an average of less than
* 2 tries for all table sizes, and has a maximum 2% difference
* from perfectly uniform slot probabilities when applied to all
* possible hash codes for sizes less than 32.
*
* @return a per-thread-random index, 0 <= index < max
*/
private final int hashIndex() {
long id = Thread.currentThread().getId();
int hash = (((int)(id ^ (id >>> 32))) ^ 0x811c9dc5) * 0x01000193;
int m = max.get();
int nbits = (((0xfffffc00 >> m) & 4) | // Compute ceil(log2(m+1))
((0x000001f8 >>> m) & 2) | // The constants hold
((0xffff00f2 >>> m) & 1)); // a lookup table
int index;
while ((index = hash & ((1 << nbits) - 1)) > m) // May retry on
hash = (hash >>> nbits) | (hash << (33 - nbits)); // non-power-2 m
return index;
}
/**
* Creates a new slot at given index. Called only when the slot
* appears to be null. Relies on double-check using builtin
* locks, since they rarely contend. This in turn relies on the
* arena array being declared volatile.
*
* @param index the index to add slot at
*/
private void createSlot(int index) {
// Create slot outside of lock to narrow sync region
Slot newSlot = new Slot();
Slot[] a = arena;
synchronized (a) {
if (a[index] == null)
a[index] = newSlot;
}
}
/**
* Tries to cancel a wait for the given node waiting in the given
* slot, if so, helping clear the node from its slot to avoid
* garbage retention.
*
* @param node the waiting node
* @param the slot it is waiting in
* @return true if successfully cancelled
*/
private static boolean tryCancel(Node node, Slot slot) {
if (!node.compareAndSet(null, CANCEL))
return false;
if (slot.get() == node) // pre-check to minimize contention
slot.compareAndSet(node, null);
return true;
}
// Three forms of waiting. Each just different enough not to merge
// code with others.
/**
* Spin-waits for hole for a non-0 slot. Fails if spin elapses
* before hole filled. Does not check interrupt, relying on check
* in public exchange method to abort if interrupted on entry.
*
* @param node the waiting node
* @return on success, the hole; on failure, CANCEL
*/
private static Object spinWait(Node node, Slot slot) {
int spins = SPINS;
for (;;) {
Object v = node.get();
if (v != null)
return v;
else if (spins > 0)
--spins;
else
tryCancel(node, slot);
}
}
/**
* Waits for (by spinning and/or blocking) and gets the hole
* filled in by another thread. Fails if interrupted before
* hole filled.
*
* When a node/thread is about to block, it sets its waiter field
* and then rechecks state at least one more time before actually
* parking, thus covering race vs fulfiller noticing that waiter
* is non-null so should be woken.
*
* Thread interruption status is checked only surrounding calls to
* park. The caller is assumed to have checked interrupt status
* on entry.
*
* @param node the waiting node
* @return on success, the hole; on failure, CANCEL
*/
private static Object await(Node node, Slot slot) {
Thread w = Thread.currentThread();
int spins = SPINS;
for (;;) {
Object v = node.get();
if (v != null)
return v;
else if (spins > 0) // Spin-wait phase
--spins;
else if (node.waiter == null) // Set up to block next
node.waiter = w;
else if (w.isInterrupted()) // Abort on interrupt
tryCancel(node, slot);
else // Block
LockSupport.park(node);
}
}
/**
* Waits for (at index 0) and gets the hole filled in by another
* thread. Fails if timed out or interrupted before hole filled.
* Same basic logic as untimed version, but a bit messier.
*
* @param node the waiting node
* @param nanos the wait time
* @return on success, the hole; on failure, CANCEL
*/
private Object awaitNanos(Node node, Slot slot, long nanos) {
int spins = TIMED_SPINS;
long lastTime = 0;
Thread w = null;
for (;;) {
Object v = node.get();
if (v != null)
return v;
long now = System.nanoTime();
if (w == null)
w = Thread.currentThread();
else
nanos -= now - lastTime;
lastTime = now;
if (nanos > 0) {
if (spins > 0)
--spins;
else if (node.waiter == null)
node.waiter = w;
else if (w.isInterrupted())
tryCancel(node, slot);
else
LockSupport.parkNanos(node, nanos);
}
else if (tryCancel(node, slot) && !w.isInterrupted())
return scanOnTimeout(node);
}
}
/**
* Sweeps through arena checking for any waiting threads. Called
* only upon return from timeout while waiting in slot 0. When a
* thread gives up on a timed wait, it is possible that a
* previously-entered thread is still waiting in some other
* slot. So we scan to check for any. This is almost always
* overkill, but decreases the likelihood of timeouts when there
* are other threads present to far less than that in lock-based
* exchangers in which earlier-arriving threads may still be
* waiting on entry locks.
*
* @param node the waiting node
* @return another thread's item, or CANCEL
*/
private Object scanOnTimeout(Node node) {
Object y;
for (int j = arena.length - 1; j >= 0; --j) {
Slot slot = arena[j];
if (slot != null) {
while ((y = slot.get()) != null) {
if (slot.compareAndSet(y, null)) {
Node you = (Node)y;
if (you.compareAndSet(null, node.item)) {
LockSupport.unpark(you.waiter);
return you.item;
}
}
}
}
}
return CANCEL;
}
/**
* Creates a new Exchanger.
*/
public Exchanger() {
}
/**
* Waits for another thread to arrive at this exchange point (unless
* the current thread is {@linkplain Thread#interrupt interrupted}),
* and then transfers the given object to it, receiving its object
* in return.
*
* <p>If another thread is already waiting at the exchange point then
* it is resumed for thread scheduling purposes and receives the object
* passed in by the current thread. The current thread returns immediately,
* receiving the object passed to the exchange by that other thread.
*
* <p>If no other thread is already waiting at the exchange then the
* current thread is disabled for thread scheduling purposes and lies
* dormant until one of two things happens:
* <ul>
* <li>Some other thread enters the exchange; or
* <li>Some other thread {@linkplain Thread#interrupt interrupts} the current
* thread.
* </ul>
* <p>If the current thread:
* <ul>
* <li>has its interrupted status set on entry to this method; or
* <li>is {@linkplain Thread#interrupt interrupted} while waiting
* for the exchange,
* </ul>
* then {@link InterruptedException} is thrown and the current thread's
* interrupted status is cleared.
*
* @param x the object to exchange
* @return the object provided by the other thread
* @throws InterruptedException if the current thread was
* interrupted while waiting
*/
public V exchange(V x) throws InterruptedException {
if (!Thread.interrupted()) {
Object v = doExchange(x == null? NULL_ITEM : x, false, 0);
if (v == NULL_ITEM)
return null;
if (v != CANCEL)
return (V)v;
Thread.interrupted(); // Clear interrupt status on IE throw
}
throw new InterruptedException();
}
/**
* Waits for another thread to arrive at this exchange point (unless
* the current thread is {@linkplain Thread#interrupt interrupted} or
* the specified waiting time elapses), and then transfers the given
* object to it, receiving its object in return.
*
* <p>If another thread is already waiting at the exchange point then
* it is resumed for thread scheduling purposes and receives the object
* passed in by the current thread. The current thread returns immediately,
* receiving the object passed to the exchange by that other thread.
*
* <p>If no other thread is already waiting at the exchange then the
* current thread is disabled for thread scheduling purposes and lies
* dormant until one of three things happens:
* <ul>
* <li>Some other thread enters the exchange; or
* <li>Some other thread {@linkplain Thread#interrupt interrupts}
* the current thread; or
* <li>The specified waiting time elapses.
* </ul>
* <p>If the current thread:
* <ul>
* <li>has its interrupted status set on entry to this method; or
* <li>is {@linkplain Thread#interrupt interrupted} while waiting
* for the exchange,
* </ul>
* then {@link InterruptedException} is thrown and the current thread's
* interrupted status is cleared.
*
* <p>If the specified waiting time elapses then {@link
* TimeoutException} is thrown. If the time is less than or equal
* to zero, the method will not wait at all.
*
* @param x the object to exchange
* @param timeout the maximum time to wait
* @param unit the time unit of the <tt>timeout</tt> argument
* @return the object provided by the other thread
* @throws InterruptedException if the current thread was
* interrupted while waiting
* @throws TimeoutException if the specified waiting time elapses
* before another thread enters the exchange
*/
public V exchange(V x, long timeout, TimeUnit unit)
throws InterruptedException, TimeoutException {
if (!Thread.interrupted()) {
Object v = doExchange(x == null? NULL_ITEM : x,
true, unit.toNanos(timeout));
if (v == NULL_ITEM)
return null;
if (v != CANCEL)
return (V)v;
if (!Thread.interrupted())
throw new TimeoutException();
}
throw new InterruptedException();
}
}