Replace synchronized with atomic+while loop in case of low lock contention


I have two functions which must run in a critical section:

public synchronized void f1() { ... }
public synchronized void f2() { ... }

Assume that the behavior is as following:

  • f1 is almost never called. Actually, under normal conditions, this method is never called. If f1 is called anyway, it should return quickly.
  • f2 is called at a very high rate. It returns very quickly.
  • These methods never call each other, and there is no reentrancy either.

In other words, there is very low contention. So when f2 is called, we have some overhead to acquire the lock, which is granted immediately in 99.9% of the cases. I am wondering if there are approaches to avoid this overhead.

I came up with the following alternative:

private final AtomicInteger lock = new AtomicInteger(0);

public void f1() {
    while (!lock.compareAndSet(0, 1)) {}

    try {
        ...
    } finally {
        lock.set(0);
    }
}

public void f2() {
    while (!lock.compareAndSet(0, 2)) {}

    try {
        ...
    } finally {
        lock.set(0);
    }
}

Are there other approaches? Does the java.util.concurrent package offer something natively?

update

Although my intention is to have a generic question, some information regarding my situation:

f1: This method creates a new remote stream, if for some reason the current one becomes corrupt, for example due to a timeout. A remote stream could be considered as a socket connection which consumes a remote queue starting from a given location:

private Stream stream;

public synchronized Stream f1() {
     final Stream stream = new Stream(...);

     if (this.stream != null) {
         stream.setPosition(this.stream.getPosition());
     }
     this.stream = stream;
     return stream;
}

f2: This method advances the stream position. It is a plain setter:

public synchronized void f2(Long p) {
    stream.setPosition(p);
}

Here, stream.setPosition(Long) is implemented as a plain setter as well:

public class Stream {
    private volatile Long position = 0L;

    public void setPosition(Long position) {
        this.position = position;
    }
}

In Stream, the current position is sent to the server periodically and asynchronously. Note that Stream is not implemented by myself.

My idea was to introduce compare-and-swap as illustrated above, and mark stream as volatile.
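Concretely, that idea would look roughly like the sketch below. This is only an illustration of the spin-lock-plus-volatile approach described above; `StreamManager` is a made-up name, and `Stream` is reduced to a stub since the real implementation is third-party:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough combination of the two snippets above: the AtomicInteger spin lock
// guards both methods, and `stream` is volatile so that a stream installed
// by f1 is visible to subsequent calls of f2. Stream is a stub here.
class StreamManager {
    static class Stream {
        private volatile long position = 0L;
        void setPosition(long p) { position = p; }
        long getPosition() { return position; }
    }

    private final AtomicInteger lock = new AtomicInteger(0);
    private volatile Stream stream = new Stream();

    Stream f1() {
        while (!lock.compareAndSet(0, 1)) { }   // spin until we own the lock
        try {
            final Stream replacement = new Stream();
            if (this.stream != null) {
                // Inherit the position from the old (bad) stream.
                replacement.setPosition(this.stream.getPosition());
            }
            this.stream = replacement;
            return replacement;
        } finally {
            lock.set(0);
        }
    }

    void f2(long p) {
        while (!lock.compareAndSet(0, 2)) { }
        try {
            stream.setPosition(p);
        } finally {
            lock.set(0);
        }
    }

    public static void main(String[] args) {
        final StreamManager m = new StreamManager();
        m.f2(5L);
        System.out.println(m.f1().getPosition()); // prints 5
    }
}
```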

From the description and your example code, I've inferred the following:

  1. Stream has its own internal position, and you're also tracking the most recent position externally. You use this as a sort of 'resume point': when you need to reinitialize the stream, you advance it to this point.

  2. The last known position may be stale; I'm assuming this based on your statement that the stream periodically and asynchronously notifies the server of its current position.

  3. At the time f1 is called, the stream is known to be in a bad state.

  4. The functions f1 and f2 access the same data, and may run concurrently. However, neither f1 nor f2 will ever run concurrently against itself. In other words, you almost have a single-threaded program, except for the rare cases when both f1 and f2 are executing.

    [Side note: My solution doesn't actually care if f1 gets called concurrently with itself; it only cares that f2 is not called concurrently with itself]

If any of this is wrong, then the solution below is wrong. Heck, it might be wrong anyway, either because of some detail left out, or because I made a mistake. Writing low-lock code is hard, which is exactly why you should avoid it unless you've observed an actual performance issue.

static class Stream {
    private long position = 0L;

    void setPosition(long position) {
        this.position = position;
    }
}

final static class StreamInfo {
    final Stream stream = new Stream();
    volatile long resumePosition = -1;

    final void setPosition(final long position) {
        stream.setPosition(position);
        resumePosition = position;
    }
}

private final Object updateLock = new Object();
private final AtomicReference<StreamInfo> currentInfo = new AtomicReference<>(new StreamInfo());

void f1() {
    synchronized (updateLock) {
        final StreamInfo oldInfo = currentInfo.getAndSet(null);
        final StreamInfo newInfo = new StreamInfo();

        if (oldInfo != null && oldInfo.resumePosition > 0L) {
            newInfo.setPosition(oldInfo.resumePosition);
        }

        // Only `f1` ever replaces `currentInfo`, so publish the fully
        // initialized `newInfo` last; `f2` can never see a half-built one.
        currentInfo.set(newInfo);

        // The `f2` thread might be waiting for us, so wake them up.
        updateLock.notifyAll();
    }
}

void f2(final long newPosition) {
    while (true) {
        final StreamInfo s = acquireStream();

        s.setPosition(newPosition);
        s.resumePosition = newPosition;

        // Make sure the stream wasn't replaced while we worked.
        // If it was, run again with the new stream.
        if (acquireStream() == s) {
            break;
        }
    }
}

private StreamInfo acquireStream() {
    // Optimistic concurrency: hope we get a stream that's ready to go.
    // If we fail, branch off into a slower code path that waits for it.
    final StreamInfo s = currentInfo.get();
    return s != null ? s : acquireStreamSlow();
}

private StreamInfo acquireStreamSlow() {
    synchronized (updateLock) {
        while (true) {
            final StreamInfo s = currentInfo.get();

            if (s != null) {
                return s;
            }

            try {
                updateLock.wait();
            }
            catch (final InterruptedException ignored) {
            }
        }
    }
}

If the stream has faulted and is being replaced by f1, it is possible that an earlier call to f2 is still performing some operations on the (now defunct) stream. I'm assuming this is okay, and that it won't introduce undesirable side effects (beyond those already present in your lock-based version). I make this assumption because we've already established in the list above that your resume point may be stale, and we also established that f1 is only called once the stream is known to be in a bad state.

Based on my JMH benchmarks, this approach is around 3x faster than the CAS or synchronized versions (which are pretty close themselves).


Your example isn't doing what you want it to: you are actually executing your critical section even while the lock is held by another thread. Try something like this:

public void f1() {
    while (!lock.compareAndSet(0, 1)) {
    }

    try {
        ...
    } finally {
        lock.set(0);
    }
}

To answer your question, I don't believe that this will be any faster than using synchronized methods, and this approach is harder to read and comprehend.
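Whether the CAS version is actually faster can only be settled by measuring. Below is a rough, single-threaded (i.e. uncontended) timing sketch; all names are invented for illustration, and a real comparison should use JMH, which controls for warmup and dead-code elimination:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough uncontended-cost comparison of a synchronized method vs. the
// AtomicInteger CAS loop from the question. NOT JMH-quality; treat the
// numbers as indicative only.
class LockCostSketch {
    private final AtomicInteger lock = new AtomicInteger(0);
    private long value;

    synchronized void syncInc() { value++; }

    void casInc() {
        while (!lock.compareAndSet(0, 1)) { }   // spin until the CAS succeeds
        try {
            value++;
        } finally {
            lock.set(0);
        }
    }

    long value() { return value; }

    public static void main(String[] args) {
        final LockCostSketch s = new LockCostSketch();
        final int n = 5_000_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) s.syncInc();
        final long syncNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) s.casInc();
        final long casNs = System.nanoTime() - t0;

        System.out.printf("synchronized: %d ns/op, CAS loop: %d ns/op%n",
                syncNs / n, casNs / n);
    }
}
```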


Another approach is to use a timestamp lock which works like a modification count. This works well if you have a high read to write ratio.
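The "timestamp lock" idea maps naturally onto `java.util.concurrent.locks.StampedLock` (Java 8+), whose stamp changes on every write and so acts like a modification count. A minimal sketch, assuming the guarded state is just the position; `PositionHolder` is a hypothetical class, not from the question:

```java
import java.util.concurrent.locks.StampedLock;

// Readers optimistically read without blocking and re-validate the stamp
// afterwards; writers take an exclusive lock, which invalidates outstanding
// optimistic stamps.
class PositionHolder {
    private final StampedLock sl = new StampedLock();
    private long position = 0L;

    void setPosition(long p) {              // writer: exclusive lock
        final long stamp = sl.writeLock();
        try {
            position = p;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    long getPosition() {                    // reader: optimistic, lock-free
        long stamp = sl.tryOptimisticRead();
        long p = position;
        if (!sl.validate(stamp)) {          // a write intervened; retry locked
            stamp = sl.readLock();
            try {
                p = position;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        final PositionHolder h = new PositionHolder();
        h.setPosition(42L);
        System.out.println(h.getPosition()); // prints 42
    }
}
```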

Another approach is to have an immutable object which stores state via an AtomicReference. This works well if you have a very high read to write ratio.
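The immutable-object idea could look like the sketch below; `StreamState` and `StreamHolder` are hypothetical names used only for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// All mutable state lives in an immutable snapshot, so readers always see
// a consistent view and never need a lock; updates swap in a new snapshot.
class StreamHolder {
    static final class StreamState {
        final long position;
        StreamState(long position) { this.position = position; }
    }

    private final AtomicReference<StreamState> state =
            new AtomicReference<>(new StreamState(0L));

    void setPosition(long p) {
        // Plain replacement suffices for a setter; if the new state depended
        // on the old one, updateAndGet would keep the transition atomic.
        state.set(new StreamState(p));
    }

    long getPosition() {
        return state.get().position;
    }

    public static void main(String[] args) {
        final StreamHolder h = new StreamHolder();
        h.setPosition(10L);
        System.out.println(h.getPosition()); // prints 10
    }
}
```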


Comments
  • Why are they synchronized? What resource is being accessed concurrently? This is what you should focus on
  • @JeanLogeart I follow your thought process, but I would like to omit that from the question. That is, consider this is a generic/educational question.
  • Those while loops become busy loops until the lock value is updated. Does profiling show a bottleneck?
  • @AndrewS In this specific scenario, the while loop in f2 is almost always an if. Again, this is a generic question.
  • If you want mutual exclusion, synchronized is a better and more natural choice. It is better to block the thread and not consume any CPU than to burn CPU cycles checking whether you are unblocked yet. Those cycles can be used for computation by some other thread.
  • Thank you for your detailed answer, very much appreciated. The idea of having a 'fast' and a 'slow' path to acquire the stream is an interesting one. Though, I am not sure if the code above is correct. Let me explain. Assume thread t1 is executing f1 and is about to execute currentInfo.set(newInfo);. Now, thread t2 executes f2(position). Assume that acquireStream() can just quickly return the current stream. Now the calls s.setPosition(newPosition); and s.resumePosition = newPosition; will be lost forever, because t1 no longer sees them in f1. Or am I missing something?
  • You’re correct, and the code assumes that f1 is only called once the stream is known to have faulted, so any concurrent work done by f2 is "lost" (in the sense that it would be unable to complete since the "old" stream is broken).
  • Right, I did take that inferred assumption into account. Though I am not sure if I understand your assumption and if it is practical. Typically, f1 is only called when the stream/socket is broken, for example due to a disconnect. Unless you have a way to 'pause' other threads (which might call f2), you always have a critical section between f1 and f2? So I am not really sure what you mean with "the code assumes that f1 is only called once the stream is known to have faulted".
  • What I mean is, if f1 is only called when the stream is in a bad state, then what does it matter if f2 runs concurrently with the old stream? When f2 tries to modify or consume the (bad) stream, wouldn't it just fail? If this is a flawed assumption, then as you say, my solution is broken.
  • The main concern is that the position is inherited from the old stream. As you can see in the code snippet, Stream.setPosition(pos) is actually a simple setter. It does not do any network IO. So whenever a new stream is created, I want to make sure that it always inherits the position from the old (bad) stream. That's why I believe there is a critical section between f1 and f2 which cannot be avoided.
  • Oops, thank you. I altered my question to reflect this.
  • @pbillen Ah okay, got it. Have you run a test with the updated version? I wouldn't expect it to be much faster (if any) than a simple synchronized method.
  • "I don't believe that this will be any faster than using synchronized methods" Not true. Using synchronized is about 50% slower than the atomic int version, at least on my machine