Concurrency bugs in Lucene: How to fix optimistic concurrency failures

(elastic.co)

55 points | by aoli-al2 天前

4 comments

aoli-al18 小时前
I’m the author of Fray, a concurrency testing framework for the JVM, and I’m excited to finally share what I’ve been building over the past few years!
Fray[1] is a concurrency testing tool for Java that can help you find and debug tricky race conditions that manifest as assertion violations, run-time exceptions, or deadlocks. I’d love to hear your thoughts—feel free to ask me anything! And if you’re curious, give Fray a try.
[1]: https://github.com/cmu-pasta/fray
- _benedict18 小时前
  We have something very similar[1] we use in the Apache Cassandra project to test complex cluster behaviours.
  We appear to use exactly the same basic technique, using byte weaving to intercept concurrency primitives such as synchronized, LockSupport etc to pause the system thread and run them on some schedule.
  We only currently run (deterministic) probabilistic traces though, we can’t search the interleaving space. But the traces for a whole cluster are extremely complex and probably unsearchable.
  I have been meaning to publish it for broader consumption for years now, but there’s always something more important to do. It’s great to see some dedicated efforts in this space.
  [1] https://github.com/apache/cassandra/tree/trunk/test/simulato...
  aoli-al16 小时前
  This looks super cool!
  It seems that all controlled threads are wrapped with `InterceptibleThread` in the Cassandra simulator. Does this work for ThreadPools (e.g., ForkJoinPool) as well? We had a hard time intercepting thread objects because they are used by the language runtime (e.g., GC threads) as well and we don’t want to interfere with them. Additionally, modifying application code just track thread creation isn’t ideal. To work around this, we came up with this combination of JVMTi and Java Agent solution and we use JVMTi to monitor thread creation and termination.
  As for searching schedules, yes, it is hard to search all possible schedules. However, it turns out many searching algorithms such as probabilistic concurrency testing[1] or partial order sampling[2] are still better than random walk. So it is worth to give them a try.
  [1] https://www.microsoft.com/en-us/research/wp-content/uploads/... [2] https://www.cs.columbia.edu/~junfeng/papers/pos-cav18.pdf
  _benedict15 小时前
  We do currently require all threads to be created by one of our own factories, but that's primarily because this grew out of a non-byte weaving approach (where we explicitly replaced our concurrency primitives). Looking at the class now, all of its state could easily be stashed in either global or ThreadLocal variables, so I don't see anything that would stop us working with FJP etc.
  > Additionally, modifying application code just track thread creation isn’t ideal.
  This would certainly be necessary, but don't you anyway need to rewrite the application to trap synchronised, volatile, atomic accesses etc? It doesn't seem all that different to rewrite calls to Thread::start. The issue of JVM threads is perhaps a little trickier, but I am not averse to some ugly integrations. Just take a look at how we make RNGs deterministic
  > So it is worth to give them a try.
  Thanks for the tips! I am not sure when I will have time to apply these techniques to our simulator, but they are no doubt valuable for the protocol simulations I am relying on today, so maybe I will have a justification to explore them sometime soon.
  Really cool work too. I hope it manages to make its way into more hands, so that this technique can be used more widely.
- vlovich12314 小时前
  How does this compare with a generic tool like Antithesis? I recognize closed source money vs open source free but from a feature perspective would Antithesis be more effective at finding the issues since it’s not limited to stuff happening in the JVM / can test concurrency of more complicated network topologies between components?
  aoli-al13 小时前
  AFAIK, Antithesis uses a hypervisor to achieve deterministic execution. This can be less effective because the hypervisor does not have language semantics and faces a larger search space. You may check Figures 5 and 6 in our technical report[1], where we compare Fray against RR, a record and replay tool that can also be used for concurrency testing at OS level[2].
  [1]: https://arxiv.org/pdf/2501.12618
  [2]: https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...
vlovich12314 小时前
> The motivation behind building Fray stems from a noticeable gap between academia and industry: while deterministic concurrency testing has been extensively studied in academic research for over 20 years, practitioners continue to rely on stress testing—a method widely acknowledged as unreliable and flaky—to test their concurrent programs.
To be fair, the gap is because writing the tests is the hard part. Tests for deterministic testing frameworks can be more complicated because you have to simulate more complex situations with more components interacting (otherwise the simpler targeted tests would have caught your bug). So it works well in terms of making your existing integration tests gain extra value, but the complexity of writing and maintaining those integration tests is the actual challenge.
Don’t get me wrong. I love deterministic simulation testing and along with property tests and mutation testing it’s best in class techniques for having confidence in the efficacy of your tests. Just that the challenges are on the less sexy side of writing the tests whereas academia focuses on the sexy frameworks piece.
- aoli-al13 小时前
  Using Fray does not require knowledge about "deterministic testing" or "controlled concurrency." This is one of its goals: developers write normal concurrency tests, and Fray controls the execution behind the scenes.
  In fact, when we evaluate Fray, we collect all existing concurrency tests from Lucene, Kafka, and Guava, and running them under different thread inter-leavings can already reveal so many bugs. [1]
  [1]: https://github.com/cmu-pasta/fray/blob/main/docs/bugs.md
  vlovich12313 小时前
  Writing good “normal” concurrency tests is hard is what I’m saying. I get that it slots in well with existing tests that are already written.
comrade123419 小时前
Was Lucene the project that started at apple as part of cyberdog or whatever that old email client was called?
Depressing that apple mail had better search 25 years ago than today.
- softwaredoug19 小时前
  I believe it was Doug Cuttings side project to learn Java. It was like his 5th search engine.
  comrade123419 小时前
  I see on the wiki his third was at apple. It must have been in cyberdog because the search results were so good and it also had similar search rules/wildcards/etc to Lucene later.
  I remember working with lucene around 2000/2001 and how good the results were.
nyanpasu6414 小时前
How do you know a program is free of data races?
- aoli-al13 小时前
  Fray does not know if a program is free of data races. Even if there are data races in a program, Fray can still find bugs, but this violates the soundness guarantee, so Fray may miss data race bugs.