Published on April 19, 2017 by Microsoft Research

Users demand 24/7 dependability from cloud services. Failing to deliver it is costly, yet reaching ideal dependability poses complex challenges. Behind cloud computing is a collection of hundreds of complex systems, written in millions of lines of code, that are brittle and prone to failure. In this talk, I discuss one of the unsolved problems in distributed systems: distributed concurrency bugs. Distributed concurrency bugs are caused by nondeterministic orderings of distributed events such as message arrivals, crashes, and reboots. I present insights gained from our bug study, which can inform further research on combating these bugs. I also present my effort to advance distributed system model checkers so that they can unearth hidden bugs in systems. I propose a principle of semantic awareness to tackle the major problem of model checking, state space explosion. In this work, I show that leveraging semantic knowledge of the systems under test can help a model checker find bugs 2x to 340x faster than the state of the art.
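
To make the two ideas above concrete, here is a toy Python sketch. It is not the model checker from the talk. The events, the invariant, and the function names are all hypothetical, chosen only to show how reordering events can expose a bug and how knowing which events are independent shrinks the search. The naive checker tries every ordering of four events. The "semantic" checker assumes that events touching different keys commute, so it explores one representative interleaving per combination of per-key orderings.

```python
from itertools import permutations, product

# Hypothetical events: (name, key it touches, update function).
EVENTS = [
    ("set_x", "x", lambda v: 1),
    ("inc_x", "x", lambda v: v + 1),
    ("set_y", "y", lambda v: 5),
    ("inc_y", "y", lambda v: v + 1),
]


def run(order):
    """Apply events in the given order to a fresh state."""
    state = {"x": 0, "y": 0}
    for _, key, op in order:
        state[key] = op(state[key])
    return state


def invariant(state):
    """Holds only if each 'set' was delivered before its 'inc'."""
    return state["x"] == 2 and state["y"] == 6


def naive_check(events):
    """Explore every ordering of the events (n! executions)."""
    explored, bugs = 0, []
    for order in permutations(events):
        explored += 1
        if not invariant(run(order)):
            bugs.append([name for name, _, _ in order])
    return explored, bugs


def semantic_check(events):
    """Explore one representative per class of equivalent orderings.

    Assumption: events on different keys are independent, so only the
    relative order of events on the same key can change the outcome.
    """
    groups = {}
    for ev in events:
        groups.setdefault(ev[1], []).append(ev)
    explored, bugs = 0, []
    for combo in product(*(permutations(g) for g in groups.values())):
        order = [ev for group_order in combo for ev in group_order]
        explored += 1
        if not invariant(run(order)):
            bugs.append([name for name, _, _ in order])
    return explored, bugs


if __name__ == "__main__":
    print(naive_check(EVENTS)[0])     # 24 orderings explored
    print(semantic_check(EVENTS)[0])  # 4 orderings explored
```

In this toy case both checkers reach the same set of final states, but the reduced search runs 4 executions instead of 24. The gap widens quickly as events are added, which is the state space explosion problem in miniature.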

1 Comment on "Unearthing Concurrency Bugs in Cloud-Scale Distributed Systems"

Sandy Staab
11 months 22 days ago

Too difficult to understand his English.