Presented at the Heisenbug conference in Moscow in December 2016 and in Yekaterinburg in April 2017. Video and slides are in Russian.
# Abstract
Distributed systems come up in our professional work more and more often. Many popular modern sites and applications have a distributed system under the hood. Such systems challenge developers because of the fundamental complexity of building them and the huge range of possible design trade-offs. Andrey will talk about the part of these challenges that concerns testing, about existing limitations, and about their impact on functionality. The talk covers the following questions:
- How are distributed systems different from centralized systems?
- What does it all mean for testing?
- What properties and characteristics must be checked in distributed systems, and how can that be done?
- Which approaches to testing distributed systems exist, and what problems do they solve?
- What problems remain unresolved?
The talk is built around the example of a persistent distributed queue that is being developed at Yandex. Attendees will learn what Andrey and the Yandex team tested, how they tested it, and what results they obtained.
# Materials
Download slides in Russian (PDF)
# References
- Testing Distributed Systems: a curated list of resources on testing distributed systems
- “Simple Testing Can Prevent Most Critical Failures”: a great paper with an overview of the different defect types in distributed systems and how to find them. If you have time to read only one paper, this is the one.
- Inside Yandex: Data Storage and Processing Infrastructure: several talks on data infrastructure at Yandex (in Russian)
- Talks by Kyle Kingsbury (Aphyr): if you are testing distributed systems, you must be familiar with Kyle’s work
Not exactly references, but you can also check out my interviews on testing distributed systems: one from November 2016 and another from June 2017 (both in Russian).
# Other versions
A shorter version of this talk was presented at the DUMP 2017 conference in Yekaterinburg in April 2017.