Operating a Large, Distributed System in a Reliable Way: Practices I Learned

by bill-s, 2019-07-22T02:07:48.899Z

For the past few years, I've been building and operating a large distributed system: the payments system at Uber. I've learned a lot about distributed architecture concepts during this time and seen first-hand how high-load and high-availability systems are challenging not just to build, but to operate as well. Building the system itself is a fun job. Planning how the system will handle a 10x or 100x traffic increase and ensuring that data stays durable regardless of hardware failures are intellectually rewarding. However, operating a large, distributed system has been an eye-opening experience for me.
