Very interesting blog post by Todd Hoff at highscalability.com presenting “7 Years of YouTube Scalability Lessons in 30 min” based on a presentation from Mike Solomon, one of the original engineers at YouTube:
…. The key takeaway away of the talk for me was doing a lot with really simple tools. While many teams are moving on to more complex ecosystems, YouTube really does keep it simple. They program primarily in Python, use MySQL as their database, they’ve stuck with Apache, and even new features for such a massive site start as a very simple Python program.
That doesn’t mean YouTube doesn’t do cool stuff, they do, but what makes everything work together is more a philosophy or a way of doing things than technological hocus pocus. What made YouTube into one of the world’s largest websites? Read on and see...
- 4 billion Views a day
- 60 hours of video is uploaded every minute
- 350+ million devices are YouTube enabled
- Revenue double in 2010
- The number of videos has gone up 9 orders of magnitude and the number of developers has only gone up two orders of magnitude.
- 1 million lines of Python code
- Python - most of the lines of code for YouTube are still in Python. Everytime you watch a YouTube video you are executing a bunch of Python code.
- Apache - when you think you need to get rid of it, you don’t. Apache is a real rockstar technology at YouTube because they keep it simple. Every request goes through Apache.
- Linux - the benefit of Linux is there’s always a way to get in and see how your system is behaving. No matter how bad your app is behaving, you can take a look at it with Linux tools like strace and tcpdump.
- MySQL - is used a lot. When you watch a video you are getting data from MySQL. Sometime it’s used a relational database or a blob store. It’s about tuning and making choices about how you organize your data.
- Vitess- a new project released by YouTube, written in Go, it’s a frontend to MySQL. It does a lot of optimization on the fly, it rewrites queries and acts as a proxy. Currently it serves every YouTube database request. It’s RPC based.
- Zookeeper - a distributed lock server. It’s used for configuration. Really interesting piece of technology. Hard to use correctly so read the manual
- Wiseguy - a CGI servlet container.
- Spitfire - a templating system. It has an abstract syntax tree that let’s them do transformations to make things go faster.
- Serialization formats - no matter which one you use, they are all expensive. Measure. Don’t use pickle. Not a good choice. Found protocol buffers slow. They wrote their own BSON implementation, which is 10-15 time faster than the one you can download.
Read the blog
Watch the video