Yesterday “Are You Normal?” was listed on the front page of Facebook’s “Recently Popular” apps page, which was a great surprise and fantastic exposure for AYN. But it meant that the development team had to accelerate a whole host of optimizations that we had planned. It is a little like a restaurant getting a surprise review in the New York Times. Everybody shows up at once and lines up around the block. Your kitchen has to serve five times as many meals as it is used to. The wait staff needs to juggle more tables. The host needs to deal with irate customers tired of waiting in line.
There is one big difference though: a restaurant with a long lineup outside is sort of glamorous. A website with slow page load times is sort of embarrassing.
So what did we do to handle a sudden influx of five times the load? Basically, everybody scrambled to do their part.
The two major categories of levers we can pull are “hardware” and “software”.
We were helped tremendously on the hardware side by our relationship with Joyent. Within minutes they were able to allocate more memory to our database box. Within hours we could double the number of virtual machines allocated to our web server front ends running Ruby on Rails. Joyent uses some sophisticated Solaris (ZFS) tricks to move or duplicate machines with little or no downtime. (I say “little or no” because it depends on whether you are moving an app server or a database server.)
Joyent’s Mark Mayo walked us through the process and did the parts that required Joyent admin access (configuring the load balancer, instantiating new virtual machines). Although we hope and expect to see more automation on the Joyent side in upcoming months, the personal service provided went far beyond copying files and configuring devices. Mark helped us find and analyze hotspots based upon vast experience with similar apps both within the Facebook platform and elsewhere. Mark helped us to quickly reconfigure our database server (both hardware and software) to increase performance. That freed my team to focus on the application logic.
My team focused primarily on caching. We wanted to make sure that we never asked the database the same question twice. Caching is tricky though, because sometimes you can ask the same question twice and get legitimately different answers. For example, consider asking a weatherman the current probability of precipitation (POP). You don’t want to ask once a minute. Humidity tends not to fluctuate that quickly and it’s probably better for him to spend his time doing something predictive. On the other hand, if you only ask once a week then you’ll spend many rainy days at the beach. So you need to decide on the right middle ground between accuracy and workload. Trickier still is determining which events completely invalidate your answer. For example, if the wind changes direction, does that mean you should throw out the POP prediction?
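To make the weatherman trade-off concrete, here is a minimal sketch in plain Ruby of a cache that combines a time limit with event-based invalidation. It is an illustration only, not the code we run on AYN: the class name ForecastCache, its methods, and the ten-minute limit are all made up for this example.

```ruby
# Hypothetical illustration: ForecastCache and its method names are invented
# for this sketch and are not taken from the AYN codebase.
class ForecastCache
  Entry = Struct.new(:value, :expires_at)

  # ttl:   number of seconds an answer is considered fresh
  #        (the middle ground between accuracy and workload)
  # clock: callable returning the current time in seconds; injectable so the
  #        behaviour can be exercised without actually waiting
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Return the remembered answer for key while it is still fresh. Otherwise
  # run the block (the expensive question), remember its answer, and return it.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry.value if entry && now < entry.expires_at

    value = yield
    @entries[key] = Entry.new(value, now + @ttl)
    value
  end

  # Event-based invalidation: throw the answer away immediately when something
  # happens that makes it untrustworthy (the wind changing direction).
  def invalidate(key)
    @entries.delete(key)
  end
end

# Usage: ask the "weatherman" at most once every ten minutes.
questions_asked = 0
cache = ForecastCache.new(ttl: 600)
cache.fetch(:pop) { questions_asked += 1; 40 }  # asks the weatherman
cache.fetch(:pop) { questions_asked += 1; 40 }  # answered from the cache
cache.invalidate(:pop)                          # the wind changed direction
cache.fetch(:pop) { questions_asked += 1; 55 }  # asks again
```

Picking the time limit is the accuracy-versus-workload decision, and deciding where to call invalidate is the harder question of which events make an answer stale.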
So caching is important and fairly simple to understand but challenging to implement. That makes it all the more admirable that one of our developers, Ian Suda, was able to do it with relatively few mistakes, despite the stress of the rush and a pre-existing migraine headache. We already had some caching in place and planned more, but the Facebook deluge forced us to think quickly and accelerate our plans.
Things are a bit quieter today, but we picked up thousands of new users and will use the new capacity to cope with the “new normal” level of activity. We’re also planning for that next burst of activity. We look forward to future traffic challenges: as any restaurateur will tell you, being too busy is better than being ignored.



This is an excellent front-lines and tech-talk post, Paul! Nice to learn about some of the aspects of back end planning. More of the techy posts would be stellar. Maybe get the tech team to do some guest posts as well on their experiences…
Glad I could be of help, Paul! It’s always a thrill to see a customer succeed!