027: Scaling – CodePen

This week on CodePen Radio, we’re talking about Scaling. We’re returning to the land of servers to discuss how we scaled CodePen.

What is scaling?

1:30 When we talk about scaling, we’re talking about our backend servers being able to handle the growth of our user base. We’re really lucky to even have this problem. We’ve been able to set up our backend infrastructure to handle future growth on CodePen.

Scaling is making sure your website won’t crash because of your user growth.

How did we set up CodePen?

3:43 Most web apps start on one server. Usually because it’s easy, it’s affordable, and you don’t really need more.

4:06 We started with a Rails application, a MySQL instance, and Redis, all on a single server. Finances were the main reason. We bought what we thought was a big box, and a good investment, but it ended up being underpowered, so we upgraded and kept it on the sidelines.

4:53 We prepaid for access to the server instance. You can buy access from Amazon Web Services in one- or three-year increments. It’s not really a good idea to buy access for three years, since Amazon regularly drops prices, but buying access for a year ends up being a pretty good deal. If you know you’re going to need access to a server for a while, we’d suggest paying up front and getting that discount.
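To make the prepay decision concrete, here is a minimal sketch of a break-even calculation. The function name and all of the prices are hypothetical and chosen only for illustration; they are not real AWS rates.

```python
def break_even_months(upfront_cost, reserved_monthly, on_demand_monthly):
    """Months of use after which prepaying becomes cheaper than paying on demand."""
    monthly_saving = on_demand_monthly - reserved_monthly
    if monthly_saving <= 0:
        return None  # prepaying never pays off at these prices
    return upfront_cost / monthly_saving


# Hypothetical prices, for illustration only.
months = break_even_months(
    upfront_cost=600.0, reserved_monthly=50.0, on_demand_monthly=125.0
)
print(f"Prepaying breaks even after {months:.1f} months")
```

If the break-even point is comfortably inside the term you are buying, and you expect to keep the server that long, prepaying is likely worth it.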

Getting started

6:12 You buy a server and you build a web application. You don’t need a bunch of servers to get started. We started with a single server. After growing for a while, we needed to start upgrading.

There are two important things that we did:

We moved the preprocessors onto their own servers.
We didn’t know it when we started, but it was very insecure to run preprocessors on the same server as our app. The load was also unpredictable, because we had no way of knowing which pieces of our app were using the most processor power and memory. So we moved the preprocessors off of our main server. We ended up using two servers for the preprocessors: one handles load balancing, taking requests and sharing tasks with the other server, which is dedicated to taking on that load.
We signed up for monitoring with New Relic.

11:21 You need to know when your CPU or memory usage is too high, so you’ll want to sign up for monitoring with New Relic. Their basic monitoring is free and keeps data about your server for 7 days, and they have paid plans as well.

13:12 When we started, we were looking at our stats on New Relic all the time, because we hadn’t learned where the bottlenecks were yet. This is why it’s so important to sign up for monitoring right away, so you can get to know your application and learn about the resources being used on your server, and how optimizing can help you scale.

14:15 New Relic also provides alerts: if your CPU is being overloaded, or your disk is filling up, you can receive alerts so you can fix the problem before your server goes down.
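The idea behind threshold alerts can be sketched in a few lines. This is not New Relic's API; the function name, metric names, and limits below are all assumptions made for illustration.

```python
def check_thresholds(metrics, limits):
    """Return a warning for every metric that is at or above its limit."""
    alerts = []
    for name, limit in limits.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            alerts.append(f"{name} at {value:.0f}% (limit {limit:.0f}%)")
    return alerts


# Hypothetical limits and a hypothetical sample reading, in percent.
LIMITS = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}
sample = {"cpu": 42.0, "memory": 88.0, "disk": 98.0}

for alert in check_thresholds(sample, LIMITS):
    print(alert)
```

Setting the limits well below 100% is what gives you time to react before the server goes down.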

15:30 New Relic has some extremely powerful tools that you can only get in the Pro plan.

Next steps

16:01 We’re taking our next steps in scaling CodePen. We’ve reached the point where running our own servers is actually not the optimal setup.

We kept hitting all these bottlenecks, and we didn’t know how to diagnose them, so we weren’t able to scale.

We’re web developers, and we specialize in Ruby on Rails, so setting up Rails was natural. We also set up Node.js, but it didn’t go as easily as we thought it would. We hit a bottleneck with Socket.io, so we tried a different library called Faye. Faye wasn’t working out either, so we switched to a service called PubNub. Now we don’t have to write any server logic. It’s all handled for us, so we can focus on what we’re good at, which is Ruby development.

Part of scaling a company is learning what you do and don’t do well.

20:17 There’s going to be a point where we’ll have to shard the database, because it’s already massive. There are scaling problems we’ll face in the future, but we’ll cross that bridge when we get there.

How code affects scaling

20:52 The code that you write affects the way your web app scales.

Some lessons we’ve learned:

Using more powerful servers can hide problems you could solve with better code.

22:22 Get the low-hanging fruit problems in your code taken care of before you scale up to bigger servers. Look at the code that is causing bottlenecks, and see if you can rewrite it to be more efficient.

27:50 You need to take error alerts about your server very seriously. A good way to do this is to avoid letting errors build up. Don’t become desensitized to notifications. Don’t allow yourself to become overwhelmed by unnecessary alerts, or you could end up missing something important among all the noise.

Take your error notifications very seriously.

30:58 We were notified recently that, at the current growth rate, our database had about three months left before it would fill the disk. But what we didn’t know was that there were spikes of usage, and we almost hit the limit (at one point, our server was at 98% disk usage). So we realized that we had to get a new server set up within the next couple of days, or everything would come crashing down.
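A minimal sketch of why a growth-rate forecast can be misleading: the trend looks at steady growth, while a short spike can come close to the limit much sooner. The function names and the daily samples below are hypothetical, not our real numbers.

```python
def days_until_full(samples_pct, limit_pct=100.0):
    """Linear estimate of days left, based on the first and last daily sample."""
    growth_per_day = (samples_pct[-1] - samples_pct[0]) / (len(samples_pct) - 1)
    if growth_per_day <= 0:
        return float("inf")  # usage is flat or shrinking
    return (limit_pct - samples_pct[-1]) / growth_per_day


def peak_headroom(samples_pct, limit_pct=100.0):
    """Percentage points left between the highest observed usage and the limit."""
    return limit_pct - max(samples_pct)


# Hypothetical daily disk usage in percent, with one spike on day three.
samples = [70.0, 70.5, 98.0, 71.5, 72.0]

print(f"Trend says {days_until_full(samples):.0f} days until the disk is full")
print(f"Peak usage left only {peak_headroom(samples):.0f} points of headroom")
```

Watching the peak alongside the trend is one way to catch the kind of spike that nearly filled our disk.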

Trust the monitoring tools, but stay vigilant and make sure you’re keeping an eye on everything.

If you’re enjoying this show, please take a minute to leave us a review in iTunes. We really appreciate it, and thanks to everyone who has already left a review! (We read all of them!)

Show Links: