This week on CodePen Radio, we're talking about Scaling. We're returning to the land of servers to discuss how we scaled CodePen.

What is scaling?

1:30 When we talk about scaling, we're talking about our backend servers being able to handle the growth of our user base. We're really lucky to even have this problem. We've been able to set up our backend infrastructure to handle future growth on CodePen.

Scaling is making sure your website won't crash because of your user growth.

How did we set up CodePen?

3:43 Most web apps start on one server. Usually because it's easy, it's affordable, and you don't really need more.

4:06 We started with a Rails application, a MySQL instance, and Redis, all on a single server. Finances were the main reason. We bought what we thought was a big box, and a good investment, but it ended up being underpowered, so we upgraded and kept it on the sidelines.

4:53 We prepaid for access to the server instance. You can buy access from Amazon Web Services for one-year or three-year terms. It's not really a good idea to buy access for three years, since Amazon regularly drops prices, but buying access for a year ends up being a pretty good deal. If you know you're going to need access to a server for a while, we'd suggest paying up front and getting that discount.

Getting started

6:12 You buy a server and you build a web application. You don't need a bunch of servers to get started. We started with a single server. After growing for a while, we needed to start upgrading.

There are two important things that we did:

  1. We moved the preprocessors onto their own servers.

    We didn't know it when we started, but it was super insecure to run preprocessors on the same server as our app. The load was also unpredictable, because we had no way of knowing which pieces of our app were using the most processor power and memory. So we moved the preprocessors off of our main server. We ended up using two servers for the preprocessors: one handles load balancing, taking requests and handing tasks off to the other server, which is dedicated to sharing that load.

  2. We signed up for monitoring with New Relic.

11:21 You need to know when your CPU or memory usage is too high, so you'll want to sign up for monitoring by New Relic. They actually provide this service for free, but they do have paid plans. Their basic monitoring provides data about your server for 7 days.

13:12 When we started, we were looking at our stats on New Relic all the time, because we hadn't learned where the bottlenecks were yet. This is why it's so important to sign up for monitoring right away, so you can get to know your application and learn about the resources being used on your server, and how optimizing can help you scale.

14:15 New Relic also provides alerts: if your CPU is being overloaded, or your disk is filling up, you can receive alerts so you can fix the problem before your server goes down.
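To make the idea of threshold alerts concrete, here is a minimal sketch in Python of the kind of check a monitoring tool runs. It is not New Relic's API: the function names (`disk_usage_percent`, `find_alerts`) and the 90% limit are assumptions made up for this illustration.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of disk space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def find_alerts(metrics, limits):
    """Return a message for every metric at or above its limit.

    metrics and limits are dicts keyed by metric name, with values in percent.
    Metrics without a configured limit are ignored.
    """
    alerts = []
    for name, value in sorted(metrics.items()):
        limit = limits.get(name)
        if limit is not None and value >= limit:
            alerts.append(f"{name} at {value:.0f}% (limit {limit:.0f}%)")
    return alerts


if __name__ == "__main__":
    # Hypothetical limit: warn once the disk is 90% full.
    limits = {"disk": 90.0}
    metrics = {"disk": disk_usage_percent("/")}
    for message in find_alerts(metrics, limits):
        print("ALERT:", message)
```

A hosted service adds history, dashboards, and notifications on top of this, but the core is the same: compare a measurement against a limit and speak up before the server goes down.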

15:30 New Relic has some extremely powerful tools that you can only get in the Pro plan.

Next steps

16:01 We're taking our next steps in scaling CodePen. We've reached the point where running our own servers is actually not the optimal setup.

We kept hitting all these bottlenecks, and we didn't know how to diagnose them, so we weren't able to scale.

We're web developers, and we specialize in Ruby on Rails, so setting up Rails was natural. We also set up Node.js, but it didn't go as easily as we thought it would. We hit a bottleneck with Socket.io, so we tried a different tool called Faye. Faye wasn't working out either, so we switched to a service called PubNub. Now we don't have to write any server logic; it's all handled for us, so we can focus on what we're good at, which is Ruby development.

Part of scaling a company is learning what you do and don't do well.

20:17 There's going to be a point where we'll have to shard the database, because it's already massive. There are scaling problems we'll face in the future, but we'll cross that bridge when we get there.

How code affects scaling

20:52 The code that you write affects the way your web app scales.

Some lessons we've learned:

Using more powerful servers can hide problems you could solve with better code.

22:22 Get the low-hanging fruit problems in your code taken care of before you scale up to bigger servers. Look at the code that is causing bottlenecks, and see if you can rewrite it to be more efficient.

27:50 You need to take error alerts about your server very seriously. A good way to do this is to avoid letting errors build up. Don't become desensitized to notifications. Don't allow yourself to become overwhelmed by unnecessary alerts, or you could end up missing something important among all the noise.

Take your error notifications very seriously.

30:58 We were notified recently that our database had about three months left before the current growth rate would fill the disk space. But what we didn't know was that there were spikes of usage, and we almost hit the limit (at one point, our server was at 98% disk usage). So we realized that we had to get a new server set up right away, within the next couple of days, or everything would come crashing down.
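The gap between "three months left" and "98% full" comes down to simple arithmetic: an average growth rate says nothing about temporary spikes. Here is a rough sketch in Python. The function names and all of the numbers are hypothetical, chosen only to illustrate the effect; they are not CodePen's actual figures.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days until the disk fills at a constant growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day


def headroom_after_spike(capacity_gb, used_gb, spike_gb):
    """Return free space (GB) left if a temporary spike lands on top of current usage."""
    return capacity_gb - used_gb - spike_gb


if __name__ == "__main__":
    # Hypothetical disk: 1000 GB total, 820 GB used, growing 2 GB per day.
    print(days_until_full(1000, 820, 2))         # 90.0 days, about three months
    # A hypothetical 160 GB temporary spike leaves only 20 GB free (98% used).
    print(headroom_after_spike(1000, 820, 160))  # 20
```

The steady-growth estimate looks comfortable, while the spike eats nearly all of the remaining space. That is why it helps to track peak usage as well as the trend.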

Trust the monitoring tools, but stay vigilant and make sure you're keeping an eye on everything.


If you're enjoying this show, please take a minute to leave us a review in iTunes. We really appreciate it, and thanks to everyone who has already left a review! (We read all of them!)

Comments

  • g_goodman

    Hi Chris, thanks for sharing.

    I’ve read your notes on the talk but haven’t had a chance to listen to the full talk quite yet.

    As someone who runs a similar (but smaller-scale) service, I’m in the process of hitting some of those growth pains right now. It certainly is a challenge to scale out beyond the quaint single-server setup when your team has little to no devops experience. In my case, I’m in it alone for the time being and have no devops experience, so my biggest time-sink in moving forward isn’t writing code but ramping up on cloud, PaaS, and SaaS offerings, and trying to find a happy medium between high-cost (but easily maintainable) options and low-cost but higher-risk in-house approaches.

    I will take heed of your NewRelic suggestion.

    Keep up the good work and talks,

    Geoff

  • Great to know I am not the only one who struggles with these scaling issues.

  • Daniel Olson

    Regarding reserved instances, as new instances come out and pricing drops, the value of your initial purchase still applies. You don’t lose money or time based on the initial purchase.

    From my experience purchasing a reserved instance for 3 years and then upgrading to a new instance, the reserve isn’t tied to the original one. It works like a credit, where the credit value doesn’t change but the credit’s “worth” and what you get for it does.

    I hope that makes sense, ha.

    When you mention running preprocessors on the site, are you referring to how each “pen” has the ability to use LESS, etc. live on the site?

    On the topic of New Relic, would you prefer that over the AWS monitors?

    Great podcast!