We’ve been naming all our services cp____ lately. Get it? cp is short for CodePen. Clever, I know. We have many such services. The latest to join them is cpshots. We’ve been doing our own screenshotting for a long time now, but like all software we’re always working on it for various ever-changing reasons.

Robert and I discuss this last round of changes to this service. Part of the purpose of this round is that we wanted to bring image resizing in-house to keep costs down. We need screenshots in various sizes because we literally show them in different circumstances at different sizes, but also because of the responsive images syntax. That resizing we do with sharp via a Lambda. But we only want to do it once! That is, whenever a Pen has changed and the screenshot needs to be regenerated. In order to do that, we send the requests to a Cloudflare Worker which, through the KV store, knows if we already have the screenshot or not. If yes, send it along. If no, head over to 1) the screenshotting service to get the screenshot, 2) the Lambda for resizing it, 3) S3 to store the images, and 4) back to the Cloudflare Worker to serve it.

It’s a lot of moving parts! But they are all very smart, simple, tuned parts that are designed to do what they do well. The best parts, for us, are that this project was a great excuse to give Robert a trial-by-fire of a fairly complex tech stack, and that we made literally every part of it work in both development and staging. Making sure every service of CodePen runs locally means that there is no secret magic: we can all work on it and see it working.

Time Jumps

  • 01:03 Rolling out large infrastructure to an audience
  • 04:18 Creating different size images ourselves
  • 05:30 What were the moving parts?
  • 06:54 Using AWS Lambda
  • 09:31 4 sizes of screenshots at CodePen
  • 12:29 Sponsor: Jetpack & WooCommerce Sales
  • 13:24 How did we architect this?
  • 16:06 Our local dev environment
  • 19:45 Making other improvements besides the primary one
  • 27:16 Lessons learned

Sponsor: Jetpack and WooCommerce Black Friday Sales

Huge sales in WordPress land! As we write, there are fewer than 4 days left in the Jetpack Black Friday Sale, which is 60%+ off all Jetpack products and plans for your first year. Over in WooCommerce land, nearly everything in the entire WooCommerce marketplace is 40% off and there are fewer than 6 days left. It’s worth it to spend a little time thinking ahead about what you might need for the year ahead as these deals are pretty massive.

Transcript

[Radio channel adjustment]

Announcer: Today, on CodePen Radio.

Chris Coyier: Hey, everybody. Time for another CodePen Radio. Let's talk technical infrastructure a little bit because we just finished a project like that, which we're affectionately calling cpshots, C-P standing for CodePen and then we named stuff. That's how we named everything lately. It's CP this and CP that.

This is cpshots, but shots meaning screenshots, which is a thing that we do on CodePen for many reasons, which we'll talk about in this show. The person we sicced upon this project was Robert, who has been on the show. How long have you been here, Robert? Like six weeks or something?

Robert Kieffer: Yeah. I started on the 13th of September, so whatever that math is.

Chris: Yeah. Five-something weeks. You're brand new, and you've already finished a major infrastructural project, which is-- [laughter]

Robert: Yeah.

Chris: Which is something.

Robert: Yeah. Exactly.

Chris: I just was looking over your shoulder this morning. You were looking at charts and graphs of how our servers are behaving on this particular project, because it's definitely the kind of project where you look at charts and graphs of servers.

Robert: Yeah, it was fun. It's the first time I rolled out a significant chunk of infrastructure to a large audience since I worked at Facebook, actually. My last job, we never really did -- we never had a particularly large audience, so this was a refreshing change from that.

Chris: Yeah. There's a little bit of scale at CodePen. I'd say we're a middling app in that way. But when it comes to content at CodePen, there's a freaken' lot of it. [Laughter]

Robert: Yeah. Yeah.

Chris: People run across it.

Robert: For sure.

Chris: People, like we have good-ish -- I mean I don't know how to relate it, but there's some element of SEO to it. People look for stuff. They land on it in CodePen. Then they explore CodePen. It's like you're looking at grids of content, sometimes four and six at a time, and paginating through it.

CodePen is such a visual place that it's our charge in the world to make sure that exploring that content is fun, visually rewarding, and useful to you. So, we deal in iframes a lot on CodePen. If your Pen is animated, meaning it has setInterval or requestAnimationFrame or keyframe animations in CSS or whatever, we want to show you that animation even when we're showing you the little baby grid version in CodePen.

We do that, but iframes are fricken' expensive. If we don't have to show you one, we prefer not to - for your own performance and computer bandwidth reasons. If we determine that there's nothing moving on a Pen, we'll just show you the image. That's probably the number one place that images are used on Pens.

They are requested to the tune of many, many, many, many, many millions a month. One of our desires was, "Okay." Sorry for all the backup here, but I need to set the stage, right? We want to serve--

Let's say you're browsing on your mobile device. We want to serve you a smaller -- as small of image as we possibly can because, for Web performance, we can't just send you a 3,000-pixel image. It's irresponsible for bandwidth and performance reasons, so our goal in life is, "Why don't we use what HTML provides?" which is a responsive images syntax for screenshots. We'll take a canonical screenshot, and then we'll make different sizes of it or have some kind of service in place to serve different sizes of it such that we're doing the responsible thing for screenshots.

For a long time, we used Cloudflare's built-in feature for this. It's part of Cloudflare's Workers product. If you want to request an image, we send all our requests through a Cloudflare worker (on purpose) because we wanted to take advantage of this feature. You can have it send resized versions.

The backstory there is, while that's affordable and a nice API and pretty cool, the scale that we're at actually made it still kind of expensive. We looked at that cost and were like, "How can we bring that down?" Well, one way is to just do it yourself instead of rely on an external service to create the different size images for you. We have plenty of experience doing this ourselves. We'll take that screenshot ourselves, and we'll create smaller versions of them ourselves. We're taking on technical debt and saving ourselves a few bucks. That plus a number of other reasons was like, "Okay, let's take this one."

The other big reason is, we have Robert now. It turned out to be kind of a clever project to onboard a new employee as far as our local development environment is concerned, how AWS works, how Cloudflare works. This touched so many different areas that we're like, "Let's do it. This is a perfect little project." We attempted to scope it out and let you go.

00:05:15

Robert: Yeah, it was definitely a good project to start with because it really had me diving in the deep end of the cloud infrastructure that you guys are using. It was a great first step.

Chris: Yeah. Moving parts, what were they?

Robert: Well, like you said, the existing system was really sitting on top of Cloudflare for serving the images that went into users' browsers. But the source image, the service that generates the actual screenshot was using a system called Browserless, which is a third-party product. It sits on top of Puppeteer.

Basically, it's a service where you hand it a URL and, in the background, it loads up Puppeteer, fetches that webpage, and then takes a screenshot of it and returns that back to you. It's great. You get to treat it like a black box. But once you have that image, you still have to figure out how to resize it.

Chris: They do not help with that. [Laughter]

Robert: Right. Right. Exactly. You can tell them what size you want that screenshot to be, but from there, if you want to have optimized sizes-- Well, we take our screenshots at 1280 width.

Chris: Yep.

Robert: If you want anything down from that, you've got to decide for yourself, "What do I want, and how do I create that?" Using the Cloudflare service is certainly a great way of doing that until you get to a scale where you're spending thousands of dollars a month serving these thumbnails, which is kind of where CodePen was at.

Chris: We're like, "We'll resize them. We'll take the screenshots ourselves," which you've got to do anyway.

Robert: Right.

Chris: Then we'll resize them ourselves. In that case, I don't think it's quite possible to make a Cloudflare worker do that resizing. There are some limitations to what a worker can do.

00:07:03

Robert: This is the first time I've used Cloudflare workers. They're great. For handling a request and that first pass of, like, "What do we do with this?" and having logic at the edge of your network, they're phenomenal. It's a really neat product. But for doing heavy lifting code where you need to, for example, take a bunch of thumbnails and resize them to multiple formats and multiple resolutions, it's not great for that. You're constrained in a number of ways in the JavaScript context that you run there.

Chris: The Lambda.

Robert: [Laughter] Yeah. We ended up going a slightly different route, which was to lean on the AWS Lambdas. What we ended up doing was basically coming up with a URL scheme for screenshots, routing those through Cloudflare. We have a little bit of logic that decides whether or not we need to actually create a new screenshot or do we already have one that we've already created.

In the case we don't, we route that to a Lambda function. That Lambda function issues the request to the Browserless service and then uses a library called sharp to resize that screenshot and write it to both -- I think we're doing JPEG and WebP formats.
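A minimal sketch of the flow Robert describes here, assuming a "ready" marker in the KV store and a version string that changes whenever the Pen changes. The `Kv`, `ImageStore`, and `GenerateShots` types, the key layout, and the `serveShot` name are all hypothetical stand-ins, not CodePen's actual code or any Cloudflare or AWS API.

```typescript
// Hypothetical stand-ins for the KV store, S3, and the resize Lambda.
interface Kv {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface ImageStore {
  read(key: string): Promise<Uint8Array>;
}
// Assumed to take the screenshot, resize it, and write every variant to the store.
type GenerateShots = (penId: string, version: string) => Promise<void>;

interface ShotRequest {
  penId: string;
  version: string; // assumed to change whenever the Pen changes
  width: number;
  format: "webp" | "jpeg";
}

async function serveShot(
  req: ShotRequest,
  kv: Kv,
  store: ImageStore,
  generate: GenerateShots,
): Promise<Uint8Array> {
  const marker = `${req.penId}:${req.version}`;
  // Only do the expensive work when no marker exists for this Pen version.
  if ((await kv.get(marker)) === null) {
    await generate(req.penId, req.version);
    await kv.put(marker, "ready");
  }
  // Either way, the image is served from storage.
  return store.read(`${req.penId}/${req.version}/${req.width}.${req.format}`);
}
```

In this sketch two simultaneous first requests for the same Pen would both trigger generation; a real system would likely want some form of de-duplication or retry on top.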

Chris: Yeah, and we do it like five times or something. There's 1280 and 800 - or something. I forget what all the numbers are, but it's just so that we took a tour around the app and figured out places where we need them. Sometimes they're actually pretty small. I think one of them might be 100 or 200-some pixels wide because we serve it pretty darn small.

Robert: Yeah. That's actually kind of an interesting process for me was just figuring out what resolutions do we actually need. It's a mix of requirements. You want sizes that are not so fine. You don't want 20 different sizes. Otherwise, every client is going to be requesting every size image depending on how wide the browser is, what the responsive layout is, and you don't want to have to generate all those because it's computationally expensive.

You, Stephen, and I, we had a long conversation about what makes sense and how do we decide. I think what we landed on was -- I don't know if listeners are going to be interested in this -- we have four sizes: one at 1280, one at 800, one at 512, and one at, I believe, 360.

Chris: Oh, okay.

Robert: Yeah. We decided to do away with that -- I think we had a 120 or 160 size, but by the time you actually need that, it turns out you're dealing with such small images that you're unlikely to actually have that in the UI.

Chris: Right. Which is fine. This was all thought about before. It's not like we're thinking about it for the first time on this podcast.

Then the different sizes get made. You even had a cool insight during that process where it's actually more efficient to make the smaller sizes from the one that you just made down rather than make them all from the canonical. It was just kind of a clever little aspect to this.

00:10:28

Robert: Yeah. It does take a significant amount of time to make these images. If the user is seeing that, seeing the screenshot for the first time, they pay that price for everybody else that's going to see it afterward.

We were able to shave about, I think, 30% of the time off of the thumbnail rendering by just starting with the largest thumbnail and then working our way down and having each consecutive one be the input for the next steps. We actually are dealing with less pixels each time. It was a nice little optimization.
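A rough way to see why the cascade helps, assuming the four widths mentioned earlier and a fixed aspect ratio (the 0.625 used in the usage note is an illustrative assumption, not a figure from the episode). The sketch only counts input pixels read per resize; it does not call sharp, and it says nothing about encoding time.

```typescript
// Widths mentioned in the episode.
const WIDTHS = [1280, 800, 512, 360];

interface ResizeStep {
  fromWidth: number; // width of the image used as input
  toWidth: number; // width of the image produced
}

// Cascade: each output becomes the input of the next, smaller resize.
function cascadePlan(widths: number[]): ResizeStep[] {
  const sorted = [...widths].sort((a, b) => b - a);
  return sorted.slice(1).map((toWidth, i) => ({ fromWidth: sorted[i], toWidth }));
}

// Baseline: every smaller size is produced directly from the largest image.
function fromCanonicalPlan(widths: number[]): ResizeStep[] {
  const sorted = [...widths].sort((a, b) => b - a);
  return sorted.slice(1).map((toWidth) => ({ fromWidth: sorted[0], toWidth }));
}

// Total input pixels read across a plan, for a given height/width ratio.
function inputPixels(plan: ResizeStep[], aspect: number): number {
  return plan.reduce(
    (sum, step) => sum + step.fromWidth * Math.round(step.fromWidth * aspect),
    0,
  );
}
```

With `WIDTHS` and an aspect of 0.625, `inputPixels(cascadePlan(WIDTHS), 0.625)` is 1,587,840 against 3,072,000 for `fromCanonicalPlan`, roughly half the pixels read. That is consistent in direction with the saving Robert mentions, though the time saved also depends on encoding and other fixed costs.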

Chris: It matters, I think, because what's funny about this is it's asynchronous - kind of. If the client, literally a browser, is like, "Oh, I don't have this image yet," the URL for the straight-up IMG tag, unbeknownst to the browser, there's machinery firing in the background, like, "Go! Go! Go! Get the screenshot! Put it in the S3 bucket! Return the thing!"

It does all that, and then it's like, [heavy breathing] "Here you go. I got it." You know? It does it on the fly. The browser is just waiting for it to happen.

What it looks like to a user is just a hung request. Fortunately, with images, who cares? Hung requests don't manifest negatively in the UI all that much. There's just a blank space until it pops in, but we do want it to be as fast as possible.

If there was some magic that could get this down to some magical 100 milliseconds or something, we'd do that. But it can't. There are some hard-stop slownesses. One of them is just literally a magic number we just coded into the screenshot machine that's like, "Why don't you wait something like 2.5 seconds or something, 3 seconds, because we want your Pen to have a chance to download its own assets and just situate itself? [Laughter] Then we take the screenshot."

Well, that means the request for these images have a hard-baked three seconds that they're just going to additionally wait.

Robert: Yep.

Chris: We're not just racing the networking clock here.

00:12:33

[Guitar music starts]

Chris: This episode of CodePen Radio is brought to you in part by Automattic and the stuff that they build, like Jetpack and WooCommerce.

Short and sweet here, it's that time of year. Huge sales. Black Friday, Cyber Monday stuff. Both these sales are live by the time you're listening to this.

60% off Jetpack anything for your first year. 60%, that's wild. All products, all plans, everything. You better believe I'm going to be looking around at sites I might need to purchase Jetpack for in the next year because 60% off is wild.

WooCommerce, the whole WooCommerce marketplace is 40% off. A couple of exceptions here and there, I guess. But for the most part, 40% off anything WooCommerce marketplace.

Tremendous deals. Check them out. Now is the time to be making those purchases if you're into saving money that is.

Thanks for the sponsorship.

[Guitar music ends]

00:13:36

Chris: All right, so one of the other things was like, okay, one way you can architect a solution like this is to just do it live. We just fire up a Lambda, get it deployed, and such that it just works on production. That knowledge might be just locked in your head. You're the screenshots guy, so if there are any problems, we're just going to ask you from now on.

That has not been how we worked at CodePen for a while now. We've been like, let's make sure that all aspects of this run in our local dev environment exactly as they work in production and in staging and then in production and that we have a mechanism for deploying through those steps. It feels much more rock-solid in a physical way that now when I load up our dev environment and work on CodePen, this screenshot service is running on my local machine just like it is on production.

Robert: Yeah. That was definitely one of the more interesting aspects of this for me was fleshing out the development environment.

The screenshot service relies on Cloudflare workers, on Lambda, on the Browserless service that we talked about, and on S3. Those are four pretty big cloud components. But when you're developing, you really want to be working with local versions of all of that stuff because you want to be able to diagnose, debug, and do that rapid iterative development that actually makes it tolerable to write code.

[Laughter]

Robert: We had to figure out how to create local versions of all those services, and it was surprisingly nontrivial. Cloudflare, there's a product called Miniflare, which gives you the Cloudflare worker environment that you can run locally, and that's a nice little Docker container.

Chris: It's interesting. I think that one started out as kind of a community project. I don't know for sure who to credit for that. Then Cloudflare says, "Oh--"

Robert: Yeah, I don't know that I looked into the history of that.

00:15:43

Chris: Yeah. It was new to me because I would have pointed you towards Cloudflare's Wrangler or whatever and be like, "Oh, yeah. They have some kind of Wrangler worker thing."

Then you and Alex were like, "No, check this out: Miniflare."

I'm like, "Whoa! This is cool. This is way fancier than I thought it was going to be."

Robert: It works great. Then for S3, we used a product called MinIO. It also worked great for a local file system.

Chris: There are all these componentry and we're jacking them into our dev environment, right?

Robert: Yeah.

Chris: You're making tabs in tmux, which is the little faux UI thing in the command line that runs all our -- [laughter] more servers than you can shake a stick at.

Robert: Yeah. The tmux configuration for CodePen is pretty [laughter] amazing. Just the number of terminal windows that come up when you basically say, "All right. Run the development environment," is entertaining.

Chris: It's been years of refining that because the goal is it's DX. I want myself and any employee of CodePen to run one command--

Robert: Yep.

Chris: --and magic happens. It installs software. It checks what you have. It makes sure everything is cool. Then [tongue roll] it's all ready to go.

Robert: Yeah, which is great. I mean for somebody coming on new, like myself, that provides the breadcrumbs you need to figure out, okay, how are these services being run and how do I get them? It's all codified starting with the tmux configuration, but then all the scripts and stuff that run underneath that.

The answer to, "How does it work?" really is in the source. You've just got to be willing to grovel through it.

Chris: Yeah. Yep. I think that this was an upgrade there in that, as much as we try to make that the case for absolutely everything at CodePen, there's, I think, rough edges here and there where it's like, "Well, that, actually, that's not-- [Laughter] That works a different way or whatever."

Not the case anymore with screenshots. Now that entire service has now been really folded into the mothership, and that's awesome.

00:17:49

Robert: Yeah. There were definitely a couple of little hiccups there. One of the problems that I had is I've got a new Mac, so I'm running on an M1 processor. These development tools are just kind of getting to the point where, like, "Okay. They'll actually run well," but we ran into an issue with Puppeteer. We didn't have a Docker image that would run Browserless on my Mac. That was one of the things that I was actually running outside of Docker.

Then we had this interesting issue with the Lambda, so we had a Lambda image, a Docker image that we were running that worked great up until I got to where I could actually point my development version of CodePen at these thumbnail URLs. Then we discovered it was completely synchronous. It would do one thumbnail at a time. In my development environment, each thumbnail was taking eight or ten seconds, so I'd bring up the trending page and have to wait a minute for all of those screenshots to come in.

Chris: [Laughter]

Robert: One of the things I ended up doing was just creating a little JavaScript server that sat on Koa. It was like 60 lines of code to create a very rudimentary Lambda environment that would run our Lambda function in an asynchronous manner so that we could actually have all these screenshots rendering concurrently.
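The difference Robert is describing can be illustrated without any HTTP layer. This is not his Koa server; it is a small sketch, with hypothetical names throughout, that calls a stand-in handler one at a time and then all at once, and records how many calls overlap.

```typescript
type Handler<E, R> = (event: E) => Promise<R>;

// One at a time: roughly what the synchronous local image was doing.
async function invokeSerially<E, R>(handler: Handler<E, R>, events: E[]): Promise<R[]> {
  const results: R[] = [];
  for (const event of events) {
    results.push(await handler(event));
  }
  return results;
}

// All at once: closer to how separate Lambda invocations overlap in production.
function invokeConcurrently<E, R>(handler: Handler<E, R>, events: E[]): Promise<R[]> {
  return Promise.all(events.map((event) => handler(event)));
}

// Hypothetical stand-in for the screenshot handler; it tracks overlapping calls.
let inFlight = 0;
let maxInFlight = 0;
const fakeShot: Handler<string, string> = async (penId) => {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated work
  inFlight -= 1;
  return `${penId}.jpg`;
};
```

With a page of a dozen thumbnails at eight to ten seconds each, the serial version adds up to the minute-plus wait described above, while the concurrent version is bounded by the slowest single screenshot, assuming the machine can keep up.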

Chris: Yeah, because that's how they work in real life.

Robert: Yeah.

Chris: You can call a Lambda as many times as you want.

Robert: Yeah, so getting a local dev environment going was definitely challenging, but I think it's paid off. Like you said, the entire CodePen experience does run on our laptops, so that's pretty cool.

Chris: Yeah. Yeah, it's cool. Then if somebody else needs to poke around at this code, there's no knowledge of the system that is hidden from them.

Robert: Yep.

Chris: That's the clutch part. That's pretty cool. Then as we tend to do with projects, it's kind of one of those, "While you're in there--" you know?

[Laughter]

Robert: Yeah. Yeah.

Chris: Make it a little better, so that's what all the different sizes was about, and making sure that when we roll this out that it wasn't for nothing. We improved the experience, in a way. I think we have. I think the screenshots--

It's not like you're going to notice some crazy improvement in the look of the screenshots because we're taking them in literally the exact same technology.

Robert: Yeah. I mean if we've done everything right, users will never notice. [Laughter] It's a bit of a wash from the user experience.

Chris: Right. Maybe a little quicker only because the srcset syntax is a little better than it was before.

00:20:30

Robert: Yeah. The one thing that I was a little -- on the CSS markup side, the front-end markup side of things, the one thing that was a little disappointing was that the sizes attribute of the image or the source tag.

Chris: Yeah.

Robert: Which is supposed to really be there to allow the browser to decide what format and resolution image to download. That ended up being a bit of a no-op for us because the context in which these screenshots are rendered for CodePen is so dynamic that there's really no way to codify, like, "On this page, we expect this image to be this size." It really just depends on where the client is as far as the breakpoints and all that.

Chris: Yeah.

Robert: So, it was like, hmm.

Chris: Truly, a no-op indeed in that we just gave up.

Robert: Yeah.

Chris: [Laughter]

Robert: I had it coded up at one point. I was like, "I just can't." There's no way I can accurately codify what size image we're going to have to download based strictly on a markup, which is really what that particular attribute is about.

00:21:38

Chris: Then you had a good insight, though, that because this is a client-rendered app -- [laughter] I've used this trick a couple of times recently -- you kind of get to skip the line a little bit. You know how big we're going to use that image right away because we have access to the DOM right away. It's almost for non-client side rendered apps where that srcset and sizes matter more.

Robert: Yeah. Yeah.

Chris: You literally wrote code that's like, "Wait. I know exactly how big this image is going to be. I'm going to put the right source right in the markup immediately."

Robert: Yeah. We have a little React component that is used. It's one React component that basically has a little hook of, like, "Okay. How big are we now? All right. Let's go fetch the image." It works great.
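The "how big are we now?" step could look something like the following. This is a sketch under assumptions: the widths are the four named earlier, and the rule (smallest generated width that covers the rendered width, scaled by device pixel ratio) is a guess at reasonable behavior, not the actual logic in CodePen's component. `pickShotWidth` is a hypothetical name.

```typescript
// Sizes named in the episode, smallest first.
const AVAILABLE_WIDTHS = [360, 512, 800, 1280];

// Smallest generated width that covers the rendered size, else the largest one.
function pickShotWidth(cssWidth: number, devicePixelRatio = 1): number {
  const needed = Math.ceil(cssWidth * devicePixelRatio);
  const match = AVAILABLE_WIDTHS.find((w) => w >= needed);
  return match ?? AVAILABLE_WIDTHS[AVAILABLE_WIDTHS.length - 1];
}
```

In a client-rendered component the inputs would come from the DOM, for example `element.getBoundingClientRect().width` and `window.devicePixelRatio`, which is what lets the app skip the `sizes` attribute entirely.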

Chris: It does.

Robert: Like I said, the users should never notice.

Chris: Yeah. We got to skip. We totally skipped that. I keep saying srcset and sizes. We use neither. Those are two technologies we don't use. We don't use sizes because we wrote our own little sizes math that happens because of the client-side rendered image, and we're not using srcset either. We're using the picture syntax because the picture syntax is the one that supports -- you know, "Does your browser support WebP? Yes? Then here's the WebP version. Else, use the JPEG."
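For readers who have not used the picture syntax for format fallback, here is a sketch of the markup it produces once a width has already been chosen. The `/shots/...` URL scheme and the function names are hypothetical; the real cpshots paths are not given in the episode. Note that a `<source>` inside `<picture>` still carries its URL in a `srcset` attribute, here with a single candidate rather than a list of widths.

```typescript
// Hypothetical URL scheme, for illustration only.
function shotUrl(penId: string, width: number, format: "webp" | "jpeg"): string {
  return `/shots/${encodeURIComponent(penId)}/${width}.${format}`;
}

// Builds <picture> markup: WebP for browsers that support it, JPEG otherwise.
function pictureMarkup(penId: string, width: number, alt: string): string {
  const safeAlt = alt
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
  return [
    "<picture>",
    `  <source type="image/webp" srcset="${shotUrl(penId, width, "webp")}">`,
    `  <img src="${shotUrl(penId, width, "jpeg")}" alt="${safeAlt}">`,
    "</picture>",
  ].join("\n");
}
```

The browser picks the first `<source>` whose `type` it supports and otherwise falls back to the `<img>`, which is the "WebP if you can, else JPEG" behavior Chris describes.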

We are using the picture syntax with our own sizes math. That's how that turned out. Cool. Well, that was a heck of a project. We'll link up, I guess, all the technologies and such that we used to kind of make it happen.

Now that it's in place, are you feeling--? I know you're looking at charts and graphs and stuff. There are some anomalies. I don't know if they're radio-appropriate because you're like, we don't understand them yet.

[Laughter]

Robert: Yeah.

Chris: For example, there are two servers that take screenshots, and we load balance them. Yet another just CodePen-ism. That's how we roll. There are at least two of everything because if one of them falls over, they're load balanced. It's just safer that way.

But when you look at the charts, you're like, "Wow. Screenshot server one and screenshot server two are different animals."

00:23:48

Robert: Yeah. Well, and just getting to production with this was an interesting little journey because I'm fairly new to CodePen. I don't have a good feel for the amount of traffic that screenshots are going to generate, and so there were a lot of questions around, like, well, if we turn this on for users, how do we estimate the amount of traffic that it's going to generate? Because we basically completely reset the screenshot world, as soon as we turn this on, it was going to throw a bunch of traffic at the system. Is two servers going to be enough to handle that, or four servers? I just did not have a good sense of that.

It was complicated by the fact that some screenshots were going to be requested millions of times, like the stuff on the trending page. Whereas users' individual Pens, those might not get requested for days or weeks or until the user actually looks at them. There was this big question mark of how is that load going to balance out. Do we just need to figure out how to process the first 100 Pens first and then everything else would be this slow trickle, or is there going to be tons of traffic getting thrown at the system?

The answer was a little of both. When we rolled it out -- [laughter] I think when we first turned this on to all users, there were some signs of things falling over a little bit. Then we rolled out a fix that patched things up after the fact. We got it out without too much -- [laughter] without too many hiccups.

Chris: Right. Right.

Robert: It was a fun little journey.

Chris: Mm-hmm. Gosh. You add enough retry logic and enough servers behind it that even if they misbehave, it's fine.

Robert: [Laughter] Yeah.

Chris: [Laughter]

Robert: Well, one of the more interesting conversations I had with Alex was if we decide to change the resolutions of the images we want to generate or the URL path, and we end up having to regenerate all the screenshots. How big a design decision is that? Do we need to get this right (right now, upfront) so we never ever have to do that? Or can we just be like, "Eh, we'll just wipe all the screenshots, and we'll regenerate them if we need to"?

It turned out it was more of the latter. If we need to change things to where we have to invalidate all the screenshots we have, it'll cost a couple of hundred bucks in cloud compute time but it's not like, "Oh, my God! What have we done?!" Which is a really cool property of the system because you want to have that flexibility. You don't want to be so locked into your design decisions that you can just never change things.

Chris: Yes.

Robert: And then discover you screwed it up. [Laughter]

Chris: Yep. That was great to think about those kind of requirements. We assume we're not going to change it, but--

I think we picked kind of a middle ground, right? We didn't code the things for ultimate flexibility.

Robert: Yeah. Yep.

Chris: There are some costs if we decide to change things, but those costs are payable. In this case, I think what we--

Isn't the vibe that if we decide we need totally different sizes of something that we'd make that code change and then scale up the servers for a little while so they do it? Then they're done. Then we scale them back down. Cool.

Yeah. It's not flexible forever, but it's flexible enough. Cool.

Any other lessons learned here, as we wrap it up?

00:27:21

Robert: No. As a new employee, it was a great project to start with. [Laughter] I think the original estimate was that it would take two weeks. Here we are six weeks later and really just kind of have it out to where I'm not having to look at the production graphs every day to make sure nothing is falling over.

I hope, from your perspective, a lot of that was spent in making the development environment a little more robust and figuring out how to get Lambda, Cloudflare, and S3 all running locally.

Chris: With so many other things going on, it's not absolutely the only thing that you do. Yeah, now we just have the rest of the world to redo.

It does remind me of our admin project, though, that you weren't here quite for it yet, but sometimes the development projects we take on are, on purpose, educational. We're like, "Let's do this project. It's going to have something ultimately useful for us, but it's kind of secondary." The purpose is academic.

Robert: Mm-hmm.

Chris: Let's prove out this technology. Then if we actually build something and we hate it, we didn't bet the farm on it.

In this case, the worst thing that could happen is we could have just been like, "Fine! We'll just pay more money then." [Laughter]

Robert: [Laughter]

Chris: "A failed case for this one," but we didn't have to get there. All right, well, thanks, Robert. Good job on the project.

Robert: Yeah. Thanks.

Chris: We'll talk to you soon.

Robert: Yep. Thank you.

[Radio channel adjustment]