Update: The big problem here was saving emoji throughout CodePen, but that's fixed now.


A super common character set on the web is UTF-8. Here are some funky characters I can literally just drop in the code, and there is a good chance you'll be looking at the same icon characters I am:

✌ ✿ ✌ ☮ ★ 

That depends on quite a bit though. It depends on how those characters get saved on our side: in this case, by a WordPress blog and a MySQL database. It depends on the character set declared in this blog's theme. In HTML5:

<meta charset="UTF-8">

It also depends on your browser, platform, and version. All those things need to line up for you to see those characters OK. On the CodePen side, we'll take your Unicode characters and save them, for the most part. There is one major issue though. CodePen uses MySQL's utf8 charset, which actually doesn't save all possible Unicode characters.

Mathias Bynens has a good blog post on this. Regarding UTF-8:

UTF-8 is a variable-width encoding; it encodes each symbol using one to four 8-bit bytes. Symbols with lower numerical code point values are encoded using fewer bytes. So every possible unicode character is either 1, 2, 3, or 4 bytes.
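To make the 1-to-4-byte idea concrete, here is a small Python sketch that counts the UTF-8 bytes for a few characters. The helper name `utf8_length` and the sample characters are just picked for illustration; they aren't from Mathias's post.

```python
def utf8_length(symbol: str) -> int:
    """Return how many bytes the symbol takes up when encoded as UTF-8."""
    return len(symbol.encode("utf-8"))


# Illustrative samples: a Latin letter, an accented letter,
# the victory hand used above, and the pile-of-poo emoji.
samples = ["A", "\u00e9", "\u270c", "\U0001F4A9"]

for symbol in samples:
    print(f"U+{ord(symbol):04X} -> {utf8_length(symbol)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+270C -> 3 byte(s)
# U+1F4A9 -> 4 byte(s)
```

The higher the code point, the more bytes it needs. Anything above U+FFFF lands in the 4-byte group.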

Then regarding MySQL:

Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

So there is a subset of Unicode characters that simply won't save in CodePen because we're using this utf8 setting. There is a way to fix it that Mathias outlines: switching to MySQL's utf8mb4 character set. We plan to do that, but it's a big undertaking for us and it'll take some time. Just be aware that until that happens, you won't be able to save 4-byte Unicode characters. The failure is a dangerous one as well: when you save, only the code up to that character is saved and the rest is cut off. Pretty nasty.
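If you want to check your code before saving, something like this rough Python sketch would do it. It looks for the first character that needs four bytes in UTF-8 (any code point above U+FFFF) and mimics the cut-off described above. The function names are made up for this example, and `simulate_utf8_save` only imitates the truncation as described here. It is not how MySQL or CodePen actually implement anything.

```python
from typing import Optional


def first_four_byte_index(text: str) -> Optional[int]:
    """Return the index of the first character needing 4 bytes in UTF-8, or None."""
    for index, symbol in enumerate(text):
        # Code points above U+FFFF are exactly the ones UTF-8 encodes in four bytes.
        if ord(symbol) > 0xFFFF:
            return index
    return None


def simulate_utf8_save(text: str) -> str:
    """Imitate the described behavior: keep everything before the first 4-byte character."""
    cut = first_four_byte_index(text)
    return text if cut is None else text[:cut]


code = "h1::after { content: '\U0001F4A9'; } /* everything from the emoji on is lost */"
print(first_four_byte_index(code))
print(repr(simulate_utf8_save(code)))
```

Three-byte characters like ✌ and ★ pass through untouched, since they sit below U+FFFF.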

So don't use the famous pile-of-poo character until we update this post to say otherwise! This is a fairly common limitation because of how widespread MySQL is. For instance, this very blog post won't take 4-byte Unicode either.