
Convert Emoji Characters to Unicode in JavaScript

Leibniz Li

@leibnizli

Emojis are not just simple characters: most of them live in Unicode's Supplementary Planes, outside the 16-bit range. Understanding how JavaScript handles them is crucial for modern web development, especially when dealing with string lengths, databases, and cross-platform display.

The Problem: Why .length Lies

In JavaScript, strings are sequences of UTF-16 code units. Characters like "A" or "☃" (Snowman, U+2603) sit in the Basic Multilingual Plane (BMP) and fit in a single 16-bit code unit. Most modern emojis (like 😀, U+1F600), however, sit in the Supplementary Planes and are stored as two 16-bit code units, known as a surrogate pair.

'A'.length;    // 1 (BMP)
'☃'.length;    // 1 (BMP)
'😀'.length;    // 2 (Supplementary Plane: surrogate pair)
'👨‍👩‍👧‍👦'.length; // 11 (Wait, what? See "ZWJ" below)
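Where do those two code units come from? UTF-16 derives them from the code point with fixed arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. Here is a minimal sketch of that formula (`toSurrogatePair` is a hypothetical helper, not a built-in), checked against what JavaScript actually stores for 😀:

```javascript
// Hypothetical helper: split a supplementary-plane code point into the
// two UTF-16 code units (high and low surrogate) that JavaScript stores.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Only code points U+10000..U+10FFFF use a surrogate pair');
  }
  const offset = codePoint - 0x10000;     // 20 bits remain after subtracting 0x10000
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high (lead) surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low (trail) surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16));              // d83d de00
console.log(String.fromCharCode(high, low) === '😀');           // true
console.log('😀'.charCodeAt(0) === high, '😀'.charCodeAt(1) === low); // true true
```

The 0xD800 and 0xDC00 offsets point into a range (U+D800 to U+DFFF) that Unicode reserves for surrogates, so a lone code unit from that range is never a character on its own.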

Solution 1: String.prototype.codePointAt()

Before ES6, we used charCodeAt(), which returns a single UTF-16 code unit, so for an emoji it gives back only the first half of the surrogate pair. Modern JavaScript provides codePointAt(), which combines the pair and returns the full Unicode code point (a value up to 0x10FFFF).

const emoji = '😀';

// ❌ The old way (incorrect for emojis)
console.log(emoji.charCodeAt(0)); // 55357 (high/lead surrogate, 0xD83D)

// ✅ The modern way
console.log(emoji.codePointAt(0)); // 128512
console.log(emoji.codePointAt(0).toString(16)); // "1f600"
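Two details are worth knowing here. First, the argument to codePointAt() is a UTF-16 code-unit index, not a character index: at index 1 of '😀' you land on the low surrogate and get it back on its own. Second, Unicode charts write code points as U+ followed by at least four uppercase hex digits. The formatter below is an illustrative sketch (`toUnicodeNotation` is our own name, not a standard API):

```javascript
const emoji = '😀';

// Index 1 points at the second code unit, so only the low surrogate comes back.
console.log(emoji.codePointAt(1)); // 56832 (0xDE00)
// Past the end of the string, codePointAt() returns undefined.
console.log(emoji.codePointAt(2)); // undefined

// Hypothetical helper: format a code point in the conventional U+XXXX style.
function toUnicodeNotation(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUnicodeNotation(emoji.codePointAt(0))); // U+1F600
console.log(toUnicodeNotation('A'.codePointAt(0)));   // U+0041
```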

Solution 2: String.fromCodePoint()

To convert a Unicode number back to a visible emoji, use the static method String.fromCodePoint().

// Decimal
String.fromCodePoint(128512); // "😀"

// Hexadecimal
String.fromCodePoint(0x1F600); // "😀"

// Multiple points
String.fromCodePoint(0x1F1FA, 0x1F1F8); // "🇺🇸"
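String.fromCodePoint() also validates its input: anything that is not an integer from 0 to 0x10FFFF throws a RangeError instead of producing a broken string. For literals in your own source code, the ES6 `\u{...}` escape gives the same result without a function call. A minimal sketch, where `safeFromCodePoint` is a hypothetical wrapper rather than a built-in:

```javascript
// The \u{...} escape accepts a full code point; \uD83D\uDE00 spells out the surrogate pair.
console.log('\u{1F600}' === String.fromCodePoint(0x1F600)); // true
console.log('\uD83D\uDE00' === '\u{1F600}');                // true

// Hypothetical wrapper: return null instead of throwing on invalid input.
function safeFromCodePoint(...codePoints) {
  try {
    return String.fromCodePoint(...codePoints);
  } catch (err) {
    if (err instanceof RangeError) return null; // e.g. 0x110000, -1, 1.5, NaN
    throw err;
  }
}

console.log(safeFromCodePoint(0x1F600));  // 😀
console.log(safeFromCodePoint(0x110000)); // null (beyond U+10FFFF)
```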

Batch Conversion: Multiple Characters

If you have a string containing various emojis and characters, you can convert them all at once. For beginners, spread syntax [...] is your best friend because it is "Unicode-aware": it iterates by code point, so it never splits a surrogate pair in half. (Emojis built from several code points are a different story; see the ZWJ section below.)

1. From String to Numbers (Encoding)

You can get an array of Unicode numbers (Decimal or Hex) using .map():

const text = "Hi 😀 🚀";

// Convert to Decimal numbers
const decimals = [...text].map(char => char.codePointAt(0));
console.log(decimals); 
// [72, 105, 32, 128512, 32, 128640]

// Convert to hexadecimal strings (0x-prefixed, like JS numeric literals)
const hexCodes = [...text].map(char => `0x${char.codePointAt(0).toString(16)}`);
console.log(hexCodes);
// ["0x48", "0x69", "0x20", "0x1f600", "0x20", "0x1f680"]
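The 0x prefix is JavaScript's numeric-literal syntax; the escape sequences that go inside JS strings and CSS strings look different. The same map-over-code-points pattern can produce them. This is a sketch with two hypothetical helpers, `toJsEscapes` and `toCssEscapes`:

```javascript
// Hypothetical helper: JavaScript string escapes in the ES6 \u{...} form.
function toJsEscapes(text) {
  return [...text]
    .map(char => `\\u{${char.codePointAt(0).toString(16)}}`)
    .join('');
}

// Hypothetical helper: CSS escapes (a backslash plus hex digits).
// A single space after each escape terminates it and is consumed by the CSS parser.
function toCssEscapes(text) {
  return [...text]
    .map(char => `\\${char.codePointAt(0).toString(16)} `)
    .join('');
}

console.log(toJsEscapes('Hi 😀'));  // \u{48}\u{69}\u{20}\u{1f600}
console.log(toCssEscapes('😀🚀'));  // \1f600 \1f680 (each escape is followed by a space)
```

The trailing space matters in CSS: without it, a following character that happens to be a hex digit would be read as part of the escape.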

2. From Numbers to String (Decoding)

To turn those numbers back into a readable string, use String.fromCodePoint with spread syntax (...):

const myNumbers = [128512, 128640, 9731];

// This "spreads" the array items as individual arguments
const result = String.fromCodePoint(...myNumbers);
console.log(result); // "😀🚀☃"
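Code points often arrive as text rather than numbers, for example "U+1F600" copied from a Unicode chart, or bare hex such as "1f680". parseInt() with radix 16 turns those into numbers before String.fromCodePoint() sees them. The function name and the accepted input format (whitespace-separated tokens with an optional U+ prefix) are assumptions for illustration:

```javascript
// Hypothetical helper: decode space-separated tokens such as "U+1F600" or "1f680".
function fromUnicodeNotation(input) {
  const codePoints = input
    .trim()
    .split(/\s+/)                              // tokens are separated by whitespace
    .map(token => token.replace(/^U\+/i, ''))  // drop an optional "U+" prefix
    .map(hex => parseInt(hex, 16));            // parse the remaining hex digits

  // Throws a RangeError if a token was not valid hex (NaN) or is out of range.
  return String.fromCodePoint(...codePoints);
}

console.log(fromUnicodeNotation('U+1F600 U+1F680 U+2603')); // 😀🚀☃
console.log(fromUnicodeNotation('48 69'));                   // Hi
```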

3. Using a Simple Loop

If you find .map() confusing, a for...of loop works perfectly too:

const input = "🍎🍊";
for (const char of input) {
  const hex = char.codePointAt(0).toString(16);
  console.log(`Character: ${char} -> Unicode: U+${hex.toUpperCase()}`);
}
// Character: 🍎 -> Unicode: U+1F34E
// Character: 🍊 -> Unicode: U+1F34A
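Under the hood, for...of walks the string one code point at a time. If you also need the code-unit index (for example, to slice the original string later), you can do the same walk by hand, advancing by 2 whenever codePointAt() returns a value above 0xFFFF. A sketch, with `listCodePoints` as a hypothetical name:

```javascript
// Hypothetical helper: list each code point with the code-unit index where it starts.
function listCodePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; ) {
    const codePoint = str.codePointAt(i);
    result.push({ index: i, hex: codePoint.toString(16).toUpperCase().padStart(4, '0') });
    // Supplementary-plane code points occupy two code units (a surrogate pair).
    i += codePoint > 0xFFFF ? 2 : 1;
  }
  return result;
}

for (const { index, hex } of listCodePoints('🍎a🍊')) {
  console.log(`${index}: U+${hex}`);
}
// 0: U+1F34E
// 2: U+0061
// 3: U+1F34A
```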

Complex Emojis: Zero Width Joiners (ZWJ)

Some emojis look like a single character but are actually a sequence of multiple code points joined by a special character called the Zero Width Joiner (ZWJ, U+200D, written \u200D in JavaScript).

For example, the "Family" emoji 👨‍👩‍👧‍👦 is composed of: Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy.

To iterate over these by code point (rather than by raw UTF-16 code units), use spread syntax [...] or Array.from(). Note that this still splits the family into its seven component code points:

const complexEmoji = '👨‍👩‍👧‍👦';

// ❌ Standard split (breaks the emoji into UTF-16 code units)
console.log(complexEmoji.split('')); 
// ["\ud83d", "\udc68", "\u200d", "\ud83d", ...]

// ✅ Code-point iteration
const points = [...complexEmoji];
console.log(points); 
// ["👨", "\u200d", "👩", "\u200d", "👧", "\u200d", "👦"]

// 1. Get code points as hexadecimal (the form commonly used in docs)
const hexCodes = [...complexEmoji].map(c => `0x${c.codePointAt(0).toString(16)}`);
console.log(hexCodes);
// ["0x1f468", "0x200d", "0x1f469", "0x200d", "0x1f467", "0x200d", "0x1f466"]

// 2. Get code points as decimal numbers
const decimalCodes = [...complexEmoji].map(c => c.codePointAt(0));
console.log(decimalCodes);
// [128104, 8205, 128105, 8205, 128103, 8205, 128102]

// 3. Convert decimal back to emoji
console.log(String.fromCodePoint(...decimalCodes)); 
// "👨‍👩‍👧‍👦"

// 4. Convert hex strings back to emoji
// Number("0x1f468") understands the 0x prefix; converting explicitly avoids relying on implicit coercion
const emojiFromHex = String.fromCodePoint(...hexCodes.map(Number));
console.log(emojiFromHex); 
// "👨‍👩‍👧‍👦"
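Spread syntax stops at code points, so the family above comes back as seven pieces. To treat the whole sequence as one user-perceived character (a grapheme cluster), JavaScript offers Intl.Segmenter with granularity 'grapheme', which is available in current browsers and Node.js but may be missing in older runtimes. This sketch also rebuilds the family by joining its members with a ZWJ:

```javascript
// 👨‍👩‍👧‍👦 written with explicit escapes so the ZWJs are visible
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

// Joining the four people with U+200D (ZWJ) rebuilds the same sequence.
const rebuilt = ['👨', '👩', '👧', '👦'].join('\u200D');
console.log(rebuilt === family); // true

// Grapheme segmentation keeps the whole ZWJ sequence together.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(family), s => s.segment);

console.log(family.length);      // 11 (UTF-16 code units)
console.log([...family].length); // 7 (code points)
console.log(graphemes.length);   // 1 (grapheme cluster)
```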

Practical Use Cases

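  1. Database Storage: Ensure your database uses a 4-byte-capable encoding. In MySQL that means utf8mb4; the older utf8 (utf8mb3) charset stores at most 3 bytes per character and cannot hold these 4-byte emojis.
  2. Input Limiting: When limiting characters in a bio or tweet, count with [...str].length (code points) instead of str.length (UTF-16 code units). ZWJ sequences such as the family emoji still count as several code points; Intl.Segmenter counts what users actually see.
  3. Custom Text Renderers: Canvas-based games or high-performance UI components that draw text one character at a time need to step through code points (or grapheme clusters), not code units.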
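The first two use cases measure the same string in different units: UTF-8 bytes for storage, and code points or grapheme clusters for character limits. Here is a sketch of a hypothetical `measure()` helper that reports all of them, using the standard TextEncoder (which always encodes UTF-8) and Intl.Segmenter:

```javascript
// Hypothetical helper: report a string's size in the units that matter in practice.
function measure(str) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  return {
    codeUnits: str.length,                               // what .length reports
    codePoints: [...str].length,                         // what spread syntax counts
    graphemes: Array.from(segmenter.segment(str)).length, // what users perceive as characters
    utf8Bytes: new TextEncoder().encode(str).length,     // bytes a UTF-8 (utf8mb4) column stores
  };
}

console.log(measure('Hi 😀'));
// { codeUnits: 5, codePoints: 4, graphemes: 4, utf8Bytes: 7 }

// 👨‍👩‍👧‍👦: four 4-byte emojis plus three 3-byte ZWJs
console.log(measure('\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}'));
// { codeUnits: 11, codePoints: 7, graphemes: 1, utf8Bytes: 25 }
```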

Need an Interactive Tool?

If you're working with these conversions frequently, check out our Online Unicode & Emoji Converter. It allows you to bidirectionally convert between text, Hex, CSS, and JS escape sequences in real-time.

Try the Online Unicode Converter →