
UTF Encoding


Overview

Unicode Transformation Format (UTF) is a set of character encodings that can represent every character in the Unicode character set. The most common variants are UTF-8, UTF-16, and UTF-32, each with different characteristics and use cases.

Technical Details

UTF-8

  • Variable-width encoding (1-4 bytes)
  • ASCII compatible
  • Most common on the web
  • Space efficient for ASCII-heavy text (see the sketch below)
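
To see the variable width in practice, here is a minimal sketch using the standard TextEncoder API (which always produces UTF-8); the characters are chosen to fall in the 1-, 2-, 3-, and 4-byte ranges:

// UTF-8 byte lengths grow with the code point
const encoder = new TextEncoder();
console.log(encoder.encode("A").length);  // 1 byte  (U+0041, ASCII)
console.log(encoder.encode("é").length);  // 2 bytes (U+00E9)
console.log(encoder.encode("€").length);  // 3 bytes (U+20AC)
console.log(encoder.encode("😊").length); // 4 bytes (U+1F60A)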

UTF-16

  • Variable-width encoding (2 or 4 bytes)
  • Used in Windows and Java
  • Uses surrogate pairs for code points above U+FFFF (see the sketch below)
  • Endianness dependent
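
The surrogate-pair behaviour can be observed directly in JavaScript, whose strings are sequences of UTF-16 code units; the following is a small illustrative sketch:

// Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair)
const smiley = "😊"; // U+1F60A
console.log(smiley.length);                      // 2 code units
console.log(smiley.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(smiley.charCodeAt(1).toString(16));  // "de0a" (low surrogate)
console.log(smiley.codePointAt(0).toString(16)); // "1f60a" (the full code point)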

UTF-32

  • Fixed-width encoding (4 bytes)
  • Simple indexing: every code point occupies exactly one code unit (see the sketch below)
  • Memory intensive
  • Endianness dependent
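
JavaScript has no built-in UTF-32 encoder, but a Uint32Array of code points behaves like UTF-32 and shows the fixed-width, one-code-point-per-index property; this is an illustrative sketch, not a library API:

// UTF-32 sketch: one 32-bit code unit per code point
const codePoints = Array.from("Hi😊", ch => ch.codePointAt(0));
const utf32 = Uint32Array.from(codePoints);
console.log(utf32);    // Uint32Array [72, 105, 128522]
console.log(utf32[2]); // 128522 (0x1F60A): direct indexing, no surrogates to skip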

Examples

Character Encoding Examples (byte values shown in hexadecimal)

Character: A
UTF-8:    41
UTF-16:   0041
UTF-32:   00000041

Character: €
UTF-8:    E2 82 AC
UTF-16:   20AC
UTF-32:   000020AC

Character: 😊
UTF-8:    F0 9F 98 8A
UTF-16:   D83D DE0A
UTF-32:   0001F60A
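
The UTF-8 byte sequences above can be reproduced with TextEncoder and a small hex-formatting helper (the toHex name below is our own, purely illustrative):

// Dump UTF-8 bytes as uppercase hex
const toHex = bytes =>
    Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
console.log(toHex(new TextEncoder().encode("A")));  // "41"
console.log(toHex(new TextEncoder().encode("€")));  // "E2 82 AC"
console.log(toHex(new TextEncoder().encode("😊"))); // "F0 9F 98 8A"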

Implementation

JavaScript Example

// UTF-8 Encoding
const text = "Hello 😊";
const utf8Bytes = new TextEncoder().encode(text);
console.log(utf8Bytes); // Uint8Array [72, 101, 108, 108, 111, 32, 240, 159, 152, 138]

// UTF-16 Encoding: charCodeAt returns UTF-16 code units,
// so the emoji becomes a surrogate pair (two code units)
const utf16Units = new Uint16Array(text.length);
for (let i = 0; i < text.length; i++) {
    utf16Units[i] = text.charCodeAt(i);
}
console.log(utf16Units); // Uint16Array [72, 101, 108, 108, 111, 32, 55357, 56842]

// Decoding the UTF-8 bytes back to a string
const decoder = new TextDecoder('utf-8');
const decodedText = decoder.decode(utf8Bytes);
console.log(decodedText); // "Hello 😊"
