FrontendDeveloper.in

ECMAScript question detail

Unicode

Prior to ES6, JavaScript strings are represented by 16-bit character encoding (UTF-16). Each character is represented by 16-bit sequence known as code unit. Since the character set is been expanded by Unicode, you will get unexpected results from UTF-16 encoded strings containing surrogate pairs(i.e, Since it is not sufficient to represent certain characters in just 16-bits, you need two 16-bit code units).

let str = '𠮷';

console.log(str.length);            // 2
console.log(text.charAt(0));        // ""
console.log(text.charAt(1));        // ""
console.log(text.charCodeAt(0));    // 55362(1st code unit)
console.log(text.charCodeAt(1));    // 57271(2nd code unit)

console.log(/^.$/.test(str)); // false, because length is 2
console.log('\u20BB7'); // 7!(wrong value)
console.log(str === '\uD842\uDFB7'); // true

ECMAScript 6 added full support for UTF-16 within strings and regular expressions. It introduces new Unicode literal form in strings and new RegExp u mode to handle code points, as well as new APIs(codePointAt, fromCodePoint) to process strings.

let str = '𠮷';

// new string form
console.log('\u{20BB7}'); // "𠮷"

// new RegExp u mode
console.log(new RegExp('\u{20BB7}', 'u'));
console.log(/^.$/u.test(str)); // true

//API methods
console.log(str.codePointAt(0)); // 134071
console.log(str.codePointAt(1)); // 57271

console.log(String.fromCodePoint(134071)); // "𠮷"
Back to all ECMAScript questions
Get LinkedIn Premium at Rs 399