FrontendDeveloper.in

Well-formed Unicode strings

Unicode strings can represent a wide range of characters from different languages, as well as symbols. JavaScript stores strings as sequences of UTF-16 code units (16-bit values). A string that contains a lone surrogate, meaning a surrogate code unit that is not part of a valid pair, is considered "ill-formed" (not well-formed). Surrogate code units come in two types:

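  1. Leading (high) surrogates: code units in the range 0xD800 to 0xDBFF
  2. Trailing (low) surrogates: code units in the range 0xDC00 to 0xDFFF

A leading surrogate immediately followed by a trailing surrogate forms a valid pair that encodes a single code point above U+FFFF. A leading surrogate without a trailing partner, or a trailing surrogate without a leading one, is a lone surrogate.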
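To make the two ranges and the pairing rule concrete, here is a minimal sketch that walks a string one UTF-16 code unit at a time and reports the index of every lone surrogate. The helper name `findLoneSurrogates` is hypothetical (not a built-in); it assumes the rule stated above, that a leading surrogate is lone unless the next code unit is a trailing surrogate, and a trailing surrogate is lone unless it directly follows a leading one.

```javascript
// Hypothetical helper (not part of ECMAScript): returns the indices of
// lone surrogate code units in `str`.
function findLoneSurrogates(str) {
  const isLeading = (cu) => cu >= 0xd800 && cu <= 0xdbff;  // high surrogate
  const isTrailing = (cu) => cu >= 0xdc00 && cu <= 0xdfff; // low surrogate
  const lone = [];
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i); // one 16-bit code unit
    if (isLeading(cu)) {
      // A leading surrogate must be immediately followed by a trailing one.
      if (i + 1 < str.length && isTrailing(str.charCodeAt(i + 1))) {
        i++; // valid pair: skip over its trailing half
      } else {
        lone.push(i);
      }
    } else if (isTrailing(cu)) {
      // Valid pairs skip their trailing half above, so reaching a trailing
      // surrogate here means it has no leading partner.
      lone.push(i);
    }
  }
  return lone;
}

console.log(findLoneSurrogates("Hello World \uD815")); // → [12]
console.log(findLoneSurrogates("\uDC00abc"));         // → [0]
console.log(findLoneSurrogates("\u{1F600}"));         // → [] (😀 is a valid pair)
```

A string is well-formed exactly when this list is empty, which is the yes/no answer the built-in `isWellFormed()` gives without you writing the loop.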

The Well-Formed Unicode Strings feature (ES2024) introduced the following two string methods: one to check for lone surrogates and one to convert a string into a well-formed string.

  1. String.prototype.isWellFormed(): This method checks whether a string contains any lone surrogates. It returns true if there are none, and false otherwise. For example, the following strings can be checked as well-formed or not well-formed:
const str1 = "Hello World \uD815";
const str2 = "Welcome to ES2024";
const str3 = "Welcome to ES2024 😀";

console.log(str1.isWellFormed()); // false
console.log(str2.isWellFormed()); // true
console.log(str3.isWellFormed()); // true

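Note: Emojis outside the Basic Multilingual Plane, such as 😀, are stored as a valid surrogate pair (a leading surrogate followed by a trailing surrogate), so strings containing them are well-formed.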

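  2. String.prototype.toWellFormed(): This method returns a new string in which every lone (unpaired) leading or trailing surrogate is replaced with the U+FFFD REPLACEMENT CHARACTER (�). A string that is already well-formed comes back with the same content, and the original string is never modified.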
const str1 = "Hello World \uD815";
const str2 = "Welcome to ES2024";

console.log(str1.toWellFormed()); // Hello World �
console.log(str2.toWellFormed()); // Welcome to ES2024
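For engines that predate ES2024, one common approach is a regular expression that matches a leading surrogate not followed by a trailing one, or a trailing surrogate not preceded by a leading one, and replaces each match with U+FFFD. This is a sketch, not a spec-conformant polyfill; the name `toWellFormedFallback` is hypothetical, and it assumes regex lookbehind support (ES2018).

```javascript
// Matches one lone surrogate. Without the `u` flag the regex operates on
// individual UTF-16 code units, which is what this check needs.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical fallback: replace every lone surrogate with U+FFFD.
function toWellFormedFallback(str) {
  return str.replace(LONE_SURROGATE, "\uFFFD");
}

console.log(toWellFormedFallback("Hello World \uD815")); // Hello World �
console.log(toWellFormedFallback("\uD83D\uDE00"));      // 😀 (valid pair kept)

// Where the built-in exists, the two should agree.
if (typeof String.prototype.toWellFormed === "function") {
  const sample = "a\uDC00b\uD800";
  console.log(sample.toWellFormed() === toWellFormedFallback(sample)); // true
}
```

Lookbehind inspects the original input rather than the partially replaced result, so in `"\uD800\uD800\uDC00"` only the first leading surrogate is replaced and the second still pairs with the trailing one.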

These two methods are mainly helpful when a string is passed to APIs that require well-formed input. For example, encodeURI() throws a URIError when it encounters a lone surrogate:

const url = "https://somedomain.com/query=\uD823"; // \uD823 is a lone leading surrogate

try {
  console.log(encodeURI(url));
} catch (e) {
  console.log('Error:', e.message); // Error: URI malformed (exact message varies by engine)
}

After applying the toWellFormed() method, the lone surrogate is replaced with the Unicode replacement character (U+FFFD). encodeURI() then percent-encodes U+FFFD as its UTF-8 bytes (EF BF BD), so no error is thrown.

console.log(encodeURI(url.toWellFormed())); // https://somedomain.com/query=%EF%BF%BD