ECMAScript question detail
Well-formed Unicode strings
Unicode strings represent characters and symbols from a wide range of languages. JavaScript strings are sequences of UTF-16 code units, and a string that contains a lone surrogate (a 16-bit surrogate code unit that is not part of a valid leading/trailing pair) is considered "malformed" or "not well-formed". Lone surrogates can be of two types,
- Leading surrogates: Range between 0xD800 and 0xDBFF
- Trailing surrogates: Range between 0xDC00 and 0xDFFF
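The two ranges above can be checked directly against a string's code units. The snippet below is a minimal sketch; `classifyCodeUnit` is a hypothetical helper introduced only for illustration and is not part of the language.

```javascript
// Hypothetical helper (not a built-in): classify one UTF-16 code unit
// using the two surrogate ranges listed above.
function classifyCodeUnit(unit) {
  if (unit >= 0xd800 && unit <= 0xdbff) return "leading";
  if (unit >= 0xdc00 && unit <= 0xdfff) return "trailing";
  return "non-surrogate";
}

const emoji = "😀"; // U+1F600, stored as the surrogate pair 0xD83D 0xDE00
console.log(emoji.length); // 2
console.log(classifyCodeUnit(emoji.charCodeAt(0))); // leading
console.log(classifyCodeUnit(emoji.charCodeAt(1))); // trailing
console.log(classifyCodeUnit("A".charCodeAt(0))); // non-surrogate

// A surrogate with no partner next to it is a lone surrogate.
console.log(classifyCodeUnit("\uD815".charCodeAt(0))); // leading
```

A leading surrogate immediately followed by a trailing surrogate forms a valid pair; either one appearing on its own is what makes a string malformed.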
The Well-Formed Unicode Strings feature introduced the two string methods below, one to check whether a string is well-formed and one to convert it into a well-formed string.
- String.prototype.isWellFormed:
This method checks whether the string contains lone surrogates. It returns true if no lone surrogates are present and false otherwise. The following strings can be verified as either well-formed or not well-formed,
const str1 = "Hello World \uD815";
const str2 = "Welcome to ES2024";
const str3 = "Welcome to ES2024 😀";
console.log(str1.isWellFormed()); // false
console.log(str2.isWellFormed()); // true
console.log(str3.isWellFormed()); // true
Note: Emojis such as 😀 are well-formed. Characters outside the Basic Multilingual Plane are stored as complete surrogate pairs, not as lone surrogates.
- String.prototype.toWellFormed:
This method returns a new string in which every lone surrogate (leading or trailing) is replaced with the Unicode replacement character U+FFFD (�). Valid surrogate pairs and all other characters are left unchanged.
const str1 = "Hello World \uD815";
const str2 = "Welcome to ES2024";
console.log(str1.toWellFormed()); // Hello World �
console.log(str2.toWellFormed()); // Welcome to ES2024
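To make the replacement rule concrete, here is a minimal sketch of the same logic written as an explicit loop over code units. The names `toWellFormedSketch` and `isWellFormedSketch` are hypothetical; this is an illustration of the behaviour described above, not the engine's implementation or an official polyfill.

```javascript
// Hypothetical re-implementation for illustration only.
function toWellFormedSketch(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // Leading surrogate: valid only if a trailing surrogate follows.
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += str[i] + str[i + 1]; // keep the valid pair
        i++; // skip the trailing half
      } else {
        out += "\uFFFD"; // lone leading surrogate
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      out += "\uFFFD"; // trailing surrogate not consumed by a pair
    } else {
      out += str[i];
    }
  }
  return out;
}

// A string is well-formed exactly when the conversion changes nothing.
function isWellFormedSketch(str) {
  return toWellFormedSketch(str) === str;
}

console.log(toWellFormedSketch("Hello World \uD815") === "Hello World \uFFFD"); // true
console.log(isWellFormedSketch("Welcome to ES2024 😀")); // true
console.log(isWellFormedSketch("Hello World \uD815")); // false
```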
These two methods help developers avoid errors when passing strings to APIs that require well-formed input, such as the URI encoding functions. For example, the encoding process below throws an error because the string contains a lone surrogate,
const url = "https://somedomain.com/query=\uD800"; // \uD800 is a lone leading surrogate
try {
  console.log(encodeURI(url));
} catch (e) {
  console.log('Error:', e.message); // URIError; in V8 the message is "URI malformed"
}
After applying the toWellFormed() method, the lone surrogate is replaced with the Unicode replacement character (U+FFFD), which ensures that encodeURI() runs without errors.
console.log(encodeURI(url.toWellFormed())); // https://somedomain.com/query=%EF%BF%BD
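In practice this pattern can be wrapped in a small helper. The sketch below is one possible approach: `safeEncodeURI` and `LONE_SURROGATE` are hypothetical names, and the regular-expression fallback for runtimes without `toWellFormed()` is an assumption added for illustration, not a standard polyfill.

```javascript
// Matches a leading surrogate not followed by a trailing one, or a trailing
// surrogate not preceded by a leading one (no "u" flag, so it sees code units).
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: encode a URI without throwing on lone surrogates.
function safeEncodeURI(input) {
  const wellFormed =
    typeof input.toWellFormed === "function"
      ? input.toWellFormed() // native ES2024 method when available
      : input.replace(LONE_SURROGATE, "\uFFFD"); // assumed fallback
  return encodeURI(wellFormed);
}

console.log(safeEncodeURI("https://somedomain.com/query=\uD800"));
// https://somedomain.com/query=%EF%BF%BD
console.log(safeEncodeURI("https://somedomain.com/query=😀"));
// https://somedomain.com/query=%F0%9F%98%80
```

Note that the replacement is lossy: the original lone surrogate cannot be recovered from U+FFFD, so calling isWellFormed() first and rejecting the input may be the better choice when silent substitution is not acceptable.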