Unicode Transformation (Best-Fit Mapping)

Description

This page is vulnerable to several Unicode transformation issues: best-fit mappings, overlong byte sequences, and ill-formed subsequences.

Best-fit mappings - A best-fit mapping occurs when a character X is transformed into an entirely different character Y. In general, best-fit mappings occur when characters are transcoded between Unicode and another encoding that has no exact equivalent for X.
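The risk is easiest to see when a filter runs before the transformation. The sketch below is illustrative only: Python's standard codecs do not perform best-fit mapping, so NFKC normalization is used here as a stand-in for a transformation that turns one character into a different one. The names naive_filter and transform are hypothetical.

```python
import unicodedata


def naive_filter(text: str) -> bool:
    """Return True if the text contains no ASCII angle brackets."""
    return "<" not in text and ">" not in text


def transform(text: str) -> str:
    # Stand-in for a best-fit transcoding step (assumption): NFKC folds
    # fullwidth forms such as U+FF1C and U+FF1E into ASCII '<' and '>'.
    return unicodedata.normalize("NFKC", text)


# Fullwidth less-than and greater-than signs around the word "script".
payload = "\uff1cscript\uff1e"

passed_before = naive_filter(payload)      # the filter sees no '<' or '>'
transformed = transform(payload)           # the brackets appear only now
passed_after = naive_filter(transformed)   # the same filter would reject this

print(passed_before, transformed, passed_after)
```

Under these assumptions, validation is only meaningful when it is applied to the text after every transformation step has run.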

Overlong byte sequences (non-shortest form) - The UTF-8 bit layout makes it possible to encode a character in more bytes than it needs. For security reasons, a UTF-8 decoder must not accept UTF-8 sequences that are longer than necessary to encode a character. For example, the character U+000A (line feed) must be accepted from a UTF-8 stream only in the form 0x0A, and not in any of the following five possible overlong forms:

0xC0 0x8A
0xE0 0x80 0x8A
0xF0 0x80 0x80 0x8A
0xF8 0x80 0x80 0x80 0x8A
0xFC 0x80 0x80 0x80 0x80 0x8A
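As a rough illustration, the sketch below feeds the five overlong forms above to a deliberately permissive decoder and to Python's built-in strict UTF-8 decoder. The function naive_decode_one is hypothetical and exists only to show what a decoder that skips the shortest-form check would produce; strict_rejects is likewise a helper invented for this example.

```python
OVERLONG_FORMS = [
    bytes([0xC0, 0x8A]),
    bytes([0xE0, 0x80, 0x8A]),
    bytes([0xF0, 0x80, 0x80, 0x8A]),
    bytes([0xF8, 0x80, 0x80, 0x80, 0x8A]),
    bytes([0xFC, 0x80, 0x80, 0x80, 0x80, 0x8A]),
]


def naive_decode_one(seq: bytes) -> int:
    """Decode one sequence by bit layout only, with no validity checks."""
    lead = seq[0]
    if lead < 0x80:
        return lead
    # Count the leading 1 bits of the lead byte to get the sequence length.
    length = 0
    mask = 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    # The remaining low bits of the lead byte are payload bits.
    code_point = lead & (mask - 1)
    for byte in seq[1:length]:
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


def strict_rejects(seq: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the bytes."""
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


for form in OVERLONG_FORMS:
    print(form.hex(" "), hex(naive_decode_one(form)), strict_rejects(form))
```

Every overlong form decodes to 0xa under the permissive decoder, while the strict decoder rejects all five, so a filter that looks for the byte 0x0A before a permissive decoding step would miss the line feed.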

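Ill-formed subsequences - As required by Unicode 3.0, and noted in Unicode Technical Report #36, if a lead byte is followed by an invalid successor byte, the decoder should not consume that successor byte. Otherwise a stray lead byte can swallow the character that follows it, such as a quote or an angle bracket that later processing relies on.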
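The difference can be sketched with a stray lead byte 0xC2 placed in front of a double quote. The function overeager_decode below is a hypothetical, intentionally buggy decoder that handles only ASCII and two-byte sequences; it is contrasted with Python's built-in decoder using errors="replace".

```python
def overeager_decode(data: bytes) -> str:
    """Buggy sketch: a lead byte always consumes the byte that follows it."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            out.append(chr(byte))
            i += 1
        elif 0xC2 <= byte <= 0xDF and i + 1 < len(data):
            nxt = data[i + 1]
            if 0x80 <= nxt <= 0xBF:
                out.append(chr(((byte & 0x1F) << 6) | (nxt & 0x3F)))
            else:
                # BUG: the invalid successor byte is swallowed as well.
                out.append("\ufffd")
            i += 2
        else:
            out.append("\ufffd")
            i += 1
    return "".join(out)


data = b'a\xc2"b'  # 0xC2 is a lead byte; '"' is not a valid successor

swallowed = overeager_decode(data)
preserved = data.decode("utf-8", errors="replace")

print(repr(swallowed), repr(preserved))
```

The buggy decoder drops the quote, while the built-in decoder replaces only the stray lead byte with U+FFFD and keeps the quote in the output.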