Unicode transformation issues

  • This page is vulnerable to several Unicode transformation issues, such as best-fit mappings, overlong byte sequences, and ill-formed subsequences.

    Best-fit mappings: a character X is transformed into an entirely different character Y. In general, best-fit mappings occur when characters are transcoded between Unicode and another encoding.

    Overlong byte sequences (non-shortest form): the UTF-8 bit layout makes it possible to encode a character in more bytes than its shortest form requires. For security reasons, a UTF-8 decoder must not accept UTF-8 sequences that are longer than necessary to encode a character. For example, the character U+000A (line feed) must be accepted from a UTF-8 stream only in the form 0x0A, and not in any of the following five overlong forms:

      • 0xC0 0x8A
      • 0xE0 0x80 0x8A
      • 0xF0 0x80 0x80 0x8A
      • 0xF8 0x80 0x80 0x80 0x8A
      • 0xFC 0x80 0x80 0x80 0x80 0x8A

    Ill-formed subsequences: as required by Unicode 3.0 and noted in Unicode Technical Report #36, if a lead byte is followed by an invalid successor byte, the decoder should not consume that successor byte.
  • Identify the source of these Unicode transformation issues and fix them. Consult the web references below for more information.
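
The overlong and ill-formed cases described above can be checked with a short script. The following is a minimal sketch, not a complete fix: it assumes input arrives as raw bytes and relies on Python 3's built-in UTF-8 codec, which rejects non-shortest forms. The names OVERLONG_LF, is_strict_utf8, and decode_lossy are hypothetical helpers introduced only for illustration.

```python
# Hypothetical helpers for illustration; they use only Python 3's built-in UTF-8 codec.

# The five overlong encodings of U+000A (line feed) listed in the description.
OVERLONG_LF = [
    bytes([0xC0, 0x8A]),
    bytes([0xE0, 0x80, 0x8A]),
    bytes([0xF0, 0x80, 0x80, 0x8A]),
    bytes([0xF8, 0x80, 0x80, 0x80, 0x8A]),
    bytes([0xFC, 0x80, 0x80, 0x80, 0x80, 0x8A]),
]


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed, shortest-form UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_lossy(data: bytes) -> str:
    """Decode, substituting U+FFFD for each ill-formed subsequence."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The shortest form is accepted; every overlong form is rejected.
    print(is_strict_utf8(b"\x0A"))
    for seq in OVERLONG_LF:
        print(seq.hex(" "), is_strict_utf8(seq))

    # A lead byte (0xC2) followed by an invalid successor ('<'):
    # the decoder must not swallow the '<' along with the bad lead byte.
    print(ascii(decode_lossy(b"\xC2<")))
```

Rejecting input outright, as is_strict_utf8 does, is usually the safer choice for security-sensitive fields; replacing with U+FFFD, as decode_lossy does, only makes sense if the result is validated again after decoding.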