Unicode transformation issues

  • This page is vulnerable to various Unicode transformation issues, such as best-fit mappings, overlong byte sequences, and ill-formed subsequences.

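    Best-fit mapping occurs when a character X is transformed into an entirely different character Y. It typically happens when characters are transcoded between Unicode and another encoding that has no exact equivalent for X, and the converter substitutes a similar-looking character instead of failing. For example, a fullwidth less-than sign (U+FF1C) can be mapped to an ASCII "<", so input that passed a filter before the conversion may contain markup after it.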

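    Overlong byte sequences (non-shortest form): the UTF-8 bit layout makes it possible to encode a character with more bytes than it needs, but only the shortest form is valid UTF-8. For security reasons, a UTF-8 decoder must not accept sequences that are longer than necessary to encode a character; otherwise, a filter that blocks a byte such as 0x2F ("/") can be bypassed with an overlong equivalent such as 0xC0 0xAF. For example, the character U+000A (line feed) must be accepted from a UTF-8 stream only in the form 0x0A, and not in any of the following five overlong forms (the 5- and 6-byte forms come from the original UTF-8 definition; RFC 3629 limits UTF-8 to at most 4 bytes per character):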
    • 0xC0 0x8A
    • 0xE0 0x80 0x8A
    • 0xF0 0x80 0x80 0x8A
    • 0xF8 0x80 0x80 0x80 0x8A
    • 0xFC 0x80 0x80 0x80 0x80 0x8A

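    Ill-formed subsequences: as required by Unicode 3.0 and noted in Unicode Technical Report #36 (Unicode Security Considerations), if a lead byte is followed by an invalid successor byte, the decoder should not consume that successor byte. It should treat only the lead byte as an error and resume decoding at the successor; in the sequence 0xC2 0x41, for example, 0x41 should still be decoded as "A" rather than being absorbed into an invalid character.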
  • Identify the source of these Unicode transformation issues and fix them. Consult the web references below for more information.
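The best-fit problem described above can be reproduced with a short Python sketch. Python's codecs do not apply Windows best-fit tables, so `BEST_FIT` below is a hypothetical four-entry excerpt standing in for such a table, and `naive_filter`, `best_fit_to_ascii` and `strict_to_ascii` are illustrative helpers, not part of any library. The point is the ordering: the filter runs before the lossy conversion, so it never sees the "<" and ">" that the conversion produces.

```python
# Hypothetical excerpt of a best-fit table: fullwidth characters folded to ASCII
# lookalikes. Real code-page best-fit data is far larger; this is illustrative only.
BEST_FIT = {
    "\uff1c": "<",  # FULLWIDTH LESS-THAN SIGN
    "\uff1e": ">",  # FULLWIDTH GREATER-THAN SIGN
    "\uff02": '"',  # FULLWIDTH QUOTATION MARK
    "\uff07": "'",  # FULLWIDTH APOSTROPHE
}


def naive_filter(text: str) -> bool:
    """Return True if the text has no ASCII angle brackets (checked BEFORE conversion)."""
    return "<" not in text and ">" not in text


def best_fit_to_ascii(text: str) -> bytes:
    """Lossy conversion: each non-ASCII character becomes a best-fit lookalike or '?'."""
    return "".join(
        ch if ord(ch) < 0x80 else BEST_FIT.get(ch, "?") for ch in text
    ).encode("ascii")


def strict_to_ascii(text: str) -> bytes:
    """Safer alternative: refuse the conversion instead of guessing a substitute."""
    return text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input


payload = "\uff1cscript\uff1e"
print(naive_filter(payload))       # True: the filter sees no '<' or '>'
print(best_fit_to_ascii(payload))  # the lossy conversion produces b'<script>'
try:
    strict_to_ascii(payload)
except UnicodeEncodeError:
    print("rejected")              # strict conversion refuses the input
```

Besides refusing lossy conversions, validating text only after its final conversion closes the same gap, because the filter then inspects exactly what downstream code receives.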
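The overlong forms listed above can be checked with a short Python sketch. `naive_utf8_decode` is a hypothetical, deliberately insecure decoder that rebuilds code points from the bit pattern without any shortest-form check, so every listed form turns back into a line feed and the overlong pair 0xC0 0xAF turns into "/". Python's built-in `bytes.decode("utf-8")` is strict by default and, as far as this sketch relies on it, rejects all of these forms with `UnicodeDecodeError`.

```python
# The five overlong encodings of U+000A (line feed) from the list above.
OVERLONG_LF = [
    b"\xc0\x8a",
    b"\xe0\x80\x8a",
    b"\xf0\x80\x80\x8a",
    b"\xf8\x80\x80\x80\x8a",
    b"\xfc\x80\x80\x80\x80\x8a",
]


def naive_utf8_decode(data: bytes) -> str:
    """INSECURE: rebuilds code points from the bit pattern and never checks for
    the shortest form, so overlong sequences are accepted (illustrative only)."""
    chars, i = [], 0
    while i < len(data):
        lead = data[i]
        # Number of continuation bytes and payload bits taken from the lead byte.
        if lead < 0x80:
            extra, cp = 0, lead
        elif lead >= 0xFC:
            extra, cp = 5, lead & 0x01
        elif lead >= 0xF8:
            extra, cp = 4, lead & 0x03
        elif lead >= 0xF0:
            extra, cp = 3, lead & 0x07
        elif lead >= 0xE0:
            extra, cp = 2, lead & 0x0F
        elif lead >= 0xC0:
            extra, cp = 1, lead & 0x1F
        else:
            raise ValueError(f"stray continuation byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + 1 + extra]:
            cp = (cp << 6) | (b & 0x3F)  # continuation bytes are not even validated
        chars.append(chr(cp))
        i += 1 + extra
    return "".join(chars)


for form in OVERLONG_LF:
    try:
        strict = repr(form.decode("utf-8"))  # Python's decoder is strict by default
    except UnicodeDecodeError:
        strict = "rejected"
    print(form.hex(" "), "naive:", repr(naive_utf8_decode(form)), "strict:", strict)

# An overlong "/" slips past a byte-level check for 0x2F, then decodes to "/".
print(repr(naive_utf8_decode(b"\xc0\xaf")))
```

Only the shortest form, `b"\x0a"`, decodes to a line feed under the strict decoder, which is the behavior the requirement above asks for.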
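The ill-formed subsequence rule can be illustrated with the byte pair 0xC2 0x22: 0xC2 is a lead byte that expects one continuation byte (bit pattern 10xxxxxx), but 0x22 is an ASCII double quote. `naive_decode` below is a hypothetical, deliberately insecure decoder that handles only 1- and 2-byte sequences and always swallows the byte after a lead byte, so the quote disappears into an unrelated character. Python's built-in UTF-8 decoder does not consume the invalid successor: with `errors="replace"` it emits U+FFFD for the lead byte alone and still decodes the quote.

```python
def naive_decode(data: bytes) -> str:
    """INSECURE sketch (1- and 2-byte sequences only): a lead byte in 0xC2-0xDF
    always consumes the next byte, even if it is not a continuation byte."""
    out, i = [], 0
    while i < len(data):
        lead = data[i]
        if 0xC2 <= lead <= 0xDF and i + 1 < len(data):
            # Missing check: (data[i + 1] & 0xC0) == 0x80
            out.append(chr(((lead & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(lead) if lead < 0x80 else "\ufffd")
            i += 1
    return "".join(out)


data = b'\xc2"A'  # stray lead byte 0xC2, then an ASCII quote and "A"
print(repr(naive_decode(data)))                      # the quote has been swallowed
print(repr(data.decode("utf-8", errors="replace")))  # U+FFFD, then the quote and "A"
```

If a validator and a downstream consumer decode the same bytes differently in this way, a delimiter such as a quote can be invisible to one and present for the other. For security decisions, strict decoding (the default `errors="strict"`), which raises `UnicodeDecodeError` on such input, is usually a safer choice than replacement.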