Unicode's role in XSS vulnerabilities.

Jacek Siwek

04 March, 2024

Web application security is a crucial concern in today’s digital landscape. Cross-Site Scripting (XSS) attacks pose a significant threat to web applications, allowing attackers to inject malicious scripts into trusted websites. Request validation mechanisms mitigate such attacks by blocking characters or patterns commonly associated with malicious code. However, these mechanisms can sometimes be bypassed with Unicode characters, which can lead to successful XSS attacks.

One method of bypassing request validation is to substitute Unicode look-alikes for the blocked characters. The technique exploits a behavior of some database systems, such as Microsoft SQL Server, which can convert certain Unicode characters into their ASCII equivalents when the data is stored. As a result, an ASP.NET application can be vulnerable to XSS through a plain HTML vector even though request validation is enabled.

In the context of web security, an HTML vector refers to any method or technique that leverages HTML content to carry out malicious activities.

Before we delve into our example, it’s crucial to understand why databases, such as Microsoft SQL Server, sometimes convert Unicode characters into their ASCII equivalents. This behavior depends on how the database and its columns are configured for character encoding. In SQL Server, it typically occurs when Unicode text is written to a non-Unicode column (char or varchar rather than nchar or nvarchar): to maintain compatibility with the column's code page, a character that cannot be represented there may be replaced with a similar-looking "best-fit" character instead of being rejected. This conversion, while seemingly harmless, can open up vulnerabilities in web applications, because the transformed characters appear only after the security measures designed to filter out XSS attack vectors have already run.

Now to illustrate the problem, let’s consider the following example:

＜script＞alert(document.domain)＜/script＞

When this input is submitted, request validation may treat it as harmless text, because it contains no ASCII angle brackets. However, once the database has converted the Unicode characters into their ASCII equivalents, the value it returns would be:

<script>alert(document.domain)</script>

In this example, the Unicode characters used, their code points, and their ASCII equivalents are as follows:

Char Unicode ASCII equivalent

＜ U+FF1C < (U+003C)
＞ U+FF1E > (U+003E)
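The conversion described above can be modelled in a few lines of Python. This is only a sketch: the `BEST_FIT` table and the `simulate_storage` function are hypothetical names, and the table covers just the two characters from the example rather than the mapping a real database applies.

```python
# Illustrative model only. The mapping below is an assumption that covers
# just the two characters from the example; it is not the actual table
# used by any database engine.
BEST_FIT = str.maketrans({"\uFF1C": "<", "\uFF1E": ">"})


def simulate_storage(value: str) -> str:
    """Return the value as it might come back after a lossy conversion."""
    return value.translate(BEST_FIT)


# The payload from the example, written with fullwidth angle brackets.
payload = "\uFF1Cscript\uFF1Ealert(document.domain)\uFF1C/script\uFF1E"
stored = simulate_storage(payload)

print(stored)  # <script>alert(document.domain)</script>
```

The submitted payload contains no ASCII angle brackets at all; they appear only in the value that comes back from storage.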

To better illustrate the difference, we have the following three signs:

< ﹤ ＜

where:

< is the less-than sign (U+003C)
﹤ is the small less-than sign (U+FE64)
＜ is the fullwidth less-than sign (U+FF1C)
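Because the three signs are hard to tell apart by eye, it can help to inspect them programmatically. The following sketch uses Python's standard `unicodedata` module; the `describe` helper is a name introduced here for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as its code point followed by its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The three look-alike signs listed above.
for ch in ("<", "\uFE64", "\uFF1C"):
    print(describe(ch))

# U+003C LESS-THAN SIGN
# U+FE64 SMALL LESS-THAN SIGN
# U+FF1C FULLWIDTH LESS-THAN SIGN
```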

The XSS vulnerability arises because request validation inspects the input before the conversion takes place and never sees the converted characters. The initial input appears benign and conforms to the allowed character set, but the subsequent conversion turns it into active markup, which the browser executes if the application renders it without encoding.

In the provided example, the use of Unicode characters allows an attacker to evade the restrictions imposed by the validation mechanism. By replacing the blocked characters (< and >) with visually similar Unicode characters (＜ and ＞), the attacker tricks the application into treating the injected code as harmless text, thereby bypassing the validation checks.
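The order of events can be sketched as follows. The `SUSPICIOUS` pattern is a hypothetical, heavily simplified stand-in for a request validator and is not ASP.NET's actual implementation; `lossy_store` is likewise an assumed model of the database conversion.

```python
import re

# Hypothetical validator rule: reject "<" followed by a letter, "/", "!" or "?".
SUSPICIOUS = re.compile(r"<[A-Za-z/!?]")


def passes_validation(value: str) -> bool:
    """Return True when the simplified validator finds nothing to block."""
    return SUSPICIOUS.search(value) is None


def lossy_store(value: str) -> str:
    """Assumed model of storage that maps fullwidth brackets to ASCII."""
    return value.replace("\uFF1C", "<").replace("\uFF1E", ">")


payload = "\uFF1Cscript\uFF1Ealert(document.domain)\uFF1C/script\uFF1E"

accepted_on_input = passes_validation(payload)  # checked before storage
stored = lossy_store(payload)                   # conversion happens afterwards
safe_after_storage = passes_validation(stored)  # would be caught, but too late

print(accepted_on_input)   # True
print(safe_after_storage)  # False
```

The same validator that would have rejected the stored value accepts the submitted one, because it runs before the conversion.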

To mitigate the risk of such bypasses and protect against XSS attacks, it is crucial to implement a multi-layered approach to web application security. Consider the following best practices:

Implement Context-Aware Output Encoding: Use output encoding libraries or frameworks that enforce context-aware escaping to neutralize potential XSS vectors. This approach ensures that any user-supplied input, including data read back from the database, is encoded for the context in which it is rendered.
Employ a Web Application Firewall (WAF): Implement a WAF that includes XSS protection capabilities. A WAF can help detect and block malicious requests, including those attempting to bypass validation mechanisms using Unicode characters.
Validate and Sanitize User Input: Conduct thorough input validation and sanitization to ensure that user-supplied data does not contain potentially malicious scripts or characters, and account for look-alike Unicode characters that may later be converted into blocked ones.
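As a minimal sketch of the first and third practices, the Python below normalizes input before validating it and encodes values when they are rendered. It rests on assumptions: NFKC normalization is used as an approximation of the look-alike folding and is not guaranteed to match the conversion a given database performs, and `SUSPICIOUS`, `validate`, and `render` are hypothetical names.

```python
import html
import re
import unicodedata

# Hypothetical validator rule, as a simplified stand-in for request validation.
SUSPICIOUS = re.compile(r"<[A-Za-z/!?]")


def validate(value: str) -> bool:
    """Validate the NFKC-normalized form so compatibility look-alikes are seen."""
    normalized = unicodedata.normalize("NFKC", value)
    return SUSPICIOUS.search(normalized) is None


def render(value: str) -> str:
    """Encode a value for an HTML text context at output time."""
    return html.escape(value, quote=True)


payload = "\uFF1Cscript\uFF1Ealert(document.domain)\uFF1C/script\uFF1E"
print(validate(payload))  # False

# Even if a converted value reaches the page, encoding keeps it inert.
print(render("<script>alert(document.domain)</script>"))
# &lt;script&gt;alert(document.domain)&lt;/script&gt;
```

Output encoding is the more robust of the two layers, since it does not depend on predicting every character conversion that may happen between input and output.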
