Prototype pollution – and bypassing client-side HTML sanitizers | Securitum. Leading european penetration testing company.

MICHAŁ BENTKOWSKI

18 August, 2020

In this article I’ll cover the prototype pollution vulnerability and show it can be used to bypass client-side HTML sanitizers. I’m also considering various ways to find exploitation of prototype pollution via semi-automatic methods. It could also be a big help in solving my XSS challenge.

Prototype pollution basics

Prototype pollution is a security vulnerability, quite specific to JavaScript. It stems from JavaScript inheritance model called prototype-based inheritance. Unlike in C++ or Java, in JavaScript you don’t need to define a class to create an object. You just need to use the curly bracket notation and define properties, for example:

const obj = {
  prop1: 111,
  prop2: 222,
}

This object has two properties: prop1 and prop2. But these are not the only properties we can access. For example a call to obj.toString() would return "[object Object]". toString (along with some other default members) comes from the prototype. Every object in JavaScript has a prototype (it can also be null). If we don’t specify it, by default the prototype for an object is Object.prototype.

In DevTools, we can easily check a list of properties of Object.prototype:

We can also find out what object is a prototype of a given object, by checking its __proto__ member or by calling Object.getPrototypeOf:

Similarly, we can set the prototype of the object using __proto__ or Object.setPrototypeOf:

In a nutshell, when we try to access a property of an object, JS engine first checks if the object itself contains the property. If it does, then it is returned. Otherwise, JS checks if the prototype has the property. If it doesn’t, JS checks the prototype of the prototype… and so on, until the prototype is null. It’s called the prototype chain.

The fact that JS traverses the prototype chain has an important effect: if we could somehow pollute the Object.prototype (that is, extend it with new properties), then all JS objects would have these properties.

Consider the following example:

const user = { userid: 123 };
if (user.admin) {
    console.log('You are an admin');
}

At the first sight, it may seem that it’s not possible to make the if-condition true as user object doesn’t have a property called admin. However, if we pollute the Object.prototype and define property called admin, then the console.log will execute!

Object.prototype.admin = true;
const user = { userid: 123 };
if (user.admin) {
    console.log('You are an admin'); // this will execute
}

This proves that prototype pollution may have a huge impact on security of applications as we can define properties that would change their logic. There are only a few known cases of abusing the vulnerability though (please let me know if you know more!):

Olivier Artreau exploited it to gain RCE in Ghost CMS,
I exploited it to gain RCE in Kibana,
POSIX has shown that RCE via prototype pollution is possible in ejs, as well as in pug and handlebars.

Before going to the main point of this article, I need to cover one more topic: how the protype pollution may occur in the first place?

The entry point of this vulnerability is usually the merge operation (that is copying all properties from one object to the other object). For instance:

const obj1 = { a: 1, b: 2 };
const obj2 = { c: 3, d: 4 };
merge(obj1, obj2) // returns { a: 1, b: 2, c: 3, d: 4}

Sometimes the operation work recursively, for instance:

const obj1 = {
  a: {
    b: 1,
    c: 2,
  }
};
const obj2 = {
  a: {
    d: 3
  }
};
recursiveMerge(obj1, obj2); // returns { a: { b: 1, c: 2, d: 3 } }

The basic flow of recursive merge is:

Iterate over all properties of obj2 and check if they exist in obj1.
If a property exists, then perform a merge operation on this property.
If a property doesn’t exist, then copy it from obj2 to obj1.

In the real world, if user has any control of objects being merged, then usually one of the objects come from the output of JSON.parse. And JSON.parse is a little bit special because it treats __proto__ as a “normal” property, i.e. without its special meaning of being a prototype accessor. Consider the following example:

In the example, obj1 was created using the curly bracket notation of JS, while obj2 is created with JSON.parse. Both objects have only one property defined, called __proto__. However, accessing obj1.__proto__ returns Object.prototype (so __proto__ is the special property that returns the prototype), while obj2.__proto__ contains the value given in the JSON, namely: 123. This proves that __proto__ property is treated differently in JSON.parse than in ordinary JavaScript.

So now imagine a recursiveMerge function that merges two objects:

obj1={}
obj2=JSON.parse('{"__proto__":{"x":1}}')

The function would work more or less like the following steps:

Iterate over all properties in obj2. The only property is __proto__.
Check if obj1.__proto__ exists. It does.
Iterate over all properties in obj2.__proto__. The only property is x.
Assign: obj1.__proto__.x = obj2.__proto__.x. Because obj1.__proto__ points to Object.prototype, then the prototype is polluted.

This type of bug was identified in many popular JS libraries, including lodash or jQuery.

Prototype pollution and HTML sanitizers

Now we know what prototype pollution is and how a merge operation can introduce the vulnerability. As I mentioned earlier, all publicized examples of exploiting prototype pollution focused on NodeJS, where the goal was to achieve Remote Code Execution. However, client-side JavaScript can also be affected by the vulnerability. So the question I asked myself was: what can attackers gain from prototype pollution in the browser’s world?

I focused my attention on HTML sanitizers. HTML sanitizers are libraries whose job is to take an untrusted HTML markup, and delete all tags or attributes that could introduce an XSS attack. Usually they’re based on allow-lists; that is, they have a list of tags and attributes that are allowed, and all other ones are deleted.

Imagine that we have a sanitizer that allows only <b> and <h1> tags. If we fed it with the following markup:

<h1>Header</h1>This is <b>some</b> <i>HTML</i><script>alert(1)</script>

It should clean it to the following form:

<h1>Header</h1>This is <b>some</b> HTML

HTML sanitizers need to maintain the list of allowed elements attributes and elements. Basically, libraries usually employ one of two ways to store the list:

1. In an array

The library might have an array with a list of allowed elements, for instance:

const ALLOWED_ELEMENTS = ["h1", "i", "b", "div"]

Then to check if some element is allowed, they simply call ALLOWED_ELEMENTS.includes(element). This approach makes it safe from prototype pollution as we cannot extend an array; that is, we can’t pollute the length property, nor the indexes that already exist.

For instance, even if we do:

Object.prototype.length = 10;
Object.prototype[0] = 'test';

Then ALLOWED_ELEMENTS.length still returns 4 and ALLOWED_ELEMENTS[0] is still "h1".

2. In an object

The other solution is to store an object with allowed elements, for instance:

const ALLOWED_ELEMENTS = {
  "h1": true,
  "i": true,
  "b": true,
  "div" :true
}

Then to check if some elements is allowed, the library may check for existence of ALLOWED_ELEMENTS[element]. This approach is easily exploitable via prototype pollution; since if we pollute the prototype the following way:

Object.prototype.SCRIPT = true;

Then ALLOWED_ELEMENTS["SCRIPT"] returns true.

List of analyzed sanitizers

I searched for HTML sanitizers in npm and found three most popular ones:

sanitize-html with around 800k downloads per week
xss with around 770k downloads per week
dompurify with around 544k downloads per week

I also included google-closure-library, which isn’t very popular in npm, but is extremely commonly used in Google applications. And Google is my favourite bug bounty program so it was worth looking into.

In the next chapters I’ll give a short overview of all the sanitizers, and show how all of them can be bypassed with prototype pollution. I’ll assume that the prototype is polluted before the library is even loaded. I will also assume that all sanitizers are used in the default configuration.

sanitize-html

The invocation of sanitize-html is simple:

Optionally, you can pass second parameter to sanitizeHtml with options. But if you don’t, then default options are used:

sanitizeHtml.defaults = {
  allowedTags: ['h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol',
    'nl', 'li', 'b', 'i', 'strong', 'em', 'strike', 'abbr', 'code', 'hr', 'br', 'div',
    'table', 'thead', 'caption', 'tbody', 'tr', 'th', 'td', 'pre', 'iframe'],
  disallowedTagsMode: 'discard',
  allowedAttributes: {
    a: ['href', 'name', 'target'],
    // We don't currently allow img itself by default, but this
    // would make sense if we did. You could add srcset here,
    // and if you do the URL is checked for safety
    img: ['src']
  },
  // Lots of these won't come up by default because we don't allow them
  selfClosing: ['img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta'],
  // URL schemes we permit
  allowedSchemes: ['http', 'https', 'ftp', 'mailto'],
  allowedSchemesByTag: {},
  allowedSchemesAppliedToAttributes: ['href', 'src', 'cite'],
  allowProtocolRelative: true,
  enforceHtmlBoundary: false
};

allowedTags property is an array, which means we cannot use it in prototype pollution. It’s worth noticing, though, that iframe is allowed.

Moving forward, allowedAttributes is a map, which gives an idea that adding property iframe: ['onload'] should make it possible to perform XSS via <iframe onload=alert(1)>.

Internally, allowedAttributes are rewritten to a variable allowedAttributesMap. And here’s the logic that decides whether an attribute should be allowed or not (name is the name of the current tag, and a is the name of the attribute):

// check allowedAttributesMap for the element and attribute and modify the value
// as necessary if there are specific values defined.
var passedAllowedAttributesMapCheck = false;
if (!allowedAttributesMap ||
  (has(allowedAttributesMap, name) && allowedAttributesMap[name].indexOf(a) !== -1) ||
  (allowedAttributesMap['*'] && allowedAttributesMap['*'].indexOf(a) !== -1) ||
  (has(allowedAttributesGlobMap, name) && allowedAttributesGlobMap[name].test(a)) ||
  (allowedAttributesGlobMap['*'] && allowedAttributesGlobMap['*'].test(a))) {
    passedAllowedAttributesMapCheck = true;
}

We will focus on checks on allowedAttributesMap. In a nutshell, it is checked whether the attribute is allowed for current tag or for all tags (when the wildcard '*' is used). Quite interestingly, sanitize-html has some sort of protection against prototype pollution:

// Avoid false positives with .__proto__, .hasOwnProperty, etc.
function has(obj, key) {
  return ({}).hasOwnProperty.call(obj, key);
}

hasOwnProperty checks whether an object has a property but it doesn’t traverse the prototype chain. This means that all calls to has function are not susceptible to prototype pollution. However, has is not used for wildcard!

(allowedAttributesMap['*'] && allowedAttributesMap['*'].indexOf(a) !== -1)

So if I pollute the prototype with:

Object.prototype['*'] = ['onload']

Then onload will be a valid attribute to any tag, which is proven below:

xss

Invocation of the next library, xss, looks quite similar:

It also can optionally accept a second parameter, called options. And the way it is processed is the most prototype-pollution-friendly pattern you could spot in JS code:

options.whiteList = options.whiteList || DEFAULT.whiteList;
options.onTag = options.onTag || DEFAULT.onTag;
options.onTagAttr = options.onTagAttr || DEFAULT.onTagAttr;
options.onIgnoreTag = options.onIgnoreTag || DEFAULT.onIgnoreTag;
options.onIgnoreTagAttr = options.onIgnoreTagAttr || DEFAULT.onIgnoreTagAttr;
options.safeAttrValue = options.safeAttrValue || DEFAULT.safeAttrValue;
options.escapeHtml = options.escapeHtml || DEFAULT.escapeHtml;

Prototype pollution – and bypassing client-side HTML sanitizers

Other Insights

HTML sanitization bypass in Ruby Sanitize

marginwidth/marginheight – the unexpected cross-origin communication channel

Art of bug bounty: a way from JS file analysis to XSS

Happy to get a call or email
and help!