A Guide to XML File Structure & External Entity (XXE) Attacks

March 28, 2018 by Infosec

Modern forms of markup languages such as HTML, XML and XHTML are mostly used in designing web pages.

XML, which stands for Extensible Markup Language, defines a set of rules for encoding documents in a format that can be read by both humans and machines. [1]

XML has some well-known advantages that have made it a widely used language. These include the ability to store data in a form that does not depend on any particular software or hardware platform.

Another advantage of XML is its ability to present and make data available to devices such as smartphones and digital telephone booths. XML supports the consistent encoding and handling of text, allowing most data outputs in any written human language to be presented.

Web technologies have greatly advanced, presenting users with both benefits and vulnerabilities. XML, too, has its fair share. This article will introduce the basic structure of XML and then shed some light on the external entity attack.

XML File Document Structure

The Extensible Markup Language (XML) 1.0 (Fifth Edition), which is a W3C Recommendation, states that each XML document has both a physical and a logical structure.

  • Physical structure: The document is made up of storage units called entities. Each entity has a name and content, and may refer to other entities to cause their inclusion in the document.
  • Logical structure: The document is made up of declarations, elements, comments and processing instructions.

XML File Declaration

When present, the XML declaration is the first line of an XML file. It identifies the document as XML and tells the processor, the software module that reads the document, how to parse it.

XML Declaration Syntax

The syntax is: <?xml version="1.0" encoding="ISO-10646-UCS-4" standalone="no" ?>

Understanding XML Declaration File Parts

| Information | Possible values | Description |
| --- | --- | --- |
| <?xml | (none) | The first five characters tell the processor that this line is an XML declaration and not the start of other content. |
| version | 1.0 | The version number is part of the declaration in anticipation of a time when there will be more than one version. |
| encoding | UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP | These are the encoding names of the most popular character sets in use now. An extensive list is available at: https://www.iana.org/assignments/character-sets/character-sets.xml |
| standalone | yes or no | This declaration indicates whether the document depends on external entities. The value is "yes" when there are no external entities and "no" when the document depends on them. |
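
To make the parts of the declaration more concrete, the following is a minimal sketch that parses a small document with the JDK's built-in DOM parser and reads the three values back. The class name DeclarationInfo and the sample document are made up for illustration, and the DOCTYPE restriction assumes the Xerces-based parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: reports what the parser read from the XML declaration.
class DeclarationInfo {

    static String describe(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // This sketch only needs the declaration, so DOCTYPEs are refused outright
            // (feature name assumes the JDK's built-in Xerces-based parser).
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "version=" + doc.getXmlVersion()
                    + " encoding=" + doc.getXmlEncoding()
                    + " standalone=" + doc.getXmlStandalone();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><geolocation/>";
        System.out.println(describe(xml));
    }
}
```

Note that getXmlStandalone() returns a boolean, so a declared value of "yes" is reported as true.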

What Is a Document Type Declaration (DTD)?

The document type declaration follows the XML declaration. It defines the rules of an XML language against which the structure of an XML document is validated, either by containing the DTD itself or by giving its location. The DTD is known as internal when it is declared within the XML file and external when it is declared in a separate file.

What Does an Internal DTD Look Like?

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE geolocation [
<!ELEMENT geolocation (latitude, longitude, temperature)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT temperature (#PCDATA)>
]>
<geolocation>
<latitude>14.5° North</latitude>
<longitude>20.8° West</longitude>
<temperature>39° Celsius</temperature>
</geolocation>
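
As a rough illustration of what validating against an internal DTD means in practice, the sketch below feeds a document with the same DTD to a validating Java parser and collects any validity errors. The class InternalDtdCheck, its method names and the simplified element values are assumptions made for this example. External entities are switched off because an internal DTD does not need them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a geolocation document against an internal DTD.
class InternalDtdCheck {

    static final String PROLOG =
          "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
        + "<!DOCTYPE geolocation [\n"
        + "<!ELEMENT geolocation (latitude, longitude, temperature)>\n"
        + "<!ELEMENT latitude (#PCDATA)>\n"
        + "<!ELEMENT longitude (#PCDATA)>\n"
        + "<!ELEMENT temperature (#PCDATA)>\n"
        + "]>\n";

    /** Returns the list of problems found; an empty list means the body is valid. */
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // check the document against its DTD
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + body)));
        } catch (Exception e) {
            problems.add("not well-formed: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String complete = "<geolocation><latitude>14.5 North</latitude>"
                + "<longitude>20.8 West</longitude><temperature>39 Celsius</temperature></geolocation>";
        String incomplete = "<geolocation><latitude>14.5 North</latitude></geolocation>";
        System.out.println("complete:   " + validate(complete));
        System.out.println("incomplete: " + validate(incomplete));
    }
}
```

The second document is well-formed but omits two of the elements required by the DTD, so it should be reported as invalid rather than rejected as broken XML.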

What Does an External DTD Look Like?

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE geolocation SYSTEM "geolocation.dtd">
<geolocation>
<latitude>14.5° North</latitude>
<longitude>20.8° West</longitude>
<temperature>39° Celsius</temperature>
</geolocation>

The content of the DTD file geolocation.dtd is shown below:

<!ELEMENT geolocation (latitude, longitude, temperature)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT temperature (#PCDATA)>
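
Because an external DTD is fetched from wherever the SYSTEM identifier points, an application may want to control which DTDs its parser is allowed to load. The sketch below is one possible approach and is not taken from the article's sources: a SAX EntityResolver that serves a known copy of geolocation.dtd from memory and refuses every other external resource. The class name ExternalDtdCheck and the host attacker.example are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates against an external DTD that is served from an allow-list.
class ExternalDtdCheck {

    // Trusted in-memory copy of geolocation.dtd.
    static final String GEOLOCATION_DTD =
          "<!ELEMENT geolocation (latitude, longitude, temperature)>\n"
        + "<!ELEMENT latitude (#PCDATA)>\n"
        + "<!ELEMENT longitude (#PCDATA)>\n"
        + "<!ELEMENT temperature (#PCDATA)>\n";

    /** Returns "valid", "invalid: ..." or "rejected: ...". */
    static String check(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                // The parser usually passes an absolute URI, so only the file name is compared.
                if (systemId != null && systemId.endsWith("geolocation.dtd")) {
                    return new InputSource(new StringReader(GEOLOCATION_DTD));
                }
                // Anything else is refused before a file or network access happens.
                throw new SAXException("refusing to load external resource " + systemId);
            });
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return "rejected: " + e.getMessage();
        }
        return errors.isEmpty() ? "valid" : "invalid: " + errors;
    }

    public static void main(String[] args) {
        String body = "<geolocation><latitude>14.5 North</latitude>"
                + "<longitude>20.8 West</longitude><temperature>39 Celsius</temperature></geolocation>";
        String head = "<?xml version=\"1.0\" standalone=\"no\"?>";
        System.out.println(check(head + "<!DOCTYPE geolocation SYSTEM \"geolocation.dtd\">" + body));
        System.out.println(check(head
                + "<!DOCTYPE geolocation SYSTEM \"http://attacker.example/evil.dtd\">" + body));
    }
}
```

Serving the DTD from memory means the SYSTEM identifier only selects among DTDs the application already trusts; it never decides what gets read from disk or the network.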

What Is an XML External Entity Attack?

An XML external entity (XXE) attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. [2]

The attack may lead to the exposure of sensitive and confidential data, or allow the attacker to scan for open TCP/UDP ports from the perspective of the machine where the parser is located.

How Does an XML External Entity Attack Work?

  1. The attacker prepares an XML message together with a DTD. The DTD typically declares an external entity (XXE) that reads a locally stored file, for example "/etc/geofile".
  2. The attacker sends the prepared XML message to the web application at the IP address 192.168.0.2.
  3. The web application processes the incoming XML message: it parses the DTD, resolves the external entity and then handles the resulting XML.
  4. The web application sends an HTTP response to the attacker at 192.168.0.1. For the attack to succeed, this response must somehow contain the content of the locally stored file, in this example "/etc/geofile".
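
The steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class XxePayloadDemo, the entity name xxe and the temporary file standing in for "/etc/geofile" are invented for the example, and naiveHandle imitates a web application that leaves its parser at the default settings. Whether the file content actually ends up in the result depends on the defaults of the JDK and parser in use.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo of the message flow; run it only against files you own.
class XxePayloadDemo {

    /** Step 1: an XML message whose DTD declares an external entity pointing at a file. */
    static String buildPayload(String fileUri) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE geolocation [\n"
             + "  <!ENTITY xxe SYSTEM \"" + fileUri + "\">\n"
             + "]>\n"
             + "<geolocation><latitude>&xxe;</latitude></geolocation>";
    }

    /** Steps 3 and 4: a stand-in for an application that echoes parsed content back. */
    static String naiveHandle(String xml) {
        try {
            // Parser defaults are deliberately left untouched here.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return "parser refused the message: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // A throwaway file plays the role of the locally stored file.
        Path file = Files.createTempFile("geofile", ".txt");
        Files.write(file, "secret-coordinates".getBytes(StandardCharsets.UTF_8));
        try {
            String payload = buildPayload(file.toUri().toString());
            System.out.println(payload);
            System.out.println("Response body: " + naiveHandle(payload));
        } finally {
            Files.delete(file);
        }
    }
}
```

If the response body contains the text written to the temporary file, the parser resolved the external entity, which is exactly the behavior an attacker relies on.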



How Can I Prevent XXE Attacks?

Given the number of ways to mount an external entity attack, prevention is not always easy. Continuous training and the use of pentesting tools can help keep developers up to date on new attacks. Deploying web application firewalls can also help to monitor and block these attacks.

Many XML parsers resolve external entities by default. This is, for instance, the case for the default Java XML parser. To mitigate XXE attacks, one has to configure the parser as follows: [3]

import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// not important for preventing XXE attacks
dbf.setNamespaceAware(true);
// validate the document while parsing it
dbf.setValidating(true);
// do not expand entity reference nodes
dbf.setExpandEntityReferences(false);
// validate the document against the DTD
dbf.setFeature("http://xml.org/sax/features/validation", true);
// do not include external general entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
// do not include external parameter entities or the external DTD subset
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// build the grammar but do not use the default attributes and attribute types information it contains
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);

// ignore the external DTD completely

dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-ext.3
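
For applications that never need a DTD at all, a stricter variant is often suggested, for example in OWASP's XXE guidance: refuse any document that contains a DOCTYPE. The sketch below swaps in that technique and is not the configuration from the listing above. The class name HardenedParser is made up, and the Apache feature names assume the Xerces-based parser bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical helper: a DOM parser factory that rejects every DOCTYPE declaration.
class HardenedParser {

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Main defense: a document with a DOCTYPE causes a fatal parse error.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth, in case the DOCTYPE restriction is relaxed later.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    /** Returns true if the hardened parser accepts the document, false if it rejects it. */
    static boolean accepts(String xml) {
        try {
            newHardenedFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "<geolocation><latitude>14.5 North</latitude></geolocation>";
        String withDoctype = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE geolocation [ <!ENTITY xxe SYSTEM \"file:///etc/geofile\"> ]>\n"
                + "<geolocation><latitude>&xxe;</latitude></geolocation>";
        System.out.println("plain document accepted:        " + accepts(plain));
        System.out.println("document with DOCTYPE accepted: " + accepts(withDoctype));
    }
}
```

The trade-off is that documents which legitimately carry a DTD are rejected as well, so this variant only fits applications that do not rely on DTD validation.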

Conclusion

Leaving most XML parsers at their default settings will expose web applications to XML external entity attacks. This vulnerability can be used to disclose sensitive data and cause denial of service. Traffic analysis, and sometimes deep packet inspection, may reveal attempts to exploit it. A good understanding of XML and the XXE attack will help system administrators protect their applications against this vulnerability.

Sources

  1. Colt McAnlis and Aleks Haecky, Understanding Compression: Data Compression for Modern Developers
  2. https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
  3. https://web-in-security.blogspot.com/2014/11/detecting-and-exploiting-xxe-in-saml.html