XML (Extensible Markup Language) is a buzzword across the Internet and a rapidly maturing technology with powerful real-world applications, especially for managing, organizing, and displaying data. XML is concerned solely with the structure and description of data, which is typically transported across networks so that it can be shared easily between diverse computer platforms. The drawback is that sensitive data may be compromised: because it travels across various media, an attacker may be able to intercept or manipulate it. Increasing the exposure of your data therefore requires careful planning to secure it. In this article, you will explore various XML-specific attacks. You will also learn how to test for non-XML vulnerabilities such as buffer overflows, spoofing, and HTML scripting when input is supplied in XML format, and how to tighten the security of applications that routinely interact with XML.


The researcher is expected to have a thorough understanding of XML manipulation in programming, competence in web technologies such as ASP.NET or PHP on different platforms (Windows, Linux, etc.), and an understanding of scripting attacks.


In early attempts to exchange essential data between computers across diverse platforms in a universal format, computer scientists devised SGML, a universal, interchangeable data format with rich information-storage capabilities. SGML served the purpose of marking up data and spread quickly into large management systems, but it was considered less practical for huge amounts of complex data. As a result, XML (a "self-describing" format) was born. XML is a subset of SGML with the same goals, but with as much of the complexity eliminated as possible, and it was designed to remain fully compatible with SGML. It is also important to realize that XML is not a language as such. Rather, it is a standard for creating languages that describe data in your own vocabulary.

The XML specification is a set of rules defined by the W3C. Unlike HTML, XML does not have a fixed set of predefined tags. Instead, it is a meta-language for creating other markup languages.

You are free to choose whatever element names describe your data best. A product catalog, for example, might use elements such as <productCatalog>, <products>, and <name> to express the document's structure. This freedom is what makes XML extensible and flexible.
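The sample catalog file itself is not reproduced here. As a hypothetical stand-in, the sketch below uses Python's standard xml.etree.ElementTree module to build and read back a tiny catalog with the element names mentioned above; the <product> wrapper element and the product names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical catalog; element names follow the text above.
catalog = ET.Element("productCatalog")
products = ET.SubElement(catalog, "products")
for title in ("Laptop", "Monitor"):                # made-up product names
    product = ET.SubElement(products, "product")   # assumed wrapper element
    ET.SubElement(product, "name").text = title

xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)

# Any XML parser can read the same structure back.
names = [node.text for node in ET.fromstring(xml_text).iter("name")]
print(names)
```

Nothing in the code declares what <productCatalog> or <name> mean; the element names themselves carry the description, which is the point of XML being self-describing.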

XML is popular because of its simplicity: its rules are much shorter than those of its predecessor, SGML. XML acts as the glue that lets diverse systems work together and that streamlines and standardizes business processes and transactions between organizations. Moreover, XML is not limited to data exchange. Many other applications, such as e-commerce site integration, web services, pricing systems, and intranet applications that integrate existing business applications, are held together by the exchange of XML documents. Its advantages can be summarized as follows:

  • Extensibility: XML can accommodate any type of data and is inexpensive to implement, because it neither imposes rules about data semantics nor ties an organization to proprietary networks.
  • Adoption: Because XML is ubiquitous, many organizations rely on it to store data, and applications on virtually any platform can read it.
  • Related standards and tools: Developers have ready-made, easily accessible components and tools for creating, reading, parsing, validating, searching, and transforming XML documents from one form to another.
  • Reduced server load: Web-based applications can use XML to reduce the load on the web server by keeping information on the client as long as possible and then sending it to the server in a single XML document.
  • RPC: XML can be used to carry data for remote procedure calls (RPC). Because such calls travel over HTTP, they can pass through firewalls that would normally block conventional RPC traffic.
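As an illustration of the RPC point, Python's standard xmlrpc.client module can serialize a call into an XML-RPC request body, which is ordinary XML sent in an HTTP POST. XML-RPC is only one such protocol, and the method name getPrice and its arguments below are hypothetical.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "getPrice" as an XML-RPC request body.
request_xml = xmlrpc.client.dumps(("SKU-1001", 2), methodname="getPrice")
print(request_xml)

# The receiving side decodes the same XML back into the arguments and method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```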


XML was originally intended for Web site content, much as Hypertext Markup Language (HTML) was. However, its potential for transforming and reusing data has taken it far beyond that use. With its widespread use in distributed computing and e-commerce solutions, attackers have found another way to breach data through XML-specific attacks, in which they typically control XML input or input that is used to construct XML. The following sections discuss various XML features and related functionality, along with the associated security concerns.

XML Data Injection

XML is vulnerable to attacks similar to HTML script injection. When output contains attacker-supplied data and that data is stored as XML, an attacker may be able to inject extra or malicious XML that they would not normally control. Consider the following sample:

Listing: 1.1 sample XML document

<?xml version="1.0" encoding="utf-8" ?>
<USER role="guest"> hacker supplied text </USER>

If the programmer is not careful, the attacker can supply extra markup such as hack1 </USER><USER role="admin"> hack2 in place of the expected text, which allows an injection attack. The resulting document would be:

Listing: 1.2 XML data injection

<?xml version="1.0" encoding="utf-8" ?>
<USER role="guest"> hack1 </USER>
<USER role="admin"> hack2 </USER>
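A minimal sketch of how this happens, and of one fix, using Python's standard xml.etree.ElementTree. The records are wrapped in a hypothetical <users> root element, because a document with two top-level USER elements would not be well-formed on its own; the function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET

def build_unsafe(user_text: str) -> str:
    # Vulnerable: the user's text is pasted straight into the markup.
    return f'<users><USER role="guest">{user_text}</USER></users>'

def build_safe(user_text: str) -> str:
    # Safer: ElementTree escapes <, > and & in text content when serializing.
    root = ET.Element("users")
    ET.SubElement(root, "USER", role="guest").text = user_text
    return ET.tostring(root, encoding="unicode")

payload = 'hack1 </USER><USER role="admin"> hack2'
for build in (build_unsafe, build_safe):
    roles = [user.get("role") for user in ET.fromstring(build(payload)).iter("USER")]
    print(build.__name__, roles)
```

The unsafe builder produces two USER elements, one of them with role="admin", while the safe builder produces a single guest USER whose text is the literal payload: the injected </USER> is serialized as &lt;/USER&gt; instead of being parsed as markup.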

CDATA Manipulation

The content of a CDATA section is not interpreted as markup by the XML parser; it is passed through as plain character data. This opens the door for an attacker to smuggle extra markup past the parser, such as an HTML tag that carries script, in a bid to reach something they are not entitled to. In the following sample, the attacker executes extra code by means of an HTML img tag:

Listing: 1.3 Malicious JavaScript code executions

<?xml version="1.0" encoding="utf-8" ?>
<COMPUTER type="laptop"> Computer Color <![CDATA[<IMG SRC="javascript:alert(document.domain)">]]></COMPUTER>


This is dangerous because many RSS readers render items as HTML in a script-capable engine, and that engine may run in an elevated security context.
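The sketch below, using Python's standard xml.etree.ElementTree and html modules, shows that the parser hands the CDATA content back as ordinary text with the <IMG> tag intact, so any consumer that drops that text into an HTML page must escape it first. Escaping with html.escape is one common mitigation, shown here as an assumption about how a rendering step might be protected.

```python
import html
import xml.etree.ElementTree as ET

# The COMPUTER element from Listing 1.3 (the XML declaration is omitted).
doc = ('<COMPUTER type="laptop"> Computer Color '
       '<![CDATA[<IMG SRC="javascript:alert(document.domain)">]]></COMPUTER>')

text = ET.fromstring(doc).text
# The parser returns the CDATA content as plain character data, tags and all.
print(text)

# Escaping before the text is placed into an HTML page neutralizes the tag.
print(html.escape(text))
```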

External Entities

An entity can refer to the content of a file specified by a URL in the XML document. An attacker can therefore point an entity at a file that the parser then reads under the application's security context, which may differ from the attacker's own. In other words, the attacker can name files they cannot normally access, such as boot.ini or other system files where confidential data is usually stored. You can test for this vulnerability by requesting a system file through XML input:

Listing: 1.4 External entity samples

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE testing [
<!ENTITY aa SYSTEM "c:/boot.ini">
]>
<testing> &aa; </testing>
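One common defence is to refuse DTDs in untrusted input altogether, since external entities can only be declared inside a DTD. The sketch below (safe_fromstring and DTDForbidden are hypothetical names) uses Python's standard pyexpat bindings to reject any document containing a DOCTYPE before handing it to ElementTree. Packages such as defusedxml offer a maintained implementation of this kind of policy.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class DTDForbidden(ValueError):
    """Raised when untrusted input contains a DOCTYPE declaration."""

def safe_fromstring(data: str) -> ET.Element:
    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    probe = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not accepted from untrusted input")

    probe.StartDoctypeDeclHandler = reject_doctype
    probe.Parse(data, True)  # the exception raised above propagates out of Parse
    # Second pass: without a DTD, no entities (external or otherwise) can be declared.
    return ET.fromstring(data)
```

Passing Listing 1.4 to safe_fromstring raises DTDForbidden before any entity is looked at, while a plain document such as <testing>ok</testing> parses normally.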

XML Bombing

XML bombs are a kind of decompression attack in which an entity refers to two or more additional entities, each of which refers to several more. Consider the following sample:

Listing: 1.5 sample XML bombing

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE testing [
<!ENTITY a0 "Director">
<!ENTITY a1 "&a0;&a0;">
<!ENTITY a2 "&a1;&a1;">
<!ENTITY a3 "&a2;&a2;">
<!ENTITY a4 "&a3;&a3;">
<!ENTITY a5 "&a4;&a4;">
<!ENTITY a6 "&a5;&a5;">
<!ENTITY a7 "&a6;&a6;">
<!ENTITY a8 "&a7;&a7;">
<!-- a9 through a99 are declared the same way -->
<!ENTITY a100 "&a99;&a99;">
]>
<testing> &a100; </testing>

In the sample above, the parser first replaces &a100; with &a99;&a99;, then each &a99; with &a98;&a98;, and so on. The chain continues until only copies of "Director" remain: 2^100 of them, or 8 × 2^100 (roughly 10^31) characters. Processing a string of that size exhausts the parser's memory and CPU, which results in a denial-of-service attack.
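To put a number on that growth without actually expanding anything, the sketch below mirrors the entity table of the sample in a Python dictionary and computes the length each entity would expand to. It is an illustrative calculation, not a parser: the regular expression only recognizes simple &name; references.

```python
import re
from functools import lru_cache

REF = re.compile(r"&(\w+);")

# Entity table mirroring the sample: a0 = "Director", a(n) = &a(n-1);&a(n-1);
ENTITIES = {"a0": "Director"}
for i in range(1, 101):
    ENTITIES[f"a{i}"] = f"&a{i - 1};&a{i - 1};"

@lru_cache(maxsize=None)
def expanded_length(name: str) -> int:
    """Length of the fully expanded text of an entity, computed without expanding it."""
    value = ENTITIES[name]
    literal = len(REF.sub("", value))  # characters that are not entity references
    return literal + sum(expanded_length(ref) for ref in REF.findall(value))

print(expanded_length("a100"))
```

The document itself is only a few kilobytes, yet the result is 8 × 2^100. A parser can refuse documents whose expansion exceeds a budget; recent versions of the expat library, for example, include billion-laughs protection based on an amplification limit.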

Infinite Entity Loops

Entities let a document define a name together with replacement text, giving an easy way to represent text of choice. When an XML parser encounters an entity reference, it substitutes the replacement text. It is therefore possible to declare entities that refer to each other, forming a loop. The XML specification forbids such recursion as a well-formedness error, but a parser that fails to detect it can be driven into an infinite loop, which is another denial-of-service (DoS) attack against the parser. Consider the following XML, which defines two parameter entities named aa and bb:

Listing: 1.6 infinite entity loop sample

<!ENTITY % aa '&#x25;bb;'>
<!ENTITY % bb '&#x25;aa;'>


Here, %aa; expands to %bb;, which in turn expands back to %aa;, and so on. As a result, entity expansion enters an infinite loop.
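Conforming parsers catch this case because the XML specification's "No Recursion" rule forbids an entity from referring to itself, directly or indirectly. The sketch below illustrates the idea as a depth-first search over a hypothetical table of replacement texts. The table holds each entity's text after the character reference &#x25; has been turned into %, so aa maps to %bb; and bb maps to %aa;.

```python
import re
from typing import Optional

# Matches general (&name;) and parameter (%name;) entity references.
REF = re.compile(r"[&%](\w+);")

def find_cycle(entities: dict[str, str]) -> Optional[list[str]]:
    """Return a chain of entity names that leads back to itself, or None."""
    state: dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str, path: list[str]) -> Optional[list[str]]:
        if state.get(name) == "visiting":
            # name is already on the current path, so the references form a loop.
            return path[path.index(name):] + [name]
        if state.get(name) == "done" or name not in entities:
            return None
        state[name] = "visiting"
        for ref in REF.findall(entities[name]):
            cycle = visit(ref, path + [name])
            if cycle:
                return cycle
        state[name] = "done"
        return None

    for name in entities:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None

print(find_cycle({"aa": "%bb;", "bb": "%aa;"}))
```

The loop in the listing is reported as the chain aa, bb, aa, whereas a deep but finite table such as the XML bomb's returns None; that case needs a size limit instead.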

XPath Injection

XPath lets you query an XML document in much the same way that SQL queries a database. Attackers turn to this approach when they cannot access the XML data directly: by crafting input that ends up inside an XPath expression, they can inject arbitrary queries and retrieve data they would not normally be allowed to see. Consider the following XML file, stored on the server, which holds user names and passwords:

Listing: 1.7 Samples

<?xml version="1.0" encoding="utf-8" ?>











The file is stored on the server in a restricted zone. A web page hosted on the server queries it and displays only the user name. However, if the page builds its XPath query from user input, an attacker can control part of that query by submitting x')] |//*|//*[contains(name,'y as the data. The crafted input makes the query return the entire contents of the file:

Listing: 1.8 XPath attacking

//*[contains(name,'x')] | //*| //*[contains(name,'y')]/name
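The sketch below reproduces the crafted expression by pasting the payload into a hypothetical query template, //*[contains(name,'USER_INPUT')]/name, inferred from Listing 1.8. It then shows one defensive pattern: encoding the input as a single XPath 1.0 string literal so that it can never close the quote. Only string building is shown; no XPath engine is invoked, and the function names are illustrative.

```python
def build_query_unsafe(user_input: str) -> str:
    # Vulnerable: the input is pasted between the quote characters of the expression.
    return f"//*[contains(name,'{user_input}')]/name"

def xpath_literal(value: str) -> str:
    """Encode arbitrary text as a single XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # XPath 1.0 has no escape character, so text containing both quote kinds
    # is rebuilt with concat(), inserting "'" wherever an apostrophe was.
    pieces = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"

def build_query_safe(user_input: str) -> str:
    return f"//*[contains(name,{xpath_literal(user_input)})]/name"

payload = "x')] |//*|//*[contains(name,'y"
print(build_query_unsafe(payload))
print(build_query_safe(payload))
```

In the safe version, the whole payload stays inside one double-quoted literal, so the | //* union never becomes part of the expression. Where the XPath library supports variables (lxml, for example, accepts them as keyword arguments to xpath()), binding the input as a variable avoids string building entirely.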

Malicious Large File Reference

XML documents can reference other files by URL, so an attacker can send the victim an XML document that references additional files. Those files can be extremely large, and retrieving and processing them consumes significant resources on the victim's computer.
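The simplest mitigation is to not resolve external references at all. Where an application must fetch referenced content, a size cap limits the damage. The sketch below is a hypothetical helper (read_capped and ResourceTooLarge are invented names) that reads from any binary stream, such as an open file or an HTTP response body, and stops as soon as a limit is exceeded, never reading more than one byte past it.

```python
import io

class ResourceTooLarge(Exception):
    """Raised when a referenced resource exceeds the allowed size (hypothetical policy)."""

def read_capped(stream, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream fully, failing fast once more than `limit` bytes arrive."""
    buf = bytearray()
    while True:
        # Never ask for more than one byte beyond the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > limit:
            raise ResourceTooLarge(f"resource exceeds {limit} bytes")

print(len(read_capped(io.BytesIO(b"<a/>" * 10), limit=1024)))
```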


XML is becoming more popular in both client and server applications every day. Traditional attacks such as spoofing, buffer overflows, and scripting also apply to XML input. Testing for these types of vulnerabilities requires encoding certain characters so that the parser still sees each test case as well-formed XML. In addition, encrypting sensitive data and using digital certificates are among the best ways to secure any document that has to traverse the Internet.
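As a sketch of that encoding step, the code below uses the standard xml.sax.saxutils helpers escape and quoteattr to embed classic non-XML test strings (a script tag, an oversized string, and text full of quotes and ampersands) in a hypothetical <test> element, then checks that the parser reads each one back unchanged. The element name and payloads are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def to_test_document(payload: str) -> str:
    # escape() encodes &, < and > in element text; quoteattr() also handles quotes.
    return f"<test value={quoteattr(payload)}>{escape(payload)}</test>"

payloads = [
    "<script>alert(document.domain)</script>",  # scripting
    "A" * 5000,                                  # oversized input, as in overflow tests
    "O'Brien & \"Sons\"",                        # quotes and an ampersand
]

for p in payloads:
    element = ET.fromstring(to_test_document(p))  # parses, so the test case is well-formed
    print(element.text == p, element.get("value") == p)
```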

Well-Formed XML

Overall, XML offers a means of sharing data between any two applications, whether old or new, written in different languages, hosted on different operating systems, or built by different organizations. To sustain that broad compatibility, it is a fairly strict standard. Otherwise, it would be difficult to distinguish a harmless variation from a subtle error that leads to inconsistencies and possibly security holes in an application. To prevent this, an XML parser performs quality-control checks to determine whether a document meets the rules of the W3C XML specification. A document that abides by these rules is said to be well-formed. In particular, an XML document must meet these criteria:

  • A document must have exactly one root element.
  • Every start tag must have a matching end tag, or the element must be written as a self-closing empty-element tag such as <br/>. In HTML, by contrast, some tags have only a start tag.
  • Attribute values must be enclosed in quotes, either double or single.
  • Elements must be properly nested; they cannot overlap.
  • An element cannot have two attributes with the same name.
  • Element and attribute names are case-sensitive.
  • Comments cannot be placed inside tags.
  • Element names must follow the XML naming rules.
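A quick way to see these rules enforced is to hand small documents to a conforming parser. The sketch below wraps Python's xml.etree.ElementTree.fromstring in a hypothetical is_well_formed helper; each sample either breaks, or deliberately satisfies, one rule from the list above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "one root, nested, self-closing tag": "<a><b/></a>",
    "two root elements": "<a/><b/>",
    "overlapping elements": "<a><b></a></b>",
    "single-quoted attribute": "<a x='1'/>",
    "unquoted attribute": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "case mismatch": "<Item></item>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and the single-quoted samples are accepted; every other sample raises ParseError.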

The main reason for all these well-formedness rules is that they let an application read the data and reliably tell markup from content. On top of well-formedness, XML developers can apply a set of constraints that are enforced while the XML data is parsed, known as a logical schema (or simply a schema). An XML document's logical structure rests on three pillars: the XML Declaration, the Document Type Declaration, and the Document Element. The XML Declaration states the version of the standard the document complies with. The Document Type Declaration defines the rules and definitions the document must adhere to. Finally, the Document Element is the container for the document's content.
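To make the three pillars concrete, the sketch below registers callbacks on Python's standard pyexpat parser and records what each part reports. The note document and the describe helper are hypothetical.

```python
import xml.parsers.expat

def describe(document: str) -> dict:
    """Report the XML Declaration version, DOCTYPE name and Document Element."""
    parts: dict = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        parts["version"] = version          # from the XML Declaration

    def on_doctype(name, system_id, public_id, has_internal_subset):
        parts["doctype"] = name             # from the Document Type Declaration

    def on_start(name, attributes):
        parts.setdefault("root", name)      # the first start tag is the Document Element

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return parts

sample = ('<?xml version="1.0" encoding="utf-8"?>\n'
          '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>\n'
          '<note>hello</note>')
print(describe(sample))
```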


XML Encryption

XML Encryption can encrypt an entire XML element, including its start and end tags, or only the content between those tags, so that the sensitive parts of a document are protected. The encrypted result is represented by an <EncryptedData> element, which carries the information needed for encryption and decryption: the encryption algorithm, the key used (or information identifying it), references to external data objects, and either the encrypted data itself or a reference to it. Consider the following XML document to be encrypted:

Listing: 1.9 unencrypted samples

<?xml version="1.0" encoding="utf-8" ?>









Applying the encryption process defined by the XML Encryption specification to the previous XML file produces the following result:

Listing: 1.10 Encrypted XML

<?xml version="1.0" encoding="utf-8" ?>





xmlns:xenc='http://www.w3.org/2000/11/temp-xmlenc' Type="Element">
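The ciphertext inside <EncryptedData> comes from a real cipher, which Python's standard library does not provide. The sketch below therefore only assembles the envelope around bytes that some cryptographic library has already encrypted. It uses the namespace and algorithm identifiers from the final W3C XML Encryption Recommendation (http://www.w3.org/2001/04/xmlenc#) rather than the draft temp-xmlenc namespace in the listing; the function name and key name are hypothetical, and the bytes in the usage line are placeholders, not real ciphertext.

```python
import base64
import xml.etree.ElementTree as ET

XENC = "http://www.w3.org/2001/04/xmlenc#"   # XML Encryption namespace
DS = "http://www.w3.org/2000/09/xmldsig#"    # XML Signature namespace (KeyInfo)
ET.register_namespace("xenc", XENC)
ET.register_namespace("ds", DS)

def encrypted_data_envelope(ciphertext: bytes, key_name: str) -> ET.Element:
    """Wrap already-encrypted bytes in an <EncryptedData> element (envelope only)."""
    ed = ET.Element(f"{{{XENC}}}EncryptedData", Type=XENC + "Element")
    ET.SubElement(ed, f"{{{XENC}}}EncryptionMethod", Algorithm=XENC + "aes128-cbc")
    key_info = ET.SubElement(ed, f"{{{DS}}}KeyInfo")
    ET.SubElement(key_info, f"{{{DS}}}KeyName").text = key_name
    cipher_data = ET.SubElement(ed, f"{{{XENC}}}CipherData")
    # CipherValue carries the ciphertext as base64 text.
    ET.SubElement(cipher_data, f"{{{XENC}}}CipherValue").text = (
        base64.b64encode(ciphertext).decode("ascii"))
    return ed

print(ET.tostring(encrypted_data_envelope(b"\x00\x01", "hypothetical-key"), encoding="unicode"))
```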






XML is a mechanism for describing data in a format that applications can understand regardless of how they need to read it, and it makes it possible to express the same data in multiple forms. This article gave a functional overview of XML's features and key concepts, and explained why it is so useful, including its advantages over its predecessor, SGML. It should help you understand how XML can be leveraged in your web applications. Finally, it discussed the risks of using XML improperly and how to secure data that is processed as XML.

