Part 1: Introduction
The information revolution, which gave rise to the Internet and to modern communication technologies, has pushed our society more and more toward the use and management of information in digital format. Enormous numbers of data items cross the Internet every day; together they can be pictured as a continuous stream of data flowing around the entire globe. As the quantity, and especially the importance, of this information has grown, so has the need for systems designed to guarantee a good level of protection and security.
Among the most effective protections currently in use are cryptographic algorithms, which can maintain the confidentiality of data both in transit from a source to a destination and when stored on magnetic media. The peculiarity of encryption is that, although it is relatively hard to determine exactly which algorithm has been applied to a given piece of high-entropy data, it is very easy for an attacker to recognize that the information is protected by cryptographic methods. This naturally draws attention to the data, and it almost always leads to subsequent cryptanalysis and/or brute-force attacks aimed at recovering the cleartext. Avoiding such attacks, even on information hypothetically considered “safe,” is the ultimate aim of steganography or, more generally, of steganographic algorithms.
It is easy to see why it can be more prudent to store and/or disclose information inside cover digital formats such as images, audio, video files, and so on: the presence of “something” of importance is not immediately perceived, which adds a further layer of security to our activities. Steganalysis is the other side of the same coin. Steganography tries to “hide” potentially sensitive information inside cover media, while steganalysis tries to detect the presence of this hidden information with the lowest possible error rate. Both are old and complex subjects, on which many theses have been written and which have inspired countless approaches to the implementation of new masking techniques and to their analysis.
To appreciate the breadth of this subject, consider that steganographic and steganalytical techniques are applicable not only to digital media in a broad sense (including apparently clear text), but also to transmission channels, through the exploitation of redundant fields in the TCP, UDP, and IP protocols. This document, however, discusses the most widely used steganographic techniques as applied to common image formats, while also looking at the common techniques adopted by forensic analysts.
Image formats are certainly among the most widely used cover formats, and perhaps also the best known to non-specialists.
Very often we hear (or have heard) that highly sensitive information about possible terrorist targets, or about recruitment campaigns for subversive groups, is hidden within images posted on social networks, which often depict landscapes or harmless subjects.
Such cover techniques are currently in frequent use, which is why it is often important to be able to determine whether valuable information could be hidden within a picture. That is precisely the goal of steganalysis; describing steganalysis as a method for recovering hidden information would not be correct. Steganalysis aims simply to establish whether additional data could be hidden in a cover medium. It usually does so through two major categories of techniques: specific and generic. Specific steganalytical techniques target the particular steganographic algorithm thought to have been used on the cover media, while generic techniques, although they draw on several useful properties of the media, do not assume that any particular algorithm was used.
Steganographic systems are very often based on the insertion of arbitrary data by algorithms that exploit the LSB (least significant bit) of the pixels of an image. This is one of the so-called “replacement techniques,” so named because they replace part of the original image with arbitrary content. These systems work well with a 24-bit bitmap file, i.e., one that has 24 bits per pixel.
In such images, each pixel is associated with a color, and each color is determined by the combination of red, green, and blue values (the RGB triplet). Each pixel is therefore composed of three bytes (24 bits), one per color component. Put simply, the steganographic technique that exploits the LSB replaces the last bit of each of the three bytes of the pixel with a bit of the data to be hidden. This usually causes only a slight change in intensity, which is generally not perceptible to the human eye in images that contain a wide range of colors. A good implementation of this technique will not alter a monochromatic area of the cover media, since the alteration would probably be visible even to the naked eye.
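To make the replacement step concrete, here is a minimal sketch in Python of LSB replacement on a single 24-bit pixel. The function name `embed_bits_in_pixel` and the sample pixel values are illustrative assumptions, not taken from any particular tool.

```python
def embed_bits_in_pixel(pixel, bits):
    """Replace the LSB of each (R, G, B) byte with one message bit.

    `pixel` is a tuple of three integers in 0..255 and `bits` is a tuple
    of three values, each 0 or 1. Hypothetical helper for illustration.
    """
    # 0xFE clears the last bit; OR-ing then writes the message bit there.
    return tuple((channel & 0xFE) | bit for channel, bit in zip(pixel, bits))


original = (200, 117, 54)                         # arbitrary example color
stego = embed_bits_in_pixel(original, (1, 0, 1))  # hide the bits 1, 0, 1
print(original, "->", stego)                      # (200, 117, 54) -> (201, 116, 55)
```

Each color component changes by at most one intensity level out of 256, which is why the alteration is normally invisible in areas rich in color.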
Ideally, the LSBs to replace should be chosen carefully (the so-called relative bits) in order to guarantee a better result (i.e., “exact bits selection”), but algorithms more commonly use one of the following LSB choices:
- Sequential selection (LSBs selected in sequential order, starting from a specific location in the image)
- Pseudo-random selection (LSBs selected by an algorithm that aims to spread the secret text as uniformly and unpredictably as possible within the cover image)
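The two selection strategies listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the carrier is treated as a flat sequence of color bytes, the receiver already knows the message length, and Python's `random.Random` seeded with a shared key stands in for the pseudo-random generator. A real tool would use a cryptographically stronger generator and store the length in a header. All function names are hypothetical.

```python
import random


def message_to_bits(message: bytes):
    """Expand each byte into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]


def bits_to_message(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def select_positions(carrier_len, n_bits, key=None):
    """Sequential selection when key is None, keyed pseudo-random otherwise."""
    if n_bits > carrier_len:
        raise ValueError("message too long for this carrier")
    if key is None:
        return range(n_bits)  # first n_bits carrier bytes, in order
    # Same key -> same distinct positions on the embedding and extracting side.
    return random.Random(key).sample(range(carrier_len), n_bits)


def embed(carrier: bytes, message: bytes, key=None) -> bytes:
    bits = message_to_bits(message)
    stego = bytearray(carrier)
    for pos, bit in zip(select_positions(len(carrier), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite only the LSB
    return bytes(stego)


def extract(stego: bytes, n_bytes: int, key=None) -> bytes:
    positions = select_positions(len(stego), n_bytes * 8, key)
    return bits_to_message([stego[pos] & 1 for pos in positions])


carrier = bytes(range(256)) * 2                 # stand-in for raw pixel bytes
sequential = embed(carrier, b"hidden")
scattered = embed(carrier, b"hidden", key=1234)
print(extract(sequential, 6), extract(scattered, 6, key=1234))
```

With sequential selection every change is concentrated at the start of the carrier, whereas the keyed variant scatters the same number of changes over the whole carrier. In both cases the output has exactly the same length as the input.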
Eight-bit images are coded through the use of an indexed color palette. The palette holds up to 256 colors, and each pixel in the image takes the value of one of the palette entries.
Put simply, the pixels are represented by pointers into this palette instead of by an RGB triplet. The LSB technique applied to eight-bit images (e.g., a GIF) aims to reduce the number of colors used in the picture (obviously to a value less than 256), which creates gaps in the palette (left by the original colors) that are then filled with similar colors generated by modifying the LSB. It is important to note that replacing the LSB does not, in general, increase the size of the image. This is a first characteristic that makes image artifacts difficult to recognize. A second is that the forensic analyst usually does not have a copy of the original image, against which it would be relatively easy to trace some sort of tampering.
However, steganalytical techniques can be quite effective in attacking these mechanisms, often in close relation to the type of format to be analyzed. LSB exploitation undoubtedly has the advantage of being a very simple steganographic method to implement, and it is today the most widespread technique in the field, even though it offers little resistance to transformation and manipulation of the cover media. In view of this, a visual attack is the quickest way to verify the presence of hidden content in the BMP or GIF format. This approach is particularly effective in detecting content inserted through LSB algorithms in these formats, but it is not as effective against slightly more sophisticated techniques such as JSteg, which does not directly modify the LSBs of the pixels of an image (spatial domain), but acts in the frequency domain of the image (this concept will be detailed below). For now, it suffices to say that, although LSB replacement is particularly easy to implement and allows a nearly total recovery of the hidden data, it is also very sensitive to processing of the image with special filters, which makes it very easy to unmask.
By way of example, the images shown below represent just a quick comparison between two seemingly identical files:
The left image is an unaltered BMP.
The right image, by contrast, contains hidden content inserted through sequential LSB replacement. Subjecting both images to a visual filtering attack (LSB0) produces visibly different results: in the right image, the bits that represent the hidden information stand out.
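The idea behind such a visual filter can be sketched in a few lines of Python. This is only an illustration of the principle, not the tool used to produce the images: every byte is mapped to black or white according to its least significant bit, so that regions carrying embedded data appear as noise. The name `lsb_plane` and the sample data are assumptions.

```python
def lsb_plane(samples: bytes) -> bytes:
    """Map each byte to 255 (white) if its LSB is 1, otherwise 0 (black)."""
    return bytes(255 if s & 1 else 0 for s in samples)


# A uniform area of 16 identical bytes: its LSB plane is uniformly black.
flat_area = bytes([128] * 16)

# The same area after sequentially embedding the 8 bits of 'H' (0x48).
stego_area = bytearray(flat_area)
for i, bit in enumerate([0, 1, 0, 0, 1, 0, 0, 0]):
    stego_area[i] = (stego_area[i] & 0xFE) | bit

print(list(lsb_plane(flat_area)))
print(list(lsb_plane(bytes(stego_area))))
```

In the second output, white values appear only where message bits equal to 1 were written, which is the kind of pattern an analyst looks for in the filtered image.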
At this point, you should understand that recovering the hidden message itself is beyond the scope of steganalysis; in some cases, it is even beyond cryptanalysis. Two additional techniques, used together with LSB replacement to increase its security and robustness, are also worth mentioning. The first is simply to encrypt the text to be hidden, so that, even if it is fully recovered, understanding its content remains a considerable additional challenge.
The second is the generation of pseudo-random number sequences (pseudo-random selection) that determine which pixels will have their LSBs altered. This spreads the hidden message across the original image and so avoids excessively disturbing the statistical footprint of the original image (on which other steganalytical techniques, discussed later, are based).
In the second part of this document, we will discuss JPEG and other steganographic methods such as JSteg, F5, and OutGuess and also other steganalytical techniques, such as the χ² attack and statistical generic attacks.