Steganography and Steganalysis: Common Image Formats and LSB Part 2

December 17, 2013 by Emanuele De Lucia

JPEG

One of the most common image formats is JPEG. It deserves a discussion of its own, not least because it is very frequently used as cover media, usually in association with the following steganographic algorithms:

  • Jsteg
  • F5
  • Outguess

JSteg is one of the classic steganographic algorithms. It is generally regarded as the first publicly available algorithm of its kind for JPEG images, and it is perhaps also one of the most used. Its working routine does nothing more than sequentially replace the LSBs of the DCT coefficients with bits of the data to be hidden. To better understand this, we need a quick explanation of the DCT.

To explain what the DCT is, keep in mind that, for each color component of an image, the JPEG format uses a mathematical function called the Discrete Cosine Transform, or simply DCT, to convert each 8x8 pixel block of the image (also called a canonical base) into 64 corresponding DCT coefficients. Normally, the 8x8 pixels of each block are transformed with the following formula:
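For reference, the forward 8x8 DCT is usually written as shown here. The symbols are the conventional ones and are assumed rather than taken from the article: f(x, y) is the pixel value at position (x, y) of the block (after the level shift that JPEG applies), and F(u, v) is the coefficient for the frequency indices u and v.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```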

This formula transforms a canonical base in the spatial domain into a corresponding 8x8 block in the frequency domain. Put more simply, an 8x8 pixel block is converted into a frequency spectrum that can be pictured using only black and white pixels. The areas with a greater density of white pixels are those with higher frequencies, which are barely perceptible to the human eye and are therefore expendable. It is by discarding these areas that the compression algorithm becomes so efficient.
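The transform can be tried out with a minimal Python sketch. It assumes the standard JPEG normalization of the 8x8 DCT; the names dct2 and idct2 are illustrative and do not come from any library. It is written for readability, not speed.

```python
import math

N = 8  # JPEG works on 8x8 blocks

# Precomputed cosine table: COS[i][k] = cos((2i + 1) * k * pi / 16)
COS = [[math.cos((2 * i + 1) * k * math.pi / 16) for k in range(N)]
       for i in range(N)]


def _scale(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """Forward DCT of an 8x8 block (spatial domain -> 64 coefficients)."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(block[x][y] * COS[x][u] * COS[y][v]
                        for x in range(N) for y in range(N))
            coeffs[u][v] = 0.25 * _scale(u) * _scale(v) * total
    return coeffs


def idct2(coeffs):
    """Inverse DCT of an 8x8 block (64 coefficients -> spatial domain)."""
    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = sum(_scale(u) * _scale(v) * coeffs[u][v]
                        * COS[x][u] * COS[y][v]
                        for u in range(N) for v in range(N))
            block[x][y] = 0.25 * total
    return block
```

A uniform block yields a single non-zero coefficient at position (0, 0). Altering any one coefficient and applying idct2 changes every pixel of the block, which is why edits made in the frequency domain are spread out rather than localized.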

JSteg exploits the LSBs of the DCT coefficients as redundant bits in which to insert hidden content within a cover media. It is useful to note that the modification of a single DCT coefficient affects all 64 pixels in the block. However, because changes occur in the frequency domain rather than in the spatial domain, JSteg is not susceptible to the visual attacks mentioned above. Below, for a clearer understanding of this algorithm, is a representation in simplified pseudo-code:

[c]
while (bits remain to embed) {
    read next DCT coefficient from JPEG
    if (DCT != 0 and DCT != 1) {
        take next bit of data to be hidden
        replace LSB of DCT with that bit
    }
    write DCT back into cover media
}
[/c]
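The pseudo-code above can be turned into a small runnable Python sketch. It assumes the quantized DCT coefficients are already available as a flat list of integers (reading them out of a real JPEG file is not shown), and it leaves out the length header that an actual tool would embed. The function names are illustrative.

```python
def jsteg_embed(coeffs, bits):
    """Sequentially write message bits into the LSBs of usable coefficients.

    Coefficients equal to 0 or 1 are skipped, as in the pseudo-code.
    """
    out = list(coeffs)
    pending = iter(bits)
    bit = next(pending, None)
    for i, c in enumerate(out):
        if bit is None:           # whole message embedded
            break
        if c in (0, 1):           # unusable coefficient, leave untouched
            continue
        out[i] = (c & ~1) | bit   # replace the least significant bit
        bit = next(pending, None)
    if bit is not None:
        raise ValueError("message does not fit into the cover")
    return out


def jsteg_extract(coeffs, n_bits):
    """Read back the first n_bits LSBs from the usable coefficients."""
    usable = [c & 1 for c in coeffs if c not in (0, 1)]
    return usable[:n_bits]


if __name__ == "__main__":
    cover = [12, 0, -3, 1, 5, 2, 0, -1, 7]
    message = [1, 0, 1, 1, 0]
    stego = jsteg_embed(cover, message)
    print(stego)
    print(jsteg_extract(stego, len(message)))
```

Replacing the LSB never turns a usable coefficient into 0 or 1 (the pairs are 2/3, -2/-1, -4/-3 and so on), so the extractor skips exactly the same positions as the embedder.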

As is easily observable, the algorithm does not use a secret key (stego-key) shared with the intended reader of the hidden message. Anyone who knows the algorithm can recover the hidden message. Although this method ensures effective protection against visual attacks, it is quite vulnerable to statistical attacks. This is mainly because the data to be hidden is inserted sequentially into the DCT coefficients, and not in pseudo-random (or, more simply, "scattered") order, causing evident statistical alterations that are visible in the histogram of the DCT coefficients.

F5, by contrast, is a steganographic method that is much more complex not only than JSteg, but also than its own predecessor (F4, not treated here).

It is based on two distinct phases:

  1. Permuting the sequence of DCT coefficients to be altered by means of pseudo-random numbers derived from a secret key (which distributes the changes in "scattered" order within the image).
  2. The use of so-called matrix encoding (which improves embedding efficiency by significantly decreasing the proportion of alterations needed to embed hidden data within an image).
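Both phases can be illustrated with a simplified Python sketch. It is not the F5 implementation: the random module stands in for a cryptographically strong PRNG, the code works on a plain list of LSBs instead of real DCT coefficients, and it flips bits where F5 decrements absolute values. All names are illustrative.

```python
import random


def keyed_order(n_coeffs, key):
    """Phase 1: a key-dependent permutation of the coefficient indices."""
    order = list(range(n_coeffs))
    random.Random(key).shuffle(order)  # same key -> same visiting order
    return order


def syndrome(bits):
    """XOR of the 1-based positions whose bit is set."""
    value = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            value ^= position
    return value


def matrix_embed(bits, message, k):
    """Phase 2: hide a k-bit message in 2**k - 1 LSBs, changing at most one."""
    n = (1 << k) - 1
    if len(bits) != n or not 0 <= message < (1 << k):
        raise ValueError("need exactly 2**k - 1 bits and a k-bit message")
    out = list(bits)
    flip = syndrome(out) ^ message   # 0 means the bits already encode it
    if flip:
        out[flip - 1] ^= 1           # one change fixes the syndrome
    return out


def matrix_extract(bits):
    """The receiver simply recomputes the syndrome."""
    return syndrome(bits)
```

With k = 2, two message bits are carried by three coefficients at the cost of at most one change. Larger k lowers the number of changes per embedded bit but also lowers capacity.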

A detailed discussion of this fairly complex algorithm is beyond the scope of this article. However, it is important to remember that it strongly reduces the number of DCT coefficients that must be modified to store the secret information, consequently reducing the changes that occur in the cover image.

Thanks to these features, F5 also has good resistance against statistical attacks. However, to improve the detection of information hidden through F5, we can approach the problem in a different way. One approach worth mentioning is the one proposed by J. Fridrich. Leaving aside the various technicalities and mathematical relationships, the main steps of the attack on F5 proposed by the author can be summarized and simplified as follows:

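  1. The suspect image is decompressed into the spatial domain.
  2. The image thus obtained is cropped by a few pixels, which shifts the grid of 8x8 blocks.
  3. The cropped image is recompressed with the same quantization settings as the suspect image.
  4. The image obtained now represents an estimate of the original cover image.
  5. Calibration: the statistics of this estimate (in particular the histogram of its DCT coefficients) are taken as the baseline.
  6. Final analysis: the baseline is compared with the statistics of the suspect image.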

A quick representation of this process is shown below:

OutGuess can be considered a middle ground between JSteg and F5. It also uses a pseudo-random number generator (PRNG) seeded with a key to select the DCT coefficients in which the data to be hidden is inserted. Here, too, it is the LSB of the selected coefficients that is changed.
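The key-driven selection can be sketched in Python as follows. This is only an illustration of the idea, not the OutGuess implementation: the random module stands in for the tool's own PRNG, coefficients equal to 0 or 1 are assumed to be skipped as in JSteg, and the statistical correction step of the real tool is omitted. The function names are hypothetical.

```python
import random


def keyed_positions(coeffs, key, count):
    """Pick `count` usable coefficient indices in a key-dependent order."""
    usable = [i for i, c in enumerate(coeffs) if c not in (0, 1)]
    if count > len(usable):
        raise ValueError("message does not fit into the cover")
    return random.Random(key).sample(usable, count)


def scattered_embed(coeffs, bits, key):
    """Write each bit into the LSB of a pseudo-randomly chosen coefficient."""
    out = list(coeffs)
    for position, bit in zip(keyed_positions(coeffs, key, len(bits)), bits):
        out[position] = (out[position] & ~1) | bit
    return out


def scattered_extract(coeffs, key, n_bits):
    """Regenerate the same positions from the key and read the LSBs."""
    return [coeffs[p] & 1 for p in keyed_positions(coeffs, key, n_bits)]
```

Because LSB replacement never creates or removes a 0 or a 1, the set of usable coefficients is the same before and after embedding, so the receiver derives the same positions from the shared key.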

The algorithms just mentioned are the most commonly used ones that rely on the modification of the DCT coefficients. These techniques act on the frequency domain of an image rather than on its spatial domain, and therefore gain strong resistance to visual attacks (as long as the changes are not overdone). Compared with sequential LSB replacement in the spatial domain, they also offer greater robustness to transformations such as lossy compression. A downside of these techniques, however, is their reduced data capacity.

Statistical Steganalysis

As the name suggests, statistical attacks look for and evaluate the properties of an image that are "statistically" altered when steganography is applied to it. Usually, these properties range from the analysis of the chromatic spectrum to the symmetry of the color histograms and, of course, the internal entropy (the so-called "noise"). Starting from the hypothesis that the file has no hidden content (in other words, from a common baseline representing normal file characteristics), we search the file for anomalies, applying algorithms that are generally very effective in detecting alterations and, consequently, hidden data.

χ² Attack

The Chi-Square attack is usually applied to images that have been altered by sequential LSB replacement. It is perhaps the best-known statistical attack against steganography. Given an image, data inserted through sequential LSB replacement changes the histogram of the color frequencies in a predictable way.

The χ² test is based on the fact that sequential LSB replacement inevitably forms pairs of interchangeable values, called PoVs (Pairs of Values). Take, for example, the color index 110, whose least significant bit is 0. If the bit to be embedded is 0, the value remains unchanged; if it is 1, 110 becomes 111. Conversely, 111 either stays as it is or becomes 110. The two values can only turn into each other, and so they form a pair.

Statistically, in an original image the frequencies of the two values of each pair are quite different. When these frequencies appear similar, the image probably contains hidden information. Obviously, the outcome changes depending on the length of the message. The following image shows the result of a Chi-Square attack against sequential LSB replacement:

A line close to the upper limit of the chart indicates a high probability of hidden content. In contrast, a value close to the lower limit of the chart indicates a low probability that hidden information is present. The "p-value", expressed as a percentage, represents the probability that a given image has hidden content inside ("probability of embedding"). A variant of this attack analyzes image content that was not embedded sequentially (and so is useful against non-sequential LSB replacement algorithms). Instead of analyzing the image in a sequential manner, "windows" of the image are selected and then analyzed individually. For each window examined, the percentage of completion grows and a "p-value" is recorded. At the end of the process, an overall value of p is calculated from all the intermediate values (rescaling).
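The basic test on pairs of values can be sketched in Python as follows. The sketch assumes 8-bit samples (color indices or pixel values) in a flat list, skips sparsely populated pairs using the usual rule of thumb of an expected count of at least 5, and approximates the chi-square distribution with the Wilson-Hilferty formula so that only the standard library is needed. A statistics package would normally supply the exact value. All names are illustrative.

```python
import math
from collections import Counter


def chi2_survival(chi2, dof):
    """Approximate P(X >= chi2) for a chi-square variable (Wilson-Hilferty)."""
    spread = 2.0 / (9.0 * dof)
    z = ((chi2 / dof) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def pov_chi_square(samples, min_expected=5.0):
    """Return (chi-square statistic, estimated probability of embedding)."""
    hist = Counter(samples)
    chi2, categories = 0.0, 0
    for even in range(0, 256, 2):
        # After LSB replacement both members of a pair tend to this count.
        expected = (hist[even] + hist[even + 1]) / 2.0
        if expected < min_expected:
            continue                      # too few samples in this pair
        chi2 += (hist[even] - expected) ** 2 / expected
        categories += 1
    if categories < 2:
        raise ValueError("not enough populated pairs of values")
    return chi2, chi2_survival(chi2, categories - 1)
```

A statistic that is small relative to the number of pairs means the two members of each pair occur almost equally often, so the returned probability approaches 1. Strongly unbalanced pairs push it towards 0. For the windowed variant, the same function can be called on growing or sliding slices of the sample list.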

Dual Statistics Attack

This method is typically used on images embedded with a pseudo-random LSB replacement scheme. The dual statistics technique focuses on the LSB plane (basically, the plane formed by the least significant bits of all the pixels) as well as on the position that a pixel occupies within the image. In practice, it evaluates the LSB in combination with the remaining seven bits of the same byte, statistically testing how well the image retains information while it is being subjected to specific filters.
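One well-known member of this family is RS analysis. The article does not name it, so treating it as the technique meant here is an assumption. The Python sketch below shows only its counting step: pixels are taken in small groups, some of them are "flipped" according to a mask, and each group is classified as regular (it became noisier) or singular (it became smoother). The mask, the group size and all names are illustrative choices.

```python
def flip_positive(value):
    """F1: swaps 0<->1, 2<->3, ... (what LSB replacement itself does)."""
    return value ^ 1


def flip_negative(value):
    """F-1: the shifted pairing -1<->0, 1<->2, 3<->4, ..."""
    return ((value + 1) ^ 1) - 1


def smoothness(group):
    """Discrimination function: sum of differences between neighbours."""
    return sum(abs(a - b) for a, b in zip(group, group[1:]))


def rs_counts(pixels, mask=(0, 1, 1, 0)):
    """Count regular (R) and singular (S) groups for both flipping types."""
    size = len(mask)
    counts = {"R_pos": 0, "S_pos": 0, "R_neg": 0, "S_neg": 0}
    for start in range(0, len(pixels) - size + 1, size):
        group = pixels[start:start + size]
        before = smoothness(group)
        for name, flip in (("pos", flip_positive), ("neg", flip_negative)):
            flipped = [flip(p) if m else p for p, m in zip(group, mask)]
            after = smoothness(flipped)
            if after > before:
                counts["R_" + name] += 1   # flipping added noise
            elif after < before:
                counts["S_" + name] += 1   # flipping removed noise
    return counts
```

In a clean image the counts for the two flipping types are expected to be close to each other. As more LSBs are replaced, R_pos and S_pos drift together while R_neg and S_neg drift apart, and that gap is what the full method uses to estimate the message length.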

Blind Analysis

Blind analysis is a generic technique that is not tied to any specific steganographic algorithm. It is designed to be used against all embedding techniques and all image formats. These algorithms are fundamentally based on the ability to detect changes in those characteristics that, statistically, are not preserved during the embedding process. The characteristics are usually processed and evaluated starting from a base of thousands of image samples, which are deliberately subjected to steganographic algorithms in order to analyze the changes in their properties, varying both the length of the embedded message and the steganographic algorithm used. Through the composition of these "bases of behavior," these techniques are quite efficient in distinguishing a natural image from an altered one.

Hints of Advanced Steganography

By now, it is easy to see that, like many other branches of cyber security, steganographic techniques are gradually evolving in order to become more effective. On one side, the aim is to reach the point where a reliable estimation of an image sample becomes practically impossible; on the other, the algorithms used to discover anomalies are continuously being improved.


References

  1. http://www.dmi.unict.it/~battiato/CF1213/Steganografia e steganalisi nelle immagini il least significant bit - Casistica.pdf
  2. http://ricerca.mat.uniroma3.it/users/merola/critto/crittoinfo/STEGANOGRAFIA.pdf
  3. http://www.di.unisa.it/~ads/corso-security/www/CORSO-0203/steganografia.pdf
  4. http://cs.marlboro.edu/term/spring06/programming_workshop/students/dgg/work/week_six.attachments/breaking_f5.pdf
  5. http://cscjournals.org/csc/manuscript/Journals/IJCSS/volume6/Issue3/IJCSS-670.pdf
Emanuele De Lucia

Emanuele is a passionate information security professional. He's worked as a tier-two security analyst in the Security Operation Center (Se.O.C. or S.O.C.) of one of the largest Italian telecom companies, as well as a code security specialist in one of the world's largest multinational corporations.

Currently, he works as an information security manager at one of the main facilities of an international organization. With a strong technical background, he specializes in offensive security, reverse engineering, forensic investigations, threat analysis and incident management.

He holds a bachelor's degree in Computer Science and a master's in Computer Security and Forensic Investigations. He also holds the following professional certifications: CISSP, MCSE+Sec, C|EH, E|CSA/L|PT, CIFI, CREA, Security+ and CCNA+Sec.