JPEG

One of the most common image formats is JPEG. It deserves separate discussion because it is very frequently used as cover media, usually in association with the following steganographic algorithms:

  • JSteg
  • F5
  • Outguess

JSteg is one of the classic steganographic algorithms. It is widely regarded as the first of its kind for JPEG images and is perhaps also the most used. Its working routine does nothing more than sequentially replace the LSBs of the DCT coefficients with bits of the data to be hidden. To better understand this, we need a quick explanation of the DCT.

To explain what the DCT is, keep in mind that, for each color component of an image, the JPEG format uses a mathematical function called the Discrete Cosine Transform, or simply DCT, to convert 8×8 pixel blocks (here also called canonical bases) of the image into 64 corresponding DCT coefficients. Each 8×8 block is transformed with the standard two-dimensional forward DCT.
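For reference, the forward transform that JPEG applies to an 8×8 block is commonly written as shown below. The symbol names are chosen here for illustration: f(x, y) is the sample value at position (x, y) of the block, F(u, v) is the coefficient at frequency indices (u, v), and C is a normalization factor.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v)
         \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
C(k) =
\begin{cases}
  \dfrac{1}{\sqrt{2}} & \text{if } k = 0 \\[4pt]
  1 & \text{otherwise}
\end{cases}
```

F(0, 0) is usually called the DC coefficient; the other 63 are the AC coefficients.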

This transformation maps an 8×8 pixel block in the spatial domain to a corresponding 8×8 block of coefficients in the frequency domain. Put simply, the block is re-expressed as a weighted sum of 64 basis patterns, which are often visualized as black-and-white tiles of increasing spatial frequency. The higher-frequency components are the ones the human eye perceives least, and are therefore expendable. It is by coarsely quantizing or discarding these components that the compression algorithm is so efficient.
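As a minimal sketch of this transform, the following pure-Python code computes the 8×8 forward and inverse DCT directly from the textbook definition. The function names (`dct2`, `idct2`, `pixels_changed`) are illustrative assumptions, and real JPEG codecs use much faster factorizations plus quantization, which is omitted here.

```python
import math

N = 8  # JPEG block size


def c(k):
    """Normalization factor C(k) of the 8x8 DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, k):
    """Cosine basis value for sample index i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT: 8x8 samples -> 8x8 coefficients."""
    return [[0.25 * c(u) * c(v) * sum(block[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: 8x8 coefficients -> 8x8 samples."""
    return [[0.25 * sum(c(u) * c(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def pixels_changed(coeffs, u, v, delta=1.0, eps=1e-9):
    """Count the samples that differ after adding delta to one coefficient."""
    before = idct2(coeffs)
    altered = [row[:] for row in coeffs]
    altered[u][v] += delta
    after = idct2(altered)
    return sum(abs(after[x][y] - before[x][y]) > eps
               for x in range(N) for y in range(N))


flat = [[100.0] * N for _ in range(N)]  # uniform gray block
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))           # DC coefficient of the flat block
print(pixels_changed(coeffs, 1, 2))     # samples touched by one AC change
```

For the flat block only the DC coefficient is nonzero, and changing the single coefficient at (1, 2) alters all 64 samples of the block, which is why modifications in the frequency domain are spread over the whole block.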

JSteg exploits the LSBs of the DCT coefficients as redundant bits in which to insert hidden content within the cover media. It is worth noting that modifying a single DCT coefficient affects all 64 pixels in the block. Because the changes occur in the frequency domain rather than in the spatial domain, JSteg is not susceptible to the visual attacks mentioned above. For a clearer understanding of this algorithm, here is a representation in simplified pseudo-code:

while (data remains to embed) {
    read next DCT coefficient from JPEG
    if DCT != 0 and DCT != 1 {
        get next bit of data to be hidden
        replace LSB of DCT with that bit
    }
    write DCT coefficient back into cover media
}
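The pseudo-code above can be sketched in Python as follows. This is a simplified illustration rather than the original JSteg implementation: it assumes the quantized DCT coefficients are already available as a flat list of integers, the function names `jsteg_embed` and `jsteg_extract` are hypothetical, and the message-length header that a real tool would need is omitted.

```python
def jsteg_embed(coeffs, bits):
    """Sequentially replace the LSB of each usable coefficient with a message bit.

    Coefficients equal to 0 or 1 are skipped, as in the pseudo-code.
    """
    stego = list(coeffs)
    pos = 0  # index of the next message bit to embed
    for i, coeff in enumerate(stego):
        if pos == len(bits):
            break
        if coeff in (0, 1):
            continue
        stego[i] = (coeff & ~1) | bits[pos]  # clear the LSB, then set it to the bit
        pos += 1
    if pos < len(bits):
        raise ValueError("cover has too few usable coefficients")
    return stego


def jsteg_extract(stego, n_bits):
    """Read back the first n_bits LSBs from the usable coefficients, in order."""
    return [coeff & 1 for coeff in stego if coeff not in (0, 1)][:n_bits]


cover = [12, 0, -3, 1, 5, 0, 0, -1, 2, 7]
message = [1, 0, 1, 1]
stego = jsteg_embed(cover, message)
print(stego)
print(jsteg_extract(stego, len(message)))
```

Because an embedded coefficient can never become 0 or 1, the extractor skips exactly the same positions as the embedder, and no key is needed to read the message.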

As is easily observable, the algorithm does not take as input a secret key (stego-key) to be shared with whoever is meant to read the hidden message. Anyone who knows the algorithm can recover the hidden message. Although this method ensures effective protection against visual attacks, it is quite vulnerable to statistical attacks. This weakness is mainly due to the fact that the data to be hidden is inserted sequentially within the DCT coefficients, and not in pseudo-random order (or, more simply, “scattered”), causing evident statistical alterations that are visible in the histogram of the frequencies of the DCT coefficients.

F5, by contrast, defines a steganographic method that is much more complex, not only than JSteg but also than its own predecessor (F4, not covered here).

It is based on two distinct phases:

  1. Permuting the sequence of DCT coefficients to be altered, using pseudo-random numbers derived from a secret key (this distributes the changes across the image in “scattered” order).
  2. The use of so-called Matrix Encoding (which improves embedding efficiency by significantly decreasing the proportion of alterations needed to embed hidden data within an image).

A detailed discussion of this fairly complex algorithm is beyond the scope of this article. However, it is important to remember that it strongly reduces the number of DCT coefficients that must be modified to store the secret information, consequently reducing the changes that occur in the cover image.

Thanks to these features, F5 also has good resistance against statistical attacks. However, to improve the ability to detect information hidden through F5, we can approach the problem in a different way. One approach worth mentioning is the one proposed by J. Fridrich. Leaving aside the various technicalities and mathematical relationships, the main steps of the attack on F5 proposed by the author can be summarized and simplified as follows:

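  1. The suspect image is decompressed into the spatial domain.
  2. The spatial-domain image thus obtained is cropped by a few pixels, so that the new 8×8 block grid no longer lines up with the original one.
  3. The cropped image is recompressed with the same quantization settings.
  4. The image obtained now represents an estimate of the original cover image.
  5. This estimate is used to calibrate the analysis, providing reference statistics such as the histogram of the DCT coefficients.
  6. The final analysis compares the statistics of the image obtained by the steps above with those of the suspect one.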

[Figure: schematic representation of the steps above]

Outguess could be considered a middle ground between JSteg and F5. It also includes an encoding step that uses a pseudo-random number generator (PRNG) seeded with a key to select the DCT coefficients in which the data to be hidden is inserted. Here, too, it is the LSB of the selected coefficients that is changed.
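The idea of key-driven coefficient selection can be sketched as follows. This is an illustration of the principle only, not the Outguess implementation: it assumes a flat list of quantized coefficients, uses Python's `random.Random` (which is not cryptographically secure) as a stand-in for the tool's own generator, and the function names are hypothetical.

```python
import random


def keyed_positions(coeffs, key, n_bits):
    """Pick n_bits usable coefficient positions in a key-dependent order."""
    usable = [i for i, coeff in enumerate(coeffs) if coeff not in (0, 1)]
    if n_bits > len(usable):
        raise ValueError("cover has too few usable coefficients")
    random.Random(key).shuffle(usable)  # same key -> same permutation
    return usable[:n_bits]


def keyed_embed(coeffs, bits, key):
    """Replace the LSB of the key-selected coefficients with the message bits."""
    stego = list(coeffs)
    for pos, bit in zip(keyed_positions(coeffs, key, len(bits)), bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def keyed_extract(stego, n_bits, key):
    """Recompute the same positions from the key and read their LSBs."""
    return [stego[pos] & 1 for pos in keyed_positions(stego, key, n_bits)]


cover = [12, 0, -3, 1, 5, 0, 0, -1, 2, 7, -6, 9, 0, 4]
message = [1, 0, 0, 1, 1]
stego = keyed_embed(cover, message, key="stego-key")
print(keyed_extract(stego, len(message), key="stego-key"))
```

Embedding never turns a coefficient into 0 or 1, so the set of usable positions is identical in the cover and in the stego data, and the receiver can rebuild the same permutation from the shared key alone.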

The algorithms just mentioned are the most commonly used ones that rely on modifying the DCT coefficients. These techniques do not act on the spatial domain of the image but on its frequency domain, and therefore gain strong resistance to visual attacks (as long as the changes are not overdone). Compared with sequential LSB replacement in the spatial domain, they are also more robust to transformations such as lossy compression. A downside of these techniques, however, is their reduced embedding capacity.

Statistical Steganalysis

As the name suggests, statistical attacks look for and evaluate the properties of an image that are “statistically” altered when steganography is applied to it. These properties usually range from chromatic spectrum analysis to the symmetry of color histograms and, of course, the internal entropy (so-called “noise”). Starting from the null hypothesis that the file has no hidden content (in other words, from a common baseline representing normal file characteristics), we search the file for anomalies, applying algorithms that are generally very effective in detecting alterations and, consequently, hidden data.

Chi-Square (χ²) Attack

The Chi-Square attack is usually applied to images that have been altered by sequential LSB replacement. It is perhaps the best-known statistical attack against steganography. Given an image, data inserted through sequential LSB replacement changes the histogram of the color frequencies in a predictable way.

The χ² test is based on the fact that, during sequential LSB replacement, pairs of interchangeable values, called Pairs of Values (PoV), are inevitably formed. Take, for example, the color index 110, whose least significant bit is 0. If the message bit to embed is 0, the value remains unchanged; if the message bit is 1, 110 becomes 111. Conversely, 111 remains 111 when the message bit is 1 and becomes 110 when it is 0. The values 110 and 111 can therefore only turn into each other.

Statistically, for an original image, the frequencies of the two values of each pair are quite different. When these frequencies appear similar, the image probably contains hidden information. Obviously, the outcome changes depending on the length of the message. The following image shows the result of a Chi-Square attack against sequential LSB replacement:

A line close to the upper limit of the chart indicates a high probability of hidden content. In contrast, a value close to the lower limit of the chart indicates a low probability that hidden information is present. The “p-value”, expressed as a percentage, represents the probability that a given image has hidden content inside (“probability of embedding”). A variant of this attack analyzes image content that was not embedded sequentially (and is therefore useful against non-sequential LSB replacement algorithms). Instead of analyzing the image sequentially, “windows” of the image are selected and analyzed individually. For each window examined, the percentage of completion grows and a “p-value” is recorded. At the end of the process, an overall value of p is calculated from all the intermediate values (rescaling).
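The core of the test can be sketched as follows. This is a minimal illustration under stated assumptions: `pov_chi_square` and `lsb_embed_random` are hypothetical names, the “cover” is synthetic data deliberately skewed so that the two values of each pair have different frequencies, and only the χ² statistic and its degrees of freedom are computed (the p-value would then be read from the χ² distribution, which is left out to keep the sketch dependency-free).

```python
import random


def pov_chi_square(values, levels=256):
    """Chi-square statistic comparing each even value with its pair average.

    Returns (statistic, degrees_of_freedom). Pairs that never occur are skipped.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    stat, pairs = 0.0, 0
    for even in range(0, levels, 2):
        expected = (hist[even] + hist[even + 1]) / 2  # frequency if the pair is equalized
        if expected > 0:
            stat += (hist[even] - expected) ** 2 / expected
            pairs += 1
    return stat, pairs - 1


def lsb_embed_random(values, rng):
    """Simulate full-capacity LSB replacement with random message bits."""
    return [(v & ~1) | rng.getrandbits(1) for v in values]


rng = random.Random(1)
# Synthetic cover: within each pair, the even value is about three times more frequent.
cover = [2 * rng.randrange(64) + (1 if rng.random() < 0.25 else 0)
         for _ in range(20000)]
stego = lsb_embed_random(cover, rng)

print(pov_chi_square(cover))
print(pov_chi_square(stego))
```

A small statistic means the two frequencies of each pair are nearly equal, which is what LSB replacement produces, so the statistic of the stego data is far lower than that of the cover.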

Dual Statistics Attack

This method is typically used on images embedded with a pseudo-random LSB replacement scheme. The dual statistics technique focuses on the LSB plane (essentially, the plane formed by the least significant bit of every pixel) as well as on the position that each pixel occupies within the image. In practice, it evaluates the LSB in combination with the remaining seven bits of the same byte, statistically testing the ability of the image to retain its information when subjected to specific filters.

Blind Analysis

Blind analysis is a generic analysis technique that is not tied to any specific steganographic algorithm. It is designed to be used against all embedding techniques and all image formats. These algorithms are fundamentally based on the ability to detect changes in those characteristics that are statistically not preserved during the embedding process. These characteristics are usually computed and evaluated from a base of thousands of image samples, which are deliberately subjected to steganographic algorithms in order to analyze the changes in their properties, varying both the length of the embedded message and the steganographic algorithm used. By building these “bases of behavior,” these techniques become quite efficient at distinguishing a natural image from an altered one.

Hints of Advanced Steganography

By now it is easy to see that, as in many other branches of cyber security, steganographic techniques are gradually evolving to become more effective. On one side, the aim is to make a reliable estimate of an image sample practically impossible; on the other, the algorithms used to discover anomalies are continually being improved.

References

  1. http://www.dmi.unict.it/~battiato/CF1213/Steganografia e steganalisi nelle immagini il least significant bit – Casistica.pdf
  2. http://ricerca.mat.uniroma3.it/users/merola/critto/crittoinfo/STEGANOGRAFIA.pdf
  3. http://www.di.unisa.it/~ads/corso-security/www/CORSO-0203/steganografia.pdf
  4. http://cs.marlboro.edu/term/spring06/programming_workshop/students/dgg/work/week_six.attachments/breaking_f5.pdf
  5. http://cscjournals.org/csc/manuscript/Journals/IJCSS/volume6/Issue3/IJCSS-670.pdf