File carving is a process used in computer forensics to extract data from a disk drive or other storage device without the assistance of the file system that originally created the file. It recovers files from unallocated space without any file system information and is used to recover data during a digital forensic investigation. It is also called “carving,” a general term for extracting structured data out of raw data based on format-specific characteristics present in the structured data.
As a forensics technique that recovers files based merely on file structure and content, without any matching file system metadata, file carving is most often used to recover files from the unallocated space in a drive. Unallocated space refers to the area of the drive which no longer holds any file information as indicated by the file system structures like the file table. In the case of damaged or missing file system structures, this may involve the whole drive. In simple words, many file systems do not zero out a file's data when the file is deleted. Instead, they simply discard the record of where it is stored. File carving is the process of reconstructing files by scanning the raw bytes of the disk and reassembling them. This is usually done by examining the header (the first few bytes) and footer (the last few bytes) of a file.
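To make the idea of scanning raw bytes concrete, here is a minimal Python sketch that lists every offset at which a JPEG header or footer signature appears in a raw image. The function names and the choice to read the whole image into memory are my own assumptions for illustration; a real carver would stream the data and handle fragmentation.

```python
from typing import Iterator, List, Tuple

JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def find_all(data: bytes, signature: bytes) -> Iterator[int]:
    """Yield every offset in data at which signature occurs."""
    pos = data.find(signature)
    while pos != -1:
        yield pos
        pos = data.find(signature, pos + 1)


def scan_image(path: str) -> Tuple[List[int], List[int]]:
    """Return (header_offsets, footer_offsets) for a raw image file.

    The whole file is read into memory, so this only suits small images.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    return list(find_all(data, JPEG_HEADER)), list(find_all(data, JPEG_FOOTER))
```

Pairing each header offset with a later footer offset gives the candidate byte ranges that a carver would then try to extract.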
File carving is a great method for recovering files and fragments of files when directory entries are corrupt or missing. It is especially used by forensics experts in criminal cases for recovering evidence. In certain cases related to child pornography, law enforcement agents are often able to recover more images from the suspect’s hard disks by using carving techniques. Another example is the hard disks and removable storage media that U.S. Navy SEALs took from Osama Bin Laden’s compound during their raid. Forensic experts used file carving techniques to squeeze every bit of information out of this media.
Difference between file recovery and file carving
After reading the above, I think you might be confused: If file carving is a method of file recovery, then what is the difference between file recovery and file carving?
Modern operating systems do not automatically eradicate a deleted file without prompting for the user’s confirmation. Deleted files are recoverable with forensic programs as long as the deleted file’s space has not been overwritten by another file. A damaged file can only be recovered if its data is not corrupted beyond a minimal degree. File recovery is different from file restoration, in which a backup file stored in a compressed (encoded) form is restored to its usable (decoded) form. File recovery techniques make use of the file system information, and with this information many files can be recovered. If that information is incorrect or missing, file recovery will not work.
File carving works only on the raw data on the media and does not depend on the file system structure. It doesn’t care which file system was used to store the files. In the FAT file system, for example, when a file is deleted, the file’s directory entry is marked as deleted and its clusters are marked as free. The first character of the filename is replaced with a marker (the byte 0xE5), but the file data itself is left unchanged. Until it’s overwritten, the data is still present.
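The FAT deletion marker is easy to see in code. The sketch below decodes one 32-byte FAT short-name directory entry and reports whether it is marked as deleted. The function name and the returned dictionary layout are my own choices, and it does not handle long-file-name entries, so treat it as an illustration rather than a complete parser.

```python
import struct

DELETED_MARKER = 0xE5  # first name byte of a deleted FAT directory entry


def parse_dir_entry(entry: bytes) -> dict:
    """Decode the main fields of one 32-byte FAT short-name directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    name = entry[0:8]    # 8-character name, space padded
    ext = entry[8:11]    # 3-character extension, space padded
    (cluster_hi,) = struct.unpack_from("<H", entry, 20)  # high word (FAT32)
    (cluster_lo,) = struct.unpack_from("<H", entry, 26)  # low word
    (size,) = struct.unpack_from("<I", entry, 28)        # file size in bytes
    deleted = name[0] == DELETED_MARKER
    # The first character is overwritten on deletion, so show it as "?".
    shown = (b"?" + name[1:]) if deleted else name
    return {
        "name": shown.decode("ascii", "replace").rstrip(),
        "ext": ext.decode("ascii", "replace").rstrip(),
        "deleted": deleted,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }
```

A recovery tool that trusts the file system would use the starting cluster and size from such an entry; a carver ignores the entry entirely and looks only at the data clusters.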
File systems overview
Windows File systems: Microsoft Windows mainly uses two types of file system: FAT and NTFS.
A) FAT, which stands for “file allocation table,” is the simplest file system type. It consists of a boot sector, a file allocation table, and plain storage space to store files and folders. FAT exists in the FAT12, FAT16, and FAT32 variants. FAT32 is compatible with Windows-based storage devices. The built-in Windows format tool can’t create a FAT32 file system with a size of more than 32GB.
B) NTFS, or “new technology file system,” was introduced with Windows NT. NTFS is the default type for file systems over 32GB. This file system supports many file properties, including encryption and access control.
Linux File systems: We already know that Linux is an open source operating system. It was developed for testing and development and aimed to use different concepts for file systems. Linux supports a variety of file systems.
A) Ext2, Ext3, Ext4: This is the native Linux file system family. Generally, it is used as the root file system for all Linux distributions. The Ext3 file system is an upgraded Ext2 file system that uses transactional file write operations. Ext4 is a further development of Ext3 that supports optimized file allocation information and file attributes.
B) ReiserFS: This file system is designed for storing huge numbers of small files. It has a good capability for searching files, and it allocates files compactly by storing file tails or small files along with metadata so that large file system blocks are not used for this purpose.
C) XFS: This file system was developed by SGI for its IRIX servers. The XFS file system has great performance and is widely used to store files.
D) JFS: This file system was developed by IBM for powerful computing systems and is supported by most modern Linux distributions.
MacOS File systems: Apple Macintosh OS uses only the HFS+ file system, which is an extension of the HFS file system. The HFS+ file system is used on Apple desktop products, including Mac computers, iPhones, iPods, and Apple Xserve products. Advanced server products also use the Apple Xsan file system, a clustered file system derived from the StorNext or CentraVision file systems.
In addition to files and folders, HFS+ also stores Finder information such as directory views and window positions.
File Carving Techniques: During digital investigations, various types of media have to be analyzed. Relevant data can be found on various storage and networking devices and in computer memory. Various types of data such as emails, electronic documents, system logs, and multimedia files have to be analyzed. In this article, we focus on the recovery of multimedia files that are stored either on storage devices or in computer memory using the file carving approach. File carving is a recovery technique that merely considers the contents and structures of files instead of file system structures or other meta-data which is used to organize data on storage media. The below figure summarizes the file carving terminology.
The most common general file carving techniques are:
1. Header-footer or header-“maximum file size” carving: Recover files based on known headers and footers or maximum file size
- JPEG: “\xFF\xD8” header and “\xFF\xD9” footer
- GIF: “\x47\x49\x46\x38\x37\x61” header and “\x00\x3B” footer
- PST: “!BDN” header and no footer
- If the file format has no footer, a maximum file size is used in the carving program.
2. File structure-based carving
- This technique uses the internal layout of a file
- Elements are header, footer, identifier strings, and size information
3. Content-based carving
- Content structure is loose (MBOX, HTML, XML)
- Content characteristics
- Character count
- Text/language recognition
- White and black listing of data
- Statistical attributes (Chi^2)
- Information entropy
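As an illustration of the content-based attributes listed above, the following sketch computes the Shannon entropy of a block of bytes. Compressed or encrypted data tends to score close to 8 bits per byte, while plain text and padding score much lower. The `classify_block` helper and its two thresholds are assumptions I picked for the example, not values taken from any particular carving tool.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Return the Shannon entropy of block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_block(block: bytes, high: float = 7.0, low: float = 5.0) -> str:
    """Roughly label a block by entropy; the thresholds are illustrative only."""
    entropy = shannon_entropy(block)
    if entropy >= high:
        return "high"    # likely compressed or encrypted content
    if entropy <= low:
        return "low"     # likely text, padding, or sparse structures
    return "medium"
```

A carver can use such a score to decide whether a block plausibly belongs to the file type it is currently reassembling.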
Tools widely used for file carving: Data recovery tools play an important role in most forensic investigations because smart malicious users will always try to delete evidence of their unlawful acts. Some important data recovery tools are:
- Magic Rescue
Carving Tutorial: In this section I will show you how to carve a file, first without a carving tool and then with one.
First, we are going to see how simple file carving works. Before beginning, we will have a look at the JPEG file structure. As an example, I am opening an image in a hex editor.
Basically a JPEG file starts with FFD8FFE0, which is called a header.
And it ends with FFD9, which is called a trailer.
Everything in between is the data of the JPEG file itself.
So if we have any kind of document file that contains an image and we can locate the header and trailer, we can recover that image from the document.
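For readers who want to look a little deeper than the header and trailer, here is a hedged sketch of how the internal layout of a JPEG can be walked. After the two-byte start marker, a JPEG consists of segments that each begin with FF, a marker byte, and a two-byte big-endian length that counts itself. The function name is mine, and the sketch stops at the start-of-scan marker (FFDA) and ignores rarer cases such as padding bytes between segments.

```python
import struct
from typing import List, Tuple


def list_segments(data: bytes, start: int = 0) -> List[Tuple[int, int, int]]:
    """Walk JPEG segments from the start marker at `start` up to start-of-scan.

    Returns (offset, marker, length) tuples, where length is the value of the
    two-byte size field (it includes the two size bytes themselves).
    """
    if data[start:start + 2] != b"\xff\xd8":
        raise ValueError("no JPEG start marker at the given offset")
    segments = []
    pos = start + 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        (length,) = struct.unpack_from(">H", data, pos + 2)
        segments.append((pos, marker, length))
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        pos += 2 + length   # skip the marker bytes plus the segment body
    return segments
```

If the walk fails partway through, the header found by a signature search was probably a false positive, which is the kind of check structure-based carving relies on.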
So here the scenario is that I have a Microsoft Word file and there is an image in that file, so we have to carve that image out from the Word file.
First, start the hex editor and open the Word file in it: HxD > File > Open > your Word file.
In the above figure, we can see the raw hexadecimal data that forms the Word document. Within this block of raw data, we can search for the JPG file signature to show us the location of the first JPG image. As we already know, any JPG file starts with the header value FFD8FFE0.
HxD > Search > Find (or Ctrl + F)
As mentioned previously, the hexadecimal file signature for a jpg is FF D8 FF E0. Remember to select the “Hex-values” datatype and also select the first byte of the document so the search function searches down the file.
You should find a JPG header signature at offset 14FD. This location is very important and should be noted for future reference.
So now that we have our file header, we need to find the file trailer. The same method is applied to find the trailer.
HxD > Search > Find (or Ctrl + F)
The JPG trailer should be located at offset 4FC6(h). Note that this offset is not taken at the same position as it was for the file header: here we want the offset of the last byte of the trailer, not the first.
Now we have the header and trailer of a jpeg file and, as we previously said, between the header and trailer is the data of a jpeg file. Now we copy the whole block of data with header and trailer and store it as a new file.
HxD > Edit > Select Block (Or Ctrl + E)
File Header offset – 14FD
File Trailer offset – 2ADB
The entire jpg file will be highlighted in blue. This block of data now needs to be copied into the clipboard so that it can be stored as a separate file.
HxD > Edit > Copy or Ctrl + C or Right Click > C
Now start a new file in hex editor by clicking File > New or (Ctrl + N) and paste the contents to new file.
After that it will prompt you to confirm that you want to proceed. This is used to prevent accidental data changes when using hex editor to view files. Just click on OK.
Now we are ready to save the file; click on File > Save as.
Here I am saving that recovered image file by giving a name recover_image.jpg in my shell folder.
And that’s it! You can view the image using any photo viewer to confirm it is the same as the image found in the Evidence.doc file.
This is the basic carving technique for a media format file without using any file carving tool.
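The manual steps above can also be scripted. The sketch below searches a file for the same header (FFD8FFE0) and trailer (FFD9) values and writes everything from the first header through the first trailer after it to a new file. The function names and the 10MB fallback size used when no trailer is found are my own assumptions. Also note that the first trailer found may belong to an embedded thumbnail rather than to the full image, so the result should always be checked in a viewer.

```python
from typing import Optional

JPEG_HEADER = bytes.fromhex("FFD8FFE0")  # header value used in the walkthrough
JPEG_TRAILER = bytes.fromhex("FFD9")     # trailer value used in the walkthrough


def carve_first_jpeg(data: bytes, max_size: int = 10 * 1024 * 1024) -> Optional[bytes]:
    """Return the bytes from the first JPEG header through the next trailer."""
    start = data.find(JPEG_HEADER)
    if start == -1:
        return None  # no header, nothing to carve
    trailer = data.find(JPEG_TRAILER, start + len(JPEG_HEADER))
    if trailer == -1:
        # No trailer found: fall back to a maximum file size.
        return data[start:start + max_size]
    end = trailer + len(JPEG_TRAILER)  # one past the last byte of the trailer
    return data[start:end]


def carve_file(src: str, dst: str) -> bool:
    """Carve the first JPEG out of src and save it as dst."""
    with open(src, "rb") as fh:
        data = fh.read()
    carved = carve_first_jpeg(data)
    if carved is None:
        return False
    with open(dst, "wb") as out:
        out.write(carved)
    return True
```

With the files from this tutorial, calling `carve_file("Evidence.doc", "recover_image.jpg")` would stand in for the select, copy, paste, and save steps done by hand in the hex editor.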
Now I am going to use a file carving tool, PhotoRec, for recovering files from a flash drive.
PhotoRec is open source recovery software designed to recover lost files, including video, documents, and archives from hard disks and CD-ROMs, and lost pictures (thus the “photo recovery” name) from digital camera memory. PhotoRec ignores the file system and goes after the underlying data, so it will still work even if your media’s file system has been severely damaged or reformatted. It is available for the Windows, Linux, and Mac operating systems. You can download this software from: http://www.cgsecurity.org/testdisk-6.14.win64.zip
I have an 8GB flash drive that has been formatted, and now we will see how to recover image files from it by using PhotoRec.
Here we can see our USB drive, which is showing as FLASH on K: drive. Now run the photorec_win.exe program.
After opening the program, you can see all your drives, including your external media. Select the drive from which you want to recover your data.
I selected my external USB drive of 8GB, which is showing as PhysicalDrive1 and chose “Proceed.”
After that, it shows the drive file system and name; my drive name is FLASH and file system is FAT32.
In the above figure, four options are presented.
- Search: start the recovery after selecting the partition that holds the lost files.
- Options: modify the options.
- File Opt: modify the list of file types recovered by PhotoRec.
- Quit: stop the process.
Here we have selected Options:
- Paranoid: By default, recovered files are verified and invalid files rejected. Enable brute force if you want to recover more fragmented JPEG files; note that this is a very CPU-intensive operation.
- “Allow partial last cylinder” modifies how the disk geometry is determined—only non-partitioned media should be affected.
- The “Expert mode” option allows the user to force the file system block size and the offset. Each file system has its own block size (a multiple of the sector size) and offset (0 for NTFS, exFAT, and ext2/3/4); these values are fixed when the filesystem has been created/formatted. When working on the whole disk (i.e., the original partitions are lost) or a reformatted partition, if PhotoRec has found very few files, you may want to try the minimal value that PhotoRec lets you select (it’s the sector size) for the block size (0 will be used for the offset).
- Enable “Keep corrupted files” to keep files, even if they are invalid, in the hope that data may still be salvaged from an invalid file using other tools.
- Enable “Low memory” if your system does not have enough memory and crashes during recovery. This may be needed for large file systems that are heavily fragmented. Do not use this option unless absolutely necessary.
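To see why the block size and offset matter, remember that a file system starts each file at a block boundary. The following sketch, which is my own simplification and not PhotoRec's actual code, checks for a signature only at positions of the form offset + k * block_size. The parameter names and the 512-byte default are assumptions for the example.

```python
from typing import Iterator


def aligned_hits(data: bytes, signature: bytes,
                 block_size: int = 512, offset: int = 0) -> Iterator[int]:
    """Yield block-aligned positions whose block starts with signature."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for pos in range(offset, len(data), block_size):
        if data.startswith(signature, pos):
            yield pos
```

A larger block size means fewer positions to test and fewer false hits, but a wrong block size or offset skips real file starts. That is why dropping to the sector size is a reasonable thing to try when very few files are found.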
Let’s check the File Opt options:
This option is for selecting the file types to be recovered. Press S to disable all file type format selections. Here we will recover only jpeg file types because it will take a long time to recover all types of file.
Select only “JPG picture” and press “b” to save the settings.
Now go back to the main option and select the file system; here I am selecting “Other” because Windows-type file systems will be found there.
Now choose the recovery type option you want. (I am selecting “Whole”.) Choose either:
- from the whole partition (useful if the filesystem is corrupted) or
- from the unallocated space only (available for ext2/ext3/ext4, FAT12/FAT16/FAT32 and NTFS). With this option, only deleted files are recovered.
Now select the location where you want to save the recovered files. After choosing the directory location, press “C.”
After that, the recovery process will start.
After some time, when your recovery is finished, it will show the recovered file locations, as shown in the figure below.
Three files are saved in the recup_dir folder. Let’s take a look.
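If you recover many files, a quick sanity check can be scripted. The sketch below walks the recup_dir folders that PhotoRec creates under the chosen destination and reports which .jpg files start with the JPEG start marker and contain an end marker. The function names are mine, and the check is deliberately loose: a file that passes can still be damaged, so it does not replace opening the images.

```python
from pathlib import Path
from typing import Dict


def looks_like_jpeg(path: Path) -> bool:
    """Loose check: starts with FFD8 and contains FFD9 somewhere after it."""
    data = path.read_bytes()
    return data.startswith(b"\xff\xd8") and data.find(b"\xff\xd9", 2) != -1


def check_recovered(output_dir: str) -> Dict[str, bool]:
    """Map each recovered .jpg (as 'folder/file') to its check result."""
    results = {}
    for path in sorted(Path(output_dir).glob("recup_dir*/*.jpg")):
        results[f"{path.parent.name}/{path.name}"] = looks_like_jpeg(path)
    return results
```

Pass it the directory you selected as the recovery destination, and it returns one entry per recovered JPEG.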