The number and severity of computer-based attacks such as viruses, worms, Trojan horses, and logic bombs, along with plagiarism of software source code, have become an increasing concern. To deal with these problems, forensic analysts have suggested that methods for determining the authorship of computer programs are necessary. This field is referred to as Software Forensics.
Software Forensics is the field of software science aimed at authorship analysis of computer source code for legal purposes. It involves author identification, discrimination, and characterization. Authorship analysis, whether applied to the written word or to source code, rests on the assumption that authors develop an identifiable approach and style. Though there is no formal proof that a computer program carries the characteristics of its author, comparing two code fragments often shows that each author has a distinct style and set of methods.
Forensic specialists attempt to determine whether two or more code fragments were written by the same programmer. This information is especially valuable when security breaches are frequent, because Software Forensics can then help identify the culprit.
In authorship disputes, a legal expert is needed to provide empirical evidence that two or more programs were written by the same programmer.
What file formats do you need to know for the exam?
In practice, there are some important file formats that a forensic analyst must know. Microsoft Windows operating systems use the Portable Executable (PE) file format for executable files.
A typical PE .EXE file layout: A forensic specialist can extract useful information (application metadata) by investigating the contents of an .EXE file: its headers, sections, and binary blocks. Metadata can include the application name, version, and release date. This information is very helpful in a digital investigation.
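As a rough illustration of what "investigating the headers" can mean, the sketch below reads a few fields from the start of a PE image: the MZ signature, the offset of the PE signature stored at 0x3C, and the COFF file header that follows it. The function name parse_pe_header and the returned dictionary keys are assumptions made for this example, not part of any forensic tool, and the code ignores most of the format (optional header contents, resources, version information).

```python
import struct


def parse_pe_header(data: bytes) -> dict:
    """Read basic COFF header fields and section names from a PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")

    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, section_count, timestamp, _symtab, _symcount,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # The section table starts after the optional header; each entry is
    # 40 bytes long and begins with an 8-byte, NUL-padded name.
    table = pe_offset + 4 + 20 + optional_size
    sections = []
    for i in range(section_count):
        raw = data[table + 40 * i: table + 40 * i + 8]
        sections.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))

    return {
        "machine": machine,
        "timestamp": timestamp,  # link time, seconds since the Unix epoch
        "characteristics": characteristics,
        "sections": sections,
    }
```

An examiner would typically call this on the bytes of an evidence copy, for example parse_pe_header(open(path, "rb").read()), and treat the link timestamp as a lead rather than proof, since it can be altered.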
ASCII and Unicode Extraction: Programming languages conventionally use distinct file extensions for their source code files. For example, source code written in C++ has the file type CPP, C has C, and Pascal has PAS. These files are normally in ASCII or Unicode text format. For headers and configuration, various systems use ASCII text files of type H.
In practice, most compilers do not translate source code directly into the final executable. Instead, they create a semi-compiled (object) file that can be linked with libraries and other files before the final executable application is created. These semi-compiled files frequently have the OBJ file type. Library manager software groups OBJ files into library files of type LIB. The linker produces the final executable program in EXE or COM format.
ASCII and Unicode strings are found both in a program's text files and embedded within its binary files. When investigating such files, it is wise to extract all readable ASCII and Unicode content by using a utility such as strings. Text strings in a binary file often contain useful information, such as a programmer's comments.
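The extraction step can be approximated in a few lines. The sketch below is not the strings utility itself, only a minimal imitation of its idea: it looks for runs of printable ASCII bytes and for the same characters stored as UTF-16 little-endian, the encoding Windows binaries commonly use for Unicode text. The function name extract_strings and the default minimum length of 4 are assumptions for this example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII runs, then UTF-16LE runs, found in data."""
    # A run of at least min_len printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters, each followed by a zero byte (UTF-16LE).
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le")
              for m in utf16_pattern.finditer(data)]
    return found
```

Only Latin-range characters are recognised here; a real examination tool also handles other encodings and reports the offset of each string so that findings can be traced back to the evidence file.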
Microsoft Corp v. Digital Research Inc. (DRI): Gary Kildall, a software engineer at Digital Research Incorporated (DRI), created the CP/M (Control Program for Microcomputers) operating system in 1974 for personal computers, before the introduction of the IBM and Apple machines. Microsoft Corporation allegedly copied code from CP/M for its DOS operating system. No forensic techniques were available when the alleged theft took place. A later forensic analysis, which compared MS-DOS binaries to CP/M source code using software forensic tools such as CodeSuite, reported no evidence that code had been copied.
What are the types of traces, remnants, and application debris in software forensics?
Eugene Spafford and Stephen Weeber of Purdue University proposed that it might be feasible to analyze the remnants of software, typically the remains of a Trojan horse or virus, to identify its author. These remnants may take many forms, including programming language source files, shell scripts, executable code, object files, changes made to existing programs, or even a text file written by the attacker. Some remnants and application debris in software forensics that are helpful for the CCFE exam are Registry Entries, Temporary Files, Spool Files, and Page Files.
Registry Entries: Various Registry values and settings can affect the examination and forensic analysis. The Registry holds configuration information, recently accessed files, license data, and a wide range of other details about the installed software and system.
Temporary Files: When a user runs a program, for instance a word processor, data may be temporarily stored on the hard disk. For example, Microsoft Word saves changes to a document at set intervals in a separate, temporary recovery file when the AutoRecover feature is turned on. These temporary files are useful to forensic analysts because they can give access to document contents that the user never explicitly saved.
Spool Files: When a computer prints a file, two files are created for each print job: a spool file and a shadow file. On Windows XP/2003/Vista/7/8, these files are located in Windows\system32\spool\printers. Both spool and shadow files contain information useful to forensic analysts.
Page File: When an operating system runs low on RAM, it writes some of the data held in RAM to a file on disk that serves as an extension of physical memory. This file is called a page file, and on Windows its name is pagefile.sys. When examining a system, investigators check this file for evidence of a software security breach.
What types of Software Analysis are commonly used in forensics?
Software analysis is the preservation and study of computer-based evidence for presentation in court. It includes the analysis of existing software products to verify whether their security has been breached. Some software analysis techniques helpful for the CCFE exam are:
Hash Analysis: Software forensic analysts run files through a hash algorithm, a one-way function that calculates a value that is, for practical purposes, unique to the file's contents, in effect a digital fingerprint of that file. The most common hash algorithms are SHA-1 (Secure Hash Algorithm 1) and MD5 (Message Digest 5). These algorithms are used to derive hash values of individual files, which are then compared to databases of known hash values. In this way, forensic specialists can identify known files by their SHA-1 or MD5 hash. If they are known benign files, such as program files, they can be removed from further analysis. If they are known contraband files, they are quickly identified and bookmarked.
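The compare-against-known-hashes workflow described above can be sketched as follows. The hashing uses Python's standard hashlib module; the function names hash_file and classify, the label strings, and the idea of passing the known databases as plain sets are assumptions for illustration, whereas real investigations rely on curated hash sets.

```python
import hashlib


def hash_file(path, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks so large evidence files need not fit in RAM."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def classify(digest, known_good, known_bad):
    """Label a digest using two hypothetical sets of known hash values."""
    if digest in known_bad:
        return "flagged"    # known contraband: bookmark for review
    if digest in known_good:
        return "excluded"   # known program file: drop from analysis
    return "unknown"        # needs manual examination
```

Passing algorithm="md5" computes the MD5 value instead. Collisions have been demonstrated for both MD5 and SHA-1, so a match against a known set is strong evidence of identity rather than a mathematical guarantee.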
Signature Analysis: Most files have a characteristic signature or header that the application program or operating system can use to identify the file type. Files frequently have filename extensions that identify them as well, particularly on Windows operating systems. In many cases, signatures and file extensions match, though there are a variety of circumstances and exceptions that produce a mismatch, anomalous results, or unknown information. Forensic specialists compare files, their extensions, and their headers to a database of known file signatures and extensions and report the outcomes.
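A minimal sketch of this comparison is shown below. The five signatures are well-known magic numbers, but the table is deliberately tiny, and the name check_signature and the three result labels are assumptions for this example rather than the output of any particular forensic suite.

```python
import os

# Leading bytes (magic numbers) mapped to the extensions expected for them.
SIGNATURES = {
    b"MZ": {".exe", ".dll"},
    b"%PDF": {".pdf"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".jar"},
}


def check_signature(filename, header):
    """Compare a file's extension with the type implied by its first bytes."""
    extension = os.path.splitext(filename)[1].lower()
    for magic, expected in SIGNATURES.items():
        if header.startswith(magic):
            return "match" if extension in expected else "mismatch"
    return "unknown"  # the header is not in the table
```

For instance, a file named photo.jpg whose first bytes are MZ is reported as a mismatch, which may indicate an executable that was renamed to hide it.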
Patterns: Software analysis can be carried out at many levels. Because programmers tend to solve problems with regular patterns, forensic analysts can analyze the code's semantic structure to find recurring structures and patterns. For instance, the actions taken when a fatal error is detected may vary considerably from programmer to programmer and from program to program. Methods for defining the semantics of programming languages and programs, such as axiomatic semantics and denotational semantics, could be used to find identifying patterns in semantic structure or program logic.
Forensic specialists can also find repeating patterns by analyzing the executable behavior of the program, tracing data flow patterns, or examining user interfaces.
Statistical Analysis: Statistical techniques are usually used to discern trends, correlations, and frequencies in data collected from written text or source code in an attempt to establish authorship style. Data is gathered from an analysis of:
Mean program line length (characters per line)
Mean name length of local variables, global variables, and functions
Use of conditional compilation
Use of comments that nearly echo the code
Type of function parameter declaration: does the programmer use the standard (ANSI C) format?
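A few of the measurements listed above can be gathered with simple text processing. The sketch below assumes C-like source code and computes only three of them: mean line length, the number of conditional-compilation directives, and the number of lines containing a comment marker. The function name style_metrics and the dictionary keys are assumptions for this example; measuring identifier lengths or the declaration style properly would require a real parser.

```python
import re

# Matches #if, #ifdef, and #ifndef at the start of a line.
CONDITIONAL = re.compile(r"^\s*#\s*(?:ifdef|ifndef|if)\b")


def style_metrics(source: str) -> dict:
    """Compute rough authorship-style metrics for C-like source text."""
    lines = [line for line in source.splitlines() if line.strip()]
    total_chars = sum(len(line) for line in lines)
    return {
        # Mean characters per non-blank line.
        "mean_line_length": total_chars / len(lines) if lines else 0.0,
        # How often conditional compilation is opened.
        "conditional_directives": sum(
            1 for line in lines if CONDITIONAL.match(line)),
        # Lines containing a // or /* marker (a crude text check that
        # also counts markers appearing inside string literals).
        "comment_lines": sum(
            1 for line in lines if "//" in line or "/*" in line),
    }
```

Metrics like these become meaningful only when compared across samples of known authorship; a single file's numbers say little on their own.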