Supporting Live and Versatile Malware Analysis in Testbeds



We advocate for publicly accessible live malware experimentation testbeds. We introduce new advancements for high-fidelity transparent emulation and fine-grain automatic containment that make such experimentation safe and useful to researchers, and we are working on a complete, extensible live-malware experimentation framework. Our framework, aided by our new technologies, facilitates a qualitative leap from current experimentation practices. It enables specific, detailed and quantitative understanding of risk, and safe, fully automated experimentation by novice users, with maximum utility to the researcher.

The figure above shows our proposed architecture for live malware experimentation on a public testbed. While the layout is similar to other frameworks, the functionalities of the highlighted components (shown in blue in the figure) are novel and provide a qualitative advancement.

In our framework, all malware communication with the Internet is examined and contained by a dedicated Warden machine. The Warden sits between the Inmate Network, where malware executes on testbed machines, and the outside world. Using a fine-grained firewall and policy engine, the Warden chooses one of the following actions to take with malware traffic: (1) drop, (2) rewrite, (3) rate-limit, (4) forward on to the Internet, and (5) redirect to Smart Impersonator services, which mimic public Internet servers.
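The choice among the five Warden actions can be pictured as a rule lookup. The following is a minimal sketch only, not the Warden's actual implementation: the `Flow`, `Rule`, and `PolicyEngine` names, the per-experiment rule lists, and the default of dropping unmatched traffic are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Action(Enum):
    """The five actions the Warden can take on malware traffic."""
    DROP = "drop"
    REWRITE = "rewrite"
    RATE_LIMIT = "rate-limit"
    FORWARD = "forward"
    REDIRECT = "redirect"  # hand the flow to a Smart Impersonator


@dataclass(frozen=True)
class Flow:
    # Hypothetical summary of one outbound communication attempt.
    experiment_id: str
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass
class Rule:
    # Hypothetical rule: a predicate over a flow plus the action to take.
    matches: Callable[[Flow], bool]
    action: Action


class PolicyEngine:
    def __init__(self, default: Action = Action.DROP) -> None:
        # Assumption: traffic that matches no rule is dropped.
        self.default = default
        self.rules: Dict[str, List[Rule]] = {}

    def add_rule(self, experiment_id: str, rule: Rule) -> None:
        self.rules.setdefault(experiment_id, []).append(rule)

    def decide(self, flow: Flow) -> Action:
        # First matching rule for this experiment wins.
        for rule in self.rules.get(flow.experiment_id, []):
            if rule.matches(flow):
                return rule.action
        return self.default


engine = PolicyEngine()
engine.add_rule("exp1", Rule(lambda f: f.dst_port == 53, Action.REDIRECT))
dns_flow = Flow("exp1", "192.0.2.10", 53, "udp")
web_flow = Flow("exp1", "192.0.2.20", 80, "tcp")
print(engine.decide(dns_flow).value, engine.decide(web_flow).value)
```

Keeping a separate rule list per experiment is one way to give each experiment its own policies while the engine itself stays shared.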

In addition to the firewall and policy engine, the Warden supports monitoring and data persistence. It continuously collects and stores network traces, capturing all communication exchanged between the framework and the Internet. The Warden also keeps an experimental history for each experiment, recording information such as the testbed user, the malware studied, and the experimental environment.

The Inmate Network consists of a mix of machines, some of which run VM software (e.g., QEMU, VMware, OpenVZ and Xen), while others are bare metal machines. Since malware limits its behavior if it detects virtualization, machines that run VM images also run our Hi-Fidelity Emulator (HFE) to defeat virtualization checks.

Hi-Fidelity Emulation

The challenges for hi-fidelity emulation are:

  1. Create a comprehensive list of differences between a VM and a bare metal machine.
  2. Create "lies" to hide these differences.

For more information, take a look at: Cardinal Pill Testing of System Virtual Machines

Managing Malware's External Communication

Malware's external communication must be tightly managed to balance the utility of experimentation to the researcher against the risk that external communication poses to the Internet. To support a range of testbed users with differing experimental needs, traffic from each experiment must be subject to its own set of policies governing external communication. Knowledge about malware behavior learned in one experiment, however, is shared between experiments. This allows policies to evolve from more restrictive to more permissive as the testbed learns what to expect from each malware's communication.

We observe that once we allow traffic out of our framework, it is impossible to guarantee that there will be no risk to the Internet. Even the most benign-looking traffic, such as a single HTTP GET message, can be malicious if it is generated by a multitude of machines simultaneously to overwhelm a victim destination. We manage this risk by using the following four-step containment approach to handle each malware communication attempt:

  1. Contain it and evaluate whether it is a necessary communication for the malware.
  2. If necessary, redirect it to a Smart Impersonator. Try the Random Impersonator first. If that does not expose sufficient malware behaviors, switch to a Custom Impersonator if one is available.
  3. If a Custom Impersonator is not available, run the Symbolic Execution Engine to build one.
  4. If a Custom Impersonator cannot be built (i.e., the malware communication is unforgeable), let the communication out to the Internet and observe.
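The four steps above can be read as a fall-through decision procedure. The sketch below is illustrative only: the `Outcome` names and the inputs to `handle_attempt` (two boolean judgments, an availability flag, and a `build_custom` callback standing in for the Symbolic Execution Engine) are hypothetical placeholders, not part of the framework's interface.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    CONTAINED = "contained"
    RANDOM_IMPERSONATOR = "random impersonator"
    CUSTOM_IMPERSONATOR = "custom impersonator"
    RELEASED_TO_INTERNET = "released to Internet"


def handle_attempt(
    is_necessary: bool,
    random_exposes_enough: bool,
    custom_available: bool,
    build_custom: Callable[[], bool],
) -> Outcome:
    """Decide how one malware communication attempt is handled.

    build_custom is a hypothetical hook that returns True when a Custom
    Impersonator could be built and False when the communication is
    unforgeable.
    """
    # Step 1: contain; unnecessary communication never leaves the testbed.
    if not is_necessary:
        return Outcome.CONTAINED
    # Step 2: prefer the Random Impersonator, then an existing Custom one.
    if random_exposes_enough:
        return Outcome.RANDOM_IMPERSONATOR
    if custom_available:
        return Outcome.CUSTOM_IMPERSONATOR
    # Step 3: try to build a Custom Impersonator.
    if build_custom():
        return Outcome.CUSTOM_IMPERSONATOR
    # Step 4: unforgeable communication is let out and observed.
    return Outcome.RELEASED_TO_INTERNET


result = handle_attempt(
    is_necessary=True,
    random_exposes_enough=False,
    custom_available=False,
    build_custom=lambda: False,
)
print(result.value)
```

Ordering the checks this way means traffic reaches the Internet only after every impersonation option has been exhausted.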

To evaluate whether a communication is necessary for the malware, we collect several measures of malware activity:

  1. Number of system calls.
  2. Number of unique system calls.
  3. Entropy of system calls.
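As a rough illustration, the three measures can be computed from a recorded system call trace. This sketch assumes the trace is a sequence of system call names and that "entropy" means the Shannon entropy of the call-name frequency distribution; the text does not specify either, and the function name `syscall_measures` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def syscall_measures(trace: Sequence[str]) -> Dict[str, float]:
    """Return total calls, unique calls, and entropy (bits) for a trace."""
    counts = Counter(trace)
    total = len(trace)
    entropy = 0.0
    for n in counts.values():
        p = n / total  # relative frequency of this system call
        entropy -= p * math.log2(p)
    return {"total": total, "unique": len(counts), "entropy_bits": entropy}


measures = syscall_measures(["open", "read", "read", "close"])
print(measures)
```

Comparing these values between a run in which the communication is blocked and a run in which it is answered is one plausible way to judge whether the communication matters to the malware.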

For more information, take a look at: Malware Communication Analysis