Malware Communication Analysis



Observing malware behavior is crucial for developing mitigation mechanisms, for instance signatures. Besides the difficulty of analyzing binary files, packing and various obfuscation techniques make it nearly impossible to understand a sample's semantics by inspecting the file content alone. For that reason, analysts run specimens in an emulated environment that must be properly contained to avoid harming others. Unfortunately, malware writers keep developing techniques to frustrate analysis, most commonly by detecting virtual machines, but that is not the problem we are trying to solve here. The behavior of recent malware, e.g., bots, is triggered by command-and-control (C&C) servers, so Internet access is essential for the sample to exhibit malicious behavior or to download malicious binaries. Changing emulator state or input values will not force the program to exhibit the malicious behavior, because that behavior depends on external information, e.g., binary code or decryption keys. This shows how essential C&C communication is for correctly emulating samples, yet allowing all communication is too risky. Our goal is to devise an automated, general method for generating accurate containment policies for malware: only harmless network flows that are necessary for executing the binary are allowed to reach the external world.

The obvious solution is to deny all traffic and impersonate any services the samples need to reach. However, it is not uncommon for malicious behavior to be triggered remotely. Malware writers make impersonation even harder by encrypting the portion of the code that handles responses with a key derived from the response itself. Others check for subtle details, such as a specific server banner, to frustrate analysis. Communications with the master that are essential for the malicious behavior should therefore be allowed.
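To illustrate why a generic impersonator can fail against response-keyed code, here is a toy sketch. It assumes, purely for illustration, that the key is the SHA-256 digest of the C&C response and that the hidden routine is XOR-encrypted and verified against a stored digest; real samples use their own schemes. All function names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def derive_key(response: bytes) -> bytes:
    # Assumption: the key is the SHA-256 digest of the C&C response.
    return hashlib.sha256(response).digest()


def xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy cipher (repeating-key XOR); for illustration only, not real cryptography.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def seal(payload: bytes, expected_response: bytes) -> Tuple[bytes, bytes]:
    # What ships inside the sample: the encrypted routine plus a digest of the plaintext.
    return xor_stream(payload, derive_key(expected_response)), hashlib.sha256(payload).digest()


def try_unlock(blob: bytes, check: bytes, response: bytes) -> Optional[bytes]:
    # The sample decrypts with whatever response it received and only runs the
    # result if it matches the stored digest; any other response yields nothing.
    candidate = xor_stream(blob, derive_key(response))
    return candidate if hashlib.sha256(candidate).digest() == check else None


blob, check = seal(b"hidden_routine", b"200 OK master-v2")
print(try_unlock(blob, check, b"200 OK master-v2"))  # b'hidden_routine'
print(try_unlock(blob, check, b"200 OK"))            # None
```

An impersonator that returns a plausible but different response gets `None`, so the hidden routine never executes; only the exact response the master would send unlocks it.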

Another popular technique is to start by denying all traffic and then interactively allow individual flows after manually investigating each of them. This approach is slow and error-prone, because distinguishing attacks from C&C communication is not an easy task. Others hypothesized that C&C hosts are intentionally unpopular and used search engines to classify end-point popularity. Although this is a valid hypothesis, C&C communication need not be HTTP; it could be based on IRC or another protocol, such as email. There are similar efforts to detect intrusions in networks, but these solutions require learning and their goal is to defend against attacks. In our case, we are the ones running the malicious code, and we cannot tolerate false positives.

Solution Description:

Our approach combines dynamic and static analysis of samples to find correlations between connection flows and their effect on the execution flow. By analyzing the effect of network communications on the execution flow and program state, we can infer whether the content of a flow is controlled by the master, as well as the relationships between end-points. The goal is to give the sample the illusion of running in the wild. Some bots need to contact their masters to receive commands or download malicious code; others perform connectivity checks before starting. There are two ways to maintain this illusion: impersonation and allowing harmless activity. We impersonate the bot-master and any other remote services the sample might try to reach. If we cannot do that, we allow the harmless communications without which the malicious behavior would not be revealed.

Once we allow traffic out of our framework, it is impossible to guarantee that there is no risk to the Internet.

We manage this risk with the following four-step containment approach for handling each malware communication attempt:

  1. Contain it and evaluate whether the communication is necessary for the malware.
  2. If it is necessary, redirect it to a Smart Impersonator. Try a Random Impersonator first; if that does not expose sufficient malware behavior, switch to a Custom Impersonator if one is available.
  3. If a Custom Impersonator is not available, run the Symbolic Execution Engine to build one.
  4. If a Custom Impersonator cannot be built (i.e., the malware communication is unforgeable), let the communication out to the Internet and observe it.
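The four-step escalation can be sketched as a small decision function. This is a hypothetical sketch: the `Action` names and the boolean inputs (whether the flow is necessary, whether random impersonation exposed enough behavior, whether a Custom Impersonator already exists) are assumptions, and `build_custom` stands in for running the Symbolic Execution Engine.

```python
from enum import Enum, auto
from typing import Callable


class Action(Enum):
    CONTAIN = auto()              # step 1: flow is not needed, keep it inside
    RANDOM_IMPERSONATOR = auto()  # step 2: first choice for necessary flows
    CUSTOM_IMPERSONATOR = auto()  # steps 2-3: existing or newly built impersonator
    ALLOW_AND_OBSERVE = auto()    # step 4: unforgeable, let it out and watch


def decide(
    is_necessary: bool,
    random_is_sufficient: bool,
    custom_available: bool,
    build_custom: Callable[[], bool],
) -> Action:
    """Return the containment action for one communication attempt.

    build_custom is a hypothetical hook that runs symbolic execution and
    returns True if a Custom Impersonator could be built.
    """
    if not is_necessary:
        return Action.CONTAIN
    if random_is_sufficient:
        return Action.RANDOM_IMPERSONATOR
    # `or` short-circuits: symbolic execution only runs when no impersonator exists yet.
    if custom_available or build_custom():
        return Action.CUSTOM_IMPERSONATOR
    return Action.ALLOW_AND_OBSERVE


print(decide(True, False, False, build_custom=lambda: False))  # Action.ALLOW_AND_OBSERVE
```

Passing the builder as a callback keeps the expensive symbolic-execution step lazy, so it runs only when cheaper impersonation has been ruled out.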

To evaluate whether a communication is necessary for the malware, we collect several measures of malware activity:

  1. Number of system calls.
  2. Number of unique system calls.
  3. Entropy of system calls.
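The three measures can be computed from a recorded system-call trace as below. This sketch assumes the trace is a list of system-call names and interprets "entropy" as the Shannon entropy (in bits) of the empirical system-call distribution. The `looks_necessary` rule, which compares a run with the flow contained against a run with the flow served, and its 10% threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def activity_measures(syscalls: Sequence[str]) -> Dict[str, float]:
    """Count, unique count, and Shannon entropy (bits) of a system-call trace."""
    counts = Counter(syscalls)
    total = len(syscalls)
    # sum of p * log2(1/p) over observed system-call names
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values()) if total else 0.0
    return {"count": total, "unique": len(counts), "entropy": entropy}


def looks_necessary(baseline: Dict[str, float], with_flow: Dict[str, float],
                    min_gain: float = 0.10) -> bool:
    # Hypothetical rule: the flow matters if any measure rises by more than
    # min_gain (relative) when the flow is served instead of contained.
    return any(with_flow[k] > baseline[k] * (1 + min_gain) for k in ("count", "unique", "entropy"))


trace = ["open", "read", "read", "close"]
print(activity_measures(trace))  # {'count': 4, 'unique': 3, 'entropy': 1.5}
```

A sample that only sleeps while its C&C flow is contained scores lower on all three measures than a run in which an impersonator serves that flow, so `looks_necessary` would flag the flow.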

System Components:

  1. Activity Meter: Measures the progress of the execution by recording system calls.
  2. Service Emulator: Emulates a number of common Internet services: DNS, HTTP, IRC, SMTP.
  3. Symbolic Execution Engine: Follows the program using symbolic values in place of the input received from the network. Instead of concrete values, the engine derives expressions over those symbols. Solving these expressions yields concrete input values that satisfy the conditions of specific branches in the program.
  4. Gateway: All traffic passes through the gateway. The gateway can drop, rate-limit, or redirect traffic to the emulator.
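A minimal sketch of the Gateway's per-destination decision, assuming a policy table keyed by (IP, port), default deny for unknown destinations, a token bucket for rate limiting, and redirection to the Service Emulator's host on the same port. All class names, parameters, and addresses are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

# Hypothetical type: an endpoint is (destination IP, destination port).
Endpoint = Tuple[str, int]


class Verdict(Enum):
    DROP = "drop"              # contained: never leaves the sandbox
    RATE_LIMIT = "rate_limit"  # allowed out to the real destination, but throttled
    REDIRECT = "redirect"      # sent to the Service Emulator instead


class TokenBucket:
    """Token bucket rate limiter; timestamps are passed in explicitly (seconds)."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    def __init__(self, emulator_ip: str, rate: float = 1.0, burst: float = 5.0) -> None:
        self.emulator_ip = emulator_ip
        self.rate = rate
        self.burst = burst
        self.policy: Dict[Endpoint, Verdict] = {}
        self.buckets: Dict[Endpoint, TokenBucket] = {}

    def set_policy(self, dst: Endpoint, verdict: Verdict) -> None:
        self.policy[dst] = verdict

    def route(self, dst: Endpoint, now: float) -> Optional[Endpoint]:
        """Return the endpoint the packet is forwarded to, or None if dropped."""
        verdict = self.policy.get(dst, Verdict.DROP)  # default deny
        if verdict is Verdict.REDIRECT:
            # Same port on the emulator host, e.g. HTTP on 80 goes to the emulated HTTP service.
            return (self.emulator_ip, dst[1])
        if verdict is Verdict.RATE_LIMIT:
            if dst not in self.buckets:
                self.buckets[dst] = TokenBucket(self.rate, self.burst, now)
            return dst if self.buckets[dst].allow(now) else None
        return None


gw = Gateway(emulator_ip="10.0.0.2")
gw.set_policy(("198.51.100.7", 80), Verdict.REDIRECT)
print(gw.route(("198.51.100.7", 80), now=0.0))  # ('10.0.0.2', 80)
```

Passing timestamps explicitly keeps the limiter deterministic for testing; a live gateway would pass `time.monotonic()`.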