Observing malware behavior is crucial for developing mitigation mechanisms, for instance creating signatures. Besides the difficulty of analyzing binary files, packing and other obfuscation techniques make it nearly impossible to understand a sample's semantics by looking at the file content alone. For that reason, analysts run specimens in an emulated environment that has to be properly contained to avoid harming others. Unfortunately, malware writers keep developing techniques to frustrate analysis, mostly by detecting virtual machines, but that is not the problem we are trying to solve. The behavior of recent malware, e.g. bots, is triggered by command-and-control (C&C) servers; therefore, Internet access is essential for a sample to exhibit its malicious behavior or to download the malicious binaries. Changing emulator states or input values would not force the program to exhibit the malicious behavior, since that behavior depends on external information, e.g. binary code or decryption keys. This shows how essential C&C communication is for correctly emulating samples, yet it is too risky to allow all communication. Our goal is to devise an automated, general method for generating accurate containment policies for malware. By that we mean that only the harmless network flows that are necessary for executing the binary are allowed to reach the external world.
The obvious solution is to deny all traffic and impersonate any services the samples need to reach. However, it is not uncommon for malicious behavior to be remotely triggered. Malware writers make this even harder by encrypting the portion of the code that handles responses with a key derived from the response itself. Others check for subtle differences, such as a special banner, to frustrate the analysis. The communications with the master that are necessary for the malicious behavior should therefore be allowed. Another popular technique is to start by denying all traffic and then interactively allow flows after manually investigating each of them. This approach is very slow and error-prone, and distinguishing between attacks and C&C communication is not an easy task. Others hypothesized that C&C hosts are intentionally unpopular and used search engines to classify end-points by popularity. Although the hypothesis is valid, C&C communication need not be HTTP; it could be based on IRC or another protocol, possibly email. There are similar efforts to detect intrusions in networks, but these solutions require learning and their goal is to defend against attacks. In our case, we are the ones producing malicious activity, and we cannot tolerate false positives.
Solution Description: Our approach combines dynamic and static analysis of samples to find correlations between connection flows and their effect on the execution flow. By analyzing the effect of network communications on the execution flow and the program state, we can infer whether the content of a flow is controlled by the master, as well as the relationships between end-points. The goal is to give the sample the illusion of running in the wild. Some bots need to contact their masters to receive commands or download malicious code; others perform connectivity checks before starting. There are two ways to give such an illusion: impersonation and allowing harmless activity.
We would impersonate the bot-master and any other remote services the sample might try to reach. If we cannot do that, we allow the harmless communications without which the malicious behavior would not be revealed.
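The order of preference above (impersonate first, otherwise allow only harmless flows the sample needs) can be sketched as a small decision procedure. This is a minimal Python illustration under stated assumptions, not the implementation of our approach: it assumes a dynamic taint tracker has already produced a trace of branch events, each labeled with the ids of the flows whose received bytes reached that branch condition, and that a separate, unspecified analysis labels each flow as harmless or not. The names `Flow`, `BranchEvent`, `influencing_flows`, `decide`, and the set of services with local stand-ins are all hypothetical, and the end-points use documentation address ranges.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    IMPERSONATE = "impersonate"  # answer locally with a stand-in service
    ALLOW = "allow"              # let the flow reach the real end-point
    DENY = "deny"                # drop the flow


@dataclass(frozen=True)
class Flow:
    flow_id: int
    endpoint: str   # e.g. "203.0.113.5:6667"
    service: str    # e.g. "dns", "irc", "smtp"
    harmless: bool  # hypothetical label from a separate harm analysis


@dataclass(frozen=True)
class BranchEvent:
    # Ids of the flows whose received bytes (taint labels) reached this
    # branch condition; assumed to come from a dynamic taint tracker.
    taint: frozenset[int]


def influencing_flows(trace: list[BranchEvent]) -> set[int]:
    """Return the ids of flows whose content affected at least one branch."""
    result: set[int] = set()
    for event in trace:
        result |= event.taint
    return result


def decide(flows: list[Flow], trace: list[BranchEvent],
           stand_ins: set[str]) -> dict[int, Verdict]:
    """Assign a containment verdict to every flow (hypothetical policy)."""
    influencing = influencing_flows(trace)
    policy: dict[int, Verdict] = {}
    for flow in flows:
        if flow.service in stand_ins:
            # Preferred option: impersonate the remote service locally.
            policy[flow.flow_id] = Verdict.IMPERSONATE
        elif flow.flow_id in influencing and flow.harmless:
            # No stand-in, but the sample needs this content and it is harmless.
            policy[flow.flow_id] = Verdict.ALLOW
        else:
            # Harmful, or its content never influenced the execution.
            policy[flow.flow_id] = Verdict.DENY
    return policy


if __name__ == "__main__":
    flows = [
        Flow(1, "198.51.100.1:53", "dns", True),    # name lookup
        Flow(2, "203.0.113.5:6667", "irc", True),   # C&C channel
        Flow(3, "192.0.2.25:25", "smtp", False),    # spam delivery
        Flow(4, "203.0.113.9:80", "http", True),    # content never reaches a branch
    ]
    trace = [
        BranchEvent(frozenset({2})),
        BranchEvent(frozenset()),
        BranchEvent(frozenset({2, 3})),
    ]
    for flow_id, verdict in decide(flows, trace, {"dns"}).items():
        print(flow_id, verdict.value)
```

With a DNS stand-in available, this example impersonates flow 1, allows the IRC channel (flow 2) because its content drives a branch and it is labeled harmless, and denies flows 3 and 4. Treating only branch conditions as evidence of influence is a simplification; received bytes used as code or as decryption keys would need additional taint sinks.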