We Know It Before You Do: Predicting Malicious Domains

Today at the 2014 Virus Bulletin International Conference (VB2014) in Seattle, Palo Alto Networks is presenting a paper entitled “We Know It Before You Do: Predicting Malicious Domains.” We’re excited to share the key points of our paper and presentation here for everyone who couldn’t see it in person.

VB2014-dates-web
 

Malicious domains are key to the success of nearly all popular attack vectors, supporting malware distribution, command and control (C2) server hosting and traffic distribution. Most modern domain reputation systems are designed to detect and block malicious domains based on observation of suspicious activity. This activity can include detection of malicious content (e.g., malware, web pages with exploit code, web pages with drive-by downloads, etc.) and observed behavior (e.g., communication with infected hosts to collect private information, launch attacks, etc.).

To bypass such defenses, an increasing trend is that many malicious domains are only used for a very short period of time. There are two factors that make this practice appealing to an attacker:

  • Evading detection: In some cases, the malicious content/behavior is only present/exhibited for a short time, making it difficult for detection systems to flag it based on reputation. Another danger related to timeliness is that these domains may have already served most of their malicious purpose by the time their activity is detected; blocking the domain at that time has diminished value, as the attack was already successful at least once.
  • Low operational expense: The cost of registering a domain and setting up a server to host its content has decreased considerably over the years. For example, registering a common ‘.info’ domain with a domain name registrar such as GoDaddy.com costs only $10 USD per year.

To solve this problem, we propose a system that predicts the domain names that are most likely to be used for malicious purposes. Predicted malicious domains can then be proactively blocked prior to or at their initial malicious use. This approach leverages our knowledge of the malicious domain lifecycle and research into the connections and patterns exhibited by malicious domains.

As an example, we discovered that before a malicious domain can be used, attackers must complete multiple actions in order to activate the domain. These actions leave traces in different types of publicly available data feeds. By identifying the traces related to the preparation of or initial use of malicious domains, we can predict and/or provide early warning for malicious domains and apply effective proactive blocking.

Specifically, our paper addresses the following:

  • Proposal of a novel system to predict the domains that will be used by attackers for varying malicious purposes.
  • Design of a Domain Generation Algorithm (DGA) system to automate the prediction of future malicious domain names.
  • Analysis of patterns for the re-use of malicious domains.
  • Description of connections between malicious domains that were used by attackers across time.
  • Exploration of temporal patterns in DNS queries for malicious domains, which were exhibited prior to their use.
  • Application of the DGA system to evaluate its effectiveness in predicting malicious domains.

Our evaluation is based on a large and diverse set of data, and the results suggest that these methods can predict malicious domains and could be used to effectively prevent attacks through proactive measures. We’re excited to present our research at VB2014 and engaging the community further on this topic. If you would like to see more, you can access the full paper here.