Cyber-attacks are becoming stealthier and can remain dormant for long periods of time. Malware stays hidden in several ways. It can change its behavior dynamically and autonomously to avoid detection. It can dynamically generate new communication channels known to the malware and the attacker. It can even sacrifice compromised assets by deliberately triggering an alert, tricking the victimized company and its security analysts into feeling safe, unaware that the threat persists on other, more critical devices. These sophisticated attacks can only be identified by analyzing more data. Unfortunately, the more data there is, the more difficult the analysis becomes.
Analysts are overwhelmed as they wade through terabytes of logs to find, confirm, and mitigate cyber threats, so they often miss the signs and leave the organization open to attacks.

Cluster analysis, a type of unsupervised machine learning, enables companies to solve this problem. By uncovering hidden patterns and structures in large sets of data, it is able to identify indicators of compromise that remain hidden to analysts.

Using machine learning for cyber protection


Cluster analysis is like asking an analyst to build a report for each potential threat, cherry-picking all the relevant evidence (logs) from terabytes of event logs. By using a machine to build the report, you remove the constraints on the human analyst and empower them. The machine can go over every cluster, take in the complete picture, and quickly decide whether the pattern is malicious or benign. Applying clustering allows for more comprehensive data analysis than has ever been possible. Since each cluster contains a wealth of multidimensional information describing the full scope of the attack, we can analyze a cluster for indications of compromise, anomalies, policy violations, and much more.

Cluster-wide analysis is far more accurate than other approaches. The machine learning clustering process is a continuous, data-driven process that automatically builds, shuffles, and updates each cluster so that it stays accurate, complete, and precise. Analysis and detection are more accurate because there’s far more context to work with: relevant data is put in one place, while irrelevant and distracting data (noise), which leads to detection fatigue, is kept carefully out of view. Putting it another way, detection algorithms are sensitive to “garbage in, garbage out” issues.
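To make the idea of "one report per potential threat" concrete, here is a minimal sketch in Python. It is not SecBI's algorithm. It swaps in a much simpler technique (grouping log events into connected components with union-find), and the event schema, the max_clients parameter, and the example hostnames are all assumptions made for illustration. Events that share a device or a rarely visited destination land in the same cluster, while traffic to popular destinations is set aside as noise.

```python
from collections import defaultdict


def cluster_events(events, max_clients=2):
    """Group log events into clusters of related devices and destinations.

    events: list of (host, destination, upload_bytes) tuples. This schema is
    hypothetical. Destinations contacted by more than max_clients distinct
    hosts are treated as popular and their events are returned as noise.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Count how many distinct hosts talk to each destination.
    clients = defaultdict(set)
    for host, dest, _ in events:
        clients[dest].add(host)

    # Link each host to every rare destination it contacted.
    noise = []
    kept = []
    for event in events:
        host, dest, _ = event
        if len(clients[dest]) > max_clients:
            noise.append(event)
        else:
            union(("host", host), ("dest", dest))
            kept.append(event)

    # Collect the kept events by the root of their connected component.
    clusters = defaultdict(list)
    for event in kept:
        clusters[find(("host", event[0]))].append(event)
    return sorted(clusters.values(), key=len, reverse=True), noise


# Made-up example traffic: two devices share rare destinations, and
# cdn.example is visited by everyone.
events = [
    ("pc1", "evil-a.example", 500),
    ("pc1", "evil-b.example", 510),
    ("pc2", "evil-b.example", 495),
    ("pc3", "cdn.example", 100),
    ("pc1", "cdn.example", 120),
    ("pc2", "cdn.example", 90),
    ("pc4", "rare.example", 10),
]
clusters, noise = cluster_events(events)
for number, cluster in enumerate(clusters, start=1):
    print(f"cluster {number}: {cluster}")
print(f"noise: {len(noise)} events")
```

A production system would cluster on many more dimensions than shared hosts and destinations, but even this toy version shows the effect described above: the evidence for one potential threat ends up in one place, and the distracting traffic is kept out of it.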

Figure: Clustering results using different algorithms on the same datasets


SecBI can discover many advanced threats that can only be detected at the cluster level. In one recent example, SecBI detected a fragmented exfiltration from several infected devices, in which the attacker sent small chunks of data to multiple servers under their control so that no single server crossed the predefined per-server thresholds. During the attack, 5 GB of data was extracted to multiple destinations. However, once deployed, SecBI was able to easily detect the attack thanks to multiple indicators of compromise:

  • Large total upload in a single cluster
  • Multiple servers accessed by only a few machines, at a time when other machines didn’t access these servers at all
  • Beaconing behavior to multiple servers
  • Machine-like behavior to most destinations: similar upload size, similar response size, mostly direct connections, etc.

It’s only possible to detect these indicators by looking at the entire network using cluster analysis.
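As a rough illustration of how such indicators could be computed once the events of a cluster are gathered in one place, consider the Python sketch below. It is an assumption-laden example, not SecBI's detection logic: the event fields (ts, host, dest, upload_bytes), the thresholds upload_limit and cv_limit, and the hostnames are all hypothetical. Beaconing and machine-like behavior are approximated here by a low coefficient of variation in connection intervals and in upload sizes.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Returns None when there are fewer than two values or the mean is zero.
    """
    if len(values) < 2:
        return None
    average = mean(values)
    if average == 0:
        return None
    return pstdev(values) / average


def cluster_indicators(events, upload_limit=1_000_000, cv_limit=0.1):
    """Summarize one cluster of events into simple indicators.

    events: list of dicts with keys ts (seconds), host, dest, upload_bytes.
    Both the schema and the thresholds are hypothetical.
    """
    hosts = {e["host"] for e in events}
    dests = {e["dest"] for e in events}
    total_upload = sum(e["upload_bytes"] for e in events)

    # Beaconing: a host contacts a destination at near-constant intervals.
    timestamps = defaultdict(list)
    for e in events:
        timestamps[(e["host"], e["dest"])].append(e["ts"])
    beaconing = set()
    for (_, dest), stamps in timestamps.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        cv = coefficient_of_variation(gaps)
        if cv is not None and cv <= cv_limit:
            beaconing.add(dest)

    # Machine-like behavior: upload sizes barely vary across the cluster.
    upload_cv = coefficient_of_variation([e["upload_bytes"] for e in events])

    return {
        "total_upload": total_upload,
        "large_total_upload": total_upload >= upload_limit,
        "hosts": len(hosts),
        "destinations": len(dests),
        "many_servers_few_hosts": len(dests) > len(hosts),
        "beaconing_destinations": sorted(beaconing),
        "uniform_upload_sizes": upload_cv is not None and upload_cv <= cv_limit,
    }


# Made-up cluster: one device uploads 100 kB to each of three servers
# every 60 seconds. Each server alone stays under the 1 MB limit.
sample = [
    {"ts": i * 60, "host": "pc1", "dest": dest, "upload_bytes": 100_000}
    for dest in ("a.example", "b.example", "c.example")
    for i in range(4)
]
report = cluster_indicators(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

In the sample data each destination receives only 400 kB, which would pass a per-server check with a 1 MB limit, yet the cluster total of 1.2 MB crosses it. That is the point of evaluating indicators at the cluster level rather than per server.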


As we’ve shown in this post, cyber-attacks are becoming more and more sophisticated, which makes them harder to detect. The cybersecurity industry must find new methods of detection to outsmart the attackers. By applying data analysis methods such as machine learning and cluster-based analysis, it becomes possible to sift through huge volumes of data and identify where the new threats exist. SecBI has developed a solution that uses machine learning and cluster analysis to protect organizations from the next generation of cyber-attacks.

Further Reading

To read the full white paper, “Using machine learning for Comprehensive Threat Detection”, click here.