Extracting CTI From Cyberthreat News
Cyberthreat news reflects the activities of hackers, including their methods and tactics. This information helps IT organizations build effective defenses against malicious activity. However, the picture presented in cyberthreat news may be incomplete.
As cyberattacks become more frequent, cybersecurity researchers and practitioners are increasingly publishing content to inform others. Such content can be found on social media, blogs, and forums. Coverage typically focuses on ransomware attacks, although it can be overshadowed by other stories, such as allegations of Russian interference in U.S. elections.
Cyberthreat news and articles are often accompanied by a discussion of how to prevent attacks. These articles also provide insight into how threats are distributed and what motivates the threat actors. By extracting cyber threat intelligence (CTI) from these texts, security experts can gain a better understanding of the attack landscape.
To extract CTI, security researchers have explored several automated techniques. These include text clustering, topic modeling, and reinforcement learning. Clustering is the process of aggregating text segments based on their similarity, using techniques such as k-means, affinity propagation, and hierarchical clustering.
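As a rough illustration of similarity-based clustering, the sketch below groups a few made-up headlines with single-link hierarchical (agglomerative) clustering over bag-of-words cosine similarity. The function names, the 0.3 threshold, and the sample headlines are assumptions chosen for the example, not taken from any particular CTI system.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Single-link agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once no
    pair of clusters reaches the (assumed) similarity threshold.
    Returns clusters as lists of indices into `texts`.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: similarity of the closest pair of members.
                sim = max(
                    cosine(vectors[i], vectors[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


# Made-up headlines, for illustration only.
headlines = [
    "Ransomware gang encrypts hospital files and demands payment",
    "Hospital hit by ransomware, files encrypted",
    "Phishing emails steal banking credentials",
    "New phishing campaign targets banking customers",
]
groups = cluster(headlines)
print(groups)
```

With these inputs the two ransomware headlines end up in one group and the two phishing headlines in another. A real pipeline would more likely use TF-IDF weights and a library implementation of k-means or affinity propagation; the pure-Python version here only shows the idea.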
Topic modeling is a technique that identifies abstract topics in a collection of texts. Its benefits include generating topic words, identifying STIX (Structured Threat Information Expression) vocabulary, and discovering trending cybersecurity topics.
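The text does not name a specific topic model, so the following is only a minimal sketch that assumes latent Dirichlet allocation (LDA) fitted with a collapsed Gibbs sampler. The function name lda_gibbs, the hyperparameters alpha and beta, and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over tokenized documents.

    Returns one Counter per topic mapping words to assigned counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # total tokens in topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + beta * vocab_size)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


# Toy, made-up documents, already tokenized.
docs = [
    "ransomware encrypts files ransom payment".split(),
    "ransomware ransom bitcoin payment files".split(),
    "phishing email credential login link".split(),
    "phishing link email password login".split(),
]
topics = lda_gibbs(docs)
for k, words in enumerate(topics):
    top = [w for w, c in words.most_common(3) if c > 0]
    print(f"topic {k}: {top}")
```

The highest-count words in each topic play the role of the "topic words" mentioned above. Because sampling is random, the exact words depend on the seed and the iteration count, and a production system would normally rely on an established library rather than a hand-written sampler.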
Researchers can generate a threat alert based on the probability that a certain term occurs in a text. To do this, they can weight each term by how frequently it appears and by the time window in which it appears.
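One possible reading of this idea is sketched below: estimate each term's probability of occurrence in a recent time window, compare it with its smoothed probability in older posts, and flag terms whose probability has risen sharply. The function threat_alerts, its parameters (window_days, min_count, ratio), the add-one smoothing, and the sample posts (including the malware name "acmelocker") are all hypothetical choices for illustration.

```python
import re
from collections import Counter
from datetime import date, timedelta


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def threat_alerts(posts, today, window_days=7, min_count=2, ratio=2.0):
    """Flag terms that are much more probable in the recent window.

    posts: iterable of (date, text) pairs.
    A term is flagged when it occurs at least `min_count` times in the
    last `window_days` days and its recent probability is at least
    `ratio` times its smoothed baseline probability.
    """
    cutoff = today - timedelta(days=window_days)
    recent, baseline = Counter(), Counter()
    for day, text in posts:
        (recent if day > cutoff else baseline).update(tokenize(text))

    recent_total = sum(recent.values())
    # Add-one smoothing so terms unseen in the baseline get a small,
    # non-zero probability instead of causing a division by zero.
    vocab = set(recent) | set(baseline)
    base_total = sum(baseline.values()) + len(vocab)

    flagged = []
    for term, count in recent.items():
        p_recent = count / recent_total
        p_base = (baseline[term] + 1) / base_total
        if count >= min_count and p_recent / p_base >= ratio:
            flagged.append(term)
    return sorted(flagged)


# Made-up posts; "acmelocker" is a hypothetical malware name.
today = date(2024, 5, 20)
posts = [
    (date(2024, 4, 2), "patch released for old browser bug"),
    (date(2024, 4, 20), "phishing emails target bank users"),
    (date(2024, 5, 15), "acmelocker ransomware hits hospital"),
    (date(2024, 5, 18), "acmelocker ransomware leaks hospital data"),
    (date(2024, 5, 19), "phishing emails spread acmelocker"),
]
print(threat_alerts(posts, today))
```

Here "phishing" is not flagged because it appears only once in the recent window and was already present in the baseline. The thresholds would need tuning on real data, and common stop words would normally be filtered out first.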