Extracting CTI From Cyberthreat News
Cybersecurity practitioners increasingly publish cyberthreat-related news. These articles often cover cutting-edge attack techniques, attack prevention guidelines, and malware descriptions. Practitioners also increasingly post on social media, and these posts are frequently used for cyber threat intelligence (CTI) extraction, cybersecurity-related keyword extraction, and event identification.
Researchers have explored automated extraction of CTI from textual sources. They have used text clustering to combine similar text segments, and reinforcement learning to extract semantic relationships among cyberthreat entities. These techniques are important for identifying cutting-edge cyberattacks.
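As an illustration of combining similar text segments, the following minimal sketch groups segments by the cosine similarity of their bag-of-words vectors. The greedy grouping strategy, the function names, the sample segments, and the 0.4 threshold are all assumptions made for this example; they are not taken from any specific CTI system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text segment and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_segments(segments, threshold=0.4):
    """Greedily put each segment in the first group whose seed it resembles.

    The threshold is a hypothetical value chosen for the sample data.
    """
    groups = []  # each group is a list of (index, bag); the first item is the seed
    for i, segment in enumerate(segments):
        bag = Counter(tokenize(segment))
        for group in groups:
            if cosine_similarity(bag, group[0][1]) >= threshold:
                group.append((i, bag))
                break
        else:
            groups.append([(i, bag)])  # no similar group found, so start a new one
    return [[i for i, _ in group] for group in groups]


# Hypothetical text segments, invented for illustration.
segments = [
    "ransomware encrypts files and demands payment",
    "new ransomware encrypts files on windows hosts",
    "phishing email steals banking credentials",
]
groups = group_segments(segments)
print(groups)  # [[0, 1], [2]]
```

The two ransomware segments share three tokens and end up in one group, while the phishing segment shares none and forms its own. A production pipeline would likely use TF-IDF weights or embeddings rather than raw token counts.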
A survey of cybersecurity researchers investigated the methods used to extract CTI from textual sources. The researchers identified various cyberthreat actors, their tactics, and their resources. They also explored techniques for identifying potential threats to critical infrastructures. These techniques include clustering, unsupervised learning, topic modeling, and reinforcement learning.
Clustering techniques include k-means, hierarchical clustering, affinity propagation, and DBSCAN. They can be applied to aggregate texts and to group texts by category. Affinity propagation is the most commonly used of these techniques.
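To make one of these techniques concrete, here is a minimal pure-Python sketch of DBSCAN applied to short texts. Representing each text as a set of tokens, using Jaccard distance, and the values of `eps` and `min_pts` are assumptions chosen for the example; the sample texts are invented.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two token sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_pts, distance):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical texts, invented for illustration.
texts = [
    "ransomware encrypts files demands payment",
    "ransomware encrypts files windows hosts",
    "ransomware encrypts backup files",
    "phishing email steals credentials",
]
points = [set(text.split()) for text in texts]
labels = dbscan(points, eps=0.6, min_pts=2, distance=jaccard_distance)
print(labels)  # [0, 0, 0, -1]
```

The three ransomware texts lie within the distance threshold of one another and form a single cluster, while the phishing text has no neighbors and is labeled as noise. Unlike k-means, this approach does not need the number of clusters in advance, which can be convenient when the number of distinct threat topics is unknown.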
Reinforcement learning is a technique that learns by trial and error, guided here by user-generated feedback. Researchers can use the resulting information to generate alerts. These alerts can be generated by aggregating information, applying user-defined rules, or setting thresholds.
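The alerting step can be sketched as follows. This is a minimal illustration that assumes entities have already been extracted from each report; the function name `generate_alerts`, the `always_alert` rule set, the threshold of 3, and the sample data are all hypothetical.

```python
from collections import Counter


def generate_alerts(reports, always_alert, threshold):
    """Aggregate entity mentions across reports and emit alerts.

    An entity triggers a "rule" alert if it is in the user-defined
    always_alert set, or a "threshold" alert if it appears in at least
    `threshold` reports.
    """
    counts = Counter()
    for entities in reports:
        counts.update(set(entities))  # count each entity once per report
    alerts = []
    for entity, count in sorted(counts.items()):
        if entity in always_alert:
            alerts.append((entity, "rule", count))
        elif count >= threshold:
            alerts.append((entity, "threshold", count))
    return alerts


# Hypothetical extracted entities, one list per report.
reports = [
    ["emotet", "phishing"],
    ["emotet", "zero-day"],
    ["emotet", "phishing"],
    ["trickbot"],
]
alerts = generate_alerts(reports, always_alert={"zero-day"}, threshold=3)
print(alerts)  # [('emotet', 'threshold', 3), ('zero-day', 'rule', 1)]
```

Here "emotet" is flagged because it appears in three reports, and "zero-day" is flagged by the user-defined rule despite appearing only once. User feedback on such alerts could then be used to adjust the rules or the threshold.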
Researchers can also apply clustering techniques to events of specific types. These events can be identified using keyword weighting, time windows, and probability of occurrence. Using these techniques, researchers can analyze attacker techniques, resources, and motives.
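One possible way to combine keyword weighting, time windows, and probability of occurrence is sketched below. It assumes fixed-size time windows and estimates the probability as the share of posts in a window that mention a keyword; this scoring scheme, the function name `detect_events`, the weights, and the sample posts are all assumptions made for illustration.

```python
from collections import defaultdict


def detect_events(posts, weights, window_seconds, min_score):
    """Return (window_start, keyword, score) for high-scoring keywords.

    score = keyword weight * share of posts in the window mentioning it.
    """
    windows = defaultdict(list)
    for timestamp, text in posts:
        window_start = timestamp - timestamp % window_seconds
        windows[window_start].append(set(text.lower().split()))
    events = []
    for window_start in sorted(windows):
        token_sets = windows[window_start]
        for keyword, weight in sorted(weights.items()):
            mentions = sum(1 for tokens in token_sets if keyword in tokens)
            probability = mentions / len(token_sets)  # empirical share in this window
            score = weight * probability
            if score >= min_score:
                events.append((window_start, keyword, round(score, 2)))
    return events


# Hypothetical posts as (timestamp in seconds, text) pairs.
posts = [
    (0, "new ransomware campaign"),
    (600, "ransomware hits hospital"),
    (1200, "patch tuesday notes"),
    (4000, "phishing kit for sale"),
    (5000, "weekly newsletter"),
]
weights = {"ransomware": 2.0, "phishing": 1.0}  # hypothetical keyword weights
events = detect_events(posts, weights, window_seconds=3600, min_score=1.0)
print(events)  # [(0, 'ransomware', 1.33)]
```

In the first hour, two of three posts mention "ransomware", giving a score of 2.0 × 2/3 ≈ 1.33, which exceeds the minimum. In the second hour, "phishing" appears in one of two posts and scores only 0.5, so no event is reported.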