With evolving threats, securing IT infrastructure remains a challenge. Traditional methods like Cyber Threat Intelligence (CTI) sharing and SIEM systems detect attacks but often rely on prior breaches. Yet discussions of emerging threats already exist in open-source repositories, security advisories, blogs, and the dark web. Manually analyzing these is time-consuming and inefficient. Philipp Kühn's dissertation tackles this by automating cybersecurity intelligence extraction using NLP, ML, and data mining. Key contributions include an LLM for classification, a CTI taxonomy, a prototype for identifying novel threat sources, and a dark web analysis. The research shows that clear web sources usually suffice for CTI, that the dark web poses technical barriers to crawling, and that ML methods can improve information quality. Clustering techniques reduce data overload by over 90%, enhancing efficiency. Automating intelligence tasks boosts security experts’ effectiveness, enabling proactive cybersecurity strategies.

On March 26, 2025, Philipp Kühn successfully defended his doctoral thesis, earning the title of Dr.-Ing. at the Department of Computer Science, Technical University of Darmstadt.

The entire PEASEC team extends its heartfelt congratulations to our new *Dr.-Ing.* Philipp Kühn!

His dissertation was supervised by Prof. Dr. Dr. Christian Reuter, who also served as the first referee. Prof. Dr. Harald Baier from the University of the Bundeswehr Munich acted as the second referee. The examination committee was chaired by Prof. Dr. Thomas Schneider, Cryptography and Privacy Engineering (ENCRYPTO), and included Prof. Dr. Iryna Gurevych, Ubiquitous Knowledge Processing (UKP).

Proactive Cyber Threat Intelligence: Automating the Intelligence Life Cycle based on Open Sources

Securing IT infrastructure is a tremendous task. With ever-changing infrastructures and threat landscapes, there is a constant need to stay ahead. To cope with this problem, different methods are implemented, e.g., Cyber Threat Intelligence (CTI) sharing or Security Information and Event Management (SIEM) systems, which indicate current attack campaigns and threats. However, they usually require prior breaches to yield actionable information and are otherwise just distractions that call for attention, even though the web already contains discussions of upcoming threats. Such a discussion may be an issue in a mailing list or version control system of Open-Source Software (OSS), a vendor's security advisory, a blog or forum post, or a rally in the dark web. When CTI is embedded into the intelligence cycle used by intelligence services, gaps in the necessary functionality remain. This results in a constant search for current information, usually a tedious and time-consuming manual task, and in underdeveloped analytical capabilities of CTI software. To top it all off, once reliable sources are identified, the result is usually a constant stream of information that leads to information overload, while the search for new sources never stops due to the web’s dynamics.

This thesis explores automated methods to extract actionable cybersecurity-related information from official (usually structured) and unofficial (usually unstructured) sources in order to foster defensive decisions. While CTI sharing is mainly reactive (a prior breach of a partnering organization is necessary), this thesis strives to explore proactive elements, i.e., how to systematically include open sources in the context of CTI to provide Cyber Situational Awareness (CSA). To do so, I make use of various methods from the research fields of Natural Language Processing (NLP), machine learning (ML), crisis informatics, and data mining.

My contributions to the field of CTI are threefold, ranging from conceptual through information-based to analysis-based contributions. I proposed a large language model (LLM) for the field of CTI with the goal of easing NLP-based downstream tasks like classification, which surpassed the state of the art. This is followed up by a taxonomy for CTI information, a prototype implementation to identify novel CTI sources, and a qualitative dark web analysis. The latter two show the trade-offs of crawling openly available information sources. For CTI information, the clear web usually suffices, as the dark web implements technological hindrances like CAPTCHAs. My analysis of industry-standard CTI sources reveals signs of poor information quality, which can be improved by common ML methods. Those methods provide suggestions that guide experts during vulnerability assessment, and they work on openly available sources like blogs or security advisories. Information overload is addressed by clustering techniques that provide an initial overview for experts, achieving over 90 % dimensional reduction across datasets, with deep clustering reaching 0.88 homogeneity for security bug reports (SBRs) and hierarchical clustering attaining 0.51 for advisory data.
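To make the reported clustering figures more concrete, the following minimal sketch computes a homogeneity score, assuming the values above refer to the standard entropy-based homogeneity measure (the one implemented by, e.g., scikit-learn's `homogeneity_score`). The function names, the toy labels, and the `reduction_ratio` helper are hypothetical illustrations and are not taken from the thesis. In particular, reading the "reduction" as one minus the ratio of clusters to items is only an assumption.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * log(n / total) for n in Counter(labels).values())


def homogeneity(true_labels, cluster_ids):
    """Standard homogeneity: 1 - H(class | cluster) / H(class).

    A score of 1.0 means every cluster contains items of a single class.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    class_entropy = entropy(true_labels)
    if class_entropy == 0.0:
        # Only one class (or no data): every clustering is trivially homogeneous.
        return 1.0
    total = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    cluster_sizes = Counter(cluster_ids)
    conditional = -sum(
        (n / total) * log(n / cluster_sizes[k]) for (_, k), n in joint.items()
    )
    return 1.0 - conditional / class_entropy


def reduction_ratio(n_items, n_clusters):
    """Hypothetical helper (an assumption, not the thesis's definition):
    share of items an analyst no longer has to inspect individually."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 1.0 - n_clusters / n_items


if __name__ == "__main__":
    # Toy data: six hypothetical reports about three threats, grouped into clusters.
    threats = ["phishing", "phishing", "ransomware", "ransomware", "botnet", "botnet"]
    clusters = [0, 0, 1, 1, 1, 2]
    print(f"homogeneity: {homogeneity(threats, clusters):.2f}")
    print(f"reduction:   {reduction_ratio(len(threats), len(set(clusters))):.2f}")
```

Under this definition, a cluster that mixes several threats lowers the score, which fits the idea that an analyst should be able to judge a whole cluster from a few of its members.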

In conclusion, the automation of various tasks in the daily work of security experts is a viable proposition that could prevent both the overlooking of critical information and the onset of occupational burnout. By taking over the monotonous and repetitive parts of their daily workload, automation may also enhance the depth and complexity of their work.

 

Selected Publications within the PhD thesis:

  • Bayer, M., Kuehn, P., Shanehsaz, R., Reuter, C., “CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain,” ACM Transactions on Privacy and Security, Mar. 15, 2024. doi: 10.1145/3652594
  • Kuehn, P., Bäumler, J., Kaufhold, M.-A., Wendelborn, M., Reuter, C., “The Notion of Relevance in Cybersecurity: A Categorization of Security Tools and Deduction of Relevance Notions,” in Workshop-Proceedings Mensch und Computer, Darmstadt: Gesellschaft für Informatik, 2022. doi: 10.18420/muc2022-mci-ws01-220
  • Kuehn, P., Nadermahmoodi, D., Bayer, M., Reuter, C., “Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence,” version 3, arXiv, p. 6, Jan. 17, 2025. doi: 10.48550/arXiv.2304.11960
  • Kuehn, P., Wittorf, K., Reuter, C., “Navigating the Shadows: Manual and Semi-Automated Evaluation of the Dark Web for Cyber Threat Intelligence,” IEEE Access, vol. 12, pp. 118903–118922, 2024. doi: 10.1109/ACCESS.2024.3448247
  • Kuehn, P., Bayer, M., Wendelborn, M., Reuter, C., “OVANA: An Approach to Analyze and Improve the Information Quality of Vulnerability Databases,” in ARES ’21: Proceedings of the 16th International Conference on Availability, Reliability and Security, ACM, 2021, p. 11. doi: 10.1145/3465481.3465744
  • Kuehn, P., Relke, D. N., Reuter, C., “Common vulnerability scoring system prediction based on open source intelligence information sources,” Computers & Security, 2023. doi: 10.1016/j.cose.2023.103286
  • Kuehn, P., Nadermahmoodi, D., Kerk, M., Reuter, C., “ThreatCluster: Threat Clustering for Information Overload Reduction in Computer Emergency Response Teams,” version 2, arXiv, Mar. 15, 2024. doi: 10.48550/arXiv.2210.14067

 
