Conference Talk Preview: LLM-Powered Type Inference for Better Static Application Security Testing

By Lukas Seidel

Coding in dynamic languages like JavaScript and Python is fun and allows for fast iteration, but it comes at a cost. Without type information, developers lose the ability to catch bugs early and get helpful IDE support. The absence of typed variables makes life tricky not only for the people writing the code but also for security professionals and automated static analysis tools.

Enter CodeTIDAL5: The Game-Changer

We’ve built CodeTIDAL5, a Transformer-based machine learning model that predicts type annotations for JavaScript and TypeScript and fills in type information where it is missing. It outperforms the current best models by 7.85%, reaching an overall accuracy of 71.27%. Unlike most approaches, it shines where type inference is most needed: in predicting user-defined types. We use CodeT5+ [1] as the base Large Language Model and fine-tune it on large amounts of annotated TypeScript code. By learning how developers write their code, how they name their functions and variables, and how they use their objects, the model builds an understanding of which kinds of syntax imply which types.
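To make the task concrete, here is a small, hypothetical before-and-after sketch in TypeScript. The `UserAccount` interface and both functions are invented for illustration and are not taken from CodeTIDAL5 or its training data. The point is only to show the kind of annotation, including a user-defined type, that a type inference model aims to recover from names and usage.

```typescript
// Hypothetical user-defined type, invented for this example.
interface UserAccount {
  firstName: string;
  lastName: string;
}

// Before: roughly what a tool sees in plain JavaScript, modelled here with `any`.
// Nothing tells an analyzer which properties `user` has or what is returned.
function formatNameUntyped(user: any): any {
  return `${user.firstName} ${user.lastName}`;
}

// After: annotations a model would ideally suggest, based on the parameter
// name and the properties accessed in the function body.
function formatNameTyped(user: UserAccount): string {
  return `${user.firstName} ${user.lastName}`;
}

const sampleUser: UserAccount = { firstName: "Ada", lastName: "Lovelace" };
const formattedName: string = formatNameTyped(sampleUser);
```

Both functions behave identically at runtime; only the second gives an IDE or a static analyzer something to work with.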

JoernTI: The Perfect Integration

Our model doesn’t just sit in a lab; it’s integrated into Joern, our popular open-source static analysis tool and a foundation for securing your applications. This combination, known as JoernTI, lets you use the inferred type information in your static analysis tasks, contributing to more effective and comprehensive results.

The Takeaway

CodeTIDAL5 offers state-of-the-art type inference, especially for user-defined types
JoernTI integrates this into practical static analysis workflows
Our approach significantly improves dataflow recovery, giving you a more complete understanding of your code’s behavior
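
One way to see why type information helps dataflow recovery is call resolution. The following TypeScript sketch is a deliberately simplified toy, not Joern's or JoernTI's actual implementation: the class names, the method table, and the `lookup` and `resolveCall` helpers are all hypothetical. It shows that without a receiver type every method of the same name is a candidate target, while an inferred type narrows the call to a single method.

```typescript
// Toy model of call resolution. All names here are hypothetical.

interface CallSite {
  receiver: string; // variable the method is called on, e.g. `conn`
  method: string;   // method name, e.g. `query`
}

// Methods declared per class in an imaginary code base.
const declaredMethods: Record<string, string[]> = {
  Database: ["query", "close"],
  SearchIndex: ["query"],
  Logger: ["info"],
};

// Safe table lookup that ignores inherited properties such as `toString`.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Returns the fully qualified methods a call site may refer to.
function resolveCall(call: CallSite, receiverTypes: Record<string, string>): string[] {
  const knownType = lookup(receiverTypes, call.receiver);
  if (knownType !== undefined) {
    // Receiver type is known: at most one target remains.
    const methods = lookup(declaredMethods, knownType) ?? [];
    return methods.indexOf(call.method) >= 0 ? [`${knownType}.${call.method}`] : [];
  }
  // Receiver type is unknown: fall back to matching by method name only.
  return Object.keys(declaredMethods)
    .filter((typeName) => (lookup(declaredMethods, typeName) ?? []).indexOf(call.method) >= 0)
    .map((typeName) => `${typeName}.${call.method}`);
}

const callSite: CallSite = { receiver: "conn", method: "query" };

// Without type information, two classes declare `query`, so both are candidates.
const withoutTypes = resolveCall(callSite, {});

// With an inferred type for `conn`, the target is unambiguous.
const withInferredTypes = resolveCall(callSite, { conn: "Database" });
```

In this toy setting, a taint analysis that must follow both candidates either loses precision or has to guess; with the inferred type it can follow the flow into a single method.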

Academic Publication

Our work was accepted at the peer-reviewed 28th European Symposium on Research in Computer Security (ESORICS) in The Hague, one of the top cybersecurity conferences in the world, where we will present our results from the 25th to the 27th of September.

For those wanting to dive deeper, check out the full preprint of our academic paper where we get into the details of how we achieved these advancements: https://davidbakereffendi.github.io/assets/pdf/preprint_6676_ESORICS23.pdf

Or check out our reference implementation and try out JoernTI today! https://github.com/joernio/joernti-codetidal5

[1] https://github.com/salesforce/CodeT5

About Qwiet AI

Qwiet AI empowers developers and AppSec teams to dramatically reduce risk by quickly finding and fixing the vulnerabilities most likely to reach their applications and ignoring reported vulnerabilities that pose little risk. Industry-leading accuracy allows developers to focus on security fixes that matter and improve code velocity while enabling AppSec engineers to shift security left.

A unified code security platform, Qwiet AI scans for attack context across custom code, APIs, OSS, containers, internal microservices, and first-party business logic by combining results of the company’s and Intelligent Software Composition Analysis (SCA). Using its unique graph database that combines code attributes and analyzes actual attack paths based on real application architecture, Qwiet AI then provides detailed guidance on risk remediation within existing development workflows and tooling. Teams that use Qwiet AI ship more secure code, faster. Backed by SYN Ventures, Bain Capital Ventures, Blackstone, Mayfield, Thomvest Ventures, and SineWave Ventures, Qwiet AI is based in Santa Clara, California. For information, visit: https://qwiet.ai
