By Lukas Seidel
Coding in dynamic languages like JavaScript and Python is fun and allows for fast iterations, but it comes at a cost. Without proper type information, developers miss out on the ability to catch bugs early and get helpful IDE support. The absence of properly typed variables makes life tricky not only for the people writing the code but also for security professionals and automated static analysis tools.
Enter CodeTIDAL5: The Game-Changer
We’ve built CodeTIDAL5, a Transformer-based machine learning model that predicts type annotations for JavaScript and TypeScript and fills in type information where it is missing. It beats the current best models by 7.85%, reaching an overall accuracy of 71.27%. Unlike most approaches, it shines where type inference is most needed: in predicting user-defined types. We use CodeT5+ [1] as the base for our Large Language Model and fine-tune it on vast amounts of annotated TypeScript code. By learning how developers write their code, how they name their functions and variables, and how they use their objects, the model builds an understanding of which kinds of syntax imply which types.
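To make the task concrete, here is a minimal sketch of the kind of annotation gap a type inference model is meant to fill. The `HttpClient` class and both functions are hypothetical illustrations and are not taken from the CodeTIDAL5 dataset or its output.

```typescript
// Hypothetical user-defined type, used only for illustration.
class HttpClient {
  baseUrl: string;

  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
  }

  buildUrl(path: string): string {
    return `${this.baseUrl}/${path}`;
  }
}

// Untyped version: both parameters are `any`, so neither the IDE nor a
// static analyzer can tell which `buildUrl` implementation is called.
function userUrlUntyped(client: any, id: any) {
  return client.buildUrl("users/" + id);
}

// The same function with the annotations a model could plausibly suggest
// from the parameter names and from how the values are used.
function userUrlTyped(client: HttpClient, id: number): string {
  return client.buildUrl("users/" + id);
}

const client = new HttpClient("https://example.com");
const untypedResult = userUrlUntyped(client, 42);
const typedResult = userUrlTyped(client, 42);
```

Both functions behave identically at runtime. The difference is that the typed version names the user-defined type `HttpClient`, which is exactly the kind of information that cannot be recovered from built-in types alone.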
JoernTI: The Perfect Integration
Our model doesn’t just sit in a lab; it’s integrated into Joern, our popular open-source static analysis tool and a foundation for securing your applications. This combination, known as JoernTI, lets you use the inferred type information in your static analysis tasks, contributing to more effective and comprehensive results.
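One way to see why inferred types help static analysis is call resolution: once the type of a receiver is known, a method call can be tied to a concrete implementation. The sketch below is a deliberately simplified illustration of that idea. The `CallSite` and `TypeEnv` types and the `resolveCall` function are assumptions made for this example and are not part of Joern's or JoernTI's actual API.

```typescript
// Hypothetical, simplified model of a method call site.
interface CallSite {
  receiver: string; // name of the variable the method is called on
  method: string;   // name of the called method
}

// Maps variable names to their (inferred) type names.
type TypeEnv = Map<string, string>;

// Returns a qualified method name if the receiver type is known,
// otherwise a placeholder that marks the call as unresolved.
function resolveCall(call: CallSite, types: TypeEnv): string {
  const receiverType = types.get(call.receiver);
  return receiverType === undefined
    ? `<unresolved>.${call.method}`
    : `${receiverType}.${call.method}`;
}

const call: CallSite = { receiver: "client", method: "buildUrl" };

// No type information: the call target stays unknown.
const withoutTypes = resolveCall(call, new Map());

// With an inferred type for `client`, the target can be named.
const withTypes = resolveCall(call, new Map([["client", "HttpClient"]]));
```

In this toy setting, `withoutTypes` is `<unresolved>.buildUrl` and `withTypes` is `HttpClient.buildUrl`. A real analyzer does far more than string concatenation, but the principle is the same: a resolved call target gives data flow tracking a method body to follow.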
The Takeaway
- CodeTIDAL5 offers state-of-the-art type inference, especially for user-defined types
- JoernTI integrates this into practical static analysis workflows
- Our approach significantly improves dataflow recovery, giving you a more complete understanding of your code’s behavior
Academic Publication
Our work was accepted at the peer-reviewed 28th European Symposium on Research in Computer Security (ESORICS) in The Hague, one of the top cybersecurity conferences in the world, where we will present our results from the 25th to the 27th of September.
For those wanting to dive deeper, check out the full preprint of our academic paper where we get into the details of how we achieved these advancements: https://davidbakereffendi.github.io/assets/pdf/preprint_6676_ESORICS23.pdf
Or check out our reference implementation and try out JoernTI today! https://github.com/joernio/joernti-codetidal5