Claude’s announcement about vulnerability detection capabilities got me thinking. As someone who uses Claude Code daily and has built deterministic analysis tools, I’ve seen both the incredible potential and the real limitations of LLMs in security work.
Let me be clear upfront: Claude Code has become indispensable in my workflow. The productivity gains are real. I can prototype security tools in hours that used to take days. However, regarding vulnerability detection specifically, we need to have an honest conversation about what these models can and can’t do.
The Context Window Challenge
Here’s a real scenario from last week. I was tracing a potential vulnerability through a microservices architecture. The data flow went something like:
User input → API Gateway → Auth service → Business logic (multiple services) → Database layer → Response transformation
Each hop involved a different repository, a different team's code, and a different coding style. Even with 200k-token context windows, loading all the relevant code paths means the model starts losing track of the connections. The research on attention degradation in large contexts isn't theoretical; it shows up as missed edge cases in security analysis.
This isn’t a criticism of the technology – it’s just understanding its boundaries. When I use Claude Code for security work, I’ve learned to work with these constraints rather than against them.
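One way to work with the constraint is to summarize each hop separately and analyze the condensed trace, rather than loading every repository into one prompt. This is a minimal sketch of that idea; `summarize` and `analyze_flow` are hypothetical stand-ins for LLM calls, not real APIs.

```python
def summarize(hop_name: str, source: str) -> str:
    """Placeholder for an LLM call that condenses one hop's code
    into a short note about how it handles untrusted input."""
    return f"{hop_name}: {len(source)} chars reviewed"

def analyze_flow(summaries: list[str]) -> str:
    """Placeholder for a final LLM pass over the condensed trace,
    which fits comfortably in context even when the raw code would not."""
    return " -> ".join(summaries)

# One entry per hop in the data flow described above (toy snippets).
hops = {
    "api_gateway": "def route(req): ...",
    "auth_service": "def check_token(tok): ...",
    "business_logic": "def process(order): ...",
}

condensed = [summarize(name, code) for name, code in hops.items()]
report = analyze_flow(condensed)
```

The point is the shape: per-hop condensation keeps each LLM call small, and only the summaries travel to the cross-hop analysis step.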
The Code Generation Paradox (And How to Navigate It)
Yes, there’s something ironic about using the same technology that generates code to detect vulnerabilities in that code. My work on deterministic misalignment detection (sniff) came from recognizing this challenge. But here’s what I’ve learned: it’s not about choosing one or the other.
The workflow that works looks like this: LLM generates initial code → Static analyzers catch obvious issues → LLM helps fix them → Deterministic tools verify security properties → Human review for business logic. It’s collaborative, not competitive.
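The hand-off loop above can be sketched in a few lines. Every stage here is a stub I made up for illustration; in practice each would wrap an LLM API, a static analyzer such as bandit, and a deterministic verifier.

```python
# Hypothetical pipeline stubs -- not real tool invocations.
def llm_generate(spec: str) -> str:
    return "def handler(q): return db.run(q)"

def static_scan(code: str) -> list[str]:
    # Stand-in for a real analyzer flagging an unparameterized query.
    return ["possible unsanitized query"] if "db.run(q)" in code else []

def llm_fix(code: str, findings: list[str]) -> str:
    # Stand-in for an LLM pass that applies the analyzer's findings.
    return code.replace("db.run(q)", "db.run_safe(q)")

def verify(code: str) -> bool:
    # Stand-in for a deterministic check of the security property.
    return "db.run(q)" not in code

code = llm_generate("build a query handler")
findings = static_scan(code)
if findings:
    code = llm_fix(code, findings)
ok = verify(code)  # human review of business logic still follows
```

Each tool does the part it is good at, and the deterministic check gates the output before a human ever looks at it.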
What’s Working in Production
After months of experimentation, here’s where LLMs truly shine in security work:
Pattern recognition in isolated code segments works remarkably well. Claude can spot SQL injection patterns faster than I can, especially in code I didn’t write. Test case generation has been a game-changer – I get comprehensive test suites that cover edge cases I might miss. The ability to explain complex vulnerabilities to developers who aren’t security experts? Invaluable.
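To make the SQL-injection case concrete, here is the kind of isolated pattern an LLM flags quickly: string-concatenated queries versus parameterized ones. The regex is deliberately crude and purely illustrative; real detection needs far more than pattern matching.

```python
import re

# Crude illustration: flag execute() calls where a string literal is
# glued to other values with + or %, instead of using placeholders.
CONCAT_SQL = re.compile(r'execute\(\s*["\'].*["\']\s*[+%]')

vulnerable = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)'
safe = 'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))'

hit_vulnerable = bool(CONCAT_SQL.search(vulnerable))  # flagged
hit_safe = bool(CONCAT_SQL.search(safe))              # not flagged
```

An LLM spots this class of bug without being handed the regex, which is exactly why it shines on isolated segments.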
We still need traditional tools for complex taint analysis, for temporal vulnerabilities like race conditions, and for anything requiring deep semantic understanding of program state. But that's fine – we don't need LLMs to do everything.
The Token Economy Reality
Let’s also discuss the business side. Yes, every new feature means more token consumption. Comprehensive security scanning can get expensive quickly, but compared to the cost of a security breach or the salary of additional security engineers, it’s often a bargain. The key is being strategic about when and how you use these capabilities.
Using LLMs for initial triage and switching to specialized tools for deep analysis gives you the best ROI. You’re not burning tokens on problems that traditional tools handle better, but you’re also not missing the unique insights LLMs can provide.
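That triage strategy is easy to sketch: run cheap deterministic checks on everything, and only route flagged files to a token-hungry LLM pass. The `cheap_scan` heuristics below are made-up examples, not a real ruleset.

```python
def cheap_scan(source: str) -> bool:
    """Fast, free heuristics (illustrative only) that decide whether
    a file deserves an expensive LLM review."""
    risky_markers = ("eval(", "pickle.loads", "os.system")
    return any(marker in source for marker in risky_markers)

def route(files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split files into an LLM queue and a clean pile."""
    llm_queue, clean = [], []
    for path, source in files.items():
        (llm_queue if cheap_scan(source) else clean).append(path)
    return llm_queue, clean

files = {
    "handlers.py": "data = pickle.loads(raw)",
    "utils.py": "def add(a, b): return a + b",
}
llm_queue, clean = route(files)
# Only handlers.py burns tokens on a deep LLM review.
```

Tokens get spent where deterministic tools have already signaled risk, which is where the LLM's judgment adds the most value per dollar.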
A Practical Path Forward
The future of security tooling isn’t about choosing between LLMs and traditional approaches – it’s about intelligent integration. Here’s what’s working for me:
Use Claude Code to prototype security tools and generate initial implementations rapidly. Let it handle the pattern matching and obvious vulnerability detection. Bring in deterministic analyzers for semantic analysis and verification. Use LLMs again to explain findings and suggest remediations.
Most importantly, security analysis should be treated as a multi-tool job. No single approach—whether static analysis, dynamic testing, or LLM-based detection—catches everything. The magic happens when you combine them thoughtfully.
Embracing the Evolution
The security field has continually evolved with new tools and techniques, and LLMs are just the latest addition to our toolkit. We should stay clear-eyed about their limitations, especially around complex vulnerability chains and semantic properties. But we shouldn't let perfectionism prevent us from leveraging genuinely helpful capabilities.
Claude Code has made me a more effective security researcher. It helps me explore attack vectors faster, communicate findings more clearly, and build better tools. The key is understanding it as a powerful assistant rather than a replacement for security expertise.
As we move forward, the teams that will excel at security are those who can effectively combine LLM capabilities with traditional security tools and human expertise. It’s not about choosing sides – it’s about building better, more secure systems using every tool at our disposal.
What’s your experience? How are you integrating LLMs into your security workflow while maintaining appropriate caution? I’d love to hear about approaches that work (or not) in your environment.