Must-See Webinar: Agentic Pentesting with Reflectiz Offensive Hub
Ysrael Gurt is the CTO and co-founder of Reflectiz. In a new webinar hosted by Senior Product Marketing Manager Maayan Sulami, he explains that while hunting flaws in code is still important, exploring every possible set of circumstances under which it can execute is the crucial missing piece for your web security, and it’s what makes Offensive Hub so essential.
The Flaw That Wasn’t in the Code
Ysrael worked as an ethical hacker for many years, and it was during that time that the seeds for the new pentesting agent were sown. Early on, he found a way to read other people’s messages inside Facebook Messenger. But discovering such a severe flaw wasn’t nearly as surprising as discovering where it lived, because it wasn’t in the code at all. It was in the wiring, in the way one system passed a header to the next, which passed it to the next, until the combination produced an opening no single component was responsible for. The team chasing it had a hard time locating the root cause precisely because they kept looking in the code, and the code was fine. The vulnerability was something that could only show up in the live system.
Experiences like this made Ysrael wary of a comforting assumption that’s baked into a lot of security work: that if you read the source carefully enough, you’ll find the flaws. A growing class of AI tools still leans into this approach. The idea is that when you point them at a codebase, they will surface the bugs that a human reviewer would take days to spot.
While he is the first to admit that such tools have their place, he also realized that this approach has a limited scope. As he explains in the webinar, a real application doesn’t just run on a page. It runs behind a specific server, behind a specific firewall, threaded through proxies, load balancers, and third-party services, and talks to internal systems that weren’t in the file you reviewed.
So, just as wetness is an emergent property of water that you never could have anticipated by studying hydrogen and oxygen alone, the behavior that gets you breached is often an emergent property of your living web infrastructure. It’s the live interaction of those parts that can produce vulnerabilities, in ways you never could have anticipated just by checking lines of code. The exposures only appear when the system is running.
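To make the idea concrete, here is a minimal Python sketch of an emergent flaw. It is not the Messenger bug (no technical detail on that is given here). The two components, `edge_proxy` and `backend_is_internal`, the `10.` address check, and the sample IPs are all hypothetical and chosen only for illustration. Each function looks plausible when reviewed alone; the opening only appears when a request travels through both.

```python
def edge_proxy(request_headers: dict, peer_ip: str) -> dict:
    """Hypothetical edge proxy: appends the real peer IP to X-Forwarded-For."""
    headers = dict(request_headers)
    prior = headers.get("X-Forwarded-For")
    # Keeps whatever the client sent and adds the address it actually saw.
    headers["X-Forwarded-For"] = f"{prior}, {peer_ip}" if prior else peer_ip
    return headers


def backend_is_internal(headers: dict) -> bool:
    """Hypothetical backend: treats the first X-Forwarded-For entry as the client."""
    client_ip = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    # Assumed rule for this sketch: 10.x.x.x addresses count as internal.
    return client_ip.startswith("10.")


# An honest request and one where the client supplies its own header value.
honest = edge_proxy({}, "203.0.113.7")
spoofed = edge_proxy({"X-Forwarded-For": "10.0.0.5"}, "203.0.113.7")

print(backend_is_internal(honest))   # False
print(backend_is_internal(spoofed))  # True
```

Reading either function in isolation would not flag the problem. Sending the spoofed request through the running chain does.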
Why this breaks the way we usually test
If real-world behavior only emerges under real-world conditions, then testing has to happen against the live, running application, not its source, and not a sanitized staging clone that’s missing half the wiring. This kind of outside-in testing comes in two grades that the industry calls black-box and gray-box: attacking the app the way a real adversary would, with either no inside knowledge at all (black-box) or a little, such as valid login credentials (gray-box). What unites them, and separates them from white-box source review, is that they test the application as it actually runs.
This is the gap that traditional penetration testing and source-reading tools leave open, and LLMs are making it worse.
They can now write and ship code at a pace no quarterly review can keep up with, and increasingly, that code is written by people who aren’t developers at all. The result, as the webinar puts it, is a jungle of new endpoints and new logic appearing constantly, much of it never seen by a security team. A penetration test that takes four weeks to produce can already be past its use-by date before it’s even been delivered.
LLMs can also make experts of amateurs. AI coding assistants have upskilled everyone, which means a teenager with bad intentions and little experience can now find the one weakness in your website that your last test happened to miss. As Ysrael explains, this is why 80% coverage is no longer good enough. You may be defending with a similar tool, maybe Claude Code, and if it chases the vulnerabilities it finds interesting and skips the rest, someone with a slightly different prompt will walk straight through the gap. That’s why you need 100% coverage, every time. Neither expert humans nor a raw LLM like Claude Code can give you that, but Offensive Hub agentic testing can.
What an agent does that a person can’t
A human pentester is brilliant and creative, but also tired, busy, expensive, hard to book, and quietly biased toward the attacks they personally enjoy tackling. Ysrael admits he always reached for cross-site scripting first, which means other categories got less of his attention. That’s not a failing; it’s just being human.
By contrast, an agent has no favorites, and it doesn’t have off days. Offensive Hub works the way a skilled human hacker does, exploring the live site, watching how it actually behaves, then attacking, but without the human limitations. It won’t quietly skip the dull attack types in favor of the interesting ones, it won’t tire on the hundredth endpoint, and it won’t hand your team a pile of maybes to sort through.
And that discipline isn’t theoretical. Offensive Hub is already running against live customer sites, and on one, it surfaced exactly the kind of flaw a time-boxed human review skips: a CAPTCHA that could be reused. It’s the kind of small flaw attackers can build into something bigger, and everyone had missed it until an agent that doesn’t skip the dull checks went looking.
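A reusable CAPTCHA is easy to picture as a replay check: submit the same solved token twice and see whether the second attempt is still accepted. The Python sketch below is only an illustration of that idea, not how Offensive Hub tests it. The `ReusableCaptcha` and `SingleUseCaptcha` classes and the `token_is_replayable` helper are hypothetical in-memory stand-ins for a real verification endpoint.

```python
class ReusableCaptcha:
    """Hypothetical flawed verifier: a solved token stays valid forever."""

    def __init__(self, solved_tokens):
        self.solved = set(solved_tokens)

    def verify(self, token: str) -> bool:
        return token in self.solved


class SingleUseCaptcha(ReusableCaptcha):
    """Hypothetical fixed verifier: a token is consumed on first use."""

    def verify(self, token: str) -> bool:
        if token in self.solved:
            self.solved.remove(token)
            return True
        return False


def token_is_replayable(verifier, token: str) -> bool:
    """Submit the same token twice; a replayable token passes both times."""
    first = verifier.verify(token)
    second = verifier.verify(token)
    return first and second


print(token_is_replayable(ReusableCaptcha({"tok-1"}), "tok-1"))   # True
print(token_is_replayable(SingleUseCaptcha({"tok-1"}), "tok-1"))  # False
```

The check is dull and mechanical, which is exactly why it tends to be skipped in a time-boxed review.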
The mechanics behind Offensive Hub’s approach (how it forces full coverage, and how it proves a finding is real before anyone sees it) are well worth hearing Ysrael describe for himself in the webinar. But the capability that’s hardest to forget is the one that mirrors everything above: an agent can take several small, individually unremarkable findings and chain them into a single serious exploit. It’s emergence again, only this time, it’s working for you.
The point here isn’t that machines replace people. It’s that the systematic, exhaustive, full-coverage baseline, the part humans genuinely cannot sustain, is finally something you can run continuously, at the speed your code actually changes.
Watch the rest
There’s a lot more valuable discussion in the webinar. You’ll learn:
- Why even the best human pen-tester can’t give you what you need, and it’s not about skill.
- The secret sauce: how Offensive Hub is built to think and attack like a human hacker, not crawl your code like a scanner.
- Why traditional pen-testing is only ever “best effort,” never guaranteed coverage.
- How a deterministic process forces the agent through every endpoint and every attack type, so 80% coverage never quietly becomes your blind spot.
- Why you get confirmed and validated findings instead of a pile of false positives to wade through.
- What the four intelligence tiers (scanner, junior, senior, and expert) actually do, and when to reach for each.
- How custom rules let the agent interview you and aim a scan where it matters most: your checkout flow, your login page, and your newest code.
- The guardrails that let you turn an unpredictable agent loose on a live production site without it taking down your database.
- How offensive, security, and privacy testing feed a single dashboard for a 360° view of your web risk.
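The idea of a deterministic process that covers every endpoint and every attack type can be pictured as a fixed plan: the cross product of endpoints and attack categories, walked cell by cell. The Python sketch below is an assumption-laden illustration of that idea, not a description of Offensive Hub’s internals. The endpoint list, the attack names, `stub_probe`, and the helper functions are all hypothetical.

```python
from itertools import product

# Hypothetical inventory; a real one would come from exploring the live site.
ENDPOINTS = ["/login", "/checkout", "/search", "/profile", "/api/orders"]
ATTACKS = ["xss", "sqli", "csrf", "idor"]


def build_plan(endpoints, attacks):
    """Every (endpoint, attack) pair, in a stable order."""
    return sorted(product(endpoints, attacks))


def run_plan(plan, probe):
    """Visit every cell of the plan; nothing is skipped for being dull."""
    results = {}
    for endpoint, attack in plan:
        results[(endpoint, attack)] = probe(endpoint, attack)
    return results


def coverage(plan, results):
    """Fraction of planned cells that have a recorded result."""
    return len([cell for cell in plan if cell in results]) / len(plan)


def stub_probe(endpoint, attack):
    # Placeholder: a real probe would send traffic to the running application.
    return "not vulnerable"


plan = build_plan(ENDPOINTS, ATTACKS)
results = run_plan(plan, stub_probe)
print(len(plan), coverage(plan, results))  # 20 1.0
```

Because the plan is fixed up front, coverage is something you can measure rather than hope for: a run that stops after 16 of the 20 cells reports 0.8, and the four untested cells are listed by name.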
But if you take one idea away before you press play, make it this one: you cannot know what your application is capable of by only looking at its code. It takes the right conditions to bring that behavior out, and you’d rather be the one who creates those conditions first.
[Watch the full webinar with Ysrael Gurt]
FAQs
Can a raw LLM like Claude Code replace a penetration test?
No. A raw LLM follows its own interests when hunting vulnerabilities and delivers inconsistent coverage between runs. Expert humans face similar limitations: selective focus and limited speed. Neither can guarantee 100% coverage of a live application on every test.
How do AI coding assistants increase the risk of attack?
AI coding assistants have upskilled everyone, including attackers. A teenager with bad intentions and little experience can now find the one weakness your last test missed. This lowers the barrier to entry for exploitation and raises the standard your defenses must meet.
How do LLMs make traditional penetration testing less effective?
LLMs now write and ship code at a pace no quarterly review can match, and much of that code is written by people who are not developers. The result is a constant stream of new endpoints and new logic that security teams never see. A penetration test that takes four weeks to produce can be outdated before it is delivered.
What is the difference between black-box and gray-box testing?
Black-box testing attacks an application the way a real adversary would, with no inside knowledge at all. Gray-box testing uses limited inside knowledge, such as valid login credentials. Both test the application as it actually runs, which separates them from white-box source review that only examines code.
What makes Offensive Hub different from AI code review tools?
AI code review tools point at a codebase and surface bugs in the source. Offensive Hub tests the live, running application from the outside, the way a real attacker would. It covers the emergent behavior that only appears when all components interact in production, and it delivers full coverage on every run instead of a sample.
Who presented the Agentic Pentesting webinar?
Ysrael Gurt, CTO and co-founder of Reflectiz, presented the webinar. It was hosted by Senior Product Marketing Manager Maayan Sulami. Gurt draws on his years as an ethical hacker to explain why exploring every set of circumstances under which code executes is the missing piece in web security.
Why is 80% test coverage no longer enough?
An attacker only needs the one gap your test skipped. If you defend with a raw LLM tool that chases the vulnerabilities it finds interesting and ignores the rest, someone with a slightly different prompt will walk straight through the gap. Full coverage, every time, is the only standard that closes this risk.
Why should penetration testing happen on the live application instead of a staging environment?
Real-world behavior only emerges under real-world conditions. A sanitized staging clone is missing half the wiring, including the specific servers, proxies, and third-party connections that produce emergent vulnerabilities. Testing against anything other than the live, running application leaves those exposures undetected.
