TechRisk #146: OpenAI agentic security researcher
Plus, AI can create voice using photo, Google Cybersecurity Forecast 2026 report, exfiltration through Claude API, AI agent session smuggling attack, and more!
Tech Risk Reading Picks
OpenAI Agentic security researcher: OpenAI has unveiled Aardvark, an autonomous “agentic security researcher” powered by GPT-5, designed to act like a human expert in scanning, understanding, and patching code. Currently in private beta, Aardvark integrates directly into software development pipelines to continuously analyze source code repositories, detect vulnerabilities, assess their exploitability, and propose targeted patches using LLM-based reasoning. Used internally and with select partners, Aardvark has already helped uncover multiple CVEs in open-source projects. Positioned alongside tools like Google’s CodeMender and XBOW, it reflects a growing trend toward AI-driven, continuous security analysis and patching. OpenAI describes Aardvark as a “defender-first” system that enhances security without hindering development speed. [more]
AI can create voice using photo: A new study by Australia’s national science agency reveals that just a photo of a person’s face can now be used to generate a convincing synthetic voice through a method called FOICE (Face-to-Voice), which predicts vocal traits like pitch and tone from facial features. This makes voice impersonation far easier, as photos are readily available online. The technique successfully fooled WeChat’s voice authentication system up to 100% of the time after several tries. Most existing deepfake detectors also failed to detect these new photo-based deepfakes. While retraining detectors with FOICE samples improved accuracy, it reduced their ability to recognize other types of fakes, highlighting a major limitation in current detection methods. [more][more-paper]
Microsoft’s new guide, 5 Generative AI Security Threats You Must Know About: Generative AI is transforming cybersecurity by accelerating threat detection and automation, but it’s also empowering attackers to evolve faster than defenses can adapt. According to Microsoft’s 2025 Digital Threats Report, nation-states like Russia, China, Iran, and North Korea have doubled their use of AI for cyberattacks and disinformation, leveraging it to craft convincing phishing messages, deepfakes, and adaptive malware. As organizations rush to deploy generative AI—66% building custom apps and 80% worried about data leakage—security leaders face new challenges across cloud vulnerabilities, data exposure, and unpredictable model behavior. These risks fuel emerging AI-specific threats such as data poisoning, evasion, and prompt injection attacks, which undermine model trust and integrity. Microsoft’s new guide, 5 Generative AI Security Threats You Must Know About, urges a unified, AI-aware security strategy to defend against this evolving threat landscape. [more][more-report]
Google Cybersecurity Forecast 2026 report: Google Cloud Security’s new Cybersecurity Forecast 2026 report warns that AI-driven cyberthreats and global extortion will surge next year, transforming both attacks and defenses. It predicts that attackers will fully weaponize AI through the use of multimodal generative tools to create realistic phishing, deepfakes, and impersonation campaigns. At the same time, prompt injection attacks will continue to rise in exploiting large language models. The report also flags the growing risk of “shadow agents,” unauthorized AI tools used by employees that create hidden data pipelines and compliance risks, calling for new AI governance frameworks to manage them. Beyond AI, ransomware, data theft, and multifaceted extortion are expected to become the most financially damaging forms of cybercrime, with cascading economic impacts. Virtualization infrastructure is emerging as a new vulnerability, where single breaches could compromise hundreds of systems. The report concludes that 2026 will mark a new era in cybersecurity where both attackers and defenders harness AI, making proactive, multi-layered defenses and strong AI governance essential. [more]
Increased malicious deployments through LLMs: Google warns that LLMs are moving from research curiosities to active tools for attackers, who are building adaptable, AI-powered malware that can generate code, rewrite itself, and evade detection mid-run. Its analysts documented multiple in-the-wild examples (from credential-stealers like QuietVault that use on-host AI tools to hunt secrets, to PromptSteal that queries Qwen for one-line commands, to reverse shells like FruitShell with prompts to bypass LLM-based defenses) alongside experimental projects such as PromptLock and PromptFlux that dynamically generate malicious scripts or rewrite their source via APIs. Google also found underground marketplaces selling illicit “AI as-a-tool” services and observed state-linked actors abusing Gemini and other LLMs for lure writing, tooling, and malware development, signaling a new phase where generative AI both amplifies skilled operators and lowers the barrier for less technical criminals. [more]
OpenAI stealth channel: Microsoft’s Detection and Response Team (DART) disclosed a novel backdoor called SesameOp (discovered in July 2025) that stealthily uses the OpenAI Assistants API as a command-and-control channel to fetch encrypted commands and return execution results. [more]
The implant’s infection chain includes a heavily obfuscated loader (
Netapi64.dll) and a .NET backdoor (OpenAIAgent.Netapi64) loaded via AppDomainManager injection. Attackers also used compromised Visual Studio utilities and internal web shells to maintain persistent, long-term access for likely espionage. Commands are relayed through the Assistants API using message descriptions likeSLEEP,Payload, andResult, enabling sleep timers, remote payload execution, and exfiltration of outputs.
Leaking personal data through indirect prompt injection attacks: Cybersecurity researchers from Tenable have uncovered seven vulnerabilities in OpenAI’s GPT-4o and GPT-5 models that could let attackers steal users’ personal data and chat histories through indirect prompt injection attacks. These flaws, some of which have been fixed, exploit how ChatGPT processes external content and include techniques like zero-click and one-click prompt injections, memory poisoning, and safety bypasses via trusted domains such as Bing. The findings highlight the broader risks of linking AI systems to external data sources, as large language models struggle to distinguish between genuine and malicious instructions. Similar prompt injection and model-poisoning attacks have recently been found affecting other AI systems like Claude, Microsoft 365 Copilot, and GitHub Copilot, revealing an expanding threat surface for AI agents. [more][more-2]
Exfiltration through Claude API: A security researcher discovered that attackers can exploit indirect prompt injections in Anthropic’s Claude to exfiltrate user data when the AI has network access, a feature enabled by default on some plans. The attack abuses Claude’s Files APIs by tricking the model into saving user data to its Code Interpreter sandbox and then uploading it to the attacker’s account using a malicious API key. Up to 30MB can be exfiltrated at once, and multiple files can be sent. The exploit begins when a user opens a malicious document, which hijacks Claude to harvest data (including chat conversations saved via the ‘memories’ feature) and send it to the attacker. [more][more-2]
AI agent session smuggling attack: Palo Alto Networks’ Unit 42 has identified a new AI attack technique called agent session smuggling, which enables a malicious AI agent to covertly inject harmful instructions into an ongoing cross-agent communication session, exploiting the stateful nature of the Agent2Agent (A2A) protocol. Unlike one-time prompt injection attacks, this method leverages agents’ built-in trust and memory across multi-turn conversations, allowing an attacker to manipulate a victim agent invisibly over time. Proof-of-concept demonstrations showed that a rogue agent could exfiltrate sensitive data or initiate unauthorized tool actions within a financial assistant system. While the A2A protocol itself is not vulnerable, its stateful design makes such manipulation possible in any multi-agent environment. Mitigations include enforcing human-in-the-loop (HitL) approvals for sensitive actions, cryptographic verification of agent identities, and context-grounding to detect off-topic instructions. [more]
Growing AI security frameworks and skills demand: AI adoption is rapidly outpacing security and governance across organizations, resulting in a surge of costly AI-related breaches and the rise of “shadow AI,” where untrained employees inadvertently leak sensitive data through unauthorized AI tools. Reports from Tenable, EY, IBM, and the Cloud Security Alliance reveal that most companies lack proper AI governance, access controls, and user training, leaving critical systems exposed. This growing risk has elevated AI governance to a boardroom priority, with more Fortune 100 boards integrating AI oversight and expertise into their risk management frameworks. In response, new frameworks like the CSA’s AI Controls Matrix aim to standardize responsible AI deployment, while cybersecurity professionals with AI-specific skills are seeing increased demand and higher salaries as organizations scramble to secure their rapidly evolving AI ecosystems. [more]
Google will integrate to deliver personalised AI experience - knowing everything about you: Google is developing a more personalised “AI Mode” for Search that will eventually integrate with services like Gmail, Drive, Calendar, and Maps to deliver highly customized results. As explained by Google’s Robby Stein, the goal is to let users opt into an experience where the AI can use personal data (such as emails, documents, and travel details) to provide tailored help, like summarizing flight info or planning schedules. [more]

