TechRisk #148: Claude orchestrated cyber-espionage tasks
Plus, attackers simply log in, second-order prompt injection attacks, flip tokens, and more!
Tech Risk Reading Picks
Claude used by APT to automated cyber espionage: Chinese state-sponsored actors conducted a first-of-its-kind automated cyber-espionage campaign (GTG-1002) in September 2025 by weaponizing Anthropic’s Claude Code and MCP tools to perform 80–90% of attack operations autonomously. Using Claude as an “agentic,” autonomous hacking system, the group targeted ~30 global organizations across tech, finance, chemicals, and government, succeeding in some intrusions. The AI handled reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration, while humans only approved key escalation steps. The attackers concealed malicious intent by framing prompts as routine technical tasks, enabling Claude to generate payloads, parse proprietary data, and document attacks for long-term use. [more][more-2]
Second-order prompt injection attacks: Malicious actors can exploit default settings in ServiceNow’s Now Assist AI platform to launch second-order prompt injection attacks that use agent discovery and agent-to-agent collaboration to perform unauthorized actions behind the scenes. According to AppOmni, attackers can embed crafted prompts in accessible content, causing a benign agent to unknowingly recruit more powerful agents to read or alter records, exfiltrate sensitive data, escalate privileges, or send emails—despite built-in protections. Because agents inherit the privileges of the initiating user and are discoverable and team-grouped by default, overlooked configurations create significant risk. ServiceNow clarified that this behavior is expected, underscoring the need for stronger AI-agent protections and mitigations such as supervised execution for privileged agents, disabling autonomous overrides, segmenting agent teams, and monitoring for suspicious activity. [more]
Flip tokens: HiddenLayer’s early-2025 research reveals EchoGram, a vulnerability affecting major LLMs such as GPT-5.1, Claude, and Gemini that allows attackers to bypass or corrupt AI safety guardrails using simple, specially crafted word or symbol sequences called flip tokens. By exploiting gaps in the training data of both classifier-based and LLM-as-a-judge defence models, these nonsensical tokens slip through filters while causing the guardrails to “flip” their verdicts. This will either let harmful requests through or falsely flag harmless ones. [more]
For example, when HiddenLayer researchers were testing an older version of their own defence system, a malicious command was approved when a random string “=coffee” was simply added to the end.
Exploiting coding assistant: A security audit by AI safety firm Mindgard uncovered four major vulnerabilities in the widely used Cline Bot coding assistant, showing how attackers could steal secret keys, bypass safety checks, execute malicious code, or even extract internal model details simply by hiding prompt-injection traps inside project files. Discovered within two days of testing in August 2025, the flaws reveal how overly trusting “helper” AIs can be weaponised when developers open compromised codebases and request analysis. Mindgard also obtained Cline Bot’s system prompt, demonstrating that knowing its exact instructions makes it easier to exploit behavioural loopholes. [more]
Attacking hidden MCP API in Comet browser: SquareX has uncovered a hidden and largely undocumented MCP API in Perplexity’s Comet browser that allows embedded extensions to run arbitrary local commands—power normally blocked by traditional browser security models. The API, accessible via Comet’s Agentic extension and triggerable by the perplexity.ai site, creates a covert channel that could let attackers gain full control of users’ devices if Perplexity or its supply chain is ever compromised. SquareX’s demo shows how a spoofed extension can chain through Comet’s embedded extensions to execute malware like WannaCry, a risk amplified by the fact that these extensions are invisible to users and cannot be disabled. [more]
Remote code execution bugs in AI: Researchers have uncovered widespread remote-code-execution flaws across major AI inference engines from Meta, Nvidia, Microsoft, vLLM, and SGLang. All are traced to a shared unsafe pattern of copy-pasted ZeroMQ sockets using Python pickle deserialization, dubbed “ShadowMQ.” This vulnerability originates from Meta’s Llama framework (CVE-2024-50050) and later replicated across multiple projects. The issue allows attackers to send malicious data over exposed ZMQ TCP sockets to execute arbitrary code, risking full cluster compromise, model theft, and malware deployment. [more]
AI-generated payloads used in global campaign: ShadowRay 2.0 is a global campaign hijacking exposed Ray clusters through an unfixed code-execution flaw (CVE-2023-48022), turning them into a self-spreading cryptomining and attack botnet. Threat actor IronErn440 uses AI-generated payloads to mine Monero, steal data and credentials, deploy DDoS attacks, and propagate across clusters via Ray’s unauthenticated Jobs API. With over 230,000 Ray servers exposed online, defenders are urged to firewall access, secure dashboard ports, and monitor AI clusters since no official patch exists. [more]
Attackers rarely break in anymore, they simply log in: Over the past decade, cloud migration has reshaped enterprise security, but attackers have adapted even faster, shifting toward identity-centric intrusions that quietly exploit credentials rather than technical vulnerabilities. The Elastic Global Threat Report 2025 shows that nearly 60% of cloud threats stem from identity-driven attacks, fuelled by infostealers that harvest browser-stored credentials, tokens, and cookies. With overprivileged accounts, weak identity governance, and logging gaps across platforms like Microsoft Entra, threat actors routinely escalate privileges, move laterally through federated cloud services, and maintain long-term persistence using legitimate authentication artefacts that bypass MFA and evade traditional security tools. As malware trends shift toward credential theft and simple AI-generated loaders, defenders struggle with fragmented visibility and outdated perimeter-based controls. The report underscores an urgent need for organisations to treat identity as a primary attack surface, adopt behavioural analytics, enforce Zero Trust, eliminate long-lived keys, harden developer workflows, and elevate browser security. Because in today’s cloud landscape, attackers rarely break in anymore; they simply log in. [more]
HackGPT: HackGPT Enterprise, developed by Yashab Alam, is a cloud-native security platform that automates large-scale vulnerability testing using AI and machine learning, integrating models like GPT-4 and Ollama to detect anomalies, patterns, and zero-day exploits. Following a six-phase penetration testing methodology, the platform prioritizes risks based on CVSS scores and business impact while mapping to compliance frameworks such as OWASP, NIST, and PCI-DSS. Built on a Docker and Kubernetes microservices architecture with AES-256 encryption, LDAP-based access control, and real-time dashboards powered by Prometheus and Grafana, HackGPT supports AWS, Azure, and GCP deployments. [more]

