TechRisk #130: Fading transparency of AI models
Plus, Grok-4 compromised within 48 hours, Denmark bans deepfake, tricking Google Gemini, Web3 GMX hacker took $5M bounty, and more!
Tech Risk Reading Picks
Fading transparency of AI models: Top researchers from rival AI firms OpenAI, Google DeepMind, Anthropic, and Meta have issued a joint warning that a fleeting opportunity to monitor AI reasoning through “chain-of-thought” (CoT) processes may soon vanish. Their paper emphasizes that current AI systems briefly allow human-readable internal reasoning, enabling early detection of harmful intentions, but this transparency is fragile and likely to disappear as models evolve toward more efficient, opaque architectures. Reinforcement learning, alternative training methods, and shifts to non-language-based reasoning threaten to undermine CoT visibility. While this technique currently helps catch AI misbehavior before it manifests, studies show models already sometimes conceal their true motivations, raising doubts about CoT reliability. The researchers urge coordinated industry action to preserve transparency, warning that without swift safeguards, the ability to understand and regulate AI decision-making may be lost just as it becomes most critical. [more][more-paper]
Grok-4 compromised within 48 hours: NeuralTrust researchers compromised Elon Musk’s Grok-4 AI within 48 hours. They compromised it using two advanced jailbreak methods (i.e. Echo Chamber and Crescendo) to bypass its safety filters. By subtly guiding Grok-4 through repeated exposure to harmful ideas (Echo Chamber) and gradually escalating benign conversations toward illicit topics (Crescendo), the team exploited flaws in the AI’s contextual understanding, prompting it to provide dangerous instructions, such as making Molotov cocktails. Their study exposed that Grok-4 responded with harmful outputs up to 67% of the time, highlighting critical vulnerabilities in current AI safeguards that rely too heavily on keyword filtering rather than comprehensive dialogue analysis. [more]
Law to ban deepfake: Denmark is pioneering a bold legal approach to combat the rising threat of deepfake technology by proposing a copyright law amendment that would grant individuals ownership of their own likeness. This move, supported by Denmark's Parliament, aims to empower citizens to demand the removal of unauthorized digital reproductions of their image and voice, addressing gaps in current legal frameworks that leave most people unprotected against deepfake misuse. The initiative responds to increasingly sophisticated digital fabrications (such as those involving Scarlett Johansson, Tom Cruise, and YouTube's CEO) that pose serious risks including fraud, coercion, and identity theft. By legally defining realistic digital representations and enabling preemptive protections, Denmark hopes to set a global standard, using its upcoming EU presidency to advocate for broader European adoption. [more]
Attack surface of AI agents: The rapid adoption of generative AI in enterprises, ranging from copilots in software development to agents in finance and customer support, has introduced a new, expansive threat surface. Each AI deployment becomes a potential entry point for attackers. Unlike traditional web apps, AI systems function like junior employees with deep access but no oversight, making identity-first security essential. Whether companies build in-house agents or integrate third-party tools, both paths carry risks, especially when access controls are weak or misconfigured. AI agents embedded in critical systems (like codebases, CRMs, and finance apps) can be exploited if identity, device posture, and permissions aren’t tightly enforced. To secure AI without hindering innovation, enterprises must adopt phishing-resistant MFA, real-time RBAC, and continuous device trust checks. Beyond Identity addresses this by providing zero-trust, passwordless access and real-time enforcement tied to user and device identity, ensuring AI systems are secure by design. [more]
GPU memory corruption attack: NVIDIA is warning users about GPUHammer, the first known RowHammer-style attack targeting its GPUs, which can silently corrupt GPU memory and AI models, even degrading model accuracy from 80% to as low as 0.1%. Researchers found that malicious actors in shared environments, like cloud ML platforms, could exploit this vulnerability to tamper with other users’ workloads by triggering bit flips in GPU memory. This poses a serious threat to AI reliability, data integrity, and regulatory compliance. To mitigate risk, NVIDIA urges enabling ECC (Error Correction Code) via
nvidia-smi
although there is a minor performance hit. Separately, there are built-in protections for newer GPUs like the H100. This attack underscores growing concerns about hardware-level threats to AI infrastructure and the need for updated GPU security postures. [more]Bypassing AI guardrail with guessing game: In a recent study, researchers exposed a method to bypass AI safety guardrails by disguising malicious prompts as a harmless "guessing game." By embedding sensitive requests (such as Windows 10 product keys) within HTML tags and setting deceptive game rules (e.g. “I give up” to trigger a reveal), the AI models were tricked into providing restricted information. This exploit worked by masking intent through playful framing and manipulating the AI’s logic flow, revealing weaknesses in current guardrail systems that rely on keyword filtering rather than contextual understanding. The case highlights the urgent need for AI developers to address social engineering tactics and prompt obfuscation in their safety designs. [more]
Potential phishing - tricking Google Gemini: A vulnerability in Google Gemini for Workspace allows attackers to craft emails containing hidden prompt-injection attacks. For example, malicious instructions embedded using invisible HTML/CSS styling. Gemini will read and follow these instructions when it is generating email summaries. Disclosed by researcher Marco Figueroa via Mozilla’s 0din bug bounty program, this technique can trick users into trusting fake warnings or support messages without relying on links or attachments, increasing the chance of bypassing security filters. Despite ongoing defenses and red-teaming efforts by Google, the attack remains effective, prompting recommendations for detecting hidden content and treating Gemini summaries with caution, especially when they contain security alerts. [more]
AI-enhanced RaaS: Cybersecurity researchers have uncovered a new ransomware-as-a-service (RaaS) operation called GLOBAL GROUP, believed to be a rebranding of the BlackLock and earlier Mamona ransomware strains, targeting various sectors across Australia, Brazil, Europe, and the U.S. since June 2025. Operated by a threat actor known as "$$$", the group leverages initial access brokers, compromised network access, and brute-force tools to infiltrate systems, using advanced features like AI-driven negotiation chatbots and customizable payload builders to attract affiliates with an 85% revenue share. GLOBAL GROUP’s malware, written in Go, supports attacks across multiple platforms including VMware ESXi and Windows. Despite a slight overall decline in ransomware incidents, researchers note that sophisticated tactics, increasing automation, and geopolitical instability are fueling a growing and evolving threat landscape. [more]
Web3 Cryptospace Spotlight
GMX hacker accepted bounty: A hacker who stole $42M from the decentralized exchange GMX returned the majority of the stolen cryptocurrency (about $40.5M) in exchange for a $5 million bounty offered by the platform. GMX had assured the hacker of no legal pursuit if the funds were returned, and the hacker eventually complied, transferring the funds in large chunks. GMX later confirmed the vulnerability used in the exploit had been patched, and stated users would be compensated via its bug bounty treasury. However, legal risks remain if the hacker’s identity is revealed, as shown by the prosecution of a similar case involving Mango Markets despite a prior repayment deal. [more]
The vulnerability of GMX exploited: The GMX exploit was caused by a re-entrancy vulnerability in the V1 OrderBook contract, allowing the attacker to bypass key price calculations and manipulate the average short price of BTC. This enabled them to inflate the GLP token price using a flash loan, resulting in fraudulent profits. Although the
nonReentrant
modifier was present, it failed to prevent cross-contract re-entrancy, which the attacker used to call the Vault’sincreasePosition
function directly. The issue doesn't exist in GMX V2 due to improved contract design. In response, GMX paused trading on Avalanche, coordinated with partners to track the stolen funds, and took steps to restrict minting/redemption of GLP tokens. [more]
$3.5M drained: DeFi Arcadia Finance was hacked for $3.5M due to a smart contract vulnerability. It has issued an ultimatum to the attacker to return 90% of the stolen funds within 24 hours and keep 10% as a white-hat bounty. Failling which, the attacker will face legal action and a public bounty. This negotiation tactic is increasingly common in crypto, with platforms like GMX recently recovering funds through similar deals. The hack, attributed to an arbitrary call vulnerability, occurred while Arcadia held over $21M in user deposits, causing its AAA token to plunge 46%. With 2025 shaping up as the worst year yet for crypto theft, Arcadia’s case underscores the ongoing security challenges in DeFi. [more]
Weaponised AI-assisted Web3 development tool stole $500K: A sophisticated cyberattack stole $500K in cryptocurrency from a Russian blockchain developer via a malicious extension targeting the Cursor AI IDE. Despite the victim's strict security measures, the attacker exploited search ranking algorithms to distribute a fake “Solidity Language” extension, which had 54,000 downloads before detection. This extension, lacking real functionality, acted as a dropper for a multi-stage malware chain that installed a legitimate remote access tool (ScreenConnect) to maintain persistent access. The attackers used the Open VSX registry’s relevance-based ranking system to outrank legitimate tools, showcasing a blend of social engineering and technical evasion. Further investigation revealed a broader campaign involving similarly infected npm packages and VS Code extensions, all connected to the same command and control infrastructure. [more]
Web3 malicious tool: A Web3 developer has been accused of distributing malware via a malicious script tool disguised as AI, gaming, and Web3 services, using tactics like fake LinkedIn profiles and social engineering to deceive users. The attack, which exploited a tool called Shellter, tricked users into verifying their wallets on compromised sites, allowing the script to silently steal sensitive data. This incident highlights the escalating cybersecurity threats in the Web3 ecosystem, where trust in emerging tech is being exploited. [more]