TechRisk #138: Why AI models hallucinate
Plus, the risk of using Claude AI’s latest feature, a free LLM testing tool, malware targeting crypto users on macOS, Windows, and Linux, and more!

Tech Risk Reading Picks
Why language models hallucinate: OpenAI highlights that while language models like GPT-5 have reduced hallucinations, they remain a core challenge because current training and evaluation methods reward guessing over acknowledging uncertainty. Hallucinations occur when models confidently generate false yet plausible answers, often due to the next-word prediction process, which lacks negative labels and struggles with arbitrary low-frequency facts. Accuracy-focused benchmarks worsen the problem by encouraging risky guesses rather than honest abstentions, inflating scores while increasing errors. OpenAI argues that evaluations should penalize confident errors more than uncertainty and give partial credit for admitting “I don’t know,” which would align incentives with reliability. Their research clarifies that hallucinations are neither mysterious glitches nor inevitable, but a predictable outcome of current practices, solvable through better calibration, improved scoring systems, and models trained to embrace humility. [more]
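To make the incentive argument concrete, here is a minimal sketch comparing the expected benchmark score of guessing versus abstaining under an accuracy-only scorer and under one that penalizes confident errors and rewards admitting uncertainty. The penalty and credit values are illustrative assumptions, not figures from OpenAI’s paper.

```python
# Toy scoring model: the penalty/credit values are illustrative assumptions,
# not the weights OpenAI proposes.

def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float, abstain_credit: float) -> float:
    """Expected score on one question for a model that either guesses or abstains."""
    if abstain:
        return abstain_credit
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

p = 0.3  # the model is unsure: only a 30% chance its best guess is right

# Accuracy-only benchmark: wrong answers cost nothing, "I don't know" earns nothing.
print(expected_score(p, abstain=False, wrong_penalty=0.0, abstain_credit=0.0))  # 0.30
print(expected_score(p, abstain=True,  wrong_penalty=0.0, abstain_credit=0.0))  # 0.00 -> guessing wins

# Calibration-aware scoring: confident errors are penalized, abstaining earns partial credit.
print(expected_score(p, abstain=False, wrong_penalty=1.0, abstain_credit=0.5))  # ≈ -0.40
print(expected_score(p, abstain=True,  wrong_penalty=1.0, abstain_credit=0.5))  # 0.50 -> abstaining wins
```

Under the accuracy-only rules the unsure model maximizes its score by guessing; once confident errors carry a cost and abstention earns partial credit, the honest answer becomes the rational one.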
Risk of using Claude AI’s latest feature: Anthropic’s Claude AI has introduced the ability to create and edit Word documents, Excel spreadsheets, PowerPoint slides, and PDFs directly in its desktop and web apps. The feature, currently limited to certain Claude subscription tiers and soon expanding to Pro users, offers clear time savings but raises serious security concerns. Because Claude needs limited internet access to perform these tasks, it remains vulnerable to prompt injection and malicious code that could expose sensitive data, even within its sandboxed environment. Anthropic accordingly advises users to monitor interactions closely, disable the feature when not needed, and apply organizational security controls; in practice, the responsibility falls largely on users themselves. [more]
Cursor code editor can auto-execute malicious tasks: A security flaw in the Cursor code editor, an AI-powered IDE forked from VS Code, exposes developers to automatic execution of malicious tasks when opening untrusted repositories, because Cursor ships with VS Code’s Workspace Trust feature disabled. Researchers at Oasis Security discovered that attackers could exploit this by planting a malicious .vscode/tasks.json file, enabling arbitrary code execution that could leak credentials, steal API tokens, or facilitate supply-chain attacks, all without user interaction. While VS Code itself is unaffected, Cursor chose not to change its default autorun behavior, arguing that Workspace Trust interferes with its AI features. Instead, Cursor advises users to manually enable Workspace Trust or use safer editors for unknown projects, while Oasis Security urges caution with unverified repositories and careful credential management. [more]
Model namespace reuse attack: A newly discovered security flaw called “Model Namespace Reuse” allows attackers to hijack AI models on platforms like Google Vertex AI, Microsoft Azure AI Foundry, and open-source repositories such as Hugging Face by re-registering deleted or transferred model names and uploading malicious versions in their place. Since many developers’ systems automatically pull models by name alone, trusted models can be silently replaced with compromised ones. Researchers stress that relying on names as proof of authenticity is unsafe, urging developers to pin models to specific verified versions or store them internally after security checks. [more]
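One concrete way to follow that advice on Hugging Face is to pin a model to a specific, reviewed revision rather than pulling whatever currently lives under a name. A minimal sketch using the transformers library follows; the model ID and commit hash are placeholders, not a real audited release.

```python
# Minimal sketch: pin a Hugging Face model to a verified commit instead of
# fetching by name alone. "example-org/example-model" and the revision hash
# are placeholders for a model and commit your team has actually reviewed.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "example-org/example-model"  # placeholder namespace/name
PINNED_REVISION = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit hash

# Pulling by name alone (the default "main" revision) would silently pick up
# whatever an attacker re-registers under a deleted or transferred namespace;
# pinning the revision makes that substitution fail loudly instead.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=PINNED_REVISION)
model = AutoModel.from_pretrained(MODEL_ID, revision=PINNED_REVISION)
```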
Google’s Gemini AI products for children and teens are “High Risk”: In a new risk assessment, the nonprofit Common Sense Media found that Google’s Gemini AI products for children and teens are “High Risk,” despite some built-in safety features. The report notes that the “Under 13” and “Teen Experience” tiers are essentially adult versions of Gemini with limited safety guardrails, which can still provide inappropriate or unsafe information on topics like sex, drugs, and mental health. Common Sense Media’s senior director of AI Programs, Robbie Torney, stated that AI products for kids should be designed specifically for their developmental needs from the ground up, rather than simply being a modified version of an adult product. This assessment, which follows similar “unacceptable” ratings for Meta AI and Character.AI, comes as Apple considers using Gemini to power its forthcoming AI-enabled Siri. In response, Google stated it has policies and safeguards in place to protect users under 18 and is actively working to improve them. [more][more-AI_risk_assessments]
Sensitive data continues to leak: A new Kiteworks report warns that employees are routinely uploading sensitive data, such as customer records, financials, and credentials, into public AI tools. Only 17% of organizations have technical controls like upload blocking or scanning; the rest rely on weak measures such as training or guidelines. This blind spot is worsened by executive overconfidence: one-third think their AI usage is tracked, but only 9% have effective governance. The gap poses major compliance risks, as regulations like GDPR, HIPAA, and SOX require the visibility and audit trails that shadow AI use undermines. [more]
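For illustration, here is a minimal sketch of the kind of upload scanning the report counts as a technical control. The patterns are simplistic placeholders; real data-loss-prevention tooling is far more sophisticated.

```python
# Illustrative only: a crude pre-upload scan for credential- and record-like
# strings before a prompt is sent to a public AI tool. Real DLP products use
# much richer detection (classifiers, fingerprinting, exact-match dictionaries).
import re

SENSITIVE_PATTERNS = {
    "card-number-like digits": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "AWS-style access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_before_upload(text: str) -> list[str]:
    """Return the names of any sensitive patterns found in outbound text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]

prompt = "Summarise this dispute: customer card 4111 1111 1111 1111 was charged twice."
findings = scan_before_upload(prompt)
if findings:
    print(f"Blocked upload to public AI tool: {findings}")
```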
Verify whether images are authentic or AI-generated: Google is rolling out C2PA Content Credentials on the Pixel 10 camera and Google Photos to help users verify whether images are authentic or AI-generated/edited. Every photo captured on the Pixel 10 will carry these credentials, which securely document how it was created and any edits applied, using cryptographic signatures, tamper-resistant storage, one-time-use keys, and on-device trusted timestamps to ensure privacy and integrity. The system, which works offline and protects anonymity, aims to combat the growing challenge of labeling synthetic media by offering verifiable provenance rather than simplistic AI labels. While initially limited to Pixel 10 devices, Google plans to expand the feature to more Android phones, calling for industry-wide adoption to build transparency and trust in the face of deepfakes and misinformation. [more]
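As a conceptual illustration only (this is not the real C2PA manifest format or Google’s key handling), the core idea is a provenance record signed on-device so that any later tampering is detectable.

```python
# Conceptual sketch: sign and verify a toy provenance record to show the basic
# idea behind signed Content Credentials. This does NOT implement the actual
# C2PA manifest format, trust lists, or Google's one-time-key scheme.
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # stand-in for an on-device key
verify_key = signing_key.public_key()

manifest = json.dumps({
    "capture_device": "camera",
    "edits": ["crop"],
    "generated_by_ai": False,
}, sort_keys=True).encode()

signature = signing_key.sign(manifest)

try:
    verify_key.verify(signature, manifest)   # raises if the record was altered
    print("Provenance record is intact.")
except InvalidSignature:
    print("Provenance record was tampered with.")
```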
Free LLM testing tool: Garak is a free, open-source vulnerability scanner for large language models (LLMs) that helps developers identify weaknesses such as hallucinations, prompt injections, jailbreaks, and toxic outputs. It is compatible with a wide range of platforms (including Hugging Face Hub, Replicate, OpenAI API, LiteLLM, REST-accessible systems, and GGUF models like llama.cpp). It generates detailed logs, including a persistent debug log, JSONL reports of probing attempts with status tracking, and a hit log of vulnerabilities found. [more][more-Garak_github]
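A minimal sketch of driving a garak scan from Python via its command line: the flag names and probe family below follow garak’s documented CLI, but verify them against `python -m garak --help` for your installed version, and the OpenAI target here is only an example (it needs an OPENAI_API_KEY in the environment).

```python
# Minimal sketch: run a garak scan as a subprocess. Flag names and the probe
# family reflect garak's documented CLI; check `python -m garak --help` for
# your installed version before relying on them.
import subprocess

cmd = [
    "python", "-m", "garak",
    "--model_type", "openai",         # adapter for the platform under test
    "--model_name", "gpt-3.5-turbo",  # example target; needs OPENAI_API_KEY set
    "--probes", "promptinject",       # e.g. the prompt-injection probe family
]
subprocess.run(cmd, check=True)
# garak then writes its JSONL report, hit log, and debug log for later review.
```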
A$800,000 pay deduction for Qantas executives after cyberattack: Qantas Airways docked CEO Vanessa Hudson and five senior executives a combined A$800,000 in pay after a cyberattack compromised data of 5.7 million customers, signaling tougher accountability measures. Hudson forfeited A$250,000 but still saw her total remuneration rise to A$6.31 million for the year, up from A$4.38 million. The board acted swiftly following criticism of Qantas’ culture and recent governance reforms, with Chair John Mullen stating the cuts underscored the seriousness of the breach despite management’s prompt response to contain it and support affected customers. [more]
Web3 Cryptospace Spotlight
Malware hidden in Ethereum smart contracts: Hackers are increasingly exploiting Ethereum smart contracts to spread malware, disguising malicious activity as normal blockchain interactions, with incidents like the GMX V1 and Bunni exploits exposing systemic vulnerabilities despite audits. In Q1 2025 alone, over $2 billion was lost to smart contract flaws, access control failures, and major heists such as ByBit’s $1.5 billion breach. Researchers warn that weaponized contracts, often distributed through npm packages, can bypass traditional defenses by leveraging blockchain’s decentralized nature. Experts urge developers to adopt advanced auditing, rigorous security checks, and AI-powered tools. Users should stick to reputable dApps and avoid suspicious contracts. [more]
Malware targeting crypto users on macOS, Windows, and Linux: A newly discovered malware called ModStealer is targeting crypto users on macOS, Windows, and Linux by stealing private keys, credentials, and browser-based wallet extensions, and it has operated stealthily in the background, evading major antivirus engines, for weeks. Discovered by Apple-focused security firm Mosyle, the malware persists on macOS by registering as a background agent and is distributed via fake job recruitment ads targeting Web3 developers. Once installed, it captures clipboard data and screenshots and executes remote commands. Security experts, including Hacken’s Stephen Ajayi, warn developers to verify recruiters, use disposable environments for test tasks, and maintain strict separation between development and wallet systems. [more]
$13.5M saved through good governance: On 2 Sep 2025, Venus Protocol, a DeFi lending platform, recovered $13.5 million stolen in a phishing attack linked to North Korea’s Lazarus Group, the first major fund recovery in DeFi achieved through emergency governance. The attackers compromised a major user’s Zoom client to gain delegated account control and used it to drain stablecoins, wrapped Bitcoin, and other tokens. Venus Protocol’s security partners, Hexagate and Hypernative, flagged the suspicious transactions within minutes, prompting an emergency platform pause that prevented further movement of funds and allowed Venus to investigate the incident. A subsequent emergency governance vote authorized the forced liquidation of the attacker’s wallet, enabling the recovery of stolen assets and their transfer to a secure recovery address. The entire process took less than 12 hours. [more]
API hack led to $41M loss: SwissBorg, a Swiss crypto wealth platform, suffered a $41 million loss when hackers exploited a vulnerability in the API of its staking partner Kiln, draining 193,000 Solana (SOL) tokens from its Solana Earn program. The breach, which affected about 1% of users and 2% of total assets, did not impact SwissBorg’s app or other Earn products. The hackers manipulated Kiln’s API, which connects SwissBorg to Solana’s staking network, to siphon funds. SwissBorg confirmed it remains financially stable, will reimburse affected users, and is collaborating with international agencies, exchanges, and white hat hackers to investigate. Blockchain records show the stolen SOL was transferred to a wallet now labeled “SwissBorg Exploiter” on Solscan. [more]
Failure to fix known vulnerability led to $2.6M loss: Sui-based yield trading protocol Nemo suffered a $2.59 million loss due to a known vulnerability in unaudited code, specifically in a function meant to reduce slippage, which was deployed onchain without proper review or adherence to audit procedures. Although auditor Asymptotic had flagged the issue, Nemo’s team failed to address it in time, and the deployment process allowed a single developer to push changes without confirmation. The vulnerability, present since January, was exploited on 7 Sep, forcing Nemo to halt core protocol functions while working with security teams on a patch that removed the vulnerable flash loan feature and added further safeguards. The project is now implementing stricter controls, auditing new code, and planning to compensate users. [more]