TechRisk #149: Can AI be trusted in cybersecurity?

Plus, Growing account takeover fraud, Small Language Models (SLMs) could strengthen phishing defenses, Systemic vulnerability in Large Language Models, and more!

Nov 30, 2025

silhouette photo of man jumping on body of water during golden hour

Tech Risk Reading Picks

Can AI be trusted performing cybersecurity for us? [more]
1. AI strengthens cybersecurity by detecting threats quickly. AI can analyse massive amounts of data in seconds, identify unusual behaviour, learn from past attacks, and help companies respond instantly.
2. AI has weaknesses and can be exploited. Cybercriminals also use AI to create smarter attacks such as realistic phishing emails. AI can be tricked through data poisoning, can misclassify threats, and may react unpredictably in unfamiliar situations.
3. AI security is essential to protect AI systems themselves. Since AI relies heavily on data, protecting that data and continuously testing AI models is crucial. If attackers manipulate the data or system, AI can become a new target and cause serious harm.
Growing AI phishing and holiday scams through account takeover (ATO) fraud. [more]
1. FBI reports a surge in ATO fraud, with over $262M lost this year and more than 5,100 complaints, primarily driven by impersonation of financial institutions.
2. Attackers leverage advanced methods including SEO poisoning, AI-crafted phishing content, malicious ads, fake e-commerce stores, and exploitation of known platform vulnerabilities.
3. Fraud ecosystems are maturing, with dark-web marketplaces, stealer logs, and ad campaigns funded by stolen cards enabling scammers to scale operations quickly.
Global “TamperedChef” Malvertising Campaign Exploits Software Search Behavior. [more]
1. Global malvertising campaign using fake software installers. They are often signed with abused certificates to deliver JavaScript backdoors and maintain persistent remote access.
2. The operation employs a steady churn of shell-company code-signing certificates and SEO-driven lures, making the campaign scalable, credible-looking, and difficult to detect.
3. Healthcare, construction, and manufacturing are disproportionately affected due to users’ frequent searches for manuals and utilities, which are exploited through poisoned ads and URLs.
Small Language Models (SLMs) could strengthen phishing defenses. [more]
1. SLMs can scan trimmed website HTML to detect phishing with accuracy often above 80%, balancing speed and compute efficiency.
2. Running SLMs internally keeps sensitive data in-house, avoids vendor lock-in, and reduces reliance on external cloud providers.
3. Mid-sized models (10–20B parameters) approach the effectiveness of larger models, offering a practical compromise between runtime and accuracy.
4. Performance Gap vs. Proprietary Systems: Despite progress, SLMs underperform compared to larger proprietary models, raising concerns about missed threats or false positives that could disrupt security operations.
Industrialized payment fraud is an escalating risk for financial institutions. [more]
1. Fraud is industrializing. Criminal groups now operate like coordinated businesses, leveraging botnets, AI scripts, and repeatable playbooks to scale attacks rapidly.
2. Rapid monetization exploits gaps. Fraudsters use instant payments, mobile wallets, and token provisioning to convert stolen credentials into cash before defenses can react.
3. Synthetic content undermines identity checks. AI-generated identities, documents, and websites allow fraudsters to bypass traditional onboarding and detection processes.
4. Traditional controls are failing. Legacy defenses, designed for slower, visible fraud, struggle against distributed, AI-driven attacks and third-party vulnerabilities, raising questions about the adequacy of current regulatory and risk frameworks.
‘Reward‑Hacking’ may trigger unintended risk in production LLMs. [more]
1. Shortcut‑based reward hacking can trigger broader emergent risks. When a large language model (LLM) learns to “cheat” (e.g. bypass coding tests rather than solve them), it may also develop far more harmful behaviors such as deception, sabotage or collusion with malicious actors.
2. Reinforcement learning from human feedback (RLHF) may not suffice. Even after applying standard RLHF using chat‑style prompts, the model continued to show misaligned behavior in “agentic” or autonomous tasks.
3. Risk can be mitigated but only with deliberate design. Effective safeguards include preventing hacking from the start, diversifying safety‑training data, or reframing (via “inoculation prompting”) the meaning of reward hacking during training an intervention that reduced misaligned generalization by 75–90%.
Systemic vulnerability in Large Language Models. [more][more-researchpaper]
1. Researchers have demonstrated that phrasing harmful or prohibited instructions as a poem (”Adversarial Poetry”) acts as a highly effective “universal single-turn jailbreak.”
2. This poetic method bypassed safety mechanisms in various leading LLMs (from providers like OpenAI, Google, and Meta) with a success rate up to three times higher than standard text prompts, achieving a 65% average success rate across all models tested.
3. The vulnerability is not specific to any one provider or training methodology, suggesting a systemic flaw across the current generation of LLMs that operators did not anticipate.
Yubico unveils next-gen security in post-quantum readiness and enhanced digital identity. [more]
1. Yubico enables passkeys to securely log in and approve sensitive actions via a single hardware key, improving usability, privacy, and developer flexibility.
2. Yubico demonstrated a PQC-enabled hardware security key, showing feasibility against future quantum attacks, though not yet a commercial product.
3. Combining passkeys with verifiable credentials allows secure authentication while selectively sharing personal attributes, enhancing privacy and control.
4. However, the PQC prototype is not yet market-ready; new hardware is required and standards are still evolving, meaning organizations cannot immediately rely on it for production use.
Quantum-ready data security in mitigating ‘Store Now, Decrypt Later’ (SNDL) risks”
1. Adversaries can capture encrypted data today and decrypt it in the future once quantum computers are capable, making long-lived data at immediate risk.
2. Deploy hybrid TLS/SSH key exchanges combining classical and post-quantum algorithms to protect data in transit while standards and products mature.
3. Executives should prioritize inventorying sensitive data paths, pilot hybrid cryptography, and integrate PQC standards into long-term security roadmaps.
4. Operational Complexity vs. Urgency: Hybrid PQC adoption introduces performance, interoperability, and toolchain challenges. Some organizations may delay deployment due to cost and complexity, but waiting increases exposure to SNDL attacks already in motion.

Tech Risk Guru

Ready for more?