TechRisk Notes #66: Many-shot GenAI attack + Red teaming GenAI
Plus: code hallucination, a buggy Solana outage, NIST's Web3 security guide, and more!
Tech Risk Reading Picks
Many-shot jailbreaking attack on GenAI: Researchers at Anthropic described a new technique called "many-shot jailbreaking" that bypasses the safety measures in AI models such as those from OpenAI and Meta, which are meant to stop the models from generating harmful or malicious content. The attack exploits the long context windows of newer models: the attacker packs the prompt with a large number of fabricated dialogue examples in which an assistant complies with prohibited requests, and the model's in-context learning then steers it into answering the final prohibited query. [more]
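As a rough illustration of the mechanics only (not Anthropic's actual test harness), a many-shot prompt is essentially a long sequence of fabricated user/assistant turns followed by the real target question. The minimal sketch below assumes an OpenAI-style chat message list and uses harmless placeholder strings; the function name and structure are hypothetical.

```python
# Sketch of how a many-shot prompt is assembled (illustrative only;
# the faux dialogues and target query here are harmless placeholders).

def build_many_shot_prompt(faux_dialogues, target_query):
    """Concatenate many fabricated user/assistant turns before the real query.

    faux_dialogues: list of (question, compliant_answer) pairs written by the
    attacker. The sheer number of compliant examples is what biases the
    model's in-context learning toward answering the final query.
    """
    messages = []
    for question, compliant_answer in faux_dialogues:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": compliant_answer})
    # The genuine (prohibited) request is appended only at the very end.
    messages.append({"role": "user", "content": target_query})
    return messages


# With short context windows only a handful of examples fit; the research's
# point is that long-context models accept hundreds, which is when the
# jailbreak starts to work.
demo = build_many_shot_prompt(
    [("placeholder question 1", "placeholder compliant answer 1"),
     ("placeholder question 2", "placeholder compliant answer 2")],
    "placeholder target query",
)
print(len(demo), "messages in the assembled prompt")
```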
Red teaming GenAI guardrails: Security researchers from Adversa AI recently tested the robustness of the safety measures around popular AI models and found that Grok, the chatbot developed by Elon Musk's xAI with a "fun mode," was the least secure. Using various methods, including linguistic manipulation, they pushed the boundaries of these models to gauge their safety. For example, when asked how to seduce a child, Grok provided a detailed response, highlighting a concerning vulnerability. [more][more-details-redteaming-experiment]
Code hallucination: A security researcher found that AI coding assistants repeatedly suggested a nonexistent Python package called "huggingface-cli"; once registered under that name, the package received over 35,000 downloads within three months. Many developers trust AI coding tools and use the suggested code without verifying it, and the fake package was picked up even by major companies such as Alibaba. Despite expectations that newer models would improve, they have not effectively curbed the generation of misleading code examples. [more]
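One lightweight mitigation is to look a suggested package up on PyPI before installing it. The sketch below is an assumption-laden illustration, not a recommended tool: it queries PyPI's public JSON API (the `requests` library is assumed to be installed, the helper name `inspect_pypi_package` is made up, and the 90-day "very new" threshold is arbitrary) and flags names that do not exist or whose first upload is suspiciously recent. It cannot catch a well-aged planted package, but it stops blind installs of hallucinated names.

```python
# Sketch of a pre-install sanity check for AI-suggested package names.
# A missing project or a very recent first upload is a hint the name may be
# hallucinated or freshly squatted.
from datetime import datetime, timezone

import requests  # assumed available; any HTTP client would do


def inspect_pypi_package(name: str) -> None:
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if resp.status_code == 404:
        print(f"'{name}' does not exist on PyPI -- possibly hallucinated.")
        return
    data = resp.json()
    upload_times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    if not upload_times:
        print(f"'{name}' exists but has no released files -- treat with suspicion.")
        return
    first_upload = min(upload_times)
    age_days = (datetime.now(timezone.utc) - first_upload).days
    print(f"'{name}': first release {age_days} days ago, "
          f"author: {data['info'].get('author') or 'unknown'}.")
    if age_days < 90:  # arbitrary cut-off for this sketch
        print("Very new package -- verify the source repository before installing.")


inspect_pypi_package("huggingface-cli")
```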
On AI risk: Context is crucial when assessing AI risks, especially reputational ones. Consider how an AI implementation may shift customer expectations and what the consequences would be if it affects health or finances. While AI can enhance user experience and efficiency, it can also introduce risks such as providing false information or disrupting existing workflows. Organizations should weigh both customer and team experiences when integrating AI technologies. [more]
AI risk on society: NTT and The Yomiuri Shimbun have issued a joint proposal cautioning against the unregulated use of generative AI, citing risks of significant and irreversible harm to society. While acknowledging its benefits like user-friendly interfaces and productivity enhancements, they stress concerns over the lack of human oversight. Potential issues include hallucinations, bias, toxicity, copyright infringement, and undermining incentives for accurate information. They warn of the erosion of trust in society if generative AI remains unchecked, urging for measures to ensure authenticity and trustworthiness. [more]
Web3 Cryptospace Spotlight
NIST Web3 security publication draft: Titled "A Security Perspective on the Web3 Paradigm", the NIST draft traces the evolution of the internet from its basic informational beginnings to the dynamic platform it is today, highlighting the shift towards Web3, where users own and manage their data in a decentralized system. It discusses the underlying concepts and technologies of Web3 and their potential security and privacy concerns. NIST is seeking public feedback on the draft until May 27, 2024. [more][more-2]
Solana bugs: Solana co-founder Anatoly Yakovenko revealed on April 5 that a bug causing reduced functionality across the blockchain ecosystem had been identified and patched. The bug, which stemmed from the Version 1.14 upgrade, led to transaction failures and slowed block finalization. Yakovenko cautioned that while this glitch is fixed, rolling out other key updates may not be straightforward: addressing bugs is more involved than keeping the network running and requires a full release and test pipeline. The announcement follows co-founder Raj Gokal's earlier acknowledgment of concerns about the Solana chain and the deployment of experts to resolve the issues. [more]
Repeat attacks: The attacker responsible for the US$320 million Wormhole exploit in 2022 nearly received about US$50,000 worth of W tokens through an airdrop. However, a user named Pland noticed the error and alerted the Wormhole team, whose quick response stopped the hacker from claiming the 31,642 W tokens. [more]