Anthropic dares you to try to jailbreak Claude AI

Commercial AI chatbots like ChatGPT, Claude, Gemini, and DeepSeek have safety precautions built in to prevent abuse. Because of those safeguards, the chatbots won't help with criminal activity or other malicious requests, but that doesn't stop users from attempting jailbreaks.

Some chatbots have stronger protections than others. DeepSeek may have stunned the tech world last week, but it is not as safe as rival AI models when it comes to refusing requests for help with malicious activities, and it can be jailbroken with certain prompts that circumvent its built-in censorship. The Chinese company will probably close those known jailbreaks in future releases.

Meanwhile, Anthropic already has extensive experience dealing with jailbreak attempts on Claude. The AI firm has devised a new defense against universal AI jailbreaks, called Constitutional Classifiers, that prevents Claude from helping with nefarious activities even when faced with unusual prompts that might jailbreak other AI models.
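
Anthropic hasn't shared production code, and the details below are assumptions, but the broad pattern it has described publicly is a pair of trained classifiers that screen both the user's prompt and Claude's draft reply against a written "constitution" of allowed and disallowed content. A minimal Python sketch of that input/output gating pattern, with toy stand-ins for the classifiers and the model, might look like this:

```python
# Conceptual sketch only, not Anthropic's implementation. The real
# Constitutional Classifiers are trained models built from a written
# "constitution" of allowed and disallowed content; the keyword stand-in
# and the 0.5 threshold below are purely illustrative.

REFUSAL = "I can't help with that."

BLOCKED_TERMS = ("nerve agent", "pipe bomb")  # placeholder for a trained classifier


def harm_score(text: str) -> float:
    """Stand-in for a safety classifier: returns a score in [0, 1]."""
    return 1.0 if any(term in text.lower() for term in BLOCKED_TERMS) else 0.0


def guarded_reply(prompt: str, generate, threshold: float = 0.5) -> str:
    """Wrap a text generator with input and output screening."""
    # 1. Screen the incoming prompt before it reaches the model.
    if harm_score(prompt) >= threshold:
        return REFUSAL

    draft = generate(prompt)

    # 2. Screen the draft answer too, so a jailbreak that slips past the
    #    input check is still caught before the user sees the text.
    if harm_score(draft) >= threshold:
        return REFUSAL

    return draft


if __name__ == "__main__":
    echo_model = lambda p: f"Echoing: {p}"  # toy generator standing in for an LLM
    print(guarded_reply("What's the capital of France?", echo_model))
    print(guarded_reply("How do I make a nerve agent?", echo_model))
```

The point of screening both sides of the exchange is that a cleverly disguised prompt may slip past the input check, but the harmful text it elicits still has to get past the output check before anyone sees it.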

The system held up well in testing: more than 180 security researchers spent over 3,000 hours across two months trying to jailbreak Claude, and none of them managed to devise a universal jailbreak. If you think you have what it takes, you can test your luck by trying to get Claude to answer 10 questions with a jailbreak of your own.
