Canyon Crest Guide Newspaper Ads Canyon Crest CA

Artificial intelligence start-up Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as leading tech groups including Microsoft and Meta race to find ways to protect against the dangers posed by the cutting-edge technology.

In a paper released on Monday, the San Francisco-based start-up outlined a new system called “constitutional classifiers”. It is a model that acts as a protective layer on top of large language models, such as the one that powers Anthropic’s Claude chatbot, and can monitor both inputs and outputs for harmful content.
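As a rough illustration of what a protective layer that checks both inputs and outputs might look like, here is a minimal Python sketch. It is not Anthropic's implementation: the names `looks_harmful`, `guarded_reply`, `BLOCKED_PHRASES` and `REFUSAL` are hypothetical, and a simple phrase match stands in for what would in practice be a trained classifier.

```python
from typing import Callable

# Assumption: a toy phrase list stands in for a trained classifier.
BLOCKED_PHRASES = ("chemical weapon",)

# Assumption: a fixed refusal message returned when either check fails.
REFUSAL = "Sorry, I can't help with that."


def looks_harmful(text: str) -> bool:
    """Hypothetical classifier: flag text containing any blocked phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model so that both its input and its output are screened."""
    # Input check: refuse before the underlying model is ever called.
    if looks_harmful(prompt):
        return REFUSAL
    reply = model(prompt)
    # Output check: refuse if the generated text itself is flagged.
    if looks_harmful(reply):
        return REFUSAL
    return reply
```

The wrapper takes the underlying model as a plain function, so the same two checks could in principle sit in front of any chatbot without changing the model itself.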

The development by Anthropic, which is in talks to raise $2bn at a $60bn valuation, comes amid growing industry concern over “jailbreaking” — attempts to manipulate AI models into generating illegal or dangerous information, such as producing instructions to build chemical weapons.

Other companies are also racing to deploy measures to protect against the practice, in moves that could help them avoid regulatory scrutiny while convincing businesses to adopt AI models safely. Microsoft introduced “prompt shields” last March, while Meta introduced a prompt guard model in July last year. Researchers swiftly found ways to bypass Meta’s model, though those flaws have since been fixed.

Mrinank Sharma, a member of technical staff at Anthropic, said: “The main motivation behind the work was for severe chemical [weapon] stuff [but] the real advantage of the method is its ability to respond quickly and adapt.”

Anthropic said it would not be immediately using the system on its current Claude models but would consider implementing it if riskier models were released in future. Sharma added: “The big takeaway from this work is that we think this is a tractable problem.”

The start-up’s proposed solution is built on a so-called “constitution” of rules that define what is permitted and restricted and can be adapted to capture different types of material.

Some jailbreak attempts are well known, such as using unusual capitalisation in the prompt or asking the model to adopt the persona of a grandmother telling a bedtime story about a nefarious topic.

To validate the system’s effectiveness, Anthropic offered “bug bounties” of up to $15,000 to individuals who attempted to bypass the security measures. These testers, known as red teamers, spent more than 3,000 hours trying to break through the defences.

Anthropic’s Claude 3.5 Sonnet model rejected more than 95 per cent of the attempts with the classifiers in place, compared to 14 per cent without safeguards.

Leading tech companies are trying to reduce the misuse of their models while still maintaining their helpfulness. Often, when moderation measures are put in place, models can become overly cautious and reject benign requests, as happened with early versions of Google’s Gemini image generator and Meta’s Llama 2. Anthropic said its classifiers caused “only a 0.38 per cent absolute increase in refusal rates”.

However, adding these protections also incurs extra costs for companies already paying huge sums for computing power required to train and run models. Anthropic said the classifier would amount to a nearly 24 per cent increase in “inference overhead”, the costs of running the models.
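To put the overhead figure in concrete terms, the short Python sketch below applies it to a running cost. The function name `cost_with_classifier` and the example bill are hypothetical, and the "nearly 24 per cent" figure is rounded to 0.24 for illustration.

```python
# Assumption: "nearly 24 per cent" is approximated as 0.24.
INFERENCE_OVERHEAD = 0.24


def cost_with_classifier(base_cost: float, overhead: float = INFERENCE_OVERHEAD) -> float:
    """Return the inference cost once the classifier overhead is added."""
    return base_cost * (1 + overhead)


# Hypothetical example: a bill of 100 units of currency before the classifier.
print(f"{cost_with_classifier(100.0):.2f}")
```

On these assumptions, every $100 spent on running the model would rise to roughly $124 with the classifier in place.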

[Bar chart: effectiveness of Anthropic’s classifiers in tests conducted on its latest model]

Security experts have argued that the accessible nature of such generative chatbots has enabled ordinary people with no prior knowledge to attempt to extract dangerous information.

“In 2016, the threat actor we would have in mind was a really powerful nation-state adversary,” said Ram Shankar Siva Kumar, who leads the AI red team at Microsoft. “Now literally one of my threat actors is a teenager with a potty mouth.”

