OpenAI and Anthropic have signed an agreement with the AI Safety Institute, housed under the National Institute of Standards and Technology (NIST), to collaborate on AI model safety research, testing and evaluation.
The agreement gives the AI Safety Institute access to new AI models from both companies, before and after their public release. It mirrors the approach of the U.K.'s AI Safety Institute, where AI developers grant access to pre-release foundation models for testing.
“With these agreements in place, we look forward to beginning our technical collaborations with Anthropic and OpenAI to advance the science of AI safety,” said AI Safety Institute Director Elizabeth Kelly in a press release. “These agreements are just the start, but they are an important milestone as we work to help responsibly steward the future of AI.”
The AI Safety Institute will also give OpenAI and Anthropic feedback “on potential safety improvements to their models, in close collaboration with its partners at the U.K. AI Safety Institute.”
Collaboration on safety
Both OpenAI and Anthropic said signing the agreement with the AI Safety Institute will move the needle on defining how the U.S. develops responsible AI rules.
“We strongly support the U.S. AI Safety Institute’s mission and look forward to working together to inform safety best practices and standards for AI models,” Jason Kwon, OpenAI’s chief strategy officer, said in an email to VentureBeat. “We believe the institute has a critical role to play in defining U.S. leadership in responsibly developing artificial intelligence and hope that our work together offers a framework that the rest of the world can build on.”
OpenAI leadership has previously voiced support for some form of regulation of AI systems, despite concerns from former employees that the company abandoned safety as a priority. OpenAI CEO Sam Altman said earlier this month that the company is committed to providing its models to government agencies for safety testing and evaluation before release.
Anthropic, which has hired some of OpenAI’s safety and superalignment team, said it sent its Claude 3.5 Sonnet model to the U.K.’s AI Safety Institute before releasing it to the public.
“Our collaboration with the U.S. AI Safety Institute leverages their wide expertise to rigorously test our models before widespread deployment,” said Anthropic co-founder and Head of Policy Jack Clark in a statement sent to VentureBeat. “This strengthens our ability to identify and mitigate risks, advancing responsible AI development. We’re proud to contribute to this vital work, setting new benchmarks for safe and trustworthy AI.”
Not yet a regulation
The U.S. AI Safety Institute at NIST was created through the Biden administration’s executive order on AI. The executive order, which is not legislation and could be overturned by the next president, called for AI model developers to submit models for safety evaluations before public release. However, it cannot punish companies for declining to do so or retroactively pull models that fail safety tests. NIST noted that providing models for safety evaluation remains voluntary but “will help advance the safe, secure and trustworthy development and use of AI.”
Through the National Telecommunications and Information Administration, the government will begin studying the impact of open-weight models — models whose weights are released to the public — on the current ecosystem. But even then, the agency admitted it cannot actively monitor all open models.
While the agreement between the U.S. AI Safety Institute and two of the biggest names in AI development points toward a path for regulating model safety, there is concern that the term “safety” remains too vague, and that the lack of clear regulations muddies the field.
Groups focused on AI safety called the agreement a “step in the right direction,” but Nicole Gill, executive director and co-founder of Accountable Tech, said AI companies must follow through on their promises.
“The more insight regulators can gain into the rapid development of AI, the better and safer the products will be,” Gill said. “NIST must ensure that OpenAI and Anthropic follow through on their commitments; both have a track record of making promises, such as the AI Election Accord, with very little action. Voluntary commitments from AI giants are only a welcome path to AI safety if they follow through on these commitments.”