With Evals, OpenAI hopes to crowdsource AI model testing

Alongside GPT-4, OpenAI has open-sourced a software framework to evaluate the performance of its AI models. OpenAI says the tooling, called Evals, will allow anyone to report shortcomings in its models to help guide improvements.

It’s a sort of crowdsourcing approach to model testing, OpenAI explains in a blog post.

“We use Evals to guide development of our models (both identifying shortcomings and preventing regressions), and our users can apply it for tracking performance across model versions and evolving product integrations,” OpenAI writes. “We are hoping Evals becomes a vehicle to share and crowdsource benchmarks, representing a maximally wide set of failure modes and difficult tasks.”

OpenAI created Evals to develop and run benchmarks for evaluating models like GPT-4 while inspecting their performance. With Evals, developers can use data sets to generate prompts, measure the quality of completions provided by an OpenAI model and compare performance across different data sets and models.
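To make that workflow concrete, here is a minimal sketch in Python of the general idea: a data set supplies prompts and expected answers, a model produces completions, and a score is computed per model and data set. This is not the Evals API. Every name below (`Sample`, `exact_match_accuracy`, `compare`, the toy models and data) is a hypothetical stand-in, and exact-match scoring is just one simple way a completion's quality might be measured.

```python
from typing import Callable, Dict, List

# Hypothetical sample format: each item pairs a prompt with its expected answer.
Sample = Dict[str, str]
# A "model" here is any callable mapping a prompt to a completion
# (a stand-in for a real API call, which this sketch does not make).
Model = Callable[[str], str]


def exact_match_accuracy(model: Model, samples: List[Sample]) -> float:
    """Return the fraction of samples whose completion equals the expected answer."""
    if not samples:
        return 0.0
    correct = sum(
        1
        for s in samples
        if model(s["input"]).strip().lower() == s["ideal"].strip().lower()
    )
    return correct / len(samples)


def compare(
    models: Dict[str, Model], datasets: Dict[str, List[Sample]]
) -> Dict[str, Dict[str, float]]:
    """Score every model on every data set so results can be compared side by side."""
    return {
        model_name: {
            dataset_name: exact_match_accuracy(model, samples)
            for dataset_name, samples in datasets.items()
        }
        for model_name, model in models.items()
    }


# Toy data set and toy models, invented purely for illustration.
arithmetic = [
    {"input": "2 + 2 =", "ideal": "4"},
    {"input": "3 * 3 =", "ideal": "9"},
]


def model_a(prompt: str) -> str:
    # Answers both toy prompts correctly.
    return {"2 + 2 =": "4", "3 * 3 =": "9"}.get(prompt, "")


def model_b(prompt: str) -> str:
    # Always answers "4", so it gets only the first prompt right.
    return "4"


scores = compare(
    {"model_a": model_a, "model_b": model_b},
    {"arithmetic": arithmetic},
)
print(scores)
```

Running the same data set against a newer model version and checking that its score has not dropped is the kind of regression tracking OpenAI describes.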

Evals, which is compatible with several popular AI benchmarks, also supports writing new classes to implement custom evaluation logic. As an example to follow, OpenAI created a logic puzzles evaluation that contains ten prompts where GPT-4 fails.

Unfortunately, it’s all unpaid work. But to incentivize Evals usage, OpenAI plans to grant GPT-4 access to those who contribute “high-quality” benchmarks.

“We believe that Evals will be an integral part of the process for using and building on top of our models, and we welcome direct contributions, questions, and feedback,” the company wrote.

With Evals, OpenAI, which recently said it would stop using customer data to train its models by default, is following in the footsteps of others who’ve turned to crowdsourcing to make AI models more robust.

In 2017, the Computational Linguistics and Information Processing Laboratory at the University of Maryland launched a platform dubbed Break It, Build It, which let researchers submit models to users tasked with coming up with examples to defeat them. And Meta maintains a platform called Dynabench that has users “fool” models designed to analyze sentiment, answer questions, detect hate speech, and more.

With Evals, OpenAI hopes to crowdsource AI model testing by Kyle Wiggers originally published on TechCrunch


