With Evals, OpenAI hopes to crowdsource AI model testing

Alongside GPT-4, OpenAI has open-sourced a software framework to evaluate the performance of its AI models. OpenAI says the tooling, called Evals, will allow anyone to report shortcomings in its models to help guide improvements.

It’s a sort of crowdsourcing approach to model testing, OpenAI explains in a blog post.

“We use Evals to guide development of our models (both identifying shortcomings and preventing regressions), and our users can apply it for tracking performance across model versions and evolving product integrations,” OpenAI writes. “We are hoping Evals becomes a vehicle to share and crowdsource benchmarks, representing a maximally wide set of failure modes and difficult tasks.”

OpenAI created Evals to develop and run benchmarks for evaluating models like GPT-4 while inspecting their performance. With Evals, developers can use data sets to generate prompts, measure the quality of completions provided by an OpenAI model and compare performance across different data sets and models.
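To make that workflow concrete, here is a minimal sketch of the general idea: score completions against expected answers, then compare scores across models and data sets. It is not the Evals API. The sample format, the function names (`run_eval`, `compare`) and the two stand-in "models" are all assumptions made up for illustration, and the models are plain Python functions rather than calls to an OpenAI model.

```python
from typing import Callable, Dict, List

# Hypothetical sample format: a prompt ("input") plus the expected answer ("ideal").
Sample = Dict[str, str]
# A "model" here is any function that maps a prompt to a completion.
Model = Callable[[str], str]


def run_eval(model: Model, samples: List[Sample]) -> float:
    """Return the fraction of samples whose completion exactly matches the ideal answer."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = 0
    for sample in samples:
        completion = model(sample["input"]).strip()
        if completion == sample["ideal"]:
            correct += 1
    return correct / len(samples)


def compare(
    models: Dict[str, Model], datasets: Dict[str, List[Sample]]
) -> Dict[str, Dict[str, float]]:
    """Score every model on every data set."""
    return {
        model_name: {
            dataset_name: run_eval(model, samples)
            for dataset_name, samples in datasets.items()
        }
        for model_name, model in models.items()
    }


# Stand-in models (assumptions, not real model calls).
def always_four(prompt: str) -> str:
    return "4"


def lookup_model(prompt: str) -> str:
    answers = {"2+2": "4", "3+3": "6", "capital of France": "Paris"}
    return answers.get(prompt, "")


ARITHMETIC = [{"input": "2+2", "ideal": "4"}, {"input": "3+3", "ideal": "6"}]
GEOGRAPHY = [{"input": "capital of France", "ideal": "Paris"}]

if __name__ == "__main__":
    results = compare(
        {"always_four": always_four, "lookup_model": lookup_model},
        {"arithmetic": ARITHMETIC, "geography": GEOGRAPHY},
    )
    for model_name, scores in results.items():
        print(model_name, scores)
```

Exact-match scoring is the simplest possible choice here; a real benchmark would likely need a more forgiving comparison for free-form answers.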

Evals, which is compatible with several popular AI benchmarks, also supports writing new classes to implement custom evaluation logic. As an example to follow, OpenAI created a logic puzzles evaluation that contains ten prompts where GPT-4 fails.
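What "custom evaluation logic" might look like can be sketched with a small class hierarchy, where each subclass decides what counts as a correct completion. The class names below (`Evaluation`, `ExactMatch`, `ContainsAnswer`) are hypothetical and are not taken from the Evals codebase; the sketch only illustrates the pattern of swapping in a different grading rule.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Tuple


class Evaluation(ABC):
    """Hypothetical base class: subclasses define what counts as a correct completion."""

    @abstractmethod
    def is_correct(self, completion: str, ideal: str) -> bool:
        """Return True if the completion should be graded as correct."""

    def accuracy(self, pairs: Iterable[Tuple[str, str]]) -> float:
        """Fraction of (completion, ideal) pairs graded as correct."""
        pairs = list(pairs)
        if not pairs:
            raise ValueError("pairs must not be empty")
        return sum(self.is_correct(c, i) for c, i in pairs) / len(pairs)


class ExactMatch(Evaluation):
    """Correct only if the completion equals the ideal answer, ignoring outer whitespace."""

    def is_correct(self, completion: str, ideal: str) -> bool:
        return completion.strip() == ideal.strip()


class ContainsAnswer(Evaluation):
    """Correct if the ideal answer appears anywhere in the completion, ignoring case."""

    def is_correct(self, completion: str, ideal: str) -> bool:
        return ideal.strip().lower() in completion.lower()


# Made-up (completion, ideal) pairs for illustration.
PAIRS = [
    ("The answer is 42.", "42"),
    ("42", "42"),
    ("I don't know", "42"),
]

if __name__ == "__main__":
    print("exact match:", ExactMatch().accuracy(PAIRS))
    print("contains:   ", ContainsAnswer().accuracy(PAIRS))
```

The same completions score differently under the two rules, which is the point of letting contributors write their own grading logic.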

Unfortunately, it’s all unpaid work. But to incentivize Evals usage, OpenAI plans to grant GPT-4 access to those who contribute “high-quality” benchmarks.

“We believe that Evals will be an integral part of the process for using and building on top of our models, and we welcome direct contributions, questions, and feedback,” the company wrote.

With Evals, OpenAI, which recently said it would stop using customer data to train its models by default, is following in the footsteps of others who’ve turned to crowdsourcing to make AI models more robust.

In 2017, the Computational Linguistics and Information Processing Laboratory at the University of Maryland launched a platform dubbed Break It, Build It, which let researchers submit models to users tasked with coming up with examples to defeat them. And Meta maintains a platform called Dynabench that has users “fool” models designed to analyze sentiment, answer questions, detect hate speech, and more.

With Evals, OpenAI hopes to crowdsource AI model testing by Kyle Wiggers originally published on TechCrunch


