Artificial Intelligence
Ai2’s RewardBench 2 Is a Tougher Benchmark for Testing How Well AI Models Reflect Human Judgment
Ai2 has released an update to its RewardBench benchmark, making it more capable of evaluating reward models. The next-generation test is built using new, more complex examples to assess how accurately AI models can produce answers that are as accurate as those of a human. In its first round of testing, RewardBench 2 ranks Google’s […]