Agent Eval Harness Finder
Pricing
Pay per usage
Catalogs open-source agent evaluation harnesses and benchmarks (SWE-Bench, AgentBench, ToolBench, BIRD, GAIA, MAST, WebArena). It combines GitHub search results with a curated seed list, scores each entry on quality signals (stars, recency, license), and parses each README for scope and sample model scores.
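As a rough illustration of scoring by quality signals, the sketch below combines stars, recency, and license into one number. The function name, the weights, the two-year decay window, and the license list are all assumptions made for this example; the listing does not say how the actor actually weighs these signals.

```python
import math
from datetime import datetime, timezone

# Hypothetical set of licenses treated as "good"; not taken from the listing.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-3-clause"}

def quality_score(stars, last_pushed, license_id, now=None):
    """Return an illustrative 0..1 score from stars, recency, and license."""
    now = now or datetime.now(timezone.utc)
    # Log-scale stars so very popular repos do not dominate; saturates near 10k stars.
    star_part = min(math.log10(stars + 1) / 4.0, 1.0)
    # Linear decay to zero over two years (730 days) without a push.
    age_days = max((now - last_pushed).days, 0)
    recency_part = max(0.0, 1.0 - age_days / 730.0)
    # Binary license signal: 1 for a known permissive license, else 0.
    license_part = 1.0 if (license_id or "").lower() in PERMISSIVE_LICENSES else 0.0
    # Assumed weights: 50% stars, 30% recency, 20% license.
    return 0.5 * star_part + 0.3 * recency_part + 0.2 * license_part
```

Sorting catalog entries by this score in descending order would put actively maintained, permissively licensed harnesses first; the weights are arbitrary and would need tuning against real data.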
Rating
0.0
(0)
Developer
Yanlong Mu
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 10 days ago