Two subtle ways agents can skew the benchmark results without it counting as cheating or gaming are a) implementing a form of caching, so the benchmark tests are no longer independent, and b) launching benchmarks in parallel on the same system. I eventually added AGENTS.md rules to try to prevent both. ↩︎
Overall, I’m very sad at the state of agentic discourse but also very excited at its promise: it’s currently unclear which one is the stronger emotion.