🎁 Welcome to The Byte 2.0!
The Byte is The AI Collective’s leading publication, where thought leaders share the incredible discoveries they’ve made and the projects they are working on.

In this week’s edition of The Byte, Di Jin, Co-Founder of Eigen AI, explores how AI agents mature through experience rather than accumulation. He introduces Hierarchical Cognitive Caching: a memory system that helps agents distill messy work traces into reusable skills, enabling stronger long-horizon performance on real-world ML engineering tasks.
For more information, or to get in touch with The Byte editorial team, send us a message at [email protected].
~ Josh Evans, Managing Editor
Global Leader in Machine Learning Coding Agents: ML-Master 2.0 Tops OpenAI’s MLE-bench
The Eigen AI team has officially unveiled ML-Master 2.0, an autonomous AI agent for machine learning engineering. Powered by the open-source DeepSeek model, the agent has achieved a new State-of-the-Art (SOTA) on OpenAI’s MLE-bench, surpassing established benchmarks from Google, Meta, and Microsoft. A waiting list is now open on the SciMaster platform for researchers worldwide.
From Science Fiction to Reality
Humanity has long envisioned agents capable of autonomous exploration—from the “Sophons” in The Three-Body Problem interfering with fundamental physics, to “HAL” in 2001: A Space Odyssey, and Asimov’s reasoning robots. The core question remains:
What happens when intelligent agents evolve from mere tools into autonomous engineers capable of long-term hypothesis testing and self-correction?
As Large Language Model (LLM) capabilities advance, this is moving from imagination to a technical imperative. Researchers now recognize that the true milestone is not whether an AI can “solve a puzzle,” but whether it can distill signal from noise while iteratively searching for optimal solutions in long-term research.
Google DeepMind’s AlphaEvolve seeks to refine strategies through long-term evolution.
OpenAI’s Frontier Science focuses on iterative performance in real scientific tasks.
The Genesis Mission (often referred to as the “AI Manhattan Project”) aims to integrate AI into national-level scientific frameworks systematically.
These diverse paths converge on a single consensus: AI that truly drives progress must withstand long-term trial and error in real research environments. This has accelerated the AI4AI (AI for AI) movement, where the focus is on AI driving its own growth to support increasingly complex scientific tasks.
The Challenge of Machine Learning Engineering
OpenAI’s MLE-bench targets Machine Learning Engineering (MLE) precisely because it mirrors the reality of research. Unlike idealized Q&A, real MLE involves cycles of experimental design, coding, debugging, and analysis that can last dozens of hours. This makes MLE-bench one of the few benchmarks capable of reflecting an AI’s capacity for long-term scientific evolution.
ML-Master 2.0, developed by a joint team from the School of Artificial Intelligence at Shanghai Jiao Tong University (SJTU), the Shanghai Institute for Algorithms and Innovation, and DP Technology, was built specifically for this mission. Supported by Eigen AI’s high-performance infrastructure and built on the DeepSeek-V3.2-Speciale open-source model, the agent achieved the global #1 spot on MLE-bench, outperforming agents developed by teams at Google, Meta, and Microsoft.
More importantly, ML-Master 2.0 is already being utilized in laboratories for frontier applications, including Embodied AI training and Theoretical Physics simulations.

Building for Ultra-Long-Horizon Autonomy: Hierarchical Cognitive Caching (HCC)
Real-world research is rarely about getting it right on the first try; it is a cycle of hypothesis, failure, and revision. ML-Master 2.0 is therefore designed around Ultra-Long-Horizon Autonomy, emphasizing three key capabilities: persistence, learning from failure, and knowledge transfer.
To manage these long-term tasks without “context explosion,” ML-Master 2.0 introduces Hierarchical Cognitive Caching (HCC). This treats context as a living asset rather than disposable data:
Experience: Immediate execution paths for current decisions.
Knowledge: Stable conclusions verified through repeated testing.
Wisdom: High-level strategies and cognitive prototypes that are reusable across different tasks.

By filtering and promoting valuable insights through these layers, the system maintains a stable research rhythm, effectively managing the “memory” of the scientific process.
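The three-layer promotion idea can be sketched in code. This is a minimal, hypothetical illustration, not the ML-Master implementation: the class names, promotion thresholds, and validation counters below are all assumptions chosen to show how insights might flow from raw experience to verified knowledge to cross-task wisdom.

```python
from dataclasses import dataclass

# Hypothetical sketch of a three-layer cognitive cache. Names and
# thresholds are illustrative, not taken from the ML-Master codebase.
@dataclass
class Insight:
    content: str
    validations: int = 0  # how many experiments have confirmed this insight

class HierarchicalCache:
    """Experience -> Knowledge -> Wisdom promotion pipeline."""
    KNOWLEDGE_THRESHOLD = 3  # assumed: promote after 3 experimental confirmations
    WISDOM_THRESHOLD = 2     # assumed: promote after reuse in 2 distinct tasks

    def __init__(self):
        self.experience: list[Insight] = []  # immediate execution traces
        self.knowledge: list[Insight] = []   # repeatedly verified conclusions
        self.wisdom: list[Insight] = []      # strategies reusable across tasks

    def record(self, content: str) -> Insight:
        """Log a raw observation from the current run."""
        item = Insight(content)
        self.experience.append(item)
        return item

    def confirm(self, item: Insight) -> None:
        """An experiment re-validated the insight; promote if stable."""
        item.validations += 1
        if item in self.experience and item.validations >= self.KNOWLEDGE_THRESHOLD:
            self.experience.remove(item)
            self.knowledge.append(item)

    def promote_to_wisdom(self, item: Insight, tasks_reused: int) -> None:
        """Knowledge that transfers across tasks becomes wisdom."""
        if item in self.knowledge and tasks_reused >= self.WISDOM_THRESHOLD:
            self.knowledge.remove(item)
            self.wisdom.append(item)
```

The point of the sketch is the filtering direction: everything enters as cheap, disposable experience, and only repeatedly validated items earn a place in the compact layers the agent actually reasons over.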
The results on MLE-bench—a 56.44% medal rate, marking a 28.3% improvement over previous leading models—demonstrate that treating cognitive processes as evolvable resources is the key to autonomous discovery.
Our Hopes and Takeaways
If you’re building agents for coding, research, ops, or anything where tasks sprawl, here are three design lessons we think travel well:
1) Make retrospectives a first-class primitive
Don’t only “think longer.” Pause, distill, and promote.
2) Separate “what happened” from “what matters”
Keep raw traces accessible, but operate on compact, validated knowledge most of the time.
3) Optimize for transfer
The best memory is the one that makes the next task cheaper.
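The three lessons above can be combined into a single loop shape. The sketch below is a hypothetical illustration of that pattern, not ML-Master’s code: `act`, `execute`, `distill`, and `validate` are placeholder hooks an agent framework would supply.

```python
# Hypothetical agent loop illustrating the three design lessons.
# The task object's methods (act, execute, distill, validate) are
# assumed placeholders, not a real API.
def research_loop(task, steps=100, retro_every=10):
    raw_trace = []       # lesson 2: keep "what happened" fully accessible...
    working_memory = []  # ...but decide from compact, validated knowledge

    for step in range(steps):
        action = task.act(working_memory)       # operate on distilled context only
        raw_trace.append(task.execute(action))  # full record retained for audits

        # Lesson 1: retrospection is a first-class primitive, run on a schedule,
        # not an afterthought. Pause, distill the recent trace, promote what holds.
        if (step + 1) % retro_every == 0:
            candidates = task.distill(raw_trace[-retro_every:])
            working_memory += [c for c in candidates if task.validate(c)]

    # Lesson 3: what survives is exactly what should make the next task cheaper.
    return working_memory
```

The design choice worth noting is that the raw trace only grows append-only and is never fed back into decisions directly; the agent’s context stays bounded because only the distilled, validated list participates in each step.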
ML-Master 2.0 shows that long-horizon machine learning engineering is no longer a “prompting” problem—it’s an autonomy problem: persist, learn from failure, and transfer knowledge across iterations. By turning context into an evolvable asset via Hierarchical Cognitive Caching (HCC), the agent sustains 10+ hour research loops without losing the plot, translating directly into SOTA performance on OpenAI’s MLE-bench.
As AI4AI accelerates, systems like ML-Master 2.0 offer a concrete blueprint for agents that don’t just answer once, but improve through doing.
The core code for ML-Master is currently open-sourced on GitHub, which you can access using the button below.
Thanks for reading The Byte!
The Byte is The AI Collective’s insight series highlighting non-obvious AI trends and the people uncovering them, curated by Josh Evans and Noah Frank. Questions or pitches: [email protected].
➡️ Before You Go
Partner With Us
Launching a new product or hosting an event? Put your work in front of our global audience of builders, founders, and operators — we feature select products and announcements that offer real value to our readers.
👉 To be featured or sponsor a placement, reach out to our team.
The AI Collective is a community of volunteers, made for volunteers. All proceeds directly fund future initiatives that benefit this community.
Stay Connected
💬 Slack: AI Collective
🧑‍💼 LinkedIn: The AI Collective
𝕏 Twitter / X: @_AI_Collective
Get Involved
About the Authors
About Di Jin
Di Jin is a co-founder of Eigen AI and an accomplished NLP researcher with over 60 top-conference papers and 5,000+ citations, currently working on human-aligned training and long-horizon intelligence for large language models.