Over the weekend, Andrej Karpathy, the influential former Tesla AI lead and OpenAI founding member who coined the term "vibe coding," posted on X about his new open source project, autoresearch. It wasn't a finished model or a massive corporate product: by his own admission, it was a simple, 630-line script, made available on GitHub under a permissive, enterprise-friendly MIT License. But the ambition was massive: automating the scientific method with AI agents while we humans sleep. "The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement," he stated on X.

The system functions as an autonomous optimization loop. An AI agent is given a training script and a fixed compute budget (typically 5 minutes on a GPU). It reads its own source code, forms a hypothesis for improvement (such as changing a learning rate or an architecture depth), modifies the code, runs the experiment, and evaluates the results. If the validation loss, measured in bits per byte (val_bpb), improves, it keeps the change; if not, it reverts and tries again. In one overnight run, Karpathy's agent completed 126 experiments, driving loss down from 0.9979 to 0.9697.

Today, Karpathy reported that after leaving the agent to tune a "depth=12" model for two days, it had made approximately 700 autonomous changes. The agent found roughly 20 additive improvements that transferred perfectly to larger models. Stacking these changes dropped the "Time to GPT-2" metric on the leaderboard from 2.02 hours to 1.80 hours, an 11% efficiency gain on a project Karpathy believed was already well-tuned. "Seeing the agent do this entire workflow end-to-end and all by itself... is wild," Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually across two decades of work.
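The keep-or-revert loop described above can be sketched in a few lines of Python. To be clear, this is a hypothetical illustration, not Karpathy's actual 630-line script: `run_experiment`, `propose_change`, and the toy loss surface are invented stand-ins. In the real system, an LLM agent edits the training code itself and launches real GPU runs under the compute budget; here a simple random mutation and a synthetic val_bpb function play those roles.

```python
import random

def run_experiment(config):
    # Stand-in for a short training run under a fixed compute budget.
    # Returns a synthetic validation loss in bits per byte (val_bpb);
    # this toy loss surface has its minimum near lr=3e-4, depth=12.
    lr, depth = config["lr"], config["depth"]
    return 0.97 + (lr - 3e-4) ** 2 * 1e4 + (depth - 12) ** 2 * 1e-3

def propose_change(config, rng):
    # Stand-in for the agent's hypothesis step: in the real loop an LLM
    # reads the source and proposes a code edit; here we just perturb
    # one hyperparameter at random.
    new = dict(config)
    if rng.random() < 0.5:
        new["lr"] *= rng.choice([0.5, 2.0])
    else:
        new["depth"] += rng.choice([-1, 1])
    return new

def autoresearch_loop(config, steps=50, seed=0):
    # Greedy keep-or-revert loop: try a change, rerun the experiment,
    # keep it only if val_bpb improves, otherwise revert and try again.
    rng = random.Random(seed)
    best_bpb = run_experiment(config)
    history = [best_bpb]
    for _ in range(steps):
        candidate = propose_change(config, rng)
        bpb = run_experiment(candidate)
        if bpb < best_bpb:
            config, best_bpb = candidate, bpb   # improvement: keep it
        # else: candidate is discarded, i.e. the change is reverted
        history.append(best_bpb)
    return config, best_bpb, history

cfg, bpb, hist = autoresearch_loop({"lr": 1e-3, "depth": 8})
```

The essential property, visible even in this sketch, is that the recorded loss can only go down over time: every rejected hypothesis costs compute but never degrades the working configuration, which is what makes it safe to leave the loop running overnight.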
This is more than just a productivity hack; it is a fundamental shift in how intelligence is...