Write a program.md that describes your research goal and point an AI agent at it.
It modifies your code, trains, evaluates, keeps or discards the change, and repeats while you sleep.
    # Install and init your project
    $ pip install autoresearchstudio
    $ ars init
    Created autoresearch.yaml
    Created program.md

    # Generate a dashboard API key
    $ ars key
    ✓ API key saved to autoresearch.yaml

    # Launch Claude Code; it takes over from here
    $ claude
    ╭──────────────────────────────────────╮
    │ Claude Code                          │
    ╰──────────────────────────────────────╯
    > Hi Claude, read program.md and start the experiment.

    Claude: Setting up run...
      ars setup --tag v1
      ars run -d "baseline"
      accuracy: 95.40
      ars judge
      KEEP: baseline accepted

    Claude: Trying dropout + cosine lr schedule...
      ars run -d "add dropout + cosine lr"
      accuracy: 97.20
      ars judge
      KEEP: 95.40 → 97.20

    Claude: Testing gelu activation...
      ars run -d "replace relu with gelu"
      accuracy: 96.80
      ars judge
      DISCARD: regressed, reverted

    Claude: Continuing to next experiment...
The agent repeats this cycle autonomously until your model converges.
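The keep-or-discard cycle can be pictured as a greedy search over candidate changes. The following is only a minimal sketch of that idea, not the tool's actual implementation: the function `greedy_search` and its arguments are hypothetical names, and the three scores are hard-coded from the session shown earlier in place of real training runs.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Candidate = Tuple[str, Callable[[], float]]  # (description, function returning a score)


def greedy_search(
    candidates: Iterable[Candidate],
) -> Tuple[Optional[float], List[Tuple[str, float, str]]]:
    """Keep a change only if its score beats the best score so far.

    Hypothetical helper: assumes a metric that should be maximized.
    """
    best: Optional[float] = None
    log: List[Tuple[str, float, str]] = []
    for description, run in candidates:
        score = run()  # stands in for "apply the change, train, evaluate"
        if best is None or score > best:
            best = score  # accept: this becomes the new reference point
            log.append((description, score, "KEEP"))
        else:
            log.append((description, score, "DISCARD"))  # reject: the change is reverted
    return best, log


# Scores copied from the example session; real runs would train a model here.
best, log = greedy_search(
    [
        ("baseline", lambda: 95.40),
        ("add dropout + cosine lr", lambda: 97.20),
        ("replace relu with gelu", lambda: 96.80),
    ]
)
for description, score, verdict in log:
    print(f"{verdict:<8} {score:6.2f}  {description}")
```

The first run is always kept because there is nothing to compare it against, which mirrors how the baseline is accepted in the session.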
You write program.md and autoresearch.yaml. The agent edits train.py.
program.md:

    You are an ML researcher optimizing a neural network for image classification.

    Goal: maximize accuracy on the test set.

    Strategy:
    - Try architectural changes first
    - Then tune hyperparameters
    - Prefer simpler solutions
    - Keep changes small and focused
autoresearch.yaml:

    project:
      name: mnist-research
      goal: Maximize accuracy
    files:
      editable: [train.py]
      readonly: [prepare.py]
    experiment:
      run_command: python train.py
      timeout: 120
    metric:
      name: accuracy
      pattern: "Accuracy: ([0-9.]+)"
      direction: maximize
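To see how the `pattern` and `direction` settings fit together, here is a minimal sketch of applying that regular expression to a run's output and comparing the result against a previous best. The helper names `extract_metric` and `is_improvement` are hypothetical, and taking the last match in the output is an assumption; the tool itself may behave differently.

```python
import re
from typing import Optional

# Pattern copied from the metric section of autoresearch.yaml.
PATTERN = r"Accuracy: ([0-9.]+)"


def extract_metric(output: str, pattern: str = PATTERN) -> Optional[float]:
    """Return the first capture group of the last match as a float, or None.

    Hypothetical helper: using the last match is an assumption.
    """
    matches = re.findall(pattern, output)
    return float(matches[-1]) if matches else None


def is_improvement(new: float, best: float, direction: str = "maximize") -> bool:
    """Compare two scores according to the configured direction."""
    return new > best if direction == "maximize" else new < best


# Made-up sample output for illustration; only the last line matches the pattern.
sample_output = "epoch 1 done\nAccuracy: 97.20\n"
score = extract_metric(sample_output)
if score is not None:
    print(score, is_improvement(score, 95.40))
```

This is why the training script has to print a line in exactly the form the pattern expects: if nothing matches, there is no score to judge.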
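train.py:

    import torch
    import torch.nn as nn


    class Net(nn.Module):
        """Two-layer fully connected classifier for flattened 28x28 images."""

        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(784, 128)  # 784 inputs = 28 * 28 pixels
            self.fc2 = nn.Linear(128, 10)   # 10 output classes

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            return self.fc2(x)


    # training loop ...

    # The printed line must match the metric pattern in autoresearch.yaml.
    print(f"Accuracy: {acc:.2f}")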
Now go to sleep.
Claude will take it from here.