Turn any ML training script into an autonomous research loop. An AI agent modifies your code, runs experiments, keeps what improves the metric, discards what doesn't — and repeats, while you sleep.
```shell
# Install and init - one command creates everything
$ pip install autoresearchstudio
$ ars init

Creating project files: train.py  prepare.py  CLAUDE.md
Connecting to dashboard...
API key:   ars_bd02b4cc...
Dashboard: https://autoresearch.studio/?key=ars_bd02...

# Launch Claude Code - it handles everything
$ claude
> configure and start the experiments

Claude: Setting up run...
  ars setup --tag mar23
  ars run -d "baseline"
  val_accuracy: 0.9506
  KEEP - baseline

Claude: Trying dropout + cosine lr...
  ars run -d "add dropout + cosine lr"
  val_accuracy: 0.9720
  KEEP - 0.9506 → 0.9720

Claude: Testing gelu activation...
  ars run -d "replace relu with gelu"
  val_accuracy: 0.9680
  DISCARD - regressed, reverted

Claude: Continuing to next experiment...
```
The agent repeats this cycle autonomously until your model converges.
`ars init` generates everything. Customize for your task, or run the MNIST default as-is.
```python
# ── Model (modify this) ──────────
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

# ── Hyperparams (modify these) ───
LEARNING_RATE = 1e-3
BATCH_SIZE = 128

# ── Output (keep this format) ────
print(f"val_accuracy: {acc:.6f}")
```
```python
# Read-only during experiments
TIME_BUDGET = 20  # seconds

def download_data():
    # downloads MNIST (or your data)
    ...

def load_data():
    # returns (train_X, train_y,
    #          test_X, test_y)
    ...

def evaluate_accuracy(model, X, y):
    # fixed eval - agent can't game it
    ...
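A fixed evaluator like `evaluate_accuracy` typically just compares argmax predictions against labels. A minimal sketch of what such a function might contain, assuming PyTorch tensors; the generated `prepare.py` defines the real one:

```python
import torch

def evaluate_accuracy(model, X, y):
    """Fraction of correct argmax predictions. Illustrative sketch only -
    the actual evaluator lives in the read-only prepare.py."""
    model.eval()
    with torch.no_grad():
        preds = model(X).argmax(dim=1)
    return (preds == y).float().mean().item()
```

Because this file is read-only during experiments, the agent can change how the model trains but never how it is scored.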
```yaml
project:
  name: my-autoresearch
  goal: Get the highest val_accuracy.

files:
  editable: [train.py]
  readonly: [prepare.py]

experiment:
  run_command: python train.py
  timeout: 60

metric:
  name: val_accuracy
  pattern: "^val_accuracy:\\s+([\\d.]+)"
  direction: maximize
```
Run `ars init`. Now go to sleep.
Claude will take it from here.