autoresearch.studio

Turn any ML training script into an autonomous research loop. An AI agent modifies your code, runs experiments, keeps what improves the metric, discards what doesn't — and repeats, while you sleep.

Get started
terminal
# Install and init — one command creates everything
$ pip install autoresearchstudio
$ ars init
Creating project files:
  train.py
  prepare.py
  CLAUDE.md
Connecting to dashboard...
  API key: ars_bd02b4cc...
  Dashboard: https://autoresearch.studio/?key=ars_bd02...

# Launch Claude Code — it handles everything
$ claude
> configure and start the experiments

Claude: Setting up run...
  ars setup --tag mar23
  ars run -d "baseline"
  val_accuracy: 0.9506
  KEEP — baseline

Claude: Trying dropout + cosine lr...
  ars run -d "add dropout + cosine lr"
  val_accuracy: 0.9720
  KEEP — 0.9506 → 0.9720

Claude: Testing gelu activation...
  ars run -d "replace relu with gelu"
  val_accuracy: 0.9680
  DISCARD — regressed, reverted

Claude: Continuing to next experiment...

The loop

The agent repeats this cycle autonomously until your model converges.

Modify — AI edits the code
Train — runs the experiment
Judge — LLM evaluates the result
Keep — or discard
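The keep-or-discard decision at the end of each cycle can be sketched in a few lines. This is illustrative pseudologic, not the actual agent implementation; the function name and signature are assumptions:

```python
def keep_or_discard(best_score, new_score, direction="maximize"):
    """Return True if the new experiment beats the current best (illustrative)."""
    if direction == "maximize":
        return new_score > best_score
    return new_score < best_score

# Mirrors the transcript above:
print(keep_or_discard(0.9506, 0.9720))  # dropout + cosine lr: True (keep)
print(keep_or_discard(0.9720, 0.9680))  # gelu: False (discard)
```

The `direction` argument matches the `metric.direction` field in the config, so the same rule works for loss-style metrics you want minimized.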

One command, full project

ars init generates everything. Customize for your task, or run the MNIST default as-is.

🧪 train.py
# ── Model (modify this) ──────────
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)

# ── Hyperparams (modify these) ───
LEARNING_RATE = 1e-3
BATCH_SIZE = 128

# ── Output (keep this format) ────
print(f"val_accuracy: {acc:.6f}")
The agent modifies this file. Every change is a git commit — keeps advance the branch, discards are reverted.
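The commit-per-experiment bookkeeping can be as simple as the following shell sketch (illustrative only, run in a throwaway repo; these are not the tool's actual commands):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q . && git config user.email ars@example.com && git config user.name ars
echo "LEARNING_RATE = 1e-3" > train.py
git add train.py && git commit -qm "baseline"                # KEEP: advances the branch
echo "LEARNING_RATE = 3e-4" > train.py
git add train.py && git commit -qm "experiment: lower lr"    # candidate change
git revert -n HEAD && git commit -qm "revert: regressed"     # DISCARD: revert the commit
cat train.py                                                 # back to the baseline value
```

Keeps simply leave their commit in place; discards are undone with a revert, so the full experiment history stays inspectable in the log.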
🔒 prepare.py
# Read-only during experiments
TIME_BUDGET = 20  # seconds

def download_data():
    # downloads MNIST (or your data)
    ...

def load_data():
    # returns (train_X, train_y,
    #          test_X, test_y)
    ...

def evaluate_accuracy(model, X, y):
    # fixed eval — agent can't game it
    ...
Data loading + evaluation harness. Read-only during experiments so the agent can't game the metric.
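A fixed evaluator like `evaluate_accuracy` can be very small. A minimal sketch, assuming numpy arrays and a model exposing a `predict()` method (your harness may differ; `ConstantModel` is a toy stand-in invented here for demonstration):

```python
import numpy as np

def evaluate_accuracy(model, X, y):
    """Fraction of correct predictions on a held-out set (illustrative)."""
    preds = model.predict(X)  # assumption: model exposes predict()
    return float(np.mean(preds == y))

class ConstantModel:
    """Toy stand-in that always predicts class 0."""
    def predict(self, X):
        return np.zeros(len(X), dtype=int)

y = np.array([0, 0, 1, 0])
acc = evaluate_accuracy(ConstantModel(), np.zeros((4, 2)), y)
print(f"val_accuracy: {acc:.6f}")  # val_accuracy: 0.750000
```

Because this function lives in the read-only file, the agent can optimize the metric only by improving the model, not by editing the yardstick.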
βš™οΈ autoresearch.yaml
project:
  name: my-autoresearch
  goal: Get the highest val_accuracy.

files:
  editable: [train.py]
  readonly: [prepare.py]

experiment:
  run_command: python train.py
  timeout: 60

metric:
  name: val_accuracy
  pattern: "^val_accuracy:\\s+([\\d.]+)"
  direction: maximize
Config: metric, timeout, files. API key and dashboard URL are set automatically by ars init.
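The metric is scraped from the run's stdout using the `pattern` field above. A minimal Python sketch of that extraction, using the exact regex from the config (the surrounding variable names are assumptions):

```python
import re

# Same pattern as metric.pattern in autoresearch.yaml
PATTERN = r"^val_accuracy:\s+([\d.]+)"

stdout = "epoch 5/5\nval_accuracy: 0.972000\n"
match = re.search(PATTERN, stdout, flags=re.MULTILINE)
score = float(match.group(1)) if match else None
print(score)  # 0.972
```

This is why train.py must keep the `print(f"val_accuracy: {acc:.6f}")` line intact: if the output format drifts, the regex finds no match and the run cannot be scored.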
😴💤

Now go to sleep.
Claude will take it from here.