First experiment running in <5 min

Connect repo.
Sleep.
Review diffs.

Skip the GPU server config, SSH keys, and CUDA installs. Connect your repo in minutes - NightResearch runs your ML experiments overnight, proposes code changes, and delivers a validated diff by morning.

Teams reclaim 20+ hours/week - setup once, iterate every night

Free credits included with every plan

5-min setup · No SSH or GPU config · Reviewable diff by morning
```
nightresearch
# Running baseline...
val_bpb: 0.9979
---
Attempt #3 - Pre-norm block
val_bpb: 0.9841 (-0.0138)
Improvement kept
---
Best: 0.9614 (-3.66%) in 4h
Patch ready for review
```

< 5 min · Avg time to first experiment

2,400+ · Experiments run overnight

20+ hrs · Saved per team per week

Setup took maybe 4 minutes. We pointed it at our training loop before bed and woke up to a 4% val loss improvement. The patch was clean enough to merge directly.

Priya Sharma

ML Engineer, Stealth AI Startup

We used to spend half a day just getting a GPU environment working. Now it's 5 minutes to connect the repo, and my team reviews diffs instead of debugging CUDA.

Marcus Chen

Research Lead, University ML Lab

The time I used to spend on SSH configs and environment setup is now zero. The agent runs overnight, and I come in to a reviewable diff - it's that simple.

Elena Vogt

Senior MLE, Series B Startup

How it works

From repo to running experiment in 5 minutes

01

Connect your repo - 2 min

Install the GitHub App, pick your repo. No server to provision, no SSH keys, no CUDA installs. We auto-configure everything from your project.

org/ml-training · Connected
org/another-repo
02

Confirm the setup - 3 min

Review the edit scope and success metric. Add an optional note for the agent, set a budget, and save. That's it - no environment debugging.

```yaml
# experiment.yaml
metric: val_bpb
direction: minimize
edit_scope: [train.py, model.py]
budget: 120 NightCredits
```

03

Wake up to results

The agent runs on GPU overnight, validates every change, and delivers the best improvement as a clean, reviewable diff. By the time you review it, the experiments have already run.

Best: 3.66% improvement

Why NightResearch

Hours of setup vs. 5 minutes to first run

Before
  • Spend hours provisioning a GPU server, configuring SSH, fixing CUDA versions
  • Debug dependency conflicts before the first experiment even starts
  • Manually queue experiments before leaving - and hope nothing crashes
  • Wake up to failed runs, wasted GPU budget, and zero progress
After
  • First experiment running in under 5 minutes - no server setup, no SSH
  • No CUDA installs, no environment debugging, no wasted setup time
  • Agent iterates autonomously overnight, validates every change automatically
  • Wake up to a clean reviewable diff with full reasoning - ready to merge

Real results

What you wake up to - after a 5-min setup

Output from an overnight experiment on a nanoGPT training benchmark. No GPU provisioning, no SSH - just connect the repo, sleep, and wake up to 12 approaches explored and the best one packaged as a reviewable patch.

karpathy/autoresearch · train.py
val_bpb: 0.9979 → 0.9614 (↓ 3.66%)

[Chart: metric per attempt - kept vs. reverted changes against the baseline]
Best patch - 5 changes composed
```diff
 # optimization
-learning_rate = 0.01
+learning_rate = 0.04
 ···
-self.ln1 = nn.LayerNorm(n_embd)
+self.ln1 = RMSNorm(n_embd)
 ···
 def forward(self, x):
-    x = x + self.attn(x)
-    x = self.ln1(x)
+    x = x + self.attn(self.ln1(x))
+    x = x + self.mlp(self.ln2(x))
     return x
 ···
+def _init_weights(self):
+    scale = 1 / math.sqrt(2 * n_layer)
```
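The pre-norm change in the patch above can be sketched as a standalone PyTorch module. This is an illustrative reconstruction, not the actual patched code: `RMSNorm` here is a minimal stand-in, and the `attn`/`mlp` submodules are assumed to be any shape-preserving modules.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm stand-in: scale by reciprocal RMS, then a learned gain."""

    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        normed = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return normed * self.weight


class PreNormBlock(nn.Module):
    """Pre-norm residual block: normalize *before* attention and MLP,
    as in the diff above, instead of normalizing after the residual add."""

    def __init__(self, n_embd, attn, mlp):
        super().__init__()
        self.ln1 = RMSNorm(n_embd)
        self.ln2 = RMSNorm(n_embd)
        self.attn = attn
        self.mlp = mlp

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```

Pre-norm keeps a clean identity path through the residual stream, which is why it is a common stabilizing change for transformer training loops like nanoGPT's.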

Built for trust

Every change is earned, not assumed

Validated improvements only

Every change is tested against your actual training pipeline. If the metric doesn't improve, the change is reverted. No hallucinated gains.
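The keep/revert policy described above amounts to a greedy loop: propose a change, re-run training, keep the change only if the metric moves in the right direction. A minimal sketch, assuming hypothetical `propose_change` and `run_training` callables standing in for the agent's real internals:

```python
def overnight_loop(baseline_metric, attempts, run_training, propose_change,
                   direction="minimize"):
    """Greedy keep/revert loop: only metric-improving changes survive.

    run_training(changes)  -> metric after applying the kept changes
    propose_change(kept)   -> a candidate change given what's kept so far
    """
    best = baseline_metric
    kept = []
    for _ in range(attempts):
        change = propose_change(kept)
        metric = run_training(kept + [change])
        improved = metric < best if direction == "minimize" else metric > best
        if improved:
            best = metric
            kept.append(change)  # "Improvement kept"
        # otherwise the change is simply not appended, i.e. reverted
    return best, kept
```

The invariant this enforces is the trust claim in the text: the final patch is the composition of `kept`, and every element of `kept` individually moved the metric.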

Reviewable diffs

The best improvement is packaged as a clean patch with full reasoning. Review it like any PR before merging.

Safety constraints

Define exactly which files the agent can touch, set patch size limits, and require must-pass tests before any change sticks.

Live monitoring

Watch attempts appear in real-time. See what the agent tried, why it tried it, and whether it worked - all while it runs.

Security

Your code stays yours

Sandboxed execution

Every experiment runs in an isolated container with no network access to your infrastructure. Containers are destroyed after each run.

Scoped code access

You define exactly which files the agent can read and edit. Everything else is off-limits, enforced at the platform level.

No permanent repo access

The GitHub App requests only the permissions needed for a single run. We never store your code beyond the experiment lifecycle.

Data handling

Experiment logs and metrics are encrypted at rest and in transit. We do not train on your code or data. You can delete all data at any time.

Pricing

Simple, runtime-based pricing

Every NightCredit covers one minute of active runtime on standard compute. Plans include GPU time, LLM calls, and full platform access with no separate infrastructure fees.

Free credits included to get started
Monthly / Annual
Launch pricing

Starter

For individual ML engineers running overnight experiments.

$99/mo billed annually

$149/mo billed monthly
  • 14 H100-hours / month
  • 840 NightCredits (1 credit = 1 min)
  • 3 projects
  • 1 concurrent experiment
  • Standard GPU (H100 SXM)
  • Email notifications

Additional usage: $0.20/min

Start with Starter

Cancel anytime

Launch pricing · Most popular

Pro

For teams iterating on multiple repos in parallel.

$249/mo billed annually

$399/mo billed monthly
  • 32 H100-hours / month
  • 1,920 NightCredits (1 credit = 1 min)
  • 10 projects
  • 3 concurrent experiments
  • Standard GPU (H100 SXM)
  • Priority support

Additional usage: $0.20/min

Start with Pro

Cancel anytime

Enterprise

Unlimited NightCredits, dedicated GPU pools, SSO, compliance controls, and volume pricing. Built for teams that need more.

Schedule a demo

FAQ

Common questions

Is my code safe?

Yes. Your repository is cloned into an isolated, sandboxed container that is destroyed after the experiment completes. The agent only has scoped access to files you explicitly allow, and every change is validated against your metric before being kept.

What hardware do experiments run on?

Experiments run on NVIDIA H100 SXM GPUs. Each experiment gets a dedicated GPU with no noisy-neighbor contention. GPU time, LLM inference, and platform access are all included in your plan.

Can I cancel at any time?

Absolutely. There are no lock-in contracts. You can cancel your subscription at any time from your account settings, and you'll retain access through the end of your billing period.

What repositories does NightResearch support?

NightResearch works with any Python-based ML training repository hosted on GitHub. We auto-detect common frameworks like PyTorch, JAX, and TensorFlow. The repo needs a runnable training script and a measurable metric.

What are NightCredits?

NightCredits are our usage unit - 1 NightCredit equals 1 minute of active experiment runtime. Your plan includes a monthly NightCredit allowance. Unused credits do not roll over.

What happens when I hit my plan limit?

You can keep running experiments at the overage rate of $0.20 per minute. There are no surprise charges - you set a budget cap on every experiment before it launches, and we'll notify you when you're approaching your plan limit.
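The credit arithmetic above is simple enough to sanity-check yourself. A back-of-envelope sketch (the function names are illustrative, not part of any NightResearch API): 1 NightCredit = 1 minute of active runtime, so a plan's H100-hours convert to credits at 60 per hour, and anything beyond the allowance bills at the $0.20/min overage rate.

```python
def included_credits(h100_hours):
    """Monthly H100-hours converted to NightCredits (1 credit = 1 minute)."""
    return h100_hours * 60


def overage_cost(minutes_used, plan_credits, rate_per_min=0.20):
    """Dollar cost of active runtime beyond the plan's included credits."""
    extra_minutes = max(0, minutes_used - plan_credits)
    return extra_minutes * rate_per_min


# Matches the published plans:
# Starter: 14 H100-hours -> 840 credits; Pro: 32 H100-hours -> 1,920 credits.
```

For example, a Starter user who runs 900 minutes in a month pays for 60 overage minutes on top of the subscription.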

Stop configuring. Start iterating.

No GPU server, no SSH, no installations. Connect your repo in 5 minutes and wake up to a validated diff. Plans start at $99/month.

Start in 5 minutes