Experiment Tracking

Every fastkernels agent, fastkernels kernels, fastkernels eval, and fastkernels e2e run is automatically logged to MLflow. This gives you:

Kernel lineage — every generated kernel is stored as an MLflow artifact, linked to the run parameters that produced it
Benchmark history — speedup, correctness, and max error ratio for every operator across every run
Run comparison — compare runs side-by-side and visualize how metrics evolve across iterations

Tracking data is stored locally in mlruns/ (gitignored). If mlflow is not installed, a single warning is printed and all tracking calls become no-ops.

What Gets Logged

Command	Logged data
`fastkernels agent`	Run params, per-op generation success/failure, unit test results, e2e speedup, kernel source code
`fastkernels kernels`	Bench params, per-operator per-scenario speedup/correctness, kernel source code
`fastkernels eval`	Per-model throughput/latency speedup, alignment rate, MacroEval speedup/correctness/coverage/score, wall-clock time
`fastkernels e2e`	Throughput (tokens/s), latency (percentiles), serve metrics (TTFT, TPOT, ITL)

Each run is tagged with a tier to distinguish its source:

Tier tag	Source	Key metrics
`agent`	`fastkernels agent`	`gen_{op}_success`, `utest_{op}_success`, `e2e_speedup`, `e2e_token_match_rate`
`kernel`	`fastkernels kernels`	`{op}_avg_speedup`, `{op}_passed`, `{op}_failed`, `avg_speedup`
`eval`	`fastkernels eval`	`avg_throughput_speedup`, `avg_latency_speedup`, `alignment_rate`, `macro_speedup`, `macro_correctness`, `macro_coverage`, `macro_score`
`e2e`	`fastkernels e2e`	`tokens_per_second`, `avg_latency`, `mean_ttft_ms` (varies by bench type)

Querying from the CLI

# Recent runs across all tiers
fastkernels history

# History for a specific operator
fastkernels history --op rms_norm

# Best-ever speedup per operator
fastkernels history --best

# Show more results
fastkernels history --limit 50

Sample output: fastkernels history

======================================================================
  RECENT TRACKED RUNS
======================================================================
  TIMESTAMP          RUN NAME                            KEY METRICS
  ──────────────────────────────────────────────────────────────────
  2026-03-16 17:03   agent_L2_Llama-3.1-8B-Instruct      --
  2026-03-16 16:55   agent_L1_Llama-3.1-8B-Instruct      e2e_speedup=1.00x  e2e_token_match_rate=4.5%
  2026-03-16 16:52   agent_L1_Llama-3.1-8B-Instruct      e2e_speedup=0.69x  e2e_token_match_rate=10.5%
======================================================================

Sample output: fastkernels history --op rms_norm

======================================================================
  TRACKING HISTORY: rms_norm
======================================================================
  TIMESTAMP          RUN NAME                      SPEEDUP   PASS  ERR_RATIO     RUN ID
  ──────────────────────────────────────────────────────────────────
  2026-03-16 17:03   agent_L2_Llama-3.1-8B-Inst         --     --         --   797bf661
  2026-03-16 16:55   agent_L1_Llama-3.1-8B-Inst         -- gen=FAIL         --   4f879e0a
  2026-03-16 16:52   agent_L1_Llama-3.1-8B-Inst         -- gen=OK         --   df8ab5e3
======================================================================

Sample output: fastkernels history --best

======================================================================
  BEST SPEEDUP PER OPERATOR (from kernel benchmarks)
======================================================================
  OPERATOR                  BEST SPEEDUP DATE               RUN ID
  ──────────────────────────────────────────────────────────────────
  rms_norm                         1.64x 2026-03-16 16:44   a1b2c3d4
  rotary_emb                       1.22x 2026-03-15 11:30   d3e4f5a6
  silu_and_mul                     1.45x 2026-03-14 09:55   b7c8d9e0
======================================================================
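A "best speedup per operator" summary like this can also be derived by hand from the logged metrics. The sketch below is not how fastkernels history --best is implemented; it only assumes the `{op}_avg_speedup` naming from the kernel tier in the table above. The helper name `best_speedup_per_op` and the sample run IDs and values are hypothetical.

```python
from typing import Dict, Iterable, Tuple

SUFFIX = "_avg_speedup"


def best_speedup_per_op(
    runs: Iterable[Tuple[str, Dict[str, float]]],
) -> Dict[str, Tuple[float, str]]:
    """Return {operator: (best_speedup, run_id)} from (run_id, metrics) pairs."""
    best: Dict[str, Tuple[float, str]] = {}
    for run_id, metrics in runs:
        for name, value in metrics.items():
            # Only per-operator metrics end with "_avg_speedup"; the run-wide
            # "avg_speedup" metric has no operator prefix and does not match.
            if not name.endswith(SUFFIX):
                continue
            op = name[: -len(SUFFIX)]
            if op not in best or value > best[op][0]:
                best[op] = (value, run_id)
    return best


# Hypothetical metrics for two kernel-tier runs.
runs = [
    ("run_a", {"rms_norm_avg_speedup": 1.2, "avg_speedup": 1.1}),
    ("run_b", {"rms_norm_avg_speedup": 1.5, "silu_and_mul_avg_speedup": 1.3}),
]
best = best_speedup_per_op(runs)
for op, (speedup, run_id) in sorted(best.items()):
    print(f"{op:<20} {speedup:.2f}x  {run_id}")
```

With real data, the (run_id, metrics) pairs could come from MLflow's `MlflowClient.search_runs`, whose results expose `run.info.run_id` and `run.data.metrics`.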

MLflow Web UI

fastkernels mlflow-ui
# Open http://localhost:5000
# Press Ctrl+C to stop

The UI launches a local MLflow server backed by the mlruns/ directory. All runs appear under the fastkernels experiment.

Navigating the UI

Experiment list (left sidebar) — select the fastkernels experiment to see all tracked runs.
Runs table — each row is a tracked run. Columns show run name, start time, duration, and logged metrics. Click column headers to sort (e.g., sort by e2e_speedup to find your fastest runs).
Search and filter — use the search bar above the runs table with MLflow filter syntax:

params.level = "1"                    # L1 runs only
params.cuda_only = "True"             # CUDA-only agent runs
metrics.e2e_speedup > 1.0             # runs that beat the baseline
tags.tier = "agent"                   # agent runs only (vs "kernel", "eval", "e2e")
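The same filter expressions can be used outside the UI through MLflow's Python API. The following is a minimal sketch: `build_filter` and `search` are hypothetical helpers, the experiment name fastkernels is taken from the text above, and sorting by e2e_speedup is only an illustrative choice. MLflow stores parameters as strings, which is why the level value is quoted.

```python
from typing import Optional


def build_filter(
    tier: Optional[str] = None,
    level: Optional[int] = None,
    min_e2e_speedup: Optional[float] = None,
) -> str:
    """Combine optional conditions into one MLflow filter string."""
    clauses = []
    if tier is not None:
        clauses.append(f'tags.tier = "{tier}"')
    if level is not None:
        # Params are stored as strings, so the value is compared as a string.
        clauses.append(f'params.level = "{level}"')
    if min_e2e_speedup is not None:
        clauses.append(f"metrics.e2e_speedup > {min_e2e_speedup}")
    return " and ".join(clauses)


def search(filter_string: str, max_results: int = 20):
    """Run the query with mlflow.search_runs (returns a pandas DataFrame)."""
    import mlflow  # imported lazily so build_filter works without mlflow

    return mlflow.search_runs(
        experiment_names=["fastkernels"],
        filter_string=filter_string,
        order_by=["metrics.e2e_speedup DESC"],
        max_results=max_results,
    )


query = build_filter(tier="agent", level=1, min_e2e_speedup=1.0)
print(query)
```

Calling `search(query)` requires mlflow to be installed and the tracking URI to point at the mlruns/ directory.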

Inspecting a Run

Click any run row to open its detail page:

Parameters — model, level, TP degree, LLM model, seed, and other run configuration
Metrics — per-operator generation success (gen_rms_norm_success), unit test results (utest_rms_norm_success, utest_rms_norm_max_diff), and end-to-end results (e2e_speedup, e2e_token_match_rate)
Artifacts — browse the kernels/ folder to view and download the exact source code of every generated kernel; failed generations store error traces under errors/

Comparing Runs

Select two or more runs using the checkboxes in the runs table.
Click Compare. The comparison view shows:
- Parameter diff — which settings changed between runs (e.g., cuda_only: True vs False)
- Metric comparison — side-by-side values for e2e_speedup, e2e_token_match_rate, per-operator metrics
- Artifact diff — compare kernel source code between runs to see how generated code evolved
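A parameter diff of this kind can be approximated in a script. This is a rough sketch, not what the MLflow UI does internally: `diff_params` and `load_params` are hypothetical helpers, and the two parameter sets are made-up examples. `load_params` relies on `MlflowClient.get_run`, which returns the run's parameters as a string-to-string mapping.

```python
from typing import Dict, Optional, Tuple


def diff_params(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every param that differs."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            changed[key] = (left, right)  # None marks a param missing on one side
    return changed


def load_params(run_id: str) -> Dict[str, str]:
    """Fetch a run's parameters from the configured tracking store."""
    from mlflow.tracking import MlflowClient  # lazy import; needs mlflow installed

    return dict(MlflowClient().get_run(run_id).data.params)


# Hypothetical parameter sets standing in for two tracked runs.
run_a = {"level": "1", "cuda_only": "True", "seed": "0"}
run_b = {"level": "1", "cuda_only": "False", "seed": "0", "strategy": "triton-fused"}

changes = diff_params(run_a, run_b)
for key, (left, right) in changes.items():
    print(f"{key}: {left} -> {right}")
```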

Downloading Kernel Artifacts

From any run’s artifact browser, click a kernel file (e.g., kernels/rms_norm.py) to preview it. Use the download button to save it locally. You can also download artifacts programmatically:

import mlflow

# Use an absolute path; a file URI needs three slashes (file:///...)
mlflow.set_tracking_uri("file:///path/to/fastkernels/mlruns")
client = mlflow.tracking.MlflowClient()

# Download a specific kernel from a run
client.download_artifacts("<run_id>", "kernels/rms_norm.py", "/tmp/")

# List all artifacts for a run
for artifact in client.list_artifacts("<run_id>", "kernels"):
    print(artifact.path)

Tracking API

Any kernel optimization script or custom agent can use the tracking API directly. This is the same API that fastkernels agent, fastkernels kernels, fastkernels eval, and fastkernels e2e use internally.

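from fastkernels.bench.tracking import tracker

# kernel_source, result, report, and results_dict are placeholders for
# objects your own script has already produced.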

with tracker.start_run("my-optimization-v3", params={
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "level": 1,
    "strategy": "triton-fused",
}):
    # Log a generated kernel (stored as MLflow artifact)
    tracker.log_kernel("rms_norm", level=1, code=kernel_source)

    # Log kernel benchmark results (pass KernelBenchResult directly)
    tracker.log_kernel_bench(result)

    # Log eval results (pass EvalReport directly)
    tracker.log_eval(report)

    # Log e2e benchmark results
    tracker.log_e2e(results_dict, bench_type="throughput")

    # Log any custom metrics
    tracker.log_metrics({"my_score": 0.95, "compile_time_s": 12.3})

API Reference

Function	Purpose
`tracker.start_run(name, params, tags)`	Context manager that opens an MLflow run
`tracker.log_kernel(op, level, code)`	Log kernel source code as artifact
`tracker.log_kernel_bench(result)`	Log `KernelBenchResult` metrics and kernel artifacts
`tracker.log_eval(report)`	Log `EvalReport` metrics
`tracker.log_e2e(results, bench_type)`	Log E2E benchmark metrics
`tracker.log_metrics(dict)`	Log arbitrary key-value metrics
`tracker.query_runs(experiment, filter_string, max_results)`	Query tracked runs (used by `fastkernels history`)

Design Principles

Agent-agnostic — works with any agent, script, or manual workflow
Dataclass-native — log_kernel_bench() takes KernelBenchResult, log_eval() takes EvalReport — the same objects produced by the benchmark suite
Graceful degradation — if mlflow is not installed, one warning is printed and all calls become no-ops
Exception-safe — logging errors never crash benchmarks
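The last two principles can be illustrated with a small wrapper. This is a sketch of the general pattern under stated assumptions, not the fastkernels tracker itself: the class `SafeTracker`, its `backend_module` argument, and the boolean return value are all hypothetical. It imports the backend once, warns a single time if the import fails, and never lets a logging error propagate.

```python
import importlib
import warnings
from typing import Dict


class SafeTracker:
    """Illustrative tracker that degrades to no-ops (not the fastkernels class)."""

    def __init__(self, backend_module: str = "mlflow") -> None:
        self._backend_module = backend_module
        self._backend = None
        self._checked = False

    def _get_backend(self):
        # Try the import only once; warn a single time if it is missing.
        if not self._checked:
            self._checked = True
            try:
                self._backend = importlib.import_module(self._backend_module)
            except ImportError:
                warnings.warn(
                    f"{self._backend_module} is not installed; tracking is disabled"
                )
        return self._backend

    def log_metrics(self, metrics: Dict[str, float]) -> bool:
        """Log metrics if possible; return False instead of raising."""
        backend = self._get_backend()
        if backend is None:
            return False  # silent no-op after the first warning
        try:
            backend.log_metrics(metrics)
            return True
        except Exception as exc:  # logging must never crash the benchmark
            warnings.warn(f"tracking call failed: {exc}")
            return False


# A deliberately missing module name demonstrates the no-op path.
tracker = SafeTracker(backend_module="a_module_that_is_not_installed")
print(tracker.log_metrics({"my_score": 0.95}))
print(tracker.log_metrics({"compile_time_s": 12.3}))
```

Both calls return False, and only the first one emits a warning.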

Disabling Tracking

Tracking is always on when mlflow is installed. To disable it, uninstall mlflow:

pip uninstall mlflow

The system degrades gracefully — a single warning is printed on the first tracking call, and all subsequent calls are silent no-ops. No code changes needed.
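To confirm which mode an environment is in, it is enough to check whether mlflow can be imported, since that is the condition described above. The helper name `tracking_available` is hypothetical, and the check uses only the Python standard library; it does not inspect fastkernels.

```python
import importlib.util


def tracking_available() -> bool:
    """True when the mlflow package can be imported in this environment."""
    return importlib.util.find_spec("mlflow") is not None


if tracking_available():
    print("mlflow found: runs will be tracked")
else:
    print("mlflow not installed: tracking calls are no-ops")
```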
