Every fastkernels agent, fastkernels kernels, fastkernels eval, and fastkernels e2e run is automatically logged to MLflow. This gives you:
  • Kernel lineage — every generated kernel is stored as an MLflow artifact, linked to the run parameters that produced it
  • Benchmark history — speedup, correctness, and max error ratio for every operator across every run
  • Run comparison — compare runs side-by-side and visualize how metrics evolve across iterations
Tracking data is stored locally in mlruns/ (gitignored). If mlflow is not installed, a single warning is printed and all tracking calls become no-ops.

What Gets Logged

| Command | Logged data |
| --- | --- |
| fastkernels agent | Run params, per-op generation success/failure, unit test results, e2e speedup, kernel source code |
| fastkernels kernels | Bench params, per-operator per-scenario speedup/correctness, kernel source code |
| fastkernels eval | Per-model throughput/latency speedup, alignment rate, MacroEval speedup/correctness/coverage/score, wall-clock time |
| fastkernels e2e | Throughput (tokens/s), latency (percentiles), serve metrics (TTFT, TPOT, ITL) |
Each run is tagged with a tier to distinguish its source:
| Tier tag | Source | Key metrics |
| --- | --- | --- |
| agent | fastkernels agent | gen_{op}_success, utest_{op}_success, e2e_speedup, e2e_token_match_rate |
| kernel | fastkernels kernels | {op}_avg_speedup, {op}_passed, {op}_failed, avg_speedup |
| eval | fastkernels eval | avg_throughput_speedup, avg_latency_speedup, alignment_rate, macro_speedup, macro_correctness, macro_coverage, macro_score |
| e2e | fastkernels e2e | tokens_per_second, avg_latency, mean_ttft_ms (varies by bench type) |

Querying from the CLI

# Recent runs across all tiers
fastkernels history

# History for a specific operator
fastkernels history --op rms_norm

# Best-ever speedup per operator
fastkernels history --best

# Show more results
fastkernels history --limit 50
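The idea behind --best can be approximated outside the CLI. The sketch below assumes each run has already been reduced to a plain dict mapping metric names to values, with per-operator speedups stored under the {op}_avg_speedup key pattern listed for the kernel tier. The function best_speedup_per_op and the sample data are hypothetical and are not part of fastkernels.

```python
from typing import Dict, Iterable

# Suffix of the per-operator speedup metric, e.g. "rms_norm_avg_speedup".
SUFFIX = "_avg_speedup"


def best_speedup_per_op(runs: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Return the highest `{op}_avg_speedup` value seen for each operator."""
    best: Dict[str, float] = {}
    for metrics in runs:
        for key, value in metrics.items():
            # The run-level "avg_speedup" metric has no leading underscore,
            # so it does not match and is skipped here.
            if not key.endswith(SUFFIX):
                continue
            op = key[: -len(SUFFIX)]
            if op not in best or value > best[op]:
                best[op] = value
    return best


# Hypothetical metrics from two kernel-tier runs.
sample_runs = [
    {"rms_norm_avg_speedup": 1.2, "silu_avg_speedup": 0.9, "avg_speedup": 1.05},
    {"rms_norm_avg_speedup": 1.5, "avg_speedup": 1.5},
]
best = best_speedup_per_op(sample_runs)
```

How the CLI itself computes its ranking is not documented here, so treat this only as an illustration of the metric naming scheme.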

MLflow Web UI

fastkernels mlflow-ui
# Open http://localhost:5000
# Press Ctrl+C to stop
The command launches a local MLflow server backed by the mlruns/ directory. All runs appear under the fastkernels experiment.
  1. Experiment list (left sidebar) — select the fastkernels experiment to see all tracked runs.
  2. Runs table — each row is a tracked run. Columns show run name, start time, duration, and logged metrics. Click column headers to sort (e.g., sort by e2e_speedup to find your fastest runs).
  3. Search and filter — use the search bar above the runs table with MLflow filter syntax:
params.level = "1"                    # L1 runs only
params.cuda_only = "True"             # CUDA-only agent runs
metrics.e2e_speedup > 1.0             # runs that beat the baseline
tags.tier = "agent"                   # agent runs only (vs "kernel", "eval", "e2e")
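The same filter expressions can also be run from Python through MLflow's own mlflow.search_runs function. The sketch below is one possible way to do that: build_filter and find_runs are hypothetical helpers, not part of fastkernels, and the tracking URI must point at your own mlruns/ directory. MLflow filter clauses are combined with "and"; there is no "or" operator.

```python
from typing import Optional


def build_filter(tier: Optional[str] = None,
                 min_e2e_speedup: Optional[float] = None,
                 **params: str) -> str:
    """Join individual clauses into one MLflow filter string."""
    clauses = []
    if tier is not None:
        clauses.append(f'tags.tier = "{tier}"')
    if min_e2e_speedup is not None:
        clauses.append(f"metrics.e2e_speedup > {min_e2e_speedup}")
    # Params are stored as strings in MLflow, so values are always quoted.
    for name, value in sorted(params.items()):
        clauses.append(f'params.{name} = "{value}"')
    return " and ".join(clauses)


def find_runs(filter_string: str, tracking_uri: str, max_results: int = 20):
    """Return matching runs as a pandas DataFrame, fastest e2e speedup first."""
    import mlflow  # imported lazily so build_filter works without mlflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(
        experiment_names=["fastkernels"],
        filter_string=filter_string,
        order_by=["metrics.e2e_speedup DESC"],
        max_results=max_results,
    )


query = build_filter(tier="agent", min_e2e_speedup=1.0, level="1")
# Example call (requires mlflow and an existing mlruns/ directory):
# runs = find_runs(query, "file:///path/to/fastkernels/mlruns")
```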

Inspecting a Run

Click any run row to open its detail page:
  • Parameters — model, level, TP degree, LLM model, seed, and other run configuration
  • Metrics — per-operator generation success (gen_rms_norm_success), unit test results (utest_rms_norm_success, utest_rms_norm_max_diff), and end-to-end results (e2e_speedup, e2e_token_match_rate)
  • Artifacts — browse the kernels/ folder to view and download the exact source code of every generated kernel; failed generations store error traces under errors/

Comparing Runs

  1. Select two or more runs using the checkboxes in the runs table.
  2. Click Compare. The comparison view shows:
    • Parameter diff — which settings changed between runs (e.g., cuda_only: True vs False)
    • Metric comparison — side-by-side values for e2e_speedup, e2e_token_match_rate, per-operator metrics
    • Artifact diff — compare kernel source code between runs to see how generated code evolved
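A comparable parameter and source diff can be produced offline once the parameters and kernel files of two runs are in hand. The sketch below uses only the Python standard library; param_diff and kernel_diff are hypothetical helper names, and the inputs are assumed to be plain dicts and strings that you have already fetched.

```python
import difflib
from typing import Dict, Optional, Tuple


def param_diff(a: Dict[str, str],
               b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every param that differs."""
    names = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}


def kernel_diff(old_src: str, new_src: str, name: str = "kernel.py") -> str:
    """Return a unified diff between two versions of a kernel source file."""
    lines = difflib.unified_diff(
        old_src.splitlines(keepends=True),
        new_src.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


# Hypothetical params from two agent runs.
changed = param_diff(
    {"cuda_only": "True", "level": "1"},
    {"cuda_only": "False", "level": "1"},
)
patch = kernel_diff("x = 1\n", "x = 2\n", name="rms_norm.py")
```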

Downloading Kernel Artifacts

From any run’s artifact browser, click a kernel file (e.g., kernels/rms_norm.py) to preview it. Use the download button to save it locally. You can also download artifacts programmatically:
import mlflow

# A file URI needs three slashes before an absolute path
mlflow.set_tracking_uri("file:///path/to/fastkernels/mlruns")
client = mlflow.tracking.MlflowClient()

# Download a specific kernel from a run
client.download_artifacts("<run_id>", "kernels/rms_norm.py", "/tmp/")

# List all artifacts for a run
for artifact in client.list_artifacts("<run_id>", "kernels"):
    print(artifact.path)
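list_artifacts returns only the direct children of a path, so folders such as kernels/ and errors/ have to be walked to enumerate every file. A minimal recursive sketch follows; iter_artifact_files is a hypothetical helper that relies on the path and is_dir attributes of the FileInfo objects that MLflow returns.

```python
from typing import Iterator, Optional


def iter_artifact_files(client, run_id: str,
                        path: Optional[str] = None) -> Iterator[str]:
    """Yield the path of every file artifact under `path`, descending into folders."""
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            # Recurse into sub-folders such as kernels/ or errors/.
            yield from iter_artifact_files(client, run_id, info.path)
        else:
            yield info.path


# Example call with a real MlflowClient (requires mlflow):
# for artifact_path in iter_artifact_files(client, "<run_id>"):
#     print(artifact_path)
```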

Tracking API

Any kernel optimization script or custom agent can use the tracking API directly. This is the same API that fastkernels agent, fastkernels kernels, fastkernels eval, and fastkernels e2e use internally.
from fastkernels.bench.tracking import tracker

with tracker.start_run("my-optimization-v3", params={
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "level": 1,
    "strategy": "triton-fused",
}):
    # Log a generated kernel (stored as MLflow artifact)
    tracker.log_kernel("rms_norm", level=1, code=kernel_source)

    # Log kernel benchmark results (pass KernelBenchResult directly)
    tracker.log_kernel_bench(result)

    # Log eval results (pass EvalReport directly)
    tracker.log_eval(report)

    # Log e2e benchmark results
    tracker.log_e2e(results_dict, bench_type="throughput")

    # Log any custom metrics
    tracker.log_metrics({"my_score": 0.95, "compile_time_s": 12.3})
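Custom timing metrics such as compile_time_s can be captured with a small context manager. The sketch below is a hypothetical helper, not part of fastkernels; it assumes only that the logging callable accepts a dict of metric names to values, as tracker.log_metrics does in the example above.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def timed_metric(name: str,
                 log_fn: Callable[[Dict[str, float]], None]) -> Iterator[None]:
    """Measure the wall-clock time of the enclosed block and log it under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Logged even if the block raises, so failed attempts are still timed.
        log_fn({name: time.perf_counter() - start})


# Stand-in logger for illustration; inside a tracked run you would pass
# tracker.log_metrics instead of collected.append.
collected = []
with timed_metric("compile_time_s", collected.append):
    sum(range(1000))  # placeholder for the work being timed
```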

API Reference

| Function | Purpose |
| --- | --- |
| tracker.start_run(name, params, tags) | Context manager that opens an MLflow run |
| tracker.log_kernel(op, level, code) | Log kernel source code as artifact |
| tracker.log_kernel_bench(result) | Log KernelBenchResult metrics and kernel artifacts |
| tracker.log_eval(report) | Log EvalReport metrics |
| tracker.log_e2e(results, bench_type) | Log E2E benchmark metrics |
| tracker.log_metrics(dict) | Log arbitrary key-value metrics |
| tracker.query_runs(experiment, filter_string, max_results) | Query tracked runs (used by fastkernels history) |

Design Principles

  • Agent-agnostic — works with any agent, script, or manual workflow
  • Dataclass-native — log_kernel_bench() takes KernelBenchResult and log_eval() takes EvalReport, the same objects produced by the benchmark suite
  • Graceful degradation — if mlflow is not installed, one warning is printed and all calls become no-ops
  • Exception-safe — logging errors never crash benchmarks
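The last two principles describe a common pattern: check once whether the optional dependency is present, warn a single time if it is not, and never let a logging failure propagate. The sketch below illustrates that pattern in general terms. SafeTracker is a hypothetical class written for this example and is not the fastkernels implementation.

```python
import importlib.util
import warnings
from typing import Any, Callable, Optional


class SafeTracker:
    """Run tracking calls only when mlflow is importable, and never raise."""

    def __init__(self, available: Optional[bool] = None) -> None:
        if available is None:
            # find_spec returns None when the package is not installed.
            available = importlib.util.find_spec("mlflow") is not None
        self.available = available
        self._warned = False

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if not self.available:
            if not self._warned:
                warnings.warn("mlflow is not installed; tracking is disabled")
                self._warned = True
            return None  # silent no-op after the first warning
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # A logging error is reported but never crashes the benchmark.
            warnings.warn(f"tracking call failed: {exc}")
            return None
```

Passing available explicitly is only a convenience for testing both code paths; by default the class detects mlflow on its own.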

Disabling Tracking

Tracking is always on when mlflow is installed. To disable it, uninstall mlflow:
pip uninstall mlflow
The system degrades gracefully — a single warning is printed on the first tracking call, and all subsequent calls are silent no-ops. No code changes needed.