Org-level defaults for EvalOps repositories live here. Changes in this repo can alter issue intake, pull request review prompts, reusable workflow behavior, dependency update policy, and the public organization profile.
Treat this repository as a small control plane: conventions should be explicit, validated, and easy for downstream repos to adopt without copying private operational assumptions.

| Path | Purpose |
|---|---|
| `.github/ISSUE_TEMPLATE/` | Default issue forms for EvalOps repos that do not override them. |
| `.github/agent-mcp/` | Canonical EvalOps MCP client config templates for public repo rollout. |
| `.github/codex/hooks/` | Example Codex hook pack for local EvalOps agent guardrails. |
| `.github/pull_request_template.md` | Default PR evidence checklist. |
| `.github/workflows/` | Reusable or self-validating workflows owned by the org defaults repo. |
| `.github/workflow-templates/` | Workflow picker templates for downstream adoption. |
| `.github/contracts/` | Versioned org-default contracts and conformance expectations. |
| `.github/scripts/` | Small helper scripts used by reusable workflows and validation rails. |
| `profile/` | Public organization profile and operating conventions. |
| `labels.yml` | Canonical additive label taxonomy for EvalOps repositories. |
| `renovate-config.json` | Shared Renovate preset for dependency update policy. |
| `services.yaml` | Lightweight service catalog for ownership, topology, and runtime tiering. |

- Start from fresh `origin/main`. This repo is small, but its effects are broad, so avoid stacking process changes on stale branches.
- Check open issues and recent PRs in `evalops/.github` before adding a new convention. If the change is really a downstream rollout, open tracking issues in the owning repos instead of hiding the work here.
- Keep defaults portable. Do not include repo-specific secrets, one-off runner assumptions, or private environment details.
- Pair every new convention with a validation path. Prefer a reusable workflow, test, or script over prose-only policy when the rule can be checked.
- Publish via PR and let downstream owners object if the wording or guardrail is too broad.
Use the workflow templates under .github/workflow-templates/ to add Codex
lanes to downstream repositories:

- `codex-pr-review.yml` reviews PR diffs and posts focused findings.
- `codex-structured-pr-review.yml` reviews PR diffs with a JSON schema and posts actionable findings as inline review comments.
- `review-thread-guard.yml` fails PRs that still have unresolved, non-outdated high-priority review threads.
- `codex-ci-triage.yml` triages a specific failed Actions run.
- `codex-post-merge-verify.yml` checks default-branch health after merges.
- `codex-label-churn-audit.yml` audits PR label mutation loops.

Each template expects an OPENAI_API_KEY repository secret. Repositories that
need stronger, repo-specific behavior should copy the matching prompt from
.github/codex/prompts/ into their own .github/codex/prompts/ directory and
point the workflow at that file.
For deeper adoption patterns beyond PR comments, see
profile/CODEX_HIGH_LEVERAGE_WORKFLOWS.md.
Use .github/workflows/agent-authorship-label.yml to apply one authorship label
to each PR from commit trailers:

- `agent-authored`
- `agent-assisted`
- `mixed-authorship`

Downstream repos can adopt it from the workflow template picker or with:

```yaml
name: Agent authorship labels
on:
  pull_request_target:
    types: [opened, synchronize, reopened, ready_for_review, edited]
permissions:
  contents: read
  pull-requests: write
  issues: write
jobs:
  label:
    uses: evalops/.github/.github/workflows/agent-authorship-label.yml@main
```

For production repos, pin the reusable workflow to a reviewed commit SHA and
pass the same SHA as helper_ref. That keeps the workflow and helper scripts on
one immutable revision.
Use .github/workflow-templates/review-thread-guard.yml on repos where review
threads should be merge blockers:

```yaml
name: Review thread guard
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
permissions:
  contents: read
  pull-requests: read
jobs:
  unresolved-review-threads:
    uses: evalops/.github/.github/workflows/review-thread-guard.yml@main
    with:
      pr_number: ${{ github.event.pull_request.number }}
```

The guard blocks unresolved, non-outdated review threads at high severity or
above by default. Use workflow_dispatch with min_severity=p1 for repos that
only want release-blocking findings to fail.
Use .github/workflows/codex-rails-check.yml to validate repository operating
rails:
- issue template YAML
- workflow and workflow-template YAML
- workflow template metadata
- org control-plane contract shape and evidence chain
- canonical `labels.yml` shape
- `AGENTS.md` presence and non-empty content
- skill frontmatter
- `services.yaml` catalog shape
- Ruby tests under `test/`

The workflow can be called by downstream repos:

```yaml
jobs:
  codex-rails:
    uses: evalops/.github/.github/workflows/codex-rails-check.yml@main
    with:
      require_agents: true
```

`services.yaml` is intentionally lightweight. It should answer:

- which repo owns a service or tool
- which team is accountable for it
- whether it is critical, standard, or experimental
- where it runs
- which other catalog entries it depends on
- whether it consumes shared protobuf contracts
Validate it locally with:

```bash
ruby .github/scripts/validate-services-catalog.rb services.yaml
```

Use `depends_on` only for entries that also appear in `services.yaml`. Use
external links or notes in the owning repo for third-party dependencies.
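
The `depends_on` rule above is easy to check mechanically. The following is a minimal sketch of such a check, not the logic of `validate-services-catalog.rb`. It assumes a hypothetical catalog shape (a top-level `services` list whose entries carry `name` and an optional `depends_on` array), and the real `services.yaml` schema may differ.

```ruby
require "yaml"

# Hypothetical catalog shape, for illustration only; the real services.yaml
# schema may use different keys.
CATALOG_YAML = <<~YAML
  services:
    - name: api
      depends_on: [auth, billing]
    - name: auth
      depends_on: []
    - name: worker
YAML

# Returns [service, missing_dependency] pairs for every depends_on target
# that is not itself a catalog entry.
def dangling_dependencies(catalog)
  entries = catalog.fetch("services", [])
  known = entries.map { |entry| entry["name"] }
  entries.flat_map do |entry|
    Array(entry["depends_on"])
      .reject { |dep| known.include?(dep) }
      .map { |dep| [entry["name"], dep] }
  end
end

catalog = YAML.safe_load(CATALOG_YAML)
dangling_dependencies(catalog).each do |service, dep|
  puts "#{service}: depends_on '#{dep}' is not a catalog entry"
end
```

In this sample, `billing` is flagged because it is referenced but never declared; a third-party dependency like that would belong in the owning repo's notes instead.
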
Before opening a PR from this repo, run the narrow checks that match the change:

```bash
ruby .github/scripts/verify-org-control-plane-contract.rb \
  --json-output org-control-plane-contract-report.json \
  --markdown-output org-control-plane-contract-report.md
ruby -Itest -e 'ARGV.each { |path| require "./#{path}" }' test/*_test.rb
ruby .github/scripts/validate-services-catalog.rb services.yaml
git diff --check
```

If workflows changed and actionlint is available, run it on touched workflow
files. Then check the PR's live GitHub Actions results before merging.
The contract in .github/contracts/org-control-plane.yml turns the repo's
agent-facing defaults into explicit conformance requirements. It names the
correctness model, threat model, SLO dimensions, provenance IDs, and adversarial
fixtures for prompt, tool, and data poisoning. The verifier emits JSON and
Markdown reports with source digests so downstream agents can cite the exact
inputs and decisions behind an org-default change.
See profile/ORG_CONTROL_PLANE_CONTRACT.md for the design note.
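
To illustrate what a source digest gives a downstream reader, the sketch below hashes a set of input files and records each path with its SHA-256. The report keys (`sources`, `path`, `sha256`) and the temporary file are hypothetical; the verifier's actual report format is not shown in this README.

```ruby
require "digest"
require "json"
require "tmpdir"

# Builds a report fragment listing each input file with its SHA-256 digest.
# The "sources"/"path"/"sha256" keys are hypothetical, not the verifier's schema.
def source_digests(paths)
  sources = paths.sort.map do |path|
    { "path" => path, "sha256" => Digest::SHA256.file(path).hexdigest }
  end
  { "sources" => sources }
end

# Demonstrate against a throwaway file so the example is self-contained.
Dir.mktmpdir do |dir|
  contract = File.join(dir, "contract.yml")
  File.write(contract, "version: 1\n")
  puts JSON.pretty_generate(source_digests([contract]))
end
```

Because the digest changes whenever an input changes, a reader can tell whether a cited report still matches the files in their checkout.
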
labels.yml is the canonical EvalOps label set, seeded from
evalops/platform. .github/workflows/sync-labels.yml dry-runs on PRs and
comments a per-repo diff. On main, weekly schedule, or manual dispatch with
apply=true, it reconciles active evalops/* repos additively: missing labels
are created, matching names get color/description updates, and repo-local labels
are left alone. A repo can opt out by committing .github/labels-sync.disabled.
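
The additive behavior described above can be sketched as a pure planning step. This is an illustration under stated assumptions, not the code in `sync-labels.rb`: the function name, the exact-name matching, and the sample labels are all hypothetical.

```ruby
# Plans an additive reconciliation: create missing canonical labels, update
# canonical labels whose color or description drifted, and leave every
# repo-local label untouched (nothing is ever deleted).
# Assumption: labels match by exact name.
def plan_label_sync(canonical, existing)
  existing_by_name = existing.to_h { |label| [label[:name], label] }
  plan = { create: [], update: [], untouched: [] }

  canonical.each do |label|
    current = existing_by_name[label[:name]]
    if current.nil?
      plan[:create] << label[:name]
    elsif current[:color] != label[:color] || current[:description] != label[:description]
      plan[:update] << label[:name]
    end
  end

  canonical_names = canonical.map { |label| label[:name] }
  plan[:untouched] = existing.map { |label| label[:name] } - canonical_names
  plan
end

# Sample data, invented for the example.
canonical = [
  { name: "bug", color: "d73a4a", description: "Something is broken" },
  { name: "security", color: "b60205", description: "Security-relevant change" }
]
existing = [
  { name: "bug", color: "ffffff", description: "Something is broken" },
  { name: "team-local", color: "0e8a16", description: "Repo-specific label" }
]

p plan_label_sync(canonical, existing)
```

Here `security` would be created, `bug` would get its color updated, and `team-local` would be left alone. Keeping the plan separate from the API calls is what makes a dry-run diff on PRs cheap to produce.
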
Validate the taxonomy without touching GitHub:

```bash
ruby .github/scripts/sync-labels.rb --validate-only --labels labels.yml
```

The templates in `.github/agent-mcp/templates/` define the committed client
config for public repos:

- `.mcp.json` for Claude Code and other MCP clients that read the common JSON shape.
- `.codex/config.toml` for Codex.
- `.cursor/mcp.json` for Cursor.
- an `AGENTS.md` section explaining the EvalOps integration.
- `.gitignore` entries for local API-key fallbacks.

Check or write those files in any repo checkout with:

```bash
ruby .github/scripts/sync-agent-mcp-config.rb --workspace /path/to/repo --check
ruby .github/scripts/sync-agent-mcp-config.rb --workspace /path/to/repo --write
```

`.github/workflows/agent-mcp-config-rollout.yml` validates the templates on PRs.
Manual dispatch with apply=true and EVALOPS_MCP_ROLLOUT_TOKEN (or
EVALOPS_ORG_WRITE_TOKEN) opens rollout PRs against either the requested repos
or all active public evalops/* repos.
.github/scripts/evalops-codex-hook-guard.rb implements warning-first local
guardrails for EvalOps agent work: session-start process reminders, dirty
worktree warnings before destructive git commands, and merge/readiness nudges
when review-thread evidence is missing. The example hook config is
.github/codex/hooks/evalops-hooks.toml.
See profile/CODEX_HOOK_GUARDRAILS.md for install notes and limitations.
profile/GOVERN_EXISTING_AI_FLEET.md records the current EvalOps positioning
thesis and concrete retrofit surfaces. profile/TYPESCRIPT_TOOLING_STANDARD.md
captures the gts/wireit standardization path, including pilot criteria and
non-goals.
.github/workflows/archived-dependabot-audit.yml runs a read-only audit for
archived EvalOps repos that still have .github/dependabot.yml or open
Dependabot PRs. The pre-archive checklist in profile/ARCHIVAL_RUNBOOK.md
requires removing Dependabot config and clearing bot PRs before setting
archived=true.
Run the audit locally with:

```bash
ruby .github/scripts/audit-archived-dependabot.rb \
  --owner evalops \
  --json-output archived-dependabot-audit.json \
  --markdown-output archived-dependabot-audit.md
```

`.github/workflows/evalops-pr-lens-review.yml` sweeps open PRs in
evalops/platform, evalops/deploy, and evalops/maestro-internal every two
hours and can be run manually for specific repo#number targets. It fans out
one reviewer per lens:
- migration safety
- NATS contract drift
- Argo manifest skew
- IAM blast radius
- generated SDK delta
- eval regression risk
Each lens writes a stable commit status context named
evalops-pr-lens/<lens>. The meta-review step ranks findings by confidence,
updates evalops-pr-lens/meta-review, and only posts a PR comment when findings
clear the configured high-confidence threshold.
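
The meta-review gating described above amounts to a sort followed by a threshold filter. The sketch below shows that shape only. The `Finding` struct, the lens slugs, and the `0.8` threshold are hypothetical, since the workflow's configured threshold and data model are not shown in this README.

```ruby
# Hypothetical finding shape and threshold; the workflow's real values are
# configured elsewhere and are not shown in this README.
Finding = Struct.new(:lens, :summary, :confidence, keyword_init: true)
HIGH_CONFIDENCE_THRESHOLD = 0.8

# Ranks findings by descending confidence and keeps only those that clear
# the threshold. An empty result means no PR comment would be posted.
def comment_worthy(findings, threshold: HIGH_CONFIDENCE_THRESHOLD)
  findings
    .sort_by { |finding| -finding.confidence }
    .select { |finding| finding.confidence >= threshold }
end

# Sample findings, invented for the example.
findings = [
  Finding.new(lens: "migration-safety", summary: "Column dropped without backfill", confidence: 0.93),
  Finding.new(lens: "iam-blast-radius", summary: "Possible wildcard grant", confidence: 0.55),
  Finding.new(lens: "nats-contract-drift", summary: "Subject renamed", confidence: 0.84)
]

comment_worthy(findings).each do |finding|
  puts format("%.2f  %s: %s", finding.confidence, finding.lens, finding.summary)
end
```

With these sample values, the two findings at 0.93 and 0.84 would be surfaced and the 0.55 finding would stay out of the PR comment, though it could still be reflected in a lens status.
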
Required secrets in evalops/.github:

- `EVALOPS_PR_LENS_TOKEN`: GitHub token with read/write access to the target repos for statuses and PR comments.
- `ANTHROPIC_API_KEY` or `EVALOPS_ANTHROPIC_API_KEY`: Anthropic key for Opus lens reviewers.
- `OPENAI_API_KEY` or `EVALOPS_OPENAI_API_KEY`: optional fallback when manually dispatching with `provider=openai`.