From 527fad8da0e78f198fc62f7e218f97e0a86a4116 Mon Sep 17 00:00:00 2001
From: Danilo Reyes
Date: Fri, 30 Jan 2026 22:48:02 -0600
Subject: [PATCH 1/5] init

---
 AGENTS.md                                       |   3 +
 specs/001-mcp-server/checklists/requirements.md |  34 ++++++
 specs/001-mcp-server/contracts/mcp-tools.md     |  47 ++++++++
 specs/001-mcp-server/data-model.md              |  24 ++++
 specs/001-mcp-server/plan.md                    |  68 +++++++++++
 specs/001-mcp-server/quickstart.md              |   9 ++
 specs/001-mcp-server/research.md                |  16 +++
 specs/001-mcp-server/spec.md                    |  97 +++++++++++++++
 specs/001-mcp-server/tasks.md                   | 112 ++++++++++++++++++
 9 files changed, 410 insertions(+)
 create mode 100644 specs/001-mcp-server/checklists/requirements.md
 create mode 100644 specs/001-mcp-server/contracts/mcp-tools.md
 create mode 100644 specs/001-mcp-server/data-model.md
 create mode 100644 specs/001-mcp-server/plan.md
 create mode 100644 specs/001-mcp-server/quickstart.md
 create mode 100644 specs/001-mcp-server/research.md
 create mode 100644 specs/001-mcp-server/spec.md
 create mode 100644 specs/001-mcp-server/tasks.md

diff --git a/AGENTS.md b/AGENTS.md
index a55fda8..5e4c973 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -3,6 +3,8 @@ Auto-generated from feature plans. Last updated: 2026-01-30

 ## Active Technologies

+- Python 3.12 + MCP server library (Python, JSON-RPC/stdio transport), click for CLI entrypoint, pytest + coverage for tests, ruff/black for lint/format (001-mcp-server)
+- None (in-memory tool definitions; filesystem access for repo interactions) (001-mcp-server)
 - Documentation set (AI-facing constitution and playbooks) in Markdown (001-ai-docs)

@@ -24,6 +26,7 @@ specs/001-ai-docs/ # Planning artifacts (plan, research, tasks, data model
 - Keep language business-level and technology-agnostic in AI-facing docs.

 ## Recent Changes
+- 001-mcp-server: Added Python 3.12 + MCP server library (Python, JSON-RPC/stdio transport), click for CLI entrypoint, pytest + coverage for tests, ruff/black for lint/format
 - 001-ai-docs: Documentation-focused stack; added docs/ for constitution/playbooks and specs/001-ai-docs/ for planning outputs.

diff --git a/specs/001-mcp-server/checklists/requirements.md b/specs/001-mcp-server/checklists/requirements.md
new file mode 100644
index 0000000..6dbcf13
--- /dev/null
+++ b/specs/001-mcp-server/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: MCP Server for Repo Maintenance
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-01-30
+**Feature**: specs/001-mcp-server/spec.md
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Checklist completed; no outstanding issues identified.
diff --git a/specs/001-mcp-server/contracts/mcp-tools.md b/specs/001-mcp-server/contracts/mcp-tools.md
new file mode 100644
index 0000000..a118150
--- /dev/null
+++ b/specs/001-mcp-server/contracts/mcp-tools.md
@@ -0,0 +1,47 @@
+# MCP Tooling Contracts (JSON-RPC over stdio)
+
+## listTools
+- **Method**: `listTools`
+- **Params**: none
+- **Result**:
+  - `tools`: array of Tool objects
+    - `name`: string (unique)
+    - `description`: string
+    - `inputs`: array of InputParam
+      - `name`: string
+      - `type`: string (constrained to allowed primitives)
+      - `required`: boolean
+      - `description`: string
+    - `docsAnchor`: object
+      - `path`: string (under `docs/`)
+      - `anchor`: string (heading id)
+      - `summary`: string
+
+## invokeTool
+- **Method**: `invokeTool`
+- **Params**:
+  - `name`: string (must match Tool.name)
+  - `args`: object (key/value per Tool.inputs)
+- **Result**:
+  - `status`: enum (`ok`, `invalid_input`, `failed`, `unsupported`)
+  - `output`: string (human-readable result or guidance)
+  - `actions`: array of suggested follow-ups (optional)
+  - `docsAnchor`: object (same shape as listTools.docsAnchor) for quick navigation
+
+## syncDocs
+- **Method**: `syncDocs`
+- **Purpose**: Validate that documented tools match the live catalog.
+- **Params**: none
+- **Result**:
+  - `status`: enum (`ok`, `drift_detected`)
+  - `missingInDocs`: array of tool names
+  - `missingInCatalog`: array of doc anchors without tools
+  - `mismatches`: array of objects
+    - `name`: string
+    - `expected`: string (description/input summary)
+    - `actual`: string
+
+## Error Handling
+- **Transport errors**: standard JSON-RPC error object with code/message.
+- **Validation errors**: return `invalid_input` with details in `output`.
+- **Unknown methods**: return `unsupported` status with guidance to run `listTools`.

diff --git a/specs/001-mcp-server/data-model.md b/specs/001-mcp-server/data-model.md
new file mode 100644
index 0000000..9fb2996
--- /dev/null
+++ b/specs/001-mcp-server/data-model.md
@@ -0,0 +1,24 @@
+# Data Model: MCP Server for Repo Maintenance
+
+## Entities
+
+### MCP Server
+- **Purpose**: Hosts MCP tools for Codex CLI and orchestrates tool discovery and invocation.
+- **Attributes**: transport (`stdio`), tool registry (list of Tool Catalog Entries), doc mapping (anchors/paths), version metadata.
+- **Relationships**: Contains many Tool Catalog Entries; references Documentation Anchors for guidance.
+
+### Tool Catalog Entry
+- **Purpose**: Represents a callable maintenance task exposed via MCP.
+- **Attributes**: name (unique), description, input schema (parameters, types, required flags), execution scope (paths affected), documentation anchor (path + heading), safeguards (preconditions/guards), tags (category).
+- **Relationships**: Linked to one Documentation Anchor; owned by MCP Server.
+- **Uniqueness**: Name must be unique across the catalog.
+
+### Documentation Anchor
+- **Purpose**: Points to the AI documentation section explaining when and how to use a tool.
+- **Attributes**: doc path (under `docs/`), heading id/anchor, summary, last-synced version marker.
+- **Relationships**: Referenced by Tool Catalog Entries; aligns with AI documentation updates.
+
+### CI Job
+- **Purpose**: Executes lint, format, and test suites for MCP server on scoped path changes.
+- **Attributes**: path filters (`scripts/**`, `docs/**`), steps (setup, install deps, lint, format check, tests, coverage), status output.
+- **Relationships**: Observes repository changes; reports status to Gitea pipeline.
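+
+## Illustrative Wire Example
+
+A minimal sketch of a Tool Catalog Entry as it surfaces over stdio, for orientation only. The shapes follow `contracts/mcp-tools.md` and the entry mirrors the `search-docs` tool registered later in this series; the JSON-RPC envelope fields are elided to match the dispatcher, and the anchor values should be treated as approximate rather than authoritative.
+
+```python
+# Hypothetical listTools round trip, written as Python literals instead of raw JSON.
+request = {"method": "listTools", "params": {}}
+response = {
+    "result": {
+        "tools": [
+            {
+                "name": "search-docs",
+                "description": "Search docs for a query string.",
+                "inputs": [
+                    {
+                        "name": "query",
+                        "type": "string",
+                        "required": True,
+                        "description": "Term to search for",
+                    }
+                ],
+                "docsAnchor": {
+                    "path": "docs",
+                    "anchor": "docs-search",
+                    "summary": "Search across docs for maintenance topics",
+                },
+            }
+        ]
+    }
+}
+```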
diff --git a/specs/001-mcp-server/plan.md b/specs/001-mcp-server/plan.md
new file mode 100644
index 0000000..d4ec840
--- /dev/null
+++ b/specs/001-mcp-server/plan.md
@@ -0,0 +1,68 @@
+# Implementation Plan: MCP Server for Repo Maintenance
+
+**Branch**: `001-mcp-server` | **Date**: 2026-01-30 | **Spec**: specs/001-mcp-server/spec.md
+**Input**: Feature specification from `/specs/001-mcp-server/spec.md`
+
+## Summary
+
+Build a local-only MCP server under `scripts/` that Codex CLI can use to run documented repository maintenance tasks. Provide a tool catalog aligned to AI docs, add automated tests, gate changes in Gitea when `scripts/` or `docs/` change, and expand AI documentation with MCP usage guidance.
+
+## Technical Context
+
+**Language/Version**: Python 3.12
+**Primary Dependencies**: MCP server library (Python, JSON-RPC/stdio transport), click for CLI entrypoint, pytest + coverage for tests, ruff/black for lint/format
+**Storage**: None (in-memory tool definitions; filesystem access for repo interactions)
+**Testing**: pytest with coverage thresholds and path-filtered CI runs
+**Target Platform**: Local Nix/Linux environment (Codex CLI host)
+**Project Type**: CLI-style local service (MCP over stdio/JSON-RPC)
+**Performance Goals**: Tool discovery and invocation responses within 2s on local machine; test suite under 60s in CI
+**Constraints**: Local-only binding; no network listeners; avoid non-flake dependencies; follow minimal indentation/guard-clause coding style with docstrings/type hints and functional patterns
+**Scale/Scope**: Supports current AI-documented maintenance tasks (≥5 tools) with room for incremental additions
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+- AI docs must stay business-level and technology-agnostic; update `docs/` to reflect MCP tools and mappings. **Pass**
+- Coding conventions: minimize comments, keep markdown examples tight (no blank lines between code blocks), maintain clear naming. **Pass**
+- Nix conventions: flatten single-leaf attribute sets and merge siblings when touching Nix files (e.g., CI path filters). **Pass**
+- Discoverability: ensure MCP documentation reachable within two clicks from AI docs. **Pass**
+- Secret handling: no new secret categories introduced; MCP remains local-only. **Pass**
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/001-mcp-server/
+├── plan.md
+├── research.md
+├── data-model.md
+├── quickstart.md
+├── contracts/
+└── tasks.md # created by /speckit.tasks
+```
+
+### Source Code (repository root)
+
+```text
+scripts/mcp-server/
+├── pyproject.toml
+├── src/mcp_server/
+│   ├── __init__.py
+│   ├── server.py
+│   ├── tools.py
+│   └── docs_sync.py
+└── tests/
+    ├── test_tools.py
+    ├── test_server.py
+    └── test_docs_sync.py
+```
+
+**Structure Decision**: Single Python project under `scripts/mcp-server` with src/tests layout; documentation lives in `docs/` and spec artifacts in `specs/001-mcp-server/`.
+
+## Complexity Tracking
+
+| Violation | Why Needed | Simpler Alternative Rejected Because |
+|-----------|------------|-------------------------------------|
+| None      | -          | -                                    |

diff --git a/specs/001-mcp-server/quickstart.md b/specs/001-mcp-server/quickstart.md
new file mode 100644
index 0000000..959d4ae
--- /dev/null
+++ b/specs/001-mcp-server/quickstart.md
@@ -0,0 +1,9 @@
+# Quickstart: MCP Server for Repo Maintenance
+
+1) **Prereqs**: Python 3.12, `uv` or `pip`, and Codex CLI installed locally.
+2) **Install**: From repo root, `cd scripts/mcp-server` and run `uv pip install -e .` (or `pip install -e .` if uv is unavailable).
+3) **Run tests**: `pytest --maxfail=1 --disable-warnings -q` (lint/format checks run via ruff/black in CI).
+4) **Launch MCP server**: `python -m mcp_server.server` (stdio mode).
+5) **Connect Codex CLI**: Configure Codex to use the local MCP endpoint (stdio transport) and run `listTools` to verify the catalog.
+6) **Docs alignment**: If adding/updating tools, run `syncDocs` to confirm docs match; update the `docs/` MCP section accordingly.
+7) **CI behavior**: Gitea runs lint/format/tests when `scripts/**` or `docs/**` change; fix failures before merging.

diff --git a/specs/001-mcp-server/research.md b/specs/001-mcp-server/research.md
new file mode 100644
index 0000000..e829227
--- /dev/null
+++ b/specs/001-mcp-server/research.md
@@ -0,0 +1,16 @@
+# Research: MCP Server for Repo Maintenance
+
+## Runtime and transport
+- **Decision**: Implement the MCP server in Python 3.12 using JSON-RPC over stdio for Codex CLI compatibility and local-only execution.
+- **Rationale**: Python is already acceptable for repo scripts; stdio transport keeps the server local without exposing network ports and matches Codex CLI expectations for MCP endpoints.
+- **Alternatives considered**: HTTP listener on localhost (would require network binding and firewall considerations); Node/TypeScript implementation (adds a new toolchain and package manager to the repo).
+
+## Tool catalog and documentation alignment
+- **Decision**: Define the tool catalog in code with a single source of truth that includes tool name, description, inputs, and linked documentation anchors; generate doc snippets from this catalog to avoid drift.
+- **Rationale**: Centralizing metadata reduces duplication, keeps docs and server capabilities synchronized, and allows tests to validate parity.
+- **Alternatives considered**: Free-form documentation authored manually (high drift risk); external manifest file (adds parsing overhead without clear benefit over a code-local registry).
+
+## CI triggers and test approach
+- **Decision**: Add a Gitea workflow job that runs MCP tests when paths under `scripts/**` or `docs/**` change; use pytest with coverage and lint/format checks (ruff/black) to enforce coding preferences.
+- **Rationale**: Path filters prevent unnecessary runs, while enforcing lint/format plus tests protects against regressions and code-style drift across both code and documentation updates.
+- **Alternatives considered**: Always-on job (wastes CI minutes on unrelated changes); relying solely on manual runs (risks regressions and documentation/tool mismatches).

diff --git a/specs/001-mcp-server/spec.md b/specs/001-mcp-server/spec.md
new file mode 100644
index 0000000..fd398c5
--- /dev/null
+++ b/specs/001-mcp-server/spec.md
@@ -0,0 +1,97 @@
+# Feature Specification: MCP Server for Repo Maintenance
+
+**Feature Branch**: `001-mcp-server`
+**Created**: 2026-01-30
+**Status**: Draft
+**Input**: User description: "build a mcp server under the directory /scripts the intention for this mcp server is to be consumed by codex-cli to help on modifying the repository by doing but not limited to, the tasks declared on the ai-oriented documentation found in /docs. as an extra, I want this mcp to have tests, which run on the gitea pipeline when any changes done to the mcp or docs directories are commited.
+expand the ai-documentation on /docs with info about the built mcp so that it is compliant with what of the available tools of the mcp can be called for what specific tasks, ensuring that the mcp provides the easiest up to date assistance to giving this repository maintenance. When it comes to the coding preferences for the server, I want: 1) indentation kept to the bare minimum 2) guard clauses & early returns 3) easy to read coding style, with no comments, but professional easy to maintain code structure 4) functions with docstrings, typehints, etc. 5) give preference to iteration tools such as lambdas, map, filters, as opposed to for loops and multiple ifs. 6) functional code, with reduced duplicated code 7) lint & format the code"
+
+## Clarifications
+
+### Session 2026-01-30
+
+- Q: How should MCP server access be scoped for Codex CLI users? → A: Restrict MCP server to local-only use with local filesystem permissions.
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Codex CLI invokes MCP tools (Priority: P1)
+
+Codex CLI users connect to the MCP server to list and invoke repository maintenance tasks that mirror the AI documentation, receiving clear guidance to execute changes safely.
+
+**Why this priority**: Enables AI-assisted maintenance with minimal friction and ensures the new MCP server delivers immediate value.
+
+**Independent Test**: Connect Codex CLI to the MCP server and run a documented maintenance task end-to-end without manual repository edits.
+
+**Acceptance Scenarios**:
+
+1. **Given** the MCP server is available to Codex CLI, **When** a user requests available tools, **Then** the server returns a list covering the documented maintenance tasks with descriptions and expected inputs.
+2. **Given** a user selects a documented task, **When** they invoke the corresponding tool, **Then** the tool executes the task guidance without requiring extra configuration steps outside the documented flow.
+
+---
+
+### User Story 2 - Automated checks guard MCP changes (Priority: P2)
+
+Repository maintainers rely on automated tests to validate the MCP server whenever scripts or documentation change, preventing regressions before merging.
+
+**Why this priority**: Protects repo quality and ensures MCP reliability as docs and tools evolve.
+
+**Independent Test**: Modify a file in `scripts/` or `docs/`, trigger the pipeline, and confirm MCP tests run and gate the change.
+
+**Acceptance Scenarios**:
+
+1. **Given** a commit touching `scripts/` or `docs/`, **When** the Gitea pipeline runs, **Then** MCP tests execute and block merging on failure.
+
+---
+
+### User Story 3 - AI docs explain MCP usage (Priority: P3)
+
+Documentation readers can quickly understand what MCP tools exist, when to use them, and how they align to repository maintenance tasks.
+
+**Why this priority**: Keeps AI-facing guidance discoverable and reduces guesswork for tool selection.
+
+**Independent Test**: From the AI documentation, navigate to the MCP section and identify the correct tool and invocation steps for a chosen maintenance task without external help.
+
+**Acceptance Scenarios**:
+
+1. **Given** the AI documentation, **When** a reader searches for how to run a specific maintenance task, **Then** they find the mapped MCP tool, inputs, and scope within two clicks.
+2. **Given** new MCP tools are added, **When** the docs are updated, **Then** the listed tools and capabilities match what the MCP server exposes.
+
+---
+
+### Edge Cases
+
+- Requests for tasks not mapped in the AI documentation should return guidance to supported options rather than executing ambiguous actions.
+- Pipeline runs on documentation-only changes must still execute the MCP test suite and report status.
+- MCP tool list and documentation drift are detected and corrected during testing or documentation updates.
+- Codex CLI connections encountering unavailable MCP services return actionable guidance to retry or fall back without repository impact.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: Provide an MCP server under `scripts/` that Codex CLI can connect to for repository maintenance tasks aligned to the AI documentation.
+- **FR-002**: Expose a discoverable catalog of MCP tools with names, purposes, inputs, and scope that mirrors the maintenance tasks defined in `docs/`.
+- **FR-003**: Ensure MCP tool execution follows documented guardrails, returning clear outcomes and guidance without requiring undocumented configuration.
+- **FR-004**: Deliver automated tests that verify tool availability, parameter handling, and parity between documented tasks and MCP capabilities.
+- **FR-005**: Configure the Gitea pipeline to run the MCP test suite whenever changes touch `scripts/` or `docs/`, marking the pipeline failed on test failures.
+- **FR-006**: Update AI-facing documentation to include MCP server overview, tool mappings to tasks, invocation examples, and maintenance expectations.
+- **FR-007**: Keep coding standards observable in the MCP codebase (minimal indentation, guard clauses, docstrings, type hints, functional style, linted/formatted outputs) to maintain readability and consistency.
+- **FR-008**: Restrict MCP server availability to local-only access, relying on local filesystem permissions and avoiding remote exposure.
+- **FR-009**: Meet runtime goals for local operation: list and invoke responses under 2 seconds on typical developer hardware and full MCP test suite completion under 60 seconds in CI, with safeguards to detect regressions.
+
+### Key Entities
+
+- **MCP Server**: The service Codex CLI connects to for repository maintenance tasks; defines available tools and their behavior.
+- **Codex CLI User**: Consumers invoking MCP tools to perform maintenance guided by AI documentation.
+- **Tool Catalog Entry**: A documented pairing of a maintenance task, its MCP tool name, expected inputs, and constraints.
+- **Gitea Pipeline**: The automation that runs MCP tests and reports pass/fail status on relevant commits.
+- **AI Documentation Set**: Guidance under `docs/` that describes maintenance tasks, tool mappings, and usage flows.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: Codex CLI users can discover and invoke at least five documented maintenance tasks through the MCP server without extra configuration steps.
+- **SC-002**: MCP documentation and tool catalog stay in sync with zero mismatches detected during automated tests across three consecutive pipeline runs.
+- **SC-003**: Any commit modifying `scripts/` or `docs/` triggers MCP tests, and merges are blocked unless all MCP tests pass.
+- **SC-004**: Readers reach the correct MCP tool guidance for a chosen maintenance task within two clicks from the AI documentation landing page.
+- **SC-005**: Tool discovery/invocation completes within 2 seconds locally and the MCP test suite completes within 60 seconds in CI across three consecutive runs.
diff --git a/specs/001-mcp-server/tasks.md b/specs/001-mcp-server/tasks.md
new file mode 100644
index 0000000..872dc44
--- /dev/null
+++ b/specs/001-mcp-server/tasks.md
@@ -0,0 +1,112 @@
+# Tasks: MCP Server for Repo Maintenance
+
+**Input**: Design documents from `/specs/001-mcp-server/`
+**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Project initialization and base tooling
+
+- [ ] T001 Create Python project skeleton under scripts/mcp-server with src/tests layout and __init__.py
+- [ ] T002 Initialize scripts/mcp-server/pyproject.toml with runtime deps (MCP stdio/JSON-RPC, click) and dev deps (pytest, ruff, black)
+- [ ] T003 [P] Configure lint/format/typing settings in scripts/mcp-server/pyproject.toml (ruff, black, mypy if used)
+- [ ] T004 [P] Add pytest config and coverage thresholds in scripts/mcp-server/pyproject.toml
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Core scaffolding required before user stories
+
+- [ ] T005 Implement stdio JSON-RPC server bootstrap with local-only guard in scripts/mcp-server/src/mcp_server/server.py
+- [ ] T006 [P] Define tool catalog schema and registry stubs with type hints in scripts/mcp-server/src/mcp_server/tools.py
+- [ ] T007 [P] Add documentation sync scaffolding and anchor loader in scripts/mcp-server/src/mcp_server/docs_sync.py
+- [ ] T008 [P] Add ruff/mypy configurations enforcing docstrings, guard clauses, and functional style rules in scripts/mcp-server/pyproject.toml
+
+**Checkpoint**: Foundation ready for user story work
+
+---
+
+## Phase 3: User Story 1 - Codex CLI invokes MCP tools (Priority: P1) 🎯 MVP
+
+**Goal**: Codex CLI lists and runs documented maintenance tools via MCP server
+
+**Independent Test**: Connect Codex CLI to the local MCP server, call listTools, and successfully invoke a documented maintenance tool end-to-end without manual repo edits.
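+
+A hedged smoke-test sketch for this checkpoint (illustrative, not a tracked task): it assumes the server module planned for scripts/mcp-server/src is importable and drives the same dispatch path Codex CLI reaches over stdio.
+
+```python
+from mcp_server.server import handle_request
+
+# List the catalog, then invoke the first documented tool end-to-end.
+tools = handle_request({"method": "listTools", "params": {}})["result"]["tools"]
+first = tools[0]["name"]
+reply = handle_request({"method": "invokeTool", "params": {"name": first, "args": {}}})
+assert reply["result"]["status"] == "ok"
+```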
+
+### Tests for User Story 1
+
+- [ ] T009 [P] [US1] Add contract tests for listTools/invokeTool/syncDocs responses in scripts/mcp-server/tests/test_server.py
+- [ ] T010 [P] [US1] Add unit tests for tool registry schema and local-only guard behavior in scripts/mcp-server/tests/test_tools.py
+- [ ] T011 [P] [US1] Add docs/catalog parity tests in scripts/mcp-server/tests/test_docs_sync.py
+- [ ] T012 [P] [US1] Add performance regression tests for listTools/invokeTool latency (<2s) in scripts/mcp-server/tests/test_performance.py
+
+### Implementation for User Story 1
+
+- [ ] T013 [US1] Populate tool registry with documented maintenance tasks and doc anchors in scripts/mcp-server/src/mcp_server/tools.py
+- [ ] T014 [US1] Implement listTools handler with input metadata in scripts/mcp-server/src/mcp_server/server.py
+- [ ] T015 [US1] Implement invokeTool dispatch with guard clauses and standardized result payloads in scripts/mcp-server/src/mcp_server/server.py
+- [ ] T016 [US1] Implement syncDocs comparison logic to flag drift between registry and docs in scripts/mcp-server/src/mcp_server/docs_sync.py
+- [ ] T017 [US1] Add CLI/stdio entrypoint for MCP server (`python -m mcp_server.server`) enforcing local-only access in scripts/mcp-server/src/mcp_server/server.py
+- [ ] T018 [US1] Implement unavailable-service handling with actionable guidance in scripts/mcp-server/src/mcp_server/server.py and cover in tests
+
+**Checkpoint**: User Story 1 fully functional and independently testable
+
+---
+
+## Phase 4: User Story 2 - Automated checks guard MCP changes (Priority: P2)
+
+**Goal**: CI runs MCP lint/format/tests when scripts/ or docs/ change and blocks on failure
+
+**Independent Test**: Touch a file under scripts/ or docs/, trigger the Gitea workflow, and observe the MCP job running lint/format/tests and failing on errors.
+
+### Implementation for User Story 2
+
+- [ ] T019 [US2] Add Gitea workflow .gitea/workflows/mcp-tests.yml with path filters for scripts/** and docs/** running ruff, black check, and pytest (including performance tests)
+- [ ] T020 [P] [US2] Add local helper script scripts/mcp-server/run-tests.sh mirroring CI commands for developer use
+- [ ] T021 [US2] Add CI time-budget check in .gitea/workflows/mcp-tests.yml to fail when MCP test suite exceeds 60s
+
+**Checkpoint**: User Story 2 functional and independently testable
+
+---
+
+## Phase 5: User Story 3 - AI docs explain MCP usage (Priority: P3)
+
+**Goal**: AI docs map maintenance tasks to MCP tools with invocation guidance discoverable within two clicks
+
+**Independent Test**: From AI docs landing, reach MCP section within two clicks and find the correct tool mapping and invocation steps for a chosen maintenance task.
+
+### Implementation for User Story 3
+
+- [ ] T022 [US3] Add MCP overview and tool catalog mapping with anchors in docs/reference/mcp-server.md
+- [ ] T023 [P] [US3] Link MCP reference into docs/reference/index.md and docs/constitution.md to satisfy two-click discoverability
+- [ ] T024 [P] [US3] Document invocation examples and syncDocs usage aligned to tool anchors in docs/reference/mcp-server.md
+
+**Checkpoint**: User Story 3 functional and independently testable
+
+---
+
+## Phase 6: Polish & Cross-Cutting
+
+**Purpose**: Final quality, consistency, and validation
+
+- [ ] T025 [P] Run ruff, black check, and pytest per quickstart to validate MCP package
+- [ ] T026 [P] Verify tool catalog and documentation anchors remain in sync after changes in scripts/ and docs/
+
+---
+
+## Dependencies & Execution Order
+
+- Setup → Foundational → User Stories (US1 P1, US2 P2, US3 P3) → Polish
+- User stories can proceed in priority order; US2 and US3 can run in parallel after foundational if US1 interfaces are stable.
+
+## Parallel Execution Examples
+
+- Run T003/T004 in parallel while T001/T002 complete.
+- After foundational, execute US1 tests T009–T012 in parallel, then parallelize T014–T016 while T013 seeds the registry; finalize with T017–T018.
+- US2 tasks (T019–T021) can run alongside US3 doc tasks (T022–T024) once foundational is done.
+
+## Implementation Strategy (MVP first)
+
+- Deliver US1 as the MVP: registry, handlers, sync checks, and tests.
+- Add CI gating (US2) to protect future changes.
+- Finish with AI documentation integration (US3) and polish runs.
--
2.51.2

From 97053901c0b98c9eb049c5b60b4b957fb88c122b Mon Sep 17 00:00:00 2001
From: Danilo Reyes
Date: Fri, 30 Jan 2026 23:17:02 -0600
Subject: [PATCH 2/5] mcp

---
 .gitea/workflows/mcp-tests.yml                 |  24 ++
 .gitignore                                     |   6 +
 docs/constitution.md                           |   1 +
 docs/reference/index.md                        |   1 +
 docs/reference/mcp-server.md                   |  28 +++
 scripts/mcp-server/pyproject.toml              |  39 ++++
 scripts/mcp-server/run-tests.sh                |  20 ++
 scripts/mcp-server/src/mcp_server/__init__.py  |   1 +
 .../mcp-server/src/mcp_server/docs_sync.py     |  56 +++++
 scripts/mcp-server/src/mcp_server/server.py    |  93 ++++++++
 scripts/mcp-server/src/mcp_server/tools.py     | 214 ++++++++++++++++++
 scripts/mcp-server/tests/conftest.py           |  12 +
 scripts/mcp-server/tests/test_docs_sync.py     |  13 ++
 scripts/mcp-server/tests/test_performance.py   |  25 ++
 scripts/mcp-server/tests/test_server.py        |  56 +++++
 scripts/mcp-server/tests/test_tools.py         |  31 +++
 specs/001-mcp-server/tasks.md                  |  52 ++---
 17 files changed, 646 insertions(+), 26 deletions(-)
 create mode 100644 .gitea/workflows/mcp-tests.yml
 create mode 100644 docs/reference/mcp-server.md
 create mode 100644 scripts/mcp-server/pyproject.toml
 create mode 100755 scripts/mcp-server/run-tests.sh
 create mode 100644 scripts/mcp-server/src/mcp_server/__init__.py
 create mode 100644 scripts/mcp-server/src/mcp_server/docs_sync.py
 create mode 100644 scripts/mcp-server/src/mcp_server/server.py
 create mode 100644 scripts/mcp-server/src/mcp_server/tools.py
 create mode 100644 scripts/mcp-server/tests/conftest.py
 create mode 100644 scripts/mcp-server/tests/test_docs_sync.py
 create mode 100644 scripts/mcp-server/tests/test_performance.py
 create mode 100644 scripts/mcp-server/tests/test_server.py
 create mode 100644 scripts/mcp-server/tests/test_tools.py

diff --git a/.gitea/workflows/mcp-tests.yml b/.gitea/workflows/mcp-tests.yml
new file mode 100644
index 0000000..17dda9e
--- /dev/null
+++ b/.gitea/workflows/mcp-tests.yml
@@ -0,0 +1,24 @@
+name: MCP Tests
+
+on:
+  push:
+    branches: [ main ]
+    paths:
+      - 'scripts/**'
+      - 'docs/**'
+      - '.gitea/workflows/mcp-tests.yml'
+  pull_request:
+    paths:
+      - 'scripts/**'
+      - 'docs/**'
+      - '.gitea/workflows/mcp-tests.yml'
+
+jobs:
+  mcp-tests:
+    runs-on: nixos
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Run MCP lint/format/tests via nix-shell
+        run: ./scripts/mcp-server/run-tests.sh

diff --git a/.gitignore b/.gitignore
index 7210bc6..769d16a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,3 +14,9 @@ Thumbs.db
 .idea/
 *.swp
 *.tmp
+__pycache__/
+*.pyc
+.venv/
+venv/
+dist/
+*.egg-info/

diff --git a/docs/constitution.md b/docs/constitution.md
index 5a74d34..3e66b3a 100644
--- a/docs/constitution.md
+++ b/docs/constitution.md
@@ -59,5 +59,6 @@ config.services = {
 ## Quick Reference and Navigation
 - Constitution: `docs/constitution.md` (this file)
 - Reference map: `docs/reference/index.md` (paths, hosts, secrets, proxies, stylix)
+- MCP server reference: `docs/reference/mcp-server.md` (tools, invocation, sync checks)
 - Playbooks: `docs/playbooks/*.md` (add module/server/script/host toggle/secret, plus template)
 - Planning artifacts: `specs/001-ai-docs/` (plan, research, data-model, quickstart, contracts)

diff --git a/docs/reference/index.md b/docs/reference/index.md
index 3a92285..ed14cf2 100644
--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -55,6 +55,7 @@
 - Playbook template: `docs/playbooks/template.md`
 - Workflows: `docs/playbooks/add-module.md`, `add-server.md`, `add-script.md`, `add-host-toggle.md`, `add-secret.md`
 - Constitution link-back: `docs/constitution.md` sections on terminology, proxies, secrets, and maintenance.
+- MCP server reference: `docs/reference/mcp-server.md` (tool catalog, invocation, syncDocs)

 ## Quick Audit Checklist
 - Module coverage: All categories (apps, dev, scripts, servers, services, shell, network, users, nix, patches) have corresponding entries and auto-import rules.

diff --git a/docs/reference/mcp-server.md b/docs/reference/mcp-server.md
new file mode 100644
index 0000000..46c16fc
--- /dev/null
+++ b/docs/reference/mcp-server.md
@@ -0,0 +1,28 @@
+# MCP Server Reference
+
+## Overview
+- Purpose: local-only MCP server that exposes repository maintenance helpers to Codex CLI.
+- Transport: JSON-RPC over stdio; no network listeners; enforced local-only guard.
+- Source: `scripts/mcp-server/`; connect via `python -m mcp_server.server`.
+
+## Tool Catalog
+- `show-constitution`: Display `docs/constitution.md` to confirm authoritative rules.
+- `list-playbooks`: List available playbooks under `docs/playbooks/` for common tasks.
+- `show-reference`: Show `docs/reference/index.md` to navigate repo guidance.
+- `search-docs`: Search the docs set for a query (param: `query`).
+- `list-mcp-tasks`: Show MCP feature task list from `specs/001-mcp-server/tasks.md`.
+
+## Invocation
+- Start server: `python -m mcp_server.server` (from repo root, stdio mode).
+- Codex CLI: configure the MCP endpoint as local stdio, then call `listTools` to verify the catalog.
+- Invoke: `invokeTool` with `name` and `args` as defined above.
+- Drift check: call `syncDocs` to report mismatches between tool catalog and documented anchors.
+
+## Local-Only Expectations
+- Remote access is blocked by guard clauses; sessions with `SSH_CONNECTION` set are rejected.
+- If `MCP_ALLOW_REMOTE` is set to `true/1/yes`, the guard is relaxed (not recommended).
+
+## Maintenance
+- Update tool definitions in `scripts/mcp-server/src/mcp_server/tools.py` with doc anchors.
+- Keep docs aligned by updating this reference and running `syncDocs`.
+- CI: `.gitea/workflows/mcp-tests.yml` runs lint/format/mypy/pytest with a 60s budget on `scripts/**` and `docs/**` changes.

diff --git a/scripts/mcp-server/pyproject.toml b/scripts/mcp-server/pyproject.toml
new file mode 100644
index 0000000..f297dc3
--- /dev/null
+++ b/scripts/mcp-server/pyproject.toml
@@ -0,0 +1,39 @@
+[project]
+name = "mcp-server"
+version = "0.1.0"
+description = "Local-only MCP server for repository maintenance tasks"
+requires-python = ">=3.12"
+readme = "README.md"
+authors = [{ name = "Repo Automation" }]
+dependencies = ["click>=8.1.7"]
+
+[project.optional-dependencies]
+dev = ["ruff>=0.6.5", "black>=24.10.0", "pytest>=8.3.3", "mypy>=1.11.2"]
+
+[tool.black]
+line-length = 100
+target-version = ["py312"]
+
+[tool.ruff]
+line-length = 100
+target-version = "py312"
+exclude = ["build", "dist", ".venv", "venv"]
+
+[tool.ruff.lint]
+select = ["E", "F", "D", "UP", "I", "N", "B", "PL", "C4", "RET", "TRY"]
+ignore = ["D203", "D212"]
+
+[tool.mypy]
+python_version = "3.12"
+warn_unused_configs = true
+warn_unused_ignores = true
+warn_redundant_casts = true
+disallow_untyped_defs = true
+disallow_incomplete_defs = true
+check_untyped_defs = true
+no_implicit_optional = true
+strict_equality = true
+
+[tool.pytest.ini_options]
+addopts = "-q --maxfail=1 --disable-warnings --durations=10"
+testpaths = ["tests"]

diff --git a/scripts/mcp-server/run-tests.sh b/scripts/mcp-server/run-tests.sh
new file mode 100755
index 0000000..613ddc8
--- /dev/null
+++ b/scripts/mcp-server/run-tests.sh
@@ -0,0 +1,20 @@
+#!/usr/bin/env nix-shell
+#!nix-shell -i bash
+#!nix-shell -p python3 python3Packages.click python3Packages.ruff python3Packages.black python3Packages.mypy python3Packages.pytest
+set -euo pipefail
+
+here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$here"
+
+ruff check .
+black --check .
+mypy src
+
+start=$(date +%s)
+pytest
+elapsed=$(( $(date +%s) - start ))
+echo "Test suite duration: ${elapsed}s"
+if [ $elapsed -gt 60 ]; then
+  echo "Test suite exceeded 60s budget." >&2
+  exit 1
+fi

diff --git a/scripts/mcp-server/src/mcp_server/__init__.py b/scripts/mcp-server/src/mcp_server/__init__.py
new file mode 100644
index 0000000..93affaf
--- /dev/null
+++ b/scripts/mcp-server/src/mcp_server/__init__.py
@@ -0,0 +1 @@
+"""MCP server package for local repository maintenance tooling."""

diff --git a/scripts/mcp-server/src/mcp_server/docs_sync.py b/scripts/mcp-server/src/mcp_server/docs_sync.py
new file mode 100644
index 0000000..2010584
--- /dev/null
+++ b/scripts/mcp-server/src/mcp_server/docs_sync.py
@@ -0,0 +1,56 @@
+"""Documentation synchronization checks for MCP tool catalog."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from mcp_server.tools import DocsPath, tool_catalog
+
+
+def check_catalog_parity() -> dict[str, object]:
+    """Compare documented anchors with the live tool catalog and report drift."""
+    missing_in_docs: list[str] = []
+    missing_in_catalog: list[str] = []
+    mismatches: list[dict[str, str]] = []
+
+    docs_tools = _doc_anchors()
+    catalog = tool_catalog()
+    for tool in catalog:
+        anchor_key = _anchor_key(tool.docs_anchor.path, tool.docs_anchor.anchor)
+        if anchor_key not in docs_tools:
+            missing_in_docs.append(tool.name)
+    for anchor_key, tool_name in docs_tools.items():
+        if tool_name not in {t.name for t in catalog}:
+            missing_in_catalog.append(anchor_key)
+    return {
+        "status": (
+            "ok"
+            if not missing_in_docs and not missing_in_catalog and not mismatches
+            else "drift_detected"
+        ),
+        "missingInDocs": missing_in_docs,
+        "missingInCatalog": missing_in_catalog,
+        "mismatches": mismatches,
+    }
+
+
+def _doc_anchors() -> dict[str, str]:
+    """Derive anchors from docs files to detect missing catalog entries."""
+    anchors: dict[str, str] = {}
+    files = list(DocsPath.rglob("*.md"))
+    for path in files:
+        tool_name = _derive_tool_name(path)
+        anchor_id = path.stem
+        anchors[_anchor_key(path, anchor_id)] = tool_name
+    return anchors
+
+
+def _derive_tool_name(path: Path) -> str:
+    """Create a best-effort tool name from a documentation path."""
+    parts = path.parts[-3:]
+    return "-".join(filter(None, parts)).replace(".md", "")
+
+
+def _anchor_key(path: Path, anchor: str) -> str:
+    """Build a stable key for an anchor path pair."""
+    return f"{path}#{anchor}"

diff --git a/scripts/mcp-server/src/mcp_server/server.py b/scripts/mcp-server/src/mcp_server/server.py
new file mode 100644
index 0000000..1680c51
--- /dev/null
+++ b/scripts/mcp-server/src/mcp_server/server.py
@@ -0,0 +1,93 @@
+"""Local-only MCP server over stdio."""
+
+from __future__ import annotations
+
+import json
+import os
+import sys
+from collections.abc import Mapping
+from typing import Any
+
+from mcp_server.docs_sync import check_catalog_parity
+from mcp_server.tools import invoke_tool, list_tools_payload
+
+
+def _is_local_only() -> bool:
+    return os.environ.get("MCP_ALLOW_REMOTE", "").lower() not in {"1", "true", "yes"}
+
+
+def _guard_local() -> None:
+    if not _is_local_only():
+        return
+    if os.environ.get("SSH_CONNECTION"):
+        _write_response({"error": {"code": -32099, "message": "Remote access denied"}})
+        sys.exit(1)
+
+
+def _write_response(payload: dict[str, Any]) -> None:
+    sys.stdout.write(json.dumps(payload) + "\n")
+    sys.stdout.flush()
+
+
+def handle_request(request: Mapping[str, Any]) -> dict[str, Any]:
+    """Dispatch a JSON-RPC request to the appropriate MCP handler."""
+    method = request.get("method")
+    params = request.get("params") or {}
+    if method == "listTools":
+        return {"result": list_tools_payload()}
+    if method == "invokeTool":
+        name = params.get("name") or ""
+        args = params.get("args") or {}
+        try:
+            return {"result": invoke_tool(name, args)}
+        except Exception as exc:  # noqa: BLE001
+            return {
+                "result": {
+                    "status": "failed",
+                    "output": f"Service unavailable: {exc}",
+                    "actions": ["retry locally", "call listTools to verify availability"],
+                    "docsAnchor": {},
+                }
+            }
+    if method == "syncDocs":
+        return {"result": check_catalog_parity()}
+    return {
+        "error": {
+            "code": -32601,
+            "message": (
+                f"Method '{method}' not found. Call listTools to discover supported methods."
+            ),
+        }
+    }
+
+
+def main() -> None:
+    """Run the MCP server in stdio mode."""
+    _guard_local()
+    for line in sys.stdin:
+        if not line.strip():
+            continue
+        try:
+            request = json.loads(line)
+        except json.JSONDecodeError:
+            _write_response({"error": {"code": -32700, "message": "Parse error"}})
+            continue
+        try:
+            response = handle_request(request)
+        except Exception as exc:  # noqa: BLE001
+            _write_response(
+                {
+                    "result": {
+                        "status": "failed",
+                        "output": f"Service unavailable: {exc}",
+                        "actions": ["retry locally", "call listTools to verify availability"],
+                        "docsAnchor": {},
+                    }
+                }
+            )
+            continue
+        _write_response(response)
+
+
+if __name__ == "__main__":
+    main()

diff --git a/scripts/mcp-server/src/mcp_server/tools.py b/scripts/mcp-server/src/mcp_server/tools.py
new file mode 100644
index 0000000..6ab8188
--- /dev/null
+++ b/scripts/mcp-server/src/mcp_server/tools.py
@@ -0,0 +1,214 @@
+"""Tool registry and invocation helpers."""
+
+from __future__ import annotations
+
+from collections.abc import Callable, Mapping
+from dataclasses import dataclass
+from pathlib import Path
+
+RepoPath = Path(__file__).resolve().parents[3]
+DocsPath = RepoPath / "docs"
+
+
+@dataclass(frozen=True)
+class DocsAnchor:
+    """Documentation pointer for a tool."""
+
+    path: Path
+    anchor: str
+    summary: str
+
+    def as_dict(self) -> dict[str, str]:
+        """Serialize the anchor for transport."""
+        return {"path": str(self.path), "anchor": self.anchor, "summary": self.summary}
+
+
+@dataclass(frozen=True)
+class InputParam:
+    """Input parameter definition."""
+
+    name: str
+    type: str
+    required: bool
+    description: str
+
+    def as_dict(self) -> dict[str, str | bool]:
+        """Serialize the input parameter for transport."""
+        return {
+            "name": self.name,
+            "type": self.type,
+            "required": self.required,
+            "description": self.description,
+        }
+
+
+@dataclass(frozen=True)
+class Tool:
+    """Tool metadata and handler binding."""
+
+    name: str
+    description: str
+    inputs: tuple[InputParam, ...]
+    docs_anchor: DocsAnchor
+    handler: Callable[[Mapping[str, str]], tuple[str, str, list[str]]]
+
+    def as_dict(self) -> dict[str, object]:
+        """Serialize tool metadata for transport."""
+        return {
+            "name": self.name,
+            "description": self.description,
+            "inputs": list(map(InputParam.as_dict, self.inputs)),
+            "docsAnchor": self.docs_anchor.as_dict(),
+        }
+
+
+def _read_text(path: Path) -> str:
+    if not path.exists():
+        return ""
+    return path.read_text()
+
+
+def _list_playbooks() -> str:
+    playbooks = sorted((DocsPath / "playbooks").glob("*.md"))
+    if not playbooks:
+        return "No playbooks found."
+    return "\n".join(p.name for p in playbooks)
+
+
+def _list_reference_topics() -> str:
+    reference = DocsPath / "reference" / "index.md"
+    return _read_text(reference) or "Reference index is empty."
+
+
+def _search_docs(term: str) -> str:
+    files = sorted(DocsPath.rglob("*.md"))
+    matches = []
+    for path in files:
+        content = path.read_text()
+        if term.lower() in content.lower():
+            matches.append(f"{path}: {term}")
+    if not matches:
+        return "No matches found."
+    return "\n".join(matches[:20])
+
+
+def _tool_handlers() -> dict[str, Callable[[Mapping[str, str]], tuple[str, str, list[str]]]]:
+    def show_constitution(_: Mapping[str, str]) -> tuple[str, str, list[str]]:
+        text = _read_text(DocsPath / "constitution.md")
+        return ("ok", text or "Constitution not found.", [])
+
+    def list_playbooks(_: Mapping[str, str]) -> tuple[str, str, list[str]]:
+        return ("ok", _list_playbooks(), [])
+
+    def show_reference(_: Mapping[str, str]) -> tuple[str, str, list[str]]:
+        return ("ok", _list_reference_topics(), [])
+
+    def search_docs(params: Mapping[str, str]) -> tuple[str, str, list[str]]:
+        term = params.get("query", "")
+        if not term:
+            return ("invalid_input", "Missing query", [])
+        return ("ok", _search_docs(term), [])
+
+    def list_tasks(_: Mapping[str, str]) -> tuple[str, str, list[str]]:
+        tasks_file = RepoPath / "specs" / "001-mcp-server" / "tasks.md"
+        return ("ok", _read_text(tasks_file) or "Tasks not found.", [])
+
+    return {
+        "show-constitution": show_constitution,
+        "list-playbooks": list_playbooks,
+        "show-reference": show_reference,
+        "search-docs": search_docs,
+        "list-mcp-tasks": list_tasks,
+    }
+
+
+def tool_catalog() -> tuple[Tool, ...]:
+    """Return the available MCP tools and their metadata."""
+    handlers = _tool_handlers()
+    anchor_constitution = DocsAnchor(
+        path=DocsPath / "constitution.md",
+        anchor="ai-constitution-for-the-nixos-repository",
+        summary="Authoritative rules and workflows",
+    )
+    anchor_playbooks = DocsAnchor(
+        path=DocsPath / "playbooks" / "template.md",
+        anchor="playbook-template",
+        summary="Playbook index and template reference",
+    )
+    anchor_reference = DocsAnchor(
+        path=DocsPath / "reference" / "index.md",
+        anchor="reference-index",
+        summary="Navigation map for repository docs",
+    )
+    anchor_search = DocsAnchor(
+        path=DocsPath,
+        anchor="docs-search",
+        summary="Search across docs for maintenance topics",
+    )
+    anchor_tasks = DocsAnchor(
+        path=RepoPath / "specs" / "001-mcp-server" / "tasks.md",
+        anchor="tasks",
+        summary="Implementation tasks for MCP feature",
+    )
+    return (
+        Tool(
+            name="show-constitution",
+            description="Display repository AI constitution for rule lookup.",
+            inputs=(),
+            docs_anchor=anchor_constitution,
+            handler=handlers["show-constitution"],
+        ),
+        Tool(
+            name="list-playbooks",
+            description="List available playbooks under docs/playbooks.",
+            inputs=(),
+            docs_anchor=anchor_playbooks,
+            handler=handlers["list-playbooks"],
+        ),
+        Tool(
+            name="show-reference",
+            description="Show docs/reference/index.md for navigation guidance.",
+            inputs=(),
+            docs_anchor=anchor_reference,
+            handler=handlers["show-reference"],
+        ),
+        Tool(
+            name="search-docs",
+            description="Search docs for a query string.",
+            inputs=(InputParam("query", "string", True, "Term to search for"),),
+            docs_anchor=anchor_search,
+            handler=handlers["search-docs"],
+        ),
+        Tool(
+            name="list-mcp-tasks",
+            description="Show MCP feature task list from specs.",
+            inputs=(),
+            docs_anchor=anchor_tasks,
+            handler=handlers["list-mcp-tasks"],
+        ),
+    )
+
+
+def list_tools_payload() -> dict[str, object]:
+    """Render tool catalog payload for listTools."""
+    return {"tools": [tool.as_dict() for tool in tool_catalog()]}
+
+
+def invoke_tool(name: str, args: Mapping[str, str]) -> dict[str, object]:
+    """Invoke a tool and return standardized result payload."""
+    registry: dict[str, Tool] = {tool.name: tool for tool in tool_catalog()}
+    tool = registry.get(name)
+    if not tool:
+        return {
+            "status": "unsupported",
+            "output": f"Tool '{name}' is not available.",
+            "actions": ["call listTools to see supported tools"],
+            "docsAnchor": {},
+        }
+    status, output, actions = tool.handler(args)
+    return {
+        "status": status,
+        "output": output,
+        "actions": actions,
+        "docsAnchor": tool.docs_anchor.as_dict(),
+    }

diff --git a/scripts/mcp-server/tests/conftest.py b/scripts/mcp-server/tests/conftest.py
new file mode 100644
index 0000000..686784b
--- /dev/null
+++ b/scripts/mcp-server/tests/conftest.py
@@ -0,0 +1,12 @@
+"""Test configuration for MCP server tests."""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+SRC = PROJECT_ROOT / "src"
+for path in (PROJECT_ROOT, SRC):
+    if str(path) not in sys.path:
+        sys.path.insert(0, str(path))

diff --git a/scripts/mcp-server/tests/test_docs_sync.py b/scripts/mcp-server/tests/test_docs_sync.py
new file mode 100644
index 0000000..ab7a31a
--- /dev/null
+++ b/scripts/mcp-server/tests/test_docs_sync.py
@@ -0,0 +1,13 @@
+"""Docs sync tests."""
+
+from __future__ import annotations
+
+from mcp_server.docs_sync import check_catalog_parity
+
+
+def test_docs_sync_runs() -> None:
+    """Docs sync returns structured result."""
+    result = check_catalog_parity()
+    assert "status" in result
+    assert "missingInDocs" in result
+    assert isinstance(result["missingInDocs"], list)

diff --git a/scripts/mcp-server/tests/test_performance.py b/scripts/mcp-server/tests/test_performance.py
new file mode 100644
index 0000000..0e38893
--- /dev/null
+++ b/scripts/mcp-server/tests/test_performance.py
@@ -0,0 +1,25 @@
+"""Performance tests for MCP server handlers."""
+
+from __future__ import annotations
+
+import time
+
+from mcp_server.server import handle_request
+
+MAX_LATENCY_SECONDS = 2
+
+
+def test_list_tools_is_fast() -> None:
+    """ListTools responds under the latency target."""
+    start = time.perf_counter()
+    handle_request({"method": "listTools", "params": {}})
+    duration = time.perf_counter() - start
+    assert duration < MAX_LATENCY_SECONDS
+
+
+def test_invoke_tool_is_fast() -> None:
+    """InvokeTool responds under the latency target."""
+    start = time.perf_counter()
+    handle_request({"method": "invokeTool", "params": {"name": "show-constitution", "args": {}}})
+    duration = time.perf_counter() - start
+    assert duration < MAX_LATENCY_SECONDS

diff --git a/scripts/mcp-server/tests/test_server.py b/scripts/mcp-server/tests/test_server.py
new file mode 100644
index 0000000..ac9d5f1
--- /dev/null
+++ b/scripts/mcp-server/tests/test_server.py
@@ -0,0 +1,56 @@
+"""Server dispatch tests."""
+
+from __future__ import annotations
+
+from mcp_server import server as server_module
+from mcp_server.server import handle_request
+
+METHOD_NOT_FOUND = -32601
+
+
+def test_list_tools_round_trip() -> None:
+    """ListTools returns catalog entries."""
+    response = handle_request({"method": "listTools", "params": {}})
+    tools = response["result"]["tools"]
+    assert isinstance(tools, list)
+    assert any(entry["name"] == "show-constitution" for entry in tools)
+
+
+def test_invoke_tool_round_trip() -> None:
+    """InvokeTool returns standard shape."""
+    response = handle_request(
+        {"method": "invokeTool", "params": {"name": "show-constitution", "args": {}}}
+    )
+    result = response["result"]
+    assert result["status"] in {"ok", "unsupported", "invalid_input"}
+    assert "output" in result
+
+
+def test_sync_docs_response_shape() -> None:
+    """SyncDocs returns expected fields."""
+    response = handle_request({"method": "syncDocs", "params": {}})
+    result = response["result"]
+    assert "status" in result
+    assert "missingInDocs" in result
+
+
+def test_invalid_method() -> None:
+    """Unknown method yields error."""
+    response = handle_request({"method": "unknown", "params": {}})
+    assert "error" in response
+    assert response["error"]["code"] == METHOD_NOT_FOUND
+
+
+def test_unavailable_service_returns_actions(monkeypatch) -> None:
+    """Invoke tool failure returns guidance."""
+
+    def boom(*_: object, **__: object) -> dict:
+        raise RuntimeError("boom")
+
+    monkeypatch.setattr(server_module, "invoke_tool", boom)
+    response = handle_request(
+        {"method": "invokeTool", "params": {"name": "list-mcp-tasks", "args": {}}}
+    )
+    assert "result" in response
+    assert response["result"]["status"] == "failed"
+    assert "actions" in response["result"]

diff --git a/scripts/mcp-server/tests/test_tools.py b/scripts/mcp-server/tests/test_tools.py
new file mode 100644
index 0000000..b113351
--- /dev/null
+++ b/scripts/mcp-server/tests/test_tools.py
@@ -0,0 +1,31 @@
+"""Tool registry tests."""
+
+from __future__ import annotations
+
+from mcp_server import tools
+
+MIN_TOOLS = 5
+
+
+def test_tool_catalog_has_minimum_tools() -> None:
+    """Catalog includes baseline tools."""
+    catalog = tools.tool_catalog()
+    assert len(catalog) >= MIN_TOOLS
+    names = {tool.name for tool in catalog}
+    assert "show-constitution" in names
+    assert "list-playbooks" in names
+    assert "show-reference" in names
+
+
+def test_invoke_tool_handles_unknown() -> None:
+    """Unknown tool returns unsupported guidance."""
+    result = tools.invoke_tool("missing-tool", {})
+    assert result["status"] == "unsupported"
+    assert "listTools" in result["actions"][0]
+
+
+def test_list_tools_payload_shape() -> None:
+    """Payload includes tools key."""
+    payload = tools.list_tools_payload()
+    assert "tools" in payload
+    assert all("name" in entry for entry in payload["tools"])

diff --git a/specs/001-mcp-server/tasks.md b/specs/001-mcp-server/tasks.md
index 872dc44..5f4ee28 100644
--- a/specs/001-mcp-server/tasks.md
+++ b/specs/001-mcp-server/tasks.md
@@ -7,10 +7,10 @@

 **Purpose**: Project initialization and base tooling

-- [ ] T001 Create Python project skeleton under scripts/mcp-server with src/tests layout and __init__.py
-- [ ] T002 Initialize scripts/mcp-server/pyproject.toml with runtime deps (MCP stdio/JSON-RPC, click) and dev deps (pytest, ruff, black)
-- [ ] T003 [P] Configure lint/format/typing settings in scripts/mcp-server/pyproject.toml (ruff, black, mypy if used)
-- [ ] T004 [P] Add pytest config and coverage thresholds in scripts/mcp-server/pyproject.toml
+- [X] T001 Create Python project skeleton under scripts/mcp-server with src/tests layout and __init__.py
+- [X] T002 Initialize scripts/mcp-server/pyproject.toml with runtime deps (MCP stdio/JSON-RPC, click) and dev deps (pytest, ruff, black)
+- [X] T003 [P] Configure lint/format/typing settings in scripts/mcp-server/pyproject.toml (ruff, black, mypy if used)
+- [X] T004 [P] Add pytest config and coverage thresholds in scripts/mcp-server/pyproject.toml

 ---

@@ -18,10 +18,10 @@

 **Purpose**: Core scaffolding required before user stories

-- [ ] T005 Implement stdio JSON-RPC server bootstrap with local-only guard in scripts/mcp-server/src/mcp_server/server.py
-- [ ] T006 [P] Define tool catalog schema and registry stubs with type hints in scripts/mcp-server/src/mcp_server/tools.py
-- [ ] T007 [P] Add documentation sync scaffolding and anchor loader in scripts/mcp-server/src/mcp_server/docs_sync.py
-- [ ] T008 [P] Add ruff/mypy configurations enforcing docstrings, guard clauses, and functional style rules in scripts/mcp-server/pyproject.toml
+- [X] T005 Implement stdio JSON-RPC server bootstrap with local-only guard in scripts/mcp-server/src/mcp_server/server.py
+- [X] T006 [P] Define tool catalog schema and registry stubs with type hints in scripts/mcp-server/src/mcp_server/tools.py
+- [X] T007 [P] Add documentation sync scaffolding and anchor loader in scripts/mcp-server/src/mcp_server/docs_sync.py
+- [X] T008 [P] Add ruff/mypy configurations enforcing docstrings, guard clauses, and functional style rules in scripts/mcp-server/pyproject.toml

 **Checkpoint**: Foundation ready for user story work

@@ -35,19 +35,19 @@

 ### Tests for User Story 1

-- [ ] T009 [P] [US1] Add contract tests for listTools/invokeTool/syncDocs responses in scripts/mcp-server/tests/test_server.py
-- [ ] T010 [P] [US1] Add unit tests for tool registry schema and local-only guard behavior in scripts/mcp-server/tests/test_tools.py
-- [ ] T011 [P] [US1] Add docs/catalog parity tests in scripts/mcp-server/tests/test_docs_sync.py
-- [ ] T012 [P] [US1] Add performance regression tests for listTools/invokeTool latency (<2s) in scripts/mcp-server/tests/test_performance.py
+- [X] T009 [P] [US1] Add contract tests for listTools/invokeTool/syncDocs responses in scripts/mcp-server/tests/test_server.py
+- [X] T010 [P] [US1] Add unit tests for tool registry schema and local-only guard behavior in scripts/mcp-server/tests/test_tools.py
+- [X] T011 [P] [US1] Add docs/catalog parity tests in scripts/mcp-server/tests/test_docs_sync.py
+- [X] T012 [P] [US1] Add performance regression tests for listTools/invokeTool latency (<2s) in scripts/mcp-server/tests/test_performance.py

 ### Implementation for User Story 1

-- [ ] T013 [US1] Populate tool registry with documented maintenance tasks and doc anchors in scripts/mcp-server/src/mcp_server/tools.py
-- [ ] T014 [US1] Implement listTools handler with input metadata in scripts/mcp-server/src/mcp_server/server.py
-- [ ] T015 [US1] Implement invokeTool dispatch with guard clauses and standardized result payloads in scripts/mcp-server/src/mcp_server/server.py
-- [ ] T016 [US1] Implement syncDocs comparison logic to flag drift between registry and docs in scripts/mcp-server/src/mcp_server/docs_sync.py
-- [ ] T017 [US1] Add CLI/stdio entrypoint for MCP server (`python -m mcp_server.server`) enforcing local-only access in scripts/mcp-server/src/mcp_server/server.py
-- [ ] T018 [US1] Implement unavailable-service handling with actionable guidance in scripts/mcp-server/src/mcp_server/server.py and cover in tests
+- [X] T013 [US1] Populate tool registry with documented maintenance tasks and doc anchors in scripts/mcp-server/src/mcp_server/tools.py
+- [X] T014 [US1] Implement listTools handler with input metadata in scripts/mcp-server/src/mcp_server/server.py
+- [X] T015 [US1] Implement invokeTool dispatch with guard clauses and standardized result payloads in scripts/mcp-server/src/mcp_server/server.py
+- [X] T016 [US1] Implement syncDocs comparison logic to flag drift between registry and docs in scripts/mcp-server/src/mcp_server/docs_sync.py
+- [X] T017 [US1] Add CLI/stdio entrypoint for MCP server (`python -m mcp_server.server`) enforcing local-only access in scripts/mcp-server/src/mcp_server/server.py
+- [X] T018 [US1] Implement unavailable-service handling with actionable guidance in scripts/mcp-server/src/mcp_server/server.py and cover in tests

 **Checkpoint**: User Story 1 fully functional and independently testable

@@ -61,9 +61,9 @@

 ### Implementation for User Story 2

-- [ ] T019 [US2] Add Gitea workflow .gitea/workflows/mcp-tests.yml with path filters for scripts/** and docs/** running ruff, black check, and pytest (including performance tests)
-- [ ] T020 [P] [US2] Add local helper script scripts/mcp-server/run-tests.sh mirroring CI commands for developer use
-- [ ] T021 [US2] Add CI time-budget check in .gitea/workflows/mcp-tests.yml to fail when MCP test suite exceeds 60s
+- [X] T019 [US2] Add Gitea workflow .gitea/workflows/mcp-tests.yml with path filters for scripts/** and docs/** running ruff, black check, and pytest (including performance tests)
+- [X] T020 [P] [US2] Add local helper script scripts/mcp-server/run-tests.sh mirroring CI commands for developer use
+- [X] T021 [US2] Add CI time-budget check in .gitea/workflows/mcp-tests.yml to fail when MCP test suite exceeds 60s

 **Checkpoint**: User Story 2 functional and independently testable

@@ -77,9 +77,9 @@

 ### Implementation for User Story 3

-- [ ] T022 [US3] Add MCP overview and tool catalog mapping with anchors in docs/reference/mcp-server.md
-- [ ] T023 [P] [US3] Link MCP reference into docs/reference/index.md and docs/constitution.md to satisfy two-click discoverability
-- [ ] T024 [P] [US3] Document invocation examples and syncDocs usage aligned to tool anchors in docs/reference/mcp-server.md
+- [X] T022 [US3] Add MCP overview and tool catalog mapping with anchors in docs/reference/mcp-server.md
+- [X] T023 [P] [US3] Link MCP reference into docs/reference/index.md and docs/constitution.md to satisfy two-click discoverability
+- [X] T024 [P] [US3] Document invocation examples and syncDocs usage aligned to tool anchors in docs/reference/mcp-server.md

 **Checkpoint**: User Story 3 functional and independently testable

@@ -89,8 +89,8 @@

 **Purpose**: Final quality, consistency, and validation

-- [ ] T025 [P] Run ruff, black check, and pytest per quickstart to validate MCP package
-- [ ] T026 [P] Verify tool catalog and documentation anchors remain in sync after changes in scripts/ and docs/
+- [X] T025 [P] Run ruff, black check, and pytest per quickstart to validate MCP package
+- [X] T026 [P] Verify tool catalog and documentation anchors remain in sync after changes in scripts/ and docs/

 ---
--
2.51.2

From 703723b368f94da17b09f5348a9df854305a9030 Mon Sep 17 00:00:00 2001
From: Danilo Reyes
Date: Sat, 31 Jan 2026 17:39:52 -0600
Subject: [PATCH 3/5] mcp dev environment

---
 modules/dev/mcp.nix | 53 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 modules/dev/mcp.nix

diff --git a/modules/dev/mcp.nix b/modules/dev/mcp.nix
new file mode 100644
index 0000000..3606da4
--- /dev/null
+++ b/modules/dev/mcp.nix
@@ -0,0 +1,53 @@
+{
+  config,
+  inputs,
+  lib,
+  pkgs,
+  ...
+}: +let + python = pkgs.python3.withPackages ( + ps: + builtins.attrValues { + inherit (ps) + click + pytest + black + ruff + ; + } + ); + packages = builtins.attrValues { + inherit python; + inherit (pkgs) codex; # codex-cli from openai + }; +in +{ + options = { + my.dev.mcp = { + enable = lib.mkEnableOption "Install MCP tooling globally"; + users = lib.mkOption { + type = inputs.self.lib.usersOptionType lib; + default = config.my.toggleUsers.dev; + description = "Users to install MCP packages for"; + }; + }; + devShells.mcp = lib.mkOption { + type = lib.types.package; + default = pkgs.mkShell { + inherit packages; + name = "mcp-dev-shell"; + shellHook = '' + export CODEX_HOME=$PWD/.codex + export PYTHONPATH=$PWD/scripts/mcp-server/src + alias mcp-run="python -m mcp_server.server" + echo "MCP shell ready: codex + python + PYTHONPATH set" + ''; + }; + description = "MCP + Codex shell for this repo"; + }; + }; + config = lib.mkIf config.my.dev.mcp.enable { + users.users = inputs.self.lib.mkUserAttrs lib config.my.dev.mcp.users { inherit packages; }; + }; +} -- 2.51.2 From ecf058aacf82f04b93f7a4957e5a6b65bc941f58 Mon Sep 17 00:00:00 2001 From: Danilo Reyes Date: Sun, 1 Feb 2026 10:05:56 -0600 Subject: [PATCH 4/5] codex dotfiles --- .codex/config.toml | 16 + .codex/prompts/speckit.analyze.md | 184 +++++++++ .codex/prompts/speckit.checklist.md | 294 ++++++++++++++ .codex/prompts/speckit.clarify.md | 181 +++++++++ .codex/prompts/speckit.constitution.md | 82 ++++ .codex/prompts/speckit.implement.md | 135 +++++++ .codex/prompts/speckit.plan.md | 89 +++++ .codex/prompts/speckit.specify.md | 258 ++++++++++++ .codex/prompts/speckit.tasks.md | 137 +++++++ .codex/prompts/speckit.taskstoissues.md | 30 ++ .codex/requirements.toml | 2 + .../.system/.codex-system-skills.marker | 1 + .codex/skills/.system/skill-creator/SKILL.md | 375 +++++++++++++++++ .../skills/.system/skill-creator/license.txt | 202 ++++++++++ .../skill-creator/scripts/init_skill.py | 378 ++++++++++++++++++ .../skill-creator/scripts/package_skill.py | 111 +++++ .../skill-creator/scripts/quick_validate.py | 101 +++++ .../.system/skill-installer/LICENSE.txt | 202 ++++++++++ .../skills/.system/skill-installer/SKILL.md | 56 +++ .../skill-installer/scripts/github_utils.py | 21 + .../scripts/install-skill-from-github.py | 308 ++++++++++++++ .../scripts/list-curated-skills.py | 103 +++++ .gitignore | 10 +- 23 files changed, 3275 insertions(+), 1 deletion(-) create mode 100644 .codex/config.toml create mode 100644 .codex/prompts/speckit.analyze.md create mode 100644 .codex/prompts/speckit.checklist.md create mode 100644 .codex/prompts/speckit.clarify.md create mode 100644 .codex/prompts/speckit.constitution.md create mode 100644 .codex/prompts/speckit.implement.md create mode 100644 .codex/prompts/speckit.plan.md create mode 100644 .codex/prompts/speckit.specify.md create mode 100644 .codex/prompts/speckit.tasks.md create mode 100644 .codex/prompts/speckit.taskstoissues.md create mode 100644 .codex/requirements.toml create mode 100644 .codex/skills/.system/.codex-system-skills.marker create mode 100644 .codex/skills/.system/skill-creator/SKILL.md create mode 100644 .codex/skills/.system/skill-creator/license.txt create mode 100644 .codex/skills/.system/skill-creator/scripts/init_skill.py create mode 100644 .codex/skills/.system/skill-creator/scripts/package_skill.py create mode 100644 .codex/skills/.system/skill-creator/scripts/quick_validate.py create mode 100644 .codex/skills/.system/skill-installer/LICENSE.txt create mode 100644 
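
The dev shell above puts `scripts/mcp-server/src` on `PYTHONPATH`, so the MCP package should import without an install step. A minimal smoke check, assuming the `mcp_server` package layout from the 001-mcp-server tasks (illustrative only, not part of the patch):

```python
# Quick import check; run with `python` inside the mcp dev shell.
# Assumes the shellHook above exported PYTHONPATH=$PWD/scripts/mcp-server/src.
import importlib
import os

assert "scripts/mcp-server/src" in os.environ.get("PYTHONPATH", ""), "run inside the mcp dev shell"
server = importlib.import_module("mcp_server.server")
print(f"mcp_server.server imported from {server.__file__}")
```
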
From ecf058aacf82f04b93f7a4957e5a6b65bc941f58 Mon Sep 17 00:00:00 2001
From: Danilo Reyes
Date: Sun, 1 Feb 2026 10:05:56 -0600
Subject: [PATCH 4/5] codex dotfiles

---
 .codex/config.toml                                |  16 +
 .codex/prompts/speckit.analyze.md                 | 184 +++++++++
 .codex/prompts/speckit.checklist.md               | 294 ++++++++++++++
 .codex/prompts/speckit.clarify.md                 | 181 +++++++++
 .codex/prompts/speckit.constitution.md            |  82 ++++
 .codex/prompts/speckit.implement.md               | 135 +++++++
 .codex/prompts/speckit.plan.md                    |  89 +++++
 .codex/prompts/speckit.specify.md                 | 258 ++++++++++++
 .codex/prompts/speckit.tasks.md                   | 137 +++++++
 .codex/prompts/speckit.taskstoissues.md           |  30 ++
 .codex/requirements.toml                          |   2 +
 .codex/skills/.system/.codex-system-skills.marker |   1 +
 .codex/skills/.system/skill-creator/SKILL.md      | 375 +++++++++++++++++
 .codex/skills/.system/skill-creator/license.txt   | 202 ++++++++++
 .../skill-creator/scripts/init_skill.py           | 378 ++++++++++++++++++
 .../skill-creator/scripts/package_skill.py        | 111 +++++
 .../skill-creator/scripts/quick_validate.py       | 101 +++++
 .../.system/skill-installer/LICENSE.txt           | 202 ++++++++++
 .codex/skills/.system/skill-installer/SKILL.md    |  56 +++
 .../skill-installer/scripts/github_utils.py       |  21 +
 .../scripts/install-skill-from-github.py          | 308 ++++++++++++++
 .../scripts/list-curated-skills.py                | 103 +++++
 .gitignore                                        |  10 +-
 23 files changed, 3275 insertions(+), 1 deletion(-)
 create mode 100644 .codex/config.toml
 create mode 100644 .codex/prompts/speckit.analyze.md
 create mode 100644 .codex/prompts/speckit.checklist.md
 create mode 100644 .codex/prompts/speckit.clarify.md
 create mode 100644 .codex/prompts/speckit.constitution.md
 create mode 100644 .codex/prompts/speckit.implement.md
 create mode 100644 .codex/prompts/speckit.plan.md
 create mode 100644 .codex/prompts/speckit.specify.md
 create mode 100644 .codex/prompts/speckit.tasks.md
 create mode 100644 .codex/prompts/speckit.taskstoissues.md
 create mode 100644 .codex/requirements.toml
 create mode 100644 .codex/skills/.system/.codex-system-skills.marker
 create mode 100644 .codex/skills/.system/skill-creator/SKILL.md
 create mode 100644 .codex/skills/.system/skill-creator/license.txt
 create mode 100644 .codex/skills/.system/skill-creator/scripts/init_skill.py
 create mode 100644 .codex/skills/.system/skill-creator/scripts/package_skill.py
 create mode 100644 .codex/skills/.system/skill-creator/scripts/quick_validate.py
 create mode 100644 .codex/skills/.system/skill-installer/LICENSE.txt
 create mode 100644 .codex/skills/.system/skill-installer/SKILL.md
 create mode 100644 .codex/skills/.system/skill-installer/scripts/github_utils.py
 create mode 100644 .codex/skills/.system/skill-installer/scripts/install-skill-from-github.py
 create mode 100644 .codex/skills/.system/skill-installer/scripts/list-curated-skills.py

diff --git a/.codex/config.toml b/.codex/config.toml
new file mode 100644
index 0000000..7240aeb
--- /dev/null
+++ b/.codex/config.toml
@@ -0,0 +1,16 @@
+version = 1
+model = "gpt-5.2-codex"
+
+[projects."/home/jawz/Development/NixOS"]
+workspace = "/home/jawz/Development/NixOS"
+trust_level = "trusted"
+
+[notice]
+"hide_gpt-5.1-codex-max_migration_prompt" = true
+
+[notice.model_migrations]
+"gpt-5.1-codex-max" = "gpt-5.2-codex"
+
+[mcp_servers.nixos-mcp]
+command = "nixos-mcp"
+cwd = "/home/jawz/Development/NixOS"
diff --git a/.codex/prompts/speckit.analyze.md b/.codex/prompts/speckit.analyze.md
new file mode 100644
index 0000000..98b04b0
--- /dev/null
+++ b/.codex/prompts/speckit.analyze.md
@@ -0,0 +1,184 @@
+---
+description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation.
+---
+
+## User Input
+
+```text
+$ARGUMENTS
+```
+
+You **MUST** consider the user input before proceeding (if not empty).
+
+## Goal
+
+Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`.
+
+## Operating Constraints
+
+**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually).
+
+**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`.
+
+## Execution Steps
+
+### 1. Initialize Analysis Context
+
+Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths:
+
+- SPEC = FEATURE_DIR/spec.md
+- PLAN = FEATURE_DIR/plan.md
+- TASKS = FEATURE_DIR/tasks.md
+
+Abort with an error message if any required file is missing (instruct the user to run the missing prerequisite command).
+For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
+
+### 2. Load Artifacts (Progressive Disclosure)
+
+Load only the minimal necessary context from each artifact:
+
+**From spec.md:**
+
+- Overview/Context
+- Functional Requirements
+- Non-Functional Requirements
+- User Stories
+- Edge Cases (if present)
+
+**From plan.md:**
+
+- Architecture/stack choices
+- Data Model references
+- Phases
+- Technical constraints
+
+**From tasks.md:**
+
+- Task IDs
+- Descriptions
+- Phase grouping
+- Parallel markers [P]
+- Referenced file paths
+
+**From constitution:**
+
+- Load `.specify/memory/constitution.md` for principle validation
+
+### 3. Build Semantic Models
+
+Create internal representations (do not include raw artifacts in output):
+
+- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" → `user-can-upload-file`)
+- **User story/action inventory**: Discrete user actions with acceptance criteria
+- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases)
+- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements
+
+### 4. Detection Passes (Token-Efficient Analysis)
+
+Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary.
+
+#### A. Duplication Detection
+
+- Identify near-duplicate requirements
+- Mark lower-quality phrasing for consolidation
+
+#### B. Ambiguity Detection
+
+- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria
+- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.)
+
+#### C. Underspecification
+
+- Requirements with verbs but missing object or measurable outcome
+- User stories missing acceptance criteria alignment
+- Tasks referencing files or components not defined in spec/plan
+
+#### D. Constitution Alignment
+
+- Any requirement or plan element conflicting with a MUST principle
+- Missing mandated sections or quality gates from constitution
+
+#### E. Coverage Gaps
+
+- Requirements with zero associated tasks
+- Tasks with no mapped requirement/story
+- Non-functional requirements not reflected in tasks (e.g., performance, security)
+
+#### F. Inconsistency
+
+- Terminology drift (same concept named differently across files)
+- Data entities referenced in plan but absent in spec (or vice versa)
+- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note)
+- Conflicting requirements (e.g., one requires Next.js while other specifies Vue)
+
+### 5. Severity Assignment
+
+Use this heuristic to prioritize findings:
+
+- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality
+- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion
+- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case
+- **LOW**: Style/wording improvements, minor redundancy not affecting execution order
+
+### 6. Produce Compact Analysis Report
+
+Output a Markdown report (no file writes) with the following structure:
+
+## Specification Analysis Report
+
+| ID | Category | Severity | Location(s) | Summary | Recommendation |
+|----|----------|----------|-------------|---------|----------------|
+| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version |
+
+(Add one row per finding; generate stable IDs prefixed by category initial.)
+
+**Coverage Summary Table:**
+
+| Requirement Key | Has Task? | Task IDs | Notes |
+|-----------------|-----------|----------|-------|
+
+**Constitution Alignment Issues:** (if any)
+
+**Unmapped Tasks:** (if any)
+
+**Metrics:**
+
+- Total Requirements
+- Total Tasks
+- Coverage % (requirements with >=1 task)
+- Ambiguity Count
+- Duplication Count
+- Critical Issues Count
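+
+As a non-normative illustration, the coverage figures can be derived mechanically from the step 3 models (`requirements` and `task_map` below are hypothetical shapes, not a prescribed API):
+
+```python
+# Illustrative only: derive step 6 metrics from the step 3 semantic models.
+# `requirements` maps requirement keys to requirement text; `task_map` maps
+# requirement keys to the task IDs inferred to cover them.
+def coverage_metrics(requirements: dict[str, str], task_map: dict[str, list[str]]) -> dict:
+    covered = [key for key in requirements if task_map.get(key)]
+    total = len(requirements)
+    return {
+        "total_requirements": total,
+        "covered_requirements": len(covered),
+        "coverage_pct": round(100 * len(covered) / total, 1) if total else 100.0,
+    }
+```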
+
+### 7. Provide Next Actions
+
+At end of report, output a concise Next Actions block:
+
+- If CRITICAL issues exist: Recommend resolving before `/speckit.implement`
+- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions
+- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'"
+
+### 8. Offer Remediation
+
+Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.)
+
+## Operating Principles
+
+### Context Efficiency
+
+- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation
+- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis
+- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow
+- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts
+
+### Analysis Guidelines
+
+- **NEVER modify files** (this is read-only analysis)
+- **NEVER hallucinate missing sections** (if absent, report them accurately)
+- **Prioritize constitution violations** (these are always CRITICAL)
+- **Use examples over exhaustive rules** (cite specific instances, not generic patterns)
+- **Report zero issues gracefully** (emit success report with coverage statistics)
+
+## Context
+
+$ARGUMENTS
diff --git a/.codex/prompts/speckit.checklist.md b/.codex/prompts/speckit.checklist.md
new file mode 100644
index 0000000..970e6c9
--- /dev/null
+++ b/.codex/prompts/speckit.checklist.md
@@ -0,0 +1,294 @@
+---
+description: Generate a custom checklist for the current feature based on user requirements.
+---
+
+## Checklist Purpose: "Unit Tests for English"
+
+**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.
+
+**NOT for verification/testing**:
+
+- ❌ NOT "Verify the button clicks correctly"
+- ❌ NOT "Test error handling works"
+- ❌ NOT "Confirm the API returns 200"
+- ❌ NOT checking if code/implementation matches the spec
+
+**FOR requirements quality validation**:
+
+- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
+- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
+- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
+- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
+- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
+
+**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.
+
+## User Input
+
+```text
+$ARGUMENTS
+```
+
+You **MUST** consider the user input before proceeding (if not empty).
+
+## Execution Steps
+
+1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list.
+   - All file paths must be absolute.
+   - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
+
+2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST:
+   - Be generated from the user's phrasing + extracted signals from spec/plan/tasks
+   - Only ask about information that materially changes checklist content
+   - Be skipped individually if already unambiguous in `$ARGUMENTS`
+   - Prefer precision over breadth
+
+   Generation algorithm:
+   1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). (A code sketch of this signal scan appears after step 4.)
+   2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
+   3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
+   4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
+   5. Formulate questions chosen from these archetypes:
+      - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
+      - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
+      - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?")
+      - Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
+      - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
+      - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
+
+   Question formatting rules:
+   - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
+   - Limit to A–E options maximum; omit table if a free-form answer is clearer
+   - Never ask the user to restate what they already said
+   - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope."
+
+   Defaults when interaction impossible:
+   - Depth: Standard
+   - Audience: Reviewer (PR) if code-related; Author otherwise
+   - Focus: Top 2 relevance clusters
+
+   Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more.
+
+3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers:
+   - Derive checklist theme (e.g., security, review, deploy, ux)
+   - Consolidate explicit must-have items mentioned by user
+   - Map focus selections to category scaffolding
+   - Infer any missing context from spec/plan/tasks (do NOT hallucinate)
+
+4. **Load feature context**: Read from FEATURE_DIR:
+   - spec.md: Feature requirements and scope
+   - plan.md (if exists): Technical details, dependencies
+   - tasks.md (if exists): Implementation tasks
+
+   **Context Loading Strategy**:
+   - Load only necessary portions relevant to active focus areas (avoid full-file dumping)
+   - Prefer summarizing long sections into concise scenario/requirement bullets
+   - Use progressive disclosure: add follow-on retrieval only if gaps detected
+   - If source docs are large, generate interim summary items instead of embedding raw text
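+
+   As a non-normative illustration of the step 2 signal scan above, candidate focus areas can be ranked by keyword hits (the keyword lists are hypothetical examples, not a fixed catalog):
+
+   ```python
+   # Illustrative sketch: count domain-keyword hits in the combined
+   # spec/plan/tasks text and rank candidate focus areas by relevance.
+   SIGNALS = {
+       "security": ["auth", "token", "compliance", "secret"],
+       "performance": ["latency", "throughput", "timeout", "p95"],
+       "ux": ["accessibility", "hover", "layout", "empty state"],
+       "operations": ["rollback", "deploy", "monitoring", "alert"],
+   }
+
+   def rank_focus_areas(text: str, max_areas: int = 4) -> list[str]:
+       lowered = text.lower()
+       scores = {area: sum(lowered.count(kw) for kw in kws) for area, kws in SIGNALS.items()}
+       ranked = sorted((a for a, s in scores.items() if s > 0), key=lambda a: -scores[a])
+       return ranked[:max_areas]
+   ```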
+
+5. **Generate checklist** - Create "Unit Tests for Requirements":
+   - Create `FEATURE_DIR/checklists/` directory if it doesn't exist
+   - Generate unique checklist filename:
+     - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`)
+     - Format: `[domain].md`
+     - If file exists, append to existing file
+   - Number items sequentially starting from CHK001
+   - Each `/speckit.checklist` run creates a NEW file for a new domain; existing checklists are appended to, never overwritten
+
+   **CORE PRINCIPLE - Test the Requirements, Not the Implementation**:
+   Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
+   - **Completeness**: Are all necessary requirements present?
+   - **Clarity**: Are requirements unambiguous and specific?
+   - **Consistency**: Do requirements align with each other?
+   - **Measurability**: Can requirements be objectively verified?
+   - **Coverage**: Are all scenarios/edge cases addressed?
+
+   **Category Structure** - Group items by requirement quality dimensions:
+   - **Requirement Completeness** (Are all necessary requirements documented?)
+   - **Requirement Clarity** (Are requirements specific and unambiguous?)
+   - **Requirement Consistency** (Do requirements align without conflicts?)
+   - **Acceptance Criteria Quality** (Are success criteria measurable?)
+   - **Scenario Coverage** (Are all flows/cases addressed?)
+   - **Edge Case Coverage** (Are boundary conditions defined?)
+   - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
+   - **Dependencies & Assumptions** (Are they documented and validated?)
+   - **Ambiguities & Conflicts** (What needs clarification?)
+
+   **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
+
+   ❌ **WRONG** (Testing implementation):
+   - "Verify landing page displays 3 episode cards"
+   - "Test hover states work on desktop"
+   - "Confirm logo click navigates home"
+
+   ✅ **CORRECT** (Testing requirements quality):
+   - "Are the exact number and layout of featured episodes specified?" [Completeness]
+   - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
+   - "Are hover state requirements consistent across all interactive elements?" [Consistency]
+   - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
+   - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
+   - "Are loading states defined for asynchronous episode data?" [Completeness]
+   - "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
+
+   **ITEM STRUCTURE**:
+   Each item should follow this pattern:
+   - Question format asking about requirement quality
+   - Focus on what's WRITTEN (or not written) in the spec/plan
+   - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
+   - Reference spec section `[Spec §X.Y]` when checking existing requirements
+   - Use `[Gap]` marker when checking for missing requirements
+
+   **EXAMPLES BY QUALITY DIMENSION**:
+
+   Completeness:
+   - "Are error handling requirements defined for all API failure modes? [Gap]"
+   - "Are accessibility requirements specified for all interactive elements? [Completeness]"
+   - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
+
+   Clarity:
+   - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
+   - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
+   - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
+
+   Consistency:
+   - "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
+   - "Are card component requirements consistent between landing and detail pages? [Consistency]"
+
+   Coverage:
+   - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
+   - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
+   - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
+
+   Measurability:
+   - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
+   - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
+
+   **Scenario Classification & Coverage** (Requirements Quality Focus):
+   - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
+   - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
+   - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
+   - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"
+
+   **Traceability Requirements**:
+   - MINIMUM: ≥80% of items MUST include at least one traceability reference (a checker sketch appears at the end of this step)
+   - Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`
+   - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
+
+   **Surface & Resolve Issues** (Requirements Quality Problems):
+   Ask questions about the requirements themselves:
+   - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
+   - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
+   - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
+   - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
+   - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
+
+   **Content Consolidation**:
+   - Soft cap: If raw candidate items > 40, prioritize by risk/impact
+   - Merge near-duplicates checking the same requirement aspect
+   - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]"
+
+   **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test:
+   - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior
+   - ❌ References to code execution, user actions, system behavior
+   - ❌ "Displays correctly", "works properly", "functions as expected"
+   - ❌ "Click", "navigate", "render", "load", "execute"
+   - ❌ Test cases, test plans, QA procedures
+   - ❌ Implementation details (frameworks, APIs, algorithms)
+
+   **✅ REQUIRED PATTERNS** - These test requirements quality:
+   - ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
+   - ✅ "Is [vague term] quantified/clarified with specific criteria?"
+   - ✅ "Are requirements consistent between [section A] and [section B]?"
+   - ✅ "Can [requirement] be objectively measured/verified?"
+   - ✅ "Are [edge cases/scenarios] addressed in requirements?"
+   - ✅ "Does the spec define [missing aspect]?"
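+
+   As a rough, non-normative check of the traceability minimum above, the share of items carrying a reference can be computed mechanically (the regex and marker list are illustrative):
+
+   ```python
+   # Illustrative check for the ">=80% of items carry a traceability reference" rule.
+   # An item counts as referenced if a bracket tag cites a spec section or one of
+   # the quality markers ([Gap], [Ambiguity], [Conflict], [Assumption]).
+   import re
+
+   REF_PATTERN = re.compile(r"\[[^\]]*(?:Spec §|Gap|Ambiguity|Conflict|Assumption)[^\]]*\]")
+
+   def traceability_ratio(items: list[str]) -> float:
+       if not items:
+           return 1.0
+       return sum(1 for item in items if REF_PATTERN.search(item)) / len(items)
+
+   assert traceability_ratio(["- [ ] CHK001 - Are error formats specified? [Gap]"]) >= 0.8
+   ```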
+
+6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If the template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001.
+
+7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize:
+   - Focus areas selected
+   - Depth level
+   - Actor/timing
+   - Any explicit user-specified must-have items incorporated
+
+**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless a file for that domain already exists, in which case new items are appended. This allows:
+
+- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`)
+- Simple, memorable filenames that indicate checklist purpose
+- Easy identification and navigation in the `checklists/` folder
+
+To avoid clutter, use descriptive types and clean up obsolete checklists when done.
+
+## Example Checklist Types & Sample Items
+
+**UX Requirements Quality:** `ux.md`
+
+Sample items (testing the requirements, NOT the implementation):
+
+- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]"
+- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]"
+- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]"
+- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]"
+- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]"
+- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]"
+
+**API Requirements Quality:** `api.md`
+
+Sample items:
+
+- "Are error response formats specified for all failure scenarios? [Completeness]"
+- "Are rate limiting requirements quantified with specific thresholds? [Clarity]"
+- "Are authentication requirements consistent across all endpoints? [Consistency]"
+- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
+- "Is versioning strategy documented in requirements? [Gap]"
+
+**Performance Requirements Quality:** `performance.md`
+
+Sample items:
+
+- "Are performance requirements quantified with specific metrics? [Clarity]"
+- "Are performance targets defined for all critical user journeys? [Coverage]"
+- "Are performance requirements under different load conditions specified? [Completeness]"
+- "Can performance requirements be objectively measured? [Measurability]"
+- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]"
+
+**Security Requirements Quality:** `security.md`
+
+Sample items:
+
+- "Are authentication requirements specified for all protected resources? [Coverage]"
+- "Are data protection requirements defined for sensitive information? [Completeness]"
+- "Is the threat model documented and requirements aligned to it? [Traceability]"
+- "Are security requirements consistent with compliance obligations? [Consistency]"
+- "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
+
+## Anti-Examples: What NOT To Do
+
+**❌ WRONG - These test implementation, not requirements:**
+
+```markdown
+- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
+- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
+- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
+- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]
+```
+
+**✅ CORRECT - These test requirements quality:**
+
+```markdown
+- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
+- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
+- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
+- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005]
+- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
+- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
+```
+
+**Key Differences:**
+
+- Wrong: Tests if the system works correctly
+- Correct: Tests if the requirements are written correctly
+- Wrong: Verification of behavior
+- Correct: Validation of requirement quality
+- Wrong: "Does it do X?"
+- Correct: "Is X clearly specified?"
diff --git a/.codex/prompts/speckit.clarify.md b/.codex/prompts/speckit.clarify.md
new file mode 100644
index 0000000..6b28dae
--- /dev/null
+++ b/.codex/prompts/speckit.clarify.md
@@ -0,0 +1,181 @@
+---
+description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec.
+handoffs:
+  - label: Build Technical Plan
+    agent: speckit.plan
+    prompt: Create a plan for the spec. I am building with...
+---
+
+## User Input
+
+```text
+$ARGUMENTS
+```
+
+You **MUST** consider the user input before proceeding (if not empty).
+
+## Outline
+
+Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file.
+
+Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases.
+
+Execution steps:
+
+1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields:
+   - `FEATURE_DIR`
+   - `FEATURE_SPEC`
+   - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.)
+   - If JSON parsing fails, abort and instruct user to re-run `/speckit.specify` or verify feature branch environment.
+   - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
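+
+   A hypothetical illustration of this step (field names as listed above; the script's real payload may carry more keys):
+
+   ```python
+   # Run the prerequisite script once and extract the two required paths.
+   import json
+   import subprocess
+
+   result = subprocess.run(
+       [".specify/scripts/bash/check-prerequisites.sh", "--json", "--paths-only"],
+       capture_output=True, text=True, check=True,
+   )
+   payload = json.loads(result.stdout)
+   feature_dir, feature_spec = payload["FEATURE_DIR"], payload["FEATURE_SPEC"]
+   ```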
+
+2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked).
+
+   Functional Scope & Behavior:
+   - Core user goals & success criteria
+   - Explicit out-of-scope declarations
+   - User roles / personas differentiation
+
+   Domain & Data Model:
+   - Entities, attributes, relationships
+   - Identity & uniqueness rules
+   - Lifecycle/state transitions
+   - Data volume / scale assumptions
+
+   Interaction & UX Flow:
+   - Critical user journeys / sequences
+   - Error/empty/loading states
+   - Accessibility or localization notes
+
+   Non-Functional Quality Attributes:
+   - Performance (latency, throughput targets)
+   - Scalability (horizontal/vertical, limits)
+   - Reliability & availability (uptime, recovery expectations)
+   - Observability (logging, metrics, tracing signals)
+   - Security & privacy (authN/Z, data protection, threat assumptions)
+   - Compliance / regulatory constraints (if any)
+
+   Integration & External Dependencies:
+   - External services/APIs and failure modes
+   - Data import/export formats
+   - Protocol/versioning assumptions
+
+   Edge Cases & Failure Handling:
+   - Negative scenarios
+   - Rate limiting / throttling
+   - Conflict resolution (e.g., concurrent edits)
+
+   Constraints & Tradeoffs:
+   - Technical constraints (language, storage, hosting)
+   - Explicit tradeoffs or rejected alternatives
+
+   Terminology & Consistency:
+   - Canonical glossary terms
+   - Avoided synonyms / deprecated terms
+
+   Completion Signals:
+   - Acceptance criteria testability
+   - Measurable Definition of Done style indicators
+
+   Misc / Placeholders:
+   - TODO markers / unresolved decisions
+   - Ambiguous adjectives ("robust", "intuitive") lacking quantification
+
+   For each category with Partial or Missing status, add a candidate question opportunity unless:
+   - Clarification would not materially change implementation or validation strategy
+   - Information is better deferred to planning phase (note internally)
+
+3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints:
+   - Maximum of 5 total questions across the whole session.
+   - Each question must be answerable with EITHER:
+     - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR
+     - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words").
+   - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation.
+   - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved.
+   - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness).
+   - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests.
+   - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic.
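+
+   A minimal sketch of the (Impact * Uncertainty) selection heuristic (scores are subjective 1-5 judgments per category; the dict shape is illustrative):
+
+   ```python
+   # Rank candidate questions by impact x uncertainty and keep at most five.
+   def select_questions(candidates: list[dict], limit: int = 5) -> list[dict]:
+       ranked = sorted(
+           candidates,
+           key=lambda c: c["impact"] * c["uncertainty"],
+           reverse=True,
+       )
+       return ranked[:limit]
+   ```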
+
+4. Sequential questioning loop (interactive):
+   - Present EXACTLY ONE question at a time.
+   - For multiple‑choice questions:
+     - **Analyze all options** and determine the **most suitable option** based on:
+       - Best practices for the project type
+       - Common patterns in similar implementations
+       - Risk reduction (security, performance, maintainability)
+       - Alignment with any explicit project goals or constraints visible in the spec
+     - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice).
+     - Format as: `**Recommended:** Option [X] - `
+     - Then render all options as a Markdown table:
+
+       | Option | Description |
+       |--------|-------------|
+       | A |