Cross-Module Contracts and Handoffs
Purpose
This document defines the current architectural contract between mash, spirit_safe, still_charger, fermenter, wizard, bottler, shipper, and wikibase for Data Distillery and broader GKC workflows.
It is written as a practical anti-reinvention guide for contributors and custom agents.
Infrastructure perspective for these module boundaries:
- Data Distillery Wikibase defines semantic foundation (profiles, statements, value-list semantics).
- SpiritSafe materializes those semantics as deterministic artifacts.
- gkc modules execute packet assembly, validation/coercion, planning, and shipping from those artifacts.
Contract clarification (active direction):
- JSON Entity Profiles materialized from DD Wikibase semantics are the only active profile runtime contract.
- YAML-era profile loading/validation/generation is superseded and retained only as temporary legacy surface pending removal.
- Runtime validation/coercion ownership is centralized in fermenter.
- Wizard runtime ownership is centralized in wizard.
- No compatibility aliasing should be introduced between gkc.profiles.forms.* and gkc.wizard.*.
Boundary Summary
Mash (gkc.mash)
Responsibility:
- Read and retrieve source data from Wikibase/Wikidata-compatible APIs.
- Return stable template structures for downstream processing.
- Provide generic API helpers reusable across Data Distillery and Wikidata.
Out of scope:
- Write or edit operations to remote systems.
- Semantic projection and packaging contracts for runtime artifacts.
Current anchor surface:
- WikibaseApiClient
- WikibaseLoader
- WikipediaLoader
Spirit Safe (gkc.spirit_safe)
Responsibility:
- Load SpiritSafe profile sources (GitHub or local).
- Build JSON Entity Profiles from SpiritSafe cache entities.
- Extract value-list talk-page payload blocks based on value-list classification.
- Hydrate value-list cache artifacts from extracted SPARQL queries.
- Export profile JSON artifacts for downstream packet assembly and hydration.
- Provide profile graph metadata and profile-loading utilities.
- Publish artifact metadata needed to compare long-lived curation packets against the SpiritSafe state from which they were minted.
Out of scope:
- Direct write execution to Wikibase APIs.
- UI-specific packet interpretation behavior.
Current anchor surface:
- build_entity_profile_json_documents
- export_entity_profile_json_documents
- discover_value_list_ids
- export_value_list_sparql_queries
- hydrate_value_lists_from_cache
- load_profile
Transition-only legacy surface (deferred removal):
- validate_packet_structure
Still Charger (gkc.still_charger)
Responsibility:
- Assemble curation packet scaffolds from Entity Profile JSON documents.
- Charge packet entities from source values (Wikidata and other adapters).
- Inject per-entity source provenance into packet metadata for charged packets.
- Reseal packet metadata digest after charge-time metadata mutation.
- Orchestrate packet conformance section assembly from fermenter primitives.
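The resealing responsibility above can be illustrated with a minimal sketch. It assumes the digest is a SHA-256 over a canonical JSON rendering of the metadata and that it lives in a `digest` field; both choices, and the helper names `metadata_digest` and `reseal`, are hypothetical and not taken from gkc.still_charger.

```python
import hashlib
import json
from typing import Any


def metadata_digest(metadata: dict[str, Any]) -> str:
    """Hash packet metadata in a canonical form (hypothetical helper)."""
    # Exclude the digest field itself so that resealing is idempotent.
    body = {key: value for key, value in metadata.items() if key != "digest"}
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reseal(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the metadata with a freshly computed digest."""
    sealed = dict(metadata)
    sealed["digest"] = metadata_digest(metadata)
    return sealed
```

Under these assumptions, any charge-time mutation (for example, adding source provenance) changes the digest, while resealing an unchanged packet leaves it untouched.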
Out of scope:
- Target payload shaping for any specific destination API.
- API transport execution.
- Validation semantics and outcome interpretation.
Current anchor surface:
- create_curation_packet
- build_curation_packet_from_json_profile
- charge_packet_from_wikidata_items (active Wikidata charging path)
- ChargeReport
- ChargeIssue
Legacy surface (preserved for existing integrations; new workflows should use charge_packet_from_wikidata_items):
- charge_curation_packet
Fermenter (gkc.fermenter)
Responsibility:
- Validate and coerce inbound values using profile-defined directives.
- Provide atomic datatype validators and coercion primitives.
- Enforce fixed values, value-list constraints, and full statement-shape constraints (value + qualifiers + references).
- Enforce derived-value constraints (for example, reference/qualifier values sourced from parent statement values).
- Validate packet-level compatibility and lifecycle conformance for long-lived curation packets.
- Emit shared ConformanceNotice records with actionable feedback.
- Serialize packet-facing statement conformance records from atomic statement evaluations.
Out of scope:
- Packet assembly and packet orchestration.
- Destination-specific payload shaping and transport execution.
Current anchor surface:
- ConformanceNotice
- ConformanceOutcome (evaluation outcome contract under active refinement; packet-facing record migration tracked in #200)
- StatementEvaluation
- EntityEvaluation
- ValidationResult
- validate_* datatype validators
- coerce_* datatype coercers
- normalize_claim_value
- evaluate_statement_claim
- evaluate_statement_instance
- evaluate_entity
- statement_evaluation_to_record
- conformance_notice_payloads
- check_packet_integrity
- validate_packet_inline
- validate_packet_from_file
Wizard (gkc.wizard)
Responsibility:
- Own interactive wizard runtime and UI orchestration.
- Render profile-driven curation flows from packet/profile artifacts.
- Consume still_charger packet assembly/charging outputs and fermenter conformance outputs.
- Manage wizard-only state and UX helpers (for example, draft persistence and packet-to-view adapters).
Out of scope:
- Runtime validation/coercion rule ownership.
- Packet scaffold assembly ownership.
- SpiritSafe artifact translation/materialization ownership.
Current anchor surface:
- Top-level CLI entry gkc wizard (public contract).
- Streamlit app runtime and wizard step orchestration modules.
Implementation note:
- Wizard runtime is implemented under gkc.wizard.
Bottler (gkc.bottler)
Responsibility:
- Provide canonical Wikibase JSON construction primitives for all claim/statement building.
- Transform values and mapping recipes into Wikibase payload structures (datavalues, snaks, claims).
- Build deterministic, multilingual label/description/alias blocks from profile metadata.
- Build Wikibase entity shells from profile metadata for profile-only packet generation.
- All consuming code must use bottler primitives rather than building JSON inline.
Out of scope:
- Remote API transport and authentication session management.
- Registry synchronization and semantic drift management.
- Validation and coercion logic (handled by fermenter).
Current anchor surface:
- DataTypeTransformer (static methods for datatype conversion)
- SnakBuilder (atomic snak construction with datatypes)
- ClaimBuilder (complete statement building with qualifiers/references)
- LanguageBuilder (multilingual label/description/alias block building)
- EntityShellBuilder (Wikibase entity shell building from profile metadata)
- normalize_claim_datavalue (value-to-datatype mapping utility)
- build_claim_from_property_and_value (convenience statement builder)
- Distillate (end-to-end mapping configuration container)
Shipper (gkc.shipper)
Responsibility:
- Execute write operations against Wikibase-compatible APIs.
- Enforce write safety behavior (summary checks, dry-run paths, request shaping).
- Provide plan/preview behavior for create/update/no-op decisions.
- Support writes to any Wikibase instance (Wikidata, Data Distillery, etc.).
Out of scope:
- Generic read-model ownership (belongs to mash).
- Semantic modeling and profile ontology design ownership (belongs to DD Wikibase ontology assets and SpiritSafe profile artifacts).
Current anchor surface:
- WikibaseShipper (works with any Wikibase instance)
- CommonsShipper (placeholder)
- OpenStreetMapShipper (placeholder)
- DiffPlan, DiffOperation, WriteResult
Data Distillery Wikibase Semantics (Architecture Layer)
Responsibility:
- Data Distillery semantic backbone and ontology governance.
- Authoritative semantic definitions consumed by cache/materialization pipelines.
- Architectural source of truth for profile semantics.
Out of scope:
- Reimplementing generic read client logic.
- Reimplementing generic write transport logic.
Current anchor surface:
- SpiritSafe cache entities and generated profile artifacts
- Mash recentchanges polling and cache refresh commands
- Shipper write execution paths when programmatic writes are required
Handoff Flows
Flow 1: Semantic Cache Synchronization
- mash polls MediaWiki recentchanges for Wikibase entity updates.
- mash refreshes per-entity cache artifacts in SpiritSafe format.
- spirit_safe materializes JSON profile artifacts from cache entities.
- Downstream packet and validation flows consume those artifacts.
Flow 2: Ontology Dogfooding (Next-Wave Entity Types)
- Profile definitions describe ontology entities to provision.
- wikibase orchestration resolves desired vs existing state.
- mash performs lookup/reconciliation reads.
- bottler and wikibase orchestration shape payload structures.
- shipper performs dry-run/execute writes.
Flow 2.5: Shared Profile-to-Write Planning Pipeline (Active)
- spirit_safe loads and exports JSON Entity Profiles and value-list cache artifacts.
- still_charger assembles curation packets from profile JSON using bottler's EntityShellBuilder to build canonical Wikibase JSON entity shells for profile-only packets.
- fermenter evaluates atomic statement instances (including qualifiers/references), coerces values, and serializes packet-facing conformance records.
- shipper computes create/update/no-op diff plans and executes writes when enabled.
Flow 2.5a: Profile-Only Packet Wikibase JSON Shell Generation (New)
- still_charger calls bottler's EntityShellBuilder during build_curation_packet_from_json_profile.
- For each profile in the packet, bottler extracts identification metadata (labels/descriptions/aliases) and statement property IDs.
- Bottler builds canonical Wikibase entity shells with:
  - Language-keyed labels, descriptions, aliases blocks
  - Empty claims dictionary with deterministically sorted property IDs
- Shells are embedded in data.entities[*].entity for deterministic, shape-consistent packet generation.
- Profile-only packets require no charging and can proceed directly to fermenter validation.
- Charged packets (with Wikidata values) merge this shell with charged statement instances.
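To make the shell shape concrete, here is a minimal sketch of what such a builder might produce. `build_entity_shell` is a hypothetical stand-in, not the actual EntityShellBuilder API. The sketch assumes that the "empty claims dictionary" maps each property ID to an empty list and that property IDs are ordered by their numeric part.

```python
from typing import Any


def build_entity_shell(
    labels: dict[str, str],
    descriptions: dict[str, str],
    aliases: dict[str, list[str]],
    property_ids: list[str],
) -> dict[str, Any]:
    """Build a Wikibase-style entity shell (hypothetical helper, not the bottler API)."""

    def term_block(terms: dict[str, str]) -> dict[str, dict[str, str]]:
        # Sort language codes so the same input always serializes the same way.
        return {lang: {"language": lang, "value": terms[lang]} for lang in sorted(terms)}

    def numeric_id(pid: str) -> int:
        # Compare the numeric part so that P2 sorts before P10.
        return int(pid[1:])

    return {
        "labels": term_block(labels),
        "descriptions": term_block(descriptions),
        "aliases": {
            lang: [{"language": lang, "value": value} for value in aliases[lang]]
            for lang in sorted(aliases)
        },
        # Assumption: each property ID maps to an empty statement list.
        "claims": {pid: [] for pid in sorted(set(property_ids), key=numeric_id)},
    }


shell = build_entity_shell(
    labels={"fr": "exemple", "en": "example"},
    descriptions={"en": "an illustrative entity"},
    aliases={"en": ["sample"]},
    property_ids=["P10", "P2", "P10"],
)
```

Because languages and property IDs are sorted before insertion, two runs over the same profile metadata yield byte-identical JSON when serialized, which is the property the flow relies on.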
Flow 2.6: SpiritSafe JSON Profile Materialization (Active)
- mash refreshes and reconciles per-entity cache files.
- spirit_safe builds JSON Entity Profiles from still/entities artifacts.
- spirit_safe exports per-profile JSON files (for example, still/profiles/Q4.json).
- Downstream packet/hydration stages consume exported profile artifacts.
Flow 2.7: Value-List Query Hydration (Active)
- spirit_safe discovers value-list entities and resolves list class semantics.
- mash reads value-list talk pages and extracts class-coupled payload blocks.
- For sparql_value_list, spirit_safe writes the first <sparql> block to still/value_lists/queries/<QID>.sparql.
- For embedded_value_list, spirit_safe writes the first <syntaxhighlight lang="json"> block to still/value_lists/cache/<QID>.json.
- If a watched talk-page block is deleted, the corresponding materialized artifact is removed on the next sync.
- SPARQL hydration runs as a separate step for SPARQL-backed lists and does not overwrite cache on failed refresh.
- Meta-Wikibase conformance checks compare against the materialized SpiritSafe query/cache artifacts rather than the live Wikibase talk pages.
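The extraction and materialization rules above can be sketched as follows. This is an illustration under assumptions, not the spirit_safe implementation: the helper names (`first_block`, `artifact_path`, `sync_value_list`) are hypothetical, and the regular expressions assume the tags appear literally and unnested in the talk-page wikitext.

```python
import re
from pathlib import Path
from typing import Optional

SPARQL_BLOCK = re.compile(r"<sparql>(.*?)</sparql>", re.DOTALL | re.IGNORECASE)
JSON_BLOCK = re.compile(
    r'<syntaxhighlight\s+lang="json">(.*?)</syntaxhighlight>',
    re.DOTALL | re.IGNORECASE,
)


def first_block(pattern: re.Pattern[str], wikitext: str) -> Optional[str]:
    """Return the body of the first matching block, or None if there is none."""
    match = pattern.search(wikitext)
    return match.group(1).strip() if match else None


def artifact_path(root: Path, list_class: str, qid: str) -> Path:
    """Map a value-list class and QID to its materialized artifact path."""
    if list_class == "sparql_value_list":
        return root / "still" / "value_lists" / "queries" / f"{qid}.sparql"
    if list_class == "embedded_value_list":
        return root / "still" / "value_lists" / "cache" / f"{qid}.json"
    raise ValueError(f"unknown value-list class: {list_class}")


def sync_value_list(root: Path, list_class: str, qid: str, wikitext: str) -> Optional[Path]:
    """Write the first payload block, or remove the artifact if the block is gone."""
    target = artifact_path(root, list_class, qid)
    pattern = SPARQL_BLOCK if list_class == "sparql_value_list" else JSON_BLOCK
    payload = first_block(pattern, wikitext)
    if payload is None:
        # The watched block was deleted, so drop the materialized artifact.
        target.unlink(missing_ok=True)
        return None
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload + "\n", encoding="utf-8")
    return target
```

Only the first block is used even when a talk page contains several, which keeps the artifact path a one-to-one function of the value-list QID.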
Flow 2.8: Packet Re-entry and Forward Migration (Planned)
- still_charger or another caller loads a previously minted curation packet.
- fermenter validates packet type/shape first; structural/type failures are hard blockers.
- spirit_safe provides current profile/value-list artifact metadata for compatibility comparison.
- Network-backed semantic revision context may be added in the future when needed.
- fermenter classifies drift (patch_compatible, minor_compatible, migration_required, breaking) and applies approved migration transforms when available.
- fermenter re-validates the packet after migration and emits compatibility notices plus migration report data.
- Downstream write-planning and shipping proceed only if the packet remains structurally valid and any required migration succeeded.
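Since this flow is still planned, the following is only one possible reading of the drift classes. It assumes that artifact metadata carries a "major.minor.patch" version string, that a major change maps to migration_required when an approved transform exists and to breaking otherwise, and that identical versions fall under patch_compatible. None of these rules come from the fermenter code; the function names are hypothetical.

```python
Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def classify_drift(minted: str, current: str, has_migration: bool) -> str:
    """Classify drift between the minted and current artifact versions."""
    old, new = parse_version(minted), parse_version(current)
    if new[0] != old[0]:
        # Assumption: a major change is recoverable only with an approved transform.
        return "migration_required" if has_migration else "breaking"
    if new[1] != old[1]:
        return "minor_compatible"
    return "patch_compatible"


def may_proceed(structurally_valid: bool, drift: str, migration_succeeded: bool) -> bool:
    """Gate downstream write-planning on shape validity and migration outcome."""
    if not structurally_valid or drift == "breaking":
        return False
    if drift == "migration_required":
        return migration_succeeded
    return True
```

The gate mirrors the last step of the flow: a packet that fails shape validation never proceeds, and a packet needing migration proceeds only after the migration succeeds.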
Flow 3: Semantic Projection for Runtime Artifacts
- mash retrieves semantic entities and related metadata.
- Runtime orchestration applies current write-planning transformations.
- bottler shapes final claim/snak structures where transport payload format is required.
- Artifacts are validated against SpiritSafe/runtime schema contracts.
- Runtime manifests track projection provenance and drift metadata.
Flow 4: Sync and Drift Management
- mash reads revision/update baselines.
- shipper.plan_batch computes deterministic write-operation diffs.
- Runtime sync policy applies conflict strategy.
- shipper executes writes when sync direction targets remote Wikibase.
- Reports and manifest metadata are emitted for traceability.
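A deterministic create/update/no-op diff of the kind this flow depends on might look like the sketch below. It is not the shipper.plan_batch implementation; `PlannedOperation` and `plan_operations` are hypothetical names, and entities are compared by plain equality for simplicity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PlannedOperation:
    key: str
    action: str  # "create", "update", or "no-op"


def plan_operations(desired: dict[str, Any], existing: dict[str, Any]) -> list[PlannedOperation]:
    """Compare desired and existing state and emit one operation per desired key."""
    plan = []
    # Iterate in sorted key order so the plan is identical across runs.
    for key in sorted(desired):
        if key not in existing:
            action = "create"
        elif existing[key] != desired[key]:
            action = "update"
        else:
            action = "no-op"
        plan.append(PlannedOperation(key, action))
    return plan
```

Sorting the keys before planning is what makes the output stable enough to diff between a dry run and a later execute run.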
Non-Negotiable Contracts
- Do not add a new generic Wikibase client outside gkc.mash.
- Do not bypass shipper for Wikibase write execution paths.
- Keep SpiritSafe runtime contracts stable and testable.
- Preserve offline-first behavior: network-backed enhancement must not break cache-only operation.
- Preserve JSON profile export determinism (stable ordering and artifact path shape).
- Treat packet type/shape conformance as the primary hard blocker; other conformance failures should default to actionable notices unless policy explicitly escalates them.
- Do not add bridge/shim module aliases that preserve gkc.profiles.forms.* as a shadow wizard API.
- Do not add new runtime validation/coercion logic under gkc.profiles.*.
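Some of these contracts lend themselves to an automated check. As one illustration, the sketch below scans Python source for imports of the superseded gkc.profiles.forms namespace using the standard-library ast module. The function name and the idea of running it in a test suite are assumptions, not an existing project tool.

```python
import ast

FORBIDDEN_PREFIXES = ("gkc.profiles.forms",)


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden module paths imported by the given source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both "from gkc.profiles.forms import x" and
            # "from gkc.profiles import forms".
            names = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            for prefix in FORBIDDEN_PREFIXES:
                if name == prefix or name.startswith(prefix + "."):
                    # Report the forbidden namespace rather than each symbol.
                    found.add(prefix)
    return sorted(found)
```

A test could feed every module's source through this function and fail when the result is non-empty. Relative imports are ignored here, so the check is a first line of defense rather than a complete guard.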
Decision Matrix for New Work
When adding new functionality, assign ownership using this matrix:
- Need to fetch/query source entity data? -> mash
- Need to build/export profile JSON artifacts from SpiritSafe cache entities? -> spirit_safe
- Need to assemble curation packets from profile JSON and populate them with source values? -> still_charger
- Need atomic validation/coercion and conformance notices? -> fermenter
- Need to build/shape values into claim/snak/payload structures? -> bottler
- Need schema/specification retrieval? -> mash
- Need to execute write operations to external APIs? -> shipper
- Need Data Distillery semantic orchestration or ontology conformance? -> DD Wikibase ontology + SpiritSafe artifacts
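For agents or tooling that need this matrix in machine-readable form, it can be encoded as a simple lookup. The need keys below are hypothetical labels invented for this sketch; only the owning modules come from the matrix itself.

```python
# Hypothetical need labels mapped to the owning module from the matrix above.
OWNERSHIP = {
    "fetch_source_data": "mash",
    "build_profile_artifacts": "spirit_safe",
    "assemble_and_charge_packets": "still_charger",
    "validate_and_coerce": "fermenter",
    "shape_payloads": "bottler",
    "retrieve_schema": "mash",
    "execute_writes": "shipper",
}


def owner_for(need: str) -> str:
    """Return the owning module for a need, failing loudly on unknown needs."""
    try:
        return OWNERSHIP[need]
    except KeyError:
        raise ValueError(f"no owner assigned for need: {need}") from None
```

Raising on an unknown need, rather than defaulting to some module, keeps new work from silently landing in the wrong layer.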
Current Gaps to Revisit
- Boundaries between wikibase write planning and bottler transformation stages still need explicit acceptance criteria.
- Cross-module tests should identify failure source by layer (read, transform, payload-shape, write, orchestration).
- Packet compatibility metadata, change classification, and forward-migration rules for long-lived offline packets remain to be implemented.
Additional contract-alignment gaps:
- Packet data remains a transitional hybrid shape in the current runtime implementation and must be normalized per the #200 contract direction.
- Still charger should not patch fermenter record fields post-serialization; missing fields must be addressed in fermenter-owned serializer contracts.
Additional active boundary cleanup:
- The old YAML-era gkc.profiles runtime path (loaders, generators, validation, and related CLI surfaces) is superseded and should be removed unless a concrete retained consumer is explicitly approved.
- GKCEntityProfile should be integrated with fermenter-owned validation/coercion pathways (including Pydantic-backed validation surfaces where applicable).
- Core architecture classes should remain explicit and aligned with top-level components: GKCEntityProfile, GKCEntityStatement, GKCValueList, and GKCCurationPacket.
Theoretical Design Notes
Execute-Mode Safety Contract (Planned)
This section documents the intended cross-module safety contract for execute mode. It is not fully implemented yet.
Required sequence:
- wikibase builds plan artifacts through the shared packet pipeline.
- shipper.plan_batch produces create/update/no-op/blocked preview.
- caller explicitly confirms execute intent.
- shipper performs writes with summary/auth/bot context.
- wikibase emits execution report with provenance and failure localization.
Non-negotiable execution guardrails:
- no implicit writes from planning commands
- explicit execute flag required for write calls
- authenticated mode required when policy or target instance requires it
- dry-run report shape should mirror execute report shape for parity
- failures should remain attributable to layer (charge, barrel, shipper, orchestration)
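Because this contract is not fully implemented, the sketch below is only an illustration of how the guardrails could fit together: an explicit keyword-only execute flag, an authentication check, and one report type shared by dry-run and execute modes. `RunReport` and `run_plan` are hypothetical and do not reflect the current shipper API.

```python
from dataclasses import dataclass, field


@dataclass
class RunReport:
    mode: str  # "dry-run" or "execute"
    writes: list[str] = field(default_factory=list)   # performed or would-be writes
    skipped: list[str] = field(default_factory=list)  # no-op entries
    failures: dict[str, str] = field(default_factory=dict)  # key -> failing layer


def run_plan(
    plan: dict[str, str],
    *,
    execute: bool = False,
    authenticated: bool = False,
    auth_required: bool = True,
) -> RunReport:
    """Walk a plan of key -> action; writes happen only when execute is True."""
    if execute and auth_required and not authenticated:
        raise PermissionError("execute mode requires an authenticated session")
    report = RunReport(mode="execute" if execute else "dry-run")
    for key, action in sorted(plan.items()):
        if action == "no-op":
            report.skipped.append(key)
        else:
            # A real shipper would send the write here, and only in execute mode.
            report.writes.append(key)
    return report
```

Defaulting `execute` to False means a planning command can never write by accident, and using the same report type in both modes keeps dry-run output directly comparable with execute output.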
Open design questions:
- whether write execution should stop-on-first-failure or continue-and-report
- whether execute should consume only on-disk plan artifacts or in-memory plan results
- whether operation idempotency checks belong only in shipper or in both shipper and orchestration
Handoff Summary Template (for Agent-to-Agent Continuity)
Use this concise structure when handing work from one module owner to another:
- Scope completed:
- Module touched:
- Public contracts used:
- Assumptions made:
- Open risks:
- Next owning module:
- Inputs required for next step: