GKC Architecture Overview
Introduction
The Global Knowledge Commons (GKC) is a framework for understanding and working with structured knowledge across multiple open public platforms. The initial design focuses on Wikidata, Wikimedia Commons, Wikipedia templates, and OpenStreetMap.
The project uses a data distillery metaphor to describe the pipeline that converts raw, heterogeneous inputs into validated, platform-ready outputs. Mash Bills describe incoming structure, Modulation Profiles guide transformation, GKC Entity Profiles define canonical entity forms, and Barrel Profiles represent downstream platform-specific targets.
The GKC is built on three foundational architectural components that work together to transform declarative entity definitions into actionable curation workflows:
- GKC Entity Profiles — Declarative YAML definitions of entity structure and semantics
- GKC Entity JSON Schemas — Machine-readable, serializable representations of profiles
- GKC Curation Packets — Actionable bundles of 1+ entities flowing through curation workflows
These components enable a consistent pattern: define once, use everywhere. A profile written in YAML drives wizard UI generation, validation logic, bulk data operations, API contracts, and cross-platform serialization, all from a single source of truth.
Core Architectural Components
GKC Entity
A GKC Entity is a semantically coherent representation of a real-world thing. It is rooted in the Wikibase/Wikidata model and extended across the platforms in the Global Knowledge Commons.
Multiple platforms contribute to and consume a single GKC Entity:
- Wikibase/Wikidata foundation: Labels, descriptions, aliases, statements, qualifiers, references, and sitelinks.
- Linked entities: Item-valued statements naturally connect entities.
- Multi-entity workflows: One curation action may require adding or updating related entities.
- Cross-platform integration: Canonical data in Wikidata can drive content in Commons, Wikipedia, and OSM.
GKC Entity Profile
Definition: GKC Entity Profiles are declarative definitions of the canonical structure, semantics, and cross-platform meaning of a real-world entity in the Global Knowledge Commons.
Implementation: Profiles exist as YAML files in the SpiritSafe registry (profiles/<ProfileID>/profile.yaml) and are loaded into Pydantic models at runtime, serving as the authoritative source of truth for:
- Entity structure — What statements, qualifiers, and references constitute the entity
- Validation rules — Constraints, datatypes, cardinality, required vs optional fields
- Cross-platform semantics — Mappings to Wikidata properties, Wikimedia Commons categories, OSM tags
- UI generation — Field labels, guidance text, input prompts, allowed-items lists
- Profile relationships — How entities link to other entity types (profile graphs)
Key Characteristics:
- Declarative, not imperative — Profiles describe what an entity is, not how to build it
- Human-readable and machine-executable — YAML is both documentation and runtime specification
- Version-controlled — Managed in SpiritSafe repository with CHANGELOG tracking
- Profile-driven workflows — Wizards, validation engines, and serializers consume profiles directly
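As a minimal sketch of what "declarative" means here, the snippet below loads a profile-shaped dictionary into typed models. It uses stdlib dataclasses in place of Pydantic and a literal dict in place of a parsed profile.yaml so that it runs without dependencies. The key names (`id`, `wikidata_property`, `required`) and the class names are assumptions for illustration, not the actual SpiritSafe schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Stand-in for the result of parsing a profile.yaml file.
# Key names are illustrative assumptions, not the real profile schema.
RAW_PROFILE: dict[str, Any] = {
    "name": "TribalGovernmentUS",
    "statements": [
        {
            "id": "instance_of",
            "label": "Instance of",
            "wikidata_property": "P31",
            "required": True,
            "value": {"type": "item"},
        },
        {
            "id": "office_held_by_head_of_state",
            "label": "Office held by head of state",
            "wikidata_property": "P1906",
            "required": False,
            "value": {
                "type": "entity_profile",
                "profile_name": "OfficeHeldByHeadOfState",
            },
        },
    ],
}


@dataclass
class StatementDef:
    """One declared statement: what it is, not how to collect it."""
    id: str
    label: str
    wikidata_property: str
    required: bool
    value: dict[str, Any] = field(default_factory=dict)


@dataclass
class EntityProfile:
    name: str
    statements: list[StatementDef]

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "EntityProfile":
        # Each statement dict maps one-to-one onto StatementDef fields.
        return cls(
            name=raw["name"],
            statements=[StatementDef(**s) for s in raw["statements"]],
        )

    def required_statement_ids(self) -> list[str]:
        return [s.id for s in self.statements if s.required]


profile = EntityProfile.from_dict(RAW_PROFILE)
print(profile.name, profile.required_statement_ids())
```

Nothing in the model classes mentions tribal governments or offices; every entity-specific detail lives in the data.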
GKC Entity JSON Schemas
Definition: GKC Entity JSON Schemas are machine-readable, serializable representations of GKC Entity Profiles that provide stable contracts for API routes, external tools, and inter-profile composition.
Implementation: Generated programmatically from Pydantic models (which are in turn loaded from YAML profiles), JSON Schemas:
- Define data contracts: External tools can validate GKC Entity JSON without importing Python code
- Enable profile composition: Profiles reference one another via entity_profile statements, forming graphs
- Support dynamic UI generation: Web frontends can generate forms from JSON Schema definitions
- Provide type safety: IDE autocomplete, validation, and API documentation all derive from schemas
Relationship to Profiles:
Profile YAML → Pydantic Model → JSON Schema → API Contract
                     ↓               ↓
                Validation    External Tools
Profiles are the source of truth; JSON Schemas are the machine interface.
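The derivation from profile to schema can be illustrated with a small, dependency-free sketch. GKC generates its schemas from Pydantic models; the hand-rolled function below is a substitute that only shows the shape of the mapping. The `TYPE_TO_SCHEMA` table, the `profile_to_json_schema` function, and the profile keys `id` and `required` are hypothetical.

```python
import json
from typing import Any

# Hypothetical mapping from profile value types to JSON Schema fragments.
TYPE_TO_SCHEMA: dict[str, dict[str, Any]] = {
    "item": {"type": "string", "pattern": "^Q[1-9][0-9]*$"},
    "string": {"type": "string"},
    "quantity": {"type": "number"},
    "entity_profile": {
        "type": "string",
        "description": "packet-local entity reference",
    },
}


def profile_to_json_schema(profile: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON Schema object describing one profile's statements."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for stmt in profile["statements"]:
        # Copy the fragment so the shared table is never mutated.
        fragment = dict(TYPE_TO_SCHEMA[stmt["value"]["type"]])
        fragment["title"] = stmt["label"]
        properties[stmt["id"]] = fragment
        if stmt.get("required", False):
            required.append(stmt["id"])
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": profile["name"],
        "type": "object",
        "properties": properties,
        "required": required,
    }


example_profile = {
    "name": "TribalGovernmentUS",
    "statements": [
        {"id": "instance_of", "label": "Instance of",
         "required": True, "value": {"type": "item"}},
        {"id": "office_held_by_head_of_state",
         "label": "Office held by head of state",
         "value": {"type": "entity_profile"}},
    ],
}

schema = profile_to_json_schema(example_profile)
print(json.dumps(schema, indent=2))
```

Because the output is plain JSON, a consumer written in any language can validate entity data against it without loading the Python models.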
GKC Curation Packets
Definition: A GKC Curation Packet is the actionable bundle of information required to create or edit one or more entities. It combines:
- Primary entity: The entity being directly curated
- Related entities: Secondary entities linked via profile graph (e.g., offices, organizations)
- Packet metadata: Creation timestamps, curator username, status tracking
- Local reference system: packet_id identifiers (e.g., ent-001-primary, ent-002-office) that resolve to Wikidata QIDs post-shipping
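The four parts listed above could be modeled roughly as follows. This is a sketch under assumptions: the class names, the `add_entity` helper, and the `data` and `qid` fields are invented for illustration, and only `packet_id`, `creation_path`, and the status value come from this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class PacketEntity:
    packet_id: str        # local reference, e.g. "ent-001-primary"
    profile_name: str
    creation_path: str
    data: dict[str, Any] = field(default_factory=dict)
    qid: Optional[str] = None  # unknown until the entity is shipped


@dataclass
class CurationPacket:
    curator: str
    status: str = "in_progress"
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    entities: list[PacketEntity] = field(default_factory=list)

    def add_entity(self, profile_name: str, role: str,
                   creation_path: str) -> PacketEntity:
        # Number entities in insertion order: ent-001-..., ent-002-...
        packet_id = f"ent-{len(self.entities) + 1:03d}-{role}"
        entity = PacketEntity(packet_id, profile_name, creation_path)
        self.entities.append(entity)
        return entity


packet = CurationPacket(curator="ExampleCurator")
primary = packet.add_entity("TribalGovernmentUS", "primary", "primary")
office = packet.add_entity(
    "OfficeHeldByHeadOfState", "office",
    "primary.office_held_by_head_of_state",
)
# The primary entity points at the office by local ID, not by QID.
primary.data["office_held_by_head_of_state"] = office.packet_id
print([e.packet_id for e in packet.entities])
```

Storing the local ID in the statement value is what lets a packet describe links between entities that do not exist on Wikidata yet.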
Packet Lifecycle:
- Creation: Wizard or bulk loader initializes packet from profile(s)
- Population: User (wizard) or automation (bulk op) fills entity data
- Validation: Profile-driven validators check completeness and constraints
- Shipping: Serializers transform packet → Wikidata JSON, resolve cross-entity references
- Post-creation: QIDs returned from Wikidata; packet updated with status: waiting_for_qid
Status Values:
- in_progress: Curator actively editing
- ready_to_resolve_refs: All data entered; awaiting cross-entity QID resolution
- waiting_for_qid: Shipped to Wikidata; awaiting item creation response
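One way to keep status handling explicit is an enum plus a transition table, sketched below. The three status strings are taken from the list above; the transition rules (forward progress in the listed order, with a return to editing allowed before shipping) are an assumption, since this document does not specify them, and `PacketStatus` and `advance` are hypothetical names.

```python
from enum import Enum


class PacketStatus(str, Enum):
    IN_PROGRESS = "in_progress"
    READY_TO_RESOLVE_REFS = "ready_to_resolve_refs"
    WAITING_FOR_QID = "waiting_for_qid"


# Assumed transition rules; the real lifecycle may differ.
ALLOWED_TRANSITIONS = {
    PacketStatus.IN_PROGRESS: {PacketStatus.READY_TO_RESOLVE_REFS},
    PacketStatus.READY_TO_RESOLVE_REFS: {
        PacketStatus.IN_PROGRESS,       # curator reopens the packet
        PacketStatus.WAITING_FOR_QID,   # packet is shipped
    },
    PacketStatus.WAITING_FOR_QID: set(),
}


def advance(current: PacketStatus, target: PacketStatus) -> PacketStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = PacketStatus("in_progress")  # parse from stored packet metadata
status = advance(status, PacketStatus.READY_TO_RESOLVE_REFS)
print(status.value)
```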
Creation Path Breadcrumbs:
The creation_path field tracks entity provenance:
- primary: Root entity loaded directly by wizard/bulk op
- primary.office_held_by_head_of_state: Created via sub-wizard from primary entity's office statement
- primary.headquarters.location: Nested entity (location of headquarters of primary)
This enables dependency ordering (ship entities depth-first), audit trails, and rollback logic.
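Dependency ordering from breadcrumbs can be sketched as a post-order, depth-first walk: each entity is emitted only after everything created beneath it. The function name `shipping_order` is hypothetical, and the sketch assumes every non-root path has its parent path present in the packet.

```python
def shipping_order(creation_paths: list[str]) -> list[str]:
    """Order creation paths so that nested entities come before their parents."""
    # Group each path under its parent path ("" is the parent of the root).
    children: dict[str, list[str]] = {}
    for path in creation_paths:
        parent = path.rpartition(".")[0]
        children.setdefault(parent, []).append(path)

    order: list[str] = []

    def visit(path: str) -> None:
        # Post-order: ship everything below this entity first.
        for child in children.get(path, []):
            visit(child)
        order.append(path)

    for root in children.get("", []):
        visit(root)
    return order


paths = [
    "primary",
    "primary.office_held_by_head_of_state",
    "primary.headquarters",
    "primary.headquarters.location",
]
for path in shipping_order(paths):
    print(path)
```

Reversing the resulting list gives a plausible rollback order, since parents would then be undone before the entities they reference.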
SpiritSafe
SpiritSafe is the profile registry and supporting query/cache infrastructure. It stores profile packages (profile.yaml, metadata.yaml, docs, and queries/) and provides a source for GKC runtime loading in local or GitHub-backed modes.
For complete documentation on SpiritSafe, see SpiritSafe Registry.
Architectural Data Flow
Profile → Wizard → Packet → Wikidata
┌─────────────────────────────────────────────────────────────────┐
│ SpiritSafe Registry │
│ profiles/TribalGovernmentUS/profile.yaml │
│ profiles/OfficeHeldByHeadOfState/profile.yaml │
└────────────────────┬────────────────────────────────────────────┘
│ Load profiles
↓
┌─────────────────────────────────────────────────────────────────┐
│ GKC Spirit Safe Module │
│ - Parse YAML → Pydantic EntityProfile models │
│ - Build profile graph (TribalGov → Office linkage) │
│ - Hydrate SPARQL allowed-items lists │
└────────────────────┬────────────────────────────────────────────┘
│ Provide profile models
↓
┌─────────────────────────────────────────────────────────────────┐
│ GKC Wizard │
│ - Generate 5-step UI from profile metadata │
│ - Create curation packet (ent-001 primary, ent-002 office) │
│ - Collect user input with real-time validation │
└────────────────────┬────────────────────────────────────────────┘
│ Curation packet (unsaved)
↓
┌─────────────────────────────────────────────────────────────────┐
│ Validation Engine │
│ - Check completeness (required fields filled?) │
│ - Validate constraints (datatypes, cardinality, allowed-items) │
│ - Cross-entity validation (future: office.inception ≤ gov.inception) │
└────────────────────┬────────────────────────────────────────────┘
│ Validated packet
↓
┌─────────────────────────────────────────────────────────────────┐
│ Shipper │
│ - Serialize GKC Entity JSON → Wikidata JSON │
│ - Resolve cross-entity references (ent-002 → create office, get QID) │
│ - Ship entities depth-first (office before tribal gov) │
└────────────────────┬────────────────────────────────────────────┘
│ Wikidata API calls
↓
┌─────────────────────────────────────────────────────────────────┐
│ Wikidata │
│ - Create office item → Q999888 (new QID) │
│ - Create tribal gov item → Q999889 │
│ - Tribal gov.P1906 = Q999888 (office held by head of state) │
└─────────────────────────────────────────────────────────────────┘
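The last two stages of the flow, shipping and reference resolution, can be sketched with a stubbed client. Everything here is illustrative: `FakeWikidataClient` makes no network calls and hands out made-up QIDs starting at the number used in the diagram, the payload is a simplified dict instead of real Wikidata JSON, and `ship_packet` is a hypothetical name.

```python
import itertools
from typing import Any


class FakeWikidataClient:
    """Stand-in for a real API client; returns sequential fake QIDs."""

    def __init__(self, first_qid: int = 999888) -> None:
        self._counter = itertools.count(first_qid)
        self.created: list[tuple[str, dict[str, Any]]] = []

    def create_item(self, payload: dict[str, Any]) -> str:
        qid = f"Q{next(self._counter)}"
        self.created.append((qid, payload))
        return qid


def ship_packet(entities: list[dict[str, Any]],
                client: FakeWikidataClient) -> dict[str, str]:
    """Create items in the given order and map each packet_id to its QID.

    `entities` must already be in dependency order (dependencies first).
    """
    resolved: dict[str, str] = {}
    for entity in entities:
        # Swap any local packet_id value for the QID assigned earlier.
        claims = {
            prop: resolved.get(value, value)
            for prop, value in entity["claims"].items()
        }
        payload = {"label": entity["label"], "claims": claims}
        resolved[entity["packet_id"]] = client.create_item(payload)
    return resolved


entities = [
    {"packet_id": "ent-002-office", "label": "Example office", "claims": {}},
    {"packet_id": "ent-001-primary", "label": "Example tribal government",
     "claims": {"P1906": "ent-002-office"}},
]
client = FakeWikidataClient()
qids = ship_packet(entities, client)
print(qids)
```

The office is created first so that its QID exists by the time the tribal government's P1906 claim is serialized.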
Profile Graph Discovery
When the wizard loads TribalGovernmentUS, it:
- Scans statements for entity_profile types → finds office_held_by_head_of_state
- Reads value.profile_name: OfficeHeldByHeadOfState
- Loads OfficeHeldByHeadOfState profile recursively
- Builds profile graph: TribalGovernmentUS → OfficeHeldByHeadOfState
- Creates packet with placeholders for both entities
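The discovery steps above can be sketched as a small recursive walk over a registry. The in-memory `REGISTRY` dict stands in for SpiritSafe, and `build_profile_graph` is a hypothetical name; only the `entity_profile` type and the `value.profile_name` key come from this document.

```python
from typing import Any

# In-memory stand-in for the SpiritSafe registry (illustrative content only).
REGISTRY: dict[str, dict[str, Any]] = {
    "TribalGovernmentUS": {
        "statements": [
            {"id": "instance_of", "value": {"type": "item"}},
            {
                "id": "office_held_by_head_of_state",
                "value": {
                    "type": "entity_profile",
                    "profile_name": "OfficeHeldByHeadOfState",
                },
            },
        ]
    },
    "OfficeHeldByHeadOfState": {
        "statements": [{"id": "instance_of", "value": {"type": "item"}}]
    },
}


def build_profile_graph(root: str,
                        registry: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each reachable profile name to the profiles it links to."""
    graph: dict[str, list[str]] = {}

    def visit(name: str) -> None:
        if name in graph:
            return  # already loaded; also protects against cycles
        profile = registry[name]
        graph[name] = [
            stmt["value"]["profile_name"]
            for stmt in profile["statements"]
            if stmt["value"]["type"] == "entity_profile"
        ]
        for linked in graph[name]:
            visit(linked)

    visit(root)
    return graph


graph = build_profile_graph("TribalGovernmentUS", REGISTRY)
print(graph)
```

The keys of the resulting graph are exactly the profiles for which the packet needs placeholder entities.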
A future metadata enhancement will make this explicit via a profile_graph section in metadata.yaml (see Multi-Profile Configuration).
Design Principles
1. Profiles Are the Single Source of Truth
No hardcoded entity logic exists in wizards, validators, or serializers. Everything derives from profiles:
- Field labels → profile.statements[].label
- Validation rules → profile.statements[].constraints
- UI widgets → profile.statements[].value.type
- Allowed values → profile.statements[].value.allowed_items (SPARQL-driven)
Anti-pattern: wizard code that says "if entity is tribal government, add office field". Instead, the profile declares the office statement and the wizard reads it.
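A minimal sketch of the profile-driven alternative: the function below turns any profile into form field descriptions without branching on the entity type. The widget names, the `WIDGET_FOR_TYPE` table, and `form_fields` are assumptions for illustration; the statement keys follow the paths listed above.

```python
from typing import Any

# Hypothetical mapping from value type to UI widget.
WIDGET_FOR_TYPE = {
    "item": "select",
    "string": "text",
    "entity_profile": "sub_wizard",
}


def form_fields(profile: dict[str, Any]) -> list[dict[str, Any]]:
    """Derive form field descriptions purely from profile statements."""
    fields = []
    for stmt in profile["statements"]:
        value = stmt["value"]
        fields.append({
            "name": stmt["id"],
            "label": stmt["label"],
            # Unknown types fall back to a plain text input.
            "widget": WIDGET_FOR_TYPE.get(value["type"], "text"),
            "choices": value.get("allowed_items", []),
        })
    return fields


tribal_gov = {
    "name": "TribalGovernmentUS",
    "statements": [
        {"id": "instance_of", "label": "Instance of",
         "value": {"type": "item", "allowed_items": ["Q100", "Q200"]}},  # placeholder QIDs
        {"id": "office_held_by_head_of_state",
         "label": "Office held by head of state",
         "value": {"type": "entity_profile",
                   "profile_name": "OfficeHeldByHeadOfState"}},
    ],
}

for form_field in form_fields(tribal_gov):
    print(form_field["label"], "->", form_field["widget"])
```

The office field appears because the profile declares it, so adding a new linked entity type would require a profile change and no wizard change.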
2. Declarative Over Imperative
Profiles describe what an entity is, not how to build it. The engine interprets profiles and generates behavior dynamically.
3. Cross-Platform by Design
Profiles map GKC concepts to multiple platforms simultaneously:
- Wikidata: Property IDs (P31, P1906), datatype semantics
- Wikimedia Commons: Category structure, file upload patterns
- Wikipedia: Sitelinks, article naming conventions
- OpenStreetMap: Relation IDs, tag mappings
Future platform integrations (e.g., DBpedia, Schema.org) extend profiles without changing core architecture.
4. Graph-Oriented Entity Modeling
Entities rarely exist in isolation. Profiles model relationships explicitly:
- TribalGovernmentUS declares linkage to OfficeHeldByHeadOfState
- Packets bundle related entities together
- Validation can span entity boundaries (future)
This mirrors real-world curation: creating a tribal government often requires creating its leadership office.
5. Fail Gracefully, Validate Continuously
Following Wikipedia/Wikidata philosophy:
- Curators can save incomplete entities — Validation warns but doesn't block
- Real-time validation — Feedback on every field blur, not just on submit
- Progressive enhancement — Minimal entities ship quickly; enrichment happens iteratively
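The "warn but don't block" behavior can be sketched as a validator that returns a list of issues and never raises. The `ValidationIssue` class, the `validate_entity` function, and the `required` key are hypothetical; the point is only that the caller decides what to do with the warnings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ValidationIssue:
    statement_id: str
    severity: str  # this sketch only ever emits "warning"
    message: str


def validate_entity(profile: dict[str, Any],
                    data: dict[str, Any]) -> list[ValidationIssue]:
    """Collect problems without raising, so saving is never blocked."""
    issues = []
    for stmt in profile["statements"]:
        value = data.get(stmt["id"])
        if value in (None, "", []):
            if stmt.get("required", False):
                issues.append(ValidationIssue(
                    stmt["id"], "warning", "required statement is missing"))
            continue
        allowed = stmt["value"].get("allowed_items")
        if allowed and value not in allowed:
            issues.append(ValidationIssue(
                stmt["id"], "warning", f"{value!r} is not an allowed item"))
    return issues


demo_profile = {
    "statements": [
        {"id": "instance_of", "required": True,
         "value": {"type": "item", "allowed_items": ["Q100", "Q200"]}},  # placeholder QIDs
        {"id": "official_name", "required": True, "value": {"type": "string"}},
    ]
}

# An incomplete entity still produces a result the curator can save.
issues = validate_entity(demo_profile, {"instance_of": "Q300"})
for issue in issues:
    print(issue.severity, issue.statement_id, issue.message)
```

Because the function is cheap and side-effect free, it could be called on every field blur as well as before shipping.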
Implementation Status
Stable & Production-Ready:
- Profile YAML loading and Pydantic model generation
- SPARQL allowed-items hydration with fallback lists
- Single-entity curation packets
- Wikidata JSON serialization
- Statement, qualifier, reference validation
In Development (Wizard MVP):
- Multi-entity packets with cross-entity references
- Profile graph metadata (metadata.yaml enhancements)
- Wizard multi-tab UI for related entities
- Status tracking lifecycle
Planned (Post-MVP):
- QID-based packet hydration (load existing Wikidata items into packets)
- Cross-entity validation rules
- Bulk operations with statement filtration
- Recursive profile graph loading (depth > 1)
- Round-trip transformation (Wikidata → GKC Entity JSON → Wikidata)
Related Documentation
Core Concepts
- GKC Entity Profiles: Construction and Reference
- GKC Entity JSON Schema
- GKC Wizard Documentation
- SpiritSafe Registry
Glossary
| Term | Definition |
|---|---|
| Profile | YAML definition of entity structure in SpiritSafe registry |
| Packet | Bundle of 1+ entities being curated together |
| Entity | Single real-world thing represented in GKC (tribal government, office, person) |
| Statement | Single property-value assertion about an entity (analogous to Wikidata claim) |
| Profile Graph | Network of profiles linked via entity_profile statements |
| packet_id | Local identifier (e.g., ent-001) used within a packet before QID assignment |
| creation_path | Breadcrumb showing how entity was created (e.g., primary.office) |
| Shipper | Module that serializes packets to platform-specific formats (Wikidata JSON, etc.) |
| Allowed Items | SPARQL-driven choice lists for statement values (e.g., Federal Register issues) |
Theoretical Design Notes
The following are directionally important but not yet fully implemented as stable architecture in GKC:
- Wizard execution environments beyond current local Python interfaces.
- Expanded profile composition and branching workflow semantics.
- Additional cross-platform publishing orchestration beyond current shipping abstractions.
These are retained as design intent for follow-on implementation work by the Wizard Engineer and Validation Agent.
Last Updated: March 3, 2026
Maintainer: Profile Architect
Status: Stable (subject to enhancement as architecture evolves)