GKC Entity Profiles: Architecture and Authoring Reference
Plain Meaning: GKC Entity Profiles are the declarative source of truth for entity structure, validation shape, and I/O routing across the Global Knowledge Commons.
Overview
A profile package defines:
- What data is collected for an entity type.
- How values are constrained and validated.
- How statements map to inbound and outbound systems.
- How references and qualifiers are collected for provenance and context.
Profiles are curator-facing documents and machine-readable contracts. Runtime components (still_charger, wikibase, shipper, and validation layers) consume profile structure directly.
Operational framing:
- Data Distillery Wikibase defines profile, statement, and value-list semantics in foundation form.
- SpiritSafe materializes those semantics into deterministic JSON/cache artifacts.
- gkc executes packet and shipping workflows from SpiritSafe materializations.
Implementation Status
Implemented and architecturally committed:
- JSON Entity Profile loading from SpiritSafe (still/profiles/<QID>.json).
- Statement/value/reference/qualifier schema structures in exported profile artifacts.
- SPARQL-driven allowed-items hydration with cached fallbacks (still/value_lists/cache/<QID>.json).
- Directional mapping architecture via io_map.
- URI/QID-aware profile graph traversal from embedded profile metadata.
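The cached-fallback behavior for allowed-items hydration can be pictured roughly as follows. This is a sketch under stated assumptions, not the gkc implementation: `hydrate_allowed_items` and the injected `fetch` callable are hypothetical names, and the cache file is assumed to hold a plain JSON list.

```python
import json
from pathlib import Path
from typing import Callable


def hydrate_allowed_items(qid: str, root: Path, fetch: Callable[[str], list]) -> list:
    """Try the live query first; fall back to still/value_lists/cache/<QID>.json."""
    try:
        # `fetch` stands in for the SPARQL-driven lookup (hypothetical).
        return fetch(qid)
    except Exception:
        # Any failure of the live lookup falls back to the cached artifact.
        cache = root / "still" / "value_lists" / "cache" / f"{qid}.json"
        return json.loads(cache.read_text(encoding="utf-8"))
```

Injecting the fetcher keeps the sketch testable without a network connection.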
Theoretical Design Notes
The following directions are architecturally planned but may be implemented incrementally:
- Resolver-backed transform catalogs referenced by value_transform.
- Extended directional mapping forms (to-through, from-through) for multi-hop routing.
- Deeper profile graph traversal policies and orchestration-driven packet expansion.
These notes are design guidance for future implementation planning.
Profile Artifact Structure
Current SpiritSafe publication artifacts use QID-keyed JSON profile files:
still/profiles/<QID>.json
Each profile artifact contains:
- entity
- identification
- statements
- metadata
metadata includes profile_graph and value_list_graph summaries used by packet scaffolding and registry indexing. value_list_graph includes value-list routes used by top-level statements and nested reference/qualifier statements.
Statement value payloads may also include derived-value semantics used by downstream consumers:
- value_source: statement_value when the statement value should be populated from its parent statement value in context.
- value_source_statement as the URI of the parent statement definition that activates this behavior.
These fields are emitted for qualifier/reference statement instances when the corresponding GKC Entity Statement item declares "derives default value from" semantics in Wikibase. Any "applies to profile" scoping is honored when present.
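A downstream consumer might apply the derived-value fields as sketched here. The function name `derived_default` is hypothetical, and the sketch assumes both fields sit inside the nested statement's value payload.

```python
def derived_default(nested: dict, parent_uri: str, parent_value):
    """Return the parent value when the nested statement derives from it, else None."""
    value = nested.get("value", {})
    if (
        value.get("value_source") == "statement_value"
        and value.get("value_source_statement") == parent_uri
    ):
        # The nested qualifier/reference inherits the parent statement's value.
        return parent_value
    return None
```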
Statement-spec resolution in materialized JSON follows a specificity-first contract:
- "statement type" and "same as" are always statement-authoritative and cannot be overridden by a profile.
- max_count can be set at statement level and overridden at the profile claim level.
- "has value" at profile claim level replaces statement-level value linkage for that statement instance.
- "has qualifier" and "has reference" are resolved per nested statement id:
  - profile-local entries win for nested statement ids explicitly configured in profile context.
  - statement-level defaults still apply for nested statement ids not explicitly configured in profile context.
- Statement-level directive scoping uses "applies to profile" and "applies to statement" qualifiers with AND-across-properties semantics.
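The specificity-first rules above can be sketched as a small merge function. The dict shapes and key names (`has_value`, `has_qualifier`, `has_reference`) are illustrative assumptions, not the materialized JSON format, and directive scoping is left out.

```python
def resolve_statement_spec(statement: dict, claim: dict) -> dict:
    """Merge a statement-level spec with profile claim-level overrides.

    Both inputs are hypothetical dicts keyed by the directive names in the text.
    """
    # Start from the statement spec: statement type / same as are never
    # taken from the claim, so they stay statement-authoritative.
    resolved = dict(statement)
    if "max_count" in claim:
        resolved["max_count"] = claim["max_count"]
    if "has_value" in claim:
        resolved["has_value"] = claim["has_value"]
    # Nested specs resolve per nested statement id: profile-local entries
    # win, statement-level defaults fill in the rest.
    for key in ("has_qualifier", "has_reference"):
        merged = dict(statement.get(key, {}))
        merged.update(claim.get(key, {}))
        resolved[key] = merged
    return resolved
```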
SpiritSafe registry and packet tooling now enumerate still/profiles/*.json directly rather than relying on a separate manifest or entity-index artifact.
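Direct enumeration of the profile directory could look like the following sketch. It assumes a local checkout rooted at `root`, and `load_profiles` is a hypothetical helper rather than a gkc function.

```python
import json
from pathlib import Path


def load_profiles(root: Path) -> dict[str, dict]:
    """Return {QID: parsed profile} for every still/profiles/*.json under root."""
    profiles = {}
    for path in sorted((root / "still" / "profiles").glob("*.json")):
        # The file stem is the QID key, e.g. Q4.json -> "Q4".
        profiles[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return profiles
```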
Profile Anatomy
Top-Level Keys (JSON Entity Profile)
| Key | Type | Required | Purpose |
|---|---|---|---|
| entity | string | YES | Canonical profile URI |
| name_identifier | string | YES | Human-facing stable profile key |
| identification | object | YES | Multilingual label/description/alias prompts and constraints |
| statements | array | YES | Statement specifications for packet scaffolding and validation |
| metadata | object | YES | Profile graph, value-list graph, and publication metadata |
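A structural check against the table above might be sketched as follows. `check_top_level` is a hypothetical helper; JSON object maps to `dict` and JSON array to `list` in Python.

```python
REQUIRED_TOP_LEVEL = {
    "entity": str,
    "name_identifier": str,
    "identification": dict,
    "statements": list,
    "metadata": dict,
}


def check_top_level(profile: dict) -> list[str]:
    """List required top-level keys that are missing or have the wrong JSON type."""
    problems = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if key not in profile:
            problems.append(f"missing: {key}")
        elif not isinstance(profile[key], expected):
            problems.append(f"wrong type: {key}")
    return problems
```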
Statement Keys
| Key | Type | Required | Purpose |
|---|---|---|---|
| id | string | YES | Stable statement identifier (snake_case) |
| label | string | YES | Curator-facing statement name |
| input_prompt | string | NO | Prompt shown in wizard-style UIs |
| guidance | string | NO | Additional curation instructions |
| type | string | YES | Statement role (statement, qualifier, reference) |
| io_map | array | YES | Directional route definitions (to/from) |
| max_count | integer/null | NO | Statement multiplicity limit |
| validation_policy | string | NO | Strictness policy for validation flows |
| behavior | object | NO | Value/reference/qualifier editability semantics |
| value | object | YES | Datatype and constraint definition |
| qualifiers | array | NO | Nested qualifier definitions |
| references | object | NO | Nested reference definitions |
| entity_profile | string | NO | Linked profile identifier for secondary entities |
IO Mapping Architecture
io_map is the canonical mapping model for all routes.
Canonical Shape
{
  "name_identifier": "instance_of",
  "entity": "https://datadistillery.wikibase.cloud/entity/Q16",
  "io_map": [
    {"to": "https://www.wikidata.org/entity/P31", "value_transform": null},
    {"to": "https://datadistillery.wikibase.cloud/entity/P1", "value_transform": null},
    {"from": "resolvable_input_fetcher", "value_transform": "normalize_to_item"}
  ],
  "value": {"type": "item"}
}
Entry Contract
Each io_map entry must include exactly one directional key:
- to: outbound destination identifier.
- from: inbound source identifier.
Optional fields:
- value_transform: transform resolver target (null for identity behavior).
Identifier Rules
Use resolvable identifiers as first-class routing keys.
- Full IRIs for systems that expose stable entity/property IRIs.
- Resolver keys for abstracted ingestion routes when no stable IRI is available.
Examples:
- https://www.wikidata.org/entity/P31
- https://datadistillery.wikibase.cloud/entity/P1
- csv:tribe_status_column
- api:tribal_directory.classification
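One way to tell the two identifier forms apart is to treat anything with an http(s) scheme and a host as a full IRI, and everything else as a resolver key. This is an illustrative heuristic, not a rule stated by the profile schema; `identifier_kind` and its return labels are hypothetical.

```python
from urllib.parse import urlsplit


def identifier_kind(identifier: str) -> str:
    """Classify a routing key as 'iri' or 'resolver_key' (labels are illustrative)."""
    parts = urlsplit(identifier)
    if parts.scheme in ("http", "https") and parts.netloc:
        return "iri"
    # Prefixed keys such as csv:... or api:... and bare names land here.
    return "resolver_key"
```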
Runtime Semantics
- from mappings are inbound transformation routes (fermentation stage).
- to mappings are outbound serialization routes (bottling stage).
- Internal inferencing/distillation remains runtime-owned and may consume declared transform references.
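Because each entry carries exactly one directional key, a runtime can partition a statement's routes by stage with a simple filter. `split_routes` is a hypothetical helper shown only to illustrate the split.

```python
def split_routes(io_map: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition io_map entries into (inbound, outbound) route lists."""
    inbound = [entry for entry in io_map if "from" in entry]   # fermentation stage
    outbound = [entry for entry in io_map if "to" in entry]    # bottling stage
    return inbound, outbound
```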
Validation Requirements
Validation of io_map should enforce:
- io_map exists and is non-empty.
- Each entry has exactly one of to or from.
- No duplicate to routes in a statement.
- No duplicate from routes in a statement.
- value_transform is either null or a valid resolver target format.
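The checks listed above could be enforced along these lines. The sketch assumes entries are dicts with string targets. The resolver target format is not defined in this document, so the snake_case pattern below is a placeholder assumption, and `validate_io_map` is a hypothetical name.

```python
import re

# Assumed resolver-target format: a lowercase snake_case name such as
# "normalize_to_item". The real format is not specified here.
_TRANSFORM_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def validate_io_map(io_map) -> list[str]:
    """Return a list of human-readable problems; empty means the map passed."""
    if not isinstance(io_map, list) or not io_map:
        return ["io_map must exist and be non-empty"]
    errors, seen = [], {"to": set(), "from": set()}
    for i, entry in enumerate(io_map):
        directions = [k for k in ("to", "from") if k in entry]
        if len(directions) != 1:
            errors.append(f"entry {i}: needs exactly one of 'to' or 'from'")
            continue
        direction = directions[0]
        target = entry[direction]
        if target in seen[direction]:
            errors.append(f"entry {i}: duplicate '{direction}' route {target}")
        seen[direction].add(target)
        transform = entry.get("value_transform")
        if transform is not None and not (
            isinstance(transform, str) and _TRANSFORM_RE.match(transform)
        ):
            errors.append(f"entry {i}: invalid value_transform {transform!r}")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a curator see every route problem in one pass.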
Validation Policy
validation_policy controls strictness when processing existing and newly entered data.
Supported policy values:
- allow_existing_nonconforming
- strict
allow_existing_nonconforming
Use when preserving existing external data is operationally important while still guiding new curation toward conformance.
strict
Use when all encountered values must satisfy profile constraints before contribution is accepted.
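Read literally, the two policies differ only in how they treat values that already exist in an external system. The sketch below encodes that reading as an assumption. `accept_value` and its parameters are hypothetical and do not reflect the validation layer's actual interface.

```python
def accept_value(conforms: bool, is_existing: bool, policy: str) -> bool:
    """Decide whether a value is accepted under a validation_policy."""
    if policy == "strict":
        # Every encountered value must satisfy the profile constraints.
        return conforms
    if policy == "allow_existing_nonconforming":
        # Existing external data is preserved; new entries must still conform.
        return conforms or is_existing
    raise ValueError(f"unknown validation_policy: {policy}")
```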
Statement Behavior
behavior controls mutability and responsibility boundaries for value, qualifiers, and references.
Canonical shape:
{
  "behavior": {
    "value": "editable",
    "qualifiers": "editable",
    "references": "editable"
  }
}
Datatype Reference
Common datatypes:
- item
- string
- url
- quantity
- time
- monolingualtext
- globecoordinate
- external-id
- commonsMedia
Datatype-specific constraints are declared under value and follow explicit schema validation rules.
References and Provenance
Reference definitions are nested under references and reused by explicit statement identifiers.
Example:
{
  "references": {
    "allowed": [
      {
        "name_identifier": "stated_in",
        "type": "item",
        "io_map": [{"to": "https://www.wikidata.org/entity/P248", "value_transform": null}]
      },
      {
        "name_identifier": "reference_url",
        "type": "url",
        "io_map": [{"to": "https://www.wikidata.org/entity/P854", "value_transform": null}]
      }
    ]
  }
}
Qualifiers
Qualifier definitions are nested statement-like structures with their own io_map, datatype, and constraints.
Example:
{
  "qualifiers": [
    {
      "name_identifier": "point_in_time",
      "label": "Point in time",
      "type": "qualifier",
      "io_map": [{"to": "https://www.wikidata.org/entity/P585", "value_transform": null}],
      "value": {"type": "time"}
    }
  ]
}
Statement Types Reference
Statement type currently supports the following roles:
- statement: primary claim/value carried by the entity.
- qualifier: contextual modifier attached to a statement.
- reference: provenance/source statement attached to a statement.
Profiles should keep role usage explicit and avoid implicit promotion between roles in runtime code.
Entity Metadata Blocks
Profiles can define metadata capture structure for:
- labels
- descriptions
- aliases
- sitelinks
These blocks define curation prompts and constraints, not mapping routes.
Secondary Entities and Profile References
Use entity_profile on a statement when the value represents a secondary entity curated through another profile package.
This enables orchestrated multi-entity packet assembly while keeping profile boundaries explicit.
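An orchestrator could discover which statements point at secondary profiles with a scan like the one sketched here. `linked_profiles` is a hypothetical helper. It accepts either id or name_identifier as the statement key, since both appear in this document.

```python
def linked_profiles(profile: dict) -> dict[str, str]:
    """Map statement identifier -> linked entity_profile, for statements that declare one."""
    links = {}
    for stmt in profile.get("statements", []):
        if stmt.get("entity_profile"):
            # Prefer id, fall back to name_identifier as used in the examples.
            key = stmt.get("id") or stmt.get("name_identifier")
            links[key] = stmt["entity_profile"]
    return links
```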
Profile Graphs & Cross-References
profile_graph captures inter-profile adjacency and traversal intent.
Current committed usage:
- Declaring neighbors.
- Declaring edge metadata (target_profile, via_statement, relationship_type, cardinality hints).
Future implementations may use graph metadata for packet expansion and cross-profile recommendation logic.
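As an illustration of how neighbor declarations support traversal, the sketch below walks the graph breadth-first. The shape is an assumption: each profile's profile_graph is taken to hold a `neighbors` list of edges with a target_profile field, and `reachable_profiles` is a hypothetical helper.

```python
from collections import deque


def reachable_profiles(start: str, graphs: dict[str, dict]) -> list[str]:
    """Breadth-first walk over profile_graph neighbors, returning visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        # Profiles without a known graph simply contribute no edges.
        for edge in graphs.get(current, {}).get("neighbors", []):
            target = edge["target_profile"]
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

The visited set keeps cyclic profile references from looping forever.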
Profile Metadata Schema (metadata object)
Canonical fields:
| Key | Type | Required | Purpose |
|---|---|---|---|
| name | string | YES | Profile display name |
| description | string | YES | Registry-facing summary |
| version | string | YES | Semantic version |
| status | string | YES | Publication status |
| published_date | date | NO | Publication date |
| authors | array | NO | Author list |
| maintainers | array | NO | Maintainer list |
| source_references | array | NO | External conceptual sources |
| related_profiles | array | NO | Related profile IDs |
| profile_graph | object | NO | Graph metadata mirror |
| community_feedback | object | NO | Issue tracker links |
| datatypes_used | array | NO | Discovery metadata |
Best Practices
- Keep statements domain-cohesive and curator-readable.
- Prefer explicit constraints over inferred behavior.
- Reuse explicit statement identifiers for shared reference structures.
- Keep io_map routes explicit and unambiguous.
- Keep transform references declarative; enforce execution policy in code.
- Keep examples synchronized with active schema decisions.
Complete Example (Minimal)
{
  "entity": "https://datadistillery.wikibase.cloud/entity/Q4",
  "name_identifier": "tribal_government_us",
  "statements": [
    {
      "name_identifier": "instance_of",
      "label": "Instance of",
      "type": "statement",
      "io_map": [
        {"to": "https://www.wikidata.org/entity/P31", "value_transform": null},
        {"to": "https://datadistillery.wikibase.cloud/entity/P1", "value_transform": null}
      ],
      "value": {"type": "item", "fixed": "Q7840353"}
    },
    {
      "name_identifier": "official_website",
      "label": "Official website",
      "type": "statement",
      "io_map": [{"to": "https://www.wikidata.org/entity/P856", "value_transform": null}],
      "value": {"type": "url"}
    }
  ]
}
Step-by-Step Authoring Flow
- Define domain scope and entity boundaries.
- Define statement set and datatypes.
- Add constraints and validation policy.
- Add io_map routes for outbound and inbound systems.
- Add references/qualifiers and reusable nested statement definitions.
- Add metadata, README, and CHANGELOG.
- Validate profile package and run hydration checks.
Future Enhancements
- Resolver-backed transform catalogs.
- Extended route operators (to-through, from-through).
- Richer graph-driven multi-entity packet orchestration.
- Automated linting for route consistency across profile sets.