
GKC Entity Profiles: Architecture and Authoring Reference

Plain Meaning: GKC Entity Profiles are the declarative source of truth for entity structure, validation shape, and I/O routing across the Global Knowledge Commons.

Overview

A profile package defines:

  • What data is collected for an entity type.
  • How values are constrained and validated.
  • How statements map to inbound and outbound systems.
  • How references and qualifiers are collected for provenance and context.

Profiles are both curator-facing documents and machine-readable contracts. Runtime components (still_charger, wikibase, shipper, and validation layers) consume the profile structure directly.

Operational framing:

  • Data Distillery Wikibase defines profile, statement, and value-list semantics in foundation form.
  • SpiritSafe materializes those semantics into deterministic JSON/cache artifacts.
  • gkc executes packet and shipping workflows from SpiritSafe materializations.

Implementation Status

Implemented and architecturally committed:

  • JSON Entity Profile loading from SpiritSafe (still/profiles/<QID>.json).
  • Statement/value/reference/qualifier schema structures in exported profile artifacts.
  • SPARQL-driven allowed-items hydration with cached fallbacks (still/value_lists/cache/<QID>.json).
  • Directional mapping architecture via io_map.
  • URI/QID-aware profile graph traversal from embedded profile metadata.

Theoretical Design Notes

The following directions are part of the planned architecture and may be implemented incrementally:

  • Resolver-backed transform catalogs referenced by value_transform.
  • Extended directional mapping forms (to-through, from-through) for multi-hop routing.
  • Deeper profile graph traversal policies and orchestration-driven packet expansion.

Treat these notes as design guidance for future implementation planning, not as a description of current behavior.

Profile Artifact Structure

Current SpiritSafe publication artifacts use QID-keyed JSON profile files:

still/profiles/<QID>.json

Each profile artifact contains:

  • entity
  • identification
  • statements
  • metadata

metadata includes profile_graph and value_list_graph summaries used by packet scaffolding and registry indexing. value_list_graph includes value-list routes used by top-level statements and nested reference/qualifier statements.

Statement value payloads may also include derived-value semantics used by downstream consumers:

  • value_source: set to statement_value when the value should be populated from the parent statement's value in context.
  • value_source_statement: the URI of the parent statement definition that activates this behavior.

These fields are emitted for qualifier/reference statement instances when the corresponding GKC Entity Statement item declares "derives default value from" semantics in Wikibase. When "applies to profile" scoping is present, it is honored.
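
The derived-value fields above could be consumed roughly as follows. This is a minimal sketch, not the gkc implementation: it assumes nested definitions live under "qualifiers", that value_source and value_source_statement sit inside each nested "value" payload, that the parent statement's "entity" is the URI being compared, and that the inherited value is written to a hypothetical "default" key.

```python
from copy import deepcopy


def apply_derived_values(statement: dict, statement_value):
    """Return a copy of `statement` with derived qualifier defaults filled in.

    Assumptions (not confirmed by the profile spec): the "default" key is
    hypothetical, and the parent URI is read from statement["entity"].
    """
    resolved = deepcopy(statement)  # never mutate the loaded profile
    parent_uri = resolved.get("entity")
    for nested in resolved.get("qualifiers", []):
        payload = nested.get("value", {})
        if payload.get("value_source") != "statement_value":
            continue
        source = payload.get("value_source_statement")
        # Only inherit when the declared parent matches this statement.
        if source is None or source == parent_uri:
            payload["default"] = statement_value
    return resolved
```

Working on a deep copy keeps the materialized profile immutable, so the same profile object can be reused across packets.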

Statement-spec resolution in materialized JSON follows a specificity-first contract:

  • "statement type" and "same as" are always statement-authoritative and are not profile-overrideable.
  • max_count can be set at statement level and overridden at the profile claim level.
  • "has value" at profile claim level replaces statement-level value linkage for that statement instance.
  • "has qualifier" and "has reference" are resolved per nested statement id:
      • profile-local entries win for nested statement ids explicitly configured in profile context.
      • statement-level defaults still apply for nested statement ids not explicitly configured in profile context.
  • Statement-level directive scoping uses "applies to profile" and "applies to statement" qualifiers with AND-across-properties semantics.
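
The specificity-first rules for max_count and nested qualifier/reference specs can be illustrated with a small merge. This is a sketch under assumptions: both function names are hypothetical, nested specs are plain dicts keyed by "id", and "set at profile claim level" is modeled as the key being present in the profile-level dict.

```python
def resolve_max_count(statement_level: dict, profile_claim: dict):
    """Profile claim level overrides statement level when it sets max_count.

    Key presence (not truthiness) decides, so an explicit null override
    at profile level is respected.
    """
    if "max_count" in profile_claim:
        return profile_claim["max_count"]
    return statement_level.get("max_count")


def resolve_nested(statement_defaults: list, profile_local: list, key: str = "id") -> list:
    """Merge nested specs per id: profile-local entries win, and
    statement-level defaults remain for ids the profile does not configure."""
    merged = {spec[key]: spec for spec in statement_defaults}
    merged.update({spec[key]: spec for spec in profile_local})
    return list(merged.values())


defaults = [{"id": "point_in_time", "max_count": 1}, {"id": "start_time", "max_count": 1}]
local = [{"id": "point_in_time", "max_count": 3}]
resolved = resolve_nested(defaults, local)
```

Here `resolved` keeps start_time from the statement-level defaults and takes point_in_time from the profile-local entry.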

SpiritSafe registry and packet tooling now enumerate still/profiles/*.json directly rather than relying on a separate manifest or entity-index artifact.
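
Direct enumeration of still/profiles/*.json can be sketched with the standard library. The function name load_profiles and the repo_root argument are illustrative assumptions, not part of the gkc API.

```python
import json
from pathlib import Path


def load_profiles(repo_root) -> dict:
    """Load every QID-keyed profile under still/profiles/ into a dict.

    Keys are file stems (the QID); no manifest or index file is consulted.
    """
    profiles_dir = Path(repo_root) / "still" / "profiles"
    profiles = {}
    for path in sorted(profiles_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            profiles[path.stem] = json.load(handle)
    return profiles
```

Sorting the paths keeps the enumeration order deterministic across filesystems.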

Profile Anatomy

Top-Level Keys (JSON Entity Profile)

| Key | Type | Required | Purpose |
|-----|------|----------|---------|
| entity | string | YES | Canonical profile URI |
| name_identifier | string | YES | Human-facing stable profile key |
| identification | object | YES | Multilingual label/description/alias prompts and constraints |
| statements | array | YES | Statement specifications for packet scaffolding and validation |
| metadata | object | YES | Profile graph, value-list graph, and publication metadata |
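
A structural check for the required top-level keys might look like the sketch below. The function name check_top_level is hypothetical, and the mapping from the listed types to Python types (string to str, object to dict, array to list) is an assumption about how the JSON is parsed.

```python
REQUIRED_TOP_LEVEL = {
    "entity": str,
    "name_identifier": str,
    "identification": dict,
    "statements": list,
    "metadata": dict,
}


def check_top_level(profile: dict) -> list:
    """Return human-readable problems; an empty list means the shape is fine."""
    problems = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if key not in profile:
            problems.append(f"missing required key: {key}")
        elif not isinstance(profile[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a curator see every structural issue in a single pass.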

Statement Keys

| Key | Type | Required | Purpose |
|-----|------|----------|---------|
| id | string | YES | Stable statement identifier (snake_case) |
| label | string | YES | Curator-facing statement name |
| input_prompt | string | NO | Prompt shown in wizard-style UIs |
| guidance | string | NO | Additional curation instructions |
| type | string | YES | Statement role (statement, qualifier, reference) |
| io_map | array | YES | Directional route definitions (to/from) |
| max_count | integer/null | NO | Statement multiplicity limit |
| validation_policy | string | NO | Strictness policy for validation flows |
| behavior | object | NO | Value/reference/qualifier editability semantics |
| value | object | YES | Datatype and constraint definition |
| qualifiers | array | NO | Nested qualifier definitions |
| references | object | NO | Nested reference definitions |
| entity_profile | string | NO | Linked profile identifier for secondary entities |

IO Mapping Architecture

io_map is the canonical mapping model for all routes.

Canonical Shape

{
  "name_identifier": "instance_of",
  "entity": "https://datadistillery.wikibase.cloud/entity/Q16",
  "io_map": [
    {"to": "https://www.wikidata.org/entity/P31", "value_transform": null},
    {"to": "https://datadistillery.wikibase.cloud/entity/P1", "value_transform": null},
    {"from": "resolvable_input_fetcher", "value_transform": "normalize_to_item"}
  ],
  "value": {"type": "item"}
}

Entry Contract

Each io_map entry must include exactly one directional key:

  • to: outbound destination identifier.
  • from: inbound source identifier.

Optional fields:

  • value_transform: transform resolver target (null for identity behavior).

Identifier Rules

Use resolvable identifiers as first-class routing keys.

  • Full IRIs for systems that expose stable entity/property IRIs.
  • Resolver keys for abstracted ingestion routes when no stable IRI is available.

Examples:

  • https://www.wikidata.org/entity/P31
  • https://datadistillery.wikibase.cloud/entity/P1
  • csv:tribe_status_column
  • api:tribal_directory.classification

Runtime Semantics

  • from mappings are inbound transformation routes (fermentation stage).
  • to mappings are outbound serialization routes (bottling stage).
  • Internal inferencing/distillation remains runtime-owned and may consume declared transform references.
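
Because direction is carried by the key name, a runtime can separate the two stages with a simple partition. The sketch below uses the io_map from the canonical shape; split_routes is a hypothetical helper, not a gkc function.

```python
def split_routes(io_map: list) -> tuple:
    """Partition io_map entries into (inbound, outbound) lists.

    Inbound entries carry "from" (fermentation stage); outbound entries
    carry "to" (bottling stage).
    """
    inbound = [entry for entry in io_map if "from" in entry]
    outbound = [entry for entry in io_map if "to" in entry]
    return inbound, outbound


io_map = [
    {"to": "https://www.wikidata.org/entity/P31", "value_transform": None},
    {"to": "https://datadistillery.wikibase.cloud/entity/P1", "value_transform": None},
    {"from": "resolvable_input_fetcher", "value_transform": "normalize_to_item"},
]
inbound, outbound = split_routes(io_map)
```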

Validation Requirements

Validation of io_map should enforce:

  • io_map exists and is non-empty.
  • Each entry has exactly one of to or from.
  • No duplicate to routes in a statement.
  • No duplicate from routes in a statement.
  • value_transform is either null or a valid resolver target format.
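
The checks listed above can be expressed as a single validator. This is a sketch rather than the project's validation layer: the function name is hypothetical, and because the resolver target format is not specified here, the snake_case pattern below is an assumption.

```python
import re

# Assumed resolver target format (snake_case); the real format may differ.
RESOLVER_TARGET = re.compile(r"^[a-z][a-z0-9_]*$")


def validate_io_map(io_map) -> list:
    """Return a list of error strings; empty means io_map passed all checks."""
    if not isinstance(io_map, list) or not io_map:
        return ["io_map must be a non-empty list"]
    errors = []
    seen = {"to": set(), "from": set()}
    for index, entry in enumerate(io_map):
        if not isinstance(entry, dict):
            errors.append(f"entry {index}: expected an object")
            continue
        directions = [key for key in ("to", "from") if key in entry]
        if len(directions) != 1:
            errors.append(f"entry {index}: expected exactly one of 'to' or 'from'")
            continue
        direction = directions[0]
        target = entry[direction]
        if target in seen[direction]:
            errors.append(f"entry {index}: duplicate '{direction}' route {target}")
        seen[direction].add(target)
        transform = entry.get("value_transform")
        if transform is not None and not (
            isinstance(transform, str) and RESOLVER_TARGET.match(transform)
        ):
            errors.append(f"entry {index}: invalid value_transform {transform!r}")
    return errors
```

Duplicates are tracked separately per direction, so the same identifier may appear once as a to route and once as a from route without being flagged.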

Validation Policy

validation_policy controls strictness when processing existing and newly entered data.

Supported policy values:

  • allow_existing_nonconforming
  • strict

allow_existing_nonconforming

Use when preserving existing external data is operationally important while still guiding new curation toward conformance.

strict

Use when all encountered values must satisfy profile constraints before contribution is accepted.
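
One possible reading of the two policies is sketched below. The function name and its arguments are hypothetical, and treating allow_existing_nonconforming as "accept nonconforming values only when they already exist upstream" is an interpretation, not documented runtime behavior.

```python
def is_accepted(policy: str, conforms: bool, is_existing: bool) -> bool:
    """Decide whether a value passes under the given validation_policy.

    conforms: the value satisfies the profile constraints.
    is_existing: the value was already present in the external system
    (as opposed to newly entered by a curator).
    """
    if policy == "strict":
        return conforms
    if policy == "allow_existing_nonconforming":
        # Assumed reading: existing data is preserved, new data must conform.
        return conforms or is_existing
    raise ValueError(f"unsupported validation_policy: {policy}")
```

Raising on unknown policy names keeps a typo in a profile from silently disabling validation.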

Statement Behavior

behavior controls mutability and responsibility boundaries for value, qualifiers, and references.

Canonical shape:

{
  "behavior": {
    "value": "editable",
    "qualifiers": "editable",
    "references": "editable"
  }
}

Datatype Reference

Common datatypes:

  • item
  • string
  • url
  • quantity
  • time
  • monolingualtext
  • globecoordinate
  • external-id
  • commonsMedia

Datatype-specific constraints are declared under value and follow explicit schema validation rules.

References and Provenance

Reference definitions are nested under references. Each definition carries an explicit statement identifier so that shared reference structures can be reused across statements.

Example:

{
  "references": {
    "allowed": [
      {
        "name_identifier": "stated_in",
        "type": "reference",
        "io_map": [{"to": "https://www.wikidata.org/entity/P248", "value_transform": null}],
        "value": {"type": "item"}
      },
      {
        "name_identifier": "reference_url",
        "type": "reference",
        "io_map": [{"to": "https://www.wikidata.org/entity/P854", "value_transform": null}],
        "value": {"type": "url"}
      }
    ]
  }
}
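
A consumer can read the allowed references and their outbound targets straight from this structure. The helper below is a hypothetical sketch that assumes the "allowed" list shape shown in the example.

```python
def allowed_reference_routes(statement: dict) -> dict:
    """Map each allowed reference name to its outbound ("to") targets."""
    routes = {}
    for ref in statement.get("references", {}).get("allowed", []):
        routes[ref["name_identifier"]] = [
            entry["to"] for entry in ref.get("io_map", []) if "to" in entry
        ]
    return routes


statement = {
    "references": {
        "allowed": [
            {
                "name_identifier": "stated_in",
                "io_map": [{"to": "https://www.wikidata.org/entity/P248", "value_transform": None}],
            },
            {
                "name_identifier": "reference_url",
                "io_map": [{"to": "https://www.wikidata.org/entity/P854", "value_transform": None}],
            },
        ]
    }
}
reference_routes = allowed_reference_routes(statement)
```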

Qualifiers

Qualifier definitions are nested statement-like structures with their own io_map, datatype, and constraints.

Example:

{
  "qualifiers": [
    {
      "name_identifier": "point_in_time",
      "label": "Point in time",
      "type": "qualifier",
      "io_map": [{"to": "https://www.wikidata.org/entity/P585", "value_transform": null}],
      "value": {"type": "time"}
    }
  ]
}

Statement Types Reference

Statement type currently supports the following roles:

  • statement: primary claim/value carried by the entity.
  • qualifier: contextual modifier attached to a statement.
  • reference: provenance/source statement attached to a statement.

Profiles should keep role usage explicit, and runtime code should not implicitly promote a statement from one role to another.

Entity Metadata Blocks

Profiles can define metadata capture structure for:

  • labels
  • descriptions
  • aliases
  • sitelinks

These blocks define curation prompts and constraints, not mapping routes.

Secondary Entities and Profile References

Use entity_profile on a statement when the value represents a secondary entity curated through another profile package.

This enables orchestrated multi-entity packet assembly while keeping profile boundaries explicit.
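
To assemble a multi-entity packet, an orchestrator first needs to know which statements point at other profiles. The sketch below collects those links; linked_profiles is a hypothetical helper, the statement key follows the name_identifier used in the examples, and the profile identifier "federal_agency_us" is made up for illustration.

```python
def linked_profiles(profile: dict) -> dict:
    """Map statement identifier -> linked profile for secondary entities."""
    return {
        statement["name_identifier"]: statement["entity_profile"]
        for statement in profile.get("statements", [])
        if statement.get("entity_profile")
    }


profile = {
    "statements": [
        {"name_identifier": "instance_of", "value": {"type": "item"}},
        {
            "name_identifier": "recognized_by",  # hypothetical statement
            "entity_profile": "federal_agency_us",  # hypothetical profile id
            "value": {"type": "item"},
        },
    ]
}
links = linked_profiles(profile)
```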

Profile Graphs & Cross-References

profile_graph captures inter-profile adjacency and traversal intent.

Current committed usage:

  • Declaring neighbors.
  • Declaring edge metadata (target_profile, via_statement, relationship_type, cardinality hints).

Future implementations may use graph metadata for packet expansion and cross-profile recommendation logic.
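
A bounded breadth-first walk over profile_graph edges illustrates how such expansion could work. This is a sketch only: the "edges" list and the graphs mapping (profile id to its profile_graph) are assumed shapes; only target_profile is taken from the edge metadata named above.

```python
from collections import deque


def reachable_profiles(start: str, graphs: dict, max_depth: int = 2) -> list:
    """Return profiles reachable from `start` within `max_depth` hops,
    in breadth-first discovery order (the start profile is excluded)."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for edge in graphs.get(current, {}).get("edges", []):
            target = edge["target_profile"]
            if target not in seen:  # guards against cycles
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


graphs = {
    "A": {"edges": [{"target_profile": "B"}, {"target_profile": "C"}]},
    "B": {"edges": [{"target_profile": "D"}, {"target_profile": "A"}]},
    "D": {"edges": [{"target_profile": "E"}]},
}
neighbors = reachable_profiles("A", graphs, max_depth=2)
```

The depth limit keeps packet expansion predictable when profiles reference each other in cycles or long chains.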

Profile Metadata Schema (metadata object)

Canonical fields:

| Key | Type | Required | Purpose |
|-----|------|----------|---------|
| name | string | YES | Profile display name |
| description | string | YES | Registry-facing summary |
| version | string | YES | Semantic version |
| status | string | YES | Publication status |
| published_date | date | NO | Publication date |
| authors | array | NO | Author list |
| maintainers | array | NO | Maintainer list |
| source_references | array | NO | External conceptual sources |
| related_profiles | array | NO | Related profile IDs |
| profile_graph | object | NO | Graph metadata mirror |
| community_feedback | object | NO | Issue tracker links |
| datatypes_used | array | NO | Discovery metadata |
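
The required metadata fields and the semantic-version constraint can be checked together. The sketch below uses a simplified version pattern (MAJOR.MINOR.PATCH with optional pre-release and build suffixes), not the full Semantic Versioning grammar; check_metadata is a hypothetical name.

```python
import re

REQUIRED_METADATA = ("name", "description", "version", "status")

# Simplified semantic version pattern; an approximation of the full grammar.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def check_metadata(metadata: dict) -> list:
    """Return problems found in a profile's metadata object."""
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_METADATA
        if key not in metadata
    ]
    version = metadata.get("version")
    if version is not None and not (isinstance(version, str) and SEMVER.match(version)):
        problems.append(f"version is not a semantic version: {version!r}")
    return problems
```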

Best Practices

  • Keep statements domain-cohesive and curator-readable.
  • Prefer explicit constraints over inferred behavior.
  • Reuse explicit statement identifiers for shared reference structures.
  • Keep io_map routes explicit and unambiguous.
  • Keep transform references declarative; enforce execution policy in code.
  • Keep examples synchronized with active schema decisions.

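Complete Example (Minimal)

This example focuses on the statements array; the required identification and metadata blocks are omitted for brevity.
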
{
  "entity": "https://datadistillery.wikibase.cloud/entity/Q4",
  "name_identifier": "tribal_government_us",
  "statements": [
    {
      "name_identifier": "instance_of",
      "label": "Instance of",
      "type": "statement",
      "io_map": [
        {"to": "https://www.wikidata.org/entity/P31", "value_transform": null},
        {"to": "https://datadistillery.wikibase.cloud/entity/P1", "value_transform": null}
      ],
      "value": {"type": "item", "fixed": "Q7840353"}
    },
    {
      "name_identifier": "official_website",
      "label": "Official website",
      "type": "statement",
      "io_map": [{"to": "https://www.wikidata.org/entity/P856", "value_transform": null}],
      "value": {"type": "url"}
    }
  ]
}

Step-by-Step Authoring Flow

  1. Define domain scope and entity boundaries.
  2. Define statement set and datatypes.
  3. Add constraints and validation policy.
  4. Add io_map routes for outbound and inbound systems.
  5. Add references/qualifiers and reusable nested statement definitions.
  6. Add metadata, README, and CHANGELOG.
  7. Validate profile package and run hydration checks.

Future Enhancements

  • Resolver-backed transform catalogs.
  • Extended route operators (to-through, from-through).
  • Richer graph-driven multi-entity packet orchestration.
  • Automated linting for route consistency across profile sets.

See Also