Fermenter API
Overview
The gkc.fermenter module is the atomic validation and coercion layer for GKC curation pipelines.
It validates and normalizes individual values against profile-defined constraints.
Both the still charger and the cooperage/barrel layer consume it, and all wizard and CLI pipelines share its output envelope.
All validation surfaces return a ConformanceNotice (or internal ValidationResult) for consistent error reporting across wizard, CLI, and bulk pipelines.
ConformanceNotice
Shared result envelope for all validation and coercion operations.
from gkc.fermenter import ConformanceNotice
notice = ConformanceNotice(
    severity="error",
    entity_ref="https://datadistillery.wikibase.cloud/entity/Q4",
    code="fixed_value_violation",
    message="Statement requires fixed value Q7840353 but received Q9592",
    statement_ref="https://datadistillery.wikibase.cloud/entity/P5",
    normalized_value=None,
)
Fields:
| Field | Type | Description |
|---|---|---|
| severity | str | "error", "warning", or "info" |
| entity_ref | str | Full URI or intra-packet entity ID |
| code | str | Short machine-readable code (e.g., fixed_value_violation) |
| message | str | Human-readable description |
| statement_ref | str \| None | Statement entity URI, or None for entity-level notices |
| normalized_value | Any | Coerced output if coercion succeeded, None otherwise |
ChargeIssue and BarrelIssue are aliases for ConformanceNotice (transition compatibility).
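Because every surface shares this envelope, a caller can triage notices with a single code path. The sketch below runs without gkc installed: NoticeSketch is a stand-in dataclass that only mirrors the documented fields (the real class may differ in defaults and construction), and blocking() is a hypothetical helper, not part of the module.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class NoticeSketch:
    """Stand-in mirroring the documented ConformanceNotice fields (not the gkc class)."""
    severity: str
    entity_ref: str
    code: str
    message: str
    statement_ref: Optional[str] = None  # None for entity-level notices
    normalized_value: Any = None


def blocking(notices: List[NoticeSketch]) -> List[NoticeSketch]:
    """Hypothetical helper: keep only the notices that should stop a pipeline."""
    return [n for n in notices if n.severity == "error"]


notices = [
    NoticeSketch("info", "Q4", "fixed_value_injected", "Injected fixed value Q7840353"),
    NoticeSketch("error", "Q4", "fixed_value_violation", "Expected Q7840353, got Q9592"),
]
print([n.code for n in blocking(notices)])  # ['fixed_value_violation']
```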
Datatype Validators
Each validator accepts a raw value and returns an internal ValidationResult with valid, value, errors, and warnings fields. Use validate_by_datatype() as the primary dispatcher.
validate_by_datatype(datatype, value)
Dispatch to the appropriate validator based on the Wikibase primitive datatype string.
from gkc.fermenter import validate_by_datatype
result = validate_by_datatype("wikibase-item", "Q195562")
print(result.valid) # True
print(result.value) # {"entity-type": "item", "numeric-id": 195562, "id": "Q195562"}
result = validate_by_datatype("url", "not-a-url")
print(result.valid) # False
print(result.errors) # ["url must start with http:// or https://"]
Supported datatypes: wikibase-item, string, monolingualtext, url, time, quantity, globe-coordinate, commonsMedia.
Returns a ValidationResult(valid=False, ...) with an error message for unrecognized datatypes.
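One common way to build such a dispatcher is a lookup table from datatype string to validator callable. The following is a minimal sketch of that pattern under stated assumptions, not the module's actual implementation: ResultSketch, check_url, check_string, REGISTRY, and dispatch are all hypothetical names, and the wording of the unsupported-datatype error is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ResultSketch:
    """Stand-in with the four documented ValidationResult fields."""
    valid: bool
    value: Any = None
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def check_url(value: Any) -> ResultSketch:
    # Accept only strings with an http:// or https:// prefix.
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return ResultSketch(valid=True, value=value)
    return ResultSketch(valid=False, errors=["url must start with http:// or https://"])


def check_string(value: Any) -> ResultSketch:
    if isinstance(value, str):
        return ResultSketch(valid=True, value=value)
    # Coerce and record a warning instead of rejecting.
    return ResultSketch(valid=True, value=str(value),
                        warnings=[f"Coerced {type(value).__name__} to string"])


# Hypothetical registry: datatype string -> validator callable.
REGISTRY: Dict[str, Callable[[Any], ResultSketch]] = {
    "url": check_url,
    "string": check_string,
}


def dispatch(datatype: str, value: Any) -> ResultSketch:
    validator = REGISTRY.get(datatype)
    if validator is None:
        # Unknown datatypes yield an invalid result rather than an exception.
        return ResultSketch(valid=False, errors=[f"Unsupported datatype: {datatype}"])
    return validator(value)


print(dispatch("url", "not-a-url").errors)    # ['url must start with http:// or https://']
print(dispatch("string", 12345).warnings)     # ['Coerced int to string']
print(dispatch("geo-shape", "x").valid)       # False
```

Returning an invalid result for unknown datatypes keeps the caller's error handling uniform: every outcome, including a dispatch failure, arrives in the same result shape.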
validate_wikibase_item(value)
Validates a Wikibase item reference. Accepts a QID string (coerces to full Wikibase JSON) or a dict already containing an "id" key.
from gkc.fermenter import validate_wikibase_item
# From QID string
result = validate_wikibase_item("Q195562")
# result.value → {"entity-type": "item", "numeric-id": 195562, "id": "Q195562"}
# From existing Wikibase dict
result = validate_wikibase_item({"entity-type": "item", "id": "Q195562"})
# result.valid → True
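The QID-to-JSON coercion can be illustrated with a small regular expression. This is a sketch of the idea only: qid_to_wikibase_json and QID_PATTERN are hypothetical names, and the assumption that a QID is an uppercase Q followed by a positive integer without leading zeros may be stricter or looser than what validate_wikibase_item actually accepts.

```python
import re
from typing import Optional

# Assumed QID shape: uppercase "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q([1-9][0-9]*)")


def qid_to_wikibase_json(qid: str) -> Optional[dict]:
    """Hypothetical helper: expand a QID string into a Wikibase item dict."""
    match = QID_PATTERN.fullmatch(qid.strip())
    if match is None:
        return None  # not a well-formed QID under the assumed pattern
    return {
        "entity-type": "item",
        "numeric-id": int(match.group(1)),
        "id": match.group(0),
    }


print(qid_to_wikibase_json("Q195562"))
# {'entity-type': 'item', 'numeric-id': 195562, 'id': 'Q195562'}
print(qid_to_wikibase_json("195562"))  # None
```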
validate_string(value)
Validates a plain string value. Coerces non-string values to string where possible and emits a warning.
from gkc.fermenter import validate_string
result = validate_string("Cherokee Nation")
# result.valid → True, result.value → "Cherokee Nation"
result = validate_string(12345)
# result.valid → True, result.value → "12345", result.warnings → ["Coerced int to string"]
validate_monolingualtext(value)
Validates a monolingual text dict. Requires both "language" and "text" string fields.
from gkc.fermenter import validate_monolingualtext
result = validate_monolingualtext({"language": "en", "text": "Cherokee Nation"})
# result.valid → True
validate_url(value)
Validates a URL string. Must start with http:// or https://.
from gkc.fermenter import validate_url
result = validate_url("https://www.cherokee.org")
# result.valid → True
result = validate_url("www.cherokee.org")
# result.valid → False
validate_time(value)
Validates a Wikibase time dict. Requires time, timezone, before, after, and calendarmodel fields.
from gkc.fermenter import validate_time
result = validate_time({
    "time": "+2020-01-15T00:00:00Z",
    "timezone": 0,
    "before": 0,
    "after": 0,
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
})
# result.valid → True
validate_quantity(value)
Validates a Wikibase quantity dict. Requires amount and unit fields.
from gkc.fermenter import validate_quantity
result = validate_quantity({
    "amount": "+3500",
    "unit": "http://www.wikidata.org/entity/Q11573",
})
# result.valid → True
validate_globe_coordinate(value)
Validates a globe-coordinate dict. Requires latitude, longitude, precision, and globe fields.
Latitude must be in [-90, 90] and longitude in [-180, 180].
from gkc.fermenter import validate_globe_coordinate
result = validate_globe_coordinate({
    "latitude": 35.5,
    "longitude": -95.0,
    "altitude": None,
    "precision": 0.0001,
    "globe": "http://www.wikidata.org/entity/Q2",
})
# result.valid → True
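The required-field and range rules above can be expressed as a short check that collects error strings. This is an illustrative sketch, not the module's code: coordinate_errors, REQUIRED, and the error message wording are assumptions made for the example.

```python
from typing import Any, List

# The four fields the documentation lists as required.
REQUIRED = ("latitude", "longitude", "precision", "globe")


def coordinate_errors(value: Any) -> List[str]:
    """Hypothetical helper: return every rule violation found in a coordinate dict."""
    if not isinstance(value, dict):
        return ["globe-coordinate must be a dict"]
    errors = [f"missing field: {name}" for name in REQUIRED if name not in value]
    lat, lon = value.get("latitude"), value.get("longitude")
    # Range checks apply only when a numeric value is present.
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        errors.append("latitude must be in [-90, 90]")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        errors.append("longitude must be in [-180, 180]")
    return errors


ok = {"latitude": 35.5, "longitude": -95.0, "precision": 0.0001,
      "globe": "http://www.wikidata.org/entity/Q2"}
print(coordinate_errors(ok))                        # []
print(coordinate_errors({**ok, "latitude": 95}))    # ['latitude must be in [-90, 90]']
```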
validate_commons_media(value)
Validates a Wikimedia Commons filename string. Must be a non-empty string.
from gkc.fermenter import validate_commons_media
result = validate_commons_media("Cherokee Nation seal.svg")
# result.valid → True
Value List Validation
validate_value_from_list(value, value_list_path, match_policy)
Validate a candidate item value against a cached value list JSON file. Follows an offline-first design: if the cache file is absent, returns an error rather than attempting live resolution.
from pathlib import Path
from gkc.fermenter import validate_value_from_list
result = validate_value_from_list(
    value="Q195562",
    value_list_path=Path("/path/to/SpiritSafe/cache/queries/Q4.json"),
    match_policy="strict",
)
print(result.valid) # True if Q195562 is in the cached list
# With fuzzy label matching
result = validate_value_from_list(
    value={"id": "Q195562", "label": "Cherokee Nation"},
    value_list_path=Path("/path/to/SpiritSafe/cache/queries/Q4.json"),
    match_policy="fuzzy",
)
Arguments:
| Argument | Type | Description |
|---|---|---|
| value | Any | A QID string or dict with an "id" key |
| value_list_path | Path | Path to the cached value list JSON file |
| match_policy | str | "strict" (QID exact match) or "fuzzy" (label fallback) |
Returns ValidationResult(valid=False, errors=["Value list cache unavailable: ..."]) when the cache file does not exist.
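The offline-first behavior can be sketched as follows. Several details here are assumptions, since this page does not describe them: the cache file is taken to be a JSON array of objects with "id" and "label" keys, the fuzzy fallback is reduced to a case-insensitive label comparison, and check_against_cache is a hypothetical name. The real cache layout and matching rules may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List, Tuple


def check_against_cache(value: Any, cache_path: Path,
                        match_policy: str = "strict") -> Tuple[bool, List[str]]:
    """Hypothetical helper: return (valid, errors) for a value against a cached list."""
    if not cache_path.exists():
        # Offline-first: report the missing cache, never fall back to a live query.
        return False, [f"Value list cache unavailable: {cache_path}"]
    # Assumed layout: [{"id": "Q...", "label": "..."}, ...]
    entries = json.loads(cache_path.read_text(encoding="utf-8"))
    qid = value.get("id") if isinstance(value, dict) else value
    if any(entry["id"] == qid for entry in entries):
        return True, []
    if match_policy == "fuzzy" and isinstance(value, dict):
        # Assumed fuzzy rule: case-insensitive label equality.
        label = str(value.get("label", "")).casefold()
        if label and any(entry.get("label", "").casefold() == label for entry in entries):
            return True, []
    return False, [f"{qid} not found in value list"]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "Q4.json"
    missing = check_against_cache("Q195562", cache)  # cache not written yet
    cache.write_text(json.dumps([{"id": "Q195562", "label": "Cherokee Nation"}]),
                     encoding="utf-8")
    strict_hit = check_against_cache("Q195562", cache)
    fuzzy_hit = check_against_cache({"id": "Q1", "label": "cherokee nation"}, cache, "fuzzy")

print(missing[0], strict_hit[0], fuzzy_hit[0])  # False True True
```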
Fixed Value Enforcement
enforce_fixed_value(user_value, fixed_value, statement_ref)
Enforce a profile-defined fixed value constraint on a statement.
- If user_value is None: injects the fixed value and emits an info notice
- If user_value matches fixed_value: accepts
- If user_value differs: rejects with an error notice
from gkc.fermenter import enforce_fixed_value
# Auto-injection when user provides nothing
result, notice = enforce_fixed_value(
    user_value=None,
    fixed_value="Q7840353",
    statement_ref="https://datadistillery.wikibase.cloud/entity/P5",
)
# result.valid → True, result.value → "Q7840353"
# notice.code → "fixed_value_injected"
# Violation
result, notice = enforce_fixed_value(
    user_value="Q9592",
    fixed_value="Q7840353",
    statement_ref="https://datadistillery.wikibase.cloud/entity/P5",
)
# result.valid → False
# notice.severity → "error", notice.code → "fixed_value_violation"
Returns (ValidationResult, ConformanceNotice | None). The notice is None when the user value matches the required fixed value exactly.
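The three branches reduce to a small decision function. The sketch below is a simplified stand-in, not the module's implementation: enforce_sketch is a hypothetical name, it returns a plain (valid, value, notice_code) tuple instead of the documented result and notice objects, and returning None as the value on a violation is an assumption.

```python
from typing import Any, Optional, Tuple


def enforce_sketch(user_value: Any, fixed_value: Any) -> Tuple[bool, Any, Optional[str]]:
    """Return (valid, value, notice_code) following the three documented branches."""
    if user_value is None:
        # Nothing supplied: inject the fixed value and flag it with an info code.
        return True, fixed_value, "fixed_value_injected"
    if user_value == fixed_value:
        # Exact match: accept silently, no notice.
        return True, user_value, None
    # Mismatch: reject. The value returned here is an assumption.
    return False, None, "fixed_value_violation"


print(enforce_sketch(None, "Q7840353"))        # (True, 'Q7840353', 'fixed_value_injected')
print(enforce_sketch("Q7840353", "Q7840353"))  # (True, 'Q7840353', None)
print(enforce_sketch("Q9592", "Q7840353"))     # (False, None, 'fixed_value_violation')
```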