ShEx Commands
Plain meaning: Validate RDF data against ShEx schemas.
Overview
The ShEx module in GKC validates RDF data against ShEx (Shape Expression) schemas. It's designed primarily for validating Wikidata entities against EntitySchemas but supports validation of local RDF files as well.
ShEx validation helps ensure data quality by checking that entities conform to expected patterns—required properties, value types, cardinality constraints, and more.
Current implementations:
- Wikidata entity validation (QID + EID)
- Local RDF file validation
- Mixed source validation (Wikidata entity + local schema, or local RDF + EntitySchema)
Future implementations: Batch validation, validation reports, schema coverage analysis
Validate RDF Data
gkc shex validate [options]
Validate RDF data (from Wikidata or local file) against a ShEx schema (from Wikidata EntitySchema or local file).
Source Options
RDF Data Source (choose one):
- --qid <QID>: Fetch RDF data for Wikidata item (e.g., Q42)
- --rdf-file <path>: Use local RDF file (Turtle format)
Schema Source (choose one):
- --eid <EID>: Fetch ShEx schema from Wikidata EntitySchema (e.g., E502)
- --schema-file <path>: Use local ShEx schema file
Other Options
- --user-agent <string>: Custom user agent for Wikidata API requests (recommended for bots)
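The "choose one" rule for each source can be checked before the CLI is invoked. The following is a minimal Python sketch that uses only the flags documented above; the helper name `build_validate_command` is hypothetical and is not part of GKC.

```python
from typing import List, Optional


def build_validate_command(
    qid: Optional[str] = None,
    rdf_file: Optional[str] = None,
    eid: Optional[str] = None,
    schema_file: Optional[str] = None,
    user_agent: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `gkc shex validate` (hypothetical helper)."""
    # Exactly one RDF data source must be given.
    if (qid is None) == (rdf_file is None):
        raise ValueError("choose exactly one of qid or rdf_file")
    # Exactly one schema source must be given.
    if (eid is None) == (schema_file is None):
        raise ValueError("choose exactly one of eid or schema_file")

    cmd = ["gkc", "shex", "validate"]
    cmd += ["--qid", qid] if qid is not None else ["--rdf-file", rdf_file]
    cmd += ["--eid", eid] if eid is not None else ["--schema-file", schema_file]
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    return cmd


print(build_validate_command(qid="Q42", eid="E502"))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with user agent strings that contain spaces or parentheses.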
Examples
Validate Wikidata entity against EntitySchema
gkc shex validate --qid Q14708404 --eid E502
Example output:
{
"command": "shex.validate",
"ok": true,
"message": "✓ Validation passed\nEntity: Q14708404\nSchema: E502",
"details": {
"entity": "Q14708404",
"entity_uri": "http://www.wikidata.org/entity/Q14708404",
"schema": "E502",
"valid": true
}
}
Validation failure
gkc shex validate --qid Q999999 --eid E502
Example output:
{
"command": "shex.validate",
"ok": false,
"message": "✗ Validation failed\nError: wdt:P31 value not in allowed set [wd:Q133346]\nEntity: Q999999\nSchema: E502",
"details": {
"entity": "Q999999",
"entity_uri": "http://www.wikidata.org/entity/Q999999",
"schema": "E502",
"valid": false,
"error_summary": "wdt:P31 value not in allowed set [wd:Q133346]"
}
}
Validate local RDF file
gkc shex validate --rdf-file entity.ttl --eid E502
Validates a locally stored Turtle file against Wikidata EntitySchema E502.
Validate Wikidata entity with local schema
gkc shex validate --qid Q42 --schema-file custom_schema.shex
Fetches entity data from Wikidata but uses a custom local ShEx schema.
Validate local files (fully offline)
gkc shex validate --rdf-file entity.ttl --schema-file schema.shex
Both RDF data and ShEx schema are loaded from local files—no network requests.
Custom user agent (for bots)
gkc shex validate \
--qid Q14708404 \
--eid E502 \
--user-agent "MyBot/1.0 (https://example.com; <contact-email>)"
Use a custom user agent string when fetching data from Wikidata APIs (recommended for automated tools).
Output Format
The gkc shex validate command outputs JSON with the following structure:
{
"command": "shex.validate",
"ok": true|false,
"message": "Human-readable summary with status",
"details": {
"entity": "Q14708404",
"entity_uri": "http://www.wikidata.org/entity/Q14708404",
"schema": "E502",
"valid": true|false,
"error_summary": "Brief error description (only on failure)",
"results": "Full validation results (available if needed)"
}
}
Fields:
- ok: Overall status (true when the command completed and validation passed; false otherwise, as in the failure example above)
- message: Human-readable summary of the validation result with entity/schema info
- details: Complete validation details including entity URI, schema reference, and error information
- details.valid: Validation result (true if entity conforms to schema, false if not)
- details.error_summary: Brief description of validation errors (present only when valid is false)
- details.results: Full validation results (available if needed)
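Because valid and error_summary sit inside details rather than at the top level, a consumer has to look in both places. The following is a minimal Python sketch of reading one result, assuming the structure shown above; `ValidationOutcome` and `parse_validate_output` are hypothetical names, and the sample string is abbreviated from the failure example earlier on this page.

```python
import json
from typing import NamedTuple, Optional


class ValidationOutcome(NamedTuple):
    ok: bool
    valid: Optional[bool]
    entity: Optional[str]
    schema: Optional[str]
    error_summary: Optional[str]


def parse_validate_output(raw: str) -> ValidationOutcome:
    """Extract the documented fields from one JSON result (hypothetical helper)."""
    data = json.loads(raw)
    # "valid" and "error_summary" are nested under "details".
    details = data.get("details") or {}
    return ValidationOutcome(
        ok=bool(data.get("ok")),
        valid=details.get("valid"),
        entity=details.get("entity"),
        schema=details.get("schema"),
        error_summary=details.get("error_summary"),
    )


# Abbreviated copy of the documented failure output.
sample = (
    '{"command": "shex.validate", "ok": false, "details": {'
    '"entity": "Q999999", "schema": "E502", "valid": false, '
    '"error_summary": "wdt:P31 value not in allowed set [wd:Q133346]"}}'
)
outcome = parse_validate_output(sample)
print(outcome.entity, outcome.valid, outcome.error_summary)
```

Missing fields come back as `None`, so a result without error_summary (the passing case) does not raise.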
Validation Workflow Examples
Pre-upload quality check
Before uploading new data to Wikidata, validate it locally:
# 1. Export new entity as RDF
cat > new_tribe.ttl << 'EOF'
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
wd:QNEW wdt:P31 wd:Q133346 ;
wdt:P17 wd:Q30 ;
wdt:P571 "1979-01-01"^^xsd:date .
EOF
# 2. Validate against tribe schema
gkc shex validate --rdf-file new_tribe.ttl --eid E502
# 3. If validation passes, proceed with upload
Batch validation of existing entities
# Create list of QIDs to validate
cat > tribes_to_check.txt << 'EOF'
Q14708404
Q3551781
Q1948829
Q4675719
EOF
# Validate each entity
while read -r qid; do
  echo "Checking $qid..."
  gkc shex validate --qid "$qid" --eid E502 --json >> validation_report.jsonl
done < tribes_to_check.txt
# List entities that failed validation (valid and entity are nested under details)
jq -r 'select(.details.valid == false) | .details.entity' validation_report.jsonl
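The same report can be summarized without jq. The following is a minimal Python sketch, assuming each line of validation_report.jsonl holds one complete JSON result with the structure described under Output Format; `failed_entities` is a hypothetical helper, and the sample lines are illustrative rather than real validation results.

```python
import json
from typing import Iterable, List, Tuple


def failed_entities(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (entity, error_summary) for each result whose details.valid is false."""
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        details = json.loads(line).get("details") or {}
        if details.get("valid") is False:
            failures.append(
                (details.get("entity", "?"), details.get("error_summary", ""))
            )
    return failures


# Illustrative report lines (not real validation results).
report_lines = [
    '{"command": "shex.validate", "ok": true, "details": '
    '{"entity": "Q14708404", "schema": "E502", "valid": true}}',
    '{"command": "shex.validate", "ok": false, "details": '
    '{"entity": "Q999999", "schema": "E502", "valid": false, '
    '"error_summary": "wdt:P31 value not in allowed set [wd:Q133346]"}}',
]
for entity, summary in failed_entities(report_lines):
    print(f"{entity}: {summary}")
```

To read a real report, pass an open file object instead of the sample list, since a text file iterates line by line.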
Testing schema changes
When developing new EntitySchemas, test them locally before publishing:
# 1. Create draft schema
cat > draft_schema.shex << 'EOF'
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<FederallyRecognizedTribe> {
wdt:P31 [ wd:Q133346 ] ;
wdt:P17 [ wd:Q30 ] ;
wdt:P571 xsd:date ? ;
wdt:P576 xsd:date ?
}
EOF
# 2. Test against known good entity
gkc shex validate --qid Q14708404 --schema-file draft_schema.shex
# 3. Test against known problematic entity
gkc shex validate --qid Q999999 --schema-file draft_schema.shex --verbose
# 4. If tests pass, publish schema to Wikidata as new EntitySchema
Comparing entities to multiple schemas
# Check if entity conforms to multiple related schemas
schemas=("E502" "E503" "E504")
for eid in "${schemas[@]}"; do
  result=$(gkc shex validate --qid Q14708404 --eid "$eid" --json)
  # valid is nested under details in the JSON output
  valid=$(echo "$result" | jq -r '.details.valid')
  echo "$eid: $valid"
done
Common EntitySchemas
| EID | Description | Use Case |
|---|---|---|
| E502 | Federally recognized tribe | US tribal nations |
| E49 | Human | Person/biographical items |
| E10 | Place | Geographic locations |
| E42 | Scholarly article | Scientific publications |
Browse all schemas: Wikidata:EntitySchema
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Validation passed (entity conforms to schema) |
| 1 | Validation failed (entity does not conform to schema) |
| 2 | Error during validation process (missing files, network error, etc.) |
Use exit codes in scripts:
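gkc shex validate --qid Q42 --eid E49 --json > /dev/null 2>&1
# Distinguish a failed validation (1) from a process error (2)
case $? in
  0) echo "Q42 is a valid human entity" ;;
  1) echo "Q42 has validation issues" ;;
  *) echo "Validation could not be completed" ;;
esac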
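The exit codes can be handled the same way from Python. The following is a minimal sketch using only the standard library; `describe_exit_code` and `run_validation` are hypothetical helpers, and the mapping simply restates the table above.

```python
import subprocess
from typing import List

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "passed",
    1: "failed",
    2: "error",
}


def describe_exit_code(code: int) -> str:
    """Map an exit code to a short status label; anything else is 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_validation(cmd: List[str]) -> str:
    """Run a validation command and classify its exit code."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit_code(completed.returncode)


for code in (0, 1, 2):
    print(code, describe_exit_code(code))
```

Calling `run_validation(["gkc", "shex", "validate", "--qid", "Q42", "--eid", "E49"])` would then return one of the three labels, keeping a non-conforming entity separate from a missing file or network error.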
See Also
- ShEx API Documentation - Python library usage
- Utilities Guide - ShEx validation concepts and patterns
- Wikidata EntitySchemas - Browse published schemas
- ShEx Primer - Learn ShEx syntax
- ShEx Test Suite - Additional validation examples