ShEx Validation API
Overview
The ShEx validation module provides RDF data validation against ShEx (Shape Expression) schemas. It's designed primarily for validating Wikidata entities against EntitySchemas but supports any RDF data and ShEx schema combination.
Current implementations: Wikidata EntitySchema validation, local file validation
Future implementations: Additional RDF graph sources, streaming validation
Quick Start
```python
from gkc import ShexValidator

# Validate Wikidata item against EntitySchema
validator = ShexValidator(qid='Q14708404', eid='E502')
result = validator.check()

if result.is_valid():
    print("✓ Validation passed")
else:
    print("✗ Validation failed")
```
Classes
ShexValidator
ShEx Validator: Validate RDF data against ShEx schemas.
Validates Wikidata entities or local RDF data against EntitySchemas (ShEx format). Supports multiple input sources: Wikidata entities, local files, or text strings.
Plain meaning: Check if data matches schema structure and rules.
Example
```python
# Validate a Wikidata item against an EntitySchema
validator = ShexValidator(qid='Q42', eid='E502')
result = validator.check()
if result.is_valid():
    print("Validation passed!")

# Use a local schema file
validator = ShexValidator(
    qid='Q42',
    schema_file='schema.shex'
)
validator.check()

# Use RDF text directly (my_rdf_data and my_schema are strings you supply)
validator = ShexValidator(
    rdf_text=my_rdf_data,
    schema_text=my_schema
)
validator.check()
```
Source code in gkc/shex.py
__init__(qid=None, eid=None, user_agent=None, schema_text=None, schema_file=None, rdf_text=None, rdf_file=None)
Initialize the ShEx validator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| qid | Optional[str] | Wikidata entity ID (e.g., 'Q42'). Optional if rdf_text or rdf_file is provided. | None |
| eid | Optional[str] | EntitySchema ID for a Wikidata schema (e.g., 'E502'). Optional if schema_text or schema_file is provided. | None |
| user_agent | Optional[str] | Custom user agent for Wikidata requests. | None |
| schema_text | Optional[str] | ShEx schema as a ShExC string (alternative to eid). | None |
| schema_file | Optional[str] | Path to a file containing the ShEx schema (alternative to eid). | None |
| rdf_text | Optional[str] | RDF data as a string (alternative to qid). | None |
| rdf_file | Optional[str] | Path to a file containing RDF data (alternative to qid). | None |
Source code in gkc/shex.py
__repr__()
String representation of validator.
Source code in gkc/shex.py
check()
Validate in one call: load the schema, load the RDF data, and evaluate the data against the schema.
This is the main entry point for validation. It loads the schema and data from the configured sources and then performs the validation.
Returns:
| Type | Description |
|---|---|
| ShexValidator | Self with results populated |
Example
```python
validator = ShexValidator(qid='Q42', eid='E502')
validator.check()
if validator.is_valid():
    print("Validation passed!")
```
Source code in gkc/shex.py
evaluate()
Evaluate RDF data against the ShEx schema specification.
load_specification() and load_rdf() must be called first; alternatively, use check(), which runs all three steps.
Returns:
| Type | Description |
|---|---|
| ShexValidator | Self with results populated |
Raises:
| Type | Description |
|---|---|
| ShexValidationError | If evaluation fails or the data has not been loaded |
Source code in gkc/shex.py
is_valid()
Check if validation passed.
Returns:
| Type | Description |
|---|---|
| bool | True if validation passed, False otherwise |
Raises:
| Type | Description |
|---|---|
| ShexValidationError | If validation has not been run yet (via check() or evaluate()) |
Source code in gkc/shex.py
load_rdf()
Load RDF data from configured source.
Tries sources in order: rdf_text, rdf_file, qid (from Wikidata).
Returns:
| Type | Description |
|---|---|
| ShexValidator | Self for method chaining |
Raises:
| Type | Description |
|---|---|
| ShexValidationError | If no valid RDF source is configured or loading fails |
Source code in gkc/shex.py
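load_rdf() tries its sources in a fixed order (rdf_text, then rdf_file, then qid), and load_specification() does the same with schema_text, schema_file, and eid. The sketch below illustrates that first-match-wins rule as a standalone function. It is not taken from gkc/shex.py; the name first_configured_source and the use of ValueError (rather than ShexValidationError) are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def first_configured_source(
    candidates: Sequence[Tuple[str, Optional[str]]],
) -> Tuple[str, str]:
    """Return the first (name, value) pair whose value is set.

    Hypothetical helper: mirrors the documented order in which
    load_rdf() and load_specification() consult their sources.
    """
    for name, value in candidates:
        if value is not None:
            return name, value
    names = ", ".join(name for name, _ in candidates)
    # Assumption: gkc raises ShexValidationError here; ValueError keeps this self-contained
    raise ValueError(f"No source provided. Specify one of: {names}.")


# RDF sources in the documented order: rdf_text, rdf_file, qid
rdf_source = first_configured_source(
    [("rdf_text", None), ("rdf_file", "entity.ttl"), ("qid", "Q42")]
)
print(rdf_source)  # ('rdf_file', 'entity.ttl')
```

Because rdf_file is set, it is chosen over qid even though both are provided; the same rule would pick schema_text over schema_file and eid.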
load_specification()
Load the ShEx schema specification from configured source.
Tries sources in order: schema_text, schema_file, eid (fetch from Wikidata).
Returns:
| Type | Description |
|---|---|
| ShexValidator | Self for method chaining |
Raises:
| Type | Description |
|---|---|
| ShexValidationError | If no valid schema source is configured or loading fails |
Source code in gkc/shex.py
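This documentation does not say which Wikidata endpoints gkc queries for qid and eid. As a point of reference only: Wikidata publicly serves an entity's RDF as Turtle from Special:EntityData, and an EntitySchema's raw ShExC text from Special:EntitySchemaText. The sketch below builds those URLs and checks the ID formats; treat it as an assumption about where such data can come from, not as gkc's implementation.

```python
import re

WIKIDATA_WIKI = "https://www.wikidata.org/wiki"


def entity_rdf_url(entity_id: str) -> str:
    """Turtle export of an entity via Special:EntityData (assumed source)."""
    # Items (Q), properties (P) and lexemes (L) share this endpoint
    if not re.fullmatch(r"[QPL][1-9][0-9]*", entity_id):
        raise ValueError(f"Not a Wikidata entity ID: {entity_id!r}")
    return f"{WIKIDATA_WIKI}/Special:EntityData/{entity_id}.ttl"


def entity_schema_url(eid: str) -> str:
    """Raw ShExC text of an EntitySchema via Special:EntitySchemaText (assumed source)."""
    if not re.fullmatch(r"E[1-9][0-9]*", eid):
        raise ValueError(f"Not an EntitySchema ID: {eid!r}")
    return f"{WIKIDATA_WIKI}/Special:EntitySchemaText/{eid}"


print(entity_rdf_url("Q42"))      # https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl
print(entity_schema_url("E502"))  # https://www.wikidata.org/wiki/Special:EntitySchemaText/E502
```

Fetching either URL with an HTTP client (ideally with a descriptive User-Agent) returns text of the kind that rdf_text or schema_text would otherwise supply.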
passes_inspection()
Check if validation passed (alias for is_valid).
Returns:
| Type | Description |
|---|---|
| bool | True if validation passed, False otherwise |
Raises:
| Type | Description |
|---|---|
| ShexValidationError | If validation has not been run yet (via check() or evaluate()) |
Source code in gkc/shex.py
Exceptions
ShexValidationError
Bases: Exception
Raised when ShEx validation encounters an error.
Source code in gkc/shex.py
Examples
Validating Wikidata Entities
Validate a Wikidata item against a published EntitySchema:
```python
from gkc import ShexValidator

# Validate federally recognized tribe (Q14708404) against tribe schema (E502)
validator = ShexValidator(qid='Q14708404', eid='E502')
result = validator.check()

if result.is_valid():
    print("✓ Item conforms to EntitySchema E502")
else:
    print("✗ Item does not conform:")
    for res in result.results:
        print(f"  - {res.reason}")
```
Validating Local Files
Validate local RDF data against a local ShEx schema:
```python
from gkc import ShexValidator

validator = ShexValidator(
    rdf_file='path/to/entity.ttl',
    schema_file='path/to/schema.shex'
)
result = validator.check()
print(f"Valid: {result.is_valid()}")
```
Mixed Sources
You can mix Wikidata and local sources:
```python
from gkc import ShexValidator

# Wikidata entity with local schema
validator = ShexValidator(
    qid='Q42',
    schema_file='custom_schema.shex'
)
result = validator.check()

# Local RDF data with Wikidata EntitySchema
validator = ShexValidator(
    rdf_file='new_entity.ttl',
    eid='E502'
)
result = validator.check()
```
Using Text Strings
For programmatically generated RDF or schemas:
```python
from gkc import ShexValidator

my_rdf = """
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
wd:Q42 wdt:P31 wd:Q5 .
"""

my_schema = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
<Human> {
  wdt:P31 [ wd:Q5 ]
}
"""

validator = ShexValidator(
    rdf_text=my_rdf,
    schema_text=my_schema
)
result = validator.check()
```
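When the RDF is generated in code, a small helper can assemble Turtle text in the same shape as my_rdf above. The helper below is hypothetical and not part of gkc; it only emits truthy (wdt:) statements whose subject and value are both Wikidata items, and it does no literal handling or escaping.

```python
from typing import Iterable, Tuple

PREFIXES = (
    "@prefix wd: <http://www.wikidata.org/entity/> .\n"
    "@prefix wdt: <http://www.wikidata.org/prop/direct/> .\n"
)


def item_statements_to_turtle(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Build Turtle for (subject QID, property PID, value QID) triples.

    Hypothetical helper: item-valued wdt: statements only.
    """
    lines = [f"wd:{s} wdt:{p} wd:{o} ." for s, p, o in triples]
    return PREFIXES + "\n".join(lines) + "\n"


my_rdf = item_statements_to_turtle([("Q42", "P31", "Q5")])
print(my_rdf)
```

The resulting string can be passed as rdf_text together with schema_text=my_schema, exactly as in the example above.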
Fluent API Pattern
Load and validate step-by-step:
```python
from gkc import ShexValidator

validator = ShexValidator(qid='Q42', eid='E502')

# Load schema (_schema and _rdf are private attributes, used here for inspection only)
validator.load_specification()
print(f"Schema loaded: {len(validator._schema)} characters")

# Load RDF data
validator.load_rdf()
print(f"RDF loaded: {len(validator._rdf)} characters")

# Perform validation
validator.evaluate()

# Check result
if validator.passes_inspection():
    print("✓ Validation passed!")
```
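Each loader returns the validator itself ("Self for method chaining"), so the steps can also be written as one chained expression, validator.load_specification().load_rdf().evaluate(). The self-contained stand-in below is not gkc code; it only models the documented call-order contract (evaluate() needs both loads, is_valid() needs an evaluation) so the pattern can be tried without network access. The class names, attributes, and the placeholder pass/fail rule are hypothetical.

```python
from typing import Optional


class ValidationStateError(Exception):
    """Stand-in for ShexValidationError in this sketch."""


class ToyValidator:
    """Hypothetical model of the documented load -> evaluate -> is_valid lifecycle."""

    def __init__(self, schema_text: str, rdf_text: str) -> None:
        self.schema_text = schema_text
        self.rdf_text = rdf_text
        self._schema: Optional[str] = None
        self._rdf: Optional[str] = None
        self._passed: Optional[bool] = None

    def load_specification(self) -> "ToyValidator":
        self._schema = self.schema_text
        return self  # returning self is what enables chaining

    def load_rdf(self) -> "ToyValidator":
        self._rdf = self.rdf_text
        return self

    def evaluate(self) -> "ToyValidator":
        if self._schema is None or self._rdf is None:
            raise ValidationStateError("Call load_specification() and load_rdf() first.")
        # Placeholder outcome: a real ShEx engine would match the RDF
        # against the schema's shapes here.
        self._passed = bool(self._rdf.strip())
        return self

    def check(self) -> "ToyValidator":
        # The same three steps as the fluent example, chained
        return self.load_specification().load_rdf().evaluate()

    def is_valid(self) -> bool:
        if self._passed is None:
            raise ValidationStateError("Run check() or evaluate() first.")
        return self._passed


print(ToyValidator("<Human> {}", "wd:Q42 wdt:P31 wd:Q5 .").check().is_valid())  # True
```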
Custom User Agent
When fetching from Wikidata, use a custom user agent:
```python
from gkc import ShexValidator

validator = ShexValidator(
    qid='Q42',
    eid='E502',
    user_agent='MyBot/1.0 (https://example.com)'
)
result = validator.check()
```
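Wikimedia's User-Agent policy asks automated clients to identify themselves with a tool name, a version, and contact information (a URL or an email address). The hypothetical helper below, which is not part of gkc, assembles a string in that name/version (contact) form for the user_agent parameter.

```python
def build_user_agent(tool: str, version: str, *contacts: str) -> str:
    """Return '<tool>/<version> (<contact>; <contact>...)'.

    Hypothetical helper: requires at least one contact (URL or email),
    following the spirit of Wikimedia's User-Agent policy.
    """
    if not contacts:
        raise ValueError("Provide at least one contact: a URL or an email address.")
    if "/" in tool or " " in tool:
        raise ValueError("Tool name should not contain '/' or spaces.")
    return f"{tool}/{version} ({'; '.join(contacts)})"


ua = build_user_agent("MyBot", "1.0", "https://example.com")
print(ua)  # MyBot/1.0 (https://example.com)
```

The result can then be passed as ShexValidator(qid='Q42', eid='E502', user_agent=ua).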
Pre-Upload Quality Check
Validate data before uploading to Wikidata:
```python
from gkc import ShexValidator, ShexValidationError


def validate_before_upload(rdf_data: str, target_schema: str) -> bool:
    """Validate RDF data against a target EntitySchema."""
    try:
        validator = ShexValidator(
            rdf_text=rdf_data,
            eid=target_schema
        )
        result = validator.check()
        if result.is_valid():
            return True
        # Log validation errors
        for res in result.results:
            print(f"Validation error: {res.reason}")
        return False
    except ShexValidationError as e:
        print(f"Validation failed: {e}")
        return False


# Use in an upload workflow.
# my_data (your RDF string) and upload_to_wikidata() (your upload step)
# are placeholders, not part of gkc.
if validate_before_upload(my_data, 'E502'):
    upload_to_wikidata(my_data)
else:
    print("Fix validation errors before uploading")
```
Batch Validation
Validate multiple entities:
```python
from gkc import ShexValidator

entities_to_validate = [
    ('Q14708404', 'Wanapum'),
    ('Q3551781', 'Umatilla'),
    ('Q1948829', 'Muckleshoot')
]
schema_id = 'E502'  # Federally recognized tribe schema

results = {}
for qid, name in entities_to_validate:
    validator = ShexValidator(qid=qid, eid=schema_id)
    validator.check()
    results[name] = validator.is_valid()

# Report
for name, valid in results.items():
    status = "✓" if valid else "✗"
    print(f"{status} {name}")
```
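In the loop above, a ShexValidationError raised for one QID (for example, a failed download) stops the whole batch. One way to keep going is to record errors per item, as in the sketch below. To stay runnable without network access it takes the validation step as a callable; run_batch and fake_validate are hypothetical, and the fake's outcomes are invented. With gkc, the callable could construct ShexValidator(qid=qid, eid=schema_id), call check(), and return is_valid(), with expected_errors=(ShexValidationError,).

```python
from typing import Callable, Dict, Iterable, Tuple


def run_batch(
    qids: Iterable[str],
    validate_one: Callable[[str], bool],
    expected_errors: Tuple[type, ...] = (Exception,),
) -> Tuple[Dict[str, bool], Dict[str, str]]:
    """Validate each QID independently; collect outcomes and errors separately."""
    outcomes: Dict[str, bool] = {}
    errors: Dict[str, str] = {}
    for qid in qids:
        try:
            outcomes[qid] = validate_one(qid)
        except expected_errors as exc:
            # One failing item is recorded instead of aborting the batch
            errors[qid] = str(exc)
    return outcomes, errors


def fake_validate(qid: str) -> bool:
    # Hypothetical stand-in for a real ShEx check
    if qid == "Q0":
        raise RuntimeError("could not load RDF")
    return qid != "Q2"


outcomes, errors = run_batch(["Q1", "Q2", "Q0"], fake_validate)
print(outcomes)  # {'Q1': True, 'Q2': False}
print(errors)    # {'Q0': 'could not load RDF'}
```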
Error Handling
Common Error Patterns
```python
from gkc.shex import ShexValidationError, ShexValidator

try:
    validator = ShexValidator(qid='Q42', eid='E502')
    result = validator.check()

    if not result.is_valid():
        # Parse validation errors
        for res in result.results:
            reason = res.reason or ""
            if "not in value set" in reason:
                print(f"Value constraint violation: {reason}")
            elif "does not match" in reason:
                print(f"Format/type mismatch: {reason}")
            elif "Constraint violation" in reason:
                print(f"Cardinality or requirement error: {reason}")
            else:
                print(f"Other validation error: {reason}")
except ShexValidationError as e:
    print(f"Validation process failed: {e}")
```
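The branching above can be factored into a pure function, which makes it easy to unit-test and reuse in logging. The substrings are the ones used in the example; this documentation does not guarantee that the validator's messages contain them, so treat the categories as heuristics. classify_reason is a hypothetical name, and the sample reason strings below are made up.

```python
from typing import Optional


def classify_reason(reason: Optional[str]) -> str:
    """Map a failure reason to a coarse category (heuristic, substring-based)."""
    text = reason or ""  # reasons may be None, as in the example above
    if "not in value set" in text:
        return "value constraint violation"
    if "does not match" in text:
        return "format/type mismatch"
    if "Constraint violation" in text:
        return "cardinality or requirement error"
    return "other"


print(classify_reason("value wd:Q1 not in value set"))  # value constraint violation
print(classify_reason(None))  # other
```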
Handling Missing Sources
```python
from gkc.shex import ShexValidationError, ShexValidator

try:
    validator = ShexValidator()  # No sources provided
    validator.check()
except ShexValidationError as e:
    print(f"Error: {e}")
    # Output: "No schema source provided. Specify eid, schema_text, or schema_file."
```
See Also
- ShEx CLI Documentation - Command-line interface
- Utilities Guide - General ShEx validation guide
- Wikidata EntitySchemas - Wikidata's schema namespace
- ShEx Primer - Learn ShEx syntax