SPARQL Query Utilities for GKC

A comprehensive SPARQL query utility module for the Global Knowledge Commons (GKC) project, providing a clean Pythonic interface for querying Wikidata and other SPARQL endpoints.

Quick Start

from gkc import SPARQLQuery

# Create executor
executor = SPARQLQuery()

# Execute query
results = executor.query("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
    LIMIT 10
""")

# Convert to list of plain Python dicts (default shape)
rows = executor.to_dict_list("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
    LIMIT 10
""")

# Export to CSV
executor.to_csv("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 10", filepath="results.csv")

Features

  • Multiple Input Formats: Raw SPARQL or Wikidata Query Service URLs
  • Multiple Output Formats: JSON, dictionary lists, CSV, and optional DataFrames
  • Flexible Configuration: Custom endpoints, timeouts, user agents
  • Robust Error Handling: Comprehensive error messages and exception handling
  • Optional Pandas Support: Works with or without pandas
  • Type Hints: Full type annotations for IDE support
  • Comprehensive Tests: 24 test cases with 83% coverage
  • Complete Documentation: Full API reference and examples

Installation

The SPARQL module is included in GKC. For optional pandas support:

pip install pandas

Usage

Basic Query

from gkc import SPARQLQuery

executor = SPARQLQuery()
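# LIMIT keeps the result set small; an unbounded query over all humans (Q5)
# is likely to time out on the public endpoint.
results = executor.query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10")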

Query from Wikidata URL

# Share queries as Wikidata URLs
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
results = executor.query(url)  # Automatically extracts and executes
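
To illustrate what extracting a query from a URL involves, here is a minimal standalone sketch using only the standard library. It is not the module's own `parse_wikidata_query_url` implementation; the helper name `extract_query_from_url` is hypothetical. It assumes shareable Wikidata Query Service links carry the percent-encoded query after the `#`, and that direct endpoint links carry it in a `query=` parameter.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def extract_query_from_url(url: str) -> str:
    """Hypothetical helper: pull the SPARQL text out of a query-service URL."""
    parts = urlsplit(url)
    if parts.fragment:
        # Shareable links keep the percent-encoded query after the '#'.
        return unquote(parts.fragment)
    # Direct endpoint links carry the query in a `query=` parameter instead.
    values = parse_qs(parts.query).get("query", [])
    if not values:
        raise ValueError(f"No SPARQL query found in URL: {url}")
    return values[0]


shared = (
    "https://query.wikidata.org/"
    "#SELECT%20%3Fitem%20WHERE%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ146%20%7D"
)
print(extract_query_from_url(shared))
```

Raising `ValueError` for a URL without a query is a choice made for this sketch; the module itself may report the problem differently, for example through `SPARQLError`.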

Convert to Dictionary List (Default)

rows = executor.to_dict_list(query)
for row in rows[:5]:
    print(row)
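
For readers curious about what a "list of plain dicts" looks like relative to the raw response, the sketch below flattens a standard SPARQL 1.1 JSON result into one dict per row. It is an illustration under assumptions, not the module's code: the helper name `bindings_to_rows` is hypothetical, the sample payload is made up for the example, and the exact keys and value types returned by `to_dict_list` may differ.

```python
from typing import Any, Dict, List


def bindings_to_rows(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Hypothetical helper: flatten SPARQL JSON bindings into plain dicts."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # Each cell looks like {"type": "...", "value": "..."}; keep the value.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Illustrative payload in the SPARQL 1.1 Query Results JSON shape.
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q146"},
                "itemLabel": {"type": "literal", "xml:lang": "en", "value": "house cat"},
            },
            # Unbound variables are simply absent from a binding.
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5"}},
        ]
    },
}

for row in bindings_to_rows(sample):
    print(row.get("item"), row.get("itemLabel"))
```

Because unbound variables are omitted from a binding, using `row.get(...)` rather than `row[...]` avoids a `KeyError` on sparse results.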

Convert to DataFrame (Optional)

df = executor.to_dataframe(query)
print(df.head())

Export to CSV

executor.to_csv(query, filepath="results.csv")
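
If the rows are already in memory, the same kind of export can be approximated with the standard `csv` module. This is a sketch under assumptions rather than a description of how `to_csv` works internally; the helper name `rows_to_csv` is hypothetical, and it returns the CSV text instead of writing a file.

```python
import csv
import io
from typing import Dict, List, Optional


def rows_to_csv(rows: List[Dict[str, str]], fieldnames: Optional[List[str]] = None) -> str:
    """Hypothetical helper: serialize row dicts to CSV text."""
    if fieldnames is None:
        # Collect column names in first-seen order; rows may omit unbound variables.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_rows = [
    {"item": "Q146", "itemLabel": "house cat"},
    {"item": "Q5"},  # missing label becomes an empty cell
]
print(rows_to_csv(example_rows))
```

To write a file instead, the returned text can be saved with `open("results.csv", "w", newline="", encoding="utf-8")`.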

Custom Endpoints

executor = SPARQLQuery(
    endpoint="https://dbpedia.org/sparql",
    timeout=60
)
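
For context on what pointing at a different endpoint means at the HTTP level, the sketch below builds (but does not send) a SPARQL Protocol GET request with the standard library. It is an assumption-laden illustration, not the module's implementation: the helper name `build_sparql_request` and the default user agent string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(
    query: str,
    endpoint: str = "https://query.wikidata.org/sparql",
    user_agent: str = "gkc-docs-example/0.1",  # hypothetical value
) -> Request:
    """Hypothetical helper: prepare a SPARQL Protocol GET request."""
    # The SPARQL 1.1 Protocol passes the query text in a `query` parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {
        # Ask for the standard JSON results format.
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers)


request = build_sparql_request(
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
    endpoint="https://dbpedia.org/sparql",
)
print(request.full_url)
# Sending it would look like: urllib.request.urlopen(request, timeout=60)
```

Public endpoints such as Wikidata's ask clients to send a descriptive user agent, which is one reason a configurable user agent is useful alongside the endpoint and timeout.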

API Reference

Classes

SPARQLQuery

Main class for executing SPARQL queries.

Methods:

  • query(query, format='json', raw=False) - Execute a query
  • to_dict_list(query) - Convert results to a list of dicts (recommended default)
  • to_dataframe(query) - Convert results to a DataFrame (requires pandas)
  • to_csv(query, filepath=None) - Export results to CSV
  • parse_wikidata_query_url(url) - Extract the query from a URL (static)
  • normalize_query(query) - Normalize a query string (static)

SPARQLError

Custom exception for SPARQL query errors.

Functions

execute_sparql(query, endpoint=..., format='json')

Convenience function that executes a single query without creating a SPARQLQuery instance first.

execute_sparql_to_dataframe(query, endpoint=...)

Convenience function that executes a query and returns a DataFrame (requires pandas).

Documentation

Examples

Example 1: Find Sovereign States with Large Populations

from gkc import SPARQLQuery

executor = SPARQLQuery()

query = """
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
  FILTER(?population > 5000000)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?population)
LIMIT 10
"""

results = executor.to_dict_list(query)
for row in results:
    print(f"{row['itemLabel']}: {row['population']}")

Example 2: Data Analysis Without Pandas

from gkc import SPARQLQuery

executor = SPARQLQuery()
rows = executor.to_dict_list("""
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
}
""")

rows_sorted = sorted(
    rows,
    key=lambda r: int(r.get("population", "0")),
    reverse=True,
)
print(rows_sorted[:10])

Example 3: Data Analysis With Optional DataFrame

from gkc import execute_sparql_to_dataframe

df = execute_sparql_to_dataframe("""
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
}
""")

# Analyze with pandas
top_10 = df.nlargest(10, 'population')
print(top_10)

Testing

Run tests with:

pytest tests/test_sparql.py -v

Results:

  • 22 tests passed
  • 2 tests skipped (pandas optional)
  • 83% code coverage

License

MIT License - See LICENSE file for details

Author

GKC Contributors