Mash Commands

Plain meaning: Load source data as ingredients for further actions.

Overview

The mash module in GKC loads source data, both its content and its structure, so that it can be processed through data distillery workflows. The mash CLI loads Wikidata entities (items by QID, properties by PID, EntitySchemas by EID) as well as Wikipedia templates.

The name "mash" comes from the distillery metaphor: just as milled grain is steeped to extract fermentable sugars, mashing extracts the essential structure and content from source data, readying the ingredients for further processing.

Current implementations:

  • Wikidata items (QID)
  • Wikidata properties (PID)
  • Wikidata EntitySchemas (EID)
  • Wikipedia templates

Future implementations: CSV files, JSON APIs, dataframes


Load Wikidata Items by QID

gkc mash qid <QID> [options]

Load one or more Wikidata items by QID and output them in various formats.

Arguments

  • qid: Positional argument for a single item ID (e.g., Q42)
  • --qid <QID>: Repeatable flag for multiple items (e.g., --qid Q42 --qid Q5)
  • --qid-list <file>: Path to file containing item IDs (one per line)

Output Options

  • -o, --output <file>: Write output to file instead of stdout
  • --raw: Output raw JSON to stdout (default behavior when no transform specified)
  • --summary: Output a summary of the item(s) with labels, descriptions, and statement count
  • --transform <type>: Transform the output. Supported types:
      • shell: Strip all identifiers for new item creation
      • qsv1: Convert to QuickStatements V1 format
      • gkc_entity_profile: Convert to GKC Entity Profile (not yet implemented)

Filtering Options

  • --include-properties <P1,P2,...>: Comma-separated list of properties to include
  • --exclude-properties <P1,P2,...>: Comma-separated list of properties to exclude
  • --exclude-qualifiers: Omit all qualifiers from output
  • --exclude-references: Omit all references from output
  • --no-entity-labels: Skip fetching entity labels for QuickStatements comments (faster)

Examples

Load a single item and display summary

gkc mash qid Q42 --summary

Output: JSON summary with labels, descriptions, and statement count
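
As an illustration of what such a summary contains, the sketch below derives labels, descriptions, and a statement count from raw Wikidata item JSON. It is not GKC's implementation: the function name summarize_item and the output field names are hypothetical, and the sample entity is a trimmed stand-in for real API output.

```python
# Hypothetical sketch: derive a summary from raw Wikidata item JSON.
# Function and output field names are assumptions, not GKC's actual schema.

def summarize_item(entity: dict, lang: str = "en") -> dict:
    """Return label, description, and statement count for one item."""
    claims = entity.get("claims", {})
    return {
        "id": entity.get("id"),
        "label": entity.get("labels", {}).get(lang, {}).get("value"),
        "description": entity.get("descriptions", {}).get(lang, {}).get("value"),
        # Each property maps to a list of statements; count them all.
        "statement_count": sum(len(stmts) for stmts in claims.values()),
    }


# Trimmed sample in the shape of the Wikidata entity JSON.
sample_item = {
    "id": "Q42",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
    "descriptions": {"en": {"language": "en", "value": "English writer"}},
    "claims": {
        "P31": [{"mainsnak": {"property": "P31"}}],
        "P106": [{"mainsnak": {"property": "P106"}}, {"mainsnak": {"property": "P106"}}],
    },
}

item_summary = summarize_item(sample_item)
print(item_summary)
```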

Load a single item (raw JSON)

gkc mash qid Q42 --raw

Output: Raw Wikidata JSON for item Q42

Load multiple items

# Using repeatable --qid flags
gkc mash qid --qid Q42 --qid Q5 --qid Q30

# Using a file list
echo "Q42
Q5
Q30" > items.txt
gkc mash qid --qid-list items.txt

Transform to shell for new item creation

gkc mash qid Q42 --transform shell -o new_item_template.json

Strips all identifiers (id, pageid, ns, title, statement IDs, hashes) to create a clean template for submitting as a new item.

Transform to QuickStatements

# For editing an existing item
gkc mash qid Q42 --transform qsv1

# Extract specific properties only
gkc mash qid Q42 --transform qsv1 --include-properties P31,P21,P569

Filter properties and save

gkc mash qid Q42 \
  --exclude-properties P18,P373 \
  --exclude-qualifiers \
  --exclude-references \
  -o filtered_item.json
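
The effect of these filtering flags can be pictured with a small Python sketch that operates on raw entity JSON. This is an assumption-laden illustration rather than GKC's code: filter_claims and its parameters are hypothetical names, and the sample data is abbreviated.

```python
# Hypothetical sketch of property/qualifier/reference filtering.
# filter_claims and its parameters are illustrative names, not GKC's API.
from copy import deepcopy


def filter_claims(entity, include=None, exclude=None,
                  drop_qualifiers=False, drop_references=False):
    """Return a filtered copy of the entity; the input is left untouched."""
    result = deepcopy(entity)
    kept = {}
    for pid, statements in result.get("claims", {}).items():
        if include is not None and pid not in include:
            continue
        if exclude is not None and pid in exclude:
            continue
        for statement in statements:
            if drop_qualifiers:
                statement.pop("qualifiers", None)
                statement.pop("qualifiers-order", None)
            if drop_references:
                statement.pop("references", None)
        kept[pid] = statements
    result["claims"] = kept
    return result


sample_entity = {
    "id": "Q42",
    "claims": {
        "P18": [{"mainsnak": {"property": "P18"}}],
        "P31": [{
            "mainsnak": {"property": "P31"},
            "qualifiers": {"P580": []},
            "qualifiers-order": ["P580"],
            "references": [{"snaks": {}}],
        }],
        "P373": [{"mainsnak": {"property": "P373"}}],
    },
}

filtered = filter_claims(sample_entity, exclude={"P18", "P373"},
                         drop_qualifiers=True, drop_references=True)
print(list(filtered["claims"]))
```

Working on a deep copy keeps the loaded entity intact, so the same source data can be filtered several ways in one run.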

Load Wikidata Properties by PID

gkc mash pid <PID> [options]

Load one or more Wikidata properties by PID and output them in various formats.

Arguments

  • pid: Positional argument for a single property ID (e.g., P31)
  • --pid <PID>: Repeatable flag for multiple properties (e.g., --pid P31 --pid P279)
  • --pid-list <file>: Path to file containing property IDs (one per line)

Output Options

  • -o, --output <file>: Write output to file instead of stdout
  • --raw: Output raw JSON to stdout (default behavior)
  • --summary: Output a summary of the property(ies) with labels, descriptions, and datatype
  • --transform <type>: Transform the output. Supported types:
      • shell: Strip all identifiers for new property creation
      • gkc_entity_profile: Convert to GKC Entity Profile (not yet implemented)

Examples

Load a single property and display summary

gkc mash pid P31 --summary

Output: JSON summary with labels, descriptions, datatype, and formatter URL

Load a single property (raw JSON)

gkc mash pid P31 --raw

Output: Raw Wikidata JSON for property P31

Load multiple properties

# Using repeatable --pid flags
gkc mash pid --pid P31 --pid P279 --pid P21

# Using a file list
echo "P31
P279
P21" > properties.txt
gkc mash pid --pid-list properties.txt

Transform to shell for new property creation

gkc mash pid P31 --transform shell -o new_property_template.json

Load Wikidata EntitySchemas by EID

gkc mash eid <EID> [options]

Load a Wikidata EntitySchema by EID and output it in various formats.

Arguments

  • eid: The EntitySchema ID (e.g., E502)

Output Options

  • -o, --output <file>: Write output to file instead of stdout
  • --raw: Output raw JSON to stdout (default behavior)
  • --summary: Output a summary of the EntitySchema with labels and descriptions
  • --transform <type>: Transform the output. Supported types:
      • shell: Strip all identifiers for new EntitySchema creation
      • gkc_entity_profile: Convert to GKC Entity Profile

Examples

Load an EntitySchema and display summary

gkc mash eid E502 --summary

Output: JSON summary with labels, descriptions, and schema text length

Load an EntitySchema (raw JSON)

gkc mash eid E502 --raw

Output: Raw EntitySchema JSON including labels, descriptions, and ShEx schema text

Transform to GKC Entity Profile

gkc mash eid E502 --transform gkc_entity_profile -o tribe_profile.json

Converts the EntitySchema's ShEx specification into a GKC Entity Profile that can be used for data validation and transformation.

Transform to shell for reuse

gkc mash eid E502 --transform shell -o new_schema_template.json

Load Wikipedia Templates

gkc mash wp_template <TEMPLATE_NAME> [options]

Load a Wikipedia template from en.wikipedia.org and output it in various formats.

Arguments

  • template_name: The Wikipedia template name (e.g., Infobox_settlement)

Output Options

  • -o, --output <file>: Write output to file instead of stdout
  • --raw: Output raw JSON response from the Wikimedia API
  • Default (no flags): Output summary of the template with title, description, and parameter count

Examples

Load a Wikipedia template and display summary

gkc mash wp_template Infobox_settlement

Output:

{
  "title": "Infobox settlement",
  "description": "An infobox used to summarize information about places or geographic entities",
  "param_count": 47
}

Get the full template structure

gkc mash wp_template Infobox_settlement --raw -o settlement_template.json

Output: Full JSON structure including title, description (multilingual), params, and paramOrder

Explore template parameters

gkc mash wp_template Infobox_settlement --raw | jq '.paramOrder[:10]'

Lists the first 10 parameters in order, useful for understanding template structure.
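
The same exploration can be done in Python once the raw output is saved. The sketch below assumes only the top-level keys named above (title, description, params, paramOrder). The helper names are hypothetical, and the sample parameter names are illustrative rather than a complete listing of the template.

```python
# Hypothetical sketch: summarize a raw template structure and list parameters.
# Assumes the raw JSON has title, description (keyed by language), params,
# and paramOrder, as described above. Helper names are illustrative.

def summarize_template(raw: dict, lang: str = "en") -> dict:
    """Build a title/description/param_count summary from raw template JSON."""
    description = raw.get("description")
    if isinstance(description, dict):  # multilingual description
        description = description.get(lang)
    return {
        "title": raw.get("title"),
        "description": description,
        "param_count": len(raw.get("params", {})),
    }


def first_params(raw: dict, n: int = 10) -> list:
    """Equivalent of jq '.paramOrder[:n]'."""
    return raw.get("paramOrder", [])[:n]


# Abbreviated sample; a real template has many more parameters.
sample_template = {
    "title": "Infobox settlement",
    "description": {
        "en": "An infobox used to summarize information about places or geographic entities"
    },
    "params": {"name": {}, "official_name": {}, "settlement_type": {}},
    "paramOrder": ["name", "official_name", "settlement_type"],
}

template_summary = summarize_template(sample_template)
print(template_summary)
print(first_params(sample_template, 2))
```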



Batch Processing Patterns

Load multiple items from a file

# Create a file with QIDs
cat > items.txt <<EOF_MARKER
Q42
Q5
Q30
# Comments are ignored
Q515
EOF_MARKER

# Process all items
gkc mash qid --qid-list items.txt -o batch_items.json
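
For readers preparing list files programmatically, the following sketch shows one way such a file could be parsed: one ID per line, with blank lines and lines starting with # skipped. The ID validation step is an added assumption, not documented GKC behavior, and parse_id_list is a hypothetical name.

```python
# Hypothetical sketch of reading an ID list: one ID per line, '#' comments
# and blank lines skipped. The validation regex is an assumption added for
# illustration; GKC's own parsing rules may differ.
import re

ID_PATTERN = re.compile(r"^[QPE][1-9]\d*$")


def parse_id_list(text: str) -> list:
    """Return the entity IDs found in the text, in file order."""
    ids = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if not ID_PATTERN.match(entry):
            raise ValueError(f"not a valid entity ID: {entry!r}")
        ids.append(entry)
    return ids


items_txt = "Q42\nQ5\nQ30\n# Comments are ignored\nQ515\n"
parsed_ids = parse_id_list(items_txt)
print(parsed_ids)
```

Checking IDs before calling the CLI surfaces typos early, before any network request is made.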

Transform multiple items to QuickStatements

# Load multiple items and convert to QS for batch editing
gkc mash qid --qid Q42 --qid Q5 --transform qsv1 -o batch_statements.qs

Property metadata extraction

# Extract metadata for a set of properties
cat > props.txt <<EOF_MARKER
P31
P279
P21
P569
P570
EOF_MARKER

gkc mash pid --pid-list props.txt -o property_metadata.json

Output Formats

Raw JSON (default)

The raw Wikidata entity JSON as returned by the API. This format preserves all structure and identifiers, making it suitable for:

  • Round-trip processing
  • Integration with other tools
  • Detailed inspection

Shell (--transform shell)

Strips all system identifiers and metadata:

  • Removes: id, pageid, lastrevid, modified, ns, title
  • Removes: statement IDs (GUIDs)
  • Removes: all hashes from snaks, qualifiers, and references

Use this when creating templates for new entity creation on Wikidata or Wikibase instances.
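
To make the stripping rules concrete, here is a minimal Python sketch applied to raw entity JSON. It is an approximation under stated assumptions, not GKC's implementation: to_shell is a hypothetical name, and the sketch removes every key named "hash" wherever it appears, which is assumed to be safe for standard Wikidata JSON.

```python
# Hypothetical sketch of a "shell" transform on raw Wikidata entity JSON.
# Names are illustrative; this approximates the rules described above.
from copy import deepcopy

TOP_LEVEL_KEYS = ("id", "pageid", "lastrevid", "modified", "ns", "title")


def _strip_hashes(node):
    """Recursively delete 'hash' keys from snaks, qualifiers, and references."""
    if isinstance(node, dict):
        node.pop("hash", None)
        for value in node.values():
            _strip_hashes(value)
    elif isinstance(node, list):
        for item in node:
            _strip_hashes(item)


def to_shell(entity: dict) -> dict:
    """Return a copy of the entity without system identifiers."""
    shell = deepcopy(entity)
    for key in TOP_LEVEL_KEYS:
        shell.pop(key, None)
    for statements in shell.get("claims", {}).values():
        for statement in statements:
            statement.pop("id", None)  # statement GUID only, not value IDs
    _strip_hashes(shell)
    return shell


raw_entity = {
    "id": "Q42", "pageid": 138, "ns": 0, "title": "Q42",
    "lastrevid": 1, "modified": "2024-01-01T00:00:00Z",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
    "claims": {"P31": [{
        "id": "Q42$example-guid",
        "mainsnak": {"hash": "abc", "property": "P31", "datavalue": {
            "type": "wikibase-entityid", "value": {"id": "Q5"}}},
        "references": [{"hash": "def", "snaks": {"P248": [{"hash": "ghi"}]}}],
    }]},
}

shell_entity = to_shell(raw_entity)
print(shell_entity)
```

Note that the id inside a statement's value (Q5 here) is kept: it is the data being asserted, not a system identifier.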

QuickStatements V1 (--transform qsv1, items only)

Converts item data to QuickStatements V1 format for bulk operations:

  • Format: QID|property|value
  • Includes property labels as comments for readability
  • Use the output with QuickStatements for batch editing
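
As a rough picture of the QID|property|value layout, the sketch below emits one line per statement whose value is another entity. It is deliberately partial and hypothetical: to_qsv1_lines is an illustrative name, other datatypes (strings, dates, quantities) are skipped, and the label comments mentioned above are omitted.

```python
# Hypothetical sketch: emit QID|property|value lines for item-valued
# statements only. Other datatypes and label comments are left out.

def to_qsv1_lines(entity: dict) -> list:
    """Return QuickStatements-style lines for entity-valued statements."""
    qid = entity["id"]
    lines = []
    for pid, statements in entity.get("claims", {}).items():
        for statement in statements:
            datavalue = statement.get("mainsnak", {}).get("datavalue", {})
            if datavalue.get("type") != "wikibase-entityid":
                continue  # this sketch handles entity values only
            lines.append(f"{qid}|{pid}|{datavalue['value']['id']}")
    return lines


qs_entity = {
    "id": "Q42",
    "claims": {
        "P31": [{"mainsnak": {"datavalue": {
            "type": "wikibase-entityid", "value": {"id": "Q5"}}}}],
        "P21": [{"mainsnak": {"datavalue": {
            "type": "wikibase-entityid", "value": {"id": "Q6581097"}}}}],
        "P373": [{"mainsnak": {"datavalue": {
            "type": "string", "value": "Douglas Adams"}}}],
    },
}

qs_lines = to_qsv1_lines(qs_entity)
print("\n".join(qs_lines))
```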

GKC Entity Profile (--transform gkc_entity_profile, EntitySchemas only)

Converts EntitySchemas to GKC Entity Profiles:

  • Extracts properties and constraints from ShEx
  • Creates portable profiles for validation and transformation
  • Currently only implemented for EntitySchemas


Common Workflows

Creating Similar Items

  1. Find an exemplar item on Wikidata (e.g., Q42)
  2. Load with shell transform: gkc mash qid Q42 --transform shell -o template.json
  3. Edit the template JSON to modify labels/values
  4. Submit to Wikidata using the wbeditentity API or QuickStatements
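
For step 4, the sketch below shows how the request parameters for a wbeditentity call might be assembled from an edited template. It only builds the parameters and sends nothing. The helper name is hypothetical, the token value is a placeholder, and obtaining a CSRF token through an authenticated session is outside the scope of this sketch.

```python
# Hypothetical sketch: assemble wbeditentity parameters for a new item.
# Nothing is sent; an authenticated session and a real CSRF token are
# required to POST these to a Wikibase api.php endpoint.
import json


def build_new_item_request(template: dict, csrf_token: str) -> dict:
    """Return form parameters for creating a new item from a shell template."""
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(template),  # entity JSON is passed as a string
        "token": csrf_token,
        "format": "json",
    }


edited_template = {
    "labels": {"en": {"language": "en", "value": "Example person"}},
    "descriptions": {"en": {"language": "en", "value": "placeholder description"}},
}

request_params = build_new_item_request(edited_template, "PLACEHOLDER_TOKEN")
print(request_params["action"], request_params["new"])
```

Trying the request against a test Wikibase instance first is a sensible precaution before creating items on Wikidata itself.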

Property Documentation

# Extract metadata for all properties in a domain
gkc mash pid --pid-list biological_properties.txt -o bio_props.json

EntitySchema Development

# Load existing schema as starting point
gkc mash eid E502 --transform shell -o new_schema.json

# Or convert to profile for analysis
gkc mash eid E502 --transform gkc_entity_profile -o tribe_profile.json

Migration from Previous CLI

The mash CLI has been refactored for consistency. Key changes:

Old:

gkc mash qid Q42 --output qsv1 --new
gkc mash qid Q42 --output json

New:

gkc mash qid Q42 --transform qsv1
gkc mash qid Q42  # raw JSON is default
gkc mash qid Q42 --transform shell  # for new items

Changes:

  • --output now means output file path, not format
  • --transform specifies the transformation type
  • --new flag removed (use --transform shell or --transform qsv1 with for_new_item)
  • --save-profile replaced with -o, --output
  • Added support for batch loading with --qid-list, --pid-list
  • Added mash pid command for properties
  • Simplified mash eid command