data.gv.at MCP Server

Previewing Data Structures

Inspect schemas and sample data before downloading full datasets

Verify data structure, column types, and content before committing to a full dataset download.

When to use this guide

Use this guide when you need to:

  • Verify required columns exist
  • Check data types before processing
  • Estimate data quality from samples
  • Avoid downloading incompatible formats

Prerequisites

  • Dataset ID or download URL
  • Understanding of your data requirements (required columns, types)

Quick schema preview

Ask Claude to preview

"Show me the schema for dataset [dataset-id]"

Claude fetches the distributions, selects a downloadable format, and previews the schema automatically.

What you'll see:

  • Column names
  • Inferred data types
  • Sample values
  • Row count estimate

Direct schema preview

# First, get download URL
distributions = get_dataset_distributions(dataset_id="bev-stat-wien-2024")
csv_url = next(d['downloadURL'] for d in distributions
               if d.get('format') == 'CSV')

# Preview schema
schema = preview_schema(url=csv_url, format="csv")

Parameters:

  • url (string): direct download URL of the file
  • format (string, optional): file format, e.g. "csv"; auto-detected if omitted
  • max_bytes (integer, optional): maximum number of bytes fetched for the preview
Returns:

{
  "url": "https://data.wien.gv.at/.../data.csv",
  "format": "csv",
  "partial_fetch": true,
  "bytes_fetched": 65536,
  "columns": [
    {
      "name": "jahr",
      "type": "integer",
      "sample_values": [2022, 2023, 2024]
    },
    {
      "name": "bezirk",
      "type": "string",
      "sample_values": ["Innere Stadt", "Leopoldstadt"]
    }
  ]
}
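The column list can be scanned with a short loop. This sketch hardcodes a response shaped like the example above rather than calling the tool:

```python
# Sketch: summarize a preview_schema response. The dict below mirrors
# the example response shown above; it is not fetched from the server.
schema = {
    "format": "csv",
    "partial_fetch": True,
    "columns": [
        {"name": "jahr", "type": "integer", "sample_values": [2022, 2023, 2024]},
        {"name": "bezirk", "type": "string", "sample_values": ["Innere Stadt", "Leopoldstadt"]},
    ],
}

def summarize(schema):
    """Return one 'name: type (samples...)' line per column."""
    lines = []
    for col in schema["columns"]:
        samples = ", ".join(str(v) for v in col["sample_values"][:3])
        lines.append(f"{col['name']}: {col['type']} ({samples})")
    return lines

for line in summarize(schema):
    print(line)
```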

Error handling:

NetworkError:

{"error": "NetworkError", "message": "Failed to fetch URL"}

Solution: the URL may be stale; fetch fresh distributions with get_dataset_distributions and retry

FormatError:

{"error": "FormatError", "message": "Could not detect delimiter"}

Solution: Specify format explicitly: format="csv"

Validating required columns

Ask Claude to verify columns

"Does dataset [id] have columns X, Y, Z?"

Claude fetches schema and checks for required columns.

Programmatic column validation

# Get schema
schema = preview_schema(url=csv_url)

# Define requirements
required_columns = ["jahr", "bezirk", "faelle"]
actual_columns = [c['name'] for c in schema['columns']]

# Check presence
missing = set(required_columns) - set(actual_columns)
if missing:
    print(f"Missing columns: {missing}")
else:
    print("All required columns present")

# Verify types
jahr_col = next(c for c in schema['columns'] if c['name'] == 'jahr')
if jahr_col['type'] != 'integer':
    print("Warning: Jahr column not integer type")
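The presence and type checks above can be folded into one reusable validator. This is a sketch: `validate_schema` is a local helper, not part of the server API, and the schema shape follows the example response earlier on this page.

```python
def validate_schema(schema, requirements):
    """Check that each required column exists with the expected type.

    `requirements` maps column name -> expected type string, e.g.
    {"jahr": "integer"}. Returns a list of human-readable problems;
    an empty list means the schema passes.
    """
    problems = []
    by_name = {c["name"]: c for c in schema["columns"]}
    for name, expected_type in requirements.items():
        col = by_name.get(name)
        if col is None:
            problems.append(f"missing column: {name}")
        elif col["type"] != expected_type:
            problems.append(f"{name}: expected {expected_type}, got {col['type']}")
    return problems

# Example against a schema shaped like the one shown earlier:
schema = {"columns": [
    {"name": "jahr", "type": "integer"},
    {"name": "bezirk", "type": "string"},
]}
print(validate_schema(schema, {"jahr": "integer", "faelle": "integer"}))
```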

Error handling:

Column not found:

try:
    jahr_col = next(c for c in schema['columns'] if c['name'] == 'jahr')
except StopIteration:
    print("Jahr column not found in schema")

Previewing data samples

Ask for sample rows

"Show me sample rows from dataset [id]"

Claude fetches the first 10-20 rows to show the data structure.

Direct data preview

data = preview_data(url=csv_url, format="csv", max_rows=10)

Parameters:

  • url (string): direct download URL of the file
  • format (string, optional): file format; auto-detected if omitted
  • max_rows (integer, optional): number of sample rows to return
  • max_bytes (integer, optional): maximum number of bytes fetched for the preview
Returns:

{
  "url": "...",
  "format": "csv",
  "rows": [
    {"jahr": 2024, "bezirk": "Wien", "faelle": 150},
    {"jahr": 2024, "bezirk": "Graz", "faelle": 120}
  ],
  "row_count": 10,
  "estimated_total_rows": 1000
}
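Because rows come back as plain dicts, standard tooling works on them directly. A small sketch, aggregating a hardcoded sample shaped like the response above:

```python
from collections import defaultdict

# Hardcoded sample rows mirroring the response shape above;
# a real call would use preview_data(...)["rows"].
preview = {
    "rows": [
        {"jahr": 2024, "bezirk": "Wien", "faelle": 150},
        {"jahr": 2024, "bezirk": "Graz", "faelle": 120},
        {"jahr": 2023, "bezirk": "Wien", "faelle": 140},
    ],
}

# Sum 'faelle' per 'bezirk' across the sample
totals = defaultdict(int)
for row in preview["rows"]:
    totals[row["bezirk"]] += row["faelle"]

print(dict(totals))
```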

Error handling:

ParseError:

{"error": "ParseError", "message": "Invalid CSV at row 5"}

Solution: File may be corrupted or malformed

Checking data quality from samples

Quality assessment questions

Ask Claude to check data quality:

  • "Are there null values in the dataset?"
  • "What's the date range of this data?"
  • "How many unique regions are in the sample?"

Claude analyzes preview data to answer.

Programmatic quality checks

preview = preview_data(url=csv_url, max_rows=50)

# Check for null values
for row in preview['rows']:
    for key, value in row.items():
        if value is None or value == "":
            print(f"Null value in column: {key}")

# Check date range (guard against an empty sample)
dates = [row['jahr'] for row in preview['rows'] if 'jahr' in row]
if dates:
    print(f"Date range: {min(dates)} to {max(dates)}")

# Check unique values
bezirke = {row['bezirk'] for row in preview['rows'] if 'bezirk' in row}
print(f"Unique districts in sample: {len(bezirke)}")

Common issues:

Sparse data:

null_count = sum(1 for row in preview['rows']
                 for v in row.values() if v is None or v == "")
total_cells = sum(len(row) for row in preview['rows'])
if total_cells and null_count > total_cells * 0.5:
    print("Warning: >50% null values in sample")

Inconsistent types:

# Check if numeric column has non-numeric values
faelle = [row.get('faelle') for row in preview['rows']]
non_numeric = [c for c in faelle if not isinstance(c, (int, float))]
if non_numeric:
    print(f"Warning: Non-numeric values in faelle: {non_numeric}")
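These spot checks can be bundled into a single pass over the sample. A sketch: `quality_report` is a local helper, not part of the server API, and which columns count as numeric is up to the caller.

```python
def quality_report(rows, numeric_columns=()):
    """One pass over sample rows: null counts and type mismatches.

    Returns {"nulls": {col: count}, "non_numeric": {col: [values]}}.
    """
    nulls = {}
    non_numeric = {}
    for row in rows:
        for key, value in row.items():
            if value is None or value == "":
                nulls[key] = nulls.get(key, 0) + 1
        for col in numeric_columns:
            v = row.get(col)
            if v is not None and not isinstance(v, (int, float)):
                non_numeric.setdefault(col, []).append(v)
    return {"nulls": nulls, "non_numeric": non_numeric}

# Illustrative sample rows (not fetched from the server)
rows = [
    {"jahr": 2024, "bezirk": "Wien", "faelle": 150},
    {"jahr": 2024, "bezirk": "", "faelle": "n/a"},
]
print(quality_report(rows, numeric_columns=["faelle"]))
```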

Troubleshooting

Preview fails on valid URL

Symptom: URL works in browser but preview fails

Cause: Server doesn't support HTTP Range requests

Solutions:

  1. Preview falls back to full download (may be slow)
  2. Use smaller max_bytes parameter
  3. Check if different format available
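Range support can be probed directly with the standard library. A sketch: `range_supported` interprets the reply to a 1-byte Range request, and `probe` performs the actual network call against whatever URL you pass in.

```python
import urllib.request

def range_supported(status, headers):
    """Interpret a response to a 1-byte Range probe.

    206 Partial Content means the Range header was honoured; a plain
    200 means the server ignored it and sent the whole file. The
    Accept-Ranges header is an additional positive signal.
    """
    return status == 206 or headers.get("Accept-Ranges", "").lower() == "bytes"

def probe(url, timeout=10):
    """Send a 1-byte Range request to `url` (performs a network call)."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return range_supported(resp.status, dict(resp.headers))
```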

Type inference incorrect

Symptom: Column shows "string" but contains numbers

Cause: Sample rows have inconsistent types or headers

Solutions:

  1. Increase max_bytes to sample more rows
  2. Check if first rows are headers or metadata
  3. Use preview_data to see actual values
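A quick way to second-guess a "string" verdict is to test whether the sample values parse as numbers. A sketch; `looks_numeric` is a local helper, and it allows a decimal comma since German-locale files often use one.

```python
def looks_numeric(sample_values):
    """True if every non-empty sample value parses as a number."""
    def parses(v):
        try:
            float(str(v).replace(",", "."))  # allow German decimal comma
            return True
        except ValueError:
            return False
    values = [v for v in sample_values if v not in (None, "")]
    return bool(values) and all(parses(v) for v in values)

print(looks_numeric(["1.234", "5,6"]))   # numeric despite string type
print(looks_numeric(["Wien", "Graz"]))   # genuinely non-numeric
```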

Cannot detect format

Symptom: FormatError on auto-detection

Cause: Unusual file format or delimiter

Solutions:

  1. Specify format explicitly: format="csv"
  2. Check file extension matches actual content
  3. Download sample manually to inspect
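When auto-detection fails, the standard library's csv.Sniffer can often recover the delimiter from a raw sample before you pass format explicitly. The sample text here is illustrative; semicolon delimiters are common in German-locale CSV exports.

```python
import csv

# Illustrative raw sample; in practice use the first few KB of the file
sample = "jahr;bezirk;faelle\n2024;Wien;150\n2024;Graz;120\n"

# Restrict the candidate delimiters to make detection more reliable
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(dialect.delimiter)
```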

Partial fetch warning

Symptom: Response shows partial_fetch: true

Cause: File larger than max_bytes limit

Solutions:

  1. This is normal - preview uses partial fetch by design
  2. Increase max_bytes if you need more sample data
  3. Schema should still be accurate from partial data
