Search Examples
Common search patterns and use cases
Learn to discover datasets with practical search patterns you can copy.
Basic text search
Search across all datasets with keywords:
# Find Vienna population datasets
search_datasets(query="Bevölkerung Wien", limit=10)
# Find air quality measurements
search_datasets(query="Luftqualität", limit=10)
# Find traffic counting data
search_datasets(query="Verkehrszählungen", limit=10)
Tips:
- Use German keywords (most datasets are in German)
- The API searches titles, descriptions, and keywords
- Results are ranked by relevance automatically
Use fuzzy matching, wildcards, exact phrases, and term boosting:
# Fuzzy search finds similar terms
search_datasets(query="Gesundheit~", limit=10)
# Wildcard matches multiple terms
search_datasets(query="Energie*", limit=10)
# Exact phrase with quotes
search_datasets(query='"Kriminalstatistik Österreich"', limit=10)
# Boost term importance with ^N
search_datasets(query="Umwelt^2 Klima", limit=10)
Advanced features:
- ~ (suffix) enables fuzzy matching
- * matches zero or more characters
- "..." requires an exact phrase
- ^N boosts term importance
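If you build queries in code, small helpers keep this syntax consistent. The sketch below assumes the operators behave as listed above; the helper names (fuzzy, prefix, phrase, boost, build_query) are hypothetical and not part of the search API.

```python
def fuzzy(term: str) -> str:
    """Append ~ to request fuzzy matching for a single term."""
    return f"{term}~"


def prefix(term: str) -> str:
    """Append * so the term matches any continuation (wildcard)."""
    return f"{term}*"


def phrase(text: str) -> str:
    """Wrap text in double quotes to require the exact phrase."""
    return f'"{text}"'


def boost(term: str, weight: int) -> str:
    """Append ^N to raise the term's weight in relevance scoring."""
    return f"{term}^{weight}"


def build_query(*parts: str) -> str:
    """Join query parts with spaces, as in the examples above."""
    return " ".join(parts)


# Rebuild two of the example queries with the hypothetical helpers
print(build_query(boost("Umwelt", 2), "Klima"))  # Umwelt^2 Klima
print(phrase("Kriminalstatistik Österreich"))     # "Kriminalstatistik Österreich"
```

The resulting string is passed unchanged as the query argument, e.g. search_datasets(query=build_query(fuzzy("Gesundheit")), limit=10).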
Filtered search
By category
Filter by one EU DCAT-AP theme:
# Health datasets only
search_datasets(
query="Krankenhaus",
themes=["HEAL"],
limit=10
)
# Environment datasets
search_datasets(
query="Emissionen",
themes=["ENVI"],
limit=10
)
# Population and society data
search_datasets(
query="Einwohner",
themes=["SOCI"],
limit=10
)
Combine themes with OR logic:
# Environment OR Energy datasets
search_datasets(
query="Klimadaten",
themes=["ENVI", "ENER"],
limit=10
)
# Health OR Social datasets
search_datasets(
query="Demografie",
themes=["HEAL", "SOCI"],
limit=10
)
# Government OR Justice datasets
search_datasets(
query="Verwaltung",
themes=["GOVE", "JUST"],
limit=10
)
All 13 EU theme codes:
- AGRI: Agriculture, fisheries, forestry and food
- ECON: Economy and finance
- EDUC: Education, culture and sport
- ENER: Energy
- ENVI: Environment
- GOVE: Government and public sector
- HEAL: Health
- INTR: International issues
- JUST: Justice, legal system and public safety
- REGI: Regions and cities
- SOCI: Population and society
- TECH: Science and technology
- TRAN: Transport
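A client-side check can catch mistyped theme codes before a request is sent. This is a hypothetical helper (EU_THEMES and validate_themes are not part of the API); it mirrors the list above and normalizes codes to uppercase as an assumption.

```python
from typing import List

# EU DCAT-AP theme codes, copied from the list above
EU_THEMES = {
    "AGRI": "Agriculture, fisheries, forestry and food",
    "ECON": "Economy and finance",
    "EDUC": "Education, culture and sport",
    "ENER": "Energy",
    "ENVI": "Environment",
    "GOVE": "Government and public sector",
    "HEAL": "Health",
    "INTR": "International issues",
    "JUST": "Justice, legal system and public safety",
    "REGI": "Regions and cities",
    "SOCI": "Population and society",
    "TECH": "Science and technology",
    "TRAN": "Transport",
}


def validate_themes(themes: List[str]) -> List[str]:
    """Normalize theme codes to uppercase and reject unknown ones."""
    normalized = [code.strip().upper() for code in themes]
    unknown = [code for code in normalized if code not in EU_THEMES]
    if unknown:
        raise ValueError(f"Unknown theme code(s): {', '.join(unknown)}")
    return normalized


print(validate_themes(["envi", "ENER"]))  # ['ENVI', 'ENER']
```

A typo such as "HEALTH" raises ValueError locally instead of silently returning zero results.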
By format
Filter by file format:
# CSV datasets only
search_datasets(
query="Bevölkerung",
formats=["CSV"],
limit=10
)
# Excel files
search_datasets(
query="Statistik",
formats=["XLSX"],
limit=10
)
Common formats: CSV, JSON, XML, XLSX, PDF, HTML, RDF, SHP (shapefile)
Find datasets in multiple formats:
# JSON or XML datasets
search_datasets(
query="Energieverbrauch",
formats=["JSON", "XML"],
limit=10
)
# CSV or Excel datasets
search_datasets(
query="Arbeitslosigkeit",
formats=["CSV", "XLSX"],
limit=10
)
# API-friendly formats
search_datasets(
query="Wetter",
formats=["JSON", "XML", "RDF"],
limit=10
)
By publisher
Filter by publishing organization:
# Data from Vienna
search_datasets(
query="",
publishers=["Stadt Wien"],
limit=10
)
# Federal government datasets
search_datasets(
query="",
publishers=["Bundesministerium"],
limit=10
)
Combine multiple publishers, or publishers with other filters:
# Vienna or Graz traffic data
search_datasets(
query="Verkehr",
publishers=["Stadt Wien", "Stadt Graz"],
limit=10
)
# Federal CSV datasets
search_datasets(
query="",
publishers=["Bundesministerium", "Statistik Austria"],
formats=["CSV"],
limit=10
)
Combined filters
Combine 2-3 filters:
# CSV health data from Vienna
search_datasets(
query="Krankenhaus",
themes=["HEAL"],
formats=["CSV"],
publishers=["Stadt Wien"],
limit=10
)
# Recent environment data
search_datasets(
query="Luftqualität",
themes=["ENVI"],
min_date="2024-01-01",
limit=10
)
Multi-filter queries:
# High-quality CSV health data from Vienna, recently updated
search_datasets(
query="Gesundheit Statistik",
themes=["HEAL"],
formats=["CSV"],
publishers=["Stadt Wien"],
min_date="2024-01-01",
boost_quality=True,
sort_by="modified_desc",
limit=10
)
# Multi-theme, multi-format recent datasets
search_datasets(
query="Klima Energie",
themes=["ENVI", "ENER"],
formats=["CSV", "JSON"],
min_date="2023-01-01",
boost_quality=True,
limit=20
)
Date range filtering
Filter by modification date:
# Datasets modified in 2024
search_datasets(
query="Bevölkerung",
min_date="2024-01-01",
max_date="2024-12-31",
limit=10
)
# Recently updated (fixed cutoff date; adjust it for a rolling 30-day window)
search_datasets(
query="Verkehr",
min_date="2024-12-15",
limit=10
)
# Historical datasets
search_datasets(
query="Statistik",
max_date="2020-01-01",
limit=10
)
Date format: ISO 8601 (YYYY-MM-DD)
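The "recently updated" example above uses a fixed cutoff date. A minimal sketch that computes the cutoff at run time and sanity-checks a range before calling search_datasets; both helpers are hypothetical and use only the Python standard library.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_min_date(days: int, today: Optional[date] = None) -> str:
    """Return the ISO 8601 date (YYYY-MM-DD) that lies `days` days before today."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


def check_date_range(min_date: Optional[str], max_date: Optional[str]) -> None:
    """Raise if a date can't be parsed as ISO 8601 or if min_date is after max_date."""
    start = date.fromisoformat(min_date) if min_date else None
    end = date.fromisoformat(max_date) if max_date else None
    if start and end and start > end:
        raise ValueError(f"min_date {min_date} is after max_date {max_date}")


# Last 30 days relative to a fixed reference date
print(rolling_min_date(30, today=date(2025, 1, 14)))  # 2024-12-15
check_date_range("2024-01-01", "2024-12-31")  # valid range, no error
```

In practice you would call search_datasets(query="Verkehr", min_date=rolling_min_date(30), limit=10) so the window moves with the current date.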
Combine date filters with other criteria:
# Recent health datasets from Vienna
search_datasets(
query="Krankenhaus",
themes=["HEAL"],
publishers=["Stadt Wien"],
min_date="2024-01-01",
sort_by="modified_desc",
limit=10
)
# Datasets modified in 2024
search_datasets(
query="Energie",
min_date="2024-01-01",
max_date="2024-12-31",
limit=10
)
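# Legacy datasets still maintained: modified since 2024, oldest publications first
# (min_date and max_date both filter the modification date, so pairing
# min_date="2024-01-01" with max_date="2020-01-01" would match nothing)
search_datasets(
query="Demografie",
min_date="2024-01-01",
sort_by="issued_asc",
limit=10
)
Sorting options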
Control result ordering:
# Most relevant first (default with query)
search_datasets(
query="Bevölkerung Wien",
sort_by="relevance",
limit=10
)
# Most recently modified first
search_datasets(
query="Luftqualität",
sort_by="modified_desc",
limit=10
)
# Alphabetical by title
search_datasets(
query="Verkehr",
sort_by="title_asc",
limit=10
)
Available sort options:
- relevance: by search relevance (requires a query)
- modified_desc / modified_asc: by modification date
- issued_desc / issued_asc: by publication date
- title_asc / title_desc: alphabetical by title
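Because relevance sorting needs a query, it can help to check sort settings before sending a request. A hypothetical sketch (SORT_OPTIONS and check_sort are not part of the API); how the server itself handles an unknown key or an empty relevance query is not documented here.

```python
from typing import Optional

# Sort options listed above
SORT_OPTIONS = {
    "relevance",
    "modified_desc", "modified_asc",
    "issued_desc", "issued_asc",
    "title_asc", "title_desc",
}


def check_sort(sort_by: Optional[str], query: str) -> None:
    """Reject unknown sort keys and relevance sorting without a query."""
    if sort_by is None:
        return  # let the server apply its default ordering
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"Unsupported sort_by: {sort_by!r}")
    if sort_by == "relevance" and not query.strip():
        raise ValueError('sort_by="relevance" requires a non-empty query')


check_sort("modified_desc", query="")              # fine: date sorting needs no query
check_sort("relevance", query="Bevölkerung Wien")  # fine
```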
Combine sorting with filters:
# Most relevant high-quality health data
search_datasets(
query="Krankenhaus Statistik",
themes=["HEAL"],
boost_quality=True,
sort_by="relevance",
limit=10
)
# Recently modified climate datasets, most relevant first
search_datasets(
query="Klimadaten",
min_date="2024-01-01",
sort_by="relevance",
limit=10
)
# Statistik Austria datasets, oldest publications first
search_datasets(
query="",
publishers=["Statistik Austria"],
sort_by="issued_asc",
limit=10
)
# Recently updated across all categories
search_datasets(
query="",
sort_by="modified_desc",
limit=50
)
Pagination
Work with large result sets:
# Get first page (results 1-10; pages are zero-indexed)
page1 = search_datasets(
query="Bevölkerung",
limit=10,
page=0
)
# Get second page (next 10 results)
page2 = search_datasets(
query="Bevölkerung",
limit=10,
page=1
)
# Get third page
page3 = search_datasets(
query="Bevölkerung",
limit=10,
page=2
)
Result metadata:
- count: total number of matching datasets
- results: datasets on the current page
Calculate total pages: total_pages = ceil(count / limit)
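The page-count formula can drive a generator that requests exactly as many pages as count says exist. This sketch assumes the response shape shown above (count, results) and zero-indexed pages; the search function is passed in as a parameter so the example runs with a stand-in. fake_search and iter_results are hypothetical; in practice you would pass search_datasets.

```python
import math
from typing import Any, Callable, Dict, Iterator


def total_pages(count: int, limit: int) -> int:
    """Number of pages needed for `count` results at `limit` per page."""
    return math.ceil(count / limit)


def iter_results(search: Callable[..., Dict[str, Any]], query: str,
                 limit: int = 20, **filters: Any) -> Iterator[Dict[str, Any]]:
    """Yield every result, requesting pages 0 .. total_pages - 1."""
    first = search(query=query, limit=limit, page=0, **filters)
    yield from first["results"]
    for page in range(1, total_pages(first["count"], limit)):
        response = search(query=query, limit=limit, page=page, **filters)
        yield from response["results"]


# Stand-in for search_datasets that serves 45 fake datasets
def fake_search(query, limit, page, **filters):
    data = [{"id": i} for i in range(45)]
    return {"count": len(data), "results": data[page * limit:(page + 1) * limit]}


print(total_pages(45, 20))  # 3
print(len(list(iter_results(fake_search, "Umwelt", limit=20))))  # 45
```

Because the generator yields lazily, a caller can stop early (for example after the first 100 datasets) without fetching the remaining pages.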
Iterate through all results:
# Paginate through all results
all_results = []
limit = 20
page = 0
while True:
    response = search_datasets(
        query="Umwelt",
        themes=["ENVI"],
        limit=limit,
        page=page
    )
    all_results.extend(response['results'])
    # Stop on the last page (or on an empty page, to avoid looping forever)
    if not response['results'] or (page + 1) * limit >= response['count']:
        break
    page += 1
print(f"Retrieved {len(all_results)} total datasets")
# Pagination with progress tracking
def fetch_all_datasets(query, **filters):
    results = []
    limit = 50
    page = 0
    total = None
    while True:
        response = search_datasets(
            query=query,
            limit=limit,
            page=page,
            **filters
        )
        if total is None:
            total = response['count']
        results.extend(response['results'])
        # Skip the progress line when nothing matches (avoids division by zero)
        if total:
            progress = len(results) / total * 100
            print(f"Progress: {progress:.1f}% ({len(results)}/{total})")
        if not response['results'] or (page + 1) * limit >= total:
            break
        page += 1
    return results
Quality-aware search
Boost high-quality datasets in results:
# Prioritize complete, well-documented datasets
search_datasets(
query="Gesundheit",
boost_quality=True,
limit=10
)
# Compare with and without quality boost
without_boost = search_datasets(query="Gesundheit", limit=10)
with_boost = search_datasets(query="Gesundheit", boost_quality=True, limit=10)
Quality boost considers:
- Metadata completeness (8 components)
- Title and description presence
- Keywords and categories
- Publisher information
- License clarity
- Contact information
- Temporal and spatial coverage
- Documentation quality
Score range: 0-100
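The exact scoring formula is not documented here. As an illustration only, the idea of a 0-100 completeness score can be sketched as the share of metadata components that are present, with equal weights. The field names, the choice of eight fields, and the equal weighting are all assumptions, not the server's actual component list or formula.

```python
from typing import Any, Dict

# Hypothetical metadata fields standing in for the components above (equal weights)
COMPONENTS = [
    "title", "description", "keywords", "categories",
    "publisher", "license", "contact", "temporal_coverage",
]


def completeness_score(metadata: Dict[str, Any]) -> float:
    """Percentage (0-100) of components that are present and non-empty."""
    present = sum(1 for field in COMPONENTS if metadata.get(field))
    return 100 * present / len(COMPONENTS)


example = {
    "title": "Luftqualität",
    "description": "Messwerte",
    "publisher": "Stadt Wien",
    "license": "CC BY 4.0",
}
print(completeness_score(example))  # 50.0
```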
Build workflows around quality metrics:
# Find high-quality recent datasets
search_datasets(
query="Krankenhaus",
themes=["HEAL"],
min_date="2024-01-01",
boost_quality=True,
sort_by="modified_desc",
limit=20
)
# Production-ready datasets
search_datasets(
query="Arbeitsmarkt",
formats=["CSV", "JSON"],
boost_quality=True,
sort_by="relevance",
limit=10
)
# Well-documented environmental data
search_datasets(
query="Luftqualität Emissionen",
themes=["ENVI", "ENER"],
formats=["CSV"],
min_date="2023-01-01",
boost_quality=True,
sort_by="relevance",
limit=20
)
Quality boost strategy:
- Use for production workflows
- Combine with format filters (CSV, JSON) for machine-readable data
- Sort by relevance to get results that are both relevant and high quality
- Filter by recent modification date (min_date) to favor actively maintained datasets
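The strategy above can be bundled into reusable keyword arguments. This is a hypothetical helper (production_search_kwargs is not part of the API), and its one-year default window is an arbitrary assumption; any extra keyword arguments override the defaults.

```python
from datetime import date, timedelta
from typing import Any, Dict, Optional


def production_search_kwargs(query: str, max_age_days: int = 365,
                             today: Optional[date] = None,
                             **extra: Any) -> Dict[str, Any]:
    """Build search_datasets keyword arguments following the strategy above."""
    today = today or date.today()
    kwargs: Dict[str, Any] = {
        "query": query,
        "formats": ["CSV", "JSON"],   # machine-readable formats
        "boost_quality": True,        # favor well-documented datasets
        "sort_by": "relevance",       # rank by relevance
        "min_date": (today - timedelta(days=max_age_days)).isoformat(),
        "limit": 10,
    }
    kwargs.update(extra)  # caller overrides, e.g. themes=["ENVI"], limit=20
    return kwargs


params = production_search_kwargs("Arbeitsmarkt", today=date(2025, 6, 30))
print(params["min_date"])  # 2024-06-30
# search_datasets(**params)
```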
Advanced search patterns
Multi-theme discovery
Find datasets spanning multiple domains:
# Climate and energy data
search_datasets(
query="Erneuerbare Energie",
themes=["ENVI", "ENER"],
limit=10
)
# Health and social data
search_datasets(
query="Demografie Gesundheit",
themes=["HEAL", "SOCI"],
limit=10
)
Cross-domain discovery:
# Climate, energy, and transport
search_datasets(
query="Nachhaltige Mobilität",
themes=["ENVI", "ENER", "TRAN"],
boost_quality=True,
sort_by="relevance",
limit=20
)
# Government, justice, and society
search_datasets(
query="Öffentliche Verwaltung",
themes=["GOVE", "JUST", "SOCI"],
formats=["CSV"],
min_date="2023-01-01",
limit=20
)
# Economy, education, and technology
search_datasets(
query="Innovation Forschung",
themes=["ECON", "EDUC", "TECH"],
boost_quality=True,
limit=20
)
Format comparison
Compare availability across formats:
# Check CSV vs JSON availability
csv_results = search_datasets(query="Bevölkerung", formats=["CSV"])
json_results = search_datasets(query="Bevölkerung", formats=["JSON"])
print(f"CSV: {csv_results['count']} datasets")
print(f"JSON: {json_results['count']} datasets")
Analyze format distribution:
# Compare format availability for health data
formats = ["CSV", "JSON", "XML", "XLSX", "PDF"]
format_counts = {}
for fmt in formats:
    results = search_datasets(
        query="Krankenhaus",
        themes=["HEAL"],
        formats=[fmt]
    )
    format_counts[fmt] = results['count']
# Display format distribution, most common first
for fmt, count in sorted(format_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{fmt}: {count} datasets")
# Find datasets in multiple formats
all_results = search_datasets(query="Krankenhaus", themes=["HEAL"], limit=50)
multi_format_datasets = []
for dataset in all_results['results']:
    # Placeholder: get each dataset's distributions to count its formats
    # (use get_dataset_distributions in a real workflow)
    pass
Publisher exploration
Explore what a publisher offers:
# All datasets from Stadt Wien catalogue
search_datasets(
query="",
catalogues=["l9"], # Stadt Wien catalogue ID
limit=50
)
# Datasets from Vienna
search_datasets(
query="",
publishers=["Stadt Wien"],
sort_by="modified_desc",
limit=50
)
Publisher analysis:
# Vienna's recent health datasets
search_datasets(
query="",
publishers=["Stadt Wien"],
themes=["HEAL"],
min_date="2023-01-01",
sort_by="modified_desc",
limit=50
)
# Publisher comparison
vienna_data = search_datasets(
query="Verkehr",
publishers=["Stadt Wien"],
formats=["CSV"]
)
federal_data = search_datasets(
query="Verkehr",
publishers=["Bundesministerium"],
formats=["CSV"]
)
print(f"Vienna: {vienna_data['count']} traffic datasets")
print(f"Federal: {federal_data['count']} traffic datasets")
# Category distribution for a publisher
all_vienna = search_datasets(
query="",
publishers=["Stadt Wien"],
limit=100
)
# Analyze themes (covers only the first 100 results; paginate for full counts)
theme_counts = {}
for dataset in all_vienna['results']:
    for category in dataset.get('categories', []):
        theme_id = category.get('id')
        theme_counts[theme_id] = theme_counts.get(theme_id, 0) + 1
print("\nVienna theme distribution:")
for theme, count in sorted(theme_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"  {theme}: {count} datasets")
Next steps
- Preview Examples - Data inspection
- Workflows - Complete scenarios
- API Reference - Full parameter documentation