
User Instructions

User instructions allow you to provide specific guidance to OntoCast about what to focus on during ontology and facts extraction. This feature is particularly useful when you want to direct the AI's attention to specific types of entities, relationships, or concepts in your documents.


Overview

User instructions work by injecting custom instructions into the AI prompts used during:

  • Ontology Selection: When the system chooses the best catalog ontology for a text unit
  • Ontology Extraction: When the system extracts domain concepts and relationships
  • Facts Extraction: When the system extracts specific facts from your documents

This allows you to customize the extraction process based on your specific needs and domain requirements.
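One way to picture the injection step is as an optional section appended after a built-in prompt. The instruction adds focus but cannot remove the base rules. This is a hypothetical illustration, not OntoCast's prompt code: `compose_prompt`, the "Additional user instructions:" heading, and the sample template are all assumptions.

```python
from typing import Optional


def compose_prompt(base_prompt: str, user_instruction: Optional[str]) -> str:
    """Append an optional user instruction to a built-in prompt (hypothetical sketch).

    The built-in prompt is kept verbatim and the instruction is only appended,
    so it can steer the task without removing the base rules.
    """
    instruction = (user_instruction or "").strip()
    if not instruction:
        # No instruction supplied: the built-in prompt is used unchanged.
        return base_prompt
    return f"{base_prompt}\n\nAdditional user instructions:\n{instruction}"


# Placeholder template; OntoCast's real prompts are not shown in this document.
BASE = "Extract domain concepts and relationships from the text below."
print(compose_prompt(BASE, "Focus on organizations and their mergers."))
```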


How User Instructions Work

1. Ontology User Instructions

Ontology user instructions guide the AI when extracting domain concepts and relationships from your documents. These instructions help focus on specific types of entities or relationships.

Example:

Focus on extracting geographical locations, organizations, and their relationships. Pay special attention to company mergers, acquisitions, and partnerships.

2. Ontology Selection User Instructions

Ontology selection user instructions guide the AI when choosing a catalog ontology for each text segment. These instructions help prioritize one ontology style or domain when several ontologies could match.

Example:

Prefer ontologies focused on legal and compliance terminology. Avoid generic business ontologies unless no legal ontology applies.

3. Facts User Instructions

Facts user instructions guide the AI when extracting specific facts and instances from your documents. These instructions help focus on particular types of facts or data points.

They supplement, but do not replace, the built-in operational guidelines (two-namespace contract, entity matching, class vs. instance rules).

Example:

Extract financial data, dates, and numerical values. Focus on revenue, profit, and growth metrics. Include all monetary amounts with proper currency information.

Facts extraction guidelines

OntoCast always applies the following rules during facts rendering (see also Facts extraction model):

  1. cd: for new data — Instances found in the source text use the configured facts namespace (cd: by default), with lowercase_snake_case local names and rdfs:label.
  2. Domain ontology is read-only — Do not mint new IRIs under the catalog ontology prefix. Reuse an ontology IRI only when that exact reference individual already exists in the provided ontology context.
  3. Class ≠ instance — Ontology classes type cd: instances; class IRIs must not appear as factual subjects/objects, and text mentions do not become ontology-prefixed individuals just because a matching class exists.
  4. Typing — Every cd: entity gets rdf:type to a domain or core class (e.g. schema:Person); never type facts as rdfs:Class or rdf:Property.
  5. Opaque IDs (Wikidata-style Q/P codes) — Map mentions via labels and the term index; never guess or construct IRIs from label strings.

Use facts_user_instruction to steer what to extract (domains, metrics, relationships), not to override namespace or IRI policy.
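The sketch below makes the rules concrete with a rough checker for candidate facts written as CURIE-style string triples. It is a hypothetical sketch, not OntoCast's validator. `check_facts`, the `onto:` placeholder for the catalog ontology prefix, and the triple representation are all assumptions. It covers rules 1 to 4. Rule 5 concerns how mentions are mapped through labels and the term index, and it is not checked here.

```python
import re
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Lowercase letters/digits separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")
FORBIDDEN_TYPES = {"rdfs:Class", "rdf:Property"}


def check_facts(
    triples: Iterable[Triple],
    onto_classes: Set[str],
    onto_individuals: Set[str],
    facts_prefix: str = "cd:",
    onto_prefix: str = "onto:",  # placeholder for the catalog ontology prefix
) -> List[str]:
    """Return sorted rule violations for CURIE-style triples (hypothetical checker)."""
    problems: Set[str] = set()
    cd_entities: Set[str] = set()
    typed: Set[str] = set()
    labeled: Set[str] = set()

    for s, p, o in triples:
        if p == "rdf:type":
            typed.add(s)
            if o in FORBIDDEN_TYPES:  # rule 4
                problems.add(f"fact typed as {o}: {s}")
        elif p == "rdfs:label":
            labeled.add(s)

        # Classes belong only in the object slot of rdf:type; other slots are factual.
        factual_terms = [s] if p == "rdf:type" else [s, o]
        for term in factual_terms:
            if term.startswith(facts_prefix):  # rule 1
                cd_entities.add(term)
                if not SNAKE_CASE.match(term[len(facts_prefix):]):
                    problems.add(f"local name not lowercase_snake_case: {term}")
            elif term.startswith(onto_prefix):
                if term in onto_classes:  # rule 3
                    problems.add(f"class used as factual subject/object: {term}")
                elif term not in onto_individuals:  # rule 2
                    problems.add(f"new IRI minted under ontology prefix: {term}")

    for entity in cd_entities - typed:  # rule 4
        problems.add(f"missing rdf:type: {entity}")
    for entity in cd_entities - labeled:  # rule 1
        problems.add(f"missing rdfs:label: {entity}")
    return sorted(problems)


good = [
    ("cd:acme_corp", "rdf:type", "onto:Organization"),
    ("cd:acme_corp", "rdfs:label", '"Acme Corp"'),
    ("cd:acme_corp", "onto:locatedIn", "onto:berlin"),  # reuses an existing individual
]
print(check_facts(good, onto_classes={"onto:Organization"}, onto_individuals={"onto:berlin"}))
# → []
```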


Usage Methods

1. JSON API Requests

When sending JSON requests to the API, include user instructions in your payload:

{
  "text": "Your document text here...",
  "ontology_selection_user_instruction": "Prefer legal/compliance ontologies when multiple options are relevant",
  "ontology_user_instruction": "Focus on extracting geographical locations and organizations",
  "facts_user_instruction": "Extract financial data and numerical values with proper currency information"
}
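Here is a minimal Python sketch of the same request that uses only the standard library. The JSON example does not name an endpoint, so this sketch assumes it goes to the `/process` endpoint from the curl example. `build_json_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

# Assumption: JSON requests go to the same /process endpoint as the curl example.
API_URL = "http://localhost:8999/process"


def build_json_request(
    text: str,
    *,
    ontology_selection_user_instruction: Optional[str] = None,
    ontology_user_instruction: Optional[str] = None,
    facts_user_instruction: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST that carries optional user instructions."""
    instructions = {
        "ontology_selection_user_instruction": ontology_selection_user_instruction,
        "ontology_user_instruction": ontology_user_instruction,
        "facts_user_instruction": facts_user_instruction,
    }
    payload = {"text": text}
    # All three instruction fields are optional, so leave out the ones not supplied.
    payload.update({key: value for key, value in instructions.items() if value})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_request(
    "Your document text here...",
    ontology_user_instruction="Focus on extracting geographical locations and organizations",
    facts_user_instruction="Extract financial data and numerical values with proper currency information",
)
# Against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the parameters are keyword-only, a misspelled instruction name raises a `TypeError` locally instead of being sent to the server.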

2. Form Data (Multipart)

When using multipart form data, include user instructions as form fields:

curl -X POST http://localhost:8999/process \
  -F "file=@document.pdf" \
  -F "ontology_selection_user_instruction=Prefer legal/compliance ontologies when multiple options are relevant" \
  -F "ontology_user_instruction=Focus on extracting geographical locations and organizations" \
  -F "facts_user_instruction=Extract financial data and numerical values"
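If you cannot shell out to curl, you can build an equivalent multipart body with the Python standard library. `build_multipart` is a hypothetical helper, and the file bytes are a placeholder. The encoder does not escape quotes in field names or filenames, so treat it as an illustration rather than a general-purpose encoder.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(
    fields: Dict[str, str],
    filename: str,
    file_bytes: bytes,
    file_field: str = "file",
    file_type: str = "application/octet-stream",
) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data, like curl -F.

    Sketch only: field names, values, and the filename are not escaped.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    chunks.append(file_header + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart(
    {
        "ontology_selection_user_instruction": "Prefer legal/compliance ontologies when multiple options are relevant",
        "ontology_user_instruction": "Focus on extracting geographical locations and organizations",
        "facts_user_instruction": "Extract financial data and numerical values",
    },
    filename="document.pdf",
    file_bytes=b"%PDF-1.4 placeholder",  # in practice: pathlib.Path("document.pdf").read_bytes()
    file_type="application/pdf",
)
form_request = urllib.request.Request(
    "http://localhost:8999/process",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
```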

3. Programmatic Usage

When using OntoCast programmatically, set user instructions in the AgentState:

from ontocast.onto.state import AgentState

# Create state with user instructions
state = AgentState(
    input_text="Your document text...",
    ontology_selection_user_instruction="Prefer legal/compliance ontologies when multiple options are relevant",
    ontology_user_instruction="Focus on extracting geographical locations and organizations",
    facts_user_instruction="Extract financial data and numerical values"
)

Best Practices

1. Be Specific and Clear

Good:

Focus on extracting company names, financial metrics, and business relationships. Pay special attention to revenue, profit, and growth data.

Avoid:

Extract everything important.

2. Use Domain-Specific Language

Good:

Extract medical diagnoses, symptoms, treatments, and patient information. Focus on ICD-10 codes and medical terminology.

Avoid:

Extract medical stuff.

3. Provide Context

Good:

Extract legal entities, court cases, and legal relationships. Focus on case numbers, dates, and legal precedents mentioned in the document.

Avoid:

Extract legal information.

4. Specify Data Types

Good:

Extract numerical data with proper units (currency, percentages, measurements). Include dates in ISO format and geographical coordinates.

Avoid:

Extract numbers and dates.


Common Use Cases

1. Financial Documents

Ontology Instruction:

Focus on extracting financial concepts, business entities, and economic relationships. Pay attention to revenue streams, cost structures, and financial metrics.

Facts Instruction:

Extract all monetary amounts with currency codes, percentages, and financial ratios. Include dates for financial periods and growth rates.

2. Medical Documents

Ontology Instruction:

Focus on extracting medical conditions, treatments, symptoms, and healthcare relationships. Pay attention to medical terminology and clinical concepts.

Facts Instruction:

Extract patient information, medical codes (ICD-10, CPT), dosages, and treatment timelines. Include all medical measurements and lab values.

3. Legal Documents

Ontology Instruction:

Focus on extracting legal entities, court cases, legal relationships, and regulatory frameworks. Pay attention to legal terminology and precedents.

Facts Instruction:

Extract case numbers, court dates, legal citations, and regulatory compliance information. Include all legal references and precedents.

4. Scientific Papers

Ontology Instruction:

Focus on extracting scientific concepts, methodologies, and research relationships. Pay attention to scientific terminology and theoretical frameworks.

Facts Instruction:

Extract experimental data, measurements, statistical results, and research findings. Include all numerical data with proper units and significance levels.
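If you reuse these instructions often, you can keep them as named presets. The sketch below copies the texts above into a hypothetical `PRESETS` table and builds a JSON payload from it. The domain keys and `payload_for` are illustrative names, not part of OntoCast.

```python
import json
from typing import Dict

# Instruction texts copied from the use cases above; the table layout is illustrative.
PRESETS: Dict[str, Dict[str, str]] = {
    "financial": {
        "ontology_user_instruction": (
            "Focus on extracting financial concepts, business entities, and economic "
            "relationships. Pay attention to revenue streams, cost structures, and "
            "financial metrics."
        ),
        "facts_user_instruction": (
            "Extract all monetary amounts with currency codes, percentages, and "
            "financial ratios. Include dates for financial periods and growth rates."
        ),
    },
    "medical": {
        "ontology_user_instruction": (
            "Focus on extracting medical conditions, treatments, symptoms, and healthcare "
            "relationships. Pay attention to medical terminology and clinical concepts."
        ),
        "facts_user_instruction": (
            "Extract patient information, medical codes (ICD-10, CPT), dosages, and "
            "treatment timelines. Include all medical measurements and lab values."
        ),
    },
    "legal": {
        "ontology_user_instruction": (
            "Focus on extracting legal entities, court cases, legal relationships, and "
            "regulatory frameworks. Pay attention to legal terminology and precedents."
        ),
        "facts_user_instruction": (
            "Extract case numbers, court dates, legal citations, and regulatory compliance "
            "information. Include all legal references and precedents."
        ),
    },
    "scientific": {
        "ontology_user_instruction": (
            "Focus on extracting scientific concepts, methodologies, and research "
            "relationships. Pay attention to scientific terminology and theoretical "
            "frameworks."
        ),
        "facts_user_instruction": (
            "Extract experimental data, measurements, statistical results, and research "
            "findings. Include all numerical data with proper units and significance levels."
        ),
    },
}


def payload_for(domain: str, text: str) -> Dict[str, str]:
    """Build a JSON request body for one of the preset domains."""
    if domain not in PRESETS:
        raise KeyError(f"unknown domain {domain!r}; choose one of {sorted(PRESETS)}")
    return {"text": text, **PRESETS[domain]}


print(json.dumps(payload_for("legal", "Your document text..."), indent=2))
```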


Advanced Examples

1. Multi-Domain Extraction

{
  "text": "Your document text...",
  "ontology_selection_user_instruction": "Prefer ontologies that model technology products and companies when there is overlap.",
  "ontology_user_instruction": "Extract both business and technical concepts. Focus on companies, products, technologies, and their relationships.",
  "facts_user_instruction": "Extract business metrics, technical specifications, and performance data. Include all numerical values with proper context."
}

2. Temporal Focus

{
  "text": "Your document text...",
  "ontology_selection_user_instruction": "Prefer ontologies with strong temporal/event modeling when available.",
  "ontology_user_instruction": "Focus on extracting entities and relationships that are time-sensitive or have temporal aspects.",
  "facts_user_instruction": "Extract all dates, time periods, and temporal relationships. Pay special attention to historical events and chronological data."
}

3. Geographic Focus

{
  "text": "Your document text...",
  "ontology_selection_user_instruction": "Prefer ontologies specialized in locations and geospatial entities.",
  "ontology_user_instruction": "Focus on extracting geographical entities, locations, and spatial relationships.",
  "facts_user_instruction": "Extract all geographical coordinates, addresses, and location-specific data. Include all spatial and geographical information."
}

Integration with Workflow

User instructions are integrated into the OntoCast workflow at specific points:

  1. Document Processing: Instructions are read from the request (JSON payload or form fields) during document conversion
  2. Ontology Selection: Selection instructions guide catalog ontology choice per unit
  3. Ontology Extraction: Ontology instructions guide domain concept extraction
  4. Facts Extraction: Facts instructions guide specific fact extraction
  5. Critique Phase: Instructions are used during the critique and improvement phases
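The field-to-stage routing in steps 2 to 4 can be summarized as a small lookup. This is a hypothetical sketch: the stage names and `instructions_by_stage` are illustrative. The critique phase is left out because this page does not specify which instructions it reuses.

```python
from typing import Dict, Mapping, Optional

# Hypothetical stage names mapped to the request field that steers each stage.
STAGE_FIELDS: Dict[str, str] = {
    "ontology_selection": "ontology_selection_user_instruction",
    "ontology_extraction": "ontology_user_instruction",
    "facts_extraction": "facts_user_instruction",
}


def instructions_by_stage(request: Mapping[str, object]) -> Dict[str, Optional[str]]:
    """Return the instruction each stage would receive, or None if none was supplied."""
    routed: Dict[str, Optional[str]] = {}
    for stage, field in STAGE_FIELDS.items():
        value = request.get(field)
        # Blank or non-string values are treated as "no instruction" for that stage.
        routed[stage] = (value.strip() or None) if isinstance(value, str) else None
    return routed


print(instructions_by_stage({"text": "...", "facts_user_instruction": "Extract dates"}))
# → {'ontology_selection': None, 'ontology_extraction': None, 'facts_extraction': 'Extract dates'}
```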

Troubleshooting

Common Issues

  1. Instructions Not Applied: Check that field names are spelled exactly as in the API reference and that your JSON payload is well formed
  2. Vague Results: Make instructions more specific and detailed
  3. Missing Data: Check whether the instructions are too restrictive or unclear

Debug Tips

  1. Check Logs: Look for debug messages about user instructions in the server logs
  2. Test with Simple Instructions: Start with basic instructions and refine
  3. Validate JSON: Ensure your JSON payload is properly formatted
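As a pre-flight check before sending, the sketch below parses a request body and checks its fields. It flags unknown field names, suggests the closest known name, and checks value types. `check_payload` is a hypothetical helper. It treats `text` as required because the request format in the API reference marks only the instruction fields as optional.

```python
import difflib
import json
from typing import List

KNOWN_FIELDS = (
    "text",
    "ontology_selection_user_instruction",
    "ontology_user_instruction",
    "facts_user_instruction",
)


def check_payload(raw: str) -> List[str]:
    """Return problems found in a JSON request body; an empty list means none were found."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(payload, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    for key, value in payload.items():
        if key not in KNOWN_FIELDS:
            # A misspelled field name is a likely reason an instruction seems ignored.
            guess = difflib.get_close_matches(key, KNOWN_FIELDS, n=1)
            hint = f"; did you mean {guess[0]!r}?" if guess else ""
            problems.append(f"unknown field {key!r}{hint}")
        elif not isinstance(value, str):
            problems.append(f"field {key!r} must be a string, got {type(value).__name__}")
    if "text" not in payload:
        problems.append("missing required field 'text'")
    return problems


print(check_payload('{"text": "...", "facts_user_instructions": "Extract dates"}'))
```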

Example Debug Output

DEBUG - Set ontology user instruction: Focus on extracting geographical locations and organizations
DEBUG - Set ontology selection user instruction: Prefer legal/compliance ontologies when multiple options are relevant
DEBUG - Set facts user instruction: Extract financial data and numerical values

API Reference

Request Format

{
  "text": "string",
  "ontology_selection_user_instruction": "string (optional)",
  "ontology_user_instruction": "string (optional)",
  "facts_user_instruction": "string (optional)"
}

Response Format

The response contains the extracted ontology and facts. Its metadata object echoes the user instructions sent with the request:

{
  "status": "success",
  "ontology": "...",
  "facts": "...",
  "metadata": {
    "ontology_selection_user_instruction": "Prefer legal/compliance ontologies when multiple options are relevant",
    "ontology_user_instruction": "Focus on extracting geographical locations and organizations",
    "facts_user_instruction": "Extract financial data and numerical values"
  }
}
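If the server echoes instructions in `metadata` as shown above, a client can confirm that each instruction it sent was received. `missing_echoes` is a hypothetical helper. The response body in the usage lines is a hand-built sample, not real server output.

```python
import json
from typing import Dict, List

INSTRUCTION_FIELDS = (
    "ontology_selection_user_instruction",
    "ontology_user_instruction",
    "facts_user_instruction",
)


def missing_echoes(sent: Dict[str, str], response_body: str) -> List[str]:
    """List instruction fields that were sent but are not echoed in the response metadata."""
    metadata = json.loads(response_body).get("metadata") or {}
    return [
        field
        for field in INSTRUCTION_FIELDS
        if sent.get(field) and metadata.get(field) != sent[field]
    ]


sent = {
    "text": "Your document text...",
    "ontology_user_instruction": "Focus on extracting geographical locations and organizations",
    "facts_user_instruction": "Extract financial data and numerical values",
}
# Hand-built sample response (illustrative only).
response_body = json.dumps({
    "status": "success",
    "ontology": "...",
    "facts": "...",
    "metadata": {"ontology_user_instruction": sent["ontology_user_instruction"]},
})
print(missing_echoes(sent, response_body))  # → ['facts_user_instruction']
```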

Best Practices Summary

  1. Be Specific: Provide clear, detailed instructions
  2. Use Domain Language: Include relevant terminology
  3. Provide Context: Explain what you're looking for
  4. Test and Refine: Start simple and improve based on results
  5. Document Your Instructions: Keep track of what works best for your use case

Examples by Domain

Healthcare

  • Ontology: "Focus on medical conditions, treatments, and healthcare relationships"
  • Facts: "Extract patient data, medical codes, and clinical measurements"

Finance

  • Ontology: "Focus on financial entities, business relationships, and economic concepts"
  • Facts: "Extract monetary amounts, financial ratios, and economic indicators"

Legal

  • Ontology: "Focus on legal entities, court cases, and regulatory frameworks"
  • Facts: "Extract case numbers, legal citations, and compliance information"

Scientific

  • Ontology: "Focus on scientific concepts, methodologies, and research relationships"
  • Facts: "Extract experimental data, measurements, and research findings"

Technical

  • Ontology: "Focus on technical concepts, systems, and technological relationships"
  • Facts: "Extract technical specifications, performance metrics, and system data"