OntoCast

Agentic ontology-assisted framework for semantic triple extraction


Overview

OntoCast is a framework for extracting semantic triples (creating a Knowledge Graph) from documents using an agentic, ontology-driven approach. It combines ontology management, natural language processing, and knowledge graph serialization to turn unstructured text into structured, queryable data.


Key Features

  • Ontology-Guided Extraction: Ensures semantic consistency and co-evolves ontologies
  • Entity Disambiguation: Resolves references across document chunks
  • Multi-Format Support: Handles text, JSON, PDF, and Markdown
  • Semantic Chunking: Splits text based on semantic similarity
  • MCP Compatibility: Implements Model Context Protocol endpoints
  • RDF Output: Produces standardized RDF/Turtle
  • Triple Store Integration: Supports Neo4j (n10s) and Apache Fuseki

Applications

OntoCast can be used for:

  • Knowledge Graph Construction: Build domain-specific or general-purpose knowledge graphs from documents
  • Semantic Search: Power search and retrieval with structured triples
  • GraphRAG: Enable retrieval-augmented generation over knowledge graphs (e.g., with LLMs)
  • Ontology Management: Automate ontology creation, validation, and refinement
  • Data Integration: Unify data from diverse sources into a semantic graph

Installation

uv add ontocast 
# or
pip install ontocast

Configuration

Environment Variables

Copy the example file and edit as needed:

cp env.example .env
# Edit with your values

Main options:

# LLM Configuration
# common
LLM_PROVIDER=openai # or ollama
LLM_MODEL_NAME=gpt-4o-mini # model name for the chosen provider
LLM_TEMPERATURE=0.0

# openai
OPENAI_API_KEY=your_openai_api_key_here

# ollama
LLM_BASE_URL= # e.g. http://localhost:11434 for a local Ollama server

# Server
PORT=8999
RECURSION_LIMIT=1000
ESTIMATED_CHUNKS=30

# Optional: Triple Store Configuration (Fuseki preferred over Neo4j)
FUSEKI_URI=http://localhost:3032/test
FUSEKI_AUTH=admin/abc123-qwe

NEO4J_URI=bolt://localhost:7689
NEO4J_AUTH=neo4j/test!passfortesting
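
For scripts that run alongside the server, the same variables can be inspected from Python. A minimal sketch, assuming python-dotenv is available (not claimed to be an OntoCast dependency):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read .env from the current directory into os.environ
provider = os.getenv("LLM_PROVIDER", "openai")
fuseki_uri = os.getenv("FUSEKI_URI")  # None means no Fuseki configured
print(provider, fuseki_uri)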


Triple Store Setup

OntoCast supports multiple triple store backends. When both Fuseki and Neo4j are configured, Fuseki is preferred.

  • See Triple Store Setup for detailed Docker Compose instructions and sample .env.example files.
  • Quick summary: copy and edit the provided .env.example in docker/fuseki or docker/neo4j, then run docker compose --env-file .env <service> up -d in the respective directory.

Running OntoCast Server

uv run serve \
    --ontology-directory ONTOLOGY_DIR \
    --working-directory WORKING_DIR \
    --clean

  • --clean (optional): If set, the triple store (Neo4j or Fuseki) is wiped on startup (all existing data deleted). Warning: use with caution in production!

API Usage

  • POST /process: Accepts application/json or file uploads (multipart/form-data).
  • Returns: JSON with extracted facts (Turtle), ontology (Turtle), and processing metadata. Triples are also serialized to the configured triple store.

Example:

curl -X POST http://localhost:8999/process \
    -H "Content-Type: application/json" \
    -d '{"text": "Your document text here"}'

# Process a PDF file
curl -X POST http://localhost:8999/process -F "file=@data/pdf/sample.pdf"

# Process a JSON file
curl -X POST http://localhost:8999/process -F "file=@test2/sample.json"
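
For programmatic access, the same endpoint can be called from Python. A minimal sketch using the requests library (response key names are not shown here because they are not documented above; inspect the returned metadata):

import requests

BASE = "http://localhost:8999"  # default port

# Send raw text as application/json
resp = requests.post(f"{BASE}/process", json={"text": "Your document text here"})
resp.raise_for_status()
result = resp.json()  # facts (Turtle), ontology (Turtle), and processing metadata
print(result.keys())

# Upload a PDF as multipart/form-data
with open("data/pdf/sample.pdf", "rb") as fh:
    resp = requests.post(f"{BASE}/process", files={"file": fh})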


MCP Endpoints

  • GET /health: Health check
  • GET /info: Service info
  • POST /process: Document processing

Filesystem Mode

If no triple store is configured, OntoCast stores ontologies and facts as Turtle files in the working directory.
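
These files are plain Turtle, so they can be inspected with RDFlib (already used by OntoCast for triple management). A small sketch with a hypothetical file name:

from rdflib import Graph

g = Graph()
g.parse("WORKING_DIR/facts.ttl", format="turtle")  # hypothetical path/name
print(f"{len(g)} triples")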


Notes

  • JSON documents must contain a text field, e.g.:
    { "text": "abc" }
    
  • The effective recursion limit is computed as max_visits * estimated_chunks; ESTIMATED_CHUNKS defaults to 30 and can be set via .env (see the sketch after this list)
  • Default port: 8999
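
As a concrete instance of that arithmetic (max_visits here is a hypothetical per-chunk visit budget; only ESTIMATED_CHUNKS and RECURSION_LIMIT appear in .env):

estimated_chunks = 30                 # ESTIMATED_CHUNKS default
max_visits = 33                       # hypothetical visits per chunk
print(max_visits * estimated_chunks)  # 990 -> a RECURSION_LIMIT around 1000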

Docker

To build the OntoCast Docker image:

docker buildx build -t growgraph/ontocast:0.1.4 . 2>&1 | tee build.log


Project Structure

ontocast/
├── agent/           # Agent workflow and orchestration
├── cli/             # CLI utilities and server
├── prompt/          # LLM prompt templates
├── stategraph/      # State graph logic
├── tool/            # Triple store, chunking, and ontology tools
├── toolbox.py       # Toolbox for agent tools
├── onto.py          # Ontology and RDF graph handling
└── util.py          # Utilities

Other directories:

  • docker/ – Docker Compose and .env.example files for triple stores
  • data/ – Example data, ontologies, and test files
  • docs/ – Documentation and user guides
  • test/ – Test suite


Workflow

The extraction follows a multi-stage workflow:

Workflow diagram

  1. Document Preparation
    • [Optional] Convert to Markdown
    • Text chunking
  2. Ontology Processing
    • Ontology selection
    • Text to ontology triples
    • Ontology critique
  3. Fact Extraction
    • Text to facts
    • Facts critique
    • Ontology sublimation
  4. Chunk Normalization
    • Chunk KG aggregation
    • Entity/Property Disambiguation
  5. Storage
    • Triple/KG serialization
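
As a rough picture of the control flow, the sketch below strings trivial stand-ins for these stages together. Every helper is hypothetical (the real steps are LLM-driven); only the RDFlib calls are real:

from rdflib import Graph, Literal, URIRef, RDFS

EX = "http://example.org/"

def chunk_text(text: str) -> list[str]:
    # stage 1 stand-in: naive paragraph split instead of semantic chunking
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def extract_facts(chunk: str, i: int) -> Graph:
    # stages 2-3 stand-in: one placeholder triple per chunk
    g = Graph()
    g.add((URIRef(f"{EX}chunk/{i}"), RDFS.comment, Literal(chunk[:40])))
    return g

def process(text: str) -> Graph:
    kg = Graph()
    for i, c in enumerate(chunk_text(text)):
        kg += extract_facts(c, i)  # stage 4 stand-in: aggregate chunk graphs
    return kg                      # stage 5: kg.serialize(...) to store or file

print(process("First paragraph.\n\nSecond paragraph.").serialize(format="turtle"))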

Roadmap

  • Add Jena Fuseki triple store for triple serialization
  • Add Neo4j n10s for triple serialization
  • Replace triple prompting with a tool for local graph retrieval

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Acknowledgments

  • Uses RDFlib for semantic triple management
  • Uses docling for PDF/PPTX conversion
  • Uses OpenAI language models / open models served via Ollama for fact extraction
  • Uses LangChain/LangGraph for agent workflow orchestration