
OntoCast: Agentic Ontology Triplecast

Agentic ontology-assisted framework for semantic triple extraction



Overview

OntoCast extracts semantic triples from documents using an agentic, ontology-driven pipeline. It co-evolves ontologies and facts graphs with parallel per-chunk processing, RDF 1.2 provenance, and optional vector-backed ontology retrieval.


Key Features

  • Parallel map/reduce pipeline — concurrent per-unit ontology and facts loops
  • Robust entity disambiguation — embedding + symbolic alignment across chunks
  • RDF 1.2 provenance — quoted triples, provenance artifacts, optional strip_provenance
  • GraphUpdate operations — token-efficient SPARQL insert/delete instead of full graph regeneration
  • JSON-LD wire format — optional LLM_GRAPH_FORMAT=jsonld for LLM payloads
  • Ontology context modes — catalog selection, vector retrieval, or fixed ontology
  • Triple store integration — Fuseki, Neo4j (n10s), or filesystem fallback
  • Tenancy — partition datasets/collections by tenant and project
  • REST API — document processing, ontology catalog management, graph matching
  • Automatic LLM caching — built-in response caching
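To see why GraphUpdate operations are cheaper than regenerating a whole graph, here is a minimal stdlib-only sketch. It models a graph as a set of (subject, predicate, object) tuples and applies a small delete/insert delta to it. The `GraphUpdate` class, `apply_update` and `to_sparql` are hypothetical illustrations, not OntoCast's actual API. The only assumption carried over from the feature list is that updates are expressed as SPARQL insert/delete operations.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); plain strings stand in for IRIs/literals.
Triple = tuple[str, str, str]


@dataclass
class GraphUpdate:
    """Hypothetical delta: triples to remove, then triples to add."""

    delete: set[Triple] = field(default_factory=set)
    insert: set[Triple] = field(default_factory=set)

    def to_sparql(self) -> str:
        """Render as SPARQL 1.1 Update (PREFIX declarations omitted for brevity)."""

        def block(triples: set[Triple]) -> str:
            return " ".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))

        parts = []
        if self.delete:
            parts.append(f"DELETE DATA {{ {block(self.delete)} }}")
        if self.insert:
            parts.append(f"INSERT DATA {{ {block(self.insert)} }}")
        # Multiple operations in one SPARQL Update request are separated by ';'.
        return " ;\n".join(parts)


def apply_update(graph: set[Triple], update: GraphUpdate) -> set[Triple]:
    """Return a new graph with deletions applied before insertions."""
    return (graph - update.delete) | update.insert
```

Correcting one fact then touches two triples instead of re-emitting the whole graph. For example, `GraphUpdate(delete={("ex:Acme", "ex:foundedIn", '"1990"')}, insert={("ex:Acme", "ex:foundedIn", '"1989"')})` swaps a single literal.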



Installation

uv add ontocast
# or
pip install ontocast

Optional PDF/DOCX conversion: pip install "ontocast[doc-processing]"


Quick Start

cp .env.example .env
# Edit LLM_API_KEY and paths

ontocast --env-path .env

curl -X POST http://localhost:8999/process -F "file=@document.pdf"

See Quick Start Guide for full configuration.
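For scripted use, the curl call above can be reproduced from Python with only the standard library. This is a minimal sketch. It assumes the server started by `ontocast --env-path .env` listens on localhost:8999 and that /process accepts a multipart form field named `file`, both taken from the curl example. The helper names are hypothetical, and the response body is returned as raw text because its exact shape is not specified here.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to occur in data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_document(path: str, base_url: str = "http://localhost:8999") -> str:
    """POST a document to /process, mirroring curl -F "file=@document.pdf"."""
    p = Path(path)
    body, content_type = build_multipart("file", p.name, p.read_bytes())
    req = urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage, with the server running: `print(process_document("document.pdf"))`.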


REST API (Summary)

Method        Path                       Purpose
GET           /health                    Health check
GET           /info                      Service metadata
POST          /process                   Full document pipeline
POST          /process_unit              Single content unit
POST          /flush                     Clear triple store data
POST          /ontologies                Upload catalog ontology
PUT/DELETE    /ontologies/{iri}          Replace or delete ontology
POST          /match/entities            Global entity alignment
POST          /match/derive-matches      Pairwise entity matching
POST          /match/evaluate            Triple/entity metrics

Details: API Endpoints.
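Ontology IRIs contain characters such as ":", "/" and "#", so a client addressing PUT/DELETE /ontologies/{iri} will likely need to percent-encode the IRI into a single path segment. The sketch below assumes that convention, and also assumes that a replacement ontology is sent as a raw Turtle body. Check API Endpoints for what the server actually expects. The function name is hypothetical, and the function only builds request objects without sending them.

```python
import urllib.parse
import urllib.request
from typing import Optional


def ontology_request(
    iri: str,
    method: str,
    base_url: str = "http://localhost:8999",
    body: Optional[bytes] = None,
    content_type: str = "text/turtle",  # assumption: raw Turtle body for PUT
) -> urllib.request.Request:
    """Build (but do not send) a PUT or DELETE request for /ontologies/{iri}.

    safe="" makes quote() encode '/' too, so ':', '/' and '#' all stay inside
    one path segment (an assumption about how the server parses {iri}).
    """
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    encoded = urllib.parse.quote(iri, safe="")
    headers = {"Content-Type": content_type} if body is not None else {}
    return urllib.request.Request(
        f"{base_url}/ontologies/{encoded}", data=body, headers=headers, method=method
    )
```

Send the result with `urllib.request.urlopen(ontology_request(...))` once the encoding convention is confirmed.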


Workflow

Document-level pipeline (diagram regenerated via uv run plot-graph):

[Figure: workflow diagram of the document-level pipeline]

Landscape variant: graph.lr.png. Per-unit render/critic loops are documented in Workflow.

  1. Convert → chunk document
  2. Parallel ontology render per unit → normalize → optional consolidate → validate
  3. Parallel facts render per unit → merge with disambiguation
  4. Serialize to triple store; return Turtle in API response
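The steps above follow a map/reduce shape. Per-unit work fans out concurrently, and the partial results are then merged. The asyncio sketch below shows only that control flow. `chunk`, `render_facts` and the case-insensitive `merge` are hypothetical stand-ins. OntoCast's real pipeline is a LangGraph workflow with LLM render/critic loops and embedding + symbolic disambiguation.

```python
import asyncio

# (subject, predicate, object); strings stand in for IRIs.
Triple = tuple[str, str, str]


def chunk(text: str, words_per_unit: int = 50) -> list[str]:
    """Placeholder chunker: fixed-size groups of whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_unit])
            for i in range(0, len(words), words_per_unit)]


async def render_facts(unit_id: int, unit: str) -> set[Triple]:
    """Placeholder for the per-unit LLM facts loop.

    Emits one triple per capitalized word; a real implementation would call
    an LLM and run render/critic iterations here.
    """
    await asyncio.sleep(0)  # yield control, as an awaited LLM call would
    return {
        (f"ex:{word.strip('.,')}", "ex:mentionedIn", f"ex:unit{unit_id}")
        for word in unit.split()
        if word[:1].isupper()
    }


def merge(partials: list[set[Triple]]) -> set[Triple]:
    """Reduce step: union per-unit graphs, aliasing subjects that differ only by case.

    Case-insensitive matching is a naive stand-in for real disambiguation.
    """
    canonical: dict[str, str] = {}  # lowercased subject -> first spelling seen
    merged: set[Triple] = set()
    for graph in partials:
        for s, p, o in sorted(graph):
            merged.add((canonical.setdefault(s.lower(), s), p, o))
    return merged


async def run_pipeline(text: str, words_per_unit: int = 50) -> set[Triple]:
    """Map (concurrent per-unit rendering), then reduce (merge)."""
    units = chunk(text, words_per_unit)
    partials = await asyncio.gather(
        *(render_facts(i, unit) for i, unit in enumerate(units))
    )
    return merge(list(partials))
```

`asyncio.gather` returns results in submission order, so the merge sees units in document order even though they were processed concurrently. With `asyncio.run(run_pipeline("Acme hired Bob. ACME acquired Zeta.", words_per_unit=3))`, "ACME" in the second unit is folded into the "ex:Acme" subject from the first.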

Project Structure

ontocast/
├── agent/           # Render, critic, normalize, serialize agents
├── api/             # FastAPI routers (ontologies, schemas, tenancy)
├── cli/             # Server and utility CLIs
├── onto/            # Ontology, RDFGraph, state models
├── prompt/          # LLM prompt templates
├── stategraph/      # LangGraph workflow
├── tool/            # Triple stores, chunking, vector store, aggregation
├── config.py        # Pydantic settings
└── toolbox.py       # Tool dependency container

Contributing

See Contributing and CHANGELOG.