Configuration System¶
OntoCast configuration is built on Pydantic BaseSettings and is loaded from environment variables, typically via a .env file.
Overview¶
- Typed config sections with defaults
- Environment variable parsing (including lists and booleans)
- Validation for provider/model compatibility
- Unified Config object shared across tools and server
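The loading behavior described above can be sketched with a minimal stand-in (plain dataclasses and manual `os.environ` parsing, purely illustrative; OntoCast's actual Pydantic models do this automatically):

```python
import os
from dataclasses import dataclass


@dataclass
class LLMConfig:
    # Typed fields with defaults; environment variables override them.
    provider: str = "openai"
    model_name: str = "gpt-4o-mini"
    temperature: float = 0.0


def load_llm_config() -> LLMConfig:
    # Pydantic BaseSettings performs this parsing (and type coercion)
    # automatically; it is spelled out here by hand for illustration.
    return LLMConfig(
        provider=os.environ.get("LLM_PROVIDER", "openai"),
        model_name=os.environ.get("LLM_MODEL_NAME", "gpt-4o-mini"),
        temperature=float(os.environ.get("LLM_TEMPERATURE", "0.0")),
    )


os.environ["LLM_TEMPERATURE"] = "0.2"
print(load_llm_config().temperature)  # 0.2
```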
Configuration Shape¶
Config
├── tool_config: ToolConfig
│ ├── llm_config: LLMConfig
│ ├── chunk_config: ChunkConfig
│ ├── path_config: PathConfig
│ ├── neo4j: Neo4jConfig
│ ├── fuseki: FusekiConfig
│ ├── domain: DomainConfig
│ ├── web_search: WebSearchConfig
│ └── aggregation: AggregationConfig
└── server: ServerConfig
Environment Variables¶
LLM¶
LLM_PROVIDER=openai # openai | ollama
LLM_MODEL_NAME=gpt-4o-mini
LLM_TEMPERATURE=0.0
LLM_API_KEY=your_openai_api_key_here # required for openai provider
LLM_BASE_URL=http://localhost:11434 # optional (mainly for ollama)
Server¶
PORT=8999
BASE_RECURSION_LIMIT=1000
ESTIMATED_CHUNKS=30
MAX_VISITS=3 # alias for max_visits_per_node
RENDER_MODE=ontology_and_facts # ontology | facts | ontology_and_facts
ONTOLOGY_MAX_TRIPLES=50000 # empty/unset for unlimited
PARALLEL_WORKERS=4
PARALLEL_FACTS_RETRIES=3
PARALLEL_ONTOLOGY_RETRIES=3
ENABLE_ONTOLOGY_CONSOLIDATION=false
Chunking¶
CHUNK_BREAKPOINT_THRESHOLD_TYPE=percentile # percentile | standard_deviation | interquartile | gradient
CHUNK_BREAKPOINT_THRESHOLD_AMOUNT=95.0
CHUNK_MIN_SIZE=3000
CHUNK_MAX_SIZE=12000
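The threshold types above correspond to common semantic-chunking strategies: chunk boundaries are placed where the distance between consecutive sentence embeddings is anomalously large. A sketch of the `percentile` variant (the function and data are illustrative, not OntoCast internals):

```python
def percentile_breakpoints(distances: list[float], amount: float = 95.0) -> list[int]:
    # Indices where the embedding distance exceeds the given percentile
    # become chunk boundaries. The other threshold types substitute a
    # standard-deviation, interquartile, or gradient criterion here.
    ordered = sorted(distances)
    rank = int(round((amount / 100.0) * (len(ordered) - 1)))
    threshold = ordered[rank]
    return [i for i, d in enumerate(distances) if d > threshold]


# Hypothetical distances between consecutive sentence embeddings:
distances = [0.1, 0.2, 0.15, 0.9, 0.1, 0.12, 0.85, 0.11]
print(percentile_breakpoints(distances, 75.0))  # [3, 6]
```

A higher CHUNK_BREAKPOINT_THRESHOLD_AMOUNT means fewer, larger chunks; CHUNK_MIN_SIZE and CHUNK_MAX_SIZE then clamp the result.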
Triple Stores¶
# Fuseki
FUSEKI_URI=http://localhost:3030/test
FUSEKI_AUTH=admin/admin
FUSEKI_DATASET=dataset_name
FUSEKI_ONTOLOGIES_DATASET=ontologies
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_AUTH=neo4j/test
NEO4J_PORT=7476
NEO4J_BOLT_PORT=7689
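FUSEKI_AUTH and NEO4J_AUTH pack credentials as `user/password` in a single variable. Splitting only on the first slash keeps passwords that themselves contain slashes intact (a sketch; OntoCast's actual parsing may differ):

```python
def split_auth(auth: str) -> tuple[str, str]:
    # Partition on the first "/" only, so "admin/p/with/slash" yields
    # user "admin" and password "p/with/slash".
    user, _, password = auth.partition("/")
    return user, password


print(split_auth("neo4j/test"))          # ('neo4j', 'test')
print(split_auth("admin/p/with/slash"))  # ('admin', 'p/with/slash')
```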
Paths and Domain¶
CURRENT_DOMAIN=https://example.com
ONTOCAST_WORKING_DIRECTORY=/path/to/working/directory
ONTOCAST_ONTOLOGY_DIRECTORY=/path/to/ontology/files
ONTOCAST_CACHE_DIR=/path/to/cache/directory
Aggregation¶
Web Search¶
WEB_SEARCH_ENABLED=false
WEB_SEARCH_PROVIDER=duckduckgo
WEB_SEARCH_TOP_K=3
WEB_SEARCH_TIMEOUT_SECONDS=8.0
WEB_SEARCH_MAX_SNIPPET_CHARS=400
WEB_SEARCH_MAX_TOTAL_CHARS=1800
WEB_SEARCH_ONTOLOGY_RENDER_ENABLED=true
WEB_SEARCH_ONTOLOGY_CRITIC_ENABLED=true
WEB_SEARCH_FACTS_RENDER_ENABLED=false
WEB_SEARCH_FACTS_CRITIC_ENABLED=false
WEB_SEARCH_PLANNER_ENABLED=true
WEB_SEARCH_PLANNER_MAX_QUERIES=3
WEB_SEARCH_PLANNER_MIN_QUERY_CHARS=12
WEB_SEARCH_PLANNER_MIN_CONFIDENCE=0.35
WEB_SEARCH_REUSE_EVIDENCE_ACROSS_ATTEMPT=true
WEB_SEARCH_MIN_SNIPPET_CHARS=40
WEB_SEARCH_ALLOWED_DOMAINS= # comma-separated
WEB_SEARCH_BLOCKED_DOMAINS= # comma-separated
WEB_SEARCH_REGION=wt-wt
WEB_SEARCH_SAFESEARCH=moderate
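WEB_SEARCH_ALLOWED_DOMAINS and WEB_SEARCH_BLOCKED_DOMAINS take comma-separated values, with an empty variable meaning no filtering. A sketch of the parsing (function name is illustrative):

```python
def parse_domain_list(raw: str) -> list[str]:
    # Comma-separated and whitespace-tolerant; an empty string yields an
    # empty list, i.e. no domain filtering is applied.
    return [d.strip() for d in raw.split(",") if d.strip()]


print(parse_domain_list("example.com, docs.example.org"))  # ['example.com', 'docs.example.org']
print(parse_domain_list(""))                               # []
```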
Search is "search-later": nodes run without search first, and only request external evidence when needed.
Usage¶
from ontocast.config import Config
config = Config()
tool_config = config.get_tool_config()
print(config.server.port)
print(config.server.max_visits_per_node)
print(tool_config.llm_config.provider)
print(tool_config.path_config.cache_dir)
Validation Notes¶
- LLM_PROVIDER=openai requires LLM_API_KEY.
- LLM_MODEL_NAME must match the selected provider family.
- MAX_VISITS is supported as an alias for max_visits_per_node.
- WEB_SEARCH_ALLOWED_DOMAINS and WEB_SEARCH_BLOCKED_DOMAINS accept comma-separated values.
Recommended Workflow¶
- Copy .env.example to .env.
- Fill in LLM credentials and backend settings.
- Start with defaults for chunking/search/aggregation.
- Tune only after inspecting extraction quality and runtime.