# GraFlo
GraFlo is a Graph Schema Transformation Language (GSTL) for Labeled Property Graphs (LPG) - a domain-specific language (DSL) for defining graph structure and transformation logic in one manifest.
It combines a database-independent graph model, DB-specific details, and an ingestion pipeline into a single graph manifest that runs across many systems. With declarative schemas and reusable Resource pipelines, GraFlo maps CSV/SQL, JSON/XML, RDF/SPARQL, REST APIs, and in-memory data into a single database-independent LPG model (GraphContainer), then projects it to supported graph databases: ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, and NebulaGraph. This keeps schema and transform logic portable across targets and helps teams avoid vendor lock-in.
## Pipeline
Source Instance → Resource → Graph Schema → Covariant Graph Representation → Graph DB
| Stage | Role | Code |
|---|---|---|
| Source Instance | A concrete data artifact: a CSV file, a PostgreSQL table, a SPARQL endpoint, a `.ttl` file. | `AbstractDataSource` subclasses with a `DataSourceType` (`FILE`, `SQL`, `SPARQL`, `API`, `IN_MEMORY`). |
| Resource | A reusable transformation pipeline: actor steps (descend, transform, vertex, edge, vertex_router, edge_router) that map raw records to graph elements. Data sources bind to Resources by name via the `DataSourceRegistry`. | `Resource` (part of `IngestionModel`). |
| Graph Schema | Declarative logical vertex/edge definitions, identities, typed fields, and DB profile. | `Schema`, `VertexConfig`, `EdgeConfig`. |
| Covariant Graph Representation | A database-independent collection of vertices and edges. | `GraphContainer`. |
| DB-aware Projection | Resolves DB-specific naming/default/index behavior from the logical schema plus `DatabaseProfile`. | `Schema.resolve_db_aware()`, `VertexConfigDBAware`, `EdgeConfigDBAware`. |
| Graph DB | The target LPG store; the same API for all supported databases. | `ConnectionManager`, `DBWriter`, DB connectors. |
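The "covariant" stage in the middle of this pipeline can be pictured with a minimal, self-contained sketch. The dataclasses below are illustrative stand-ins, not GraFlo's actual `GraphContainer` API: the point is only that vertices and edges live in a database-neutral structure that any connector could later project.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vertex:
    label: str          # logical vertex type, e.g. "Person"
    key: str            # unique identifier within the label
    props: tuple = ()   # (name, value) pairs of typed properties

@dataclass(frozen=True)
class Edge:
    label: str          # relationship type, e.g. "KNOWS"
    src: str            # source vertex key
    dst: str            # target vertex key
    props: tuple = ()

@dataclass
class Container:
    """Database-independent collection of vertices and edges."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_vertex(self, v: Vertex) -> None:
        self.vertices.append(v)

    def add_edge(self, e: Edge) -> None:
        self.edges.append(e)

g = Container()
g.add_vertex(Vertex("Person", "alice", (("age", 34),)))
g.add_vertex(Vertex("Person", "bob"))
g.add_edge(Edge("KNOWS", "alice", "bob"))
```

Because nothing here mentions a particular database, the same container can be handed to any of the supported targets in the final projection stage.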
## Core Concepts

### Labeled Property Graphs
GraFlo targets the LPG model:
- Vertices — nodes with typed properties and unique identifiers.
- Edges — directed relationships between vertices, carrying their own properties and weights.
### Schema
The Schema is the single source of truth for the graph structure:
- Vertex definitions — vertex types, fields (optionally typed: `INT`, `FLOAT`, `STRING`, `DATETIME`, `BOOL`), and indexes.
- Edge definitions — relationships between vertex types, with optional weight fields.
- Schema inference — generate schemas from PostgreSQL 3NF databases (PK/FK heuristics) or from OWL/RDFS ontologies.
Resources and transforms are part of IngestionModel, not Schema.
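As an illustration only (the exact YAML keys below are hypothetical, not taken from GraFlo's documented manifest grammar), a declarative schema of this shape might look like:

```yaml
# Hypothetical sketch: vertex/edge definitions with typed fields.
vertices:
  - name: person
    fields:
      - {name: id, type: INT}
      - {name: email, type: STRING}
    indexes: [email]
edges:
  - name: works_at
    from: person
    to: company
    weight: {name: since, type: DATETIME}
```

The schema stays purely logical; database-specific naming, defaults, and index behavior are resolved later by the DB-aware projection.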
### IngestionModel
IngestionModel defines how source records are transformed into graph entities:
- Resources — reusable actor pipelines that map raw records to vertices and edges.
- Transforms — reusable named transforms referenced by resource steps.
### Resource
A Resource is the central abstraction that bridges data sources and the graph schema. Each Resource defines a reusable pipeline of actors (descend, transform, vertex, edge) that maps raw records to graph elements. Data sources bind to Resources by name via the DataSourceRegistry, so the same transformation logic applies regardless of whether data arrives from a file, an API, or a SPARQL endpoint.
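Sketched in the same hypothetical YAML (the step names follow the actor types listed above, but the surrounding keys are illustrative, not GraFlo's exact manifest grammar), a Resource pipeline could read:

```yaml
resources:
  - name: people_feed                 # data sources bind to this name
    steps:
      - descend: {path: employees}    # walk into a nested array
      - transform: {apply: lowercase_email}
      - vertex: {type: person, key: id}
      - edge: {type: works_at, from: person, to: company}
```

Any source registered under the name `people_feed` — a file, an API, an endpoint — would flow through these same steps.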
### DataSourceRegistry
The DataSourceRegistry manages AbstractDataSource adapters, each carrying a DataSourceType:
| `DataSourceType` | Adapter | Sources |
|---|---|---|
| `FILE` | `FileDataSource` | CSV, JSON, JSONL, Parquet files |
| `SQL` | `SQLDataSource` | PostgreSQL and other SQL databases via SQLAlchemy |
| `SPARQL` | `RdfFileDataSource` | Turtle/RDF/N3/JSON-LD files via rdflib |
| `SPARQL` | `SparqlEndpointDataSource` | Remote SPARQL endpoints (e.g. Apache Fuseki) via SPARQLWrapper |
| `API` | `APIDataSource` | REST API endpoints with pagination and authentication |
| `IN_MEMORY` | `InMemoryDataSource` | Python objects (lists, DataFrames) |
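The name-based binding between sources and pipelines can be illustrated with a self-contained sketch (plain Python, not GraFlo's actual registry API): two differently shaped sources bind to the same resource name, so one pipeline serves both.

```python
import csv
import io

# Hypothetical stand-in for a name-keyed registry: resource name -> pipeline.
PIPELINES = {
    "people": lambda rec: {"label": "Person", "key": rec["id"], "name": rec["name"]},
}

def ingest(resource_name, records):
    """Apply the pipeline bound to resource_name to every record."""
    pipeline = PIPELINES[resource_name]
    return [pipeline(r) for r in records]

# Source 1: in-memory records.
in_memory = [{"id": "1", "name": "Alice"}]

# Source 2: a CSV "file" yielding records of the same shape.
csv_rows = list(csv.DictReader(io.StringIO("id,name\n2,Bob\n")))

vertices = ingest("people", in_memory) + ingest("people", csv_rows)
```

The transformation logic never learns where a record came from; that is the decoupling the registry provides.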
### GraphEngine
GraphEngine orchestrates end-to-end operations: schema inference, schema definition in the target database, connector creation from data sources, and data ingestion.
## Key Features
- Declarative LPG schema DSL — Define vertices, edges, indexes, weights, and transforms in YAML or Python. The `Schema` is the single source of truth, independent of source or target.
- Database abstraction — One logical schema and transformation DSL, one API. Target ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, or NebulaGraph without rewriting pipelines. DB idiosyncrasies are handled in DB-aware projection (`Schema.resolve_db_aware(...)`) and connector/writer stages.
- Resource abstraction — Each `Resource` defines a reusable actor pipeline that maps raw records to graph elements. Actor types include descend, transform, vertex, and edge, plus `VertexRouter` and `EdgeRouter` for dynamic type-based routing (see Concepts — Actor). Data sources bind to Resources by name via the `DataSourceRegistry`, decoupling transformation logic from data retrieval.
- DataSourceRegistry — Register `FILE`, `SQL`, `API`, `IN_MEMORY`, or `SPARQL` data sources. Each `DataSourceType` plugs into the same Resource pipeline.
- SPARQL & RDF support — Query SPARQL endpoints (e.g. Apache Fuseki), read `.ttl`/`.rdf`/`.n3` files, and auto-infer schemas from OWL/RDFS ontologies (`rdflib` and `SPARQLWrapper` are included in the default install).
- Schema inference — Generate graph schemas from PostgreSQL 3NF databases (PK/FK heuristics) or from OWL/RDFS ontologies. See Example 5.
- Schema migration planning/execution — Generate typed migration plans between schema versions, apply low-risk additive changes with risk gates, and track revision history via `migrate_schema`.
- Compare `from` and `to` schemas before execution to preview structural deltas and blocked high-risk operations.
- Typed fields — Vertex fields and edge weights carry types for validation and database-specific optimisation.
- Parallel batch processing — Configurable batch sizes and multi-core execution.
- Advanced filtering — Server-side filtering (e.g. TigerGraph REST++ API), client-side filter expressions, and `SelectSpec` for declarative SQL view/filter control before data reaches Resources.
- Blank vertices — Create intermediate nodes for complex relationship modelling.
## Quick Links
- Installation
- Quick Start Guide
- Concepts (architecture diagrams)
- Concepts — Schema Migration
- Concepts — Comparing Two Schemas
- API Reference
- Examples
Note: Mermaid diagrams are kept in section pages (for example `concepts/`) rather than on this landing page.
## Use Cases
- Data Migration — Transform relational data into LPG structures. Infer schemas from PostgreSQL 3NF databases and migrate data directly.
- RDF-to-LPG — Read RDF triples from files or SPARQL endpoints, auto-infer schemas from OWL ontologies, and ingest into ArangoDB, Neo4j, etc.
- Knowledge Graphs — Build knowledge representations from heterogeneous sources (SQL, files, APIs, RDF/SPARQL).
- Data Integration — Combine multiple data sources into a unified labeled property graph.
- Graph Views — Create graph views of existing PostgreSQL databases without schema changes.
## Requirements
- Python 3.11 or higher (3.11 and 3.12 officially supported)
- A graph database (ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, or NebulaGraph) as target
- Optional: PostgreSQL for SQL data sources and schema inference
- Optional extras (see Installation): `dev` (tests and typing), `docs` (MkDocs), `plot` (`plot_manifest` via `pygraphviz`; system Graphviz required)
- Full dependency list in `pyproject.toml`
## Contributing
We welcome contributions! Please check out our Contributing Guide for details on how to get started.