GraFlo
¶
graflo is a framework for transforming tabular data (CSV, SQL) and hierarchical data (JSON, XML) into property graphs and ingesting them into graph databases (ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph). Automatically infer schemas from normalized PostgreSQL databases (3NF) with proper primary keys (PK) and foreign keys (FK) - uses intelligent heuristics to detect vertices and edges!
Core Concepts¶
Property Graphs¶
graflo works with property graphs, which consist of:
- Vertices: Nodes with properties and optional unique identifiers
- Edges: Relationships between vertices with their own properties
- Properties: Both vertices and edges may have properties
Schema¶
The Schema defines how your data should be transformed into a graph and contains:
- Vertex Definitions: Specify vertex types, their properties, and unique identifiers
- Fields can be specified as strings (backward compatible) or typed
Fieldobjects with types (INT, FLOAT, STRING, DATETIME, BOOL) - Type information enables better validation and database-specific optimizations
- Edge Definitions: Define relationships between vertices and their properties
- Weight fields support typed definitions for better type safety
- Resource Mapping: describe how data sources map to vertices and edges
- Transforms: Modify data during the casting process
- Automatic Schema Inference: Generate schemas automatically from PostgreSQL 3NF databases
Data Sources¶
Data Sources define where data comes from:
- File Sources: JSON, JSONL, CSV/TSV files
- API Sources: REST API endpoints with pagination and authentication
- SQL Sources: SQL databases via SQLAlchemy
- In-Memory Sources: Python objects (lists, DataFrames)
Resources¶
Resources define how data is transformed into a graph (semantic mapping). They work with data from any DataSource type:
- Table-like processing: CSV files, SQL tables, API responses
- JSON-like processing: JSON files, nested data structures, hierarchical API responses
GraphEngine¶
The GraphEngine orchestrates graph database operations, providing a unified interface for:
- Schema inference from PostgreSQL databases
- Schema definition in target graph databases (moved from Caster)
- Pattern creation from data sources
- Data ingestion with async support
Key Features¶
- 🚀 PostgreSQL Schema Inference: Automatically generate schemas from normalized PostgreSQL databases (3NF) - No manual schema definition needed!
- Requirements: Works with normalized databases (3NF) that have proper primary keys (PK) and foreign keys (FK) decorated
- Uses intelligent heuristics to classify tables as vertices or edges based on structure
- Introspect PostgreSQL schemas to identify vertex-like and edge-like tables automatically
- Automatically map PostgreSQL data types to graflo Field types (INT, FLOAT, STRING, DATETIME, BOOL)
- Infer vertex configurations from table structures with proper indexes (primary keys become vertex indexes)
- Infer edge configurations from foreign key relationships (foreign keys become edge source/target mappings)
- Create Resource mappings from PostgreSQL tables automatically
- Direct database access - ingest data without exporting to files first
- See Example 5: PostgreSQL Schema Inference for a complete walkthrough
- Graph Transformation Meta-language: A powerful declarative language to describe how your data becomes a property graph:
- Define vertex and edge structures with typed fields
- Set compound indexes for vertices and edges
- Use blank vertices for complex relationships
- Specify edge constraints and properties with typed weight fields
- Apply advanced filtering and transformations
- Typed Schema Definitions: Enhanced type support throughout the schema system
- Vertex fields support types (INT, FLOAT, STRING, DATETIME, BOOL) for better validation
- Edge weight fields can specify types for improved type safety
- Backward compatible: fields without types default to None (suitable for databases like ArangoDB)
- Async Ingestion: Efficient async/await-based ingestion pipeline for better performance
- Parallel Processing: Efficient processing with multi-threading
- Database Integration: Seamless integration with Neo4j, ArangoDB, TigerGraph, FalkorDB, Memgraph, and PostgreSQL (as source)
- Advanced Filtering: Powerful filtering capabilities for data transformation with server-side filtering support
- Blank Node Support: Create intermediate vertices for complex relationships
Quick Links¶
Use Cases¶
- Data Migration: Transform relational data into graph structures
- PostgreSQL to Graph: Automatically infer schemas from normalized PostgreSQL databases (3NF) with proper PK/FK constraints and migrate data directly
- Uses intelligent heuristics to detect vertices and edges - no manual schema definition required
- Perfect for migrating existing relational databases that follow normalization best practices
- Knowledge Graphs: Build complex knowledge representations
- Data Integration: Combine multiple data sources into a unified graph
- Graph Views: Create graph views of existing PostgreSQL databases without schema changes
Requirements¶
- Python 3.11 or higher (Python 3.11 and 3.12 are officially supported)
- Graph database (Neo4j, ArangoDB, TigerGraph, FalkorDB, or Memgraph) for storage
- Optional: PostgreSQL or other SQL databases for data sources (with automatic schema inference support)
- Dependencies as specified in pyproject.toml
Contributing¶
We welcome contributions! Please check out our Contributing Guide for details on how to get started.