GraFlo

graflo is a framework for transforming tabular data (CSV) and hierarchical data (JSON, XML) into property graphs and ingesting them into graph databases (ArangoDB, Neo4j, TigerGraph).

License: BSL

Core Concepts

Property Graphs

graflo works with property graphs, which consist of:

  • Vertices: Nodes with properties and optional unique identifiers
  • Edges: Relationships between vertices with their own properties
  • Properties: Key-value attributes attached to vertices and edges
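
To make the three pieces concrete, here is a minimal plain-Python sketch of a property graph. The `Vertex` and `Edge` classes and their attribute names are hypothetical and exist only for illustration; they are not graflo's own classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Vertex:
    """Hypothetical vertex: a label, a unique key, and free-form properties."""
    key: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """Hypothetical edge: connects two vertex keys and carries its own properties."""
    source: str
    target: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


alice = Vertex(key="p1", label="Person", properties={"name": "Alice"})
acme = Vertex(key="c1", label="Company", properties={"name": "Acme"})

# The relationship is a first-class object with its own properties.
works_at = Edge(
    source=alice.key,
    target=acme.key,
    label="WORKS_AT",
    properties={"since": 2020},
)
```

The edge refers to its endpoints by key, which is why unique identifiers on vertices matter when data is merged from several sources.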

Schema

The Schema defines how your data should be transformed into a graph and contains:

  • Vertex Definitions: Specify vertex types, their properties, and unique identifiers
    • Fields can be specified as plain strings (backward compatible) or as Field objects with an explicit type (INT, FLOAT, STRING, DATETIME, BOOL)
    • Type information enables better validation and database-specific optimizations
  • Edge Definitions: Define relationships between vertices and their properties
    • Weight fields support typed definitions for better type safety
  • Resource Mapping: Describe how data sources map to vertices and edges
  • Transforms: Modify data during the casting process
  • Automatic Schema Inference: Generate schemas automatically from PostgreSQL 3NF databases
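
The role of typed fields can be illustrated with a small sketch. The type names (INT, FLOAT, STRING, DATETIME, BOOL) come from the list above, but `FieldType`, `CASTERS`, and `cast_row` are hypothetical helpers written for this example and should not be read as graflo's API.

```python
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class FieldType(Enum):
    INT = "INT"
    FLOAT = "FLOAT"
    STRING = "STRING"
    DATETIME = "DATETIME"
    BOOL = "BOOL"


def _to_bool(raw: Any) -> bool:
    # Assumption: these spellings count as true, everything else as false.
    return str(raw).strip().lower() in {"1", "true", "yes"}


CASTERS = {
    FieldType.INT: int,
    FieldType.FLOAT: float,
    FieldType.STRING: str,
    FieldType.DATETIME: datetime.fromisoformat,
    FieldType.BOOL: _to_bool,
}


def cast_row(row: dict[str, Any], fields: dict[str, Optional[FieldType]]) -> dict[str, Any]:
    """Keep only declared fields; cast typed ones, pass untyped ones through."""
    out = {}
    for name, ftype in fields.items():
        value = row[name]
        out[name] = value if ftype is None else CASTERS[ftype](value)
    return out


# "name" has no declared type (None), mirroring the untyped, string-style field.
person_fields = {
    "id": FieldType.INT,
    "name": None,
    "joined": FieldType.DATETIME,
    "active": FieldType.BOOL,
}
row = {"id": "42", "name": "Alice", "joined": "2021-03-01", "active": "true", "extra": "ignored"}
typed = cast_row(row, person_fields)
```

With declared types, a malformed value such as `"id": "abc"` fails at cast time with a `ValueError` instead of reaching the database as a string.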

Data Sources

Data Sources define where data comes from:

  • File Sources: JSON, JSONL, CSV/TSV files
  • API Sources: REST API endpoints with pagination and authentication
  • SQL Sources: SQL databases via SQLAlchemy
  • In-Memory Sources: Python objects (lists, DataFrames)
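
Whatever the origin, each source can be viewed as a stream of records. The sketch below uses only the Python standard library to show CSV text, JSONL text, and an in-memory list reduced to the same shape, a sequence of dictionaries. The function names are hypothetical and this is not how graflo's data source classes are implemented.

```python
import csv
import io
import json
from typing import Any, Iterator


def read_csv(handle) -> Iterator[dict[str, Any]]:
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(handle)


def read_jsonl(handle) -> Iterator[dict[str, Any]]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        if line.strip():
            yield json.loads(line)


# In-memory stand-ins for files, so the example is self-contained.
csv_text = "id,name\n1,Alice\n2,Bob\n"
jsonl_text = '{"id": 3, "name": "Carol"}\n{"id": 4, "name": "Dan"}\n'
in_memory = [{"id": 5, "name": "Eve"}]

records = [
    *read_csv(io.StringIO(csv_text)),
    *read_jsonl(io.StringIO(jsonl_text)),
    *in_memory,
]
```

Note that CSV values arrive as strings (`"1"`), while JSON values keep their types (`3`), which is one reason typed field definitions are useful downstream.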

Resources

Resources define how data is transformed into a graph (semantic mapping). They work with data from any DataSource type:

  • Table-like processing: CSV files, SQL tables, API responses
  • JSON-like processing: JSON files, nested data structures, hierarchical API responses
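
The difference between the two processing styles can be sketched in plain Python. In the table-like case every record is flat and one row yields a fixed set of vertices and edges; in the JSON-like case the nesting itself encodes the relationships. The function names, labels, and tuple layout below are illustrative assumptions, not graflo's Resource API.

```python
def from_row(row):
    """Table-like: one flat row yields two vertices and one edge."""
    person = ("Person", row["person_id"], {"name": row["person_name"]})
    company = ("Company", row["company_id"], {"name": row["company_name"]})
    edge = ("WORKS_AT", row["person_id"], row["company_id"])
    return [person, company], [edge]


def from_document(doc):
    """JSON-like: a parent object with a nested list of children."""
    vertices = [("Company", doc["id"], {"name": doc["name"]})]
    edges = []
    for emp in doc.get("employees", []):
        vertices.append(("Person", emp["id"], {"name": emp["name"]}))
        # The parent-child nesting becomes an explicit edge.
        edges.append(("WORKS_AT", emp["id"], doc["id"]))
    return vertices, edges


row = {"person_id": "p1", "person_name": "Alice",
       "company_id": "c1", "company_name": "Acme"}
doc = {"id": "c1", "name": "Acme",
       "employees": [{"id": "p1", "name": "Alice"}, {"id": "p2", "name": "Bob"}]}

row_vertices, row_edges = from_row(row)
doc_vertices, doc_edges = from_document(doc)
```

Both paths produce the same kind of output, which is what lets one mapping style be applied to data from any source type.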

Key Features

  • Graph Transformation Meta-language: A declarative language that describes how your data becomes a property graph:
    • Define vertex and edge structures with typed fields
    • Set compound indexes for vertices and edges
    • Use blank vertices for complex relationships
    • Specify edge constraints and properties with typed weight fields
    • Apply advanced filtering and transformations
  • Typed Schema Definitions: Enhanced type support throughout the schema system
    • Vertex fields support types (INT, FLOAT, STRING, DATETIME, BOOL) for better validation
    • Edge weight fields can specify types for improved type safety
    • Backward compatible: fields without types default to None (suitable for databases like ArangoDB)
  • PostgreSQL Schema Inference: Automatically generate schemas from PostgreSQL 3NF databases
    • Introspect PostgreSQL schemas to identify vertex-like and edge-like tables
    • Automatically map PostgreSQL data types to graflo Field types
    • Infer vertex configurations from table structures
    • Infer edge configurations from foreign key relationships
    • Create Resource mappings from PostgreSQL tables
  • Parallel Processing: Efficient processing with multi-threading
  • Database Integration: Seamless integration with Neo4j, ArangoDB, TigerGraph, and PostgreSQL (as source)
  • Advanced Filtering: Powerful filtering capabilities for data transformation with server-side filtering support
  • Blank Node Support: Create intermediate vertices for complex relationships
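
As an illustration of how vertex-like and edge-like tables might be told apart in a 3NF database, the sketch below applies one common heuristic: a table with two or more foreign keys is treated as a join table and therefore edge-like, and every other table is treated as vertex-like. This rule, the `classify_tables` function, and the input layout are assumptions made for the example; graflo's actual inference logic may differ.

```python
def classify_tables(foreign_keys: dict[str, list[tuple[str, str]]]) -> dict[str, str]:
    """Map each table name to 'edge' or 'vertex'.

    foreign_keys maps a table to its (column, referenced_table) pairs.
    Heuristic (an assumption): two or more foreign keys means edge-like.
    """
    return {
        table: "edge" if len(fks) >= 2 else "vertex"
        for table, fks in foreign_keys.items()
    }


# Hypothetical schema summary, e.g. gathered from database introspection.
schema = {
    "users": [],
    "products": [],
    "purchases": [("user_id", "users"), ("product_id", "products")],
}

kinds = classify_tables(schema)

# For an edge-like table, the referenced tables give the edge endpoints.
endpoints = {
    table: tuple(ref for _, ref in schema[table])
    for table, kind in kinds.items()
    if kind == "edge"
}
```

A table with a single foreign key is classified as vertex-like here, although that foreign key can still give rise to an edge between two vertex types.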

Use Cases

  • Data Migration: Transform relational data into graph structures
  • Knowledge Graphs: Build complex knowledge representations
  • Data Integration: Combine multiple data sources into a unified graph

Requirements

  • Python 3.10 or higher
  • Graph database (Neo4j, ArangoDB, or TigerGraph) for storage
  • Optional: PostgreSQL or other SQL databases for data sources
  • Dependencies as specified in pyproject.toml

Contributing

We welcome contributions! Please check out our Contributing Guide for details on how to get started.