Graph export and migration¶

GraFlo 1.8.7 adds GraFloBackendConfig — a first-class source and target that persists graphs as a chunked on-disk directory. Use it for dry-runs without a live database, large-graph exports, and replay into any supported LPG or PostgreSQL target.

Overview¶

Direction	Entry point	Source backends	Target backends
Graph → file backend	`GraphEngine.migrate_graph()`	Neo4j, ArangoDB, file backend	`GraFloBackendConfig`
File backend → graph	`GraphEngine.migrate_graph()`	`GraFloBackendConfig`	Any supported LPG target
File backend → relational	`GraphEngine.migrate_graph()`	`GraFloBackendConfig`	PostgreSQL (vertex + junction edge tables)
Resources → file backend	`GraphEngine.ingest()` / `define_and_ingest()`	CSV / JSON / SQL / API manifest resources	`GraFloBackendConfig`
Graph schema only	`GraphEngine.infer_schema_from_graph()`	Neo4j, ArangoDB, file backend	— (returns `Schema`)
In-memory export	`GraphEngine.export_graph()`	Neo4j, ArangoDB	— (returns `GraFloOutput`)

PostgreSQL remains a source for 3NF schema inference and SQL ingestion (Example 5). As a target, PostgreSQL stores the logical graph as relational tables rather than native LPG structures.

File backend layout¶

A GraFlo file backend directory is self-describing:

artifacts/neo4j-backend/
├── INDEX.json              # manifest: version, schema hash, chunk inventory
├── schema.yaml             # Schema only (no data)
├── vertices/
│   ├── person.000.jsonl.gz
│   └── person.001.jsonl.gz
└── edges/
    └── person__knows__person.000.jsonl.gz

Chunks — gzip-compressed JSONL (one JSON document per line)
Edges — filenames use {source}__{relation}__{target} when names are safe; INDEX.json keys follow the same convention
Default chunk size — 50 000 records per file (configurable on GraFloBackendConfig)

Quick start¶

from pathlib import Path

from graflo import GraphEngine, DBType
from graflo.db import Neo4jConfig, ArangoConfig, PostgresConfig
from graflo.db.graflo_backend.config import GraFloBackendConfig

neo4j = Neo4jConfig.from_env()       # or Neo4jConfig.from_docker_env()
arango = ArangoConfig.from_env()
postgres = PostgresConfig.from_env()
backend = GraFloBackendConfig(output_dir=Path("artifacts/neo4j-backend"))

engine = GraphEngine(target_db_flavor=DBType.ARANGO)

# --- Neo4j → file backend (export to disk) ---
engine.migrate_graph(neo4j, backend, recreate_schema=True)

# --- File backend → Arango migration ---
engine.migrate_graph(backend, arango, recreate_schema=True)

# --- File backend → Postgres (relational vertex + junction edge tables) ---
pg_engine = GraphEngine(target_db_flavor=DBType.POSTGRES)
pg_engine.migrate_graph(backend, postgres, recreate_schema=True)

Config loading. All *Config classes support from_env(), from_docker_env() (reads docker/<backend>/.env), or direct constructor arguments.

Pre-sanitize for a future target. Set target_flavor_hint on the file-backend config so schema.yaml is sanitized before it is written:

backend = GraFloBackendConfig(
    output_dir=Path("artifacts/for-arango"),
    target_flavor_hint=DBType.ARANGO,
)
engine.migrate_graph(neo4j, backend, recreate_schema=True)

See Example 13: GraFlo file backend for a runnable script including ingest() to disk.

Ingest resources into a file backend¶

GraFloBackendConfig works as an ingest() target the same way as ArangoDB or Neo4j — only the config changes:

from pathlib import Path

from suthing import FileHandle

from graflo import GraphEngine, GraphManifest, DBType
from graflo.db.graflo_backend.config import GraFloBackendConfig
from graflo.hq.caster import IngestionParams

manifest = GraphManifest.from_config(FileHandle.load("manifest.yaml"))
manifest.finish_init()

backend = GraFloBackendConfig(output_dir=Path("artifacts/csv-backend"))
engine = GraphEngine(target_db_flavor=DBType.GRAFLO_BACKEND)

engine.define_and_ingest(
    manifest=manifest,
    target_db_config=backend,
    ingestion_params=IngestionParams(clear_data=True),
    recreate_schema=True,
)

Inspect the result without loading everything into memory:

from graflo.architecture.backend import GraFloBackendReader

reader = GraFloBackendReader(Path("artifacts/csv-backend"))
index = reader.read_index()
print(index.vertices)  # record counts and chunk paths per vertex type

GraFloBackendConfig and Connection API¶

Type	Role
`GraFloBackendConfig`	`DBConfig` subclass: `output_dir`, `chunk_size`, optional `target_flavor_hint`
`GraFloBackendConnection`	`Connection` implementation registered in `ConnectionManager`
`GraFloBackendWriter` / `GraFloBackendReader`	Low-level I/O primitives in `graflo.architecture.backend`
`GraFloIndex`	Pydantic model for `INDEX.json`

Write path (target): init_db → upsert_docs_batch / insert_edges_batch → close() flushes INDEX.json.

Read path (source): introspect_graph_schema() reads schema.yaml; fetch_all_docs / fetch_all_edges stream from gzip JSONL chunks.

Use ConnectionManager.graph_export_flavors() to list backends with graph export support — includes DBType.GRAFLO_BACKEND alongside Neo4j and ArangoDB.

GraFloOutput (in-memory)¶

GraFloOutput pairs a full Schema with a GraphContainer. GraphEngine.export_graph() still returns it for small graphs or programmatic use:

from graflo.hq import GraphEngine
from graflo.db.connection import Neo4jConfig

output = engine.export_graph(Neo4jConfig(...))
assert output.core_schema is output.graph_schema.core_schema
assert output.data.vertices

For durable, large exports prefer migrate_graph(source, GraFloBackendConfig(...)) instead of holding the full graph in memory or a single YAML file.

GraphContainer edge keys in JSON¶

In Python, edge keys are tuples (source, target, relation). When serialized to JSON, each key becomes a compact JSON array string, for example ["person","department","works_in"].

Migrate graph → graph¶

GraphEngine.migrate_graph() exports from the source in one connection pass, sanitizes the schema for the target flavor (unless the target is a file backend without target_flavor_hint), defines DDL, and loads data:

engine.migrate_graph(
    Neo4jConfig(...),          # source
    ArangoConfig(...),         # target
    recreate_schema=True,
    clear_data=False,
    sample_limit=100,
)

The engine reuses a single source connection for introspection and export, and applies target Sanitizer rules once before writing.

Migrate graph → PostgreSQL¶

PostgreSQL targets map each vertex type to a table and each edge type to a junction table named {source}_{target}_{relation}_edges with source_id, target_id, optional weight columns, and a surrogate primary key for parallel edges.

Capability guard¶

ConnectionManager.open_graph_connection() rejects backends without graph export support:

from graflo.db.manager import ConnectionManager

ConnectionManager.graph_export_flavors()
# e.g. [DBType.NEO4J, DBType.ARANGO, DBType.GRAFLO_BACKEND]

TigerGraph, FalkorDB, Memgraph, and NebulaGraph remain supported targets for manifest-driven ingestion; native graph-source introspection/export is not yet implemented for those live databases (use a file backend as an intermediate store).

Core components — Schema, GraphContainer, GraphEngine
Features and practices — schema inference from PostgreSQL and RDF
Example 5 — PostgreSQL as a source (3NF inference)
Example 13 — file backend export, migration, and ingest walkthrough
PostgreSQL reference — connection and SQL-side introspection