Creating a Manifest¶
This guide explains how to create a GraFlo GraphManifest, the canonical config artifact used for ingestion and orchestration.
A full manifest combines three concerns in one file:
- schema: logical graph model (metadata, vertices, edges, DB profile)
- ingestion_model: resources and transforms
- bindings: mapping resources to physical data sources
GraphManifest also supports partial payloads (for example, schema-only or
ingestion-only files). At least one block is required.
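For instance, a schema-only partial manifest is valid on its own (field names follow the full example below); the missing blocks can be supplied later or at runtime:

```yaml
# Schema-only partial manifest: no ingestion_model or bindings blocks.
schema:
  metadata:
    name: my_graph
  graph:
    vertex_config:
      vertices:
        - name: person
          properties: [id, name]
          identity: [id]
    edge_config:
      edges: []
  db_profile: {}
```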
Why manifest-first¶
GraphManifest is the top-level contract passed through the runtime (GraphEngine, CLI ingest, plotting). Keeping all needed blocks in one document makes validation and execution deterministic.
Manifest structure¶
A typical manifest file is named manifest.yaml and has this shape:
schema:
  metadata:
    name: my_graph
    version: "1.0.0"
  graph:
    vertex_config:
      vertices:
        - name: person
          properties: [id, name, age]
          identity: [id]
        - name: department
          properties: [name]
          identity: [name]
    edge_config:
      edges:
        - source: person
          target: department
  db_profile: {}
ingestion_model:
  resources:
    - name: people
      apply:
        - vertex: person
    - name: departments
      apply:
        - vertex: person
          "from": {id: person_id, name: person}
        - vertex: department
          "from": {name: department}
  transforms: []
bindings: {}
Block-by-block reference¶
schema¶
Defines the graph contract.
- metadata: human-facing identity (name, optional version)
- graph.vertex_config: vertex types, properties, identity keys
- graph.edge_config: source/target relationships, optional relation, edge properties, identities
- db_profile: DB-specific physical behavior (indexes, naming, default_property_values for TigerGraph GSQL DEFAULT on vertex/edge attributes, backend details)
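As an illustrative sketch of the optional edge fields (relation and edge properties are assumptions beyond the minimal example above; check your schema reference for exact semantics):

```yaml
edge_config:
  edges:
    - source: person
      target: department
      relation: works_in      # optional explicit relation name
      properties: [since]     # optional edge properties
```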
Use schema for what graph exists.
ingestion_model¶
Defines ingestion behavior.
- resources: named pipelines (name) with ordered actor steps
- transforms: reusable named transforms as a list (each entry must define name), referenced from resources via transform.call.use
- Optional per-resource flags include drop_trivial_input_fields (default false): when true, top-level keys whose value is null or "" are removed before the actor pipeline runs. Only the top-level dict is filtered (nested structures are not recursed); numeric zero and boolean false are kept. Useful for sparse wide tables (CSV/SQL) without custom transforms.
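The filtering semantics of drop_trivial_input_fields can be sketched in a few lines of plain Python (this mirrors the documented behavior, it is not GraFlo's implementation):

```python
def drop_trivial_input_fields(record: dict) -> dict:
    """Sketch of the documented semantics: drop top-level keys whose
    value is None or "". Nested structures are not recursed; numeric
    zero and boolean False are kept."""
    return {k: v for k, v in record.items() if v is not None and v != ""}


row = {"id": 1, "name": "", "age": 0, "active": False,
       "dept": None, "tags": {"a": None}}
cleaned = drop_trivial_input_fields(row)
# "name" and "dept" are removed; 0, False, and the nested None survive
```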
TigerGraph attribute defaults (schema / db_profile, not ingestion): under schema.db_profile, optional default_property_values declares GSQL DEFAULT literals per logical vertex property and per logical edge type, for example:
db_profile:
  db_flavor: tigergraph
  default_property_values:
    vertices:
      Sensor:
        reading: -1.0
    edges:
      - source: Person
        target: Company
        relation: works_at
        values:
          since_year: 0
This corresponds to overriding TigerGraph’s built-in defaults (e.g. reading FLOAT DEFAULT -1.0); see the TigerGraph “Defining a Graph Schema” documentation.
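Illustratively, such defaults would correspond to GSQL attribute declarations along these lines (a sketch of the target DDL, not the exact statements GraFlo emits):

```sql
CREATE VERTEX Sensor (PRIMARY_ID id STRING, reading FLOAT DEFAULT -1.0)
CREATE DIRECTED EDGE works_at (FROM Person, TO Company, since_year INT DEFAULT 0)
```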
Use ingestion_model for how source records become vertices/edges.
bindings¶
Defines source wiring (Bindings).
- connectors: list of FileConnector, TableConnector, or SparqlConnector entries (where each row points at paths, tables, or RDF/SPARQL sources).
- resource_connector: list of {"resource": "<ingestion resource name>", "connector": "<connector name or hash>"} rows linking IngestionModel.resources[*].name to a connector. The same resource may appear on multiple rows with different connector values (several physical sources for one pipeline).
- connector_connection (optional): list of {"connector": "<connector name or hash>", "conn_proxy": "<label>"} rows. This keeps manifests non-secret: only proxy names appear in YAML; runtime code registers each conn_proxy on a ConnectionProvider with the real GeneralizedConnConfig (PostgreSQL, SPARQL, etc.).
Connector references in resource_connector / connector_connection must match a connector’s declared name or canonical hash. Ingestion resource names are not connector references (they can map 1→n). Duplicate connector name values and conflicting conn_proxy mappings for the same connector hash are rejected at validation time.
The block can be left empty in-file (bindings: {}) and supplied at runtime for env-specific deployments.
Use bindings for where data comes from (and optionally which proxy label supplies runtime credentials for each SQL/SPARQL connector).
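A filled-in bindings block following the structure above might look like this (the per-connector fields such as path and table are illustrative assumptions; consult the connector reference for the exact keys):

```yaml
bindings:
  connectors:
    - name: people_csv       # FileConnector entry (fields illustrative)
      path: data/people.csv
    - name: hr_db            # TableConnector entry (fields illustrative)
      table: departments
  resource_connector:
    - {resource: people, connector: people_csv}
    - {resource: departments, connector: hr_db}
  connector_connection:
    - {connector: hr_db, conn_proxy: postgres_source}
```

Note that only the proxy label postgres_source appears in the file; the real credentials are bound at runtime.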
Runtime proxy wiring (example)¶
The manifest contains proxy labels only. At runtime you register the real connection config and bind manifest connectors to those proxy labels:
from graflo.hq.connection_provider import (
    InMemoryConnectionProvider,
    PostgresGeneralizedConnConfig,
)

provider = InMemoryConnectionProvider()
provider.bind_single_config_for_bindings(
    bindings=bindings,
    conn_proxy="postgres_source",
    config=PostgresGeneralizedConnConfig(config=postgres_conf),
)

engine.define_and_ingest(
    manifest=manifest,
    target_db_config=target_db_config,
    connection_provider=provider,
)
Authoring tips¶
- Keep resource names unique across ingestion_model.resources.
- Ensure every vertex/source/target referenced by resources exists in schema.core_schema.
- Quote "from" in YAML because from is a reserved keyword.
- Prefer explicit relation names for multi-edge models.
- Keep ingestion_model.transforms ordered intentionally; transforms are applied in declaration/appearance order within pipelines.
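The first two tips can be checked mechanically before loading. A minimal standalone sketch (not GraFlo's validator) over a manifest parsed into a plain dict, using the layout of the example manifest above:

```python
def lint_manifest(manifest: dict) -> list[str]:
    """Sketch of two authoring-tip checks: unique resource names and
    vertex references that exist in the schema. Not graflo's validation."""
    errors = []

    resources = manifest.get("ingestion_model", {}).get("resources", [])
    names = [r["name"] for r in resources]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            errors.append(f"duplicate resource name: {name}")

    graph = manifest.get("schema", {}).get("graph", {})
    vertices = {v["name"]
                for v in graph.get("vertex_config", {}).get("vertices", [])}
    for r in resources:
        for step in r.get("apply", []):
            v = step.get("vertex")
            if v is not None and v not in vertices:
                errors.append(
                    f"resource {r['name']} references unknown vertex: {v}")
    return errors
```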
Load and validate¶
from suthing import FileHandle
from graflo import GraphManifest
manifest = GraphManifest.from_config(FileHandle.load("manifest.yaml"))
manifest.finish_init()
schema = manifest.require_schema()
ingestion_model = manifest.require_ingestion_model()
finish_init() performs runtime wiring and consistency checks across schema and ingestion model.
Evolving a manifest¶
To apply structured changes to an existing manifest (remove vertex types, merge types into one name, and update resources and db_profile in sync), use graflo.architecture.evolution. That layer operates on the manifest contract only; it does not migrate data already stored in a graph database, so plan to reingest after deploying the new manifest. See Manifest evolution for operations, manifest_hash, and examples.
Minimal run path¶
from graflo.hq import GraphEngine
from graflo.hq.caster import IngestionParams

engine = GraphEngine()
engine.define_and_ingest(
    manifest=manifest,
    target_db_config=conn_conf,
    ingestion_params=IngestionParams(clear_data=False),
    recreate_schema=False,
)