Creating a Manifest¶
This guide explains how to create a GraFlo GraphManifest, the canonical config artifact used for ingestion and orchestration.
A full manifest combines three concerns in one file:
- schema: logical graph model (metadata, vertices, edges, DB profile)
- ingestion_model: resources and transforms
- bindings: mapping resources to physical data sources
GraphManifest also supports partial payloads (for example, schema-only or
ingestion-only files). At least one block is required.
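For instance, a schema-only partial manifest is valid on its own (field names follow the full example below); the missing blocks can be supplied later or at runtime:

```yaml
# Schema-only partial manifest: no ingestion_model or bindings blocks.
schema:
  metadata:
    name: my_graph
  graph:
    vertex_config:
      vertices:
        - name: person
          properties: [id, name]
          identity: [id]
    edge_config:
      edges: []
  db_profile: {}
```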
Why manifest-first¶
GraphManifest is the top-level contract passed through the runtime (GraphEngine, CLI ingest, plotting). Keeping all needed blocks in one document makes validation and execution deterministic.
Manifest structure¶
A typical manifest file is named manifest.yaml and has this shape:
schema:
  metadata:
    name: my_graph
    version: "1.0.0"
  graph:
    vertex_config:
      vertices:
        - name: person
          properties: [id, name, age]
          identity: [id]
        - name: department
          properties: [name]
          identity: [name]
    edge_config:
      edges:
        - source: person
          target: department
  db_profile: {}
ingestion_model:
  resources:
    - name: people
      apply:
        - vertex: person
    - name: departments
      apply:
        - vertex: person
          "from": {id: person_id, name: person}
        - vertex: department
          "from": {name: department}
  transforms: []
bindings: {}
Block-by-block reference¶
schema¶
Defines the graph contract.
- metadata: human-facing identity (name, optional version)
- graph.vertex_config: vertex types, properties, identity keys
- graph.edge_config: source/target relationships, optional relation, edge properties, identities
- db_profile: DB-specific physical behavior (indexes, naming, default_property_values for TigerGraph GSQL DEFAULT on vertex/edge attributes, backend details)
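As an illustrative sketch of the optional edge fields (relation and edge properties are assumptions beyond the minimal example above; check your schema reference for exact semantics):

```yaml
edge_config:
  edges:
    - source: person
      target: department
      relation: works_in      # optional explicit relation name
      properties: [since]     # optional edge properties
```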
Use schema for what graph exists.
ingestion_model¶
Defines ingestion behavior.
- resources: named pipelines (name) with ordered actor steps
- transforms: reusable named transforms as a list (each entry must define name), referenced from resources via transform.call.use
- Optional per-resource flags include drop_trivial_input_fields (default false): when true, top-level keys whose value is null or "" are removed before the actor pipeline runs. Only the top-level dict is filtered (nested structures are not recursed); numeric zero and boolean false are kept. Useful for sparse wide tables (CSV/SQL) without custom transforms.
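The filtering semantics of drop_trivial_input_fields can be sketched in a few lines of plain Python (this mirrors the documented behavior, it is not GraFlo's implementation):

```python
def drop_trivial_input_fields(record: dict) -> dict:
    """Sketch of the documented semantics: drop top-level keys whose
    value is None or "". Nested structures are not recursed; numeric
    zero and boolean False are kept."""
    return {k: v for k, v in record.items() if v is not None and v != ""}


row = {"id": 1, "name": "", "age": 0, "active": False,
       "dept": None, "tags": {"a": None}}
cleaned = drop_trivial_input_fields(row)
# "name" and "dept" are removed; 0, False, and the nested None survive
```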
TigerGraph attribute defaults (schema / db_profile, not ingestion): under schema.db_profile, optional default_property_values declares GSQL DEFAULT literals per logical vertex property and per logical edge type, for example:
db_profile:
  db_flavor: tigergraph
  default_property_values:
    vertices:
      Sensor:
        reading: -1.0
    edges:
      - source: Person
        target: Company
        relation: works_at
        values:
          since_year: 0
This corresponds to overriding TigerGraph’s built-in defaults (e.g. reading FLOAT DEFAULT -1.0); see the TigerGraph “Defining a Graph Schema” documentation.
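Illustratively, such defaults would correspond to GSQL attribute declarations along these lines (a sketch of the target DDL, not the exact statements GraFlo emits):

```sql
CREATE VERTEX Sensor (PRIMARY_ID id STRING, reading FLOAT DEFAULT -1.0)
CREATE DIRECTED EDGE works_at (FROM Person, TO Company, since_year INT DEFAULT 0)
```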
Use ingestion_model for how source records become vertices/edges.
bindings¶
Defines source wiring (Bindings).
- connectors: list of FileConnector, TableConnector, or SparqlConnector entries (where each row points at paths, tables, or RDF/SPARQL sources).
- resource_connector: list of {"resource": "<ingestion resource name>", "connector": "<connector name or hash>"} rows linking IngestionModel.resources[*].name to a connector. The same resource may appear on multiple rows with different connector values (several physical sources for one pipeline).
- connector_connection (optional): list of {"connector": "<connector name or hash>", "conn_proxy": "<label>"} rows. This keeps manifests non-secret: only proxy names appear in YAML; runtime code registers each conn_proxy on a ConnectionProvider with the real GeneralizedConnConfig (PostgreSQL, SPARQL, etc.).
Connector references in resource_connector / connector_connection must match a connector’s declared name or canonical hash. Ingestion resource names are not connector references (they can map 1→n). Duplicate connector name values and conflicting conn_proxy mappings for the same connector hash are rejected at validation time.
The block can be left empty in-file (bindings: {}) and supplied at runtime for env-specific deployments.
Use bindings for where data comes from (and optionally which proxy label supplies runtime credentials for each SQL/SPARQL connector).
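A filled-in bindings block following the structure above might look like this (the per-connector fields such as path and table are illustrative assumptions; consult the connector reference for the exact keys):

```yaml
bindings:
  connectors:
    - name: people_csv       # FileConnector entry (fields illustrative)
      path: data/people.csv
    - name: hr_db            # TableConnector entry (fields illustrative)
      table: departments
  resource_connector:
    - {resource: people, connector: people_csv}
    - {resource: departments, connector: hr_db}
  connector_connection:
    - {connector: hr_db, conn_proxy: postgres_source}
```

Note that only the proxy label postgres_source appears in the file; the real credentials are bound at runtime.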
Runtime proxy wiring (example)¶
The manifest contains proxy labels only. At runtime you register the real connection config and bind manifest connectors to those proxy labels:
from graflo.hq.connection_provider import (
    InMemoryConnectionProvider,
    PostgresGeneralizedConnConfig,
)

provider = InMemoryConnectionProvider()
provider.bind_single_config_for_bindings(
    bindings=bindings,
    conn_proxy="postgres_source",
    config=PostgresGeneralizedConnConfig(config=postgres_conf),
)

engine.define_and_ingest(
    manifest=manifest,
    target_db_config=target_db_config,
    connection_provider=provider,
)
Authoring tips¶
- Keep resource names unique across ingestion_model.resources.
- Ensure every vertex/source/target referenced by resources exists in schema.core_schema.
- Quote "from" in YAML because from is a reserved keyword.
- Prefer explicit relation names for multi-edge models.
- Keep ingestion_model.transforms ordered intentionally; transforms are applied in declaration/appearance order within pipelines.
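The first two tips can be checked mechanically before loading. A minimal standalone sketch (not GraFlo's validator) over a manifest parsed into a plain dict, using the layout of the example manifest above:

```python
def lint_manifest(manifest: dict) -> list[str]:
    """Sketch of two authoring-tip checks: unique resource names and
    vertex references that exist in the schema. Not graflo's validation."""
    errors = []

    resources = manifest.get("ingestion_model", {}).get("resources", [])
    names = [r["name"] for r in resources]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            errors.append(f"duplicate resource name: {name}")

    graph = manifest.get("schema", {}).get("graph", {})
    vertices = {v["name"]
                for v in graph.get("vertex_config", {}).get("vertices", [])}
    for r in resources:
        for step in r.get("apply", []):
            v = step.get("vertex")
            if v is not None and v not in vertices:
                errors.append(
                    f"resource {r['name']} references unknown vertex: {v}")
    return errors
```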
Load and validate¶
from suthing import FileHandle
from graflo import GraphManifest
manifest = GraphManifest.from_config(FileHandle.load("manifest.yaml"))
manifest.finish_init()
schema = manifest.require_schema()
ingestion_model = manifest.require_ingestion_model()
finish_init() performs runtime wiring and consistency checks across schema and ingestion model.
Evolving a manifest¶
To apply structured changes to an existing manifest (remove vertex types, merge types into one name, and update resources and db_profile in sync), use graflo.architecture.evolution. That layer operates on the manifest contract only; it does not migrate data already stored in a graph database, so plan to reingest after deploying the new manifest. See Manifest evolution for operations, manifest_hash, and examples.
Minimal run path¶
from graflo.hq import GraphEngine
from graflo.hq.caster import IngestionParams

engine = GraphEngine()
engine.define_and_ingest(
    manifest=manifest,
    target_db_config=conn_conf,
    ingestion_params=IngestionParams(clear_data=False),
    recreate_schema=False,
)