Concepts
Here we introduce the main concepts of GraphCast, a framework for transforming data into property graphs.
System Overview
GraphCast transforms data sources into property graphs through a pipeline of components:
- Data Sources → Resources → Actors → Vertices/Edges → Graph Database
Each component plays a specific role in this transformation process.
Core Components
Schema
The Schema is the central configuration that defines how data sources are transformed into a property graph. It encapsulates:
- Vertex and edge definitions
- Resource mappings
- Data transformations
- Index configurations
Vertex
A Vertex describes vertices and their database indexes. It supports:
- Single or compound indexes (e.g., ["first_name", "last_name"] instead of "full_name")
- Property definitions
- Filtering conditions
- Optional blank vertex configuration
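To make the compound-index idea concrete, here is a minimal sketch in plain Python. It does not use GraphCast itself: the helper `index_key` and the sample records are hypothetical, and the only assumption is that a compound index identifies a vertex by the combination of several fields.

```python
from typing import Any, Mapping, Sequence


def index_key(doc: Mapping[str, Any], index_fields: Sequence[str]) -> tuple:
    """Build a lookup key from the fields that make up a (compound) index.

    Hypothetical helper for illustration; not part of GraphCast.
    """
    return tuple(doc.get(field) for field in index_fields)


people = [
    {"first_name": "Ada", "last_name": "Lovelace", "born": 1815},
    {"first_name": "Ada", "last_name": "Lovelace", "born": 1815},  # duplicate record
    {"first_name": "Alan", "last_name": "Turing", "born": 1912},
]

# Keep one vertex per distinct (first_name, last_name) combination.
unique_vertices = {index_key(p, ["first_name", "last_name"]): p for p in people}
```

Keying on the two name fields avoids having to build and maintain a derived "full_name" field just to identify a vertex.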
Edge
An Edge describes edges and their database indexes. It allows:
- Definition at any level of a hierarchical document
- Reliance on vertex principal index
- Weight configuration using source_fields, target_fields, and direct parameters
- Uniqueness constraints with respect to source, target, and weight fields
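The uniqueness constraint can be pictured as deduplication on a key built from source, target, and (optionally) weight fields. The sketch below is plain Python under that assumption; `dedupe_edges`, the `weight_fields` argument, and the edge dictionaries are hypothetical names chosen for illustration, not GraphCast's API.

```python
from typing import Iterable, List, Tuple


def dedupe_edges(edges: Iterable[dict], weight_fields: Tuple[str, ...] = ()) -> List[dict]:
    """Keep the first edge seen for each (source, target, weights) combination.

    Hypothetical helper for illustration; not part of GraphCast.
    """
    seen = set()
    unique = []
    for edge in edges:
        weights = tuple(edge.get(field) for field in weight_fields)
        key = (edge["source"], edge["target"], weights)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique


edges = [
    {"source": "a", "target": "b", "year": 2020},
    {"source": "a", "target": "b", "year": 2020},  # exact duplicate
    {"source": "a", "target": "b", "year": 2021},  # same endpoints, different weight
]

unique_by_weight = dedupe_edges(edges, weight_fields=("year",))
unique_by_endpoints = dedupe_edges(edges)
```

Including the weight in the key keeps parallel edges that differ only in weight. Leaving it out collapses them into one edge per source/target pair.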
Resource
A Resource is a set of mappings and transformations of a data source to vertices and edges, defined as a hierarchical structure of Actors. It supports:
- Table-like data (CSV, SQL)
- Tree-like data (JSON, XML)
- Complex nested structures
Actor
An Actor describes how the current level of the document should be mapped/transformed to the property graph vertices and edges. There are four types that act on the provided document in this order:
- DescendActor: Navigates to the next level in the hierarchy
- TransformActor: Applies data transformations
- VertexActor: Creates vertices from the current level
- EdgeActor: Creates edges between vertices
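The order in which the four actor roles act on a document can be sketched with plain functions. This is only an illustration of the flow (descend, transform, create vertices, create edges). The function names, the sample document, and the tuple shapes are all assumptions and do not reflect GraphCast's actual classes or signatures.

```python
# Illustrative only: plain functions standing in for the four actor roles.
# None of these names belong to GraphCast's API.

doc = {
    "publisher": "ACME Press",
    "books": [
        {"title": "graphs 101", "author": "Ada"},
        {"title": "trees 102", "author": "Alan"},
    ],
}


def descend(document, key):
    """Descend role: move one level down; wrap a single document in a list."""
    child = document[key]
    return child if isinstance(child, list) else [child]


def transform(item):
    """Transform role: return a copy with a title-cased title."""
    return {**item, "title": item["title"].title()}


def make_vertices(item):
    """Vertex role: emit (vertex type, properties) pairs for the current level."""
    return [("book", {"title": item["title"]}), ("author", {"name": item["author"]})]


def make_edges(item):
    """Edge role: connect the vertices produced at this level."""
    return [("author", item["author"], "book", item["title"])]


vertices, edges = [], []
for raw in descend(doc, "books"):
    item = transform(raw)
    vertices.extend(make_vertices(item))
    edges.extend(make_edges(item))
```

Wrapping a single child document in a list lets the same loop handle both single documents and lists of documents.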
Transform
A Transform defines data transforms, from renaming and type-casting to arbitrary transforms defined as Python functions. Transforms can be:
- Provided in the transforms section of Schema
- Referenced by their name
- Applied to both vertices and edges
- Referenced by their name
- Applied to both vertices and edges
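Defining a transform once and referring to it by name can be sketched as a simple registry. The registry `TRANSFORMS`, the helper `apply_transform`, and the transform names below are hypothetical. They show the general pattern (type-casting, renaming, arbitrary Python functions) rather than how GraphCast stores or resolves transforms.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: transform name -> Python function.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "to_int": int,                               # type-casting
    "strip_lower": lambda s: s.strip().lower(),  # arbitrary Python function
}


def apply_transform(name: str, doc: dict, field: str, rename_to: Optional[str] = None) -> dict:
    """Apply a named transform to one field, optionally renaming the field."""
    out = dict(doc)  # leave the input document untouched
    value = TRANSFORMS[name](out.pop(field))
    out[rename_to or field] = value
    return out


row = {"Year": "1999", "name": "  Ada "}
row = apply_transform("to_int", row, "Year", rename_to="year")
row = apply_transform("strip_lower", row, "name")
```

Because the transforms are looked up by name, the same definition can be reused wherever a field needs the same treatment.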
Key Features
Schema Features
- Flexible Indexing: Support for compound indexes on vertices and edges
- Hierarchical Edge Definition: Define edges at any level of nested documents
- Weighted Edges: Configure edge weights from document fields or vertex properties
- Blank Vertices: Create intermediate vertices for complex relationships
- Actor Pipeline: Process documents through a sequence of specialized actors
- Smart Navigation: Automatic handling of both single documents and lists
- Edge Constraints: Ensure edge uniqueness based on source, target, and weight
- Reusable Transforms: Define and reference transformations by name
- Vertex Filtering: Filter vertices based on custom conditions
Performance Optimization
- Batch Processing: Process large datasets in configurable batches (batch_size parameter of Caster)
- Parallel Execution: Utilize multiple cores for faster processing (n_cores parameter of Caster)
- Efficient Resource Handling: Optimized processing of both table-like and tree-like data
- Smart Caching: Minimize redundant operations
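Batch processing in general means consuming the input in fixed-size chunks rather than loading it all at once. The sketch below shows that idea with the standard library only. The `batched` helper is hypothetical and is not how Caster is implemented; only the parameter name `batch_size` is borrowed from the text above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Seven records in batches of three: the last batch is shorter.
batches = list(batched(range(7), batch_size=3))
```

Smaller batches keep memory use low, while larger batches reduce per-batch overhead such as database round trips, so the right value depends on record size and data volume.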
Best Practices
- Use compound indexes for frequently queried vertex properties
- Leverage blank vertices for complex relationship modeling
- Define transforms at the schema level for reusability
- Configure appropriate batch sizes based on your data volume
- Enable parallel processing for large datasets