graphcast.architecture.onto

Core ontology and data structures for graph database operations.

This module defines the fundamental data structures and types used throughout the graphcast package for working with graph databases. It provides:

  • Core data types for vertices and edges
  • Database index configurations
  • Graph container implementations
  • Edge mapping and casting utilities
  • Action context for graph transformations

The module is designed to be database-agnostic, supporting both ArangoDB and Neo4j through the DBFlavor enum. It provides a unified interface for working with graph data structures while allowing for database-specific optimizations and features.

Key Components
  • EdgeMapping: Defines how edges are mapped between vertices
  • IndexType: Supported database index types
  • EdgeType: Types of edge handling in the graph database
  • GraphContainer: Main container for graph data
  • ActionContext: Context for graph transformation operations
Example

container = GraphContainer(vertices={}, edges={}, linear=[])
index = Index(fields=["name", "age"], type=IndexType.PERSISTENT)
context = ActionContext()

ABCFields dataclass

Bases: BaseDataclass

Abstract base class for entities that have fields.

Attributes:

  • name (Optional[str]): Optional name of the entity
  • fields (list[str]): List of field names

Source code in graphcast/architecture/onto.py
@dataclasses.dataclass
class ABCFields(BaseDataclass, metaclass=ABCMeta):
    """Abstract base class for entities that have fields.

    Attributes:
        name: Optional name of the entity
        fields: List of field names
    """

    name: Optional[str] = None
    fields: list[str] = dataclasses.field(default_factory=list)

    def cfield(self, x: str) -> str:
        """Creates a composite field name by combining the entity name with a field name.

        Args:
            x: Field name to combine with entity name

        Returns:
            Composite field name in format "entity@field"
        """
        return f"{self.name}@{x}"

cfield(x)

Creates a composite field name by combining the entity name with a field name.

Parameters:

  • x (str, required): Field name to combine with entity name

Returns:

  • str: Composite field name in format "entity@field"

Source code in graphcast/architecture/onto.py
def cfield(self, x: str) -> str:
    """Creates a composite field name by combining the entity name with a field name.

    Args:
        x: Field name to combine with entity name

    Returns:
        Composite field name in format "entity@field"
    """
    return f"{self.name}@{x}"
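
The formatting rule can be tried out without installing graphcast. The sketch below uses a stand-in class (the name FieldsSketch is hypothetical, and the BaseDataclass parent is left out) with the same two attributes and the same cfield expression as the listing. It also shows what follows from name defaulting to None.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class FieldsSketch:
    """Stand-in for ABCFields (hypothetical name, no BaseDataclass parent)."""

    name: Optional[str] = None
    fields: list[str] = dataclasses.field(default_factory=list)

    def cfield(self, x: str) -> str:
        # Same expression as in the listing above.
        return f"{self.name}@{x}"


named = FieldsSketch(name="person", fields=["age"])
unnamed = FieldsSketch(fields=["age"])

named_result = named.cfield("age")      # "person@age"
unnamed_result = unnamed.cfield("age")  # "None@age", because name is None

print(named_result, unnamed_result)
```

Because the f-string formats None as the text "None", callers that rely on composite names probably want to set name explicitly.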

ActionContext dataclass

Bases: BaseDataclass

Context for graph transformation actions.

Attributes:

  • acc_v_local (defaultdict[str, defaultdict[Optional[str], list]]): Local accumulation of vertices
  • acc_vertex (defaultdict[str, defaultdict[Optional[str], list]]): Global accumulation of vertices
  • acc_global (defaultdict[GraphEntity, list]): Global accumulation of graph entities
  • buffer_vertex (defaultdict[GraphEntity, dict]): Buffer for vertex data
  • cdoc (dict): Current document being processed

Source code in graphcast/architecture/onto.py
@dataclasses.dataclass(kw_only=True)
class ActionContext(BaseDataclass):
    """Context for graph transformation actions.

    Attributes:
        acc_v_local: Local accumulation of vertices
        acc_vertex: Global accumulation of vertices
        acc_global: Global accumulation of graph entities
        buffer_vertex: Buffer for vertex data
        cdoc: Current document being processed
    """

    acc_v_local: defaultdict[str, defaultdict[Optional[str], list]] = dataclasses.field(
        default_factory=outer_factory
    )
    acc_vertex: defaultdict[str, defaultdict[Optional[str], list]] = dataclasses.field(
        default_factory=outer_factory
    )
    acc_global: defaultdict[GraphEntity, list] = dataclasses.field(
        default_factory=dd_factory
    )
    buffer_vertex: defaultdict[GraphEntity, dict] = dataclasses.field(
        default_factory=lambda: defaultdict(dict)
    )
    cdoc: dict = dataclasses.field(default_factory=dict)
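
The two vertex accumulators are nested default dictionaries, so both levels are created on first access. A minimal sketch of that behaviour follows. The factory functions repeat the ones documented for this module. The vertex name "person" and the inner key "author" are made-up values, and what the inner key stands for in graphcast is not stated here.

```python
from collections import defaultdict


def inner_factory_vertex() -> defaultdict:
    return defaultdict(list)


def outer_factory() -> defaultdict:
    return defaultdict(inner_factory_vertex)


acc_vertex = outer_factory()

# No key has to exist beforehand: both levels appear on first access.
acc_vertex["person"][None].append({"name": "Ada"})
acc_vertex["person"]["author"].append({"name": "Grace"})

# Convert to plain dicts for inspection.
snapshot = {k: dict(v) for k, v in acc_vertex.items()}
print(snapshot)

# Reading a missing key also creates it, which is easy to overlook.
had_company_before = "company" in acc_vertex
_ = acc_vertex["company"]
has_company_after = "company" in acc_vertex
```

The last three lines show the usual defaultdict caveat: a plain lookup inserts an empty entry, so membership checks should use `in` rather than indexing.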

EdgeCastingType

Bases: BaseEnum

Types of edge casting supported.

  • PAIR_LIKE: Edges are cast as pairs of vertices
  • PRODUCT_LIKE: Edges are cast as products of vertex sets

Source code in graphcast/architecture/onto.py
class EdgeCastingType(BaseEnum):
    """Types of edge casting supported.

    PAIR_LIKE: Edges are cast as pairs of vertices
    PRODUCT_LIKE: Edges are cast as products of vertex sets
    """

    PAIR_LIKE = "pair"
    PRODUCT_LIKE = "product"
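
One plausible reading of the two casting types, shown with the standard library only. This is an illustration of the pair and product ideas, not graphcast's casting code, which is not part of this page. The vertex labels are made up.

```python
from itertools import product

sources = ["a1", "a2"]
targets = ["b1", "b2"]

# Pair-like: the i-th source is matched with the i-th target.
pair_like = list(zip(sources, targets))

# Product-like: every source is matched with every target.
product_like = list(product(sources, targets))

print(pair_like)
print(product_like)
```

Under this reading, pair-like casting yields as many edges as there are aligned vertex pairs, while product-like casting yields len(sources) * len(targets) edges.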

EdgeMapping

Bases: BaseEnum

Defines how edges are mapped between vertices.

  • ALL: Maps all vertices to all vertices
  • ONE_N: Maps one vertex to many vertices

Source code in graphcast/architecture/onto.py
class EdgeMapping(BaseEnum):
    """Defines how edges are mapped between vertices.

    ALL: Maps all vertices to all vertices
    ONE_N: Maps one vertex to many vertices
    """

    ALL = "all"
    ONE_N = "1-n"

EdgeType

Bases: BaseEnum

Defines how edges are handled in the graph database.

  • INDIRECT: Defined as a collection with indexes, may be used after data ingestion
  • DIRECT: In addition to indexes, these edges are generated during ingestion

Source code in graphcast/architecture/onto.py
class EdgeType(BaseEnum):
    """Defines how edges are handled in the graph database.

    INDIRECT: Defined as a collection with indexes, may be used after data ingestion
    DIRECT: In addition to indexes, these edges are generated during ingestion
    """

    INDIRECT = "indirect"
    DIRECT = "direct"

EncodingType

Bases: BaseEnum

Supported character encodings for data input/output.

Source code in graphcast/architecture/onto.py
class EncodingType(BaseEnum):
    """Supported character encodings for data input/output."""

    ISO_8859 = "ISO-8859-1"
    UTF_8 = "utf-8"

GraphContainer dataclass

Bases: BaseDataclass

Container for graph data including vertices and edges.

Attributes:

  • vertices (dict[str, list]): Dictionary mapping vertex names to lists of vertex data
  • edges (dict[tuple[str, str, str | None], list]): Dictionary mapping edge IDs to lists of edge data
  • linear (list[defaultdict[str | tuple[str, str, str | None], list[Any]]]): List of default dictionaries containing linear data

Source code in graphcast/architecture/onto.py
@dataclasses.dataclass
class GraphContainer(BaseDataclass):
    """Container for graph data including vertices and edges.

    Attributes:
        vertices: Dictionary mapping vertex names to lists of vertex data
        edges: Dictionary mapping edge IDs to lists of edge data
        linear: List of default dictionaries containing linear data
    """

    vertices: dict[str, list]
    edges: dict[tuple[str, str, str | None], list]
    linear: list[defaultdict[str | tuple[str, str, str | None], list[Any]]]

    def __post_init__(self):
        pass

    def items(self):
        """Get an ItemsView of the container's contents."""
        return ItemsView(self)

    def pick_unique(self):
        """Remove duplicate entries from vertices and edges."""
        for k, v in self.vertices.items():
            self.vertices[k] = pick_unique_dict(v)
        for k, v in self.edges.items():
            self.edges[k] = pick_unique_dict(v)

    def loop_over_relations(self, edge_def: tuple[str, str, str | None]):
        """Iterate over edges matching the given edge definition.

        Args:
            edge_def: Tuple of (source, target, optional_purpose)

        Returns:
            Generator yielding matching edge IDs
        """
        source, target, _ = edge_def
        return (ed for ed in self.edges if source == ed[0] and target == ed[1])

    @classmethod
    def from_docs_list(
        cls, list_default_dicts: list[defaultdict[GraphEntity, list]]
    ) -> GraphContainer:
        """Create a GraphContainer from a list of default dictionaries.

        Args:
            list_default_dicts: List of default dictionaries containing vertex and edge data

        Returns:
            New GraphContainer instance

        Raises:
            AssertionError: If edge IDs are not properly formatted
        """
        vdict: defaultdict[str, list] = defaultdict(list)
        edict: defaultdict[tuple[str, str, str | None], list] = defaultdict(list)

        for d in list_default_dicts:
            for k, v in d.items():
                if isinstance(k, str):
                    vdict[k].extend(v)
                elif isinstance(k, tuple):
                    assert (
                        len(k) == 3
                        and all(isinstance(item, str) for item in k[:-1])
                        and isinstance(k[-1], (str, type(None)))
                    )
                    edict[k].extend(v)
        return GraphContainer(
            vertices=dict(vdict.items()),
            edges=dict(edict.items()),
            linear=list_default_dicts,
        )

from_docs_list(list_default_dicts) classmethod

Create a GraphContainer from a list of default dictionaries.

Parameters:

  • list_default_dicts (list[defaultdict[GraphEntity, list]], required): List of default dictionaries containing vertex and edge data

Returns:

  • GraphContainer: New GraphContainer instance

Raises:

  • AssertionError: If edge IDs are not properly formatted

Source code in graphcast/architecture/onto.py
@classmethod
def from_docs_list(
    cls, list_default_dicts: list[defaultdict[GraphEntity, list]]
) -> GraphContainer:
    """Create a GraphContainer from a list of default dictionaries.

    Args:
        list_default_dicts: List of default dictionaries containing vertex and edge data

    Returns:
        New GraphContainer instance

    Raises:
        AssertionError: If edge IDs are not properly formatted
    """
    vdict: defaultdict[str, list] = defaultdict(list)
    edict: defaultdict[tuple[str, str, str | None], list] = defaultdict(list)

    for d in list_default_dicts:
        for k, v in d.items():
            if isinstance(k, str):
                vdict[k].extend(v)
            elif isinstance(k, tuple):
                assert (
                    len(k) == 3
                    and all(isinstance(item, str) for item in k[:-1])
                    and isinstance(k[-1], (str, type(None)))
                )
                edict[k].extend(v)
    return GraphContainer(
        vertices=dict(vdict.items()),
        edges=dict(edict.items()),
        linear=list_default_dicts,
    )
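
The key dispatch in this method (string keys are vertices, 3-tuples are edges) can be exercised in isolation. The helper below is a hypothetical stand-in named split_docs that returns two plain dicts instead of a GraphContainer. It deliberately raises ValueError where the listing uses assert, since assert statements are skipped when Python runs with -O. The sample documents are made up.

```python
from collections import defaultdict


def split_docs(docs):
    """Hypothetical helper mirroring the key dispatch in from_docs_list."""
    vertices = defaultdict(list)
    edges = defaultdict(list)
    for d in docs:
        for key, items in d.items():
            if isinstance(key, str):
                vertices[key].extend(items)
            elif isinstance(key, tuple):
                well_formed = (
                    len(key) == 3
                    and all(isinstance(part, str) for part in key[:-1])
                    and isinstance(key[-1], (str, type(None)))
                )
                if not well_formed:
                    # The listing uses assert here; ValueError is a swap.
                    raise ValueError(f"Malformed edge key: {key!r}")
                edges[key].extend(items)
    return dict(vertices), dict(edges)


doc1 = defaultdict(list, {
    "person": [{"name": "Ada"}],
    ("person", "company", None): [{"since": 1840}],
})
doc2 = defaultdict(list, {
    "person": [{"name": "Grace"}],
    "company": [{"name": "Navy"}],
})

vertices, edges = split_docs([doc1, doc2])
print(vertices)
print(edges)
```

Entries with the same key are concatenated across documents, so "person" ends up with both records. Keys that are neither strings nor tuples are silently ignored, as in the listing.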

items()

Get an ItemsView of the container's contents.

Source code in graphcast/architecture/onto.py
def items(self):
    """Get an ItemsView of the container's contents."""
    return ItemsView(self)

loop_over_relations(edge_def)

Iterate over edges matching the given edge definition.

Parameters:

  • edge_def (tuple[str, str, str | None], required): Tuple of (source, target, optional_purpose)

Returns:

  • Generator yielding matching edge IDs

Source code in graphcast/architecture/onto.py
def loop_over_relations(self, edge_def: tuple[str, str, str | None]):
    """Iterate over edges matching the given edge definition.

    Args:
        edge_def: Tuple of (source, target, optional_purpose)

    Returns:
        Generator yielding matching edge IDs
    """
    source, target, _ = edge_def
    return (ed for ed in self.edges if source == ed[0] and target == ed[1])
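
Note that the third element of edge_def is unpacked into `_` and never compared, so matching depends on source and target only. A small sketch with a free function in place of the method and made-up edge keys:

```python
def loop_over_relations(edges, edge_def):
    # Same filter as the listing, taking the edges dict as an argument.
    source, target, _ = edge_def
    return (ed for ed in edges if source == ed[0] and target == ed[1])


edges = {
    ("person", "company", None): [],
    ("person", "company", "employer"): [],
    ("company", "person", None): [],
}

# The purpose "founder" is not present in any key, yet two edges match.
matches = list(loop_over_relations(edges, ("person", "company", "founder")))
print(matches)
```

Callers that need a specific purpose would have to filter the yielded edge IDs on their third element themselves.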

pick_unique()

Remove duplicate entries from vertices and edges.

Source code in graphcast/architecture/onto.py
def pick_unique(self):
    """Remove duplicate entries from vertices and edges."""
    for k, v in self.vertices.items():
        self.vertices[k] = pick_unique_dict(v)
    for k, v in self.edges.items():
        self.edges[k] = pick_unique_dict(v)
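
The helper pick_unique_dict is not shown on this page. As a rough idea of what deduplicating a list of dictionaries can look like, here is a sketch under stated assumptions: the function name pick_unique_dict_sketch is hypothetical, it keeps the first occurrence of each document, and it compares documents by their JSON form with sorted keys. The real helper may differ on all three points.

```python
import json


def pick_unique_dict_sketch(docs):
    """Hypothetical order-preserving deduplication of a list of dicts."""
    seen = set()
    unique = []
    for doc in docs:
        # Dicts are unhashable, so a canonical JSON string serves as the key.
        fingerprint = json.dumps(doc, sort_keys=True, default=str)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


docs = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 2}]
deduplicated = pick_unique_dict_sketch(docs)
print(deduplicated)
```

Sorting keys before serializing makes the comparison independent of key order, which is why the first two documents count as duplicates.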

Index dataclass

Bases: BaseDataclass

Configuration for database indexes.

Attributes:

  • name (str | None): Optional name of the index
  • fields (list[str]): List of fields to index
  • unique (bool): Whether the index enforces uniqueness
  • type (IndexType): Type of index to create
  • deduplicate (bool): Whether to deduplicate index entries
  • sparse (bool): Whether to create a sparse index
  • exclude_edge_endpoints (bool): Whether to exclude edge endpoints from index

Source code in graphcast/architecture/onto.py
@dataclasses.dataclass
class Index(BaseDataclass):
    """Configuration for database indexes.

    Attributes:
        name: Optional name of the index
        fields: List of fields to index
        unique: Whether the index enforces uniqueness
        type: Type of index to create
        deduplicate: Whether to deduplicate index entries
        sparse: Whether to create a sparse index
        exclude_edge_endpoints: Whether to exclude edge endpoints from index
    """

    name: str | None = None
    fields: list[str] = dataclasses.field(default_factory=list)
    unique: bool = True
    type: IndexType = IndexType.PERSISTENT
    deduplicate: bool = True
    sparse: bool = False
    exclude_edge_endpoints: bool = False

    def __iter__(self):
        """Iterate over the indexed fields."""
        return iter(self.fields)

    def db_form(self, db_type: DBFlavor) -> dict:
        """Convert index configuration to database-specific format.

        Args:
            db_type: Type of database (ARANGO or NEO4J)

        Returns:
            Dictionary of index configuration in database-specific format

        Raises:
            ValueError: If db_type is not supported
        """
        r = self.to_dict()
        if db_type == DBFlavor.ARANGO:
            _ = r.pop("name")
            _ = r.pop("exclude_edge_endpoints")
        elif db_type == DBFlavor.NEO4J:
            pass
        else:
            raise ValueError(f"Unknown db_type {db_type}")

        return r

__iter__()

Iterate over the indexed fields.

Source code in graphcast/architecture/onto.py
def __iter__(self):
    """Iterate over the indexed fields."""
    return iter(self.fields)

db_form(db_type)

Convert index configuration to database-specific format.

Parameters:

  • db_type (DBFlavor, required): Type of database (ARANGO or NEO4J)

Returns:

  • dict: Dictionary of index configuration in database-specific format

Raises:

  • ValueError: If db_type is not supported

Source code in graphcast/architecture/onto.py
def db_form(self, db_type: DBFlavor) -> dict:
    """Convert index configuration to database-specific format.

    Args:
        db_type: Type of database (ARANGO or NEO4J)

    Returns:
        Dictionary of index configuration in database-specific format

    Raises:
        ValueError: If db_type is not supported
    """
    r = self.to_dict()
    if db_type == DBFlavor.ARANGO:
        _ = r.pop("name")
        _ = r.pop("exclude_edge_endpoints")
    elif db_type == DBFlavor.NEO4J:
        pass
    else:
        raise ValueError(f"Unknown db_type {db_type}")

    return r
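
To see which keys each flavor receives, the sketch below rebuilds the method on stand-in types. Several parts are assumptions: the class name IndexSketch, the DBFlavor member values, the single-member IndexType, and the use of dataclasses.asdict in place of the to_dict() method inherited from BaseDataclass, whose exact output is not shown on this page.

```python
import dataclasses
from enum import Enum
from typing import Optional


class DBFlavor(Enum):
    ARANGO = "arango"  # member values are assumptions
    NEO4J = "neo4j"


class IndexType(Enum):
    PERSISTENT = "persistent"


@dataclasses.dataclass
class IndexSketch:
    name: Optional[str] = None
    fields: list[str] = dataclasses.field(default_factory=list)
    unique: bool = True
    type: IndexType = IndexType.PERSISTENT
    deduplicate: bool = True
    sparse: bool = False
    exclude_edge_endpoints: bool = False

    def db_form(self, db_type) -> dict:
        r = dataclasses.asdict(self)  # stand-in for self.to_dict()
        if db_type == DBFlavor.ARANGO:
            # The ArangoDB form drops these two keys.
            r.pop("name")
            r.pop("exclude_edge_endpoints")
        elif db_type == DBFlavor.NEO4J:
            pass  # The Neo4j form keeps every key.
        else:
            raise ValueError(f"Unknown db_type {db_type}")
        return r


index = IndexSketch(name="by_name", fields=["name"])
arango_form = index.db_form(DBFlavor.ARANGO)
neo4j_form = index.db_form(DBFlavor.NEO4J)
print(sorted(arango_form))
print(sorted(neo4j_form))
```

In this sketch the ArangoDB form contains five keys and the Neo4j form all seven. Any value outside the two known flavors raises ValueError.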

IndexType

Bases: BaseEnum

Types of database indexes supported.

  • PERSISTENT: Standard persistent index
  • HASH: Hash-based index for fast lookups
  • SKIPLIST: Sorted index using skip list data structure
  • FULLTEXT: Index optimized for text search

Source code in graphcast/architecture/onto.py
class IndexType(BaseEnum):
    """Types of database indexes supported.

    PERSISTENT: Standard persistent index
    HASH: Hash-based index for fast lookups
    SKIPLIST: Sorted index using skip list data structure
    FULLTEXT: Index optimized for text search
    """

    PERSISTENT = "persistent"
    HASH = "hash"
    SKIPLIST = "skiplist"
    FULLTEXT = "fulltext"

ItemsView

View class for iterating over vertices and edges in a GraphContainer.

Source code in graphcast/architecture/onto.py
class ItemsView:
    """View class for iterating over vertices and edges in a GraphContainer."""

    def __init__(self, gc: GraphContainer):
        self._dictlike = gc

    def __iter__(self):
        """Iterate over vertices and edges in the container."""
        for key in self._dictlike.vertices:
            yield key, self._dictlike.vertices[key]
        for key in self._dictlike.edges:
            yield key, self._dictlike.edges[key]

__iter__()

Iterate over vertices and edges in the container.

Source code in graphcast/architecture/onto.py
def __iter__(self):
    """Iterate over vertices and edges in the container."""
    for key in self._dictlike.vertices:
        yield key, self._dictlike.vertices[key]
    for key in self._dictlike.edges:
        yield key, self._dictlike.edges[key]
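
The iteration order is all vertex entries first, then all edge entries. A minimal sketch, using a hypothetical ContainerSketch class that only carries the two dictionaries the view reads, with made-up data:

```python
class ContainerSketch:
    """Hypothetical stand-in exposing only vertices and edges."""

    def __init__(self, vertices, edges):
        self.vertices = vertices
        self.edges = edges


class ItemsView:
    def __init__(self, gc):
        self._dictlike = gc

    def __iter__(self):
        # Same loop as the listing: vertices first, then edges.
        for key in self._dictlike.vertices:
            yield key, self._dictlike.vertices[key]
        for key in self._dictlike.edges:
            yield key, self._dictlike.edges[key]


gc = ContainerSketch(
    vertices={"person": [{"name": "Ada"}]},
    edges={("person", "company", None): [{"since": 1840}]},
)

keys = [key for key, _ in ItemsView(gc)]
print(keys)
```

String keys identify vertex collections and tuple keys identify edges, so a consumer can tell the two apart with isinstance checks on the key. Unlike dict.items(), the view defines only __iter__, so len() is not available on it.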

Weight dataclass

Bases: ABCFields

Defines weight configuration for edges.

Attributes:

  • discriminant (Optional[str]): Optional field used to discriminate between weights
  • map (dict): Dictionary mapping field values to weights
  • filter (dict): Dictionary of filter conditions for weights

Source code in graphcast/architecture/onto.py
@dataclasses.dataclass
class Weight(ABCFields):
    """Defines weight configuration for edges.

    Attributes:
        discriminant: Optional field used to discriminate between weights
        map: Dictionary mapping field values to weights
        filter: Dictionary of filter conditions for weights
    """

    discriminant: Optional[str] = None
    map: dict = dataclasses.field(default_factory=dict)
    filter: dict = dataclasses.field(default_factory=dict)

dd_factory()

Create a default dictionary for graph entity data.

Source code in graphcast/architecture/onto.py
def dd_factory() -> defaultdict[GraphEntity, list]:
    """Create a default dictionary for graph entity data."""
    return defaultdict(list)

inner_factory_vertex()

Create a default dictionary for vertex data.

Source code in graphcast/architecture/onto.py
def inner_factory_vertex() -> defaultdict[Optional[str], list]:
    """Create a default dictionary for vertex data."""
    return defaultdict(list)

outer_factory()

Create a nested default dictionary for vertex data.

Source code in graphcast/architecture/onto.py
def outer_factory() -> defaultdict[str, defaultdict[Optional[str], list]]:
    """Create a nested default dictionary for vertex data."""
    return defaultdict(inner_factory_vertex)