graflo.db.postgres.inference_utils
Inference utilities for PostgreSQL schema analysis.
This module provides utility functions for inferring relationships and patterns from PostgreSQL table and column names using heuristics and fuzzy matching.
detect_separator(text)
Detect the most common separator character in a text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| `str` | Most common separator character, defaults to `'_'` |
Source code in graflo/db/postgres/inference_utils.py
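The description above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `detect_separator_sketch` and the candidate set `"_-."` are assumptions made for illustration, and the real function may consider other characters or break ties differently.

```python
from collections import Counter

# Assumption: the set of characters treated as separators is hypothetical.
CANDIDATE_SEPARATORS = "_-."


def detect_separator_sketch(text: str, candidates: str = CANDIDATE_SEPARATORS) -> str:
    """Return the most frequent candidate separator in text, or '_' if none occurs."""
    counts = Counter(ch for ch in text if ch in candidates)
    if not counts:
        # Mirrors the documented default of '_'.
        return "_"
    return counts.most_common(1)[0][0]


print(detect_separator_sketch("rel_cluster_containment_host"))
print(detect_separator_sketch("user-follows-user"))
```

Counting only a fixed candidate set keeps letters and digits from ever being reported as the separator.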
fuzzy_match_fragment(fragment, vertex_names, threshold=0.6)
Fuzzy match a fragment to vertex names.
Backward-compatible wrapper function that uses the improved FuzzyMatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `fragment` | `str` | Fragment to match | required |
| `vertex_names` | `list[str]` | List of vertex table names to match against | required |
| `threshold` | `float` | Similarity threshold (0.0 to 1.0) | `0.6` |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Best matching vertex name, or `None` if no match is above the threshold |
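Threshold-based matching of this kind can be sketched with the standard library. The example below uses `difflib.SequenceMatcher` as a stand-in similarity measure; the scoring used by `FuzzyMatcher` is not shown in this page and may differ, and the name `fuzzy_match_sketch` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def fuzzy_match_sketch(
    fragment: str, vertex_names: list[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the vertex name most similar to fragment, or None below threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name in vertex_names:
        # Assumption: a case-insensitive SequenceMatcher ratio stands in
        # for whatever similarity score the real matcher computes.
        score = SequenceMatcher(None, fragment.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


print(fuzzy_match_sketch("user", ["users", "products"]))
print(fuzzy_match_sketch("xyz", ["users", "products"]))
```

In this sketch a singular fragment such as `user` still resolves to a plural table name such as `users`, while an unrelated fragment yields `None`.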
infer_edge_vertices_from_table_name(table_name, pk_columns, fk_columns, vertex_table_names=None, match_cache=None)
Infer source and target vertex names from table name and structure.
Uses fuzzy matching to identify vertex names in table name fragments and key names. Handles patterns like:

- `rel_cluster_containment_host` -> cluster, host, containment
- `rel_cluster_containment_cluster_2` -> cluster, cluster, containment (self-reference)
- `user_follows_user` -> user, user, follows (self-reference)
- `product_category_mapping` -> product, category, mapping
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_name` | `str` | Name of the table | required |
| `pk_columns` | `list[str]` | List of primary key column names | required |
| `fk_columns` | `list[dict[str, Any]]` | List of foreign key dictionaries with `'column'` and `'references_table'` keys | required |
| `vertex_table_names` | `list[str] \| None` | Optional list of known vertex table names for fuzzy matching | `None` |
| `match_cache` | `FuzzyMatchCache \| None` | Optional pre-computed fuzzy match cache for better performance | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[str \| None, str \| None, str \| None]` | Tuple of `(source_table, target_table, relation_name)`, or `(None, None, None)` if the vertices cannot be inferred |
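The table-name patterns listed above can be reproduced with a simplified sketch. It looks only at table name fragments and ignores `pk_columns`, `fk_columns`, and `match_cache`, so it is not the library's algorithm. The names `infer_edge_sketch`, `best_match`, and `IGNORED_PREFIXES`, the `rel` prefix rule, the stricter 0.8 threshold, and the use of `difflib.SequenceMatcher` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional

# Assumption: fragments such as "rel" are treated as noise, not as relation names.
IGNORED_PREFIXES = {"rel"}


def best_match(fragment: str, vertex_names: list[str], threshold: float) -> Optional[str]:
    """Return the most similar vertex name, or None if below threshold."""
    scored = [
        (SequenceMatcher(None, fragment, name.lower()).ratio(), name)
        for name in vertex_names
    ]
    if not scored:
        return None
    score, name = max(scored, key=lambda pair: pair[0])
    return name if score >= threshold else None


def infer_edge_sketch(
    table_name: str, vertex_names: list[str], threshold: float = 0.8
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Guess (source, target, relation) from the fragments of a table name."""
    # Drop empty fragments and numeric suffixes such as the "2" in "..._cluster_2".
    fragments = [f for f in table_name.lower().split("_") if f and not f.isdigit()]
    candidates = [(i, best_match(f, vertex_names, threshold)) for i, f in enumerate(fragments)]
    matches = [(i, name) for i, name in candidates if name is not None]
    if len(matches) < 2:
        return (None, None, None)
    # First matched fragment is taken as the source, last one as the target.
    (source_pos, source), (target_pos, target) = matches[0], matches[-1]
    relation_parts = [
        f
        for i, f in enumerate(fragments)
        if i not in (source_pos, target_pos) and f not in IGNORED_PREFIXES
    ]
    relation = "_".join(relation_parts) or None
    return (source, target, relation)


print(infer_edge_sketch("rel_cluster_containment_host", ["cluster", "host"]))
print(infer_edge_sketch("user_follows_user", ["user"]))
```

Taking the first and last matched fragments as source and target is one simple way to cover self-references, where the same vertex name appears twice in the table name.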
infer_vertex_from_column_name(column_name, vertex_table_names=None, match_cache=None)
Infer vertex table name from a column name using robust pattern matching.
Uses the same logic as `infer_edge_vertices_from_table_name` but focused on extracting vertex names from column names. Handles patterns like:

- `user_id` -> user
- `product_id` -> product
- `customer_fk` -> customer
- `source_vertex` -> source_vertex (if matches)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column_name` | `str` | Name of the column | required |
| `vertex_table_names` | `list[str] \| None` | Optional list of known vertex table names for fuzzy matching | `None` |
| `match_cache` | `FuzzyMatchCache \| None` | Optional pre-computed fuzzy match cache for better performance | `None` |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Inferred vertex table name, or `None` if it cannot be inferred |
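The column-name patterns above suggest a suffix-stripping step followed by a match against known vertex tables. The sketch below illustrates that idea only. The suffix list `KEY_SUFFIXES`, the name `infer_vertex_sketch`, the behaviour when no vertex names are supplied, and the `difflib.SequenceMatcher` scoring are all assumptions rather than the library's documented behaviour.

```python
from difflib import SequenceMatcher
from typing import Optional

# Assumption: trailing fragments treated as key markers, not as part of the name.
KEY_SUFFIXES = ("id", "fk", "key")


def infer_vertex_sketch(
    column_name: str,
    vertex_table_names: Optional[list[str]] = None,
    threshold: float = 0.6,
) -> Optional[str]:
    """Strip key-like suffixes from a column name and match the remainder."""
    fragments = [f for f in column_name.lower().split("_") if f]
    # Remove trailing markers, e.g. "user_id" -> ["user"].
    while len(fragments) > 1 and fragments[-1] in KEY_SUFFIXES:
        fragments.pop()
    candidate = "_".join(fragments)
    if not vertex_table_names:
        # Assumption: without known tables, fall back to the stripped name.
        return candidate or None
    best_name: Optional[str] = None
    best_score = 0.0
    for name in vertex_table_names:
        score = SequenceMatcher(None, candidate, name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


print(infer_vertex_sketch("user_id"))
print(infer_vertex_sketch("customer_fk", ["customer", "product"]))
```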
split_by_separator(text, separator)
Split text by separator, handling multiple consecutive separators.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Text to split | required |
| `separator` | `str` | Separator character | required |
Returns:
| Type | Description |
|---|---|
| `list[str]` | List of non-empty fragments |
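A minimal sketch of the described behaviour: splitting on the separator and discarding the empty strings that consecutive, leading, or trailing separators produce. The name `split_by_separator_sketch` is hypothetical and the library's implementation may differ in detail.

```python
def split_by_separator_sketch(text: str, separator: str) -> list[str]:
    """Split text on separator and drop empty fragments."""
    # str.split yields "" between consecutive separators; filter those out.
    return [fragment for fragment in text.split(separator) if fragment]


print(split_by_separator_sketch("rel__cluster_host", "_"))
```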