Object storage (S3-compatible)¶
GraFlo uses S3-compatible object storage (MinIO, AWS S3, etc.) in one primary way today: TigerGraph bulk staging — uploading staged CSV files so a LOADING JOB can reference s3:// URLs. This is separate from ingestion sources (files, SQL, SPARQL) described elsewhere.
Staging (TigerGraph bulk load)¶
When `TigergraphConfig.bulk_load` uploads staged CSVs:

- The manifest can declare `bindings.staging_proxy`: logical names (e.g. `bulk_s3`) map to `conn_proxy` keys (non-secret).
- At runtime, `InMemoryConnectionProvider` registers an `S3GeneralizedConnConfig` for that proxy.
- `graflo.object_storage.upload_staged_csvs` uploads files with boto3; GSQL uses the returned `s3://bucket/key` paths behind a `CREATE DATA_SOURCE` so TigerGraph can read from MinIO (not bare `s3://` against AWS).
- Optional: `MINIO_LOADER_ENDPOINT` in `docker/minio/.env` sets the endpoint as seen from the TigerGraph server when it differs from the host URL used by boto3.
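A manifest binding along those lines might look like the following sketch. The YAML layout is illustrative; only `bindings.staging_proxy`, the `bulk_s3` name, and the `conn_proxy` concept come from the description above:

```yaml
# Hypothetical manifest fragment: maps a logical staging name to a
# non-secret conn_proxy key. Credentials are NOT stored here; they are
# resolved at runtime by the connection provider.
bindings:
  staging_proxy:
    bulk_s3: minio_local   # conn_proxy key registered at runtime
```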
Secrets stay out of YAML; credentials come from the environment / `docker/minio/.env` via `MinioConfig.from_docker_env()`.
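For concreteness, a `docker/minio/.env` file could look like the fragment below. Only `MINIO_LOADER_ENDPOINT` and the file path are named in these docs; the other variable names are illustrative assumptions:

```shell
# Illustrative docker/minio/.env; only MINIO_LOADER_ENDPOINT is documented above.
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=change-me
# Endpoint as seen by boto3 on the host:
MINIO_ENDPOINT=http://localhost:9000
# Endpoint as seen from inside the TigerGraph container, when it differs:
MINIO_LOADER_ENDPOINT=http://minio:9000
```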
```mermaid
flowchart LR
    M[GraphManifest staging_proxy]
    P[ConnectionProvider conn_proxy]
    S3[S3GeneralizedConnConfig]
    U[graflo.object_storage upload]
    TG[TigerGraph LOADING JOB]
    M --> P --> S3 --> U --> TG
```
Configuration: MinioConfig / S3EndpointConfig¶
`MinioConfig` (alias `S3EndpointConfig`) in `graflo.object_storage` holds the endpoint URL, access key, secret key, default bucket, and region. It is not a graph database target and does not use `ConnectionManager` for graph backends.
Loading from the repo’s Docker stack matches other services:
```python
from graflo.db import MinioConfig
# or: from graflo.object_storage import MinioConfig

cfg = MinioConfig.from_docker_env()  # reads docker/minio/.env
```
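As a rough illustration of what `from_docker_env()` has to do, the sketch below parses a `KEY=VALUE` env file into a plain config object. The class, field, and variable names here are assumptions for illustration (except `docker/minio/.env` itself), not graflo's actual API:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SimpleMinioConfig:
    # Field names are illustrative, not graflo's actual MinioConfig schema.
    endpoint_url: str
    access_key: str
    secret_key: str
    bucket: str


def parse_env(path: Path) -> dict[str, str]:
    """Parse a KEY=VALUE .env file, ignoring blank lines and comments."""
    env: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def config_from_env_file(path: Path) -> SimpleMinioConfig:
    """Build a config from an env file (hypothetical variable names)."""
    env = parse_env(path)
    return SimpleMinioConfig(
        endpoint_url=env["MINIO_ENDPOINT"],
        access_key=env["MINIO_ROOT_USER"],
        secret_key=env["MINIO_ROOT_PASSWORD"],
        bucket=env.get("MINIO_BUCKET", "staging"),
    )
```

The real `MinioConfig.from_docker_env()` additionally knows the repository-relative location of `docker/minio/.env`; the sketch takes the path explicitly.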
The `graflo.object_storage` package exposes bucket helpers (`ensure_staging_bucket_for_config`, `ensure_bucket_exists`), boto3 client factories (`boto3_s3_client_from_generalized`, `boto3_s3_client_from_minio`), and `upload_staged_csvs`.
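A minimal sketch of the upload step, under the assumption that an `upload_staged_csvs`-style helper maps staged local files to bucket keys and hands back `s3://bucket/key` paths for the LOADING JOB. The function name, signature, and key-prefix scheme below are illustrative, not graflo's actual implementation:

```python
from pathlib import Path


def staged_csv_uris(
    paths: list[Path], bucket: str, prefix: str = "staging"
) -> dict[Path, str]:
    """Map each staged CSV to the s3:// URI a LOADING JOB would reference.

    Hypothetical helper: the real upload_staged_csvs also performs the
    upload with a boto3 client built for the MinIO endpoint, roughly:
        client.upload_file(str(path), bucket, key)
    """
    uris: dict[Path, str] = {}
    for path in paths:
        key = f"{prefix}/{path.name}"
        uris[path] = f"s3://{bucket}/{key}"
    return uris
```

TigerGraph then resolves these `s3://` paths through the `CREATE DATA_SOURCE` pointing at MinIO, not against AWS S3.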
Not a ResourceConnector¶
`FileConnector` and other resource connectors describe ingestion inputs (local paths, tables, SPARQL). Staging does not add a new `BoundSourceKind`: it is an output path for bulk CSV upload, not a row source.

A future “ingest from S3” (reading objects as a source) would be a new connector type and data source implementation, not an overload of `FileConnector`.
See also¶
- TigerGraph bulk load
- Concepts overview (bindings and `staging_proxy`)
- Example: `examples/10-tigergraph-bulk-s3/` in the repository