Transforms¶
Transform is the core normalization mechanism in GraFlo ingestion pipelines. It handles value conversion, field renaming, reshaping, and key normalization before vertex and edge actors consume data.
This page documents the transform DSL as implemented in:
- graflo.architecture.contract.declarations.transform.Transform
- graflo.architecture.pipeline.runtime.actor.transform.TransformActor
- graflo.architecture.pipeline.runtime.actor.config.models.TransformCallConfig
Mental model¶
Two layers work together:
- ProtoTransform: function wrapper (module, foo, params) with invocation logic.
- Transform: adds input selection, output mapping, dressing, key targeting, and execution strategy.
In a resource pipeline, each transform step applies to the current document and emits an update payload for downstream actor steps.
Where transforms can be defined¶
1) Inline (local) transform in a resource¶
Use this when logic is specific to one place in one pipeline.
resources:
  - name: papers
    apply:
      - transform:
          call:
            module: builtins
            foo: int
            input: citations
            output: citations_count
      - vertex: paper
2) Reusable transform in ingestion_model.transforms¶
Use this when the same transform should be referenced from multiple resources or steps.
ingestion_model:
  transforms:
    - name: keep_suffix_id
      module: graflo.util.transform
      foo: split_keep_part
      input: id
      output: _key
      params: {sep: "/", keep: -1}
Then reference it by name from a transform step through call.use (for example, call: {use: keep_suffix_id}).
Local override of reusable transform¶
A call.use step can override input, output, params, and/or dress while reusing module + foo from the vocabulary entry. Put shared dress and params on the named transform when several steps only differ by input:
# ingestion_model.transforms
- name: round_metric
  module: graflo.util.transform
  foo: round_str
  params: {ndigits: 3}
  dress: {key: name, value: value}
# resource apply
- transform:
    call: {use: round_metric, input: [Open]}
- transform:
    call: {use: round_metric, input: [Close]}
Rename-style override (same idea, different fields):
- transform:
    call:
      use: keep_suffix_id
      input: doi
      output: work_id
      params: {sep: "/", keep: [-2, -1]}
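The override behavior described above can be modeled as a dictionary merge. The sketch below is plain Python, not GraFlo's merge code: resolve_call and OVERRIDABLE are hypothetical names, and it assumes an overridden field replaces the named transform's value wholesale rather than being deep-merged.

```python
# Fields a call.use step may override, per the text above.
OVERRIDABLE = ("input", "output", "params", "dress")

NAMED_TRANSFORMS = {
    "keep_suffix_id": {
        "module": "graflo.util.transform",
        "foo": "split_keep_part",
        "input": "id",
        "output": "_key",
        "params": {"sep": "/", "keep": -1},
    }
}


def resolve_call(call, named):
    """Hypothetical merge of a call.use step with its named transform."""
    if "use" not in call:
        return dict(call)
    if "module" in call or "foo" in call:
        raise ValueError("call.use cannot be combined with call.module or call.foo")
    resolved = dict(named[call["use"]])  # copy the shared defaults
    for key in OVERRIDABLE:
        if key in call:
            # Assumption: an override replaces the whole value (no deep merge).
            resolved[key] = call[key]
    return resolved


resolved = resolve_call(
    {
        "use": "keep_suffix_id",
        "input": "doi",
        "output": "work_id",
        "params": {"sep": "/", "keep": [-2, -1]},
    },
    NAMED_TRANSFORMS,
)
```

Here module and foo come from the named entry, while input, output, and params come from the step; the named entry itself is left unchanged.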
Transform forms¶
A) Rename-only transform (transform.rename)¶
Pure field mapping with no function call, written as a {source_field: target_field} dict.
It behaves the same as a map-based transform (map).
B) Function-call transform (transform.call)¶
Function-backed transform using module + foo, or a reusable use reference.
- transform:
    call:
      module: builtins
      foo: round
      input: confidence
      output: confidence_rounded
      params:
        ndigits: 3
Output behavior¶
Direct output mapping¶
- input selects fields from the current document.
- The function result is assigned to output.
- If output is omitted and input exists, output defaults to the input field names.
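The three rules above can be modeled in a few lines of plain Python. This is a sketch of the described behavior, not GraFlo's implementation: apply_call is a hypothetical helper, and it assumes the function returns one value per output field when several outputs are listed.

```python
import importlib


def apply_call(doc, module, foo, input_fields, output_fields=None, params=None):
    """Hypothetical model of a single-strategy transform.call step."""
    func = getattr(importlib.import_module(module), foo)
    in_fields = [input_fields] if isinstance(input_fields, str) else list(input_fields)
    # When output is omitted, it defaults to the input field names.
    if output_fields is None:
        out_fields = in_fields
    elif isinstance(output_fields, str):
        out_fields = [output_fields]
    else:
        out_fields = list(output_fields)
    # Call once with all selected input values, plus keyword params.
    result = func(*(doc[f] for f in in_fields), **(params or {}))
    # A single output takes the result as-is; several outputs unpack it.
    values = [result] if len(out_fields) == 1 else list(result)
    return dict(zip(out_fields, values))


doc = {"citations": "42", "confidence": 0.98765}
update_int = apply_call(doc, "builtins", "int", "citations", "citations_count")
update_round = apply_call(doc, "builtins", "round", "confidence", params={"ndigits": 3})
```

The first call writes the cast value under a new name (citations_count); the second omits the output, so the rounded value is written back under confidence.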
Dress output (dress)¶
Use dress when a single-input transform should emit a {key, value} style payload.
- transform:
    call:
      module: graflo.util.transform
      foo: round_str
      input: Open
      params: {ndigits: 3}
      dress:
        key: name
        value: value
For input field Open, this emits a two-field payload: name holds the input field name (Open) and value holds the function result.
dress rules:
- requires a function transform (module + foo, or use that resolves to one)
- requires exactly one input field
- sets output field names to (dress.key, dress.value)
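A minimal sketch of the dress rules, assuming the key field carries the input field name and the value field carries the function result. dress_result is a hypothetical helper, and round_to_str is a stand-in written for this example; it is not graflo.util.transform.round_str and may format values differently.

```python
def dress_result(input_field, result, dress):
    """Hypothetical model of dress: pair the input field name with the result."""
    return {dress["key"]: input_field, dress["value"]: result}


def round_to_str(value, ndigits):
    # Stand-in transform function for this sketch only.
    return f"{float(value):.{ndigits}f}"


doc = {"Open": "17.91234", "Close": "18.25"}
dress = {"key": "name", "value": "value"}

# One dressed record per single-input transform step.
records = [
    dress_result(field, round_to_str(doc[field], 3), dress)
    for field in ("Open", "Close")
]
```

Each wide column (Open, Close) becomes its own name/value record, which is the pivot that two steps sharing one named transform would produce under this assumption.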
Multi-field transforms¶
Grouped calls (input_groups / output_groups)¶
Use groups when the same function should run multiple times on different argument tuples (not the same as strategy: each, which runs once per single input field from a flat input list).
- Each inner group is a list of field names whose values are read from the document and passed as *args to the function for that call.
- The function is invoked once per group, in order.
- output: list of field names, one per group, when each call returns a single value.
- output_groups: list of field-name lists, parallel to input_groups, when each call returns multiple values (e.g. a tuple mapped to several outputs).
- Omitting outputs: only valid when every group has exactly one input field; results are written back to those same keys (passthrough). If any group has more than one field, you must set output or output_groups.
YAML accepts a shorthand for unary groups: a group can be a single string, and input_groups can be a list of strings (one field per group), so a list of two field names is read as two one-field groups.
Grouped mode is incompatible with dress and with strategy: each or strategy: all. Omit strategy or use single (default).
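The grouped-call rules above can be modeled as follows. This is a plain-Python sketch rather than GraFlo's code: run_groups and join_name are hypothetical, and the sample field values are made up for illustration.

```python
def run_groups(doc, func, input_groups, output=None, output_groups=None):
    """Hypothetical model of grouped calls: one invocation per group, in order."""
    # YAML shorthand: a bare string stands for a one-field group.
    groups = [[g] if isinstance(g, str) else list(g) for g in input_groups]
    if output is not None and output_groups is not None:
        raise ValueError("use either output or output_groups, not both")
    targets = output if output is not None else output_groups
    if targets is not None and len(targets) != len(groups):
        raise ValueError("outputs must line up one-to-one with input_groups")
    if targets is None and any(len(g) != 1 for g in groups):
        raise ValueError("passthrough needs exactly one field per group")

    update = {}
    for i, group in enumerate(groups):
        result = func(*(doc[f] for f in group))  # group values passed as *args
        if output_groups is not None:
            update.update(zip(output_groups[i], result))  # multi-value result
        elif output is not None:
            update[output[i]] = result  # single-value result
        else:
            update[group[0]] = result  # passthrough onto the input key
    return update


def join_name(first, last):
    return f"{first} {last}"


doc = {"fname_parent": "Ada", "lname_parent": "Lovelace",
       "fname_child": "Byron", "lname_child": "King", "age": "36"}
names = run_groups(doc, join_name,
                   [["fname_parent", "lname_parent"], ["fname_child", "lname_child"]],
                   output=["parent_name", "child_name"])
parts = run_groups(names, str.split, ["parent_name", "child_name"],
                   output_groups=[["parent_fname", "parent_lname"],
                                  ["child_fname", "child_lname"]])
cast = run_groups(doc, int, ["age"])  # unary passthrough
```

The three calls exercise output, output_groups with the string shorthand, and passthrough in turn.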
Strategy: single (default)¶
Call function once with all selected input values (flat input, no groups).
- transform:
    call:
      module: graflo.util.transform
      foo: parse_date_ibes
      input: [ANNDATS, ANNTIMS]
      output: datetime_announce
Explicit grouped calls (nested field lists)¶
When the same function should run repeatedly on explicit argument tuples, use nested lists in input_groups (see Grouped calls above).
- transform:
    call:
      module: my_pkg.transforms
      foo: join_name
      input_groups:
        - [fname_parent, lname_parent]
        - [fname_child, lname_child]
      output: [parent_name, child_name]
input_groups can also use grouped outputs:
- transform:
    call:
      module: my_pkg.transforms
      foo: split_name
      input_groups:
        - [parent_name]
        - [child_name]
      output_groups:
        - [parent_fname, parent_lname]
        - [child_fname, child_lname]
Grouped passthrough is supported when outputs are omitted and each group maps back to its own key (for example, unary casts that overwrite each field in place).
Strategy: each¶
Call function independently for each selected input field.
Strategy: all¶
Pass the whole document as one argument to the transform function.
strategy: all rules:
- do not provide input
- incompatible with dress
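To contrast the three strategies side by side, here is a small plain-Python model. run_strategy, total, and count_fields are hypothetical names for this sketch; it assumes that each writes results back under the input field names, and it leaves output mapping for single out of scope.

```python
def run_strategy(doc, func, strategy="single", input_fields=None):
    """Hypothetical model of the single, each, and all execution strategies."""
    if strategy == "all":
        if input_fields is not None:
            raise ValueError("strategy 'all' takes no input")
        return func(doc)  # the whole document is the only argument
    if not input_fields:
        raise ValueError("single and each need input fields")
    if strategy == "each":
        # One independent call per field, keyed by the same field name.
        return {field: func(doc[field]) for field in input_fields}
    if strategy == "single":
        # One call that receives every selected value at once.
        return func(*(doc[field] for field in input_fields))
    raise ValueError(f"unknown strategy: {strategy!r}")


def total(price, qty):
    return int(price) * int(qty)


def count_fields(document):
    return {"n_fields": len(document)}


doc = {"price": "10", "qty": "3"}
per_field = run_strategy(doc, int, "each", ["price", "qty"])
combined = run_strategy(doc, total, "single", ["price", "qty"])
whole_doc = run_strategy(doc, count_fields, "all")
```

The same two input fields yield two independent casts under each, but a single combined value under single.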
Key transforms (target: keys)¶
Transforms can operate on document keys instead of values by setting call.target to keys.
Key selection¶
call.keys.mode supports:
- all: apply to every key
- include: apply only to keys.names
- exclude: apply to all keys except keys.names
Example with include:
- transform:
    call:
      module: graflo.util.transform
      foo: remove_prefix
      params: {prefix: "raw_"}
      target: keys
      keys:
        mode: include
        names: [raw_id, raw_label]
target: keys rules:
- requires a function transform
- does not allow input, output, or dress
- does not allow input_groups or output_groups
- does not allow explicit strategy (key mode is implicit per-key execution)
- transformed keys must remain unique (collisions raise an error)
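Key selection and the uniqueness rule can be sketched like this. transform_keys is a hypothetical model, and strip_raw_prefix is a stand-in written for this example; it is not GraFlo's remove_prefix, and the error type GraFlo raises on collisions is not specified here.

```python
def transform_keys(doc, func, mode="all", names=()):
    """Hypothetical model of target: keys with keys.mode selection."""
    selected = set(names)
    out = {}
    for key, value in doc.items():
        if mode == "all":
            chosen = True
        elif mode == "include":
            chosen = key in selected
        elif mode == "exclude":
            chosen = key not in selected
        else:
            raise ValueError(f"unknown keys.mode: {mode!r}")
        new_key = func(key) if chosen else key  # per-key execution
        if new_key in out:
            # Transformed keys must stay unique.
            raise ValueError(f"key collision on {new_key!r}")
        out[new_key] = value
    return out


def strip_raw_prefix(key, prefix="raw_"):
    # Stand-in key function for this sketch only.
    return key[len(prefix):] if key.startswith(prefix) else key


doc = {"raw_id": 1, "raw_label": "a", "raw_score": 0.5}
included = transform_keys(doc, strip_raw_prefix, "include", ["raw_id", "raw_label"])
excluded = transform_keys(doc, strip_raw_prefix, "exclude", ["raw_id"])
```

With include, only the listed keys are renamed; with exclude, every key except the listed ones is renamed. A document such as {"raw_id": 1, "id": 2} would collide on id in all mode and raise.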
Config reference (transform DSL)¶
transform.rename¶
- Type: dict[str, str]
- Meaning: {source_field: target_field}
transform.call¶
- use: str | null - named transform from ingestion_model.transforms
- module: str | null - python module path for inline function
- foo: str | null - function name in module
- params: dict - keyword args passed to function
- input: str | list[str] | null - input fields (not used for key mode)
- output: str | list[str] | null - output fields (not used for key mode)
- input_groups: list[list[str]] | null - grouped calls (values mode only); each entry is a group. YAML may use a list of strings as shorthand for unary groups (one field name per group).
- output_groups: list[list[str]] | null - grouped outputs aligned to input_groups
- strategy: single | each | all | null - function execution mode (with input_groups, omit or use single only; each / all are rejected)
- target: values | keys | null - operate on values or keys. With use, omit to inherit defaults from the matching ingestion_model.transforms entry; inline calls (no use) default to values when omitted.
- keys:
  - mode: all | include | exclude
  - names: list[str]
- dress:
  - key: str
  - value: str
Transform (Python API only)¶
Named transforms in ingestion_model.transforms are ProtoTransform entries (module, foo, params, flat/grouped input / output, dress, and optional target / keys for key-mode defaults). A transform.call with use inherits those defaults; set call.target or call.keys to override. Inline transform.call steps supply execution options (target, keys, strategy) and may override IO; TransformActor assembles a runtime Transform, which adds:
- passthrough_group_output: bool (default true). When input_groups is used and neither output nor output_groups is set, this allows writing unary group results back onto the input keys. It is not exposed on manifest transform.call today; omit outputs in YAML only for unary groups.
When the effective target is keys (from the call or the named proto), call.input / call.output / call.input_groups / call.output_groups / call.dress are rejected at merge time so invalid combinations are not silently ignored.
Validation and compatibility rules¶
- A transform step must define exactly one of:
  - transform.rename
  - transform.call
- call.use cannot be combined with call.module or call.foo.
- If call.use is absent, both call.module and call.foo are required.
- map/rename and function mode are mutually exclusive.
- Use either call.input or call.input_groups, not both.
- With call.input_groups, do not set call.strategy to each or all.
- For grouped calls, use either call.output (one output per input group) or call.output_groups (full per-group output tuples), not both.
- call.output_groups must have the same number of groups as call.input_groups.
- Passthrough (no output / output_groups) requires every group to contain exactly one input field.
- Legacy switch is not supported.
- List-style dress is not supported (dress must be a dict with key and value).
Practical patterns¶
- Keep keys stable early:
  - run one target: keys transform near pipeline start.
- Use reusable named transforms for:
  - ID normalization
  - date/time parsing
  - repeated casting logic
  - shared target: keys + keys selection so resources only reference use: without repeating key-mode config
- Use local overrides when:
  - same function, different input/output fields per resource
- Use strategy: each with a flat input list for repeated unary casting (for example, multiple numeric columns). For the same callable over different argument tuples, use input_groups instead.
- Use dress to pivot wide metrics into tidy key/value records before routing into vertices/edges.