graflo.util.transform
Data transformation utilities for graph operations.
This module provides utility functions for transforming and standardizing data in various formats, particularly for graph database operations. It includes functions for date parsing, string standardization, and data cleaning.
Key Functions
- standardize: Standardize string keys and names
- parse_date_*: Various date parsing functions for different formats
- cast_ibes_analyst: Parse and standardize analyst names
- clear_first_level_nones: Clean dictionaries by removing None values
- parse_multi_item: Parse complex multi-item strings
- pick_unique_dict: Remove duplicate dictionaries
Example
>>> name = standardize("John. Doe, Smith")
>>> date = parse_date_standard("2023-01-01")
>>> analyst = cast_ibes_analyst("ADKINS/NARRA")
cast_ibes_analyst(s)
Splits and normalizes analyst name strings.
Handles various name formats like 'ADKINS/NARRA' or 'ARFSTROM J'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s` | `str` | Analyst name string. | required |
Returns:
| Type | Description |
|---|---|
| `tuple` | `(last_name, first_initial)` |
Examples:
>>> cast_ibes_analyst('ADKINS/NARRA')
('ADKINS', 'N')
>>> cast_ibes_analyst('ARFSTROM J')
('ARFSTROM', 'J')
Source code in graflo/util/transform.py
clear_first_level_nones(docs, keys_keep_nones=None)
Removes top-level keys whose value is None from each dictionary, except for keys listed in keys_keep_nones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `docs` | `list` | List of dictionaries to clean. | required |
| `keys_keep_nones` | `list` | Keys to keep even if their value is None. | `None` |
Returns:
| Type | Description |
|---|---|
| `list` | Cleaned list of dictionaries. |
Example
>>> docs = [{"a": 1, "b": None}, {"a": None, "b": 2}]
>>> clear_first_level_nones(docs, keys_keep_nones=["a"])
[{"a": 1}, {"a": None, "b": 2}]
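A minimal sketch of the same filtering, assuming that only `None` (not other falsy values such as `0` or `""`) marks a value for removal. `clear_nones_sketch` is a hypothetical name, not graflo's implementation.

```python
from typing import Any, Iterable, Optional


def clear_nones_sketch(
    docs: list[dict[str, Any]], keys_keep_nones: Optional[Iterable[str]] = None
) -> list[dict[str, Any]]:
    keep = set(keys_keep_nones or ())
    # Only the top level of each dict is inspected; nested dicts are left untouched
    return [
        {k: v for k, v in doc.items() if v is not None or k in keep}
        for doc in docs
    ]


docs = [{"a": 1, "b": None}, {"a": None, "b": 2}]
print(clear_nones_sketch(docs, keys_keep_nones=["a"]))
# [{'a': 1}, {'a': None, 'b': 2}]
```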
parse_date_conf(input_str)
Parse a date string in YYYYMMDD format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_str` | `str` | Date string in YYYYMMDD format. | required |
Returns:
| Type | Description |
|---|---|
| `tuple` | `(year, month, day)` as integers. |
Example
>>> parse_date_conf("20230101")
(2023, 1, 1)
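One way to express this parse with the standard library, assuming invalid dates should raise rather than pass through. `parse_yyyymmdd_sketch` is a hypothetical name.

```python
from datetime import datetime


def parse_yyyymmdd_sketch(input_str: str) -> tuple[int, int, int]:
    # strptime rejects malformed input (e.g. month 13) with ValueError
    d = datetime.strptime(input_str, "%Y%m%d")
    return d.year, d.month, d.day


print(parse_yyyymmdd_sketch("20230101"))  # (2023, 1, 1)
```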
parse_date_ibes(date0, time0)
Converts IBES date and time to ISO 8601 format datetime.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `date0` | `str` or `int` | Date in YYYYMMDD format. | required |
| `time0` | `str` | Time in HH:MM:SS format; the hour may have a single digit (e.g. `9:35:52`). | required |
Returns:
| Type | Description |
|---|---|
| `str` | Datetime in ISO 8601 format (`YYYY-MM-DDTHH:MM:SSZ`). |
Example
>>> parse_date_ibes(20160126, "9:35:52")
'2016-01-26T09:35:52Z'
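A sketch of the conversion using `datetime.strptime`, which accepts the single-digit hour in the example. It assumes the IBES time is already UTC, since the documented output carries a `Z` suffix; `ibes_to_iso_sketch` is a hypothetical name.

```python
from datetime import datetime
from typing import Union


def ibes_to_iso_sketch(date0: Union[str, int], time0: str) -> str:
    # The f-string accepts date0 as either 20160126 or "20160126";
    # %H also matches a single-digit hour such as "9"
    dt = datetime.strptime(f"{date0} {time0}", "%Y%m%d %H:%M:%S")
    # Emit a literal "Z", i.e. the input time is assumed to be UTC
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


print(ibes_to_iso_sketch(20160126, "9:35:52"))  # 2016-01-26T09:35:52Z
```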
parse_date_reference(input_str)
Extract year from a date reference string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_str` | `str` | Date reference string. | required |
Returns:
| Type | Description |
|---|---|
| `int` | Year from the date reference. |
Example
>>> parse_date_reference("1923, May 10")
1923
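A sketch assuming the year is the first standalone four-digit number in the string; the actual function may use a different rule. `year_from_reference_sketch` is a hypothetical name.

```python
import re


def year_from_reference_sketch(input_str: str) -> int:
    # Take the first run of exactly four digits, bounded by non-word characters
    match = re.search(r"\b\d{4}\b", input_str)
    if match is None:
        raise ValueError(f"no four-digit year in {input_str!r}")
    return int(match.group())


print(year_from_reference_sketch("1923, May 10"))  # 1923
```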
parse_date_standard(input_str)
Parse a date string in YYYY-MM-DD format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_str` | `str` | Date string in YYYY-MM-DD format. | required |
Returns:
| Type | Description |
|---|---|
| `tuple` | `(year, month, day)` as integers. |
Example
>>> parse_date_standard("2023-01-01")
(2023, 1, 1)
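With the standard library, `date.fromisoformat` both validates and splits a `YYYY-MM-DD` string. This is a hypothetical sketch (`parse_iso_date_sketch`), not the graflo code.

```python
from datetime import date


def parse_iso_date_sketch(input_str: str) -> tuple[int, int, int]:
    # fromisoformat rejects malformed or impossible dates such as "2023-02-30"
    d = date.fromisoformat(input_str)
    return d.year, d.month, d.day


print(parse_iso_date_sketch("2023-01-01"))  # (2023, 1, 1)
```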
parse_date_standard_to_epoch(input_str)
Convert standard date string to Unix epoch timestamp.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_str` | `str` | Date string in YYYY-MM-DD format. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Unix epoch timestamp in seconds. |
Example
>>> parse_date_standard_to_epoch("2023-01-01")
1672531200.0
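The example value `1672531200.0` is midnight UTC on 2023-01-01, so this sketch pins the timezone to UTC explicitly; whether graflo does the same is an assumption. `date_to_epoch_sketch` is a hypothetical name.

```python
from datetime import datetime, timezone


def date_to_epoch_sketch(input_str: str) -> float:
    # Interpret the date as midnight UTC; a naive datetime would use local time
    dt = datetime.strptime(input_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return dt.timestamp()


print(date_to_epoch_sketch("2023-01-01"))  # 1672531200.0
```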
parse_date_yahoo(date0)
Convert Yahoo Finance date to ISO 8601 format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `date0` | `str` | Date in YYYY-MM-DD format. | required |
Returns:
| Type | Description |
|---|---|
| `str` | Datetime in ISO 8601 format with the time fixed at noon (`T12:00:00Z`). |
Example
>>> parse_date_yahoo("2023-01-01")
'2023-01-01T12:00:00Z'
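A minimal sketch of the noon-pinning behavior, assuming the input should be validated as `YYYY-MM-DD` first; `yahoo_to_iso_sketch` is a hypothetical name.

```python
from datetime import datetime


def yahoo_to_iso_sketch(date0: str) -> str:
    # strptime validates the YYYY-MM-DD input (raises ValueError otherwise)
    day = datetime.strptime(date0, "%Y-%m-%d")
    # Pin the time of day to noon and mark it as UTC with a literal "Z"
    return day.strftime("%Y-%m-%dT12:00:00Z")


print(yahoo_to_iso_sketch("2023-01-01"))  # 2023-01-01T12:00:00Z
```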
parse_multi_item(s, mapper, direct)
Parses complex multi-item strings into structured data.
Supports parsing strings with quoted or bracketed items.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s` | `str` | Input string to parse. | required |
| `mapper` | `dict` | Mapping of input keys to output keys; matching values are stored under the output key. | required |
| `direct` | `list` | Keys extracted as-is, under their original names. | required |
Returns:
| Type | Description |
|---|---|
| `defaultdict` | Parsed items, mapping each output key to the list of values collected across all items. |
Example
>>> s = '[name: John, age: 30][name: Jane, age: 25]'
>>> mapper = {"name": "full_name"}
>>> direct = ["age"]
>>> parse_multi_item(s, mapper, direct)
defaultdict(<class 'list'>, {'full_name': ['John', 'Jane'], 'age': ['30', '25']})
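The sketch below handles only the bracketed form shown in the example (`[key: value, ...]` groups); quoted items and values containing commas are not covered. `parse_bracketed_sketch` is a hypothetical name, and its parsing rules are assumptions inferred from the example, not graflo's code.

```python
import re
from collections import defaultdict


def parse_bracketed_sketch(
    s: str, mapper: dict[str, str], direct: list[str]
) -> defaultdict:
    out: defaultdict[str, list[str]] = defaultdict(list)
    # Each item is one "[key: value, key: value]" group
    for item in re.findall(r"\[([^\]]*)\]", s):
        for pair in item.split(","):
            key, sep, value = pair.partition(":")
            if not sep:
                continue  # skip fragments without a "key: value" shape
            key, value = key.strip(), value.strip()
            if key in mapper:
                out[mapper[key]].append(value)  # stored under the renamed key
            elif key in direct:
                out[key].append(value)  # stored under its own name
    return out


s = "[name: John, age: 30][name: Jane, age: 25]"
result = parse_bracketed_sketch(s, {"name": "full_name"}, ["age"])
print(dict(result))  # {'full_name': ['John', 'Jane'], 'age': ['30', '25']}
```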
pick_unique_dict(docs)
Removes duplicate dictionaries from a list.
Uses a hash-based approach to identify unique dictionaries, which is more efficient than JSON serialization and preserves original object types.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `docs` | `list` | List of dictionaries. | required |
Returns:
| Type | Description |
|---|---|
| `list` | List of unique dictionaries (preserving original objects). |
Example
>>> docs = [{"a": 1}, {"a": 1}, {"b": 2}]
>>> pick_unique_dict(docs)
[{"a": 1}, {"b": 2}]
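A hash-based deduplication can be sketched by converting each dictionary into a hashable, order-independent key and keeping the first object seen for each key. `_freeze` and `unique_dicts_sketch` are hypothetical helpers; graflo's actual hashing scheme may differ.

```python
from typing import Any, Hashable


def _freeze(value: Any) -> Hashable:
    # Recursively convert containers into hashable equivalents
    if isinstance(value, dict):
        return frozenset((k, _freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(_freeze(v) for v in value)
    return value


def unique_dicts_sketch(docs: list[dict]) -> list[dict]:
    seen: set = set()
    unique = []
    for doc in docs:
        key = _freeze(doc)
        if key not in seen:  # first occurrence wins, so input order is kept
            seen.add(key)
            unique.append(doc)  # keep the original object, not a copy
    return unique


print(unique_dicts_sketch([{"a": 1}, {"a": 1}, {"b": 2}]))  # [{'a': 1}, {'b': 2}]
```

Because keys are built from frozen values, entries that compare equal after freezing (for example `1` and `True`, or `[1]` and `(1,)`) are treated as duplicates.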
round_str(x, **kwargs)
Round a number given as a string to the specified precision.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `str` | String representation of a number. | required |
| `**kwargs` | | Additional keyword arguments passed to the built-in `round()` (e.g. `ndigits`). | `{}` |
Returns:
| Type | Description |
|---|---|
| `float` | Rounded number. |
Example
>>> round_str("3.14159", ndigits=2)
3.14
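A one-line sketch, assuming the string is parsed with `float()` and the keyword arguments are forwarded to the built-in `round()`; `round_str_sketch` is a hypothetical name.

```python
def round_str_sketch(x: str, **kwargs) -> float:
    # float() parses the string; keyword arguments such as ndigits go to round()
    return round(float(x), **kwargs)


print(round_str_sketch("3.14159", ndigits=2))  # 3.14
```

Built-in `round()` returns an `int` when `ndigits` is omitted, so under this assumption the result is a float only when `ndigits` is passed.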
split_keep_part(s, sep='/', keep=-1)
Split a string and keep specified parts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s` | `str` | String to split. | required |
| `sep` | `str` | Separator to split on. | `'/'` |
| `keep` | `int` or `list` | Index or indices of the parts to keep. | `-1` |
Returns:
| Type | Description |
|---|---|
| `str` | The kept part, or the kept parts joined with `sep`. |
Example
>>> split_keep_part("a/b/c", keep=0)
'a'
>>> split_keep_part("a/b/c", keep=[0, 2])
'a/c'
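A sketch of the split-and-select logic consistent with the examples above, assuming a list `keep` re-joins the selected parts with the same separator; `split_keep_sketch` is a hypothetical name.

```python
from typing import Sequence, Union


def split_keep_sketch(
    s: str, sep: str = "/", keep: Union[int, Sequence[int]] = -1
) -> str:
    parts = s.split(sep)
    if isinstance(keep, int):
        return parts[keep]  # single index; negative values count from the end
    # Several indices: pick each part and re-join with the same separator
    return sep.join(parts[i] for i in keep)


print(split_keep_sketch("a/b/c", keep=0))       # a
print(split_keep_sketch("a/b/c", keep=[0, 2]))  # a/c
print(split_keep_sketch("a/b/c"))               # c
```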
standardize(k)
Standardizes a string key by removing periods, splitting on commas and whitespace, and rejoining the parts with commas.
Handles both comma- and space-separated strings, normalizing them to the same format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `k` | `str` | Input string to be standardized. | required |
Returns:
| Type | Description |
|---|---|
| `str` | Cleaned and standardized string. |
Example
>>> standardize("John. Doe, Smith")
'John,Doe,Smith'
>>> standardize("John Doe Smith")
'John,Doe,Smith'
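A regex-based sketch that reproduces both examples: periods are dropped, then any run of commas and whitespace becomes a single comma. `standardize_sketch` is a hypothetical name and may treat edge cases (such as hyphens or other punctuation) differently from graflo.

```python
import re


def standardize_sketch(k: str) -> str:
    # Drop periods, split on any run of commas and/or whitespace, re-join with commas
    parts = re.split(r"[,\s]+", k.replace(".", "").strip())
    return ",".join(p for p in parts if p)


print(standardize_sketch("John. Doe, Smith"))  # John,Doe,Smith
print(standardize_sketch("John Doe Smith"))    # John,Doe,Smith
```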
try_int(x)
Attempt to convert a value to integer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Value to convert. | required |
Returns:
| Type | Description |
|---|---|
| `int` or original type | Integer if the conversion succeeds, otherwise the original value. |
Example
>>> try_int("123")
123
>>> try_int("abc")
'abc'
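A sketch of the try/except pattern the description implies, assuming both `ValueError` (unparseable strings) and `TypeError` (values like `None`) count as failed conversions; `try_int_sketch` is a hypothetical name.

```python
from typing import Any


def try_int_sketch(x: Any) -> Any:
    try:
        return int(x)
    except (TypeError, ValueError):
        # Not convertible (e.g. "abc" or None): hand the value back unchanged
        return x


print(try_int_sketch("123"))  # 123
print(try_int_sketch("abc"))  # abc
```

`int()` truncates floats (`3.7` becomes `3`) and rejects decimal strings such as `"3.5"`, so those return an integer and the original string respectively.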