graphcast.util.transform
Data transformation utilities for graph operations.
This module provides utility functions for transforming and standardizing data in various formats, particularly for graph database operations. It includes functions for date parsing, string standardization, and data cleaning.
Key Functions
- standardize: Standardize string keys and names
- parse_date_*: Various date parsing functions for different formats
- cast_ibes_analyst: Parse and standardize analyst names
- clear_first_level_nones: Clean dictionaries by removing None values
- parse_multi_item: Parse complex multi-item strings
- pick_unique_dict: Remove duplicate dictionaries
Example
>>> name = standardize("John. Doe, Smith")
>>> date = parse_date_standard("2023-01-01")
>>> analyst = cast_ibes_analyst("ADKINS/NARRA")
cast_ibes_analyst(s)
Splits and normalizes analyst name strings.
Handles various name formats like 'ADKINS/NARRA' or 'ARFSTROM J'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | Analyst name string. | required |
Returns:
Type | Description |
---|---|
tuple | (last_name, first_initial) |
Examples:
>>> cast_ibes_analyst('ADKINS/NARRA')
('ADKINS', 'N')
>>> cast_ibes_analyst('ARFSTROM J')
('ARFSTROM', 'J')
Source code in graphcast/util/transform.py
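For readers who want to reason about the splitting rule, here is a minimal sketch that reproduces both documented examples. It is not the graphcast source: the helper name `cast_ibes_analyst_sketch`, the assumption that a slash or whitespace separates the two name parts, and the empty initial returned for a single-token name are all illustrative choices.

```python
import re


def cast_ibes_analyst_sketch(s: str) -> tuple[str, str]:
    """Hypothetical re-implementation; not graphcast's code."""
    # Assumed delimiters: a slash or any run of whitespace.
    parts = [p for p in re.split(r"[/\s]+", s.strip()) if p]
    last_name = parts[0] if parts else ""
    # The first initial is the first character of the second token, if any.
    first_initial = parts[1][0] if len(parts) > 1 else ""
    return last_name, first_initial


print(cast_ibes_analyst_sketch("ADKINS/NARRA"))  # ('ADKINS', 'N')
print(cast_ibes_analyst_sketch("ARFSTROM J"))    # ('ARFSTROM', 'J')
```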
clear_first_level_nones(docs, keys_keep_nones=None)
Removes None values from dictionaries, with optional key exceptions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docs | list | List of dictionaries to clean. | required |
keys_keep_nones | list | Keys to keep even if their value is None. | None |
Returns:
Type | Description |
---|---|
list | Cleaned list of dictionaries. |
Example
>>> docs = [{"a": 1, "b": None}, {"a": None, "b": 2}]
>>> clear_first_level_nones(docs, keys_keep_nones=["a"])
[{'a': 1}, {'a': None, 'b': 2}]
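A minimal sketch of the documented behavior, written as a dictionary comprehension. The name `clear_first_level_nones_sketch` is hypothetical, and the sketch assumes, as the function name suggests, that only top-level keys are inspected.

```python
from typing import Any, Optional


def clear_first_level_nones_sketch(
    docs: list[dict[str, Any]], keys_keep_nones: Optional[list[str]] = None
) -> list[dict[str, Any]]:
    """Hypothetical re-implementation; not graphcast's code."""
    keep = set(keys_keep_nones or [])
    # Only top-level values are inspected; nested dicts are left untouched.
    return [
        {k: v for k, v in doc.items() if v is not None or k in keep}
        for doc in docs
    ]


docs = [{"a": 1, "b": None}, {"a": None, "b": 2}]
print(clear_first_level_nones_sketch(docs, keys_keep_nones=["a"]))
# [{'a': 1}, {'a': None, 'b': 2}]
```

Nested `None` values survive, which is what "first level" implies. Whether graphcast mutates the input dictionaries or returns new ones is not stated; this sketch builds new ones.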
parse_date_conf(input_str)
Parse a date string in YYYYMMDD format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_str | str | Date string in YYYYMMDD format. | required |
Returns:
Type | Description |
---|---|
tuple | (year, month, day) as integers. |
Example
>>> parse_date_conf("20230101")
(2023, 1, 1)
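A minimal sketch using the standard library's `datetime.strptime`. The helper name is hypothetical, and whether graphcast validates the date (this sketch rejects impossible dates such as `20230231`) or simply slices the string is an assumption.

```python
from datetime import datetime


def parse_date_conf_sketch(input_str: str) -> tuple[int, int, int]:
    """Hypothetical re-implementation; not graphcast's code."""
    # strptime raises ValueError for malformed or impossible dates.
    d = datetime.strptime(input_str, "%Y%m%d")
    return d.year, d.month, d.day


print(parse_date_conf_sketch("20230101"))  # (2023, 1, 1)
```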
parse_date_ibes(date0, time0)
Converts an IBES date and time to an ISO 8601 datetime string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
date0 | str or int | Date in YYYYMMDD format. | required |
time0 | str | Time in HH:MM:SS format. | required |
Returns:
Type | Description |
---|---|
str | Datetime in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ). |
Example
>>> parse_date_ibes(20160126, "9:35:52")
'2016-01-26T09:35:52Z'
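A minimal sketch of the conversion, assuming the date and time can be joined and parsed with one `strptime` pattern. The helper name is hypothetical. The sketch treats the input as already being UTC and only appends `Z`; the documentation above does not say which time zone IBES times use.

```python
from datetime import datetime
from typing import Union


def parse_date_ibes_sketch(date0: Union[str, int], time0: str) -> str:
    """Hypothetical re-implementation; not graphcast's code."""
    # %H also accepts a single-digit hour such as "9".
    dt = datetime.strptime(f"{date0} {time0}", "%Y%m%d %H:%M:%S")
    # Assumption: the input is already UTC, so "Z" is appended without conversion.
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


print(parse_date_ibes_sketch(20160126, "9:35:52"))  # 2016-01-26T09:35:52Z
```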
parse_date_reference(input_str)
Extract year from a date reference string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_str | str | Date reference string. | required |
Returns:
Type | Description |
---|---|
int | Year from the date reference. |
Example
>>> parse_date_reference("1923, May 10")
1923
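A minimal sketch, assuming the year is the first standalone four-digit number in the string. The helper name and the `ValueError` raised for strings without a year are hypothetical choices, not documented graphcast behavior.

```python
import re


def parse_date_reference_sketch(input_str: str) -> int:
    """Hypothetical re-implementation; not graphcast's code."""
    # Assumption: the year is the first standalone four-digit number.
    match = re.search(r"\b(\d{4})\b", input_str)
    if match is None:
        raise ValueError(f"no four-digit year in {input_str!r}")
    return int(match.group(1))


print(parse_date_reference_sketch("1923, May 10"))  # 1923
```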
parse_date_standard(input_str)
Parse a date string in YYYY-MM-DD format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_str | str | Date string in YYYY-MM-DD format. | required |
Returns:
Type | Description |
---|---|
tuple | (year, month, day) as integers. |
Example
>>> parse_date_standard("2023-01-01")
(2023, 1, 1)
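The dashed format can be handled with the same `strptime` approach. This is a hypothetical sketch (the helper name is illustrative), not the graphcast implementation.

```python
from datetime import datetime


def parse_date_standard_sketch(input_str: str) -> tuple[int, int, int]:
    """Hypothetical re-implementation; not graphcast's code."""
    # Raises ValueError if the string is not a valid YYYY-MM-DD date.
    d = datetime.strptime(input_str, "%Y-%m-%d")
    return d.year, d.month, d.day


print(parse_date_standard_sketch("2023-01-01"))  # (2023, 1, 1)
```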
parse_date_standard_to_epoch(input_str)
Convert a YYYY-MM-DD date string to a Unix epoch timestamp.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_str | str | Date string in YYYY-MM-DD format. | required |
Returns:
Type | Description |
---|---|
float | Unix epoch timestamp in seconds. |
Example
>>> parse_date_standard_to_epoch("2023-01-01")
1672531200.0
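The documented value `1672531200.0` is exactly midnight UTC on 2023-01-01, so the sketch below pins the parsed date to UTC. This matters because `datetime.timestamp()` interprets a naive datetime in the machine's local time zone. The helper name is hypothetical, and whether graphcast itself fixes the time zone is an assumption.

```python
from datetime import datetime, timezone


def parse_date_standard_to_epoch_sketch(input_str: str) -> float:
    """Hypothetical re-implementation; not graphcast's code."""
    # Assumption: the date means midnight UTC. Without tzinfo, timestamp()
    # would use the local time zone and the result would vary by machine.
    d = datetime.strptime(input_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return d.timestamp()


print(parse_date_standard_to_epoch_sketch("2023-01-01"))  # 1672531200.0
```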
parse_date_yahoo(date0)
Convert a Yahoo Finance date to an ISO 8601 datetime string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
date0 | str | Date in YYYY-MM-DD format. | required |
Returns:
Type | Description |
---|---|
str | Datetime in ISO 8601 format with the time set to noon (T12:00:00Z). |
Example
>>> parse_date_yahoo("2023-01-01")
'2023-01-01T12:00:00Z'
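A minimal sketch that validates the date and fixes the time of day at noon, matching the documented output. The helper name is hypothetical.

```python
from datetime import datetime


def parse_date_yahoo_sketch(date0: str) -> str:
    """Hypothetical re-implementation; not graphcast's code."""
    # Validate the date, then pin the time of day to 12:00:00.
    d = datetime.strptime(date0, "%Y-%m-%d").replace(hour=12)
    return d.strftime("%Y-%m-%dT%H:%M:%SZ")


print(parse_date_yahoo_sketch("2023-01-01"))  # 2023-01-01T12:00:00Z
```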
parse_multi_item(s, mapper, direct)
Parses complex multi-item strings into structured data.
Supports parsing strings with quoted or bracketed items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | Input string to parse. | required |
mapper | dict | Mapping of input keys to output keys. | required |
direct | list | Direct keys to extract. | required |
Returns:
Type | Description |
---|---|
defaultdict | Parsed items with lists as values. |
Example
>>> s = '[name: John, age: 30][name: Jane, age: 25]'
>>> mapper = {"name": "full_name"}
>>> direct = ["age"]
>>> parse_multi_item(s, mapper, direct)
defaultdict(list, {'full_name': ['John', 'Jane'], 'age': ['30', '25']})
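The following sketch reproduces the documented example for bracketed input. It is hypothetical: the helper name is illustrative, the `[key: value, ...]` item layout is inferred from the example, and the quoted-item form mentioned in the description is not handled.

```python
import re
from collections import defaultdict


def parse_multi_item_sketch(s: str, mapper: dict, direct: list) -> defaultdict:
    """Hypothetical re-implementation; handles bracketed items only."""
    result = defaultdict(list)
    # Each "[...]" group is one item made of comma-separated "key: value" pairs.
    for group in re.findall(r"\[([^\]]*)\]", s):
        for pair in group.split(","):
            if ":" not in pair:
                continue
            key, value = (part.strip() for part in pair.split(":", 1))
            if key in mapper:
                result[mapper[key]].append(value)  # renamed key
            if key in direct:
                result[key].append(value)  # key kept as-is
    return result


s = "[name: John, age: 30][name: Jane, age: 25]"
print(dict(parse_multi_item_sketch(s, {"name": "full_name"}, ["age"])))
# {'full_name': ['John', 'Jane'], 'age': ['30', '25']}
```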
pick_unique_dict(docs)
Removes duplicate dictionaries from a list.
Uses JSON serialization to identify unique dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docs | list | List of dictionaries. | required |
Returns:
Type | Description |
---|---|
list | List of unique dictionaries. |
Example
>>> docs = [{"a": 1}, {"a": 1}, {"b": 2}]
>>> pick_unique_dict(docs)
[{'a': 1}, {'b': 2}]
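A minimal sketch of JSON-based deduplication. The helper name is hypothetical, `sort_keys=True` is an assumption that makes key order irrelevant, and the sketch keeps the first occurrence of each dictionary in input order, which graphcast may or may not guarantee. All values must be JSON-serializable.

```python
import json
from typing import Any


def pick_unique_dict_sketch(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical re-implementation; not graphcast's code."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically.
        key = json.dumps(doc, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


print(pick_unique_dict_sketch([{"a": 1}, {"a": 1}, {"b": 2}]))  # [{'a': 1}, {'b': 2}]
```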
round_str(x, **kwargs)
Round a numeric string to the specified precision.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | String representation of a number. | required |
**kwargs | | Additional keyword arguments passed to round(). | {} |
Returns:
Type | Description |
---|---|
float | Rounded number. |
Example
>>> round_str("3.14159", ndigits=2)
3.14
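Assuming the keyword arguments are forwarded to Python's built-in `round()`, as the parameter table describes, the function reduces to a one-line sketch (the helper name is hypothetical).

```python
def round_str_sketch(x: str, **kwargs) -> float:
    """Hypothetical re-implementation; not graphcast's code."""
    # float() parses strings such as "3.14159"; kwargs (e.g. ndigits) go to round().
    return round(float(x), **kwargs)


print(round_str_sketch("3.14159", ndigits=2))  # 3.14
```

Without `ndigits`, the built-in `round()` returns an `int`, so in this sketch the `float` return type holds only when `ndigits` is passed.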
split_keep_part(s, sep='/', keep=-1)
Split a string and keep only the specified parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | String to split. | required |
sep | str | Separator to split on. | '/' |
keep | int or list | Index or indices to keep. | -1 |
Returns:
Type | Description |
---|---|
str | Joined string of kept parts. |
Example
>>> split_keep_part("a/b/c", keep=0)
'a'
>>> split_keep_part("a/b/c", keep=[0, 2])
'a/c'
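A minimal sketch consistent with both documented examples. The helper name is hypothetical, and rejoining multiple kept parts with the same separator is an assumption inferred from the `'a/c'` output.

```python
from typing import Union


def split_keep_part_sketch(s: str, sep: str = "/", keep: Union[int, list[int]] = -1) -> str:
    """Hypothetical re-implementation; not graphcast's code."""
    parts = s.split(sep)
    if isinstance(keep, int):
        return parts[keep]  # a single index, negative indices allowed
    # Assumption: multiple kept parts are rejoined with the same separator.
    return sep.join(parts[i] for i in keep)


print(split_keep_part_sketch("a/b/c", keep=0))       # a
print(split_keep_part_sketch("a/b/c", keep=[0, 2]))  # a/c
```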
standardize(k)
Standardizes a string key by removing periods and splitting it into parts.
Handles comma- and space-separated strings, normalizing both to a comma-separated form.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | str | Input string to be standardized. | required |
Returns:
Type | Description |
---|---|
str | Cleaned and standardized string. |
Example
>>> standardize("John. Doe, Smith")
'John,Doe,Smith'
>>> standardize("John Doe Smith")
'John,Doe,Smith'
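A minimal sketch that reproduces both documented examples. The helper name is hypothetical, and treating any run of commas and whitespace as a single separator is an assumption.

```python
import re


def standardize_sketch(k: str) -> str:
    """Hypothetical re-implementation; not graphcast's code."""
    # Drop periods, split on any run of commas/whitespace, rejoin with commas.
    parts = re.split(r"[,\s]+", k.replace(".", ""))
    return ",".join(p for p in parts if p)


print(standardize_sketch("John. Doe, Smith"))  # John,Doe,Smith
print(standardize_sketch("John Doe Smith"))    # John,Doe,Smith
```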
try_int(x)
Attempt to convert a value to integer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | Value to convert. | required |
Returns:
Type | Description |
---|---|
int or original type | Integer if the conversion succeeds; otherwise the original value. |
Example
>>> try_int("123")
123
>>> try_int("abc")
'abc'
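A minimal sketch of the try/except pattern the description implies. The helper name and the exact exceptions caught are assumptions.

```python
from typing import Any


def try_int_sketch(x: Any) -> Any:
    """Hypothetical re-implementation; not graphcast's code."""
    try:
        return int(x)
    except (TypeError, ValueError):
        # ValueError for strings like "abc", TypeError for values like None.
        return x


print(try_int_sketch("123"))  # 123
print(try_int_sketch("abc"))  # abc
```

With this sketch, `int(3.9)` truncates to `3`, while the string `"3.5"` is returned unchanged because `int("3.5")` raises `ValueError`. The documentation does not say how graphcast handles these cases.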