graphcast.cli.xml2json
XML to JSON conversion tool for data preprocessing.
This module provides a command-line tool for converting XML files to JSON format, with support for different data sources and chunking options. It's particularly useful for preprocessing scientific literature data from sources like Web of Science and PubMed.
Key Features
- Support for Web of Science and PubMed XML formats
- Configurable chunking for large files
- Batch processing of multiple files
- Customizable output format
Example
$ uv run xml2json --source-path data/wos.xml --chunk-size 1000 --mode wos_csv
do(source_path, chunk_size, max_chunks, mode)
Convert XML files to JSON format.
This command reads the XML file or directory at source_path, parses it according to mode, and writes the records as JSON with chunk_size records per output file, stopping after max_chunks chunks if a limit is set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_path | | Path to source XML file or directory | required |
chunk_size | | Number of records per output file (default: 1000) | required |
max_chunks | | Maximum number of chunks to process (default: None) | required |
mode | | Data source mode ('wos_csv' or 'pubmed') | required |
Example
$ uv run xml2json --source-path data/wos.xml --chunk-size 1000 --mode wos_csv
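Chunking sketch
To illustrate how chunk_size and max_chunks interact, here is a minimal standalone sketch that uses only the Python standard library. It is not graphcast's implementation: the helper names `iter_records` and `write_chunks`, the `record` element tag, the `out_dir` argument, and the `chunk_NNNN.json` file naming are all hypothetical, and real Web of Science or PubMed records are nested far more deeply than the flat records assumed here.

```python
import itertools
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def iter_records(xml_path, record_tag):
    """Yield each <record_tag> element as a flat dict of child tag -> text.

    Assumption: records are flat (one level of children); nested data is ignored.
    """
    for _event, elem in ET.iterparse(str(xml_path), events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # drop the children of records already emitted


def write_chunks(xml_path, out_dir, record_tag="record", chunk_size=1000, max_chunks=None):
    """Write records to JSON files holding at most chunk_size records each.

    Stops after max_chunks files when a limit is given. Returns the written paths.
    All names here are hypothetical and chosen for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = iter_records(xml_path, record_tag)
    written = []
    try:
        for index in itertools.count():
            if max_chunks is not None and index >= max_chunks:
                break
            # islice pulls the next chunk_size records without loading the whole file
            chunk = list(itertools.islice(records, chunk_size))
            if not chunk:
                break  # source exhausted
            out_file = out_dir / f"chunk_{index:04d}.json"
            out_file.write_text(json.dumps(chunk, indent=2), encoding="utf-8")
            written.append(out_file)
    finally:
        records.close()  # release the parser if we stopped early
    return written
```

Under these assumptions, an input with five records and chunk_size=2 produces three files (two, two, and one record); adding max_chunks=1 stops after the first file. Streaming with `iterparse` keeps memory use roughly proportional to one chunk, not to the whole input.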