`ontocast.tool.converter`

Document conversion tools for OntoCast.

This module provides functionality for converting various document formats into structured data that can be processed by the OntoCast system.

`ConverterTool`

Bases: Tool

Tool for converting documents to structured data.

This class provides functionality for converting various document formats into structured data that can be processed by the OntoCast system.

Attributes:

Name	Type	Description
`supported_extensions`	`set[str]`	Set of supported file extensions.
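
The `__call__` source shown on this page does not consult `supported_extensions`, so a caller may want to filter file paths before reading them. A minimal sketch, assuming a hypothetical helper `is_supported` and a local copy of the extension set (neither name is part of OntoCast):

```python
from pathlib import Path

# Local copy of the extensions listed on ConverterTool (assumption: kept in
# sync by hand; in real code you could read ConverterTool.supported_extensions).
SUPPORTED_EXTENSIONS = {".pdf", ".ppt", ".pptx"}


def is_supported(path: str) -> bool:
    """Return True if the path's suffix is in the supported set (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


checks = {name: is_supported(name) for name in ("report.pdf", "slides.PPTX", "notes.txt")}
```

Lower-casing the suffix is a design choice of this sketch, since the set itself only contains lowercase entries.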

Source code in ontocast/tool/converter.py

class ConverterTool(Tool):
    """Tool for converting documents to structured data.

    This class provides functionality for converting various document formats
    into structured data that can be processed by the OntoCast system.

    Attributes:
        supported_extensions: Set of supported file extensions.
    """

    supported_extensions: set[str] = {".pdf", ".ppt", ".pptx"}

    def __init__(
        self,
        **kwargs,
    ):
        """Initialize the converter tool.

        Args:
            **kwargs: Additional keyword arguments passed to the parent class.
        """
        super().__init__(**kwargs)
        self._converter = DocumentConverter()

    def __call__(self, file_input: Union[bytes, str]) -> Dict[str, Any]:
        """Convert a document to structured data.

        Args:
            file_input: Raw document content as bytes, or a string that is
                returned unchanged as text.

        Returns:
            Dict[str, Any]: A dictionary whose "text" key holds the Markdown
                export of the document, or the input string unchanged.
        """
        if isinstance(file_input, bytes):
            ds = DocumentStream(name="doc", stream=BytesIO(file_input))
            result = self._converter.convert(ds)
            doc = result.document.export_to_markdown()
            return {"text": doc}
        else:
            # For non-bytes input (such as plain text), return it as is
            return {"text": file_input}

`__call__(file_input)`

Convert a document to structured data.

Parameters:

Name	Type	Description	Default
`file_input`	`Union[bytes, str]`	Raw document content as bytes, or a string that is returned unchanged as text.	required

Returns:

Type	Description
`Dict[str, Any]`	A dictionary whose `"text"` key holds the Markdown export of the document, or the input string unchanged.
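
The bytes-versus-string dispatch described above can be illustrated without the real conversion backend. This is a sketch only: `convert_input` and `fake_to_markdown` are hypothetical stand-ins for `ConverterTool.__call__` and the underlying document converter, and they are not part of OntoCast.

```python
from typing import Any, Callable, Dict, Union


def convert_input(
    file_input: Union[bytes, str],
    to_markdown: Callable[[bytes], str],
) -> Dict[str, Any]:
    """Mirror the dispatch in __call__: convert bytes, pass strings through."""
    if isinstance(file_input, bytes):
        # Bytes are treated as a document and handed to the converter.
        return {"text": to_markdown(file_input)}
    # Strings are treated as already-extracted text and returned unchanged.
    return {"text": file_input}


def fake_to_markdown(data: bytes) -> str:
    """Stand-in converter (assumption): reports the payload size instead of parsing it."""
    return f"# converted {len(data)} bytes"


result_bytes = convert_input(b"%PDF-1.7", fake_to_markdown)
result_str = convert_input("plain text", fake_to_markdown)
```

Note that a string argument is never opened as a file path in this logic, so a caller holding a path would need to read the file into bytes first.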

Source code in ontocast/tool/converter.py

def __call__(self, file_input: Union[bytes, str]) -> Dict[str, Any]:
    """Convert a document to structured data.

    Args:
        file_input: Raw document content as bytes, or a string that is
            returned unchanged as text.

    Returns:
        Dict[str, Any]: A dictionary whose "text" key holds the Markdown
            export of the document, or the input string unchanged.
    """
    if isinstance(file_input, bytes):
        ds = DocumentStream(name="doc", stream=BytesIO(file_input))
        result = self._converter.convert(ds)
        doc = result.document.export_to_markdown()
        return {"text": doc}
    else:
        # For non-bytes input (such as plain text), return it as is
        return {"text": file_input}

`__init__(**kwargs)`

Initialize the converter tool.

Parameters:

Name	Type	Description	Default
`**kwargs`		Additional keyword arguments passed to the parent class.	`{}`

Source code in ontocast/tool/converter.py

def __init__(
    self,
    **kwargs,
):
    """Initialize the converter tool.

    Args:
        **kwargs: Additional keyword arguments passed to the parent class.
    """
    super().__init__(**kwargs)
    self._converter = DocumentConverter()
