
AbstractRunner is the base class for all Pipeline runner implementations.

Method __init__: Instantiates the runner class.
Method create_default_data_set: Factory method for creating the default dataset for the runner.
Method run: Run the Pipeline using the datasets provided by catalog and save results back to the same objects.
Method run_only_missing: Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.
Method _run: The abstract interface for running pipelines, assuming that the inputs have already been checked and normalized by run().
Method _suggest_resume_scenario: Suggest a command to the user to resume a run after it fails. The run should be started from the point closest to the failure for which persisted input exists.
Instance Variable _is_async: Whether node inputs and outputs are loaded and saved asynchronously (set from the is_async constructor argument).
Property _logger: Undocumented
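The summary above describes a template-method design: the public run() validates and normalizes inputs, then delegates execution to the abstract _run() that each concrete runner supplies. The sketch below illustrates that pattern with simplified stand-in classes; the names mirror the documented interface, but the bodies are illustrative assumptions, not the real Kedro implementation.

```python
from abc import ABC, abstractmethod

class SketchRunner(ABC):
    """Simplified stand-in for AbstractRunner's template-method pattern."""

    def __init__(self, is_async: bool = False):
        self._is_async = is_async

    def run(self, pipeline, catalog):
        # run() first checks that the catalog can satisfy the pipeline's
        # inputs (raising ValueError otherwise, as documented)...
        missing = set(pipeline["inputs"]) - set(catalog)
        if missing:
            raise ValueError(f"Pipeline input(s) not found: {missing}")
        # ...then delegates the actual execution to the abstract _run().
        return self._run(pipeline, catalog)

    @abstractmethod
    def _run(self, pipeline, catalog):
        ...

class SequentialSketchRunner(SketchRunner):
    def _run(self, pipeline, catalog):
        # Run each "node" (a plain function here) in order, keyed by its
        # output name, and return the results as a dictionary.
        return {name: fn(catalog) for name, fn in pipeline["nodes"]}
```

Concrete runners only override _run() (and create_default_data_set()); the validation logic in run() is inherited unchanged.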
def __init__(self, is_async: bool = False):

Instantiates the runner class.

Parameters
is_async (bool): If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
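To make the is_async flag concrete, here is a hedged sketch of what thread-based loading might look like: with is_async=True, dataset loads are submitted to a thread pool instead of running one at a time. The function and data shapes are assumptions for illustration, not Kedro's actual internals.

```python
from concurrent.futures import ThreadPoolExecutor

def load_inputs(datasets, is_async=False):
    """Load every dataset; datasets maps name -> zero-arg loader callable.

    Illustrative only: sketches the synchronous vs. threaded behaviour
    implied by the is_async flag.
    """
    if not is_async:
        # Synchronous path: load datasets one after another.
        return {name: load() for name, load in datasets.items()}
    # Asynchronous path: submit all loads to a thread pool, then collect.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(load) for name, load in datasets.items()}
        return {name: f.result() for name, f in futures.items()}
```

Threading helps when loaders are I/O-bound (files, databases, network), since the waits overlap rather than accumulate.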
@abstractmethod
def create_default_data_set(self, ds_name: str) -> AbstractDataSet:

Factory method for creating the default dataset for the runner.

Parameters
ds_name (str): Name of the missing dataset.
Returns
AbstractDataSet: An instance of an implementation of AbstractDataSet to be used for all unregistered datasets.
def run(self, pipeline: Pipeline, catalog: DataCatalog, hook_manager: PluginManager = None, session_id: str = None) -> Dict[str, Any]:

Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

Parameters
pipeline (Pipeline): The Pipeline to run.
catalog (DataCatalog): The DataCatalog from which to fetch data.
hook_manager (PluginManager): The PluginManager to activate hooks.
session_id (str): The id of the session.
Returns
Dict[str, Any]: Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
Raises
ValueError: Raised when Pipeline inputs cannot be satisfied.
def run_only_missing(self, pipeline: Pipeline, catalog: DataCatalog, hook_manager: PluginManager) -> Dict[str, Any]:

Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.

Parameters
pipeline (Pipeline): The Pipeline to run.
catalog (DataCatalog): The DataCatalog from which to fetch data.
hook_manager (PluginManager): The PluginManager to activate hooks.
Returns
Dict[str, Any]: Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
Raises
ValueError: Raised when Pipeline inputs cannot be satisfied.
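The core idea behind run_only_missing is to skip nodes whose outputs already exist in the catalog and execute only the rest. The sketch below demonstrates that idea with plain dictionaries and functions; the name and data shapes are illustrative assumptions, not Kedro's implementation (which also accounts for upstream dependencies of the missing outputs).

```python
def run_only_missing_sketch(nodes, catalog):
    """Run only nodes whose output is absent from the catalog.

    nodes: list of (output_name, fn) pairs, in execution order.
    catalog: dict of dataset name -> already-persisted data.
    Returns the outputs produced by this call.
    """
    results = {}
    for output_name, fn in nodes:
        if output_name in catalog:
            # Output already persisted: skip this node entirely.
            continue
        # Compute the missing output and save it back to the catalog,
        # mirroring "save results back to the same objects".
        catalog[output_name] = results[output_name] = fn(catalog)
    return results
```

This is what makes resuming cheap: re-running a pipeline after a partial failure re-executes only the work whose results were never persisted.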
@abstractmethod
def _run(self, pipeline: Pipeline, catalog: DataCatalog, hook_manager: PluginManager, session_id: str = None): (source)

The abstract interface for running pipelines, assuming that the inputs have already been checked and normalized by run().

Parameters
pipeline (Pipeline): The Pipeline to run.
catalog (DataCatalog): The DataCatalog from which to fetch data.
hook_manager (PluginManager): The PluginManager to activate hooks.
session_id (str): The id of the session.
def _suggest_resume_scenario(self, pipeline: Pipeline, done_nodes: Iterable[Node], catalog: DataCatalog): (source)

Suggest a command to the user to resume a run after it fails. The run should be started from the point closest to the failure for which persisted input exists.

Parameters
pipeline (Pipeline): The Pipeline of the run.
done_nodes (Iterable[Node]): The ``Node``s that executed successfully.
catalog (DataCatalog): The DataCatalog of the run.
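The resume suggestion boils down to a reachability question: among the nodes that did not complete, which can be started right away because all of their inputs have persisted copies? The sketch below shows that selection with plain names and sets; it is a simplified illustration of the idea, not Kedro's actual heuristic.

```python
def suggest_resume_nodes(all_nodes, done_nodes, persisted):
    """Pick remaining nodes whose inputs all have persisted copies.

    all_nodes: list of (node_name, input_names) pairs, in pipeline order.
    done_nodes: set of node names that executed successfully.
    persisted: set of dataset names with persisted (non-memory) copies.
    """
    remaining = [(name, ins) for name, ins in all_nodes if name not in done_nodes]
    # Resume candidates: remaining nodes whose inputs already exist on disk;
    # restarting "from" these covers everything downstream of the failure.
    return [name for name, ins in remaining if set(ins) <= persisted]
```

A runner would then format these node names into a suggested resume command for the user.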