class documentation

ThreadRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort using threads.
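The idea of running toposorted groups in threads can be illustrated with the standard library alone. The following is a minimal sketch, not Kedro's implementation: `run_in_parallel_groups`, `tasks` and `dependencies` are hypothetical names, and it assumes every dependency is itself a key of `tasks`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_in_parallel_groups(tasks, dependencies, max_workers=None):
    """Run callables group by group, each group concurrently in threads.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of names it depends on.
    Returns (results, groups), where groups lists the names run together.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, groups = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task whose dependencies have all finished.
            ready = sorted(sorter.get_ready())
            groups.append(ready)
            futures = {name: pool.submit(tasks[name]) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises a task's exception
                sorter.done(name)
    return results, groups


# Hypothetical three-node pipeline: "train" needs both "load" and "clean".
tasks = {"load": lambda: 1, "clean": lambda: 2, "train": lambda: 3}
deps = {"train": {"load", "clean"}}
results, groups = run_in_parallel_groups(tasks, deps, max_workers=2)
```

Here `load` and `clean` form the first group and run concurrently, while `train` runs alone in the second group once both have finished.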

Method __init__ Instantiates the runner.
Method create_default_data_set Factory method for creating the default dataset for the runner.
Method _get_required_workers_count Calculate the maximum number of worker threads required for the pipeline.
Method _run The abstract interface for running pipelines.
Instance Variable _max_workers Undocumented

Inherited from AbstractRunner:

Method run Run the Pipeline using the datasets provided by catalog and save results back to the same objects.
Method run_only_missing Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.
Method _suggest_resume_scenario Suggest a command to the user to resume a run after it fails. The run should be started from the point closest to the failure for which persisted input exists.
Instance Variable _is_async Undocumented
Property _logger Undocumented
def __init__(self, max_workers: int = None, is_async: bool = False):

Instantiates the runner.

Parameters
max_workers: int - Number of worker threads to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count.
is_async: bool - If True, the value is reset to False, because ThreadRunner does not support loading and saving node inputs and outputs asynchronously with threads. Defaults to False.
Raises
ValueError - If bad parameters are passed.
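The parameter handling described above might look roughly like the sketch below. `SketchThreadRunner` is a hypothetical stand-in, not the real class. Two details are assumptions rather than facts from this page: that "bad parameters" means a non-positive `max_workers`, and that the `is_async` override is reported through a log warning.

```python
import logging

logger = logging.getLogger(__name__)


class SketchThreadRunner:
    """Hypothetical stand-in used for illustration, not Kedro's ThreadRunner."""

    def __init__(self, max_workers=None, is_async=False):
        if is_async:
            # Documented behaviour: a True value is overridden to False.
            # Assumption: the override is reported with a warning.
            logger.warning("is_async is not supported with threads; using False.")
            is_async = False
        # Assumption: "bad parameters" means a non-positive max_workers.
        if max_workers is not None and max_workers <= 0:
            raise ValueError("max_workers should be positive")
        self._is_async = is_async
        self._max_workers = max_workers
```

With this sketch, `SketchThreadRunner(is_async=True)` ends up with `_is_async` set to False, and `SketchThreadRunner(max_workers=0)` raises `ValueError`.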
def create_default_data_set(self, ds_name: str) -> MemoryDataSet:

Factory method for creating the default dataset for the runner.

Parameters
ds_name: str - Name of the missing dataset.
Returns
MemoryDataSet - An instance of MemoryDataSet to be used for all unregistered datasets.
def _get_required_workers_count(self, pipeline: Pipeline):

Calculate the maximum number of worker threads required for the pipeline.
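This page does not say how the count is derived. One simple upper bound, shown here as a sketch under stated assumptions, works from the toposorted groups: `required_workers_count` and `groups` are hypothetical names, and `groups` is assumed to be a list of lists in which every node outside the first group depends on some node in the preceding group.

```python
def required_workers_count(groups, max_workers=None):
    """Estimate how many workers a layered pipeline can keep busy.

    groups: list of lists of node names, one inner list per toposort layer.
    max_workers: optional user-supplied cap on the result.
    """
    total_nodes = sum(len(group) for group in groups)
    # Each layer after the first implies at least one dependency, so a chain
    # of len(groups) nodes can never overlap. At most
    # total_nodes - (len(groups) - 1) nodes can be running at the same time.
    required = max(total_nodes - len(groups) + 1, 1)
    if max_workers is None:
        return required
    return min(required, max_workers)
```

For a pipeline with layers `[["a", "b"], ["c"]]` this estimate gives 2, and a `max_workers` of 1 would cap it at 1.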

def _run(self, pipeline: Pipeline, catalog: DataCatalog, hook_manager: PluginManager, session_id: str = None):

The abstract interface for running pipelines.

Parameters
pipeline: Pipeline - The Pipeline to run.
catalog: DataCatalog - The DataCatalog from which to fetch data.
hook_manager: PluginManager - The PluginManager to activate hooks.
session_id: str - The id of the session.
Raises
Exception - In case of any downstream node failure.
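How a failure inside a worker thread reaches the caller can be sketched with `concurrent.futures`, where `Future.result()` re-raises whatever the submitted callable raised. This is only an illustration of that mechanism, not the runner's actual code; `run_all` and `failing_node` are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_all(callables, max_workers=2):
    """Run zero-argument callables in threads and surface the first failure."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fn) for fn in callables}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                # result() re-raises the exception raised inside the thread.
                future.result()


def failing_node():
    raise RuntimeError("node failed")


try:
    run_all([lambda: None, failing_node])
    error = None
except RuntimeError as exc:
    error = exc
```

Leaving the `with` block waits for threads that are already running, so the exception reaches the caller only after in-flight work has finished.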
_max_workers

Undocumented