class ParallelRunner(AbstractRunner):
ParallelRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort. Please note that this runner implementation validates the catalog using the _validate_catalog method, which uses the _SINGLE_PROCESS dataset attribute to check whether any of the datasets are single-process only.
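The "parallel groups formed by toposort" idea can be sketched with the standard-library graphlib module: every node whose dependencies are already satisfied forms a group that may run concurrently. The dependency graph below is a toy illustration, not Kedro's actual node structure, which is derived from dataset inputs and outputs.

```python
from graphlib import TopologicalSorter

# Toy dependency graph: each node maps to the set of nodes it depends on.
# (Illustrative only; Kedro derives this from dataset inputs/outputs.)
dependencies = {
    "split": set(),
    "train": {"split"},
    "evaluate": {"train"},
    "profile": {"split"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

groups = []
while sorter.is_active():
    # All nodes ready at this point have no unmet dependencies,
    # so they could be dispatched to worker processes in parallel.
    ready = tuple(sorted(sorter.get_ready()))
    groups.append(ready)
    sorter.done(*ready)

print(groups)  # [('split',), ('profile', 'train'), ('evaluate',)]
```

Each tuple in `groups` is one parallel group; nodes within a group have no dependencies on each other.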
Method | __del__ | Undocumented |
Method | __init__ | Instantiates the runner by creating a Manager. |
Method | create_default_data_set | Factory method for creating the default dataset for the runner. |
Class Method | _validate_catalog | Ensure that all datasets are serialisable and that we do not have any non-proxied memory datasets being used as outputs, as their content will not be synchronized across threads. |
Class Method | _validate_nodes | Ensure all tasks are serialisable. |
Method | _get_required_workers_count | Calculate the max number of processes required for the pipeline, limited to the number of CPU cores. |
Method | _run | The abstract interface for running pipelines. |
Instance Variable | _manager | Undocumented |
Instance Variable | _max_workers | Undocumented |
Inherited from AbstractRunner:
Method | run | Run the Pipeline using the datasets provided by catalog and save results back to the same objects. |
Method | run_only_missing | Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects. |
Method | _suggest_resume_scenario | Suggest a command to the user to resume a run after it fails. The run should be started from the point closest to the failure for which persisted input exists. |
Instance Variable | _is_async | Undocumented |
Property | _logger | Undocumented |
kedro.runner.AbstractRunner.__init__
Instantiates the runner by creating a Manager.
Parameters | |
max_workers:int | Number of worker processes to spawn. If not set, it is calculated automatically based on the pipeline configuration and the CPU core count. On Windows machines, max_workers cannot be larger than 61 and will be set to min(61, max_workers). |
is_async:bool | If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False. |
Raises | |
ValueError | If bad parameters are passed. |
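The max_workers behaviour described above can be sketched as follows. The helper name and the use of os.cpu_count() as the automatic fallback are assumptions for illustration, not Kedro's exact implementation.

```python
import os
import sys

_MAX_WINDOWS_WORKERS = 61  # documented cap for Windows machines


def resolve_max_workers(max_workers=None, required=None):
    """Hypothetical helper mirroring the documented behaviour:
    fall back to an automatic estimate, then cap on Windows."""
    if max_workers is None:
        # Assumption: bounded by what the pipeline needs and the CPU core count.
        cores = os.cpu_count() or 1
        max_workers = min(required, cores) if required else cores
    if max_workers < 1:
        raise ValueError("max_workers should be positive")
    if sys.platform == "win32":
        # On Windows the value cannot exceed 61.
        max_workers = min(_MAX_WINDOWS_WORKERS, max_workers)
    return max_workers
```

For example, `resolve_max_workers(3)` returns 3 on any platform, while `resolve_max_workers(0)` raises ValueError, matching the "bad parameters passed" case above.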
Factory method for creating the default dataset for the runner.
Parameters | |
ds_name:str | Name of the missing dataset. |
Returns | |
_SharedMemoryDataSet | An instance of _SharedMemoryDataSet to be used for all unregistered datasets. |
Ensure that all datasets are serialisable and that we do not have any non-proxied memory datasets being used as outputs, as their content will not be synchronized across threads.
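A serialisability check of this kind can be sketched with pickle, since an object that cannot be pickled cannot be sent to a worker process. This is a plausible stand-in for illustration, not Kedro's actual _validate_catalog code; the function name is hypothetical.

```python
import pickle


def assert_serialisable(obj, name):
    """Raise if `obj` cannot be pickled, i.e. cannot be shared with
    worker processes. (Illustrative stand-in for the kind of check
    performed by _validate_catalog / _validate_nodes.)"""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        raise AttributeError(
            f"'{name}' is not serialisable and cannot be used "
            f"with multiprocessing."
        ) from exc


assert_serialisable({"rows": [1, 2, 3]}, "example_dataset")  # passes silently
try:
    assert_serialisable(lambda x: x, "lambda_node")  # lambdas cannot be pickled
except AttributeError as err:
    print(err)
```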
def _run(self, pipeline: Pipeline, catalog: DataCatalog, hook_manager: PluginManager, session_id: str = None):
kedro.runner.AbstractRunner._run
The abstract interface for running pipelines.
Parameters | |
pipeline:Pipeline | The Pipeline to run. |
catalog:DataCatalog | The DataCatalog from which to fetch data. |
hook_manager:PluginManager | The PluginManager to activate hooks. |
session_id:str | The id of the session. |
Raises | |
AttributeError | When the provided pipeline is not suitable for parallel execution. |
RuntimeError | If the runner is unable to schedule the execution of all pipeline nodes. |
Exception | In case of any downstream node failure. |
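The scheduling behaviour implied by the RuntimeError above can be sketched as: repeatedly dispatch every node whose inputs are ready, and fail if the sort is still active but nothing can run. This self-contained sketch uses a ThreadPoolExecutor and toy callables so it runs anywhere; the real runner dispatches actual Kedro nodes to worker processes.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Toy nodes: name -> zero-argument callable (stand-ins for Kedro nodes).
nodes = {"load": lambda: "data", "clean": lambda: "clean", "report": lambda: "done"}
dependencies = {"load": set(), "clean": {"load"}, "report": {"clean"}}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises CycleError if the graph is not a DAG
results = {}

with ThreadPoolExecutor(max_workers=2) as pool:
    while sorter.is_active():
        ready = sorter.get_ready()
        if not ready:
            # Sort still active but nothing is runnable: the runner
            # is unable to schedule all pipeline nodes.
            raise RuntimeError("Unable to schedule all pipeline nodes.")
        futures = {pool.submit(nodes[name]): name for name in ready}
        wait(futures)
        for future, name in futures.items():
            results[name] = future.result()  # re-raises any node failure
            sorter.done(name)

print(results)  # {'load': 'data', 'clean': 'clean', 'report': 'done'}
```

Note how `future.result()` propagates a failing node's exception to the caller, matching the "any downstream node failure" case above.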