package documentation

kedro.pipeline provides functionality to define and execute data-driven pipelines.

Module modular_pipeline: Helper to integrate modular pipelines into a master pipeline.

From __init__.py:

Function node: Create a node in the pipeline by providing a function to be called along with variable names for inputs and/or outputs.
Function pipeline: Create a Pipeline from a collection of nodes and/or ``Pipeline``s.

def node(func: Callable, inputs: Union[None, str, List[str], Dict[str, str]], outputs: Union[None, str, List[str], Dict[str, str]], *, name: str = None, tags: Union[str, Iterable[str]] = None, confirms: Union[str, List[str]] = None, namespace: str = None) -> Node:

Create a node in the pipeline by providing a function to be called along with variable names for inputs and/or outputs.

Example:

>>> from typing import Dict, List
>>>
>>> import pandas as pd
>>> import numpy as np
>>> from kedro.pipeline import node
>>>
>>> def clean_data(cars: pd.DataFrame,
...                boats: pd.DataFrame) -> Dict[str, pd.DataFrame]:
...     # Drop rows with missing values; the dict keys name the outputs.
...     return dict(cars_df=cars.dropna(), boats_df=boats.dropna())
>>>
>>> def halve_dataframe(data: pd.DataFrame) -> List[pd.DataFrame]:
...     # Split the rows into two roughly equal parts.
...     return np.array_split(data, 2)
>>>
>>> nodes = [
...     node(clean_data,
...          inputs=['cars2017', 'boats2017'],
...          outputs=dict(cars_df='clean_cars2017',
...                       boats_df='clean_boats2017')),
...     node(halve_dataframe,
...          'clean_cars2017',
...          ['train_cars2017', 'test_cars2017']),
...     node(halve_dataframe,
...          dict(data='clean_boats2017'),
...          ['train_boats2017', 'test_boats2017'])
... ]

Parameters
func (Callable): A function that corresponds to the node logic. The function should have at least one input or output.
inputs (Union[None, str, List[str], Dict[str, str]]): The name or the list of the names of variables used as inputs to the function. The number of names should match the number of arguments in the definition of the provided function. When Dict[str, str] is provided, variable names will be mapped to function argument names.
outputs (Union[None, str, List[str], Dict[str, str]]): The name or the list of the names of variables used as outputs of the function. The number of names should match the number of outputs returned by the provided function. When Dict[str, str] is provided, variable names will be mapped to the named outputs the function returns.
name (str): Optional node name to be used when displaying the node in logs or any other visualisations.
tags (Union[str, Iterable[str]]): Optional set of tags to be applied to the node.
confirms (Union[str, List[str]]): Optional name or list of names of the datasets that should be confirmed. This results in a call to the confirm() method of the corresponding dataset instance. The specified dataset names do not need to be present in the node inputs or outputs.
namespace (str): Optional node namespace.

Returns
Node: A Node object with mapped inputs, outputs and function.
def pipeline(pipe: Union[Iterable[Union[Node, Pipeline]], Pipeline], *, inputs: Union[str, Set[str], Dict[str, str]] = None, outputs: Union[str, Set[str], Dict[str, str]] = None, parameters: Union[str, Set[str], Dict[str, str]] = None, tags: Union[str, Iterable[str]] = None, namespace: str = None) -> Pipeline:

Create a Pipeline from a collection of nodes and/or ``Pipeline``s.

Parameters
pipe (Union[Iterable[Union[Node, Pipeline]], Pipeline]): The nodes the Pipeline will be made of. If you provide pipelines among the list of nodes, those pipelines will be expanded and all their nodes will become part of this new pipeline.
inputs (Union[str, Set[str], Dict[str, str]]): A name or collection of input names to be exposed as connection points to other pipelines upstream. This is optional; if not provided, the pipeline inputs are automatically inferred from the pipeline structure. When str or Set[str] is provided, the listed input names will stay the same as they are named in the provided pipeline. When Dict[str, str] is provided, current input names will be mapped to new names. Must only refer to the pipeline's free inputs.
outputs (Union[str, Set[str], Dict[str, str]]): A name or collection of output names to be exposed as connection points to other pipelines downstream. This is optional; if not provided, the pipeline outputs are automatically inferred from the pipeline structure. When str or Set[str] is provided, the listed output names will stay the same as they are named in the provided pipeline. When Dict[str, str] is provided, current output names will be mapped to new names. Can refer to both the pipeline's free outputs and intermediate results that need to be exposed.
parameters (Union[str, Set[str], Dict[str, str]]): A name or collection of parameters to namespace. When str or Set[str] is provided, the listed parameter names will stay the same as they are named in the provided pipeline. When Dict[str, str] is provided, current parameter names will be mapped to new names. The parameters can be specified without the params: prefix.
tags (Union[str, Iterable[str]]): Optional set of tags to be applied to all the pipeline nodes.
namespace (str): A prefix to give to all dataset names, except those explicitly named with the inputs/outputs arguments, and parameter references (params: and parameters).

Returns
Pipeline: A new Pipeline object.

Raises
ModularPipelineError: When inputs, outputs or parameters are incorrectly specified, or they do not exist on the original pipeline.
ValueError: When the inputs/outputs of the underlying pipeline nodes are not one of the expected types (str, dict, list, or None).