class documentation
GBQTableDataSet loads and saves data from/to Google BigQuery. It uses pandas-gbq to read from and write to a BigQuery table.
Example usage for the YAML API:
vehicles:
  type: pandas.GBQTableDataSet
  dataset: big_query_dataset
  table_name: big_query_table
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True
  save_args:
    chunk_size: 100
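The gbq-creds entry above refers to a key in the project's credentials configuration. As an illustration only (the key name and token value below are hypothetical), such an entry could contain the keyword arguments used to instantiate google.oauth2.credentials.Credentials:

```yaml
# conf/local/credentials.yml (illustrative; values are placeholders)
gbq-creds:
  token: my-oauth2-access-token
```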
Example usage for the Python API:
>>> from kedro.extras.datasets.pandas import GBQTableDataSet
>>> import pandas as pd
>>>
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
...                      'col3': [5, 6]})
>>>
>>> data_set = GBQTableDataSet('dataset',
...                            'table_name',
...                            project='my-project')
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>>
>>> assert data.equals(reloaded)
Method | __init__ |
Creates a new instance of GBQTableDataSet. |
Constant | DEFAULT_LOAD_ARGS |
Undocumented |
Constant | DEFAULT_SAVE_ARGS |
Undocumented |
Method | _describe |
Undocumented |
Method | _exists |
Undocumented |
Method | _load |
Undocumented |
Method | _save |
Undocumented |
Method | _validate_location |
Undocumented |
Instance Variable | _client |
Undocumented |
Instance Variable | _credentials |
Undocumented |
Instance Variable | _dataset |
Undocumented |
Instance Variable | _load_args |
Undocumented |
Instance Variable | _project |
Undocumented |
Instance Variable | _save_args |
Undocumented |
Instance Variable | _table |
Undocumented |
Inherited from AbstractDataSet:
Class Method | from_config |
Create a data set instance using the configuration provided. |
Method | __str__ |
Undocumented |
Method | exists |
Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load |
Loads data by delegation to the provided load method. |
Method | release |
Release any cached data. |
Method | save |
Saves data by delegation to the provided save method. |
Method | _copy |
Undocumented |
Method | _release |
Undocumented |
Property | _logger |
Undocumented |
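The "delegation" described for the inherited load and save methods can be sketched as follows. This is an illustrative assumption based on the method descriptions above, not Kedro's actual implementation; the InMemoryDataSet subclass is hypothetical and stands in for a concrete data set such as GBQTableDataSet:

```python
# Illustrative sketch of the delegation pattern: the public load()/save()/
# exists() methods wrap the protected hooks (_load, _save, _exists) that
# concrete data sets implement, converting failures to DataSetError.
class DataSetError(Exception):
    """Raised when loading or saving a data set fails."""


class AbstractDataSet:
    def load(self):
        try:
            return self._load()
        except Exception as exc:
            raise DataSetError(f"Failed while loading data from data set {self}") from exc

    def save(self, data):
        try:
            self._save(data)
        except Exception as exc:
            raise DataSetError(f"Failed while saving data to data set {self}") from exc

    def exists(self):
        return self._exists()


class InMemoryDataSet(AbstractDataSet):
    """Hypothetical minimal subclass, for illustration only."""

    def __init__(self):
        self._data = None

    def _load(self):
        if self._data is None:
            raise ValueError("no data stored")
        return self._data

    def _save(self, data):
        self._data = data

    def _exists(self):
        return self._data is not None
```

Because error handling lives in the public wrappers, subclasses only implement the protected hooks and never need to raise DataSetError themselves.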
def __init__(
    self,
    dataset: str,
    table_name: str,
    project: str = None,
    credentials: Union[Dict[str, Any], Credentials] = None,
    load_args: Dict[str, Any] = None,
    save_args: Dict[str, Any] = None,
):
Creates a new instance of GBQTableDataSet.
Parameters | |
dataset:str | Google BigQuery dataset. |
table_name:str | Google BigQuery table name. |
project:str | Google BigQuery Account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects |
credentials:Union[Dict[str, Any], Credentials] | Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. All the arguments are documented here: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html |
load_args:Dict[str, Any] | Pandas options for loading a BigQuery table into a DataFrame. All available arguments are documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved. |
save_args:Dict[str, Any] | Pandas options for saving a DataFrame to a BigQuery table. All available arguments are documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html All defaults are preserved, except "progress_bar", which is set to False. |
Raises | |
DataSetError | When load_args['location'] and save_args['location'] are different. |