class documentation

class GBQTableDataSet(AbstractDataSet[None, pd.DataFrame]):

GBQTableDataSet loads data from and saves data to Google BigQuery. It uses pandas-gbq to read from and write to a BigQuery table.

Example usage for the YAML API:

vehicles:
  type: pandas.GBQTableDataSet
  dataset: big_query_dataset
  table_name: big_query_table
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True
  save_args:
    chunksize: 100

Example usage for the Python API:

>>> from kedro.extras.datasets.pandas import GBQTableDataSet
>>> import pandas as pd
>>>
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
...                      'col3': [5, 6]})
>>>
>>> data_set = GBQTableDataSet('dataset',
...                            'table_name',
...                            project='my-project')
>>> # Write the DataFrame to the BigQuery table, then read it back.
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>>
>>> assert data.equals(reloaded)
Method __init__ Creates a new instance of GBQTableDataSet.
Constant DEFAULT_LOAD_ARGS Undocumented
Constant DEFAULT_SAVE_ARGS Undocumented
Method _describe Undocumented
Method _exists Undocumented
Method _load Undocumented
Method _save Undocumented
Method _validate_location Undocumented
Instance Variable _client Undocumented
Instance Variable _credentials Undocumented
Instance Variable _dataset Undocumented
Instance Variable _load_args Undocumented
Instance Variable _project_id Undocumented
Instance Variable _save_args Undocumented
Instance Variable _table_name Undocumented

Inherited from AbstractDataSet:

Class Method from_config Create a data set instance using the configuration provided.
Method __str__ Undocumented
Method exists Checks whether a data set's output already exists by calling the provided _exists() method.
Method load Loads data by delegation to the provided load method.
Method release Release any cached data.
Method save Saves data by delegation to the provided save method.
Method _copy Undocumented
Method _release Undocumented
Property _logger Undocumented
def __init__(self, dataset: str, table_name: str, project: str = None, credentials: Union[Dict[str, Any], Credentials] = None, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None):

Creates a new instance of GBQTableDataSet.

Parameters
dataset (str): Google BigQuery dataset.
table_name (str): Google BigQuery table name.
project (str): Google BigQuery account project ID. Optional when available from the environment. See https://cloud.google.com/resource-manager/docs/creating-managing-projects
credentials (Union[Dict[str, Any], Credentials]): Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. All arguments are listed at https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html
load_args (Dict[str, Any]): Pandas options for loading a BigQuery table into a DataFrame. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved.
save_args (Dict[str, Any]): Pandas options for saving a DataFrame to a BigQuery table. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html All defaults are preserved except "progress_bar", which is set to False.
Raises
DataSetError: When load_args['location'] and save_args['location'] are different.
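
The location rule in the Raises entry above can be illustrated with a minimal sketch. It is not the library's implementation: the function validate_location and the local DataSetError class are hypothetical stand-ins so the example runs without kedro installed, and it assumes the check only applies when both dictionaries set a location.

```python
from typing import Any, Dict, Optional


class DataSetError(Exception):
    """Local stand-in for kedro's DataSetError so the sketch runs without kedro."""


def validate_location(
    load_args: Optional[Dict[str, Any]] = None,
    save_args: Optional[Dict[str, Any]] = None,
) -> None:
    """Raise DataSetError if both dicts name a 'location' and the two differ."""
    load_location = (load_args or {}).get("location")
    save_location = (save_args or {}).get("location")
    # Assumption: a location given on only one side is not treated as a conflict.
    if load_location and save_location and load_location != save_location:
        raise DataSetError(
            "load_args['location'] and save_args['location'] are different."
        )


# Matching locations, or a location on one side only, pass silently.
validate_location({"location": "EU"}, {"location": "EU"})
validate_location({"location": "EU"}, None)
```

Calling validate_location({"location": "EU"}, {"location": "US"}) raises the stand-in DataSetError in this sketch.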
DEFAULT_LOAD_ARGS: Dict[str, Any]

Undocumented

Value
{}
DEFAULT_SAVE_ARGS: Dict[str, Any]

Undocumented

Value
{'progress_bar': False}
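
One plausible way such defaults combine with user-supplied save_args is sketched below. The helper merge_save_args is hypothetical and not part of the class. The sketch assumes the defaults are copied first and user values are layered on top, so a caller could still override "progress_bar".

```python
import copy
from typing import Any, Dict, Optional

# Value as documented above.
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"progress_bar": False}


def merge_save_args(save_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults overlaid with any user-supplied save arguments."""
    # Copy so the class-level constant is never mutated.
    merged = copy.deepcopy(DEFAULT_SAVE_ARGS)
    if save_args is not None:
        # Assumption: user-supplied keys take precedence over the defaults.
        merged.update(save_args)
    return merged


merged = merge_save_args({"chunksize": 100})
```

Here merged holds both the default "progress_bar" entry and the user-supplied "chunksize", while DEFAULT_SAVE_ARGS itself is left unchanged.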
def _describe(self) -> Dict[str, Any]:

Undocumented

def _exists(self) -> bool:

Undocumented

def _load(self) -> pd.DataFrame:

Undocumented

def _save(self, data: pd.DataFrame):

Undocumented

def _validate_location(self):

Undocumented

_client

Undocumented

_credentials

Undocumented

_dataset

Undocumented

_load_args

Undocumented

_project_id

Undocumented

_save_args

Undocumented

_table_name

Undocumented