kedro.extras.datasets.pandas.GBQQueryDataSet

class documentation

class GBQQueryDataSet(AbstractDataSet[None, pd.DataFrame]):

GBQQueryDataSet loads data from Google BigQuery by running a provided SQL query. It uses pandas.read_gbq, which in turn uses pandas-gbq internally to read from BigQuery. It therefore supports all options that read_gbq accepts.

Example adding a catalog entry with the YAML API:

>>> vehicles:
>>>   type: pandas.GBQQueryDataSet
>>>   sql: "select shuttle, shuttle_id from spaceflights.shuttles;"
>>>   project: my-project
>>>   credentials: gbq-creds
>>>   load_args:
>>>     reauth: True
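
As a rough illustration of how the catalog entry above lines up with the `__init__` parameters documented further down, the sketch below writes the entry as a plain dictionary and turns it into keyword arguments. The helper `constructor_kwargs`, the `credentials_store` mapping and the placeholder token are hypothetical and only serve the example; it assumes that a string under `credentials` is a name looked up in a separate credentials configuration, and it is not Kedro's actual implementation.

```python
from typing import Any, Dict

# The YAML entry above, written as the dictionary a YAML parser would produce.
catalog_entry: Dict[str, Any] = {
    "type": "pandas.GBQQueryDataSet",
    "sql": "select shuttle, shuttle_id from spaceflights.shuttles;",
    "project": "my-project",
    "credentials": "gbq-creds",
    "load_args": {"reauth": True},
}

# Hypothetical credentials store; the token value is a placeholder.
credentials_store: Dict[str, Dict[str, Any]] = {
    "gbq-creds": {"token": "<access-token>"},
}


def constructor_kwargs(
    entry: Dict[str, Any], store: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the keyword arguments the entry corresponds to (illustrative helper)."""
    # Everything except `type` maps onto a constructor parameter of the same name.
    kwargs = {key: value for key, value in entry.items() if key != "type"}
    # Assumption: a string under `credentials` names an entry in the store.
    name = kwargs.get("credentials")
    if isinstance(name, str):
        kwargs["credentials"] = store[name]
    return kwargs


kwargs = constructor_kwargs(catalog_entry, credentials_store)
```

The resulting `kwargs` holds the `sql`, `project`, `credentials` and `load_args` keys, which are the names accepted by `GBQQueryDataSet.__init__`.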

Example using Python API:

>>> from kedro.extras.datasets.pandas import GBQQueryDataSet
>>>
>>> sql = "SELECT * FROM dataset_1.table_a"
>>>
>>> data_set = GBQQueryDataSet(sql, project='my-project')
>>>
>>> sql_data = data_set.load()
>>>
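
The constructor also accepts a `filepath` pointing to a file that contains the query, as an alternative to `sql`. The sketch below shows one way this might be used. The helper names and the file name `query.sql` are hypothetical, and the final call is left commented out because it needs kedro, pandas-gbq and valid Google credentials.

```python
import tempfile
from pathlib import Path


def write_query_file(directory: Path, sql: str) -> Path:
    """Write a SQL statement to `query.sql` (hypothetical name) and return its path."""
    path = directory / "query.sql"
    path.write_text(sql, encoding="utf-8")
    return path


def load_from_query_file(path: Path, project: str):
    """Load the query result; requires kedro, pandas-gbq and Google credentials."""
    # Imported lazily so the helper above can run without kedro installed.
    from kedro.extras.datasets.pandas import GBQQueryDataSet

    # Pass `filepath` instead of `sql`; providing both raises DataSetError.
    data_set = GBQQueryDataSet(filepath=str(path), project=project)
    return data_set.load()


query_dir = Path(tempfile.mkdtemp())
query_path = write_query_file(query_dir, "SELECT * FROM dataset_1.table_a")
# sql_data = load_from_query_file(query_path, project="my-project")
```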

Method	`__init__`	Creates a new instance of `GBQQueryDataSet`.
Constant	`DEFAULT_LOAD_ARGS`	Undocumented
Method	`_describe`	Undocumented
Method	`_load`	Undocumented
Method	`_save`	Undocumented
Instance Variable	`_client`	Undocumented
Instance Variable	`_credentials`	Undocumented
Instance Variable	`_filepath`	Undocumented
Instance Variable	`_load_args`	Undocumented
Instance Variable	`_project_id`	Undocumented

Inherited from AbstractDataSet:

Class Method	`from_config`	Create a data set instance using the configuration provided.
Method	`__str__`	Undocumented
Method	`exists`	Checks whether a data set's output already exists by calling the provided _exists() method.
Method	`load`	Loads data by delegation to the provided load method.
Method	`release`	Release any cached data.
Method	`save`	Saves data by delegation to the provided save method.
Method	`_copy`	Undocumented
Method	`_exists`	Undocumented
Method	`_release`	Undocumented
Property	`_logger`	Undocumented

def __init__(self, sql: str = None, project: str = None, credentials: Union[Dict[str, Any], Credentials] = None, load_args: Dict[str, Any] = None, fs_args: Dict[str, Any] = None, filepath: str = None):

Creates a new instance of GBQQueryDataSet.

Parameters
sql:`str`	The SQL query statement.
project:`str`	Google BigQuery account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects
credentials:`Union[Dict[str, Any], Credentials]`	Credentials for accessing Google APIs. Either a `google.auth.credentials.Credentials` object or a dictionary with the parameters required to instantiate `google.oauth2.credentials.Credentials`. All the arguments are listed here: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html
load_args:`Dict[str, Any]`	Pandas options for loading a BigQuery table into a DataFrame. All available arguments are listed here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved.
fs_args:`Dict[str, Any]`	Extra arguments to pass into the underlying filesystem class constructor (e.g. `{"project": "my-project"}` for `GCSFileSystem`), used for reading the SQL query from `filepath`.
filepath:`str`	A path to a file containing a SQL query statement.
Raises
`DataSetError`	When `sql` and `filepath` parameters are either both empty or both provided, as well as when the `save()` method is invoked.
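
The rules described above (exactly one of `sql` and `filepath`, user `load_args` layered over the defaults, and no support for saving) can be sketched with a small stand-in class. This is not the real `GBQQueryDataSet`: the class name `QueryDataSetSketch`, the error messages and the empty `DEFAULT_LOAD_ARGS` are assumptions made for illustration, and `DataSetError` is redefined locally so the example runs without kedro.

```python
import copy
from typing import Any, Dict, Optional


class DataSetError(Exception):
    """Local stand-in for kedro's DataSetError so the sketch runs without kedro."""


class QueryDataSetSketch:
    """Illustrative stand-in mirroring the documented rules, not kedro's code."""

    # Assumed to be empty here; the real constant is undocumented above.
    DEFAULT_LOAD_ARGS: Dict[str, Any] = {}

    def __init__(
        self,
        sql: Optional[str] = None,
        filepath: Optional[str] = None,
        load_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Exactly one of `sql` and `filepath` must be given.
        if sql and filepath:
            raise DataSetError("'sql' and 'filepath' cannot both be provided.")
        if not (sql or filepath):
            raise DataSetError("Either 'sql' or 'filepath' must be provided.")

        # Start from a copy of the defaults, then apply the user's overrides.
        self._load_args = copy.deepcopy(self.DEFAULT_LOAD_ARGS)
        if load_args is not None:
            self._load_args.update(load_args)

        self._sql = sql
        self._filepath = filepath

    def save(self, data: Any) -> None:
        # A query-backed data set is read-only, so saving always fails.
        raise DataSetError("save() is not supported on a query data set.")


data_set = QueryDataSetSketch(sql="SELECT 1", load_args={"reauth": True})
```

Copying the defaults before updating them keeps the class-level dictionary unchanged, so one instance's `load_args` cannot leak into another.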