class documentation

class GBQQueryDataSet(AbstractDataSet[None, pd.DataFrame]):

GBQQueryDataSet loads data from Google BigQuery using a provided SQL query. It uses pandas.read_gbq, which in turn uses pandas-gbq internally to read from BigQuery. It therefore supports all options that pandas.read_gbq accepts.

Example adding a catalog entry with the YAML API:

>>> vehicles:
>>>   type: pandas.GBQQueryDataSet
>>>   sql: "select shuttle, shuttle_id from spaceflights.shuttles;"
>>>   project: my-project
>>>   credentials: gbq-creds
>>>   load_args:
>>>     reauth: True

Example using Python API:

>>> from kedro.extras.datasets.pandas import GBQQueryDataSet
>>>
>>> sql = "SELECT * FROM dataset_1.table_a"
>>>
>>> data_set = GBQQueryDataSet(sql, project='my-project')
>>>
>>> sql_data = data_set.load()
>>>
Method __init__ Creates a new instance of GBQQueryDataSet.
Constant DEFAULT_LOAD_ARGS Undocumented
Method _describe Undocumented
Method _load Undocumented
Method _save Undocumented
Instance Variable _client Undocumented
Instance Variable _credentials Undocumented
Instance Variable _filepath Undocumented
Instance Variable _load_args Undocumented
Instance Variable _project_id Undocumented

Inherited from AbstractDataSet:

Class Method from_config Create a data set instance using the configuration provided.
Method __str__ Undocumented
Method exists Checks whether a data set's output already exists by calling the provided _exists() method.
Method load Loads data by delegation to the provided load method.
Method release Release any cached data.
Method save Saves data by delegation to the provided save method.
Method _copy Undocumented
Method _exists Undocumented
Method _release Undocumented
Property _logger Undocumented
def __init__(self, sql: str = None, project: str = None, credentials: Union[Dict[str, Any], Credentials] = None, load_args: Dict[str, Any] = None, fs_args: Dict[str, Any] = None, filepath: str = None):

Creates a new instance of GBQQueryDataSet.

Parameters
sql (str): The SQL query statement.
project (str): Google BigQuery account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects
credentials (Union[Dict[str, Any], Credentials]): Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. All the arguments are listed here: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html
load_args (Dict[str, Any]): Pandas options for loading a BigQuery table into a DataFrame. All available arguments are listed here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved.
fs_args (Dict[str, Any]): Extra arguments to pass into the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), used for reading the SQL query from filepath.
filepath (str): A path to a file containing a SQL query statement.
Raises
DataSetError: When the sql and filepath parameters are either both empty or both provided, and also when the save() method is invoked.
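
The rule above (exactly one of sql and filepath must be given) can be sketched as a standalone function. This is a minimal illustration, not the class's actual code: resolve_query and the local DataSetError class are hypothetical stand-ins, and the file is read from the local disk, whereas the real class reads filepath through a filesystem layer configured with fs_args.

```python
from pathlib import Path
from typing import Optional


class DataSetError(Exception):
    """Hypothetical stand-in for kedro's DataSetError."""


def resolve_query(sql: Optional[str] = None, filepath: Optional[str] = None) -> str:
    """Return the query text, enforcing that exactly one source is given."""
    if sql and filepath:
        raise DataSetError("'sql' and 'filepath' cannot both be provided.")
    if not (sql or filepath):
        raise DataSetError("Provide either 'sql' or 'filepath'.")
    if sql:
        return sql
    # Assumption: a plain local file; remote filesystems are out of scope here.
    return Path(filepath).read_text(encoding="utf-8")
```

Passing only sql returns the string unchanged, passing only filepath returns the file's contents, and the two remaining combinations raise the error.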
DEFAULT_LOAD_ARGS: Dict[str, Any] =

Undocumented

Value
{}
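
Because the default is an empty dictionary, any load_args passed to the constructor are effectively the only options forwarded to pandas.read_gbq. The sketch below shows one plausible way such defaults are combined with user-supplied arguments. merge_load_args is a hypothetical helper, and the copy-then-update pattern is an assumption about the implementation rather than something this page states.

```python
import copy
from typing import Any, Dict, Optional

# Mirrors the documented value of the class constant.
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}


def merge_load_args(load_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user-supplied load_args on a copy of the defaults."""
    # Copy first so the shared class-level default is never mutated.
    merged = copy.deepcopy(DEFAULT_LOAD_ARGS)
    if load_args is not None:
        merged.update(load_args)
    return merged
```

With the catalog entry shown earlier, merge_load_args({"reauth": True}) would yield a dictionary holding only the reauth option.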
def _describe(self) -> Dict[str, Any]:

Undocumented

def _load(self) -> pd.DataFrame:

Undocumented

def _save(self, data: None) -> NoReturn:

Undocumented
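
The NoReturn annotation, together with the Raises section of __init__, indicates that this data set is read-only: saving always fails. A minimal sketch of that behaviour follows, using a hypothetical ReadOnlyQueryDataSet class and a local DataSetError stand-in; the error message is illustrative and not taken from the library.

```python
from typing import NoReturn


class DataSetError(Exception):
    """Hypothetical stand-in for kedro's DataSetError."""


class ReadOnlyQueryDataSet:
    """Toy data set that holds a query and refuses to save."""

    def __init__(self, sql: str) -> None:
        self._sql = sql

    def _save(self, data: None) -> NoReturn:
        # A query result has no table to write back to, so saving always raises.
        raise DataSetError("'save' is not supported on ReadOnlyQueryDataSet")
```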

_client =

Undocumented

_credentials =

Undocumented

_filepath =

Undocumented

_load_args =

Undocumented

_project_id =

Undocumented