class documentation
GBQQueryDataSet loads data from Google BigQuery using a provided SQL query. It uses pandas.read_gbq, which in turn uses pandas-gbq internally to read from a BigQuery table. It therefore supports all options that pandas allows on read_gbq.
Example adding a catalog entry with the YAML API:
>>> vehicles:
>>>   type: pandas.GBQQueryDataSet
>>>   sql: "select shuttle, shuttle_id from spaceflights.shuttles;"
>>>   project: my-project
>>>   credentials: gbq-creds
>>>   load_args:
>>>     reauth: True
Example using Python API:
>>> from kedro.extras.datasets.pandas import GBQQueryDataSet
>>>
>>> sql = "SELECT * FROM dataset_1.table_a"
>>>
>>> data_set = GBQQueryDataSet(sql, project='my-project')
>>>
>>> sql_data = data_set.load()
>>>
Method | __init__ | Creates a new instance of GBQQueryDataSet. |
Constant | DEFAULT | Undocumented |
Method | _describe | Undocumented |
Method | _load | Undocumented |
Method | _save | Undocumented |
Instance Variable | _client | Undocumented |
Instance Variable | _credentials | Undocumented |
Instance Variable | _filepath | Undocumented |
Instance Variable | _load | Undocumented |
Instance Variable | _project | Undocumented |
Inherited from AbstractDataSet:
Class Method | from | Create a data set instance using the configuration provided. |
Method | __str__ | Undocumented |
Method | exists | Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load | Loads data by delegation to the provided load method. |
Method | release | Release any cached data. |
Method | save | Saves data by delegation to the provided save method. |
Method | _copy | Undocumented |
Method | _exists | Undocumented |
Method | _release | Undocumented |
Property | _logger | Undocumented |
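The inherited public methods above are described as delegating to underscore-prefixed hooks (load to a provided load method, exists to _exists()). The following is a minimal sketch of that delegation pattern under stated assumptions, not Kedro's actual source. The class names SketchDataSet and InMemorySketch are hypothetical, and the real AbstractDataSet may do more around each call (for example logging or error wrapping).

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchDataSet(ABC):
    """Hypothetical base class illustrating public-to-hook delegation."""

    def load(self) -> Any:
        # Public entry point: hands off to the subclass hook.
        return self._load()

    def save(self, data: Any) -> None:
        # Public entry point: hands off to the subclass hook.
        self._save(data)

    def exists(self) -> bool:
        return self._exists()

    @abstractmethod
    def _load(self) -> Any:
        ...

    @abstractmethod
    def _save(self, data: Any) -> None:
        ...

    def _exists(self) -> bool:
        # Assumption for this sketch: a subclass that cannot check
        # existence simply reports False.
        return False


class InMemorySketch(SketchDataSet):
    """Hypothetical subclass that keeps its data in a plain attribute."""

    def __init__(self) -> None:
        self._data = None

    def _load(self) -> Any:
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data

    def _exists(self) -> bool:
        return self._data is not None
```

In this sketch a subclass only implements the underscore-prefixed hooks, while callers use load(), save() and exists().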
def __init__(self, sql: str = None, project: str = None, credentials: Union[Dict[str, Any], Credentials] = None, load_args: Dict[str, Any] = None, fs_args: Dict[str, Any] = None, filepath: str = None):
Creates a new instance of GBQQueryDataSet.
Parameters | |
sql:str | The SQL query statement. |
project:str | Google BigQuery Account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects |
credentials:Union[Dict[str, Any], Credentials] | Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. All arguments are listed here: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html |
load_args:Dict[str, Any] | Pandas options for loading a BigQuery table into a DataFrame. All available arguments are listed here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved. |
fs_args:Dict[str, Any] | Extra arguments to pass into the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), used for reading the SQL query from filepath. |
filepath:str | A path to a file containing a SQL query statement. |
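The load_args entry says the dictionary holds pandas read_gbq options and that all defaults are preserved. A rough sketch of what forwarding such a dictionary as keyword arguments can look like is shown below. It uses a stand-in function rather than the real pandas.read_gbq so that it runs without BigQuery access. The names fake_read_gbq and load_with_args are hypothetical; reauth and dialect are real read_gbq option names, but the default values here belong to the stand-in only.

```python
from typing import Any, Dict, Optional


def fake_read_gbq(
    query: str,
    project_id: Optional[str] = None,
    reauth: bool = False,
    dialect: str = "standard",
) -> Dict[str, Any]:
    """Stand-in for a reader function: returns the options it received."""
    return {
        "query": query,
        "project_id": project_id,
        "reauth": reauth,
        "dialect": dialect,
    }


def load_with_args(
    sql: str,
    project: Optional[str],
    load_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Copy so the caller's dictionary is never mutated.
    options = dict(load_args or {})
    # Only the keys the user supplied are passed on; every other
    # option falls back to the reader's own default.
    return fake_read_gbq(sql, project_id=project, **options)


result = load_with_args("SELECT 1", "my-project", {"reauth": True})
```

Here result records reauth as True because it was supplied, while dialect keeps the stand-in's default because it was not.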
Raises | |
DataSetError | When sql and filepath parameters are either both empty
or both provided, as well as when the save() method is invoked. |
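The rule that exactly one of sql and filepath must be supplied can be sketched as follows. This is an illustration of the documented behaviour, not the library's implementation: QuerySourceError is a hypothetical stand-in for DataSetError, resolve_query is a hypothetical helper, and reading the file with pathlib assumes a local path, whereas the fs_args parameter suggests the real class goes through a filesystem abstraction. The save() case is not covered here.

```python
from pathlib import Path
from typing import Optional


class QuerySourceError(Exception):
    """Hypothetical stand-in for DataSetError."""


def resolve_query(sql: Optional[str] = None, filepath: Optional[str] = None) -> str:
    # Both empty or both provided is an error: exactly one source is allowed.
    if bool(sql) == bool(filepath):
        raise QuerySourceError("Provide exactly one of 'sql' or 'filepath'.")
    if sql:
        return sql
    # Assumption: the path points at a local text file holding the query.
    return Path(filepath).read_text(encoding="utf-8")
```

Calling resolve_query(sql="SELECT 1") returns the query unchanged, while calling it with no arguments, or with both arguments, raises QuerySourceError.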