class documentation
GBQTableDataSet loads and saves data from/to Google BigQuery. It uses pandas-gbq to read from and write to a BigQuery table.
Example usage for the YAML API:
vehicles:
  type: pandas.GBQTableDataSet
  dataset: big_query_dataset
  table_name: big_query_table
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True
  save_args:
    chunk_size: 100
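The gbq-creds entry above refers to a key in the project's credentials configuration. As an illustration only (the key name and token value below are hypothetical), such an entry could contain the keyword arguments used to instantiate google.oauth2.credentials.Credentials:

```yaml
# conf/local/credentials.yml (illustrative; values are placeholders)
gbq-creds:
  token: my-oauth2-access-token
```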
Example usage for the Python API:
>>> from kedro.extras.datasets.pandas import GBQTableDataSet
>>> import pandas as pd
>>>
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
...                      'col3': [5, 6]})
>>>
>>> data_set = GBQTableDataSet('dataset',
...                            'table_name',
...                            project='my-project')
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>>
>>> assert data.equals(reloaded)
Method | __init__ |
Creates a new instance of GBQTableDataSet. |
Constant | DEFAULT_LOAD_ARGS |
Undocumented |
Constant | DEFAULT_SAVE_ARGS |
Undocumented |
Method | _describe |
Undocumented |
Method | _exists |
Undocumented |
Method | _load |
Undocumented |
Method | _save |
Undocumented |
Method | _validate_location |
Undocumented |
Instance Variable | _client |
Undocumented |
Instance Variable | _credentials |
Undocumented |
Instance Variable | _dataset |
Undocumented |
Instance Variable | _load_args |
Undocumented |
Instance Variable | _project |
Undocumented |
Instance Variable | _save_args |
Undocumented |
Instance Variable | _table |
Undocumented |
Inherited from AbstractDataSet:
Class Method | from_config |
Create a data set instance using the configuration provided. |
Method | __str__ |
Undocumented |
Method | exists |
Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load |
Loads data by delegation to the provided load method. |
Method | release |
Release any cached data. |
Method | save |
Saves data by delegation to the provided save method. |
Method | _copy |
Undocumented |
Method | _release |
Undocumented |
Property | _logger |
Undocumented |
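The "delegation" described for the inherited load and save methods can be sketched as follows. This is an illustrative assumption based on the method descriptions above, not Kedro's actual implementation; the InMemoryDataSet subclass is hypothetical and stands in for a concrete data set such as GBQTableDataSet:

```python
# Illustrative sketch of the delegation pattern: the public load()/save()/
# exists() methods wrap the protected hooks (_load, _save, _exists) that
# concrete data sets implement, converting failures to DataSetError.
class DataSetError(Exception):
    """Raised when loading or saving a data set fails."""


class AbstractDataSet:
    def load(self):
        try:
            return self._load()
        except Exception as exc:
            raise DataSetError(f"Failed while loading data from data set {self}") from exc

    def save(self, data):
        try:
            self._save(data)
        except Exception as exc:
            raise DataSetError(f"Failed while saving data to data set {self}") from exc

    def exists(self):
        return self._exists()


class InMemoryDataSet(AbstractDataSet):
    """Hypothetical minimal subclass, for illustration only."""

    def __init__(self):
        self._data = None

    def _load(self):
        if self._data is None:
            raise ValueError("no data stored")
        return self._data

    def _save(self, data):
        self._data = data

    def _exists(self):
        return self._data is not None
```

Because error handling lives in the public wrappers, subclasses only implement the protected hooks and never need to raise DataSetError themselves.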
def __init__(
    self,
    dataset: str,
    table_name: str,
    project: str = None,
    credentials: Union[Dict[str, Any], Credentials] = None,
    load_args: Dict[str, Any] = None,
    save_args: Dict[str, Any] = None,
):
Creates a new instance of GBQTableDataSet.
Parameters | |
dataset:str | Google BigQuery dataset. |
table_name:str | Google BigQuery table name. |
project:str | Google BigQuery Account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects |
credentials:Union[Dict[str, Any], Credentials] | Credentials for accessing Google APIs. Either a google.auth.credentials.Credentials object or a dictionary with the parameters required to instantiate google.oauth2.credentials.Credentials. All the arguments are documented here: https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html |
load_args:Dict[str, Any] | Pandas options for loading a BigQuery table into a DataFrame. All available arguments are documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html All defaults are preserved. |
save_args:Dict[str, Any] | Pandas options for saving a DataFrame to a BigQuery table. All available arguments are documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html All defaults are preserved, except "progress_bar", which is set to False. |
Raises | |
DataSetError | When load_args['location'] and save_args['location'] are different. |