captest.io.load_data

captest.io.load_data(path, group_columns=<function group_columns>, file_reader=<function file_reader>, skip_dir_load=False, name='meas', sort=True, drop_duplicates=True, reindex=True, site=None, column_groups_template=False, verbose=False, **kwargs)

Load file(s) of timeseries data from SCADA / DAS systems.

This is a convenience function to generate an instance of DataLoader and call the load method.

A single file or multiple files can be loaded. Multiple files will be joined together and may include files with different column headings.

Parameters:

path (str) – Path to either a single file to load or a directory of files to load. Supports local paths and S3 URIs (e.g. s3://bucket/path/).
group_columns (function or str, default columngroups.group_columns) – Function used to group the columns of the loaded data. The function should accept a DataFrame and return a dictionary whose keys are group ids and whose values are lists of column names. It is set as the group_columns attribute of the CapData.DataLoader object. Alternatively, provide a string to load the column grouping from a JSON, YAML, or Excel file. A JSON or YAML file should parse to a dictionary; an Excel file should have two columns, with the first column containing the group ids and the second column the column names. The first column may have missing values. See the function load_excel_column_groups for more details.
file_reader (function, default io.file_reader) – Function used to load an individual file. By default, the built-in file_reader function is used, which tries to load CSV files. If passing a function to read other file types, the kwargs should include the file type extension, e.g. ‘parquet’.
skip_dir_load (bool, default False) – Set to True to pass a custom file_reader that handles multiple files. This will skip the parsing of files in a directory by DataLoader.load and allow the function passed to file_reader to handle multiple files in a directory.
name (str) – Identifier that will be assigned to the returned CapData instance.
sort (bool, default True) – By default sorts the data by the datetime index from old to new.
drop_duplicates (bool, default True) – By default drops rows of the joined data where all the columns are duplicates of another row. Keeps the first instance of the duplicated values. This is helpful if individual data files have overlapping rows with the same data.
reindex (bool, default True) – By default, creates a new index for the data from the earliest datetime, the latest datetime, and the most frequent time interval, ensuring there are no missing intervals.
site (dict or str, default None) – Pass a dictionary, or a path to a JSON or YAML file, containing site data, which will be used to generate modeled clear sky GHI and POA values. The clear sky irradiance values are added to the data and the column_groups attribute is updated to include these two irradiance columns. The site data dictionary should be {sys: {system data}, loc: {location data}}. See the capdata.csky documentation for the format of the system data and location data.
column_groups_template (bool, default False) – If True, calls CapData.data_columns_to_excel to save a file at path that can be used to manually create column groupings.
verbose (bool, default False) – Set to True to print status of file loading.
**kwargs – Passed to DataLoader.load. Any kwargs not used by DataLoader.load are passed on to the file_reader function, which by default passes them to pandas.read_csv. DataLoader.load accepts a summary kwarg that shows the files loaded from a directory without the reindexing status that is shown when verbose is set to True. Reindexing accepts a report kwarg.
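
The two forms that group_columns accepts can be sketched as follows. This is a minimal illustration, not captest code: the function name group_by_prefix, the column names, and the group ids are hypothetical, and a SimpleNamespace stands in for the DataFrame so that the snippet runs with only the standard library.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def group_by_prefix(data):
    """Group column names by the text before the first underscore.

    Hypothetical example: accepts any object with a ``columns`` attribute
    (such as a DataFrame) and returns {group id: [column names]}.
    """
    groups = {}
    for column in data.columns:
        group_id = str(column).split("_", 1)[0]
        groups.setdefault(group_id, []).append(column)
    return groups


# Stand-in for a loaded DataFrame; only the column labels matter here.
fake_data = SimpleNamespace(
    columns=["irr_poa_1", "irr_poa_2", "temp_amb", "power_meter"]
)
column_groups = group_by_prefix(fake_data)
print(column_groups)

# The same grouping saved as JSON, which parses back to a dictionary.
groups_path = Path(tempfile.mkdtemp()) / "column_groups.json"
groups_path.write_text(json.dumps(column_groups, indent=2))
reloaded = json.loads(groups_path.read_text())
```

Under these assumptions, either the function itself or the string path to the saved JSON file could be supplied as group_columns.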
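
The behavior described for reindex can be illustrated with a small standard-library sketch. It is not the library's implementation (which operates on a pandas datetime index); the helper name build_regular_index and the sample timestamps are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_regular_index(timestamps):
    """Return evenly spaced datetimes from the earliest to the latest timestamp.

    The step is the most frequent interval between consecutive sorted
    timestamps. Hypothetical helper, for illustration only.
    """
    ordered = sorted(set(timestamps))
    if len(ordered) < 2:
        # No interval can be inferred from fewer than two timestamps.
        return ordered
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    step = Counter(intervals).most_common(1)[0][0]
    index = []
    current = ordered[0]
    while current <= ordered[-1]:
        index.append(current)
        current += step
    return index


start = datetime(2023, 1, 1, 0, 0)
# Five-minute data with the 00:10 and 00:15 records missing.
measured = [start + timedelta(minutes=m) for m in (0, 5, 20, 25, 30)]
full_index = build_regular_index(measured)
missing = [ts for ts in full_index if ts not in measured]
print(missing)
```

The gap between 00:05 and 00:20 appears in the regular index as two additional timestamps, which is the sense in which reindexing ensures there are no missing intervals.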
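
A site dictionary with the {sys: ..., loc: ...} layout described above might look like the sketch below. Only the two top-level keys come from this page; the inner key names and all values are assumptions for a hypothetical fixed-tilt system and should be checked against the capdata.csky documentation before use.

```python
import json
import tempfile
from pathlib import Path

# Top-level keys "sys" and "loc" follow the layout described for the
# site parameter. The inner keys and values are assumed for illustration.
site = {
    "sys": {
        "surface_tilt": 20,      # assumed key: module tilt in degrees
        "surface_azimuth": 180,  # assumed key: module azimuth in degrees
        "albedo": 0.2,           # assumed key: ground reflectance
    },
    "loc": {
        "latitude": 39.742,      # assumed key: decimal degrees
        "longitude": -105.18,    # assumed key: decimal degrees
        "altitude": 1828.8,      # assumed key: meters above sea level
        "tz": "Etc/GMT+7",       # assumed key: timezone name
    },
}

# The same data saved as a JSON file, which parses back to the dictionary.
site_path = Path(tempfile.mkdtemp()) / "site.json"
site_path.write_text(json.dumps(site, indent=2))
site_from_file = json.loads(site_path.read_text())
```

Either the dictionary or the path to the saved file could then be passed as site, since the parameter accepts both forms.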