captest package
Submodules
captest.capdata module
Provides the CapData class and supporting functions.
The CapData class provides methods for loading, filtering, and regressing
solar data. A capacity test following the ASTM E2848 standard is orchestrated
by captest.CapTest, which binds a measured and a modeled CapData
instance together and exposes the cross-CapData comparison methods
(captest_results, get_summary, overlay_scatters,
residual_plot, determine_pass_or_fail).
- class captest.capdata.CapData(name)
Bases: object
Class to store capacity test data and column grouping.
CapData objects store a pandas dataframe of measured or simulated data and a dictionary grouping columns by type of measurement.
The column_groups dictionary allows maintaining the original column names while also grouping measurements of the same type from different sensors. Many of the methods for plotting and filtering data rely on the column groupings.
- Parameters:
name (str) – Name for the CapData object.
data (pandas dataframe) – Used to store measured or simulated data imported from csv.
data_filtered (pandas dataframe) – Holds filtered data. Filtering methods act on and write to this attribute.
column_groups (dictionary) – Assigned by the group_columns method, which attempts to infer the type of measurement recorded in each column of the dataframe stored in the data attribute. For each inferred measurement type, group_columns creates an abbreviated name and a list of columns that contain measurements of that type. The abbreviated names are the keys and the corresponding values are the lists of columns.
regression_cols (dictionary) – Dictionary identifying which columns in data or groups of columns as identified by the keys of column_groups are the independent variables of the ASTM Capacity test regression equation. Set using set_regression_cols or by directly assigning a dictionary.
summary_ix (list of tuples) – Holds the row index data modified by the update_summary decorator function.
summary (list of dicts) – Holds the data modified by the update_summary decorator function.
rc (DataFrame) – Dataframe for the reporting conditions (poa, t_amb, and w_vel).
regression_results (statsmodels linear regression model) – Holds the linear regression model object.
regression_formula (str) – Regression formula to be fit to measured and simulated data. Must follow the requirements of statsmodels use of patsy.
tolerance (str) – String representing error band. Ex. ‘+ 3’, ‘+/- 3’, ‘- 5’ There must be space between the sign and number. Number is interpreted as a percent. For example, 5 percent is 5 not 0.05.
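The tolerance string format can be illustrated with a small parser. This is a minimal sketch and not captest's own implementation: the helper name `tolerance_bounds` is hypothetical, and applying the band as a percentage of an expected capacity is an assumption.

```python
def tolerance_bounds(tolerance, expected):
    """Return (low, high) bounds around `expected` for a tolerance string.

    Hypothetical helper for illustration; not part of the captest API.
    """
    # The required space between sign and number makes the split unambiguous.
    sign, number = tolerance.split(" ")
    fraction = float(number) / 100  # '3' means 3 percent, not 0.03
    low = expected * (1 - fraction) if sign in ("-", "+/-") else expected
    high = expected * (1 + fraction) if sign in ("+", "+/-") else expected
    return low, high


symmetric = tolerance_bounds("+/- 5", 100.0)
lower_only = tolerance_bounds("- 5", 100.0)
```

Under these assumptions, a one-sided tolerance such as ‘- 5’ only lowers the bottom of the band and leaves the top at the expected value.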
- agg_group(group_id, agg_func, verbose=True, rename_map=None, inplace=True, cutoff=10, columns=None)
Aggregate columns in a group.
- Parameters:
group_id (str) – Key from column_groups attribute.
agg_func (str or callable) – Aggregation function to apply.
verbose (bool, default True) – Set to True to print the columns that have been aggregated, the aggregation function used, and the new column name.
cutoff (int, default 10) – Maximum number of columns to list individually when verbose=True. When the group contains more columns than this value, the first three and last three column names are printed with an ellipsis in between. Increase this value to see more columns listed individually.
columns (pd.DataFrame or None, default None) – Pre-fetched DataFrame of columns to aggregate. When provided, the lookup via self._get_group is skipped. Intended for internal use by agg_sensors to avoid a redundant lookup.
- agg_sensors(agg_map=None, verbose=False)
Aggregate measurements of the same variable from different sensors.
- Parameters:
agg_map (dict, default None) – Dictionary specifying aggregations to be performed on the specified groups from the column_groups attribute. The dictionary keys should be keys from the column_groups attribute. The dictionary values should be aggregation functions. See pandas API documentation of Computations / descriptive statistics for a list of all options. By default the groups of columns assigned to the ‘power’, ‘poa’, ‘t_amb’, and ‘w_vel’ keys in the regression_cols attribute are aggregated: power is summed, and poa, t_amb, and w_vel are averaged.
verbose (bool, default False) – Set to True to print the columns that have been aggregated, the aggregation function used, and the new column name. If the group being aggregated has more than 10 columns, only the group name will be printed.
- Returns:
Acts in place on the data, data_filtered, and regression_cols attributes.
- Return type:
None
Notes
This method is intended to be used before any filtering methods are applied. Filtering steps applied before this method is called will be lost.
This method modifies the data, data_filtered, and regression_cols attributes.
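The kind of aggregation described here can be sketched with plain pandas. This is an illustration under stated assumptions, not the captest implementation: the column names and group keys are made up, and the `<group>_<func>_agg` naming follows the pattern shown in the expand_agg_map example.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "inv1_power": [100.0, 110.0],
        "inv2_power": [90.0, 95.0],
        "met1_poa": [800.0, 820.0],
        "met2_poa": [810.0, 830.0],
    }
)
# Hypothetical column grouping, keyed by measurement type.
column_groups = {
    "real_pwr_inv": ["inv1_power", "inv2_power"],
    "irr_poa": ["met1_poa", "met2_poa"],
}
agg_map = {"real_pwr_inv": "sum", "irr_poa": "mean"}

for group_id, agg_func in agg_map.items():
    # Aggregate across the sensors (columns) of each group, row by row.
    new_col = f"{group_id}_{agg_func}_agg"
    data[new_col] = data[column_groups[group_id]].agg(agg_func, axis=1)
```

Each aggregated column is appended to the frame, so the original sensor columns remain available.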
- column_groups_to_excel(save_to='./column_groups.xlsx')
Export the column groups attribute to an excel file.
- Parameters:
save_to (str) – File path to save column groups to. Should include .xlsx.
- copy()
Create and return a copy of self.
- create_agg_attributes()
Create callable attributes for each aggregated column that return data views.
For each column in self.column_groups[‘agg’], creates an attribute on the instance that when called returns a view of the data for that column group using the loc indexer functionality.
- create_column_group_attributes()
Create callable attributes for each column group that return data views.
For each key in self.column_groups, creates an attribute on the instance that when called returns a view of the data for that column group using the loc indexer functionality.
- custom_param(func, *args, **kwargs)
Applies the function func with kwargs and adds result as new column to data.
Calculates and adds a new column to data using the function func with the provided arguments and keyword arguments. See the functions in the calcparams module for examples.
Called by util.process_reg_cols to add new columns to the data attribute while recursively processing and updating the regression_cols attribute.
- Parameters:
func (function) – Function that takes a DataFrame as its first argument and returns a Series.
- Returns:
Adds a new column to the data attribute.
- Return type:
None
- data_columns_to_excel(sort_by_reversed_names=True)
Write the columns of data to an excel file as a template for a column grouping.
- Parameters:
sort_by_reversed_names (bool, default True) – If true, sort column names after reversing them.
- Returns:
Writes to excel file at self.data_loader.path / ‘column_groups.xlsx’.
- Return type:
None
- drop_cols(columns)
Drop columns from CapData data, data_filtered, and column_groups.
- Parameters:
columns (str or list) – Column name or list of column names to drop.
- empty()
Return a boolean indicating if the CapData object contains data.
- expand_agg_map(agg_map)
Traverses, expands, and sorts the agg_map.
If a value of agg_map is a dictionary, the items in that dictionary are added to the returned expanded agg_map at the top level. Also, the following steps are completed to aggregate the subgroups:
- The column_groups attribute is updated to add a new group with the aggregated columns from the subgroups.
- This new group is added to the expanded returned agg_map after the subgroup aggregations.
- The resulting aggregation of the subgroups is renamed.
For example, given the following agg_map:
```python
agg_map = {
    'irr_ghi': 'mean',
    'irr_poa': {
        'irr_poa_met1': 'mean',
        'irr_poa_met2': 'mean'
    },
}
```
The returned expanded agg_map would be:
```python
agg_map = {
    'irr_ghi': 'mean',
    'irr_poa_met1': 'mean',
    'irr_poa_met2': 'mean',
    'irr_poa_aggs': 'mean',
}
```
and the column_groups attribute would be updated to add the group: 'irr_poa_aggs': ['irr_poa_met1_mean_agg', 'irr_poa_met2_mean_agg']
The column resulting from aggregating the “irr_poa_aggs” group would be “irr_poa_aggs_mean_agg”, which is renamed to “irr_poa_mean_agg”.
- Parameters:
agg_map (dict) – Dictionary specifying aggregations to be performed on the specified groups from the column_groups attribute.
- Returns:
agg_map
- Return type:
dict
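The flattening of a nested agg_map can be sketched in a few lines of plain Python. This is a hypothetical re-implementation for illustration only: the function name and the `outer_func` parameter are assumptions, and the sorting and renaming steps of the real method are omitted.

```python
def expand_agg_map_sketch(agg_map, column_groups, outer_func="mean"):
    """Flatten one level of nesting in agg_map.

    Hypothetical helper; `outer_func` is an assumed parameter and is not
    part of the captest API.
    """
    expanded = {}
    for group_id, agg in agg_map.items():
        if isinstance(agg, dict):
            # Promote each subgroup to the top level.
            expanded.update(agg)
            # Register a new group holding the subgroup aggregate columns.
            column_groups[f"{group_id}_aggs"] = [
                f"{sub_id}_{sub_func}_agg" for sub_id, sub_func in agg.items()
            ]
            expanded[f"{group_id}_aggs"] = outer_func
        else:
            expanded[group_id] = agg
    return expanded


column_groups = {}
nested = {
    "irr_ghi": "mean",
    "irr_poa": {"irr_poa_met1": "mean", "irr_poa_met2": "mean"},
}
expanded = expand_agg_map_sketch(nested, column_groups)
```

The sketch mutates the `column_groups` dictionary it is given, mirroring the documented side effect on the column_groups attribute.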
- expanded_uncert(grp_to_term, k=1.96)
Calculate expanded uncertainty of the predicted power.
Adds instrument uncertainty and spatial uncertainty in quadrature and passes the result through the regression to calculate the Systematic Standard Uncertainty, which is then added in quadrature with the Random Standard Uncertainty of the regression and multiplied by the k factor, k.
1. Combine the spatial and instrument uncertainties for each measurand by adding them in quadrature.
2. Add the absolute uncertainties from step 1 to each of the respective reporting conditions to determine a value for the reporting condition plus the uncertainty.
3. Calculate the predicted power using the RCs plus uncertainty three times, i.e. once for each RC plus uncertainty. For example, to estimate the impact of the uncertainty of the reporting irradiance, calculate expected power using the irradiance RC plus irradiance uncertainty together with the original temperature and wind reporting conditions that have not had any uncertainty added to them.
4. Calculate the percent difference between the three new expected power values that include uncertainty of the RCs and the expected power with the unmodified RC.
5. Take the square root of the sum of the squares of those three percent differences to obtain the Systematic Standard Uncertainty (bY).
Expects CapData to have instrument_uncert and spatial_uncerts attributes with matching keys.
- Parameters:
grp_to_term (dict) – Map the groups of measurement types to the term in the regression formula that was regressed against an aggregated value (typically mean) from that group.
k (numeric) – Coverage factor.
- Return type:
Expanded uncertainty as a decimal value.
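The quadrature arithmetic described for this method can be sketched with the standard library. Everything here is an assumption made for illustration: the function name, the argument layout, and the `predict` callable are hypothetical and do not reflect how captest stores or passes these values.

```python
import math


def expanded_uncert_sketch(predict, rcs, instrument, spatial, random_uncert, k=1.96):
    """Illustrative only; all names here are hypothetical, not captest API.

    predict: callable mapping a dict of reporting conditions to power.
    rcs: reporting conditions, e.g. {'poa': 800.0}.
    instrument, spatial: absolute uncertainties keyed like rcs.
    random_uncert: random standard uncertainty as a fraction.
    """
    base = predict(rcs)
    sq_sum = 0.0
    for key in rcs:
        # Combine instrument and spatial uncertainty in quadrature.
        combined = math.hypot(instrument[key], spatial[key])
        # Perturb one reporting condition at a time, leaving the others as-is.
        perturbed = dict(rcs, **{key: rcs[key] + combined})
        # Fractional difference from the unperturbed prediction.
        diff = (predict(perturbed) - base) / base
        sq_sum += diff**2
    systematic = math.sqrt(sq_sum)  # systematic standard uncertainty
    # Combine with the random uncertainty and apply the coverage factor.
    return k * math.hypot(systematic, random_uncert)


result = expanded_uncert_sketch(
    predict=lambda rc: rc["poa"] * 1.0,  # toy linear model
    rcs={"poa": 100.0},
    instrument={"poa": 3.0},
    spatial={"poa": 4.0},
    random_uncert=0.0,
    k=2,
)
```

With the toy inputs, the combined uncertainty on poa is 5 (from 3 and 4 in quadrature), which is 5 percent of the prediction, so the expanded uncertainty with k=2 is 0.1.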
- filter_clearsky(ghi_col=None, inplace=True, keep_clear=True, **kwargs)
Use pvlib detect_clearsky to remove periods with unstable irradiance.
The pvlib detect_clearsky function compares modeled clear sky ghi against measured ghi to detect periods of clear sky. Refer to the pvlib documentation for additional information.
By default uses data identified by the column_groups dictionary as ghi and modeled ghi. Issues warning if there is no modeled ghi data, or the measured ghi data has not been aggregated.
- Parameters:
ghi_col (str, default None) – The name of a column of measured GHI data. Overrides default attempt to automatically identify a column of GHI data.
inplace (bool, default True) – When true removes periods with unstable irradiance. When false returns pvlib detect_clearsky results, which by default is a series of booleans.
keep_clear (bool, default True) – Set to False to keep cloudy periods.
**kwargs – Passed to pvlib detect_clearsky. By default infer_limits is set to True, which automatically determines appropriate thresholds (including window length) based on the data’s sample interval. Pass infer_limits=False and window_length=<int> to manually control the detection parameters. See pvlib documentation for all available parameters.
- filter_custom(func, *args, **kwargs)
Apply update_summary decorator to passed function.
- Parameters:
func (function) – Any function that takes a dataframe as the first argument and returns a dataframe. Many pandas dataframe methods meet this requirement, like pd.DataFrame.between_time.
*args – Additional positional arguments passed to func.
**kwargs – Additional keyword arguments passed to func.
Examples
Example use of the pandas dropna method to remove rows with missing data.
>>> das.filter_custom(pd.DataFrame.dropna, axis=0, how='any')
>>> summary = das.get_summary()
>>> summary['pts_before_filter'][0]
1424
>>> summary['pts_removed'][0]
16
Example use of the pandas between_time method to remove time periods.
>>> das.reset_filter()
>>> das.filter_custom(pd.DataFrame.between_time, '9:00', '13:00')
>>> summary = das.get_summary()
>>> summary['pts_before_filter'][0]
245
>>> summary['pts_removed'][0]
1195
>>> das.data_filtered.index[0].hour
9
>>> das.data_filtered.index[-1].hour
13
- filter_days(days, drop=False, inplace=True)
Select or drop timestamps for days passed.
- Parameters:
days (list) – List of days to select or drop.
drop (bool, default False) – Set to true to drop the timestamps for the days passed instead of keeping only those days.
inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.
- filter_irr(low, high, ref_val=None, col_name=None, inplace=True)
Filter on irradiance values.
- Parameters:
low (float or int) – Minimum value as fraction (0.8) or absolute 200 (W/m^2).
high (float or int) – Max value as fraction (1.2) or absolute 800 (W/m^2).
ref_val (float or int or 'rep_irr') – Must provide arg when low and high are fractions. Pass 'rep_irr' to use the reporting irradiance from self.rc (set by calling rep_cond() first).
col_name (str, default None) – Column name of irradiance data to filter. By default uses the POA irradiance set in regression_cols attribute or average of the POA columns.
inplace (bool, default True) – Default true write back to data_filtered or return filtered dataframe.
- Returns:
Filtered dataframe if inplace is False.
- Return type:
DataFrame
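The difference between absolute and fractional limits can be sketched with pandas. The helper below is hypothetical and not the captest implementation; in particular, treating both bounds as inclusive is an assumption.

```python
import pandas as pd


def filter_irr_sketch(df, irr_col, low, high, ref_val=None):
    """Keep rows whose irradiance lies between low and high.

    Hypothetical helper; the inclusive bounds are an assumption.
    """
    if ref_val is not None:
        # Fractional limits are scaled by the reference irradiance.
        low, high = low * ref_val, high * ref_val
    return df[df[irr_col].between(low, high)]


df = pd.DataFrame({"poa": [150.0, 450.0, 700.0, 800.0, 990.0]})
absolute = filter_irr_sketch(df, "poa", 200, 800)
relative = filter_irr_sketch(df, "poa", 0.8, 1.2, ref_val=800)
```

Here `absolute` keeps irradiance between 200 and 800 W/m^2, while `relative` keeps values within 80% to 120% of an 800 W/m^2 reference.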
- filter_missing(columns=None)
Removes any rows where the regression columns contain missing data (NaNs).
- Parameters:
columns (list, default None) – Subset of columns to apply dropna. By default uses the regression columns identified in the regression_cols attribute.
- Returns:
Modifies data_filtered attribute.
- Return type:
None
- filter_op_state(op_state, mult_inv=None, inplace=True)
NOT CURRENTLY IMPLEMENTED - Filter on inverter operation state.
This filter is rarely useful in practice, but will be re-implemented if requested.
- Parameters:
data (str) – ‘sim’ or ‘das’ determines if filter is on sim or das data
op_state (int) – integer inverter operating state to keep
mult_inv (list of tuples, [(start, stop, op_state), ...]) – List of tuples where start is the first column of a type of inverter, stop is the last column and op_state is the operating state for the inverter type.
inplace (bool, default True) – When True writes over current filtered dataframe. When False returns CapData object.
- Returns:
Returns filtered CapData object when inplace is False.
- Return type:
CapData
- filter_outliers(inplace=True, **kwargs)
Apply elliptic envelope from scikit-learn to remove outliers.
- Parameters:
inplace (bool) – Default of true writes filtered dataframe back to data_filtered attribute.
**kwargs – Passed to sklearn EllipticEnvelope. Contamination keyword is useful to adjust proportion of outliers in dataset. Default is 0.04.
- filter_pf(pf, inplace=True)
Filter data on the power factor.
- Parameters:
pf (float) – 0.999 or similar to remove timestamps with lower power factor values. Values greater than or equal to pf are kept.
inplace (bool) – Default of true writes filtered dataframe back to data_filtered attribute.
- Return type:
Dataframe when inplace is False.
- filter_power(power, percent=None, columns=None, inplace=True)
Remove data above the specified power threshold.
- Parameters:
power (numeric) – If percent is none, all data equal to or greater than power is removed. If percent is not None, then power should be the nameplate power.
percent (None, or numeric, default None) – Data greater than or equal to percent of power is removed. Specify percentage as decimal i.e. 1% is passed as 0.01.
columns (None or str, default None) – By default filter is applied to the power data identified in the regression_cols attribute. Pass a column name or column group to filter on. When passing a column group the power filter is applied to each column in the group.
inplace (bool, default True) – Default of true writes filtered dataframe back to data_filtered attribute.
- Return type:
Dataframe when inplace is false.
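How power and percent combine into a threshold can be sketched as follows. The helper name is hypothetical and the code is a simplified illustration that filters a single column only.

```python
import pandas as pd


def filter_power_sketch(df, power_col, power, percent=None):
    """Drop rows at or above the power threshold (illustrative helper)."""
    # With percent given, power is treated as nameplate and scaled down.
    threshold = power if percent is None else power * percent
    return df[df[power_col] < threshold]


df = pd.DataFrame({"power": [4000.0, 4900.0, 4980.0, 5000.0]})
# Nameplate of 5000 with percent=0.99 removes data at or above 4950.
by_percent = filter_power_sketch(df, "power", 5000, percent=0.99)
# Without percent, power itself is the threshold.
by_value = filter_power_sketch(df, "power", 4950)
```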
- filter_pvsyst(inplace=True)
Filter pvsyst data for off max power point tracking operation.
This function is only applicable to simulated data generated by PVsyst. Filters the ‘IL Pmin’, ‘IL Vmin’, ‘IL Pmax’, ‘IL Vmax’ values if they are greater than 0.
- Parameters:
inplace (bool, default True) – If inplace is true, then function overwrites the filtered data. If false returns a CapData object.
- Return type:
CapData object if inplace is set to False.
- filter_sensors(perc_diff=None, inplace=True, row_filter=<function check_all_perc_diff_comb>)
Drop suspicious measurements by comparing values from different sensors.
This method ignores columns generated by the agg_sensors method.
- Parameters:
perc_diff (dict) – Dictionary to specify a different threshold for each group of sensors. Dictionary keys should be translation dictionary keys and values are floats, like {‘irr-poa-’: 0.05}. By default the poa sensors as set by the regression_cols dictionary are filtered with a 5% percent difference threshold.
inplace (bool, default True) – If True, writes over current filtered dataframe. If False, returns CapData object.
- Returns:
Returns filtered dataframe if inplace is False.
- Return type:
DataFrame
- filter_shade(fshdbm=1.0, query_str=None, inplace=True)
Remove data during periods of array shading.
The default behavior assumes the filter is applied to data output from PVsyst and removes all periods where values in the column ‘FShdBm’ are less than 1.0.
Use the query_str parameter when shading losses (power) rather than a shading fraction are available.
- Parameters:
fshdbm (float, default 1.0) – The value for fractional shading of beam irradiance as given by the PVsyst output parameter FShdBm. Data is removed when the shading fraction is less than the value passed to fshdbm. By default all periods of shading are removed.
query_str (str) – Query string to pass to pd.DataFrame.query method. The query string should be a boolean expression comparing a column name to a numeric filter value, like ‘ShdLoss<=50’. The column name must not contain spaces.
inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.
- Returns:
If inplace is false returns a dataframe.
- Return type:
pd.DataFrame
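The two filtering modes can be illustrated with plain pandas on a toy frame. The data values are made up and the code is a sketch of the described behavior, not the method's implementation.

```python
import pandas as pd

sim = pd.DataFrame(
    {"FShdBm": [1.0, 0.9, 1.0, 1.0], "ShdLoss": [0.0, 120.0, 30.0, 75.0]}
)

# Default-style filter: remove rows where the shading fraction is below 1.0.
unshaded = sim[sim["FShdBm"] >= 1.0]

# query_str-style filter on a shading loss column, using DataFrame.query.
low_loss = sim.query("ShdLoss<=50")
```

The query form is useful when the simulation output reports shading as a power loss rather than as a beam shading fraction.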
- filter_time(start=None, end=None, drop=False, days=None, test_date=None, inplace=True, wrap_year=False)
Select data for a specified time period.
- Parameters:
start (str or pd.Timestamp or None, default None) – Start date for data to be returned. If a string is passed it must be in format that can be converted by pandas.to_datetime. Not required if test_date and days arguments are passed. If not provided and days is also not provided, defaults to the first timestamp in data_filtered.
end (str or pd.Timestamp or None, default None) – End date for data to be returned. If a string is passed it must be in format that can be converted by pandas.to_datetime. Not required if test_date and days arguments are passed. If not provided and days is also not provided, defaults to the last timestamp in data_filtered.
drop (bool, default False) – Set to true to drop time period between start and end rather than keep it. Must supply start and end and wrap_year must be false.
days (int or None, default None) – Days in time period to be returned. Not required if start and end are specified.
test_date (str or pd.Timestamp or None, default None) – Must be format that can be converted by pandas.to_datetime. Not required if start and end are specified. Requires days argument. Time period returned will be centered on this date.
inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.
wrap_year (bool, default False) – If true calls the wrap_year_end function. See wrap_year_end docstring for details. wrap_year_end was cntg_eoy prior to v0.7.0.
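Selecting a window centered on test_date can be sketched with pandas label slicing. This is an illustration only; how the real method treats the window boundaries is not stated here, so the inclusive endpoints below are an assumption.

```python
import pandas as pd

index = pd.date_range("2023-06-01", "2023-06-30", freq="D")
df = pd.DataFrame({"power": range(len(index))}, index=index)

test_date = pd.Timestamp("2023-06-15")
days = 10
# Center the window on test_date; the exact boundary handling is assumed.
half = pd.Timedelta(days=days / 2)
start, end = test_date - half, test_date + half
window = df.loc[start:end]
```

Because label-based slicing on a DatetimeIndex includes both endpoints, this toy window runs from June 10 through June 20.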
- fit_regression(filter=False, inplace=True, summary=True)
Perform a regression with statsmodels on filtered data.
- Parameters:
filter (bool, default False) – When true removes timestamps where the residuals are greater than two standard deviations. When false just calculates ordinary least squares regression.
inplace (bool, default True) – If filter is true and inplace is true, then function overwrites the filtered data for sim or das. If false returns a CapData object.
summary (bool, default True) – Set to false to not print regression summary.
- Returns:
Returns a filtered CapData object if filter is True and inplace is False.
- Return type:
CapData
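The fit-then-filter idea can be sketched with NumPy least squares in place of statsmodels. This is not the captest implementation: the data and coefficients are synthetic, and the model form P = E * (a1 + a2*E + a3*Ta + a4*v) is the commonly cited ASTM E2848 regression equation, assumed here rather than read from regression_formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
poa = rng.uniform(400, 1000, n)
t_amb = rng.uniform(10, 35, n)
w_vel = rng.uniform(0, 8, n)
true_coefs = np.array([5.0, -0.001, -0.02, 0.03])  # made-up values

# Design matrix for P = E * (a1 + a2*E + a3*Ta + a4*v).
X = np.column_stack([poa, poa**2, poa * t_amb, poa * w_vel])
power = X @ true_coefs + rng.normal(0, 5, n)
power[:3] += 500  # inject a few obvious outliers

# Ordinary least squares fit on all points.
coefs, *_ = np.linalg.lstsq(X, power, rcond=None)
resid = power - X @ coefs

# Keep points whose residual is within two standard deviations, then refit.
keep = np.abs(resid) < 2 * resid.std()
refit, *_ = np.linalg.lstsq(X[keep], power[keep], rcond=None)
```

The refit uses only the retained points, which is the effect described for filter=True.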
- get_filtering_table()
Returns DataFrame showing which filter removed each filtered time interval.
Time intervals removed are marked with a “1”. Time intervals kept are marked with a “0”. Time intervals removed by a previous filter are np.nan/blank. Columns/filters are in the order they were run, from left to right. The last column, labeled “all_filters”, is True for intervals that were not removed by any of the filters.
- get_length_test_period()
Get length of test period.
Uses length of data unless filter_time has been run, then uses length of the kept data after filter_time was run the first time. Subsequent uses of filter_time are ignored.
Rounds up to a period of full days.
- Returns:
Days in test period.
- Return type:
int
- get_pts_required(hrs_req=12.5)
Set number of data points required for complete test attribute.
- Parameters:
hrs_req (numeric, default 12.5) – Number of hours to be represented by final filtered test data set. Default of 12.5 hours is dictated by ASTM E2848 and corresponds to 750 1-minute data points, 150 5-minute, or 50 15-minute points.
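The point counts quoted for the default 12.5 hours follow from simple arithmetic. The helper below is a hypothetical illustration, and rounding up with `math.ceil` for intervals that do not divide evenly is an assumption.

```python
import math


def pts_required(hrs_req=12.5, interval_minutes=1):
    """Number of intervals needed to cover hrs_req hours (illustrative)."""
    return math.ceil(hrs_req * 60 / interval_minutes)


one_minute = pts_required(12.5, 1)
five_minute = pts_required(12.5, 5)
fifteen_minute = pts_required(12.5, 15)
```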
- get_reg_cols(reg_vars=None, filtered_data=True)
Get regression columns renamed with keys from regression_cols.
- Parameters:
reg_vars (list or str, default None) – By default returns all columns identified in regression_cols. A list with any combination of the keys of regression_cols is valid or pass a single key as a string.
filtered_data (bool, default true) – Return filtered or unfiltered data.
- Return type:
DataFrame
- get_summary()
Print a summary of filtering applied to the data_filtered attribute.
The summary dataframe shows the history of the filtering steps applied to the data including the timestamps remaining after each step, the timestamps removed by each step and the arguments used to call each filtering method.
If the filter arguments are cut off, the max column width can be increased by setting pd.options.display.max_colwidth.
- Parameters:
None –
- Return type:
Pandas DataFrame
- plot(combine={'ghi_csky': '(?=.*ghi)(?=.*irr)', 'inv_sum_mtr_pwr': ['(?=.*real)(?=.*pwr)(?=.*mtr)', '(?=.*pwr)(?=.*agg)'], 'poa_csky': '(?=.*poa)(?=.*irr)', 'poa_ghi': 'irr.*(poa|ghi)$', 'temp_amb_bom': '(?=.*temp)((?=.*amb)|(?=.*bom))'}, default_groups=['inv_sum_mtr_pwr', '(?=.*real)(?=.*pwr)(?=.*inv)', '(?=.*real)(?=.*pwr)(?=.*mtr)', 'poa_ghi', 'poa_csky', 'ghi_csky', 'temp_amb_bom'], width=1500, height=250, plot_defaults_path=None, **kwargs)
Create a dashboard to explore timeseries plots of the data.
The dashboard contains three tabs: Groups, Layout, and Overlay. The first tab, Groups, presents a column of plots with a separate plot overlaying the measurements for each group of the column_groups. The groups plotted are defined by the default_groups argument.
The second tab, Layout, allows manually selecting groups to plot. The button on this tab can be used to replace the column of plots on the Groups tab with the current figure on the Layout tab. Rerun this method after clicking the button to see the new plots in the Groups tab.
The third tab, Overlay, allows picking a group or any combination of individual tags to overlay on a single plot. The list of groups and tags can be filtered using regular expressions. Adding a text id in the box and clicking Update will add the current overlay to the list of groups on the Layout tab.
NOTE: If a plot defaults JSON file exists in the current working directory, the default groups will be read from that file. The file is named plot_defaults_{self.name}.json to avoid conflicts when multiple CapData objects are used in the same session. Columns in the file that are no longer present in the data are ignored with a warning.
- Parameters:
combine (dict, optional) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. See the parse_combine function for more details.
default_groups (list of str, optional) – List of regex strings to use to identify default groups to plot. See the plotting.find_default_groups function for more details.
width (int, optional) – The width of the plots on the Groups tab.
height (int, optional) – The height of the plots on the Groups tab.
plot_defaults_path (str or Path, optional) – Path to the plot defaults JSON file. Overrides the default naming scheme. When None, defaults to ./plot_defaults_{self.name}.json.
**kwargs (optional) – Additional keyword arguments are passed to the options of the scatter plot.
- Return type:
Panel tabbed layout
- predict_capacities(irr_filter=True, percent_filter=20, **kwargs)
Calculate expected capacities.
- Parameters:
irr_filter (bool, default True) – When true will filter each group of data by a percentage around the reporting irradiance for that group. The data groups are determined from the reporting irradiance attribute.
percent_filter (float or int or tuple, default 20) – Percentage or tuple of percentages used to filter each time-period group of data around the group’s reporting irradiance. Tuple option allows specifying different percentage for below and above the reporting irradiance: (below, above).
**kwargs – NOTE: Should match kwargs used to calculate reporting conditions. Passed to filter_grps which passes on to pandas Grouper to control label and closed side of intervals. See pandas Grouper documentation for details. Default is left labeled and left closed.
- print_points_summary(hrs_req=12.5)
Print summary data on the number of points collected.
- process_regression_columns(verbose=True)
Walk the regression column dictionary and calculate parameters.
See util.process_reg_cols for additional documentation.
- Parameters:
verbose (bool, default True) – By default prints summary of aggregations and parameter calculations performed while traversing the regression_cols dictionary. Set to False to prevent all output.
- reg_scatter_matrix()
Create pandas scatter matrix of regression variables.
- rename_cols(column_map)
Rename columns in data, data_filtered, and column_groups.
- Parameters:
column_map (dict) – Dictionary mapping old column names to new column names.
- rep_cond(irr_bal=False, percent_filter=20, front_poa='poa', w_vel=None, func=None, rc_kwargs={})
Calculate reporting conditions for the current regression formula.
The calculation is formula-agnostic: the right-hand-side variables of self.regression_formula drive which columns are aggregated. Always writes the result to self.rc.
- Parameters:
irr_bal (bool, default False) – If True, uses ReportingIrradiance to determine the reporting irradiance (front_poa). When True, the other reporting conditions are aggregated from the subset of data within the balanced irradiance band.
percent_filter (int, default 20) – Percentage used to define the irradiance band around the reporting irradiance when irr_bal is True. Has no effect when irr_bal is False.
front_poa (str, default 'poa') – Key in self.regression_cols whose column is used as the irradiance driver when irr_bal is True.
w_vel (numeric or None) – If not None, overrides the calculated wind speed reporting condition with this value.
func (dict, str, callable, or None, default None) – Passed to df.agg(...). A dict maps rhs variable names to aggregation functions (e.g. {'poa': perc_wrap(60), 't_amb': 'mean'}). When None, defaults to {var: 'mean' for var in rhs} where rhs is derived from self.regression_formula.
rc_kwargs (dict) – Passed to ReportingIrradiance when irr_bal is True.
- Returns:
Reporting conditions are stored on self.rc as a one-row DataFrame. Use rep_cond_freq for seasonal/monthly outputs.
- Return type:
None
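Passing a dictionary of aggregation functions to df.agg can be sketched as below. The helper `perc_wrap_sketch` is a hypothetical stand-in for a percentile wrapper and is not captest's perc_wrap; the data are made up.

```python
import pandas as pd


def perc_wrap_sketch(p):
    """Return a function computing the p-th percentile of a Series.

    Hypothetical stand-in written for this example.
    """

    def percentile(series):
        # Series.quantile expects a fraction between 0 and 1.
        return series.quantile(p / 100)

    return percentile


df = pd.DataFrame(
    {
        "poa": [600.0, 700.0, 800.0, 900.0, 1000.0],
        "t_amb": [20.0, 22.0, 24.0, 26.0, 28.0],
        "w_vel": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
func = {"poa": perc_wrap_sketch(60), "t_amb": "mean", "w_vel": "mean"}
# Aggregate each column with its own function, then shape as one row.
rc = df.agg(func).to_frame().T
```

The result is a one-row DataFrame with one column per right-hand-side variable, matching the shape described for self.rc.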
- rep_cond_freq(irr_bal=False, percent_filter=20, front_poa='poa', w_vel=None, inplace=True, func=None, freq=None, grouper_kwargs={}, rc_kwargs={})
Calculate frequency-grouped reporting conditions.
Like rep_cond but aggregates within groups defined by freq (e.g. 'MS' for month-start, '60D' for 60-day). Used for seasonal or monthly reporting tests.
- Parameters:
irr_bal (bool, default False) – See rep_cond.
percent_filter (int, default 20) – See rep_cond.
front_poa (str, default 'poa') – See rep_cond.
w_vel (numeric or None) – See rep_cond.
inplace (bool, default True) – When True writes the multi-row RC DataFrame to self.rc; when False returns the DataFrame.
func (dict, str, callable, or None, default None) – See rep_cond.
freq (str or None) – Pandas offset alias. None falls back to single-row rep_cond behavior.
grouper_kwargs (dict) – Passed to pandas.Grouper.
rc_kwargs (dict) – Passed to ReportingIrradiance when irr_bal is True.
- Returns:
Multi-row DataFrame of per-group reporting conditions when inplace=False. Otherwise stores on self.rc and returns None.
- Return type:
DataFrame or None
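Frequency-grouped aggregation of this kind can be sketched with pandas.Grouper. The data are synthetic and the plain mean is used for every column; this shows the grouping mechanics only, not the method's full behavior.

```python
import pandas as pd

index = pd.date_range("2023-01-01", "2023-03-31", freq="D")
df = pd.DataFrame({"poa": 800.0, "t_amb": 20.0, "w_vel": 3.0}, index=index)
df.loc["2023-02", "poa"] = 900.0  # make February differ for illustration

# One row of reporting conditions per month, labeled by month start.
rc_monthly = df.groupby(pd.Grouper(freq="MS")).agg("mean")
```

Each row of `rc_monthly` is indexed by the first day of its month, which is what the 'MS' offset alias produces.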
- reset_agg()
Remove aggregation columns from data and data_filtered attributes.
Does not reset filtering of data or data_filtered.
- reset_filter()
Set data_filtered to data and reset filtering summary.
- Parameters:
data (str) – ‘sim’ or ‘das’ determines if filter is on sim or das data.
- review_column_groups()
Print column_groups with nice formatting.
- scatter(filtered=True)
Create a matplotlib scatter plot of regression lhs vs. first rhs var.
Formula-agnostic: resolves the x and y columns from self.regression_formula via util.parse_regression_formula.
- Parameters:
filtered (bool, default True) – Plots filtered data when True and all data when False.
Notes
Prefer CapTest.scatter_plots for non-default regression presets; it picks the right callable from TEST_SETUPS (single or multi-panel) automatically.
- scatter_filters()
Returns an overlay of scatter plots of intervals removed for each filter.
A scatter plot of power vs irradiance is generated for the time intervals removed for each filtering step. Each of these plots is labeled and overlayed.
- scatter_hv(timeseries=False, all_reg_columns=False)
Create a holoviews scatter plot of regression lhs vs. first rhs var.
Formula-agnostic thin wrapper around captest.captest.scatter_default (with additional timeseries-overlay support, which scatter_default does not provide). For non-default regression presets prefer CapTest.scatter_plots, which picks the right callable (single or multi-panel) from TEST_SETUPS.
- Parameters:
timeseries (bool, default False) – If True, returns a layout with the scatter plot and a linked timeseries plot of the lhs variable. Selecting points in the scatter highlights them in the timeseries.
all_reg_columns (bool, default False) – If True, includes every regression column in the scatter plot’s hover tooltip in addition to the x and y variables.
- set_regression_cols(power='', poa='', t_amb='', w_vel='')
Create a dictionary linking the regression variables to data.
As of v0.15.0 prefer using a predefined test setup that includes a regression column dictionary or assigning a dictionary to the regression_cols attribute directly.
Links the independent regression variables to the appropriate translation keys or a column name may be used to specify a single column of data.
Sets attribute and returns nothing.
- Parameters:
power (str) – Translation key for the power variable.
poa (str) – Translation key for the plane of array (poa) irradiance variable.
t_amb (str) – Translation key for the ambient temperature variable.
w_vel (str) – Translation key for the wind velocity variable.
- set_test_complete(pts_required)
Sets test_complete attribute.
- Parameters:
pts_required (int) – Number of points required to remain after filtering for a complete test.
- spatial_uncert(column_groups)
Spatial uncertainties of the independent regression variables.
- Parameters:
column_groups (list) – Measurement groups to calculate spatial uncertainty.
- Return type:
None, stores dictionary of spatial uncertainties as an attribute.
- timeseries_filters()
Returns an overlay of scatter plots of intervals removed for each filter.
A scatter plot of power vs irradiance is generated for the time intervals removed for each filtering step. Each of these plots is labeled and overlaid.
- uncertainty()
Calculate random standard uncertainty of the regression.
(SEE times the square root of the leverage of the reporting conditions).
Not fully implemented yet. Need to review and determine what actual variable should be.
- class captest.capdata.FilteredLocIndexer(_capdata)
Bases: object
Class to implement __getitem__ for indexing the CapData.data_filtered dataframe.
Allows passing a column_groups key, a list of column_groups keys, or a column or list of columns of the CapData.data_filtered dataframe.
- class captest.capdata.LocIndexer(_capdata)
Bases: object
Class to implement __getitem__ for indexing the CapData.data dataframe.
Allows passing a column_groups key, a list of column_groups keys, or a column or list of columns of the CapData.data dataframe.
- class captest.capdata.ReportingIrradiance(df, irr_col, **param)
Bases: Parameterized
- dashboard()
- df = None
- get_rep_irr()
Calculates the reporting irradiance.
- Returns:
Float reporting irradiance and filtered dataframe.
- Return type:
Tuple
- irr_col = 'GlobInc'
- irr_rc = 0.0
- max_percent_above = 60
- max_ref_irradiance = None
- min_percent_below = 40
- min_ref_irradiance = None
- name = 'ReportingIrradiance'
- percent_band = 20
- plot()
- poa_flt = None
- points_required = 750
- rc_irr_60th_perc = 0.0
- save_csv(output_csv_path)
Save possible reporting irradiance data to csv file at given path.
- save_plot(output_plot_path=None)
Save a plot of the possible reporting irradiances and time intervals.
Saves plot as an html file at path given.
- Parameters:
output_plot_path (str or Path) – Path to save plot to.
- total_pts = 0.0
- captest.capdata.abs_diff_from_average(series, threshold)
Check that each value in series is within threshold of the average of the other values.
Drops NaNs from series before calculating difference from average for each value.
Returns True if there is only one value in the series.
- Parameters:
series (pd.Series) – Pandas series of values to check.
threshold (numeric) – Threshold value for absolute difference from average.
- Return type:
bool
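The check described above can be sketched in pure Python. This is a minimal illustration, not the library's implementation: it assumes "average of other values" means the mean of the remaining values with the current one left out, and that the comparison against the threshold is inclusive. The name within_threshold_of_others is hypothetical.

```python
import math


def within_threshold_of_others(values, threshold):
    """Sketch: True if every value is within `threshold` of the mean of the others."""
    # Drop NaNs first, as the docstring describes.
    clean = [v for v in values if not math.isnan(v)]
    # With fewer than two values there is nothing to compare against.
    if len(clean) < 2:
        return True
    total = sum(clean)
    for v in clean:
        # Leave-one-out mean (assumed reading of "average of other values").
        mean_of_others = (total - v) / (len(clean) - 1)
        if abs(v - mean_of_others) > threshold:
            return False
    return True


consistent = within_threshold_of_others([10.0, 10.5, 9.5], 1.0)
inconsistent = within_threshold_of_others([10.0, 10.5, 20.0], 1.0)
```

In the second call the reading of 20.0 pulls the leave-one-out means far from the other values, so the check fails.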
- captest.capdata.check_all_perc_diff_comb(series, perc_diff)
Check series for pairs of values with percent difference above perc_diff.
Calculates the percent difference between all combinations of two values in the passed series and checks if all of them are below the passed perc_diff.
- Parameters:
series (pd.Series) – Pandas series of values to check.
perc_diff (float) – Percent difference threshold value as decimal i.e. 5% is 0.05.
- Return type:
bool
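A minimal sketch of the pairwise check, using only the standard library. The percent-difference definition used here (absolute difference divided by the mean of the two values) is an assumption, since this section does not define perc_difference; the function names are hypothetical and this is not presented as the library's code.

```python
from itertools import combinations


def percent_difference(x, y):
    """Symmetric percent difference as a decimal (assumed definition)."""
    if x == y:
        return 0.0
    denominator = abs(x + y) / 2
    if denominator == 0:
        # Opposite-signed values of equal magnitude: treat as unbounded.
        return float("inf")
    return abs(x - y) / denominator


def all_pairs_below(values, perc_diff):
    """Sketch: True if every pair of values differs by less than `perc_diff`."""
    return all(
        percent_difference(x, y) < perc_diff for x, y in combinations(values, 2)
    )


close_sensors = all_pairs_below([100.0, 102.0, 104.0], 0.05)
drifting_sensor = all_pairs_below([100.0, 110.0], 0.05)
```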
- captest.capdata.csky(time_source, loc=None, sys=None, concat=True, output='both')
Calculate clear sky poa and ghi.
- Parameters:
time_source (dataframe or DatetimeIndex) – If passing a dataframe, the index of the dataframe will be used. If the index does not have a timezone, the timezone will be set using the timezone in the passed loc dictionary. If passing a DatetimeIndex with a timezone, it will be returned directly. If passing a DatetimeIndex without a timezone, the timezone in the passed loc dictionary will be used.
loc (dict) –
Dictionary of values required to instantiate a pvlib Location object.
loc = {'latitude': float, 'longitude': float, 'altitude': float/int, 'tz': str, int, float, default 'UTC'}
See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a list of valid time zones. ints and floats must be in hours from UTC.
sys (dict) –
Dictionary of keywords required to create a pvlib SingleAxisTrackerMount or FixedMount.
Example dictionaries:
fixed_sys = {'surface_tilt': 20, 'surface_azimuth': 180, 'albedo': 0.2}
tracker_sys1 = {'axis_tilt': 0, 'axis_azimuth': 0, 'max_angle': 90, 'backtrack': True, 'gcr': 0.2, 'albedo': 0.2}
Refer to pvlib documentation for details.
concat (bool, default True) – If True, returns the columns selected by the output argument added to the passed dataframe; otherwise returns just the clear sky data.
output (str, default 'both') – Selects which clear sky columns are returned:
both - returns only total poa and ghi
poa_all - returns all components of poa
ghi_all - returns all components of ghi
all - returns all components of poa and ghi
- captest.capdata.filter_grps(grps, rcs, irr_col, low, high, freq, **kwargs)
Apply irradiance filter around passed reporting irradiances to groupby.
For each group in the grps argument the irradiance is filtered by a percentage around the reporting irradiance provided in rcs.
- Parameters:
grps (pandas groupby) – Groupby object with time groups (months, seasons, etc.).
rcs (pandas DataFrame) – Dataframe of reporting conditions. Use the rep_cond method to generate a dataframe for this argument.
irr_col (str) – String that is the name of the column with the irradiance data.
low (float) – Minimum value as fraction e.g. 0.8.
high (float) – Max value as fraction e.g. 1.2.
freq (str) – Frequency to groupby e.g. ‘MS’ for month start.
**kwargs – Passed to pandas Grouper to control label and closed side of intervals. See pandas Grouper documentation for details. Default is left labeled and left closed.
- Return type:
pandas groupby
- captest.capdata.filter_irr(df, irr_col, low, high, ref_val=None)
Top level filter on irradiance values.
- Parameters:
df (DataFrame) – Dataframe to be filtered.
irr_col (str) – String that is the name of the column with the irradiance data.
low (float or int) – Minimum value as a fraction (e.g. 0.8) or as an absolute irradiance (e.g. 200 W/m^2).
high (float or int) – Maximum value as a fraction (e.g. 1.2) or as an absolute irradiance (e.g. 800 W/m^2).
ref_val (float or int) – Must be provided when low/high are fractions.
- Return type:
DataFrame
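The two ways of specifying bounds can be illustrated with a small pure-Python sketch that works on a plain list rather than a DataFrame. It assumes fractional bounds are multiplied by ref_val and that the band is inclusive at both ends; neither detail is stated in this section, and the function names are hypothetical.

```python
def irradiance_limits(low, high, ref_val=None):
    """Sketch: convert fractional bounds to absolute W/m^2 when ref_val is given."""
    if ref_val is not None:
        return low * ref_val, high * ref_val
    return low, high


def filter_values(irradiance, low, high, ref_val=None):
    """Keep readings inside the (assumed inclusive) irradiance band."""
    lower, upper = irradiance_limits(low, high, ref_val)
    return [v for v in irradiance if lower <= v <= upper]


readings = [150, 450, 700, 800, 980]
# Absolute bounds in W/m^2.
absolute = filter_values(readings, 200, 800)
# Fractional bounds around a hypothetical 800 W/m^2 reference value.
fractional = filter_values(readings, 0.8, 1.2, ref_val=800)
```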
- captest.capdata.fit_model(df, fml='power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1')
Fits linear regression using statsmodels to dataframe passed.
Dataframe must be first argument for use with pandas groupby object apply method.
- Parameters:
df (pandas dataframe) –
fml (str) – Formula to fit. Refer to the statsmodels and patsy documentation for the format. Default is the formula in ASTM E2848.
- Return type:
Statsmodels linear model regression results wrapper object.
- captest.capdata.get_tz_index(time_source, loc)
Create DatetimeIndex with timezone aligned with location dictionary.
Handles generating a DatetimeIndex with a timezone for use as an argument to pvlib ModelChain prepare_inputs method or pvlib Location get_clearsky method.
- Parameters:
time_source (Dataframe, Series, or DatetimeIndex) – If passing a Dataframe or Series, the index of the dataframe will be used. If the index does not have a timezone, the timezone will be set using the timezone in the passed loc dictionary. If passing a DatetimeIndex with a timezone, it will be returned directly. If passing a DatetimeIndex without a timezone, the timezone will be set using the timezone in the passed loc dictionary.
- Return type:
DatetimeIndex with timezone
- captest.capdata.index_capdata(capdata, label, filtered=True)
Like Dataframe.loc but for CapData objects.
Pass a single label or list of labels to select the columns from the data or data_filtered DataFrames. The label can be a column name, a column group key, or a regression column key.
The special label regcols will return the columns identified in regression_cols.
- Parameters:
capdata (CapData) – The CapData object to select from.
label (str or list) – The label or list of labels to select from the data or data_filtered DataFrames. The label can be a column name, a column group key, or a regression column key. The special label regcols will return the columns identified in regression_cols.
filtered (bool, default True) – By default the method will return columns from the data_filtered DataFrame. Set to False to return columns from the data DataFrame.
- Return type:
DataFrame
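How a label might expand into column names can be sketched with plain dictionaries. The lookup order (regression key first, then column group key, then column name) is an assumption, as are all column and group names below; the sketch only illustrates the kinds of labels described above and is not the library's implementation.

```python
def expand(name, columns, column_groups):
    """Expand a group key to its columns, or return a single column name."""
    if name in column_groups:
        return list(column_groups[name])
    if name in columns:
        return [name]
    raise KeyError(name)


def resolve_labels(label, columns, column_groups, regression_cols):
    """Sketch: turn a label or list of labels into a flat list of column names."""
    labels = [label] if isinstance(label, str) else list(label)
    resolved = []
    for item in labels:
        if item == "regcols":
            # Special label: every column referenced by regression_cols.
            names = list(regression_cols.values())
        elif item in regression_cols:
            names = [regression_cols[item]]
        else:
            names = [item]
        for name in names:
            resolved.extend(expand(name, columns, column_groups))
    return resolved


# Hypothetical data layout.
columns = ["meter_kw", "poa_1", "poa_2", "temp_amb", "wind"]
column_groups = {"irr_poa": ["poa_1", "poa_2"]}
regression_cols = {
    "power": "meter_kw",
    "poa": "irr_poa",
    "t_amb": "temp_amb",
    "w_vel": "wind",
}
all_reg = resolve_labels("regcols", columns, column_groups, regression_cols)
```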
- captest.capdata.perc_bounds(percent_filter)
Convert +/- percentage to decimals to be used to determine bounds.
- Parameters:
percent_filter (float or tuple, default None) – Percentage or tuple of percentages used to filter around the reporting irradiance. Required when irr_bal is True in rep_cond.
- Returns:
Decimal versions of the percent irradiance filter. 0.8 and 1.2 would be returned when passing 20 to the input.
- Return type:
tuple
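The conversion can be sketched as follows. The single-number case follows the example above (20 gives 0.8 and 1.2); the tuple case assumes the order is (percent below, percent above), which this section does not state. The function name is hypothetical.

```python
def percent_to_bounds(percent_filter):
    """Sketch: convert a +/- percentage into (low, high) decimal multipliers."""
    if isinstance(percent_filter, tuple):
        # Assumed convention: (percent below, percent above).
        below, above = percent_filter
    else:
        below = above = percent_filter
    return 1 - below / 100, 1 + above / 100


symmetric = percent_to_bounds(20)
asymmetric = percent_to_bounds((10, 25))
```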
- captest.capdata.perc_difference(x, y)
Calculate percent difference of two values.
- captest.capdata.pred_summary(grps, rcs, allowance, **kwargs)
Summarize reporting conditions, predicted cap, and guaranteed cap.
This method does not calculate reporting conditions.
- Parameters:
grps (pandas groupby object) – Solar data grouped by season or month used to calculate reporting conditions. This argument is used to fit models for each group.
rcs (pandas dataframe) – Dataframe of reporting conditions used to predict capacities.
allowance (float) – Percent allowance to calculate guaranteed capacity from predicted capacity.
- Returns:
Dataframe of reporting conditions, model coefficients, predicted capacities, guaranteed capacities, and points in each grouping.
- captest.capdata.predict(regs, rcs)
Calculate predicted values for given linear models and predictor values.
Evaluates the first linear model in the iterable with the first row of the predictor values in the dataframe. Passed arguments must be aligned.
- Parameters:
regs (iterable of statsmodels regression results wrappers) –
rcs (pandas dataframe) – Dataframe of predictor values used to evaluate each linear model. The column names must match the strings used in the regression formula.
- Return type:
Pandas series of predicted values.
- captest.capdata.predict_with_pvalue_check(cd, rc=None, pval_threshold=0.05)
Make prediction with optional p-value filtering of coefficients.
Uses model.predict() with custom params to ensure consistent behavior across pandas 2.x and 3.0+ (avoids Copy-on-Write issues).
- Parameters:
cd (CapData) – Instance of CapData with: - regression_results attribute (fitted statsmodels results) - rc attribute (reporting conditions DataFrame), used if rc param is None
rc (DataFrame, optional) – Reporting conditions DataFrame. If None, uses cd.rc.
pval_threshold (float, default 0.05) – If provided, coefficients with p-value > threshold are set to zero before making the prediction. Set to None to skip pval check.
- Returns:
Predicted value at reporting conditions.
- Return type:
float
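The effect of the p-value check can be sketched without statsmodels by evaluating the default fit_model formula (power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1) from plain dictionaries. The coefficient values, p-values, and reporting conditions below are invented for illustration, and the function name is hypothetical; the library itself works through the fitted results object.

```python
def predict_power(params, pvalues, rc, pval_threshold=0.05):
    """Sketch: evaluate the default regression with insignificant terms zeroed."""
    # Regressor values for the no-intercept default formula.
    terms = {
        "poa": rc["poa"],
        "I(poa * poa)": rc["poa"] * rc["poa"],
        "I(poa * t_amb)": rc["poa"] * rc["t_amb"],
        "I(poa * w_vel)": rc["poa"] * rc["w_vel"],
    }
    total = 0.0
    for name, value in terms.items():
        coef = params[name]
        # Zero any coefficient whose p-value exceeds the threshold.
        if pval_threshold is not None and pvalues[name] > pval_threshold:
            coef = 0.0
        total += coef * value
    return total


# Hypothetical fitted values and reporting conditions.
params = {"poa": 1.0, "I(poa * poa)": -0.0001, "I(poa * t_amb)": -0.002, "I(poa * w_vel)": 0.01}
pvalues = {"poa": 0.0, "I(poa * poa)": 0.0, "I(poa * t_amb)": 0.01, "I(poa * w_vel)": 0.4}
rc = {"poa": 800.0, "t_amb": 20.0, "w_vel": 2.0}

unchecked = predict_power(params, pvalues, rc, pval_threshold=None)
checked = predict_power(params, pvalues, rc)
```

With these made-up numbers the wind term has a p-value of 0.4, so it is dropped when the check is active and the two predictions differ by that term's contribution.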
- captest.capdata.pvlib_location(loc)
Create a pvlib location object.
- Parameters:
loc (dict) –
Dictionary of values required to instantiate a pvlib Location object.
loc = {'latitude': float, 'longitude': float, 'altitude': float/int, 'tz': str, int, float, default 'UTC'}
See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a list of valid time zones. ints and floats must be in hours from UTC.
- Return type:
pvlib location object.
- captest.capdata.pvlib_system(sys)
Create a pvlib PVSystem object.
The PVSystem will have either a FixedMount or a SingleAxisTrackerMount depending on the keys of the passed dictionary.
- Parameters:
sys (dict) –
Dictionary of keywords required to create a pvlib SingleAxisTrackerMount or FixedMount, plus albedo.
Example dictionaries:
fixed_sys = {'surface_tilt': 20, 'surface_azimuth': 180, 'albedo': 0.2}
tracker_sys1 = {'axis_tilt': 0, 'axis_azimuth': 0, 'max_angle': 90, 'backtrack': True, 'gcr': 0.2, 'albedo': 0.2}
Refer to pvlib documentation for details.
- Return type:
pvlib PVSystem object.
- captest.capdata.round_kwarg_floats(kwarg_dict, decimals=3)
Round float values in a dictionary.
- Parameters:
kwarg_dict (dict) –
decimals (int, default 3) – Number of decimal places to round to.
- Returns:
Dictionary with rounded floats.
- Return type:
dict
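A minimal sketch of the rounding behavior, assuming non-float values are passed through unchanged and a new dictionary is returned (neither is stated above). The function name and sample keys are hypothetical.

```python
def round_floats(kwarg_dict, decimals=3):
    """Sketch: round float values and leave every other value unchanged."""
    return {
        key: round(value, decimals) if isinstance(value, float) else value
        for key, value in kwarg_dict.items()
    }


rounded = round_floats({"low": 0.80000001, "freq": "MS", "n": 5})
```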
- captest.capdata.run_test(cd, steps)
Apply a list of capacity test steps to a given CapData object.
A list of CapData methods is applied sequentially with the passed parameters. This method allows succinctly defining a capacity test, which facilitates parametric and automatic testing.
- Parameters:
cd (CapData) – The CapData methods will be applied to this instance of the pvcaptest CapData class.
steps (list of tuples) – A list of the methods to be applied and the arguments to be used. Each item in the list should be a tuple of the CapData method followed by a tuple of arguments and a dictionary of keyword arguments. If there are no args or kwargs, an empty tuple or dict should be included. Example: [(CapData.filter_irr, (400, 1500), {})]
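The steps format can be illustrated with a toy class standing in for CapData. ToyData and its methods are hypothetical, and apply_steps is only a sketch of the sequential application described above, assuming each method is called unbound with the object as its first argument.

```python
class ToyData:
    """Hypothetical stand-in for a CapData-like object."""

    def __init__(self, values):
        self.values = list(values)

    def filter_range(self, low, high):
        self.values = [v for v in self.values if low <= v <= high]

    def filter_max(self, limit=None):
        if limit is not None:
            self.values = [v for v in self.values if v <= limit]


def apply_steps(obj, steps):
    """Sketch: call each (method, args, kwargs) step on obj, in order."""
    for method, args, kwargs in steps:
        method(obj, *args, **kwargs)


toy = ToyData([100, 450, 900, 1200, 1600])
steps = [
    (ToyData.filter_range, (400, 1500), {}),
    (ToyData.filter_max, (), {"limit": 1000}),
]
apply_steps(toy, steps)
```

Because the whole test is a plain list, it can be built programmatically, for example to sweep a filter threshold across several runs.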
- captest.capdata.sensor_filter(df, threshold, row_filter=<function check_all_perc_diff_comb>)
Check dataframe for rows with inconsistent values.
Applies check_all_perc_diff_comb function along rows of passed dataframe.
- Parameters:
df (pandas DataFrame) –
threshold (float) – Threshold passed to row_filter. For the default check_all_perc_diff_comb this is a percent difference as a decimal.
- captest.capdata.spans_year(start_date, end_date)
Determine if dates passed are in the same year.
- Parameters:
start_date (pandas Timestamp) –
end_date (pandas Timestamp) –
- Return type:
bool
- captest.capdata.tstamp_kwarg_to_strings(kwarg_dict)
Convert timestamp values in dictionary to strings.
- Parameters:
kwarg_dict (dict) –
- Return type:
dict
- captest.capdata.update_summary(func)
Decorates the CapData class filter methods.
Updates the CapData.summary and CapData.summary_ix attributes, which are used to generate summary data by the CapData.get_summary method.
- captest.capdata.wrap_seasons(df, freq)
Rearrange an 8760 so a quarterly groupby will result in seasonal groups.
- Parameters:
df (DataFrame) – Dataframe to be rearranged.
freq (str) – String pandas offset alias to specify aggregation frequency for reporting condition calculation.
- Return type:
DataFrame
- captest.capdata.wrap_year_end(df, start, end)
Shifts data before or after new year to form a contiguous time period.
This function shifts data from the end of the year a year back or data from the beginning of the year a year forward, to create a contiguous time period. Intended to be used on historical typical year data.
If start date is in dataframe, then data at the beginning of the year will be moved ahead one year. If end date is in dataframe, then data at the end of the year will be moved back one year.
cntg (contiguous); eoy (end of year)
- Parameters:
df (pandas DataFrame) – Dataframe to be adjusted.
start (pandas Timestamp) – Start date for time period.
end (pandas Timestamp) – End date for time period.
captest.captest module
Unified test orchestrator and supporting utilities.
This module houses the CapTest class, the TEST_SETUPS registry of
named regression presets, and small formatting helpers (print_results,
highlight_pvals, perc_wrap) consumed by CapTest methods that
compare a measured + modeled pair of CapData instances.
Import direction
At module-import time the dependency is one-way only:
captest.captest -> captest.capdata. CapData is imported here at
module scope so CapTest can declare meas/sim as
param.ClassSelector(class_=CapData). captest.capdata does NOT import
anything from this module at import time; the single-CapData helper
predict_with_pvalue_check is imported lazily from within
CapTest.captest_results.
- class captest.captest.CapTest(**kwargs)
Bases: Parameterized
Config + state container for an ASTM E2848 capacity test.
CapTest binds a measured CapData and a modeled CapData to a named regression preset from TEST_SETUPS and holds all test-level configuration in one place. It is intentionally a config + state container rather than a runner: users still invoke ct.meas.filter_*(...), ct.meas.rep_cond(...), and ct.meas.fit_regression() by hand.
Typical workflows
Programmatic:
ct = CapTest.from_params(
    test_setup="e2848_default",
    meas=meas_cd,
    sim=sim_cd,
    ac_nameplate=125_000,
    test_tolerance="- 4",
)
# ``from_params`` runs ``setup()`` automatically because both meas
# and sim were supplied as pre-built CapData instances.
From a yaml file:
ct = CapTest.from_yaml("./config.yaml")
Bare + manual:
ct = CapTest(test_setup="bifi_e2848_etotal", bifaciality=0.15)
ct.meas = my_meas_cd
ct.sim = my_sim_cd
ct.setup()
- Parameters:
meas (CapData or None) – Measured-data CapData instance. Assigned via from_params, from_yaml, or directly.
sim (CapData or None) – Modeled-data CapData instance.
test_setup (str) – Key into TEST_SETUPS or the literal "custom". Default "e2848_default".
reg_fml (str or None) – If set, overrides the preset’s regression formula at setup().
reg_cols_meas (dict or None) – If set, overrides the preset’s measured regression_cols dict.
reg_cols_sim (dict or None) – If set, overrides the preset’s modeled regression_cols dict.
rep_conditions (dict or None) – If set, partial-merged onto the preset’s rep_conditions at setup(). Top-level keys replace; the nested func dict is merged one level deep so users can override only a single variable’s aggregation.
rep_cond_source ({“meas”, “sim”}) – Which CapData.rc is used by captest_results. Default "meas".
sim_days (int) – Days of simulated data used for the test. Default 30.
shade_filter_start (str or None) – "HH:MM" between-time strings for shade filtering.
shade_filter_end (str or None) – "HH:MM" between-time strings for shade filtering.
ac_nameplate (float or None) – Nameplate AC power in watts.
test_tolerance (str) – Tolerance string forwarded to pass/fail logic. Default "- 4".
min_irr (float) – Irradiance filter bounds (W/m^2).
max_irr (float) – Irradiance filter bounds (W/m^2).
clipping_irr (float) – Irradiance filter bounds (W/m^2).
rep_irr_filter (float) – Fractional reporting-irradiance filter band in [0, 1].
fshdbm (float) – Shade filter threshold in [0, 1].
irrad_stability ({“std”, “filter_clearsky”, “contract”}) – Irradiance stability strategy.
irrad_stability_threshold (float) – Threshold value for irrad_stability.
hrs_req (float) – Hours of data required for a complete test. Default 12.5.
bifaciality (float) – Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.
power_temp_coeff (float) – Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.
base_temp (float) – Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.
meas_loader (callable or None) – Programmatic-only data-loader callables. Default resolution when None: captest.io.load_data and captest.io.load_pvsyst respectively. Not serialized to yaml.
sim_loader (callable or None) – Programmatic-only data-loader callables. Default resolution when None: captest.io.load_data and captest.io.load_pvsyst respectively. Not serialized to yaml.
meas_load_kwargs (dict or None) – Plain-dict kwargs splatted into the loaders.
sim_load_kwargs (dict or None) – Plain-dict kwargs splatted into the loaders.
- _resolved_setup
The fully-resolved TEST_SETUPS entry after setup() has run. Plain instance attribute (not a param.*) so setup() can be called multiple times.
- Type:
dict or None
- rep_irr_filter_low
Read-only. Lower irradiance fraction bound derived from rep_irr_filter: 1 - rep_irr_filter. For example, when rep_irr_filter=0.2 this is 0.8. Pass as low to CapData.filter_irr together with a ref_val.
- Type:
float
- rep_irr_filter_high
Read-only. Upper irradiance fraction bound derived from rep_irr_filter: 1 + rep_irr_filter. For example, when rep_irr_filter=0.2 this is 1.2. Pass as high to CapData.filter_irr together with a ref_val.
- Type:
float
Notes
The lhs key of the regression formula is always "power" across shipped presets, even when the formula regresses a derived quantity (e.g. temperature-corrected power).
- ac_nameplate = None
- base_temp = 25
- bifaciality = 0.0
- captest_results(check_pvalues=False, pval=0.05, print_res=True)
Compute the capacity test ratio for self.meas vs self.sim.
Picks reporting conditions from self.meas.rc or self.sim.rc based on self.rep_cond_source. Uses self.ac_nameplate for the tested-capacity printout and self.test_tolerance (via self.determine_pass_or_fail) for the pass/fail result.
- Parameters:
check_pvalues (bool, default False) – When True, coefficients with a p-value above pval are zeroed before prediction.
pval (float, default 0.05) – P-value cutoff used when check_pvalues is True.
print_res (bool, default True) – When True, prints the formatted results.
- Returns:
Capacity test ratio actual / expected.
- Return type:
float
- captest_results_check_pvalues(print_res=False, **kwargs)
Compute cap ratio with and without p-value filtering.
- Parameters:
print_res (bool, default False) – Forwarded to both internal captest_results calls.
**kwargs – Forwarded to captest_results. Do not pass check_pvalues; this method sets it explicitly for each internal call.
- Returns:
Styled DataFrame with p-values and parameter values for both self.meas and self.sim. P-values >= 0.05 are highlighted.
- Return type:
pandas.io.formats.style.Styler
- clipping_irr = 1000
- determine_pass_or_fail(cap_ratio)
Determine a pass/fail result from a capacity ratio.
Uses self.test_tolerance and self.ac_nameplate. Replaces the pre-CapTest module-level capdata.determine_pass_or_fail.
- Parameters:
cap_ratio (float) – Ratio of the measured-data regression result to the modeled-data regression result.
- Returns:
Pass/fail flag and the tolerance bounds string.
- Return type:
tuple of (bool, str)
- classmethod from_mapping(sub, *, key='captest', base_dir=None, meas_loader=None, sim_loader=None)
Construct a CapTest from an already-parsed captest sub-mapping.
Direct-handoff constructor used by downstream wrappers that mutate the captest sub-mapping in memory – applying project-specific defaults, promoting fields, injecting paths – before asking captest to validate and build the CapTest. Exposes the same validate-and-construct pipeline that from_yaml runs after reading the file, without the file read.
- Parameters:
sub (dict) – Captest sub-mapping. Typically obtained from load_config() or assembled by a downstream wrapper. Must contain test_setup. Supported keys are declared by _CAPTEST_YAML_KEYS / _CAPTEST_OVERRIDE_KEYS. sub is not mutated.
key (str, default 'captest') – Purely used in error messages (e.g. “Unknown key ‘x’ under the ‘captest’ sub-mapping”). Match the top-level yaml key under which this sub-mapping would normally live so error messages point users at the right place in their config file.
base_dir (str, Path, or None, default None) – Base directory used to resolve relative meas_path / sim_path values in sub. If the sub-mapping contains any relative path and base_dir is None, raises ValueError. URI-scheme values in the sub-mapping (e.g. s3://bucket/path) are treated as absolute and skip resolution even though pathlib.Path.is_absolute() returns False for them. URI-scheme base_dir values are joined to relative paths via string concatenation so the scheme is preserved; local base_dir values are joined via pathlib.Path.
meas_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Same semantics as from_yaml().
sim_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Same semantics as from_yaml().
- Return type:
CapTest
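The base_dir resolution rules for meas_path / sim_path can be sketched with the standard library. The regular expression used to detect a URI scheme and the function name resolve_data_path are assumptions made for illustration; the sketch follows the rules stated for base_dir but is not the library's code.

```python
import re
from pathlib import Path

# Assumed test for "URI-scheme" values such as s3://bucket/path.
URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def resolve_data_path(value, base_dir=None):
    """Sketch: resolve a meas_path / sim_path value against base_dir."""
    value = str(value)
    # URI-style values and absolute local paths are used as given.
    if URI_SCHEME.match(value) or Path(value).is_absolute():
        return value
    if base_dir is None:
        raise ValueError(f"relative path {value!r} requires base_dir")
    base = str(base_dir)
    if URI_SCHEME.match(base):
        # Join with string concatenation so the scheme is preserved.
        return base.rstrip("/") + "/" + value
    # Local base directories are joined with pathlib.
    return str(Path(base) / value)


remote = resolve_data_path("s3://bucket/meas.csv")
joined_remote = resolve_data_path("data/meas.csv", "s3://bucket/project")
joined_local = resolve_data_path("data/meas.csv", "/project")
```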
- classmethod from_params(**kwargs)
Construct a CapTest from parameter kwargs.
Recognizes the non-param kwargs meas, sim, meas_path, sim_path in addition to every declared param.*. If both meas and meas_path are supplied the pre-built instance wins and a warning is emitted (same for sim / sim_path).
When both meas and sim end up populated, setup() is called automatically. Otherwise the partially-initialized instance is returned and the caller finishes the workflow manually.
- Parameters:
**kwargs – Any declared CapTest parameter, plus meas, sim, meas_path, sim_path.
- Return type:
CapTest
- classmethod from_yaml(path, key='captest', meas_loader=None, sim_loader=None)
Construct a CapTest from a yaml config file.
Reads the sub-mapping at the given top-level key of the yaml file and delegates to from_mapping() with base_dir=path.parent so relative meas_path / sim_path values resolve against the yaml’s directory.
- Parameters:
path (str or Path) – Path to a yaml file.
key (str, default 'captest') – Top-level key whose value is the CapTest sub-mapping.
meas_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Supplied here because loader callables cannot be represented in yaml. Useful for downstream wrappers that drive yaml-based construction but need a custom measured-data loader. When None the default resolution applies.
sim_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Supplied here because loader callables cannot be represented in yaml. Useful for downstream wrappers that drive yaml-based construction but need a custom measured-data loader. When None the default resolution applies.
- Return type:
CapTest
- fshdbm = 1.0
- get_summary()
Concatenate self.meas.get_summary() and self.sim.get_summary().
- Returns:
Filter history for both CapData instances, stacked.
- Return type:
pandas.DataFrame
- hrs_req = 12.5
- irrad_stability = 'std'
- irrad_stability_threshold = 30
- max_irr = 1400
- meas = None
- meas_load_kwargs = None
- meas_loader = None
- min_irr = 400
- name = 'CapTest'
- overlay_scatters(expected_label='PVsyst')
Overlay the final scatter plot from self.meas and self.sim.
Builds the scatter plot for each CapData instance via the resolved preset’s scatter_plots callable, then overlays the two first-panel scatters with labels.
- Parameters:
expected_label (str, default "PVsyst") – Label used for the modeled-data scatter.
- Return type:
hv.Overlay
- power_temp_coeff = -0.32
- reg_cols_meas = None
- reg_cols_sim = None
- reg_fml = None
- rep_cond(which='meas', **overrides)
Call cd.rep_cond with the resolved preset’s rep_conditions.
The preset’s rep_conditions dict (after any self.rep_conditions overrides from setup()) is used as the default kwargs. overrides is partial-merged on top: top-level keys replace, the nested func dict merges one level deep.
- Parameters:
which ({'meas', 'sim'}) – Which CapData instance’s rep_cond to call.
**overrides – Partial-merged onto the resolved rep_conditions dict.
- Returns:
cd.rep_cond writes to cd.rc.
- Return type:
None
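The partial-merge rule (top-level keys replace, the nested func dict merges one level deep) can be sketched with plain dictionaries. The function name partial_merge and the sample keys and values are hypothetical; this is an illustration of the stated rule rather than the library's implementation.

```python
def partial_merge(defaults, overrides):
    """Sketch: top-level keys replace; the nested 'func' dict merges one level deep."""
    merged = dict(defaults)
    for key, value in overrides.items():
        both_dicts = isinstance(value, dict) and isinstance(merged.get("func"), dict)
        if key == "func" and both_dicts:
            # Merge one level deep so untouched variables keep their aggregation.
            merged["func"] = {**merged["func"], **value}
        else:
            merged[key] = value
    return merged


# Hypothetical preset and override.
preset = {
    "percent_filter": 20,
    "func": {"poa": "perc_60", "t_amb": "mean", "w_vel": "mean"},
}
resolved = partial_merge(preset, {"func": {"poa": "mean"}, "percent_filter": 10})
```

Only the poa aggregation changes in the merged func dict, and the preset itself is left unmodified.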
- rep_cond_source = 'meas'
- rep_conditions = None
- rep_irr_filter = 0.2
- property rep_irr_filter_high
Upper irradiance fraction bound derived from rep_irr_filter.
Equal to 1 + rep_irr_filter. Updates automatically whenever rep_irr_filter is reassigned. Pass as the high argument to CapData.filter_irr with a ref_val to filter within the reporting-irradiance band.
- property rep_irr_filter_low
Lower irradiance fraction bound derived from rep_irr_filter.
Equal to 1 - rep_irr_filter. Updates automatically whenever rep_irr_filter is reassigned. Pass as the low argument to CapData.filter_irr with a ref_val to filter within the reporting-irradiance band.
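The derived, read-only bounds behave like ordinary Python properties computed from rep_irr_filter. The class below is a hypothetical stand-in (the real CapTest is a Parameterized class), intended only to show why the bounds track reassignment and cannot be set directly.

```python
class BandConfig:
    """Hypothetical container mirroring the derived-bound behavior."""

    def __init__(self, rep_irr_filter=0.2):
        self.rep_irr_filter = rep_irr_filter

    @property
    def rep_irr_filter_low(self):
        # Recomputed on every access, so it follows rep_irr_filter.
        return 1 - self.rep_irr_filter

    @property
    def rep_irr_filter_high(self):
        return 1 + self.rep_irr_filter


cfg = BandConfig()
default_band = (cfg.rep_irr_filter_low, cfg.rep_irr_filter_high)
cfg.rep_irr_filter = 0.1
narrow_band = (cfg.rep_irr_filter_low, cfg.rep_irr_filter_high)
```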
- residual_plot()
Overlaid residual plots for self.meas and self.sim.
Each regression exogenous variable gets its own panel showing the residuals of both CapData instances overlaid. The single-CapData helper plotting.get_resid_exog_frame stays where it is.
- Return type:
hv.Layout
- property resolved_setup
Return the resolved TEST_SETUPS entry or raise if setup() not run.
- scatter_plots(which='meas', **kwargs)
Create the scatter plot for the active capacity-test setup.
This method is intended primarily to plot a power vs irradiance scatter plot that fits with a preset capacity test from the TEST_SETUPS defined in the captest module.
To create manual scatter plots and to see the complete list of accepted kwargs and their behavior, see the docstrings for captest.plotting.ScatterPlot and captest.plotting.ScatterBifiPowerTc. ScatterBifiPowerTc inherits most options from ScatterPlot but ignores tc_power because the bifi_power_tc regression power term is already temperature corrected.
The selected test_setup controls which plotting function is used. During setup(), the named setup is resolved from TEST_SETUPS; that resolved setup includes a scatter_plots callable matched to the setup’s regression formula. This method picks self.meas or self.sim and forwards it, plus any keyword arguments, to that callable.
Built-in setup behavior:
e2848_default, bifi_e2848_etotal, and e2848_spec_corrected_poa use ScatterPlot through the scatter_default / scatter_etotal wrappers. These create a formula-driven scatter of the regression left-hand-side variable against the first right-hand-side variable.
bifi_power_tc uses ScatterBifiPowerTc through the scatter_bifi_power_tc wrapper. This creates one panel for each right-hand-side variable in the bifacial temperature-corrected regression, typically power vs poa and power vs rpoa.
All keyword arguments are forwarded to the underlying plotting class. The most commonly used options are:
filtered: use data_filtered when True, otherwise data.
split_day and split_time: split points into AM and PM groups.
am_color, pm_color, am_marker, and pm_marker: customize AM / PM glyph style.
tc_power, tc_mode, tc_power_calc, and tc_force_recompute: show temperature-corrected power for setups whose regression still uses raw power. tc_mode can be "replace", "add_panel", or "overlay".
timeseries: add a linked timeseries panel below the scatter.
height and width: set plot dimensions.
- Parameters:
which ({'meas', 'sim'}) – Which captest.capdata.CapData instance to plot.
**kwargs – Plotting options forwarded to the preset’s scatter callable.
- Returns:
Scatter plot layout for the selected measured or modeled data.
- Return type:
holoviews.Layout
Examples
Plot measured data with the default options:
ct.scatter_plots()
Plot modeled data, split points into AM and PM groups, and add a linked timeseries panel:
ct.scatter_plots(which="sim", split_day=True, timeseries=True)
Add a temperature-corrected power panel for a setup that uses raw power in the regression:
ct.scatter_plots(tc_power=True, tc_mode="add_panel")
- setup(verbose=True)
Resolve TEST_SETUPS, propagate scalars, process regression cols.
Raises RuntimeError if meas or sim is unset. Assigns the resolved TEST_SETUPS entry to self._resolved_setup and returns self for fluent chaining.
- Parameters:
verbose (bool, default True) – Forwarded to CapData.process_regression_columns.
- Returns:
self, for fluent chaining.
- Return type:
CapTest
- shade_filter_end = None
- shade_filter_start = None
- sim = None
- sim_days = 30
- sim_load_kwargs = None
- sim_loader = None
- spectral_module_type = 'cdte'
- test_setup = 'e2848_default'
- test_tolerance = '- 4'
- to_yaml(path, key='captest', merge_into_existing=True)
Serialize the curated CapTest configuration to a yaml file.
The written sub-mapping lives under the top-level key (default "captest") and contains every scalar param.* plus test_setup, any non-None override of reg_fml / reg_cols_meas / reg_cols_sim / rep_conditions, meas_path / sim_path (when the instance was constructed from paths), and non-empty meas_load_kwargs / sim_load_kwargs.
Percentile perc_wrap(N) callables inside rep_conditions['func'] are written back as "perc_N" strings so that from_yaml round-trips them. meas, sim, regression_results, _resolved_setup, and the loader callables are never serialized.
- Parameters:
path (str or Path) – Destination yaml file.
key (str, default 'captest') – Top-level key under which the captest sub-mapping is written. Parametrizing this lets a single yaml hold multiple captest flavors (e.g. captest_e2848 and captest_bifi).
merge_into_existing (bool, default True) – When True and the destination file already exists and parses as a mapping, preserve the other top-level keys and overwrite only the sub-tree at key. When False, the destination is unconditionally replaced with a fresh mapping containing only key.
- Return type:
None
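The merge_into_existing behavior can be pictured with plain dictionaries. The sketch below is not the library's implementation: write_sub_mapping is a hypothetical helper that works on an already-parsed mapping instead of a yaml file, and the example keys are illustrative.

```python
def write_sub_mapping(existing, sub_mapping, key="captest", merge_into_existing=True):
    """Return the top-level mapping that would be written (hypothetical helper)."""
    if merge_into_existing and isinstance(existing, dict):
        merged = dict(existing)      # keep the other top-level keys
        merged[key] = sub_mapping    # overwrite only the sub-tree at `key`
        return merged
    # Otherwise start from a fresh mapping that contains only `key`.
    return {key: sub_mapping}


existing = {"site": {"name": "demo"}, "captest": {"test_setup": "old"}}
new_sub = {"test_setup": "e2848_default"}

merged = write_sub_mapping(existing, new_sub)
replaced = write_sub_mapping(existing, new_sub, merge_into_existing=False)
```

With merging enabled the unrelated "site" key survives; with it disabled only the "captest" key remains.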
- captest.captest.highlight_pvals(s)
Highlight Series entries >= 0.05 with a yellow background.
Intended for use with pandas.io.formats.style.Styler.apply. Consumed by CapTest.captest_results_check_pvalues (ported in Unit 7).
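A function passed to Styler.apply returns one CSS string per entry. The sketch below shows that shape with a plain list. The function name and the exact CSS string are assumptions and are not taken from the library source.

```python
def highlight_pvals_sketch(values, threshold=0.05):
    """Return one CSS string per entry; entries >= threshold get a yellow background."""
    return ["background-color: yellow" if v >= threshold else "" for v in values]


styles = highlight_pvals_sketch([0.001, 0.05, 0.2])
```

The comparison is inclusive, so a p-value of exactly 0.05 is highlighted.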
- captest.captest.load_config(path, key='captest')
Load and lightly validate the captest sub-mapping from a yaml file.
- Parameters:
path (str or Path) – Path to the yaml file. Relative paths in meas_path / sim_path are resolved by callers using Path(path).parent as the base.
key (str, default 'captest') – Top-level key whose value is the CapTest configuration sub-mapping.
- Returns:
The sub-mapping at key with string shorthands resolved. Does NOT validate against CapTest param types; CapTest.from_yaml does that.
- Return type:
dict
- Raises:
KeyError – If key is not present at the top level of the yaml file.
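Because load_config leaves relative meas_path / sim_path values untouched, a caller has to anchor them at the directory holding the yaml file. A minimal sketch of that step follows. resolve_data_path is a hypothetical helper, PurePosixPath is used only to keep the example independent of the operating system, and leaving absolute paths unchanged is an assumption.

```python
from pathlib import PurePosixPath


def resolve_data_path(config_path, data_path):
    """Resolve data_path against the directory holding the yaml file (hypothetical helper)."""
    data_path = PurePosixPath(data_path)
    if data_path.is_absolute():
        return data_path  # assumption: absolute paths are used as given
    return PurePosixPath(config_path).parent / data_path


resolved = resolve_data_path("project/config/captest.yaml", "data/meas.csv")
absolute = resolve_data_path("project/config/captest.yaml", "/srv/data/meas.csv")
```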
- captest.captest.perc_wrap(p)
Return a callable that computes the p-th percentile of a Series.
Used to build TEST_SETUPS[...]['rep_conditions']['func'] dicts for percentile-based reporting irradiance (e.g. 60th percentile POA).
- Parameters:
p (numeric) – Percentile in [0, 100].
- Returns:
Function that takes a pandas Series or array-like and returns the p-th percentile using method='nearest'.
- Return type:
callable
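The closure pattern behind perc_wrap can be sketched with the standard library alone. This is an illustration, not the library's code: perc_wrap_sketch is a hypothetical name, and its rounding of ties may differ from the method='nearest' behavior the real function relies on.

```python
def perc_wrap_sketch(p):
    """Return a callable computing the p-th percentile by picking the nearest sample."""
    def percentile(values):
        ordered = sorted(values)
        # Position of the percentile on a 0..n-1 scale, rounded to the nearest sample.
        idx = round(p / 100 * (len(ordered) - 1))
        return ordered[idx]
    return percentile


perc_50 = perc_wrap_sketch(50)
median_poa = perc_50([200, 400, 600, 800, 1000])
```

Because the result is always one of the observed samples, the reporting irradiance is a value that actually occurs in the data.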
- captest.captest.print_results(test_passed, expected, actual, cap_ratio, capacity, bounds)
Print formatted results of a capacity test.
- Parameters:
test_passed (tuple of (bool, str)) – Pass/fail flag and bounds string produced by CapTest.determine_pass_or_fail (or the legacy module-level determine_pass_or_fail in capdata.py until Unit 7 removes it).
expected (float) – Predicted modeled test output at reporting conditions.
actual (float) – Predicted measured test output at reporting conditions.
cap_ratio (float) – Capacity test ratio (actual / expected).
capacity (float) – Tested capacity (nameplate * cap_ratio).
bounds (str) – Human-readable bounds string for the test tolerance.
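The two derived arguments follow directly from the formulas in the parameter list. The numbers below are illustrative only, and the units are whatever the regression output and nameplate use.

```python
expected = 1000.0   # modeled output at reporting conditions (illustrative)
actual = 980.0      # measured output at reporting conditions (illustrative)
nameplate = 1200.0  # illustrative nameplate rating

cap_ratio = actual / expected     # capacity test ratio
capacity = nameplate * cap_ratio  # tested capacity
print(f"Capacity ratio: {cap_ratio:.3f}, tested capacity: {capacity:.1f}")
```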
- captest.captest.resolve_test_setup(name, overrides=None)
Resolve a preset by name plus optional overrides.
- Parameters:
name (str) – Key into TEST_SETUPS or the literal "custom".
overrides (dict or None) – Optional dict with any of reg_cols_meas, reg_cols_sim, reg_fml, scatter_plots, rep_conditions to override the preset. rep_conditions is partial-merged; other keys replace. When name == "custom", reg_cols_meas, reg_cols_sim, and reg_fml are required in overrides.
- Returns:
A fully-validated entry dict suitable for CapTest._resolved_setup.
- Return type:
dict
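The difference between "partial-merged" and "replace" can be shown with plain dictionaries. This is a sketch under assumptions: apply_overrides is a hypothetical helper, the merge is taken to be one level deep, and the keys inside rep_conditions (percent_filter, func) are illustrative only.

```python
def apply_overrides(preset, overrides=None):
    """Merge overrides into a copy of preset (hypothetical helper)."""
    resolved = dict(preset)
    for k, v in (overrides or {}).items():
        if k == "rep_conditions":
            # Partial merge: keep preset entries that the override does not mention.
            resolved[k] = {**preset.get(k, {}), **v}
        else:
            resolved[k] = v  # every other key replaces the preset value
    return resolved


preset = {
    "reg_fml": "power ~ poa",
    "rep_conditions": {"percent_filter": 20, "func": {"poa": "perc_60"}},
}
resolved = apply_overrides(preset, {"rep_conditions": {"percent_filter": 10}})
```

Only percent_filter changes; the func entry is carried over from the preset because the override did not mention it.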
- captest.captest.scatter_bifi_power_tc(cd, **kwargs)
Two-panel layout: lhs vs. poa and lhs vs. rpoa.
Intended for the bifi_power_tc preset whose regression formula is power ~ poa + rpoa (with power resolved to the temperature-corrected calculated column). Thin wrapper around captest.plotting.ScatterBifiPowerTc; each rhs variable gets its own panel.
- captest.captest.scatter_default(cd, **kwargs)
Formula-agnostic scatter of regression lhs vs. first rhs variable.
Thin wrapper around captest.plotting.ScatterPlot. Forwards every keyword argument through to the class constructor, so callers can opt into the AM/PM split, temperature-corrected power, and timeseries-pairing features without changing call sites.
- Parameters:
cd (CapData) – Must have regression_formula set and regression_cols resolved (e.g. via CapTest.setup() or cd.process_regression_columns()).
**kwargs – Forwarded to ScatterPlot. See its docstring for the full parameter surface.
- Returns:
A single-panel Layout wrapping the scatter plot.
- Return type:
hv.Layout
- captest.captest.scatter_etotal(cd, **kwargs)
Single scatter of regression lhs vs. the e_total column.
Intended for the bifi_e2848_etotal preset. Thin wrapper around captest.plotting.ScatterPlot; resolves the x column from cd.regression_cols['poa'] after process_regression_columns has materialized the calculated e_total column.
- captest.captest.validate_test_setup(entry)
Validate a single TEST_SETUPS entry dict.
- Raises:
KeyError – If required keys are missing or unknown keys are present.
ValueError – If reg_fml does not parse, lhs+rhs are not subsets of both reg_cols_meas and reg_cols_sim, scatter_plots is not callable, or rep_conditions / rep_conditions['func'] have an unexpected shape.
captest.columngroups module
- class captest.columngroups.ColumnGroups(dict=None, /, **kwargs)
Bases:
UserDict
- captest.columngroups.group_columns(data)
Create a dict of raw column names paired to categorical column names.
Uses multiple type_def formatted dictionaries to determine the type, sub-type, and equipment type for data series of a dataframe. The determined types are concatenated to a string used as a dictionary key with a list of one or more original column names as the paired value.
- Parameters:
data (DataFrame) – Data with columns to group.
- Returns:
cg
- Return type:
- captest.columngroups.series_type(series, type_defs)
Assign columns to a category by analyzing the column names.
The type_defs parameter is a dictionary which defines search strings for each key, where the key is a categorical name and the search strings are possible related names. For example, an irradiance sensor has the key ‘irr’ with search strings ‘irradiance’, ‘plane of array’, ‘poa’, etc.
- Parameters:
series (pandas series) – Row or column of dataframe passed by pandas.df.apply.
type_defs (dictionary) – Dictionary with the following structure: {‘category abbreviation’: [category search strings]}. See type_defs.
- Returns:
Returns a string representing the category for the series.
- Return type:
string
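The matching idea can be sketched with a plain dictionary and a column name. This is a simplified illustration: categorize is a hypothetical helper, and case-insensitive substring matching with first-match-wins ordering is an assumption and not a description of how series_type is implemented.

```python
def categorize(column_name, type_defs, default=""):
    """Return the first category whose search strings appear in column_name."""
    lowered = column_name.lower()
    for category, search_strings in type_defs.items():
        if any(s in lowered for s in search_strings):
            return category
    return default


type_defs = {
    "irr": ["irradiance", "plane of array", "poa"],
    "temp": ["temperature", "temp"],
}
irr_category = categorize("POA Irradiance W/m2", type_defs)
temp_category = categorize("Ambient Temp C", type_defs)
unknown_category = categorize("meter kW", type_defs)
```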
captest.io module
- class captest.io.DataLoader(path: str = './data/', loc: dict | None = None, sys: dict | None = None, file_reader: object = <function file_reader>, files_to_load: list | None = None, failed_to_load: list | None = None)
Bases:
object
Class to load SCADA data and return a CapData object.
Supports loading from local filesystems and S3 buckets. The optional s3fs package must be installed for S3 support.
- drop_duplicate_rows()
- failed_to_load: list | None = None
- file_reader(**kwargs)
Read measured solar data from a csv file.
Utilizes pandas read_csv to import measured solar data from a csv file. Attempts a few different encodings, tries to determine the header end by looking for a date in the first column, and concatenates column headings to a single string.
- Parameters:
path (Path) – Path to file to import.
**kwargs – Use to pass additional kwargs to pandas read_csv.
- Return type:
pandas DataFrame
- files_to_load: list | None = None
- join_files()
Combine the DataFrames of loaded_files into a single DataFrame.
Checks if the columns of each DataFrame in loaded_files matches. If they do all match, then they will be combined vertically along the index.
If they do not match, then they will be combined by creating a datetime index that begins with the earliest datetime in all the indices to the latest datetime in all the indices using the most common frequency across all the indices. The columns will be a set of all the columns.
- Returns:
data – The combined data.
- Return type:
DataFrame
- load(extension='csv', summary=True, verbose=False, raise_errors=False, skip_dir_load=False, **kwargs)
Load file(s) of timeseries data from SCADA / DAS systems.
Set path to the path to a file to load a single file. Set path to the path to a directory of files to load all the files in the directory ending in “csv”. Or, set files_to_load to a list of specific files to load. Paths may be local filesystem paths or S3 URIs (e.g. s3://bucket/path/).
Multiple files will be joined together and may include files with different column headings. When multiple files with matching column headings are loaded, the individual files will be reindexed and then joined.
Missing time intervals within the individual files will be filled, but missing time intervals between the individual files will not be filled.
When loading multiple files they will be stored in loaded_files, a dictionary, mapping the file names to a dataframe for each file.
- Parameters:
extension (str, default "csv") – Change the extension to allow loading different filetypes. Must also set the file_reader attribute to a function that will read that type of file. Do not include a period “.”.
summary (bool, default True) – By default prints path of each file attempted to load and then confirmation it was loaded or states it failed to load. Is only relevant if path is set to a directory not a file. Set to False to not print out any file loading status.
verbose (bool, default False) – Prints same output as if summary were True (sets summary True) and prints details of reindexing each file after loading.
raise_errors (bool, default False) – Set to true to raise error if file fails to load.
skip_dir_load (bool, default False) – Set to True to pass a custom file_reader that handles multiple files. This will skip the parsing of files in a directory and pass the path to the directory and kwargs to the file_reader function.
**kwargs – Are passed through to the file_reader callable, which by default will pass them on to pandas.read_csv.
- Returns:
Resulting DataFrame of data is stored to the data attribute.
- Return type:
None
- loc: dict | None = None
- path: str = './data/'
- reindex()
- reindex_loaded_files(verbose=False)
Reindex files to ensure no missing indices and find frequency for each file.
- Parameters:
verbose (bool, default False) – Set to True for more detailed output.
- Returns:
reindexed_dfs (dict) – Filenames mapped to reindexed DataFrames.
common_freq (str) – The index frequency most common across the reindexed DataFrames.
file_frequencies (list) – The index frequencies for each file.
- set_files_to_load(extension='csv')
Set files_to_load attribute to a list of filepaths.
- sort_data()
- sys: dict | None = None
- captest.io.file_reader(path, **kwargs)
Read measured solar data from a csv file.
Utilizes pandas read_csv to import measured solar data from a csv file. Attempts a few different encodings, tries to determine the header end by looking for a date in the first column, and concatenates column headings to a single string.
- Parameters:
path (Path) – Path to file to import.
**kwargs – Use to pass additional kwargs to pandas read_csv.
- Return type:
pandas DataFrame
- captest.io.flatten_multi_index(columns)
- captest.io.load_data(path, group_columns=<function group_columns>, file_reader=<function file_reader>, skip_dir_load=False, name='meas', sort=True, drop_duplicates=True, reindex=True, site=None, column_groups_template=False, verbose=False, **kwargs)
Load file(s) of timeseries data from SCADA / DAS systems.
This is a convenience function to generate an instance of DataLoader and call the load method.
A single file or multiple files can be loaded. Multiple files will be joined together and may include files with different column headings.
- Parameters:
path (str) – Path to either a single file to load or a directory of files to load. Supports local paths and S3 URIs (e.g. s3://bucket/path/).
group_columns (function or str, default columngroups.group_columns) – Function to use to group the columns of the loaded data. Function should accept a DataFrame and return a dictionary with keys that are ids and values that are lists of column names. Will be set to the group_columns attribute of the CapData.DataLoader object. Provide a string to load column grouping from a json, yaml, or excel file. The json or yaml file should parse to a dictionary and the excel file should have two columns with the first column containing the group ids and the second column the column names. The first column may have missing values. See function load_excel_column_groups for more details.
file_reader (function, default io.file_reader) – Function to use to load an individual file. By default will use the built in file_reader function to try to load csv files. If passing a function to read other filetypes, the kwargs should include the filetype extension e.g. ‘parquet’.
skip_dir_load (bool, default False) – Set to True to pass a custom file_reader that handles multiple files. This will skip the parsing of files in a directory by DataLoader.load and allow the function passed to file_reader to handle multiple files in a directory.
name (str) – Identifier that will be assigned to the returned CapData instance.
sort (bool, default True) – By default sorts the data by the datetime index from old to new.
drop_duplicates (bool, default True) – By default drops rows of the joined data where all the columns are duplicates of another row. Keeps the first instance of the duplicated values. This is helpful if individual data files have overlapping rows with the same data.
reindex (bool, default True) – By default will create a new index for the data using the earliest datetime, latest datetime, and the most frequent time interval ensuring there are no missing intervals.
site (dict or str, default None) – Pass a dictionary or path to a json or yaml file containing site data, which will be used to generate modeled clear sky ghi and poa values. The clear sky irradiance values are added to the data and the column_groups attribute is updated to include these two irradiance columns. The site data dictionary should be {sys: {system data}, loc: {location data}}. See the capdata.csky documentation for the format of the system data and location data.
column_groups_template (bool, default False) – If True, will call CapData.data_columns_to_excel to save a file to use to manually create column groupings at path.
verbose (bool, default False) – Set to True to print status of file loading.
**kwargs – Passed to DataLoader.load. Any kwargs not used by DataLoader.load are passed to the file_reader function, which by default passes them to pandas.read_csv. DataLoader.load accepts a summary kwarg to show files loaded from a directory without reindexing status shown when verbose is set to True.
- captest.io.load_excel_column_groups(path)
Load column groups from an excel file.
The excel file should have two columns with no header. The first column contains the group names and the second column contains the column names of the data. The first column may have blanks rather than repeating the group name for each column in the group.
For example:
group1, col1
      , col2
      , col3
group2, col4
      , col5
- Parameters:
path (str) – Path to file to import.
- Returns:
Dictionary mapping column group names to lists of column names.
- Return type:
dict
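The blank-cell convention amounts to forward-filling the group name. The sketch below applies it to rows already read into (group, column) pairs. rows_to_column_groups is a hypothetical helper and the excel reading step is omitted.

```python
def rows_to_column_groups(rows):
    """Build {group: [columns]} from (group, column) rows, forward-filling blank groups."""
    groups = {}
    current = None
    for group, column in rows:
        if group not in (None, ""):
            current = group  # a non-blank cell starts a new group
        if current is None:
            raise ValueError("first row must name a group")
        groups.setdefault(current, []).append(column)
    return groups


rows = [
    ("group1", "col1"),
    (None, "col2"),
    ("", "col3"),
    ("group2", "col4"),
    (None, "col5"),
]
column_groups = rows_to_column_groups(rows)
```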
- captest.io.load_pvsyst(path, name='pvsyst', egrid_unit_adj_factor=None, set_regression_columns=True, **kwargs)
Load data from a PVsyst energy production model.
Will load day first or month first dates. Expects files that use a comma as a separator rather than a semicolon.
- Parameters:
path (str) – Path to file to import.
name (str, default pvsyst) – Name to assign to returned CapData object.
egrid_unit_adj_factor (numeric, default None) – E_Grid will be divided by the value passed.
set_regression_columns (bool, default True) – By default sets power to E_Grid, poa to GlobInc, t_amb to T Amb, and w_vel to WindVel. Set to False to not set regression columns on load.
**kwargs – Use to pass additional kwargs to pandas read_csv. Pass sep=’;’ to load files that use semicolons instead of commas as the separator.
- Return type:
CapData
Notes
Standardizes the ambient temperature column name to T_Amb. v6.63 of PVsyst used “T Amb”, v.6.87 uses “T_Amb”, and v7.2 uses “T_Amb”. Will change ‘T Amb’ or ‘TAmb’ to ‘T_Amb’ if found in the column names.
captest.prtest module
- class captest.prtest.PrResults(*, dc_nameplate, expected_pr, input_data, pr, results_data, timestep, name)
Bases:
Parameterized
Results from a PR calculation.
- dc_nameplate = 0.0
- expected_pr = 0.0
- input_data = None
- name = 'PrResults'
- pr = 0.0
- print_pr_result()
Print summary of PR result - passing / failing and by how much
- results_data = None
- timestep = (0, 0)
- captest.prtest.perf_ratio(ac_energy, dc_nameplate, poa, unit_adj=1, degradation=0, year=1, availability=1)
Calculate performance ratio.
- Parameters:
ac_energy (Series) – Measured energy production (Wh) from system meter.
dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.
poa (Series) – POA irradiance (W/m^2) for each time interval of the test.
unit_adj (numeric, default 1) – Scale factor to adjust units of ac_energy. For example, pass 1000 to convert measured energy from kWh to Wh within PR calculation.
degradation (numeric, default 0) – Apply a derate (percent, Ex: 0.5%) for degradation to the expected power (denominator). Must also specify a value for the year argument. NOTE: Percent is divided by 100 to convert to decimal within function.
year (numeric) – Year of operation to use in degradation calculation.
availability (numeric or Series, default 1) – Apply an adjustment for plant availability to the expected power (denominator).
- Returns:
Instance of class PrResults.
- Return type:
PrResults
- captest.prtest.perf_ratio_inputs_ok(ac_energy, dc_nameplate, poa, availability=1)
Check types of perf_ratio arguments.
- Parameters:
ac_energy (Series) – Measured energy production (Wh) from system meter.
dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.
poa (Series) – POA irradiance (W/m^2) for each time interval of the test.
availability (numeric or Series, default 1) – Apply an adjustment for plant availability to the expected power (denominator).
- captest.prtest.perf_ratio_temp_corr_nrel(ac_energy, dc_nameplate, poa, power_temp_coeff=None, temp_bom=None, temp_amb=None, single_irr_weighted_temp=False, wind_speed=None, base_temp=25, module_type='glass_cell_poly', racking='open_rack', unit_adj=1, degradation=None, year=None, availability=1)
Calculate performance ratio.
- Parameters:
ac_energy (Series) – Measured energy production (kWh) from system meter.
dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.
poa (Series) – POA irradiance (W/m^2) for each time interval of the test.
power_temp_coeff (numeric, default None) – Module power temperature coefficient as percent per degree celsius. Ex. -0.36
temp_bom (Series) – Measured back of module temperature. The temp_amb and wind_speed arguments are not used if this argument is not None; skips calculating BOM temps from ambient temperature, wind speed, and POA irradiance.
single_irr_weighted_temp (bool, default False) – Set to True to calculate a single irradiance weighted temperature to use when temperature correcting the power. Some contract language calls for this but it does not follow the calculation defined in the NREL paper.
temp_amb (Series) – Ambient temperature (degrees C) measurements.
wind_speed (Series) – Measured wind speed (m/sec) corrected to measurement height of 10 meters.
base_temp (numeric, default 25) – Base temperature (in Celsius) to correct power to. Default is the STC of 25 degrees Celsius. The NREL Weather-Corrected Performance Ratio technical report uses the term ‘Tcell_typ_avg’ for this value.
module_type (str, default 'glass_cell_poly') – Any of ‘glass_cell_poly’, ‘glass_cell_glass’, or ‘poly_tf_steel’.
racking (str, default 'open_rack') – Any of ‘open_rack’, ‘close_roof_mount’, or ‘insulated_back’.
unit_adj (numeric, default 1) – Scale factor to adjust units of ac_energy. For example, pass 1000 to convert measured energy from kWh to Wh within PR calculation.
degradation (numeric, default None) – NOT IMPLEMENTED Apply a derate for degradation to the expected power (denominator). Must also specify a value for the year argument.
year (numeric) – NOT IMPLEMENTED Year of operation to use in degradation calculation.
availability (numeric or Series, default 1) – NOT IMPLEMENTED Apply an adjustment for plant availability to the expected power (denominator).
captest.util module
- captest.util.append_tags(sel_tags, tags, regex_str)
- captest.util.detect_solar_noon(data, ghi_col='ghi_mod_csky', default='12:30')
Estimate a single representative solar-noon clock time from clear-sky GHI.
Groups data[ghi_col] by the clock time of each timestamp (hour and minute, ignoring date), takes the mean of each clock-time bucket, and returns the bucket with the largest mean formatted as "HH:MM".
Used by plotting helpers that split observations into morning and afternoon at solar noon.
- Parameters:
data (pandas.DataFrame) – DataFrame with a DatetimeIndex. Must contain ghi_col for the idxmax-based detection to apply.
ghi_col (str, default "ghi_mod_csky") – Column to use as the clear-sky GHI signal. ghi_mod_csky is the column added to CapData.data by captest.io.load_data when a site dictionary is provided.
default (str, default "12:30") – Fallback clock-time string returned when ghi_col is absent from data or when data is empty.
- Returns:
Clock time formatted as "HH:MM".
- Return type:
str
- Warns:
UserWarning – Emitted when ghi_col is missing from data.columns or the index is empty; the default is then returned.
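The bucket-and-average step can be sketched without pandas. The example below is an illustration under assumptions: solar_noon_sketch is a hypothetical name, it takes (timestamp, ghi) pairs instead of a DataFrame, and it omits the UserWarning the real function emits when it falls back to the default.

```python
from collections import defaultdict
from datetime import datetime


def solar_noon_sketch(samples, default="12:30"):
    """Return the 'HH:MM' clock-time bucket with the largest mean GHI."""
    buckets = defaultdict(list)
    for ts, ghi in samples:
        buckets[ts.strftime("%H:%M")].append(ghi)  # ignore the date, keep the clock time
    if not buckets:
        return default
    return max(buckets, key=lambda k: sum(buckets[k]) / len(buckets[k]))


samples = [
    (datetime(2024, 6, 1, 11, 0), 880.0),
    (datetime(2024, 6, 1, 12, 0), 950.0),
    (datetime(2024, 6, 1, 13, 0), 900.0),
    (datetime(2024, 6, 2, 11, 0), 870.0),
    (datetime(2024, 6, 2, 12, 0), 940.0),
    (datetime(2024, 6, 2, 13, 0), 910.0),
]
noon = solar_noon_sketch(samples)
```

Averaging across days before taking the maximum keeps a single unusually bright interval from deciding the result.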
- captest.util.generate_irr_distribution(lowest_irr, highest_irr, rng=Generator(PCG64) at 0x740A6A954120)
Create a list of increasing values similar to POA irradiance data.
Default parameters result in increasing values where the difference between each subsequent value is randomly chosen from the typical range of steps for a POA tracker.
- Parameters:
lowest_irr (numeric) – Lowest value in the list of values returned.
highest_irr (numeric) – Highest value in the list of values returned.
rng (Numpy Random Generator) – Instance of the default Generator.
- Returns:
irr_values
- Return type:
list
- captest.util.get_agg_column_name(group_id, agg_func)
Generate a column name for an aggregated column.
- Parameters:
group_id (str) – Identifier for the group of columns being aggregated.
agg_func (str or callable) – Aggregation function used.
- Returns:
Name for the aggregated column.
- Return type:
str
- captest.util.get_common_timestep(data, units='m', string_output=True)
Get the most commonly occurring timestep of data as frequency string.
- Parameters:
data (Series or DataFrame) – Data with a DateTimeIndex.
units (str, default 'm') – String representing date/time unit, such as (D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds.
string_output (bool, default True) – Set to False to return a numeric value.
- Returns:
If the string_output is True and the most common timestep is an integer in the specified units then a valid pandas frequency or offset alias is returned. If string_output is false, then a numeric value is returned.
- Return type:
str or numeric
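Finding the most common timestep amounts to counting the gaps between consecutive timestamps. The sketch below uses only the standard library. common_timestep_minutes is a hypothetical helper that returns minutes as a number, which corresponds loosely to string_output=False with units='m'.

```python
from collections import Counter
from datetime import datetime, timedelta


def common_timestep_minutes(timestamps):
    """Return the most frequent gap between consecutive timestamps, in minutes."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    most_common_gap, _ = Counter(gaps).most_common(1)[0]
    return most_common_gap.total_seconds() / 60


start = datetime(2024, 6, 1, 8, 0)
# Five-minute data with one missing interval (08:15 is absent).
stamps = [start + timedelta(minutes=m) for m in (0, 5, 10, 20, 25)]
step = common_timestep_minutes(stamps)
```

The single ten-minute gap is outvoted by the five-minute gaps, which is why a missing interval does not change the detected frequency.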
- captest.util.parse_regression_formula(formula: str) → Tuple[List[str], List[str]]
Return (lhs_list, rhs_list) for formula.
Rules
- Each list contains the unique raw variable names appearing on that side, sorted.
- "- 1" (intercept-removal) is ignored.
- I(…) blocks are unwrapped; products like I(poa * t_amb) are split into their component symbols (poa, t_amb).
- Parameters:
formula (str) – Regression formula to parse.
- Returns:
Tuple of (lhs_list, rhs_list).
- Return type:
Tuple[List[str], List[str]]
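The rules above can be approximated with a regular expression. This is a sketch and not the library's parser: parse_formula_sketch is a hypothetical name, and it would wrongly drop a variable that is literally named I.

```python
import re


def parse_formula_sketch(formula):
    """Return (lhs_list, rhs_list) of sorted unique variable names."""
    lhs, rhs = formula.split("~", 1)

    def names(side):
        # Identifiers only: numeric terms such as the "1" in "- 1" never match.
        found = re.findall(r"[A-Za-z_]\w*", side)
        # Drop the I(...) wrapper itself; its contents are kept as symbols.
        return sorted({name for name in found if name != "I"})

    return names(lhs), names(rhs)


lhs, rhs = parse_formula_sketch(
    "power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1"
)
```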
- captest.util.process_reg_cols(original_calc_params, calc_params=None, key_id=None, dict_path=None, cd=None, agg_cache=None, verbose=True)
Recursively process a regression columns dictionary that includes calculated parameters.
The regression parameters dictionary attribute of CapData can be defined with a nested structure which includes tuples with two values where the first is a CapData method to calculate a new value (column of Data attribute) and the second is a dictionary of the kwargs to be passed to the function.
An example tuple: (bom_temp, {‘poa’: ‘irr_poa’, ‘temp_amb’:’temp_amb’, ‘wind_speed’:’wind_speed’})
Where bom_temp is a CapData method that accepts the kwargs poa, temp_amb, and wind_speed, which have the values (column group ids) irr_poa, temp_amb, wind_speed, respectively.
Additionally, column groups can be aggregated by specifying a tuple which contains two strings - the column group id (e.g., ‘irr_poa’) and the aggregation method (e.g. ‘mean’). This will result in the CapData.agg_group method being called and the first value in the tuple passed to the group_id kwarg and the second passed to the agg_func kwarg.
If a regression parameter key is paired with a column groups id for a column group with only a single column, then that column name will replace the column group id.
The dictionary passed to original_calc_params may be nested like this example:
calc_params_map = {
    'power_tc': (CapData.power_tc, {
        'power': 'real_pwr_mtr',
        'cell_temp': (CapData.cell_temp, {
            'poa': ('irr_poa', 'mean'),
            'bom': (CapData.bom_temp, {
                'poa': ('irr_poa', 'mean'),
                'temp_amb': ('temp_amb', 'mean'),
                'wind_speed': ('wind_speed', 'mean')
            })
        })
    }),
}
This function will start at the bottom of nested dictionaries and progressively call the functions with the kwargs replacing the function tuples with the function names or the aggregated column names.
- Parameters:
original_calc_params (dict) – The original dictionary to be modified
calc_params (dict or tuple) – Deprecated. Ignored if provided.
key_id (str) – Deprecated. Ignored if provided.
dict_path (list) – Deprecated. Ignored if provided.
cd (CapData) – CapData instance that functions in original_calc_params will act on.
agg_cache (dict, optional) – Cache of already aggregated column groups to avoid redundant calls to agg_group. Keys are tuples of (group_id, agg_func) and values are the aggregated column names.
verbose (bool, default True) – Passed to the group aggregations and the parameter calculations. Set to False to prevent all summary output.
- Returns:
Modifies the original_calc_params and the data attribute of the CapData object passed to the cd argument.
- Return type:
None
- captest.util.read_json(path)
- captest.util.read_yaml(path)
- captest.util.reindex_datetime(data, file_name=None, report=False)
Find dataframe index frequency and reindex to add any missing intervals.
Sorts index of passed dataframe before reindexing.
- Parameters:
data (DataFrame) – DataFrame to be reindexed.
file_name (str, default None) – Name of file being reindexed. Used for warning message.
- Return type:
Reindexed DataFrame
- captest.util.tags_by_regex(tag_list, regex_str)
- captest.util.transform_calc_params(node, cd, agg_cache=None, verbose=True)
Recursively transform a calc_params node, returning resolved values.
This function processes a nested dictionary structure that defines regression parameters, executing aggregations and calculations as needed, and returns a flattened structure with resolved column names.
Node types handled:
- dict: Transform each value recursively
- tuple (str, str): Aggregation - returns aggregated column name
- tuple (callable, dict): Calculation - executes function, returns function name
- str: Column group ID - resolved to column name if single column
- other: Passed through unchanged (e.g., numeric values)
- Parameters:
node (dict, tuple, str, or other) – The current node in the calc_params structure.
cd (CapData) – CapData instance that functions will act on.
agg_cache (dict, optional) – Cache of already aggregated column groups to avoid redundant calls. Keys are tuples of (group_id, agg_func), values are aggregated column names.
verbose (bool, default True) – Passed to aggregations and calculations. Set to False to suppress output.
- Returns:
The transformed node with all aggregations executed and calculations replaced by their function names.
- Return type:
transformed
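The node-type dispatch can be sketched as a small recursive function. Everything here is a stand-in: transform_sketch is hypothetical, calculations are recorded in a list instead of being executed against a CapData instance, and the aggregated column name format ("<group>_<func>_agg") is an assumption made for illustration.

```python
def transform_sketch(node, column_groups, calls):
    """Resolve a calc_params node; `calls` records aggregations and calculations in order."""
    if isinstance(node, dict):
        return {k: transform_sketch(v, column_groups, calls) for k, v in node.items()}
    if isinstance(node, tuple) and len(node) == 2:
        first, second = node
        if callable(first) and isinstance(second, dict):
            # Calculation: resolve the kwargs first (bottom-up), then record the call.
            kwargs = transform_sketch(second, column_groups, calls)
            calls.append((first.__name__, kwargs))
            return first.__name__
        if isinstance(first, str) and isinstance(second, str):
            # Aggregation: return a stand-in for the aggregated column name.
            calls.append(("agg", first, second))
            return f"{first}_{second}_agg"
    if isinstance(node, str):
        cols = column_groups.get(node, [])
        # A group id with a single column resolves to that column name.
        return cols[0] if len(cols) == 1 else node
    return node  # numbers and anything else pass through unchanged


def bom_temp(**kwargs):
    """Placeholder for a real calculation; never executed in this sketch."""
    return None


column_groups = {"real_pwr_mtr": ["meter_kw"], "irr_poa": ["poa_1", "poa_2"]}
calls = []
result = transform_sketch(
    {"power": "real_pwr_mtr", "bom": (bom_temp, {"poa": ("irr_poa", "mean")}), "n": 3},
    column_groups,
    calls,
)
```

The recorded call order shows the bottom-up evaluation: the aggregation of irr_poa happens before the calculation that consumes it.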
- captest.util.update_by_path(dictionary, path, new_value=None, convert_callable=False)
Update a nested dictionary value by following a path list.
- Parameters:
dictionary (dict) – The dictionary to update
path (list) – A list representing the path to the target key
new_value (optional) – The new value to set (if None and convert_callable=True, will convert existing tuple to function name)
convert_callable (bool, optional) – If True and new_value is None, converts tuple to function name
- Returns:
updated_dictionary – The updated dictionary
- Return type:
dict
captest.calcparams module
Functions to calculate derived values from measured data.
For example, back-of-module temperature from poa, wind speed, and ambient temp with the Sandia module temperature model.
- captest.calcparams.absolute_airmass(data, apparent_zenith=None, pressure=None, pressure_scale=100, airmass_model='kastenyoung1989', verbose=True)
Compute absolute (pressure-corrected) airmass from apparent zenith.
Uses pvlib.atmosphere.get_relative_airmass() with the kastenyoung1989 model by default, then passes the result to pvlib.atmosphere.get_absolute_airmass(). If pressure is None the pvlib default (101325 Pa) is used; otherwise the column data[pressure] is scaled by pressure_scale (default 100 to convert hPa/mbar to Pa) and passed through.
When a pressure column is supplied, the scaled pressure values are sanity-checked against global surface-pressure records (PRESSURE_MIN_MBAR–PRESSURE_MAX_MBAR). The 5th and 95th percentiles are used to ignore isolated outliers from bad data. A UserWarning is emitted if the central 90% of values falls outside that band, which typically indicates a unit mismatch between data[pressure] and pressure_scale.
- Parameters:
data (DataFrame) – DataFrame containing the apparent_zenith (and optionally pressure) columns.
apparent_zenith (str) – Column name for apparent zenith angle (degrees).
pressure (str or None, default None) – Column name for station pressure. None falls back to pvlib’s default sea-level pressure.
pressure_scale (numeric, default 100) – Multiplier applied to data[pressure] before passing to pvlib. Default converts hPa/mbar to Pa.
airmass_model (str, default 'kastenyoung1989') – Model passed to pvlib.atmosphere.get_relative_airmass().
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
Absolute airmass indexed like data.index.
- Return type:
Series
- captest.calcparams.apparent_zenith(data, site=None, altitude_override=0, verbose=True)
Compute apparent solar zenith angle at each timestamp in data.
Wraps pvlib.location.Location.get_solarposition() and returns the apparent_zenith column aligned to data.index. Designed for use inside a CapData.regression_cols calc tuple: site is auto-injected by CapData.custom_param from cd.site.
Per the pvlib First Solar spectral-correction reference, the absolute airmass is computed against zenith at sea level. altitude_override defaults to 0 so a deep copy of site has its loc.altitude forced to 0 before the Location is instantiated. The caller’s site dict is not mutated.
Night-time rows (apparent_zenith > 90) are set to NaN so downstream airmass / spectral-factor calls do not emit pvlib warnings on invalid geometry.
- Parameters:
data (DataFrame) – DataFrame with a DatetimeIndex. The index may be tz-naive or tz-aware.
site (dict) – Nested {"loc": {...}, "sys": {...}} dict as produced by load_data(site=...). Only the loc sub-dict is consumed here. Auto-injected from cd.site by custom_param when used in a regression_cols calc tuple.
altitude_override (numeric, default 0) – Altitude (in meters) to use when building the pvlib.Location. Set to None to respect site['loc']['altitude'] unchanged.
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
Apparent zenith angle (degrees) indexed like data.index with a tz-naive index. NaN where the sun is below the horizon.
- Return type:
Series
- captest.calcparams.apparent_zenith_pvsyst(data, site=None, altitude_override=0, shift_minutes=30, verbose=True)
Apparent solar zenith at the mid-point of each PVsyst interval.
PVsyst reports hourly values labelled at the start of each interval but computes sun positions at the interval mid-point. To match that convention we shift data.index forward by shift_minutes before calling pvlib.location.Location.get_solarposition(), then shift the resulting Series index back by the same amount so the output aligns with the original data.index.
The site timezone should be a fixed-offset Etc/GMT±N string because PVsyst data is not DST-aware. CapTest.setup() auto-converts meas.site to an Etc/GMT±N variant when propagating it to sim.site.
- Parameters:
data (DataFrame) – DataFrame with a tz-naive DatetimeIndex at the PVsyst cadence.
site (dict) – Same shape as apparent_zenith(). Auto-injected from cd.site.
altitude_override (numeric, default 0) – See apparent_zenith().
shift_minutes (int, default 30) – Interval mid-point offset applied to data.index before the pvlib solar-position call. Set to 0 to disable the shift.
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
Apparent zenith angle (degrees) indexed like data.index.
- Return type:
Series
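The shift-evaluate-shift-back pattern described above can be illustrated without pvlib. The sketch below is an assumption-laden stand-in: evaluate_at_midpoint is a hypothetical helper, and hour_angle_deg is a toy function (15 degrees per hour from noon) used in place of a real solar-position call.

```python
from datetime import datetime, timedelta


def evaluate_at_midpoint(labels, func, shift_minutes=30):
    """Evaluate ``func`` at each interval mid-point, keyed by the original label."""
    shift = timedelta(minutes=shift_minutes)
    shifted = [ts + shift for ts in labels]       # move labels to the mid-point
    values = [func(ts) for ts in shifted]         # compute at the mid-point
    # Shift back so the output lines up with the original labels.
    return {ts - shift: value for ts, value in zip(shifted, values)}


def hour_angle_deg(ts):
    """Toy stand-in for a solar-position calculation (not pvlib)."""
    return 15.0 * (ts.hour + ts.minute / 60.0 - 12.0)


labels = [datetime(2024, 6, 1, hour) for hour in (11, 12, 13)]
midpoint_values = evaluate_at_midpoint(labels, hour_angle_deg)
unshifted_values = evaluate_at_midpoint(labels, hour_angle_deg, shift_minutes=0)
```

The value stored under the 12:00 label is the one computed for 12:30, which is the behavior the mid-point convention calls for; with shift_minutes=0 the value is computed at the label itself.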
- captest.calcparams.avg_typ_cell_temp(data, poa, cell_temp, verbose=True)
Calculate irradiance weighted cell temperature.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.
poa (str) – Column name for POA irradiance (W/m^2).
cell_temp (str) – Column name for cell temperature for each interval (degrees C).
- Returns:
Average irradiance-weighted cell temperature.
- Return type:
float
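An irradiance-weighted average gives high-irradiance intervals more influence than a simple mean. The sketch below shows the usual weighted-mean form, sum(poa * cell_temp) / sum(poa), on plain lists; the function name is hypothetical and the exact expression used by avg_typ_cell_temp is assumed rather than confirmed by this page.

```python
def irradiance_weighted_mean(poa, cell_temp):
    """Weighted mean of cell temperature using POA irradiance as the weight."""
    total_poa = sum(poa)
    if total_poa == 0:
        raise ValueError("POA irradiance sums to zero; weighted mean is undefined.")
    weighted_sum = sum(g * t for g, t in zip(poa, cell_temp))
    return weighted_sum / total_poa


poa = [200.0, 800.0, 1000.0]        # W/m^2
cell_temp = [30.0, 45.0, 50.0]      # degrees C
weighted = irradiance_weighted_mean(poa, cell_temp)
simple = sum(cell_temp) / len(cell_temp)
```

For these inputs the weighted mean is 46.0 degrees C, higher than the simple mean of about 41.7 degrees C, because the hottest intervals also carry the most irradiance.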
- captest.calcparams.bom_temp(data, poa=None, temp_amb=None, wind_speed=None, module_type='glass_cell_poly', racking='open_rack', verbose=True)
Calculate back of module temperature from measured weather data.
Calculate back of module temperature from POA irradiance, ambient temperature, wind speed (at height of 10 meters), and empirically derived heat transfer coefficients.
Equation from NREL Weather Corrected Performance Ratio Report.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.
poa (str) – Column name for POA irradiance in W/m^2.
temp_amb (str) – Column name for ambient temperature in degrees C.
wind_speed (str) – Column name for measured wind speed (m/sec) corrected to a measurement height of 10 meters.
module_type (str, default 'glass_cell_poly') – Any of 'glass_cell_poly', 'glass_cell_glass', or 'poly_tf_steel'.
racking (str, default 'open_rack') – Any of 'open_rack', 'close_roof_mount', or 'insulated_back'.
- Returns:
Back of module temperatures.
- Return type:
numeric or Series
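The NREL Weather Corrected Performance Ratio report uses the Sandia module temperature model, T_m = G_POA * exp(a + b * WS) + T_a. The sketch below applies that equation to scalar inputs. The coefficient table is taken from the commonly published Sandia values and is an assumption here; the table captest ships, and which module_type / racking combinations it accepts, may differ. The function name is hypothetical.

```python
import math

# Assumed Sandia empirical coefficients (a, b), keyed by module_type then racking.
SANDIA_COEFFS = {
    "glass_cell_glass": {"open_rack": (-3.47, -0.0594), "close_roof_mount": (-2.98, -0.0471)},
    "glass_cell_poly": {"open_rack": (-3.56, -0.0750), "insulated_back": (-2.81, -0.0455)},
    "poly_tf_steel": {"open_rack": (-3.58, -0.113)},
}


def bom_temp_sketch(poa, temp_amb, wind_speed,
                    module_type="glass_cell_poly", racking="open_rack"):
    """Back-of-module temperature (degrees C) from the Sandia model."""
    a, b = SANDIA_COEFFS[module_type][racking]   # KeyError for unlisted combinations
    return poa * math.exp(a + b * wind_speed) + temp_amb


# 1000 W/m^2, 20 degrees C ambient, 1 m/s wind at 10 m height.
t_bom = bom_temp_sketch(1000.0, 20.0, 1.0)
# Higher wind speed cools the module.
t_bom_windy = bom_temp_sketch(1000.0, 20.0, 5.0)
```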
- captest.calcparams.cell_temp(data, bom, poa, module_type='glass_cell_poly', racking='open_rack', verbose=True)
Calculate cell temp from BOM temp, POA, and heat transfer coefficient.
Equation from NREL Weather Corrected Performance Ratio Report.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.
bom (str) – Column name for back of module temperature (degrees C). Strictly following the NREL procedure, this value would be obtained from the bom_temp() function. Alternatively, a measured BOM temperature may be used. Refer to p. 7 of the NREL Weather Corrected Performance Ratio Report.
poa (str) – Column name for POA irradiance in W/m^2.
module_type (str, default 'glass_cell_poly') – Any of 'glass_cell_poly', 'glass_cell_glass', or 'poly_tf_steel'.
racking (str, default 'open_rack') – Any of 'open_rack', 'close_roof_mount', or 'insulated_back'.
verbose (bool, default True) – By default prints explanation of calculation. Set to False for no output message.
- Returns:
Cell temperatures.
- Return type:
Series
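In the NREL report the cell temperature is derived from the back-of-module temperature as T_cell = T_m + (G_POA / 1000) * dT, where dT is the conduction temperature difference at 1000 W/m^2. A minimal scalar sketch follows; the dT table uses the commonly published Sandia values and is assumed, not read from captest, and the function name is hypothetical.

```python
# Assumed Sandia conduction temperature differences (degrees C at 1000 W/m^2).
SANDIA_DELTA_T = {
    "glass_cell_glass": {"open_rack": 3.0, "close_roof_mount": 1.0},
    "glass_cell_poly": {"open_rack": 3.0, "insulated_back": 0.0},
    "poly_tf_steel": {"open_rack": 3.0},
}


def cell_temp_sketch(bom, poa, module_type="glass_cell_poly", racking="open_rack"):
    """Cell temperature (degrees C) from BOM temperature and POA irradiance."""
    delta_t = SANDIA_DELTA_T[module_type][racking]
    return bom + (poa / 1000.0) * delta_t


t_cell_full_sun = cell_temp_sketch(46.4, 1000.0)   # adds the full dT
t_cell_half_sun = cell_temp_sketch(46.4, 500.0)    # adds half of dT
```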
- captest.calcparams.e_total(data, poa, rpoa, bifaciality=0.7, bifacial_frac=1, rear_shade=0, verbose=True)
Calculate total irradiance from POA and rear irradiance.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.
poa (str) – Column name for POA irradiance (W/m^2).
rpoa (str) – Column name for rear irradiance (W/m^2).
bifaciality (numeric, default 0.7) – Bifaciality factor.
bifacial_frac (numeric, default 1) – Fraction of total array nameplate power that is bifacial. Pass to calculate total plane of array irradiance for plants with a mix of monofacial and bifacial modules.
rear_shade (numeric, default 0) – Fraction of rear irradiance that is lost due to shading. Set to decimal fraction, e.g. 0.12, to include in calculation of e_total.
- Returns:
Total plane of array irradiance.
- Return type:
numeric or Series
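One form consistent with the parameter descriptions above is total = poa + rpoa * bifaciality * bifacial_frac * (1 - rear_shade). The sketch below uses that form on scalars; it is an assumed reading of the parameters, not a confirmed copy of captest's expression, and the function name is hypothetical.

```python
def e_total_sketch(poa, rpoa, bifaciality=0.7, bifacial_frac=1, rear_shade=0):
    """Front POA plus the usable share of rear irradiance (W/m^2)."""
    rear_contribution = rpoa * bifaciality * bifacial_frac * (1 - rear_shade)
    return poa + rear_contribution


all_bifacial = e_total_sketch(900.0, 100.0)                      # defaults
with_rear_shade = e_total_sketch(900.0, 100.0, rear_shade=0.12)  # 12% rear loss
half_bifacial = e_total_sketch(900.0, 100.0, bifacial_frac=0.5)  # mixed plant
```

With 900 W/m^2 front and 100 W/m^2 rear irradiance, the defaults add 70 W/m^2 of rear contribution; rear shading or a partly monofacial plant reduces that contribution proportionally.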
- captest.calcparams.multiply(data, a=None, b=None, verbose=True)
Elementwise multiplication of two columns.
- Parameters:
data (DataFrame) – Source DataFrame.
a (str) – Column names to multiply. Both kwarg names must not collide with any column_groups id, per CapData.custom_param semantics.
b (str) – Column names to multiply. Both kwarg names must not collide with any column_groups id, per CapData.custom_param semantics.
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
data[a] * data[b] indexed like data.index.
- Return type:
Series
- captest.calcparams.poa_spec_corrected(data, poa=None, spectral_correction=None, verbose=True)
Spectrally corrected plane-of-array irradiance.
Thin named alias that multiplies a POA column by a spectral-correction column. Primary use is the top-level node of a regression_cols calc tree whose spectral_correction kwarg is itself a calc subtree ending in spectral_factor_firstsolar().
- Parameters:
data (DataFrame) – Source DataFrame.
poa (str) – Column name for plane-of-array irradiance (W/m^2).
spectral_correction (str) – Column name for the spectral correction factor.
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
data[poa] * data[spectral_correction] indexed like data.index.
- Return type:
Series
- captest.calcparams.power_temp_correct(data, power, cell_temp, power_temp_coeff=None, base_temp=25, verbose=True)
Apply temperature correction to PV power.
Divides power by the temperature correction factor. With a negative power_temp_coeff, power measured at cell temperatures above base_temp is increased and power measured at cell temperatures below base_temp is decreased.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.
power (str) – The column name of the data attribute with the power to correct.
cell_temp (str) – Name of the column in data containing the cell temperature (in Celsius) used to calculate temperature differential from the base_temp.
power_temp_coeff (numeric) – Module power temperature coefficient as percent per degree celsius. Ex. -0.36
base_temp (numeric, default 25) – Base temperature (in Celsius) to correct power to. Default is the STC of 25 degrees Celsius.
- Returns:
Power corrected for temperature.
- Return type:
Series
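A common form of this correction, assumed here for illustration, is power / (1 + (power_temp_coeff / 100) * (cell_temp - base_temp)). The scalar sketch below shows the direction of the adjustment for a typical negative coefficient; the function name is hypothetical.

```python
def power_temp_correct_sketch(power, cell_temp, power_temp_coeff, base_temp=25):
    """Correct power to ``base_temp`` using a percent-per-degree coefficient."""
    correction = 1 + (power_temp_coeff / 100.0) * (cell_temp - base_temp)
    return power / correction


hot = power_temp_correct_sketch(100.0, 45.0, -0.36)     # cell hotter than base
at_base = power_temp_correct_sketch(100.0, 25.0, -0.36)  # no correction applied
cold = power_temp_correct_sketch(100.0, 5.0, -0.36)      # cell colder than base
```

Power measured on a hot cell is corrected upward, power measured on a cold cell is corrected downward, and power measured at base_temp is returned unchanged.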
- captest.calcparams.precipitable_water_gueymard(data, temp_amb=None, rel_humidity=None, verbose=True)
Precipitable water (cm) from ambient temperature and relative humidity.
Wraps pvlib.atmosphere.gueymard94_pw().
- Parameters:
data (DataFrame) – DataFrame containing the ambient-temperature and relative-humidity columns.
temp_amb (str) – Column name for ambient (dry-bulb) temperature in degrees Celsius.
rel_humidity (str) – Column name for relative humidity as a percentage (0-100).
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
Precipitable water (cm) indexed like data.index.
- Return type:
Series
- captest.calcparams.rpoa_pvsyst(data, globbak='GlobBak', backshd='BackShd', verbose=True)
Calculate the sum of PVsyst’s global rear irradiance and rear shading and IAM losses.
- Parameters:
data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance containing PVsyst 8760 data.
globbak (str, default 'GlobBak') – Column name for global rear irradiance (W/m^2).
backshd (str, default 'BackShd') – Column name for rear shading and IAM losses (W/m^2).
verbose (bool, default True) – Set to False to not print calculation explanation.
- Returns:
Sum of global rear irradiance and rear shading and IAM losses.
- Return type:
Series
- captest.calcparams.scale(data, col=None, factor=1.0, verbose=True)
Multiply a single column by a scalar factor.
Generic unit-conversion / rescaling helper usable in regression_cols calc trees. Primary use in this module is converting PVsyst PrecWat from meters to centimeters with factor=100.
- Parameters:
data (DataFrame) – Source DataFrame.
col (str) – Column name to scale.
factor (numeric, default 1.0) – Scalar multiplier applied elementwise to data[col].
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
data[col] * factor indexed like data.index.
- Return type:
Series
- captest.calcparams.spectral_factor_firstsolar(data, precipitable_water=None, absolute_airmass=None, spectral_module_type='cdte', verbose=True)
First Solar spectral correction factor.
Wraps pvlib.spectrum.spectral_factor_firstsolar(). spectral_module_type defaults to 'cdte' but can be overridden via a cd.spectral_module_type attribute which custom_param auto-injects when the kwarg is left unset. CapTest propagates its spectral_module_type param onto both CapData instances at setup().
The kwarg is named spectral_module_type (not module_type) to avoid collisions with the module_type kwarg used by bom_temp() and cell_temp(), which expects values like 'glass_cell_poly' rather than the pvlib First Solar module-type strings.
- Parameters:
data (DataFrame) – DataFrame containing the precipitable-water and absolute-airmass columns.
precipitable_water (str) – Column name for precipitable water in cm.
absolute_airmass (str) – Column name for absolute airmass.
spectral_module_type (str, default 'cdte') – Passed through to pvlib.spectrum.spectral_factor_firstsolar() as its module_type argument.
verbose (bool, default True) – Set to False to suppress the explanatory print message.
- Returns:
Spectral correction factor indexed like data.index.
- Return type:
Series
captest.plotting module
- class captest.plotting.ScatterBifiPowerTc(*, am_color, am_marker, cd, filtered, height, pm_color, pm_marker, split_day, split_time, tc_force_recompute, tc_mode, tc_power, tc_power_calc, timeseries, width, name)
Bases: ScatterPlot
Two-panel scatter for the bifi_power_tc preset.
The bifi_power_tc regression formula is power ~ poa + rpoa where power is already temperature-corrected. This subclass builds one panel per rhs variable (power vs poa and power vs rpoa). The tc_power parameter is ignored here because the regression power is already tc-corrected; setting it to True emits a UserWarning.
AM/PM splitting and timeseries pairing are inherited from ScatterPlot. When timeseries=True, only the first panel is paired with a linked timeseries view to keep the layout sane.
- name = 'ScatterBifiPowerTc'
- view()
Build a two-panel hv.Layout for the bifi_power_tc preset.
- Return type:
holoviews.Layout
- class captest.plotting.ScatterPlot(*, am_color, am_marker, cd, filtered, height, pm_color, pm_marker, split_day, split_time, tc_force_recompute, tc_mode, tc_power, tc_power_calc, timeseries, width, name)
Bases: Parameterized
Composable scatter plot for CapTest regression diagnostics.
Resolves x and y from cd.regression_formula (lhs vs first rhs) and optionally:
- splits points into morning / afternoon glyphs (split_day=True),
- swaps the y-axis to a temperature-corrected power column (tc_power=True, with mode replace / add_panel / overlay), and / or
- pairs the scatter with a linked timeseries panel (timeseries=True).
- Parameters:
cd (CapData or None) – CapData instance whose data / column_groups / regression_formula drive the plot. Required at view time.
filtered (bool, default True) – When True (default), pulls regression columns from cd.data_filtered; when False, from cd.data.
split_day (bool, default False) – Render morning and afternoon points as two distinct overlaid Scatters with different colors and markers.
split_time (str or None, default None) – Clock-time override ("HH:MM") for the AM/PM boundary. When None and split_day=True, the boundary is detected via captest.util.detect_solar_noon (idxmax of clock-time-binned ghi_mod_csky mean) with a 12:30 fallback.
am_color (str, default "#1f77b4") – Glyph color for the AM Scatter when split_day=True.
pm_color (str, default "#d62728") – Glyph color for the PM Scatter when split_day=True.
am_marker (str, default "circle") – Glyph marker for the AM Scatter when split_day=True.
pm_marker (str, default "triangle") – Glyph marker for the PM Scatter when split_day=True.
tc_power (bool, default False) – Plot against temperature-corrected power instead of (or in addition to) raw power.
tc_mode ({"replace", "add_panel", "overlay"}, default "replace") – Layout strategy when tc_power=True.
tc_power_calc (dict or None, default None) – Calc-params nested dict that produces the tc-power column. When None, DEFAULT_TC_POWER_CALC is used (tuned for measured DAS data; sim users must override).
tc_force_recompute (bool, default False) – When True, recomputes the tc-power column even if it already exists on cd.data.
timeseries (bool, default False) – Pair the principal scatter with a linked timeseries panel below. The timeseries panel overlays a thin gray curve of the full unfiltered y-series under the linked scatter of the filtered data so removed points remain visible as background context. Only valid for the single-panel tc_mode values (replace and overlay); raises ValueError if combined with tc_mode='add_panel'.
height (int, default 400) – Pixel height forwarded to the Scatter / Curve options.
width (int, default 500) – Pixel width forwarded to the Scatter / Curve options.
- am_color = '#1f77b4'
- am_marker = 'circle'
- cd = None
- filtered = True
- height = 400
- name = 'ScatterPlot'
- pm_color = '#d62728'
- pm_marker = 'triangle'
- split_day = False
- split_time = None
- tc_force_recompute = False
- tc_mode = 'replace'
- tc_power = False
- tc_power_calc = None
- timeseries = False
- view()
Build and return the hv.Layout for the configured options.
- Returns:
A Layout whose first element is the principal scatter (a Scatter for the single-glyph case, an Overlay when split_day=True). Additional panels appear when tc_mode='add_panel' or timeseries=True.
- Return type:
holoviews.Layout
- Raises:
ValueError – If cd is unset, or if timeseries=True is combined with tc_mode='add_panel', or if timeseries=True is combined with tc_power=True and tc_mode='overlay' (the linked timeseries panel can only display a single y-series, so an overlaid raw + tc-power principal is ambiguous).
ImportError – If holoviews is not installed.
- width = 500
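The option combinations that view() rejects, as listed under Raises above, can be summarized as a small validation routine. This is a hypothetical sketch of those rules only; the function name and error messages are invented for illustration and nothing here is taken from the ScatterPlot source.

```python
def check_scatter_options(cd, timeseries=False, tc_power=False, tc_mode="replace"):
    """Raise ValueError for the option combinations documented as invalid."""
    if cd is None:
        raise ValueError("cd must be set before calling view().")
    if timeseries and tc_mode == "add_panel":
        raise ValueError("timeseries=True cannot be combined with tc_mode='add_panel'.")
    if timeseries and tc_power and tc_mode == "overlay":
        # The linked timeseries panel can show only one y-series.
        raise ValueError(
            "timeseries=True cannot be combined with tc_power=True and tc_mode='overlay'."
        )


placeholder_cd = object()  # stands in for a CapData instance in this sketch
check_scatter_options(placeholder_cd, timeseries=True, tc_power=True, tc_mode="replace")
```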
- captest.plotting.add_am_pm_dim(df, split_time)
Tag rows of df as morning or afternoon based on a clock-time split.
- Parameters:
df (pandas.DataFrame) – DataFrame with a DatetimeIndex.
split_time (str) – Clock-time string in "HH:MM" format (24-hour, leading zeros optional, e.g. "12:30" or "9:05"). Rows whose index time is strictly before split_time are tagged "am"; rows at or after split_time are tagged "pm".
- Returns:
Copy of df with a new period column whose values are "am" or "pm".
- Return type:
pandas.DataFrame
- Raises:
ValueError – If split_time does not match "HH:MM" or specifies an invalid hour/minute.
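The tagging rule (strictly before split_time is "am", at or after is "pm") and the validation of the "HH:MM" string can be sketched with the standard library. The helper names below are hypothetical, and accepting one- or two-digit hours and minutes is an assumption about what "leading zeros optional" permits.

```python
import re
from datetime import datetime, time

# Assumed pattern: one or two digits for both the hour and the minute.
_SPLIT_RE = re.compile(r"^(\d{1,2}):(\d{1,2})$")


def parse_split_time(split_time):
    """Parse an 'HH:MM' string into a datetime.time, raising ValueError if invalid."""
    match = _SPLIT_RE.match(split_time)
    if match is None:
        raise ValueError(f"split_time {split_time!r} does not match 'HH:MM'.")
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        raise ValueError(f"split_time {split_time!r} has an invalid hour or minute.")
    return time(hour, minute)


def tag_periods(timestamps, split_time):
    """Return 'am' for times strictly before the boundary, otherwise 'pm'."""
    boundary = parse_split_time(split_time)
    return ["am" if ts.time() < boundary else "pm" for ts in timestamps]


stamps = [datetime(2024, 6, 1, 12, 29), datetime(2024, 6, 1, 12, 30), datetime(2024, 6, 1, 15, 0)]
periods = tag_periods(stamps, "12:30")
```

A timestamp exactly at the boundary falls in the "pm" group, so the 12:30 row above is tagged "pm".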
- captest.plotting.add_custom_plot(name, column_groups, group_tags, column_tags)
Append a new custom group to column groups for plotting.
- captest.plotting.calc_tc_power_column(cd, tc_power_calc, col_name='power_tc_plot', verbose=False, force_recompute=False)
Materialize a temperature-corrected power column for plotting only.
Walks tc_power_calc (a calc-params nested dict using the same grammar as TEST_SETUPS reg_cols_* values) via captest.util.transform_calc_params and writes the resulting power_temp_correct Series to cd.data[col_name] and cd.data_filtered[col_name].
This helper is intentionally isolated from CapData.process_regression_columns: it does NOT touch cd.regression_cols, cd.regression_formula, cd.summary, cd.kept, or cd.removed.
- Parameters:
cd (CapData) – The CapData instance whose data and data_filtered will be extended with col_name. power_temp_coeff and base_temp attributes (propagated by CapTest.setup for shipped presets) are auto-injected by CapData.custom_param if not present in tc_power_calc.
tc_power_calc (dict) – Calc-params nested dict mirroring the bifi_power_tc preset’s reg_cols_meas['power'] value. The outermost callable must produce a Series of temperature-corrected power values; in practice this is calcparams.power_temp_correct. The dict must contain a top-level "power" calculation tuple.
col_name (str, default TC_POWER_PLOT_COL) – Name of the column written to cd.data / cd.data_filtered.
verbose (bool, default False) – Forwarded to transform_calc_params.
force_recompute (bool, default False) – When False (default), short-circuits and returns col_name if the column already exists in cd.data. Pass True to recompute.
- Returns:
col_name.
- Return type:
str
- Raises:
KeyError – When tc_power_calc references a column-group id that is missing from cd.column_groups.
ValueError – When tc_power_calc does not contain a top-level "power" calculation tuple that produces a column in cd.data.
- captest.plotting.filter_list(text_input, ms_to_filter, names, event=None)
Filter a multi-select widget by a regex string.
- Parameters:
text_input (pn.widgets.TextInput) – The text input widget to get the regex string from.
ms_to_filter (pn.widgets.MultiSelect) – The multi-select widget to update.
names (list of str) – The list of names to filter.
event (pn.widgets.event, optional) – Passed by the param.watch method. Not used.
- Return type:
None
- captest.plotting.find_default_groups(groups, default_groups)
Find the default groups in the list of groups.
- Parameters:
groups (list of str) – The list of groups to search for the default groups.
default_groups (list of str) – List of regex strings to use to identify default groups.
- Returns:
The default groups found in the list of groups.
- Return type:
list of str
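Matching group names against a list of regex strings can be sketched as below. The lookahead patterns are copied from the default_groups argument of plot(); the use of re.search, the ordering of results by pattern, and the example group names are assumptions, and the function name is hypothetical.

```python
import re


def find_matching_groups(groups, patterns):
    """Return group names matched by any pattern, ordered by pattern, no duplicates."""
    found = []
    for pattern in patterns:
        for group in groups:
            if re.search(pattern, group) and group not in found:
                found.append(group)
    return found


# Example group names are assumed for illustration.
groups = ["irr_poa_pyran", "irr_ghi_pyran", "real_pwr_mtr", "real_pwr_inv", "temp_amb"]
patterns = [
    "(?=.*real)(?=.*pwr)(?=.*mtr)",   # must contain real, pwr, and mtr in any order
    "(?=.*real)(?=.*pwr)(?=.*inv)",   # must contain real, pwr, and inv in any order
    "poa_ghi",                        # literal name; absent from these groups
]
defaults_found = find_matching_groups(groups, patterns)
```

Each lookahead asserts that a token appears somewhere in the name, so the token order inside the group name does not matter; patterns that match nothing are skipped.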
- captest.plotting.get_resid_exog_frame(cd)
Get a DataFrame of residuals and exogenous variables from a CapData object.
- Parameters:
cd (captest.CapData) – The CapData object.
- Returns:
DataFrame with residuals and exogenous variables.
- Return type:
pd.DataFrame
- captest.plotting.group_tag_overlay(group_tags, column_tags)
Overlay curves of groups and individually selected columns.
- Parameters:
group_tags (list of str) – The tags to plot from the groups selected.
column_tags (list of str) – The tags to plot from the individually selected columns.
- captest.plotting.msel_from_column_groups(column_groups, groups=True)
Create a multi-select widget from a column groups object.
- Parameters:
column_groups (ColumnGroups) – The column groups object.
groups (bool, default True) – By default creates list of groups i.e. the keys of column_groups, otherwise creates list of individual columns i.e. the values of column_groups concatenated together.
- captest.plotting.parse_combine(combine, column_groups=None, data=None, cd=None)
Parse regex strings for identifying groups of columns or tags to combine.
- Parameters:
combine (dict) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. Keys should be strings for names of new groups. Values should be either a string or a list of two strings. If a string, the string is used as a regex to identify groups to combine. If a list, the first string is used to identify groups to combine and the second is used to identify individual tags (columns) to combine.
column_groups (ColumnGroups, optional) – The column groups object to add new groups to. Required if cd is not provided.
data (pd.DataFrame, optional) – The data to use to identify groups and columns to combine. Required if cd is not provided.
cd (captest.CapData, optional) – The captest.CapData object with the data and column_groups attributes set. Required if column_groups and data are not provided.
- Returns:
New column groups object with new groups added.
- Return type:
- captest.plotting.plot(cd=None, cg=None, data=None, combine={'ghi_csky': '(?=.*ghi)(?=.*irr)', 'inv_sum_mtr_pwr': ['(?=.*real)(?=.*pwr)(?=.*mtr)', '(?=.*pwr)(?=.*agg)'], 'poa_csky': '(?=.*poa)(?=.*irr)', 'poa_ghi': 'irr.*(poa|ghi)$', 'temp_amb_bom': '(?=.*temp)((?=.*amb)|(?=.*bom))'}, default_groups=['inv_sum_mtr_pwr', '(?=.*real)(?=.*pwr)(?=.*inv)', '(?=.*real)(?=.*pwr)(?=.*mtr)', 'poa_ghi', 'poa_csky', 'ghi_csky', 'temp_amb_bom'], group_width=1500, group_height=250, plot_defaults_path=None, **kwargs)
Create plotting dashboard.
NOTE: If a plot defaults JSON file exists in the current working directory, the default groups will be read from that file instead of using the default_groups argument. When a cd (CapData) object is provided, the file is named plot_defaults_{cd.name}.json to avoid conflicts between multiple CapData objects in the same session. Otherwise the file is named plot_defaults.json. Use the plot_defaults_path argument to override the path. Delete or manually edit the file to change the default groups. Columns in the file that are no longer present in the data are ignored with a warning.
- Parameters:
cd (captest.CapData, optional) – The captest.CapData object.
cg (captest.ColumnGroups, optional) – The captest.ColumnGroups object. data must also be provided.
data (pd.DataFrame, optional) – The data to plot. cg must also be provided.
combine (dict, optional) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. See the parse_combine function for more details.
default_groups (list of str, optional) – List of regex strings to use to identify default groups to plot. See the find_default_groups function for more details.
group_width (int, optional) – The width of the plots on the Groups tab.
group_height (int, optional) – The height of the plots on the Groups tab.
plot_defaults_path (str or Path, optional) – Path to the plot defaults JSON file. Overrides the default naming scheme. When None and cd is provided, defaults to ./plot_defaults_{cd.name}.json. When None and cd is not provided, defaults to ./plot_defaults.json.
**kwargs (optional) – Pass additional keyword arguments to the holoviews options of the scatter plot on the ‘Scatter’ tab.
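The naming scheme for the plot defaults file described above can be sketched with pathlib. The helper name and its cd_name argument are hypothetical; only the three resulting file names come from the text.

```python
from pathlib import Path


def resolve_plot_defaults_path(cd_name=None, plot_defaults_path=None):
    """Pick the plot defaults file: explicit path, per-CapData name, or generic name."""
    if plot_defaults_path is not None:
        return Path(plot_defaults_path)          # explicit override wins
    if cd_name is not None:
        return Path(f"plot_defaults_{cd_name}.json")
    return Path("plot_defaults.json")


per_capdata = resolve_plot_defaults_path(cd_name="meas")
generic = resolve_plot_defaults_path()
override = resolve_plot_defaults_path(cd_name="meas", plot_defaults_path="cfg/my_plots.json")
```

Keeping one file per CapData name lets a measured and a simulated dataset in the same working directory keep separate default groups.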
- captest.plotting.plot_group_tag_overlay(data, group_tags, column_tags, width=1500, height=400)
Overlay curves of groups and individually selected columns.
- Parameters:
data (pd.DataFrame) – The data to plot.
group_tags (list of str) – The tags to plot from the groups selected.
column_tags (list of str) – The tags to plot from the individually selected columns.
- captest.plotting.plot_tag(data, tag, width=1500, height=250)
- captest.plotting.plot_tag_groups(data, tags_to_plot, width=1500, height=250)
Plot groups of tags, one plot of overlaid curves per group.
- Parameters:
data (pd.DataFrame) – The data to plot.
tags_to_plot (list) – List of lists of strings. One plot for each inner list.
- captest.plotting.scatter_dboard(data, **kwargs)
Create a dashboard to plot any two columns of data against each other.
- Parameters:
data (pd.DataFrame) – The data to plot.
**kwargs (optional) – Pass additional keyword arguments to the holoviews options of the scatter plot.
- Returns:
The dashboard with a scatter plot of the data.
- Return type:
pn.Column