captest package

Submodules

captest.capdata module

Provides the CapData class and supporting functions.

The CapData class provides methods for loading, filtering, and regressing solar data. A capacity test following the ASTM E2848 standard is orchestrated by captest.CapTest, which binds a measured and a modeled CapData instance together and exposes the cross-CapData comparison methods (captest_results, get_summary, overlay_scatters, residual_plot, determine_pass_or_fail).

class captest.capdata.CapData(name)

Bases: object

Class to store capacity test data and column grouping.

CapData objects store a pandas dataframe of measured or simulated data and a dictionary grouping columns by type of measurement.

The column_groups dictionary allows maintaining the original column names while also grouping measurements of the same type from different sensors. Many of the methods for plotting and filtering data rely on the column groupings.

Parameters:
  • name (str) – Name for the CapData object.

  • data (pandas dataframe) – Used to store measured or simulated data imported from csv.

  • data_filtered (pandas dataframe) – Holds filtered data. Filtering methods act on and write to this attribute.

  • column_groups (dictionary) – Assigned by the group_columns method, which attempts to infer the type of measurement recorded in each column of the dataframe stored in the data attribute. For each inferred measurement type, group_columns creates an abbreviated name and a list of columns that contain measurements of that type. The abbreviated names are the keys and the corresponding values are the lists of columns.

  • regression_cols (dictionary) – Dictionary identifying which columns in data or groups of columns as identified by the keys of column_groups are the independent variables of the ASTM Capacity test regression equation. Set using set_regression_cols or by directly assigning a dictionary.

  • summary_ix (list of tuples) – Holds the row index data modified by the update_summary decorator function.

  • summary (list of dicts) – Holds the data modified by the update_summary decorator function.

  • rc (DataFrame) – Dataframe for the reporting conditions (poa, t_amb, and w_vel).

  • regression_results (statsmodels linear regression model) – Holds the linear regression model object.

  • regression_formula (str) – Regression formula to be fit to measured and simulated data. Must follow the requirements of statsmodels use of patsy.

  • tolerance (str) – String representing the error band, e.g. ‘+ 3’, ‘+/- 3’, or ‘- 5’. There must be a space between the sign and the number. The number is interpreted as a percent; for example, 5 percent is 5, not 0.05.
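
As a rough illustration of the tolerance format, the sketch below splits such a string into its sign and a fractional error. The helper name parse_tolerance is hypothetical and is not part of captest; how the library itself consumes the string may differ.

```python
def parse_tolerance(tolerance):
    """Split a tolerance string such as '+/- 3' into (sign, fraction).

    Hypothetical helper for illustration only; not part of captest.
    """
    # The space between sign and number is required; without it the
    # unpacking below raises ValueError.
    sign, number = tolerance.split(" ")
    # The number is a percent, so 5 means 5 percent, not 0.05.
    return sign, float(number) / 100


sign, fraction = parse_tolerance("+/- 3")
```

A string without the space, such as '+3', fails to unpack, which mirrors the requirement stated above.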

agg_group(group_id, agg_func, verbose=True, rename_map=None, inplace=True, cutoff=10, columns=None)

Aggregate columns in a group.

Parameters:
  • group_id (str) – Key from column_groups attribute.

  • agg_func (str or callable) – Aggregation function to apply.

  • verbose (bool, default True) – Set to True to print the columns that have been aggregated, the aggregation function used, and the new column name.

  • cutoff (int, default 10) – Maximum number of columns to list individually when verbose=True. When the group contains more columns than this value, the first three and last three column names are printed with an ellipsis in between. Increase this value to see more columns listed individually.

  • columns (pd.DataFrame or None, default None) – Pre-fetched DataFrame of columns to aggregate. When provided the lookup via self._get_group is skipped. Intended for internal use by agg_sensors to avoid a redundant lookup.
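
The cutoff behaviour described above can be sketched in plain Python. The function abbreviate_columns is a hypothetical stand-in written for this example; it only illustrates the first-three, ellipsis, last-three listing and is not the library's code.

```python
def abbreviate_columns(columns, cutoff=10):
    """Return the column names to print for a verbose listing.

    Hypothetical helper: lists every column when the group has at most
    `cutoff` columns, otherwise the first three and last three names
    separated by an ellipsis.
    """
    columns = list(columns)
    if len(columns) <= cutoff:
        return columns
    return columns[:3] + ["..."] + columns[-3:]


# Twelve made-up inverter columns exceed the default cutoff of 10.
cols = [f"inv_{i}" for i in range(1, 13)]
shown = abbreviate_columns(cols)
```

Raising cutoff to 12 or more in this example would list all twelve names individually.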

agg_sensors(agg_map=None, verbose=False)

Aggregate measurements of the same variable from different sensors.

Parameters:
  • agg_map (dict, default None) – Dictionary specifying aggregations to be performed on the specified groups from the column_groups attribute. The dictionary keys should be keys from the column_groups attribute. The dictionary values should be aggregation functions. See the pandas API documentation of Computations / descriptive statistics for a list of all options. By default the groups of columns assigned to the ‘power’, ‘poa’, ‘t_amb’, and ‘w_vel’ keys in the regression_cols attribute are aggregated: power is summed, and poa, t_amb, and w_vel are averaged (mean).

  • verbose (bool, default False) – Set to True to print the columns that have been aggregated, the aggregation function used, and the new column name. If the group being aggregated has more than 10 columns, only the group name will be printed.

Returns:

Acts in place on the data, data_filtered, and regression_cols attributes.

Return type:

None

Notes

This method is intended to be used before any filtering methods are applied. Filtering steps applied before this method is called will be lost.

This method modifies the data, data_filtered, and regression_cols attributes.

column_groups_to_excel(save_to='./column_groups.xlsx')

Export the column groups attribute to an excel file.

Parameters:

save_to (str) – File path to save column groups to. Should include .xlsx.

copy()

Create and return a copy of self.

create_agg_attributes()

Create callable attributes for each aggregated column that return data views.

For each column in self.column_groups[‘agg’], creates an attribute on the instance that when called returns a view of the data for that column group using the loc indexer functionality.

create_column_group_attributes()

Create callable attributes for each column group that return data views.

For each key in self.column_groups, creates an attribute on the instance that when called returns a view of the data for that column group using the loc indexer functionality.

custom_param(func, *args, **kwargs)

Applies the function func with kwargs and adds result as new column to data.

Calculates and adds a new column to data using the function func with the provided arguments and keyword arguments. See the functions in the calcparams module for examples.

Called by util.process_reg_cols to add new columns to the data attribute while recursively processing and updating the regression_cols attribute.

Parameters:

func (function) – Function that takes a DataFrame as its first argument and returns a Series.

Returns:

Adds a new column to the data attribute.

Return type:

None

data_columns_to_excel(sort_by_reversed_names=True)

Write the columns of data to an excel file as a template for a column grouping.

Parameters:

sort_by_reversed_names (bool, default True) – If True, sort column names after reversing them.

Returns:

Writes to excel file at self.data_loader.path / ‘column_groups.xlsx’.

Return type:

None

drop_cols(columns)

Drop columns from CapData data, data_filtered, and column_groups.

Parameters:

columns (str or list) – Column name or list of column names to drop.

empty()

Return a boolean indicating if the CapData object contains data.

expand_agg_map(agg_map)

Traverses, expands, and sorts the agg_map.

If a value of agg_map is a dictionary, the items in that dictionary are added to the returned expanded agg_map at the top level. Also, the following steps are completed to aggregate the subgroups:

  • The column_groups attribute is updated to add a new group with the aggregated columns from the subgroups.

  • This new group is added to the expanded returned agg_map after the subgroup aggregations.

  • The resulting aggregation of the subgroups is renamed.

For example, given the following agg_map:

```python
agg_map = {
    'irr_ghi': 'mean',
    'irr_poa': {
        'irr_poa_met1': 'mean',
        'irr_poa_met2': 'mean',
    },
}
```

The returned expanded agg_map would be:

```python
agg_map = {
    'irr_ghi': 'mean',
    'irr_poa_met1': 'mean',
    'irr_poa_met2': 'mean',
    'irr_poa_aggs': 'mean',
}
```

and the column_groups attribute would be updated to add the group: ‘irr_poa_aggs’: [‘irr_poa_met1_mean_agg’, ‘irr_poa_met2_mean_agg’]

The column resulting from aggregating the “irr_poa_aggs” group would be “irr_poa_aggs_mean_agg”, which is renamed to “irr_poa_mean_agg”.
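
The expansion described above can be approximated with a short sketch. The function expand_agg_map_sketch is hypothetical and simplified: it omits the sorting and renaming steps, and it assumes the aggregation applied to the new ‘_aggs’ group is ‘mean’ (as in the example), since the text does not state how that function is chosen.

```python
def expand_agg_map_sketch(agg_map, column_groups, outer_func="mean"):
    """Flatten nested agg_map entries (illustrative sketch only).

    Assumptions: aggregation functions are given as strings, and
    `outer_func` is used for the new '<group>_aggs' group. Updates
    `column_groups` in place.
    """
    expanded = {}
    for group, func in agg_map.items():
        if not isinstance(func, dict):
            expanded[group] = func
            continue
        # Lift each subgroup aggregation to the top level.
        expanded.update(func)
        # Collect the aggregated subgroup columns in a new group.
        new_group = f"{group}_aggs"
        column_groups[new_group] = [
            f"{sub}_{sub_func}_agg" for sub, sub_func in func.items()
        ]
        # The new group is aggregated after its subgroups.
        expanded[new_group] = outer_func
    return expanded


column_groups = {}
agg_map = {
    "irr_ghi": "mean",
    "irr_poa": {"irr_poa_met1": "mean", "irr_poa_met2": "mean"},
}
expanded = expand_agg_map_sketch(agg_map, column_groups)
```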

Parameters:

agg_map (dict) – Dictionary specifying aggregations to be performed on the specified groups from the column_groups attribute.

Returns:

The expanded agg_map.

Return type:

dict

expanded_uncert(grp_to_term, k=1.96)

Calculate expanded uncertainty of the predicted power.

Adds instrument uncertainty and spatial uncertainty in quadrature and passes the result through the regression to calculate the Systematic Standard Uncertainty, which is then added in quadrature with the Random Standard Uncertainty of the regression and multiplied by the k factor, k.

1. Combine the spatial and instrument uncertainties for each measurand by adding them in quadrature.

2. Add the absolute uncertainties from step 1 to each of the respective reporting conditions to determine a value for the reporting condition plus the uncertainty.

3. Calculate the predicted power using the RCs plus uncertainty three times, i.e. once for each RC plus uncertainty. For example, to estimate the impact of the uncertainty of the reporting irradiance, calculate expected power using the irradiance RC plus irradiance uncertainty together with the original temperature and wind reporting conditions, which have not had any uncertainty added to them.

4. Calculate the percent difference between each of the three new expected power values that include uncertainty of the RCs and the expected power with the unmodified RCs.

5. Take the square root of the sum of the squares of those three percent differences to obtain the Systematic Standard Uncertainty (bY).

Expects CapData to have instrument_uncert and spatial_uncerts attributes with matching keys.
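
The arithmetic of the steps above can be sketched with plain Python. Everything numeric here is made up for illustration: the predict function is a hypothetical linear stand-in for the fitted regression, and the reporting conditions, uncertainties, and random standard uncertainty are invented values. Differences are expressed as decimal fractions.

```python
import math


def predict(rc, coef):
    # Hypothetical linear model standing in for the fitted regression.
    return sum(coef[name] * value for name, value in rc.items())


# Made-up reporting conditions, coefficients, and uncertainties.
rc = {"poa": 800.0, "t_amb": 20.0, "w_vel": 3.0}
coef = {"poa": 1.0, "t_amb": -0.5, "w_vel": 0.2}
instrument = {"poa": 8.0, "t_amb": 0.3, "w_vel": 0.1}
spatial = {"poa": 6.0, "t_amb": 0.4, "w_vel": 0.2}
random_uncert = 0.005  # assumed random standard uncertainty (decimal)
k = 1.96

base = predict(rc, coef)
diffs = []
for name in rc:
    # Step 1: instrument and spatial uncertainty in quadrature.
    combined = math.hypot(instrument[name], spatial[name])
    # Steps 2-3: shift one reporting condition at a time and predict.
    shifted = dict(rc)
    shifted[name] = rc[name] + combined
    # Step 4: fractional difference from the unmodified prediction.
    diffs.append((predict(shifted, coef) - base) / base)

# Step 5: root sum of squares gives the systematic standard uncertainty.
systematic = math.sqrt(sum(d ** 2 for d in diffs))
# Combine with the random uncertainty and apply the coverage factor.
expanded = k * math.hypot(systematic, random_uncert)
```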

Parameters:
  • grp_to_term (dict) – Map the groups of measurement types to the term in the regression formula that was regressed against an aggregated value (typically mean) from that group.

  • k (numeric) – Coverage factor.

Return type:

Expanded uncertainty as a decimal value.

filter_clearsky(ghi_col=None, inplace=True, keep_clear=True, **kwargs)

Use pvlib detect_clearsky to remove periods with unstable irradiance.

The pvlib detect_clearsky function compares modeled clear sky ghi against measured ghi to detect periods of clear sky. Refer to the pvlib documentation for additional information.

By default uses data identified by the column_groups dictionary as ghi and modeled ghi. Issues warning if there is no modeled ghi data, or the measured ghi data has not been aggregated.

Parameters:
  • ghi_col (str, default None) – The name of a column of measured GHI data. Overrides the default attempt to automatically identify a column of GHI data.

  • inplace (bool, default True) – When true removes periods with unstable irradiance. When false returns pvlib detect_clearsky results, which by default is a series of booleans.

  • keep_clear (bool, default True) – Set to False to keep cloudy periods.

  • **kwargs – Passed to pvlib detect_clearsky. By default infer_limits is set to True, which automatically determines appropriate thresholds (including window length) based on the data’s sample interval. Pass infer_limits=False and window_length=<int> to manually control the detection parameters. See pvlib documentation for all available parameters.

filter_custom(func, *args, **kwargs)

Apply update_summary decorator to passed function.

Parameters:
  • func (function) – Any function that takes a dataframe as the first argument and returns a dataframe. Many pandas dataframe methods meet this requirement, like pd.DataFrame.between_time.

  • *args – Additional positional arguments passed to func.

  • **kwargs – Additional keyword arguments passed to func.

Examples

Example use of the pandas dropna method to remove rows with missing data.

>>> das.filter_custom(pd.DataFrame.dropna, axis=0, how='any')
>>> summary = das.get_summary()
>>> summary['pts_before_filter'][0]
1424
>>> summary['pts_removed'][0]
16

Example use of the pandas between_time method to remove time periods.

>>> das.reset_filter()
>>> das.filter_custom(pd.DataFrame.between_time, '9:00', '13:00')
>>> summary = das.get_summary()
>>> summary['pts_before_filter'][0]
245
>>> summary['pts_removed'][0]
1195
>>> das.data_filtered.index[0].hour
9
>>> das.data_filtered.index[-1].hour
13

filter_days(days, drop=False, inplace=True)

Select or drop timestamps for days passed.

Parameters:
  • days (list) – List of days to select or drop.

  • drop (bool, default False) – Set to true to drop the timestamps for the days passed instead of keeping only those days.

  • inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.

filter_irr(low, high, ref_val=None, col_name=None, inplace=True)

Filter on irradiance values.

Parameters:
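  • low (float or int) – Minimum value, given either as a fraction of ref_val (e.g. 0.8) or as an absolute irradiance in W/m^2 (e.g. 200).

  • high (float or int) – Maximum value, given either as a fraction of ref_val (e.g. 1.2) or as an absolute irradiance in W/m^2 (e.g. 800).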

  • ref_val (float or int or 'rep_irr') – Must provide arg when low and high are fractions. Pass 'rep_irr' to use the reporting irradiance from self.rc (set by calling rep_cond() first).

  • col_name (str, default None) – Column name of irradiance data to filter. By default uses the POA irradiance set in regression_cols attribute or average of the POA columns.

  • inplace (bool, default True) – When True, writes the filtered dataframe back to data_filtered. When False, returns the filtered dataframe.

Returns:

Filtered dataframe if inplace is False.

Return type:

DataFrame
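
The two ways of specifying low and high can be sketched as follows. The helpers irradiance_bounds and keep_in_band are hypothetical names for this example; treating the bounds as inclusive is an assumption, not something stated above.

```python
def irradiance_bounds(low, high, ref_val=None):
    """Return absolute (low, high) irradiance limits in W/m^2.

    Hypothetical helper: when ref_val is given, low and high are
    treated as fractions of it; otherwise they are absolute values.
    """
    if ref_val is not None:
        return low * ref_val, high * ref_val
    return low, high


def keep_in_band(values, low, high, ref_val=None):
    # Inclusive bounds are an assumption made for this sketch.
    lower, upper = irradiance_bounds(low, high, ref_val)
    return [v for v in values if lower <= v <= upper]


poa = [150, 450, 700, 790, 950]
kept_fraction = keep_in_band(poa, 0.8, 1.2, ref_val=750)
kept_absolute = keep_in_band(poa, 200, 800)
```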

filter_missing(columns=None)

Removes any rows where the regression columns contain missing data (NaNs).

Parameters:

columns (list, default None) – Subset of columns to apply dropna. By default uses the regression columns identified in the regression_cols attribute.

Returns:

Modifies data_filtered attribute.

Return type:

None

filter_op_state(op_state, mult_inv=None, inplace=True)

NOT CURRENTLY IMPLEMENTED - Filter on inverter operation state.

This filter is rarely useful in practice, but will be re-implemented if requested.

Parameters:
  • data (str) – ‘sim’ or ‘das’ determines if filter is on sim or das data

  • op_state (int) – integer inverter operating state to keep

  • mult_inv (list of tuples, [(start, stop, op_state), ...]) – List of tuples where start is the first column of a type of inverter, stop is the last column, and op_state is the operating state for the inverter type.

  • inplace (bool, default True) – When True writes over current filtered dataframe. When False returns CapData object.

Returns:

Returns filtered CapData object when inplace is False.

Return type:

CapData

filter_outliers(inplace=True, **kwargs)

Apply elliptic envelope from scikit-learn to remove outliers.

Parameters:
  • inplace (bool) – Default of true writes filtered dataframe back to data_filtered attribute.

  • **kwargs – Passed to sklearn EllipticEnvelope. Contamination keyword is useful to adjust proportion of outliers in dataset. Default is 0.04.

filter_pf(pf, inplace=True)

Filter data on the power factor.

Parameters:
  • pf (float) – 0.999 or similar to remove timestamps with lower power factor values. Values greater than or equal to pf are kept.

  • inplace (bool) – Default of true writes filtered dataframe back to data_filtered attribute.

Return type:

Dataframe when inplace is False.

filter_power(power, percent=None, columns=None, inplace=True)

Remove data above the specified power threshold.

Parameters:
  • power (numeric) – If percent is none, all data equal to or greater than power is removed. If percent is not None, then power should be the nameplate power.

  • percent (None, or numeric, default None) – Data greater than or equal to percent of power is removed. Specify percentage as decimal i.e. 1% is passed as 0.01.

  • columns (None or str, default None) – By default filter is applied to the power data identified in the regression_cols attribute. Pass a column name or column group to filter on. When passing a column group the power filter is applied to each column in the group.

  • inplace (bool, default True) – Default of true writes filtered dataframe back to data_filtered attribute.

Return type:

Dataframe when inplace is false.
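
How power and percent combine into a threshold can be sketched in a few lines. The helper names are hypothetical and the power values are made up.

```python
def power_threshold(power, percent=None):
    """Return the cutoff above which data is removed.

    Hypothetical helper: with percent given as a decimal, power is
    treated as nameplate and the cutoff is percent * power.
    """
    return power if percent is None else power * percent


def keep_below(values, power, percent=None):
    # Values equal to or greater than the threshold are removed.
    limit = power_threshold(power, percent)
    return [v for v in values if v < limit]


readings = [900, 995, 1000]
kept_percent = keep_below(readings, 1000, percent=0.99)
kept_absolute = keep_below(readings, 1000)
```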

filter_pvsyst(inplace=True)

Filter pvsyst data for off max power point tracking operation.

This function is only applicable to simulated data generated by PVsyst. Removes timestamps where the ‘IL Pmin’, ‘IL Vmin’, ‘IL Pmax’, or ‘IL Vmax’ values are greater than 0.

Parameters:

inplace (bool, default True) – If inplace is true, then function overwrites the filtered data. If false returns a CapData object.

Return type:

CapData object if inplace is set to False.

filter_sensors(perc_diff=None, inplace=True, row_filter=<function check_all_perc_diff_comb>)

Drop suspicious measurements by comparing values from different sensors.

This method ignores columns generated by the agg_sensors method.

Parameters:
  • perc_diff (dict) – Dictionary to specify a different threshold for each group of sensors. Dictionary keys should be translation dictionary keys and values are floats, like {‘irr-poa-’: 0.05}. By default the poa sensors as set by the regression_cols dictionary are filtered with a 5 percent difference threshold.

  • inplace (bool, default True) – If True, writes over current filtered dataframe. If False, returns CapData object.

Returns:

Returns filtered dataframe if inplace is False.

Return type:

DataFrame
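
A row-wise check that compares every pair of sensors can be sketched as below. This is only an illustration: the function names are hypothetical, and defining percent difference relative to the mean magnitude of the two readings is an assumption rather than a statement about how check_all_perc_diff_comb works.

```python
from itertools import combinations


def percent_difference(a, b):
    """Difference relative to the mean magnitude of two readings.

    The exact definition used by captest is not given above; this
    form is an assumption for illustration.
    """
    if a == b:
        return 0.0
    return abs(a - b) / ((abs(a) + abs(b)) / 2)


def sensors_agree(readings, threshold=0.05):
    # A row is kept only if every pair of sensors is within threshold.
    return all(
        percent_difference(a, b) < threshold
        for a, b in combinations(readings, 2)
    )


consistent_row = sensors_agree([800, 810, 805])
suspicious_row = sensors_agree([800, 810, 700])
```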

filter_shade(fshdbm=1.0, query_str=None, inplace=True)

Remove data during periods of array shading.

The default behavior assumes the filter is applied to data output from PVsyst and removes all periods where values in the column ‘FShdBm’ are less than 1.0.

Use the query_str parameter when shading losses (power) rather than a shading fraction are available.

Parameters:
  • fshdbm (float, default 1.0) – The value for fractional shading of beam irradiance as given by the PVsyst output parameter FShdBm. Data is removed when the shading fraction is less than the value passed to fshdbm. By default all periods of shading are removed.

  • query_str (str) – Query string to pass to pd.DataFrame.query method. The query string should be a boolean expression comparing a column name to a numeric filter value, like ‘ShdLoss<=50’. The column name must not contain spaces.

  • inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.

Returns:

If inplace is false returns a dataframe.

Return type:

pd.DataFrame

filter_time(start=None, end=None, drop=False, days=None, test_date=None, inplace=True, wrap_year=False)

Select data for a specified time period.

Parameters:
  • start (str or pd.Timestamp or None, default None) – Start date for data to be returned. If a string is passed it must be in format that can be converted by pandas.to_datetime. Not required if test_date and days arguments are passed. If not provided and days is also not provided, defaults to the first timestamp in data_filtered.

  • end (str or pd.Timestamp or None, default None) – End date for data to be returned. If a string is passed it must be in format that can be converted by pandas.to_datetime. Not required if test_date and days arguments are passed. If not provided and days is also not provided, defaults to the last timestamp in data_filtered.

  • drop (bool, default False) – Set to true to drop time period between start and end rather than keep it. Must supply start and end and wrap_year must be false.

  • days (int or None, default None) – Days in time period to be returned. Not required if start and end are specified.

  • test_date (str or pd.Timestamp or None, default None) – Must be format that can be converted by pandas.to_datetime. Not required if start and end are specified. Requires days argument. Time period returned will be centered on this date.

  • inplace (bool, default True) – If inplace is true, then function overwrites the filtered dataframe. If false returns a DataFrame.

  • wrap_year (bool, default False) – If true calls the wrap_year_end function. See wrap_year_end docstring for details. wrap_year_end was cntg_eoy prior to v0.7.0.
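
The centering of the time period on test_date can be sketched with the standard library. The helper centered_window is hypothetical, and splitting days evenly on either side of test_date is an assumption about the exact centering rule.

```python
from datetime import datetime, timedelta


def centered_window(test_date, days):
    """Return (start, end) for a period of `days` centered on test_date.

    Hypothetical helper; assumes half of the period falls on each
    side of test_date.
    """
    half = timedelta(days=days) / 2
    return test_date - half, test_date + half


start, end = centered_window(datetime(2024, 6, 15), days=10)
```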

fit_regression(filter=False, inplace=True, summary=True)

Perform a regression with statsmodels on filtered data.

Parameters:
  • filter (bool, default False) – When true removes timestamps where the residuals are greater than two standard deviations. When false just calculates ordinary least squares regression.

  • inplace (bool, default True) – If filter is true and inplace is true, then function overwrites the filtered data for sim or das. If false returns a CapData object.

  • summary (bool, default True) – Set to false to not print regression summary.

Returns:

Returns a filtered CapData object if filter is True and inplace is False.

Return type:

CapData
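
The residual filter can be illustrated with a single-variable least squares fit in plain Python. This is a simplified sketch, not the statsmodels-based implementation: the function names are hypothetical, the data is made up, and comparing the absolute residual against two population standard deviations is an assumption.

```python
import statistics


def fit_line(x, y):
    # Ordinary least squares for y = slope * x + intercept.
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def drop_large_residuals(x, y, n_std=2):
    # Keep points whose absolute residual is within n_std deviations.
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    limit = n_std * statistics.pstdev(residuals)
    return [(a, b) for a, b, r in zip(x, y, residuals) if abs(r) <= limit]


# Made-up data on a straight line with one outlier at x = 5.
x = list(range(1, 11))
y = [2 * v for v in x]
y[4] += 20
kept = drop_large_residuals(x, y)
```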

get_filtering_table()

Returns DataFrame showing which filter removed each filtered time interval.

Time intervals removed are marked with a “1”. Time intervals kept are marked with a “0”. Time intervals removed by a previous filter are np.nan/blank. Columns/filters are in the order they are run, from left to right. The last column, labeled “all_filters”, is True for intervals that were not removed by any of the filters.

get_length_test_period()

Get length of test period.

Uses length of data unless filter_time has been run, then uses length of the kept data after filter_time was run the first time. Subsequent uses of filter_time are ignored.

Rounds up to a period of full days.

Returns:

Days in test period.

Return type:

int

get_pts_required(hrs_req=12.5)

Set number of data points required for complete test attribute.

Parameters:

hrs_req (numeric, default 12.5) – Number of hours to be represented by final filtered test data set. Default of 12.5 hours is dictated by ASTM E2848 and corresponds to 750 1-minute data points, 150 5-minute, or 50 15-minute points.
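
The point counts quoted above follow directly from the data interval. A minimal sketch, using a hypothetical helper name:

```python
def points_required(hrs_req=12.5, interval_minutes=1):
    """Number of data points needed to cover hrs_req hours.

    Hypothetical helper for illustration; not part of captest.
    """
    return hrs_req * 60 / interval_minutes


one_minute = points_required(interval_minutes=1)
five_minute = points_required(interval_minutes=5)
fifteen_minute = points_required(interval_minutes=15)
```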

get_reg_cols(reg_vars=None, filtered_data=True)

Get regression columns renamed with keys from regression_cols.

Parameters:
  • reg_vars (list or str, default None) – By default returns all columns identified in regression_cols. A list with any combination of the keys of regression_cols is valid or pass a single key as a string.

  • filtered_data (bool, default True) – Return filtered or unfiltered data.

Return type:

DataFrame

get_summary()

Print a summary of filtering applied to the data_filtered attribute.

The summary dataframe shows the history of the filtering steps applied to the data including the timestamps remaining after each step, the timestamps removed by each step and the arguments used to call each filtering method.

If the filter arguments are cut off, the max column width can be increased by setting pd.options.display.max_colwidth.

Parameters:

None

Return type:

Pandas DataFrame

plot(combine={'ghi_csky': '(?=.*ghi)(?=.*irr)', 'inv_sum_mtr_pwr': ['(?=.*real)(?=.*pwr)(?=.*mtr)', '(?=.*pwr)(?=.*agg)'], 'poa_csky': '(?=.*poa)(?=.*irr)', 'poa_ghi': 'irr.*(poa|ghi)$', 'temp_amb_bom': '(?=.*temp)((?=.*amb)|(?=.*bom))'}, default_groups=['inv_sum_mtr_pwr', '(?=.*real)(?=.*pwr)(?=.*inv)', '(?=.*real)(?=.*pwr)(?=.*mtr)', 'poa_ghi', 'poa_csky', 'ghi_csky', 'temp_amb_bom'], width=1500, height=250, plot_defaults_path=None, **kwargs)

Create a dashboard to explore timeseries plots of the data.

The dashboard contains three tabs: Groups, Layout, and Overlay. The first tab, Groups, presents a column of plots with a separate plot overlaying the measurements for each group of the column_groups. The groups plotted are defined by the default_groups argument.

The second tab, Layout, allows manually selecting groups to plot. The button on this tab can be used to replace the column of plots on the Groups tab with the current figure on the Layout tab. Rerun this method after clicking the button to see the new plots in the Groups tab.

The third tab, Overlay, allows picking a group or any combination of individual tags to overlay on a single plot. The list of groups and tags can be filtered using regular expressions. Adding a text id in the box and clicking Update will add the current overlay to the list of groups on the Layout tab.

NOTE: If a plot defaults JSON file exists in the current working directory, the default groups will be read from that file. The file is named plot_defaults_{self.name}.json to avoid conflicts when multiple CapData objects are used in the same session. Columns in the file that are no longer present in the data are ignored with a warning.

Parameters:
  • combine (dict, optional) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. See the parse_combine function for more details.

  • default_groups (list of str, optional) – List of regex strings to use to identify default groups to plot. See the plotting.find_default_groups function for more details.

  • width (int, optional) – The width of the plots on the Groups tab.

  • height (int, optional) – The height of the plots on the Groups tab.

  • plot_defaults_path (str or Path, optional) – Path to the plot defaults JSON file. Overrides the default naming scheme. When None, defaults to ./plot_defaults_{self.name}.json.

  • **kwargs (optional) – Additional keyword arguments are passed to the options of the scatter plot.

Return type:

Panel tabbed layout

predict_capacities(irr_filter=True, percent_filter=20, **kwargs)

Calculate expected capacities.

Parameters:
  • irr_filter (bool, default True) – When true will filter each group of data by a percentage around the reporting irradiance for that group. The data groups are determined from the reporting irradiance attribute.

  • percent_filter (float or int or tuple, default 20) – Percentage or tuple of percentages used to filter each time-period group of data around the group’s reporting irradiance. Tuple option allows specifying different percentage for below and above the reporting irradiance: (below, above).

  • **kwargs – NOTE: Should match kwargs used to calculate reporting conditions. Passed to filter_grps which passes on to pandas Grouper to control label and closed side of intervals. See pandas Grouper documentation for details. Default is left labeled and left closed.
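
The single-value and tuple forms of percent_filter can be sketched as a band around a reporting irradiance. The helper band_around is a hypothetical name, and the reporting irradiance of 800 W/m^2 is a made-up value.

```python
def band_around(rep_irr, percent_filter=20):
    """Return (low, high) irradiance limits around rep_irr.

    Hypothetical helper: a single percentage applies to both sides,
    a tuple is read as (below, above).
    """
    if isinstance(percent_filter, tuple):
        below, above = percent_filter
    else:
        below = above = percent_filter
    return rep_irr * (1 - below / 100), rep_irr * (1 + above / 100)


symmetric = band_around(800, 20)
asymmetric = band_around(800, (10, 30))
```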

print_points_summary(hrs_req=12.5)

Print summary data on the number of points collected.

process_regression_columns(verbose=True)

Walk the regression column dictionary and calculate parameters.

See util.process_reg_cols for additional documentation.

Parameters:

verbose (bool, default True) – By default prints summary of aggregations and parameter calculations performed while traversing the regression_cols dictionary. Set to False to prevent all output.

reg_scatter_matrix()

Create pandas scatter matrix of regression variables.

rename_cols(column_map)

Rename columns in data, data_filtered, and column_groups.

Parameters:

column_map (dict) – Dictionary mapping old column names to new column names.

rep_cond(irr_bal=False, percent_filter=20, front_poa='poa', w_vel=None, func=None, rc_kwargs={})

Calculate reporting conditions for the current regression formula.

The calculation is formula-agnostic: the right-hand-side variables of self.regression_formula drive which columns are aggregated. Always writes the result to self.rc.

Parameters:
  • irr_bal (bool, default False) – If True, uses ReportingIrradiance to determine the reporting irradiance (front_poa). When True, the other reporting conditions are aggregated from the subset of data within the balanced irradiance band.

  • percent_filter (int, default 20) – Percentage used to define the irradiance band around the reporting irradiance when irr_bal is True. Has no effect when irr_bal is False.

  • front_poa (str, default 'poa') – Key in self.regression_cols whose column is used as the irradiance driver when irr_bal is True.

  • w_vel (numeric or None) – If not None, overrides the calculated wind speed reporting condition with this value.

  • func (dict, str, callable, or None, default None) – Passed to df.agg(...). A dict maps rhs variable names to aggregation functions (e.g. {'poa': perc_wrap(60), 't_amb': 'mean'}). When None, defaults to {var: 'mean' for var in rhs} where rhs is derived from self.regression_formula.

  • rc_kwargs (dict) – Passed to ReportingIrradiance when irr_bal is True.

Returns:

Reporting conditions are stored on self.rc as a one-row DataFrame. Use rep_cond_freq for seasonal/monthly outputs.

Return type:

None
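
The default for func, {var: 'mean' for var in rhs}, depends on the right-hand-side variables of the regression formula. The sketch below shows one way such names could be pulled from a formula string. It is not util.parse_regression_formula: the function rhs_variables is hypothetical, the regex approach is an assumption, and the formula shown is only an example.

```python
import re


def rhs_variables(formula):
    """Return unique variable names on the right of '~', in order.

    Hypothetical helper: treats every identifier other than the
    patsy identity function I as a variable name.
    """
    rhs = formula.split("~", 1)[1]
    names = []
    for name in re.findall(r"[A-Za-z_]\w*", rhs):
        if name != "I" and name not in names:
            names.append(name)
    return names


# Example formula, assumed for illustration.
formula = "power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1"
default_func = {var: "mean" for var in rhs_variables(formula)}
```

A dict of this shape could then be overridden per variable, for example by replacing the 'poa' entry with a percentile function.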

rep_cond_freq(irr_bal=False, percent_filter=20, front_poa='poa', w_vel=None, inplace=True, func=None, freq=None, grouper_kwargs={}, rc_kwargs={})

Calculate frequency-grouped reporting conditions.

Like rep_cond but aggregates within groups defined by freq (e.g. 'MS' for month-start, '60D' for 60-day). Used for seasonal or monthly reporting tests.

Parameters:
  • irr_bal (bool, default False) – See rep_cond.

  • percent_filter (int, default 20) – See rep_cond.

  • front_poa (str, default 'poa') – See rep_cond.

  • w_vel (numeric or None) – See rep_cond.

  • inplace (bool, default True) – When True writes the multi-row RC DataFrame to self.rc; when False returns the DataFrame.

  • func (dict, str, callable, or None, default None) – See rep_cond.

  • freq (str or None) – Pandas offset alias. None falls back to single-row rep_cond behavior.

  • grouper_kwargs (dict) – Passed to pandas.Grouper.

  • rc_kwargs (dict) – Passed to ReportingIrradiance when irr_bal is True.

Returns:

Multi-row DataFrame of per-group reporting conditions when inplace=False. Otherwise stores on self.rc and returns None.

Return type:

DataFrame or None

reset_agg()

Remove aggregation columns from data and data_filtered attributes.

Does not reset filtering of data or data_filtered.

reset_filter()

Set data_filtered to data and reset filtering summary.

Parameters:

None

review_column_groups()

Print column_groups with nice formatting.

scatter(filtered=True)

Create a matplotlib scatter plot of regression lhs vs. first rhs var.

Formula-agnostic: resolves the x and y columns from self.regression_formula via util.parse_regression_formula.

Parameters:

filtered (bool, default True) – Plots filtered data when True and all data when False.

Notes

Prefer CapTest.scatter_plots for non-default regression presets; it picks the right callable from TEST_SETUPS (single or multi-panel) automatically.

scatter_filters()

Returns an overlay of scatter plots of intervals removed for each filter.

A scatter plot of power vs irradiance is generated for the time intervals removed for each filtering step. Each of these plots is labeled and overlaid.

scatter_hv(timeseries=False, all_reg_columns=False)

Create a holoviews scatter plot of regression lhs vs. first rhs var.

Formula-agnostic thin wrapper around captest.captest.scatter_default (with additional timeseries-overlay support, which scatter_default does not provide). For non-default regression presets prefer CapTest.scatter_plots which picks the right callable (single or multi-panel) from TEST_SETUPS.

Parameters:
  • timeseries (bool, default False) – If True, returns a layout with the scatter plot and a linked timeseries plot of the lhs variable. Selecting points in the scatter highlights them in the timeseries.

  • all_reg_columns (bool, default False) – If True, includes every regression column in the scatter plot’s hover tooltip in addition to the x and y variables.

set_regression_cols(power='', poa='', t_amb='', w_vel='')

Create a dictionary linking the regression variables to data.

As of v0.15.0 prefer using a predefined test setup that includes a regression column dictionary or assigning a dictionary to the regression_cols attribute directly.

Links the independent regression variables to the appropriate translation keys or a column name may be used to specify a single column of data.

Sets attribute and returns nothing.

Parameters:
  • power (str) – Translation key for the power variable.

  • poa (str) – Translation key for the plane of array (poa) irradiance variable.

  • t_amb (str) – Translation key for the ambient temperature variable.

  • w_vel (str) – Translation key for the wind velocity variable.

set_test_complete(pts_required)

Sets test_complete attribute.

Parameters:

pts_required (int) – Number of points required to remain after filtering for a complete test.

spatial_uncert(column_groups)

Spatial uncertainties of the independent regression variables.

Parameters:

column_groups (list) – Measurement groups to calculate spatial uncertainty.

Return type:

None, stores dictionary of spatial uncertainties as an attribute.

timeseries_filters()

Returns an overlay of scatter plots of intervals removed for each filter.

A scatter plot of power vs irradiance is generated for the time intervals removed for each filtering step. Each of these plots is labeled and overlaid.

uncertainty()

Calculate random standard uncertainty of the regression.

(SEE times the square root of the leverage of the reporting conditions).

Not fully implemented yet. Need to review and determine what actual variable should be.

class captest.capdata.FilteredLocIndexer(_capdata)

Bases: object

Class to implement __getitem__ for indexing the CapData.data_filtered dataframe.

Allows passing a column_groups key, a list of column_groups keys, or a column or list of columns of the CapData.data_filtered dataframe.

class captest.capdata.LocIndexer(_capdata)

Bases: object

Class to implement __getitem__ for indexing the CapData.data dataframe.

Allows passing a column_groups key, a list of column_groups keys, or a column or list of columns of the CapData.data dataframe.
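
The lookup described for LocIndexer and FilteredLocIndexer can be pictured with a small stand-in that needs no pandas. This is a sketch only: GroupIndexer, its columns mapping, and the sample column names are hypothetical, and the rule that a label is tried first as a column_groups key and otherwise treated as a column name is an assumption rather than the captest implementation.

```python
class GroupIndexer:
    """Hypothetical stand-in for LocIndexer; not the captest implementation."""

    def __init__(self, columns, column_groups):
        self.columns = columns              # column name -> list of values
        self.column_groups = column_groups  # group key -> list of column names

    def _expand(self, label):
        # Assumption: a group key expands to its columns; anything else
        # is treated as a single column name.
        if label in self.column_groups:
            return list(self.column_groups[label])
        return [label]

    def __getitem__(self, label):
        labels = label if isinstance(label, list) else [label]
        names = []
        for item in labels:
            for name in self._expand(item):
                if name not in names:
                    names.append(name)
        return {name: self.columns[name] for name in names}


columns = {"poa_1": [800, 810], "poa_2": [795, 805], "meter_kw": [50, 51]}
groups = {"irr_poa": ["poa_1", "poa_2"]}
loc = GroupIndexer(columns, groups)

by_group = loc["irr_poa"]               # both POA columns
mixed = loc[["irr_poa", "meter_kw"]]    # group key plus a plain column name
```

Routing every label through one expansion step lets callers mix group keys and raw column names in a single list.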

class captest.capdata.ReportingIrradiance(df, irr_col, **param)

Bases: Parameterized

dashboard()
df = None
get_rep_irr()

Calculates the reporting irradiance.

Returns:

Float reporting irradiance and filtered dataframe.

Return type:

Tuple

irr_col = 'GlobInc'
irr_rc = 0.0
max_percent_above = 60
max_ref_irradiance = None
min_percent_below = 40
min_ref_irradiance = None
name = 'ReportingIrradiance'
percent_band = 20
plot()
poa_flt = None
points_required = 750
rc_irr_60th_perc = 0.0
save_csv(output_csv_path)

Save possible reporting irradiance data to csv file at given path.

save_plot(output_plot_path=None)

Save a plot of the possible reporting irradiances and time intervals.

Saves plot as an html file at path given.

Parameters:

output_plot_path (str or Path) – Path to save plot to.

total_pts = 0.0
captest.capdata.abs_diff_from_average(series, threshold)

Check that each value in series is within threshold of the average of the other values.

Drops NaNs from series before calculating difference from average for each value.

Returns True if there is only one value in the series.

Parameters:
  • series (pd.Series) – Pandas series of values to check.

  • threshold (numeric) – Threshold value for absolute difference from average.

Return type:

bool
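
As a rough illustration of the check described above, the following sketch works on a plain list instead of a pandas Series. The function name abs_diff_from_average_sketch is hypothetical, and the comparison (absolute difference no greater than threshold) is an assumption drawn from the parameter description, not from the captest source.

```python
import math


def abs_diff_from_average_sketch(values, threshold):
    """Illustrative only; mirrors the documented behaviour on a plain list."""
    # Drop NaNs before comparing, as the docstring describes.
    clean = [v for v in values if not math.isnan(v)]
    if len(clean) == 1:
        return True
    for i, value in enumerate(clean):
        others = clean[:i] + clean[i + 1:]
        average = sum(others) / len(others)
        # Assumption: a value passes when it is within `threshold` of the
        # average of the remaining values.
        if abs(value - average) > threshold:
            return False
    return True


consistent = abs_diff_from_average_sketch([25.0, 25.4, float("nan"), 24.8], 1.0)
outlier = abs_diff_from_average_sketch([25.0, 25.4, 31.0], 1.0)
single = abs_diff_from_average_sketch([25.0], 1.0)
```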

captest.capdata.check_all_perc_diff_comb(series, perc_diff)

Check series for pairs of values with percent difference above perc_diff.

Calculates the percent difference between all combinations of two values in the passed series and checks if all of them are below the passed perc_diff.

Parameters:
  • series (pd.Series) – Pandas series of values to check.

  • perc_diff (float) – Percent difference threshold value as decimal i.e. 5% is 0.05.

Return type:

bool
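
The pairwise check can be sketched with itertools.combinations. The names below are hypothetical, and the percent difference definition (absolute difference divided by the pair mean) is an assumption; perc_difference in captest may define it differently.

```python
from itertools import combinations


def perc_difference_sketch(x, y):
    # Assumption: symmetric percent difference relative to the pair mean.
    mean = (x + y) / 2
    if mean == 0:
        return 0.0 if x == y else float("inf")
    return abs(x - y) / abs(mean)


def all_pairs_within(values, perc_diff):
    """True when every pair of values differs by less than perc_diff."""
    return all(
        perc_difference_sketch(a, b) < perc_diff
        for a, b in combinations(values, 2)
    )


agree = all_pairs_within([800.0, 805.0, 810.0], 0.05)
disagree = all_pairs_within([800.0, 805.0, 900.0], 0.05)
```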

captest.capdata.csky(time_source, loc=None, sys=None, concat=True, output='both')

Calculate clear sky poa and ghi.

Parameters:
  • time_source (dataframe or DatetimeIndex) – If passing a dataframe, the index of the dataframe will be used. If the index does not have a timezone, the timezone will be set using the timezone in the passed loc dictionary. If passing a DatetimeIndex with a timezone, it will be returned directly. If passing a DatetimeIndex without a timezone, the timezone in the passed loc dictionary will be used.

  • loc (dict) –

    Dictionary of values required to instantiate a pvlib Location object.

    loc = {'latitude': float, 'longitude': float, 'altitude': float/int, 'tz': str, int, float, default 'UTC'}

    See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a list of valid time zones. ints and floats must be in hours from UTC.

  • sys (dict) –

    Dictionary of keywords required to create a pvlib SingleAxisTrackerMount or FixedMount.

    Example dictionaries:

    fixed_sys = {'surface_tilt': 20, 'surface_azimuth': 180, 'albedo': 0.2}

    tracker_sys1 = {'axis_tilt': 0, 'axis_azimuth': 0, 'max_angle': 90, 'backtrack': True, 'gcr': 0.2, 'albedo': 0.2}

    Refer to pvlib documentation for details.

  • concat (bool, default True) – If True, returns the columns selected by the output argument added to the passed dataframe; otherwise returns just the clear sky data.

  • output (str, default 'both') – Selects which clear sky columns are returned:

    both - returns only total poa and ghi

    poa_all - returns all components of poa

    ghi_all - returns all components of ghi

    all - returns all components of poa and ghi

captest.capdata.filter_grps(grps, rcs, irr_col, low, high, freq, **kwargs)

Apply irradiance filter around passed reporting irradiances to groupby.

For each group in the grps argument the irradiance is filtered by a percentage around the reporting irradiance provided in rcs.

Parameters:
  • grps (pandas groupby) – Groupby object with time groups (months, seasons, etc.).

  • rcs (pandas DataFrame) – Dataframe of reporting conditions. Use the rep_cond method to generate a dataframe for this argument.

  • irr_col (str) – String that is the name of the column with the irradiance data.

  • low (float) – Minimum value as fraction e.g. 0.8.

  • high (float) – Max value as fraction e.g. 1.2.

  • freq (str) – Frequency to groupby e.g. ‘MS’ for month start.

  • **kwargs – Passed to pandas Grouper to control label and closed side of intervals. See pandas Grouper documentation for details. Default is left labeled and left closed.

Return type:

pandas groupby

captest.capdata.filter_irr(df, irr_col, low, high, ref_val=None)

Top level filter on irradiance values.

Parameters:
  • df (DataFrame) – Dataframe to be filtered.

  • irr_col (str) – String that is the name of the column with the irradiance data.

  • low (float or int) – Minimum value as a fraction (e.g. 0.8) or as an absolute irradiance (e.g. 200 W/m^2).

  • high (float or int) – Maximum value as a fraction (e.g. 1.2) or as an absolute irradiance (e.g. 800 W/m^2).

  • ref_val (float or int) – Required when low and high are fractions.

Return type:

DataFrame
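
The two ways of passing low and high can be sketched as follows, using a list of dicts in place of a DataFrame. The helper names irr_bounds and filter_rows are hypothetical, and treating the bounds as inclusive is an assumption that should be checked against filter_irr itself.

```python
def irr_bounds(low, high, ref_val=None):
    """Return absolute (lower, upper) irradiance bounds in W/m^2."""
    if ref_val is not None:
        # Fractions such as 0.8 and 1.2 scale the reference irradiance.
        return low * ref_val, high * ref_val
    # Without a reference value the bounds are already absolute.
    return low, high


def filter_rows(rows, irr_col, low, high, ref_val=None):
    lower, upper = irr_bounds(low, high, ref_val)
    # Assumption: the bounds are inclusive.
    return [row for row in rows if lower <= row[irr_col] <= upper]


rows = [{"poa": 150.0}, {"poa": 650.0}, {"poa": 790.0}, {"poa": 1010.0}]
absolute = filter_rows(rows, "poa", 200, 800)
around_ref = filter_rows(rows, "poa", 0.8, 1.2, ref_val=900.0)
```

With ref_val=900.0 the fractional call keeps rows between roughly 720 and 1080 W/m^2, while the absolute call keeps rows between 200 and 800 W/m^2.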

captest.capdata.fit_model(df, fml='power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1')

Fits linear regression using statsmodels to dataframe passed.

Dataframe must be first argument for use with pandas groupby object apply method.

Parameters:
  • df (pandas dataframe) –

  • fml (str) – Formula to fit refer to statsmodels and patsy documentation for format. Default is the formula in ASTM E2848.

Return type:

Statsmodels linear model regression results wrapper object.

captest.capdata.get_tz_index(time_source, loc)

Create DatetimeIndex with timezone aligned with location dictionary.

Handles generating a DatetimeIndex with a timezone for use as an argument to pvlib ModelChain prepare_inputs method or pvlib Location get_clearsky method.

Parameters:

time_source (Dataframe, Series, or DatetimeIndex) – If passing a Dataframe or Series, the index of the dataframe will be used. If the index does not have a timezone, the timezone will be set using the timezone in the passed loc dictionary. If passing a DatetimeIndex with a timezone, it will be returned directly. If passing a DatetimeIndex without a timezone, the timezone will be set using the timezone in the passed loc dictionary.

Return type:

DatetimeIndex with timezone

captest.capdata.index_capdata(capdata, label, filtered=True)

Like Dataframe.loc but for CapData objects.

Pass a single label or list of labels to select the columns from the data or data_filtered DataFrames. The label can be a column name, a column group key, or a regression column key.

The special label regcols will return the columns identified in regression_cols.

Parameters:
  • capdata (CapData) – The CapData object to select from.

  • label (str or list) – The label or list of labels to select from the data or data_filtered DataFrames. The label can be a column name, a column group key, or a regression column key. The special label regcols will return the columns identified in regression_cols.

  • filtered (bool, default True) – By default the method will return columns from the data_filtered DataFrame. Set to False to return columns from the data DataFrame.

Return type:

DataFrame

captest.capdata.perc_bounds(percent_filter)

Convert +/- percentage to decimals to be used to determine bounds.

Parameters:

percent_filter (float or tuple, default None) – Percentage or tuple of percentages used to filter around the reporting irradiance. Required when irr_bal is True in rep_cond.

Returns:

Decimal versions of the percent irradiance filter. 0.8 and 1.2 would be returned when passing 20 to the input.

Return type:

tuple
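
A minimal sketch of the conversion, consistent with the documented example of 20 mapping to 0.8 and 1.2. The name perc_bounds_sketch is hypothetical, and reading a tuple as (percent below, percent above) is an assumption.

```python
def perc_bounds_sketch(percent_filter):
    """Convert a +/- percentage into (low, high) multipliers."""
    if isinstance(percent_filter, tuple):
        # Assumption: a tuple is (percent below, percent above).
        below, above = percent_filter
    else:
        below = above = percent_filter
    return 1 - below / 100, 1 + above / 100


symmetric = perc_bounds_sketch(20)
asymmetric = perc_bounds_sketch((10, 30))
```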

captest.capdata.perc_difference(x, y)

Calculate percent difference of two values.

captest.capdata.pred_summary(grps, rcs, allowance, **kwargs)

Summarize reporting conditions, predicted cap, and guaranteed cap.

This method does not calculate reporting conditions.

Parameters:
  • grps (pandas groupby object) – Solar data grouped by season or month used to calculate reporting conditions. This argument is used to fit models for each group.

  • rcs (pandas dataframe) – Dataframe of reporting conditions used to predict capacities.

  • allowance (float) – Percent allowance to calculate guaranteed capacity from predicted capacity.

Returns:

Dataframe of reporting conditions, model coefficients, predicted capacities, guaranteed capacities, and points in each grouping.

captest.capdata.predict(regs, rcs)

Calculate predicted values for given linear models and predictor values.

Evaluates the first linear model in the iterable with the first row of the predictor values in the dataframe. Passed arguments must be aligned.

Parameters:
  • regs (iterable of statsmodels regression results wrappers) –

  • rcs (pandas dataframe) – Dataframe of predictor values used to evaluate each linear model. The column names must match the strings used in the regression formula.

Return type:

Pandas series of predicted values.

captest.capdata.predict_with_pvalue_check(cd, rc=None, pval_threshold=0.05)

Make prediction with optional p-value filtering of coefficients.

Uses model.predict() with custom params to ensure consistent behavior across pandas 2.x and 3.0+ (avoids Copy-on-Write issues).

Parameters:
  • cd (CapData) – Instance of CapData with a regression_results attribute (fitted statsmodels results) and an rc attribute (reporting conditions DataFrame). The rc attribute is used if the rc parameter is None.

  • rc (DataFrame, optional) – Reporting conditions DataFrame. If None, uses cd.rc.

  • pval_threshold (float, default 0.05) – If provided, coefficients with p-value > threshold are set to zero before making the prediction. Set to None to skip pval check.

Returns:

Predicted value at reporting conditions.

Return type:

float
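
The effect of the p-value check can be illustrated without statsmodels by treating the fitted model as plain dictionaries. Everything named here is hypothetical: the function, the coefficient values, and the term keys are stand-ins chosen for the example, and the real function works through model.predict() rather than a manual sum.

```python
def predict_with_pvalue_check_sketch(params, pvalues, rc, pval_threshold=0.05):
    """Evaluate a linear model after zeroing insignificant coefficients.

    params, pvalues, and rc are plain dicts keyed by term name; this is a
    stand-in for the statsmodels results object, not the captest code.
    """
    total = 0.0
    for term, coeff in params.items():
        if pval_threshold is not None and pvalues[term] > pval_threshold:
            coeff = 0.0  # drop terms that fail the p-value check
        total += coeff * rc[term]
    return total


# Hypothetical coefficients for the four ASTM E2848 terms.
params = {"poa": 100.0, "poa*poa": -0.01, "poa*t_amb": -0.5, "poa*w_vel": 0.2}
pvalues = {"poa": 0.001, "poa*poa": 0.02, "poa*t_amb": 0.01, "poa*w_vel": 0.40}
# Term values at reporting conditions poa=800, t_amb=20, w_vel=2.
rc = {"poa": 800.0, "poa*poa": 640000.0, "poa*t_amb": 16000.0, "poa*w_vel": 1600.0}

unchecked = predict_with_pvalue_check_sketch(params, pvalues, rc, pval_threshold=None)
checked = predict_with_pvalue_check_sketch(params, pvalues, rc)
```

Here only the wind term has a p-value above 0.05, so the two predictions differ by that term's contribution.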

captest.capdata.pvlib_location(loc)

Create a pvlib location object.

Parameters:

loc (dict) –

Dictionary of values required to instantiate a pvlib Location object.

loc = {'latitude': float, 'longitude': float, 'altitude': float/int, 'tz': str, int, float, default 'UTC'}

See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a list of valid time zones. ints and floats must be in hours from UTC.

Return type:

pvlib location object.

captest.capdata.pvlib_system(sys)

Create a pvlib PVSystem object.

The PVSystem will have either a FixedMount or a SingleAxisTrackerMount depending on the keys of the passed dictionary.

Parameters:

sys (dict) –

Dictionary of keywords required to create a pvlib SingleAxisTrackerMount or FixedMount, plus albedo.

Example dictionaries:

fixed_sys = {'surface_tilt': 20, 'surface_azimuth': 180, 'albedo': 0.2}

tracker_sys1 = {'axis_tilt': 0, 'axis_azimuth': 0, 'max_angle': 90, 'backtrack': True, 'gcr': 0.2, 'albedo': 0.2}

Refer to pvlib documentation for details.

Return type:

pvlib PVSystem object.

captest.capdata.round_kwarg_floats(kwarg_dict, decimals=3)

Round float values in a dictionary.

Parameters:
  • kwarg_dict (dict) –

  • decimals (int, default 3) – Number of decimal places to round to.

Returns:

Dictionary with rounded floats.

Return type:

dict

captest.capdata.run_test(cd, steps)

Apply a list of capacity test steps to a given CapData object.

A list of CapData methods is applied sequentially with the passed parameters. This method allows succinctly defining a capacity test, which facilitates parametric and automatic testing.

Parameters:
  • cd (CapData) – The CapData methods will be applied to this instance of the pvcaptest CapData class.

  • steps (list of tuples) – A list of the methods to be applied and the arguments to be used. Each item in the list should be a tuple of the CapData method followed by a tuple of arguments and a dictionary of keyword arguments. If there are no args or kwargs, an empty tuple or dict should be included. Example: [(CapData.filter_irr, (400, 1500), {})]
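
The shape of the steps list can be illustrated with a toy class in place of CapData. ToyData, its two filter methods, and run_steps are hypothetical; the sketch assumes each method is stored unbound and called with the object as its first argument, which is what the CapData.filter_irr example above suggests.

```python
class ToyData:
    """Hypothetical stand-in for CapData holding a list of irradiances."""

    def __init__(self, values):
        self.values = list(values)

    def filter_irr(self, low, high):
        self.values = [v for v in self.values if low <= v <= high]

    def filter_top(self, n=2):
        self.values = sorted(self.values)[-n:]


def run_steps(obj, steps):
    # Each step is (unbound method, positional args, keyword args).
    for method, args, kwargs in steps:
        method(obj, *args, **kwargs)


steps = [
    (ToyData.filter_irr, (400, 1500), {}),
    (ToyData.filter_top, (), {"n": 2}),
]
toy = ToyData([120, 450, 900, 1100, 1600])
run_steps(toy, steps)
```

Because the steps are plain data, the same list can be applied to several objects or varied in a parameter sweep.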

captest.capdata.sensor_filter(df, threshold, row_filter=<function check_all_perc_diff_comb>)

Check dataframe for rows with inconsistent values.

Applies the row_filter function (check_all_perc_diff_comb by default) along the rows of the passed dataframe.

Parameters:
  • df (pandas DataFrame) –

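  • threshold (float) – Threshold passed to row_filter. For the default row_filter this is a percent difference as a decimal.

  • row_filter (callable, default check_all_perc_diff_comb) – Function applied to each row.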

captest.capdata.spans_year(start_date, end_date)

Determine if dates passed are in the same year.

Parameters:
  • start_date (pandas Timestamp) –

  • end_date (pandas Timestamp) –

Return type:

bool

captest.capdata.tstamp_kwarg_to_strings(kwarg_dict)

Convert timestamp values in dictionary to strings.

Parameters:

kwarg_dict (dict) –

Return type:

dict

captest.capdata.update_summary(func)

Decorates the CapData class filter methods.

Updates the CapData.summary and CapData.summary_ix attributes, which are used to generate summary data by the CapData.get_summary method.
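
The general pattern of a decorator that records what each filter removed can be sketched as below. This is not the captest decorator: record_summary, ToyCapData, and the summary field names are hypothetical, chosen only to show how a wrapper can compare row counts before and after the wrapped method runs.

```python
from functools import wraps


def record_summary(func):
    """Hypothetical decorator that logs how many rows each filter removes."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        before = len(self.rows)
        result = func(self, *args, **kwargs)
        after = len(self.rows)
        # One (object name, filter name) tuple per filtering step.
        self.summary_ix.append((self.name, func.__name__))
        self.summary.append({"pts_after_filter": after, "pts_removed": before - after})
        return result

    return wrapper


class ToyCapData:
    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)
        self.summary_ix = []
        self.summary = []

    @record_summary
    def filter_min(self, minimum):
        self.rows = [r for r in self.rows if r >= minimum]


toy = ToyCapData("meas", [100, 450, 900])
toy.filter_min(400)
```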

captest.capdata.wrap_seasons(df, freq)

Rearrange an 8760 so a quarterly groupby will result in seasonal groups.

Parameters:
  • df (DataFrame) – Dataframe to be rearranged.

  • freq (str) – String pandas offset alias to specify aggregation frequency for reporting condition calculation.

Return type:

DataFrame

captest.capdata.wrap_year_end(df, start, end)

Shifts data before or after new year to form a contiguous time period.

This function shifts data from the end of the year a year back or data from the beginning of the year a year forward, to create a contiguous time period. Intended to be used on historical typical year data.

If start date is in dataframe, then data at the beginning of the year will be moved ahead one year. If end date is in dataframe, then data at the end of the year will be moved back one year.

cntg (contiguous); eoy (end of year)

Parameters:
  • df (pandas DataFrame) – Dataframe to be adjusted.

  • start (pandas Timestamp) – Start date for time period.

  • end (pandas Timestamp) – End date for time period.

captest.captest module

Unified test orchestrator and supporting utilities.

This module houses the CapTest class, the TEST_SETUPS registry of named regression presets, and small formatting helpers (print_results, highlight_pvals, perc_wrap) consumed by CapTest methods that compare a measured + modeled pair of CapData instances.

Import direction

At module-import time the dependency is one-way only: captest.captest -> captest.capdata. CapData is imported here at module scope so CapTest can declare meas/sim as param.ClassSelector(class_=CapData). captest.capdata does NOT import anything from this module at import time; the single-CapData helper predict_with_pvalue_check is imported lazily from within CapTest.captest_results.

class captest.captest.CapTest(**kwargs)

Bases: Parameterized

Config + state container for an ASTM E2848 capacity test.

CapTest binds a measured CapData and a modeled CapData to a named regression preset from TEST_SETUPS and holds all test-level configuration in one place. It is intentionally a config + state container rather than a runner: users still invoke ct.meas.filter_*(...), ct.meas.rep_cond(...), and ct.meas.fit_regression() by hand.

Typical workflows

  1. Programmatic:

    ct = CapTest.from_params(
        test_setup="e2848_default",
        meas=meas_cd,
        sim=sim_cd,
        ac_nameplate=125_000,
        test_tolerance="- 4",
    )
    # ``from_params`` runs ``setup()`` automatically because both meas
    # and sim were supplied as pre-built CapData instances.
    
  2. From a yaml file:

    ct = CapTest.from_yaml("./config.yaml")
    
  3. Bare + manual:

    ct = CapTest(test_setup="bifi_e2848_etotal", bifaciality=0.15)
    ct.meas = my_meas_cd
    ct.sim = my_sim_cd
    ct.setup()
    
param meas:

Measured-data CapData instance. Assigned via from_params, from_yaml, or directly.

type meas:

CapData or None

param sim:

Modeled-data CapData instance.

type sim:

CapData or None

param test_setup:

Key into TEST_SETUPS or the literal "custom". Default "e2848_default".

type test_setup:

str

param reg_fml:

If set, overrides the preset’s regression formula at setup().

type reg_fml:

str or None

param reg_cols_meas:

If set, overrides the preset’s measured regression_cols dict.

type reg_cols_meas:

dict or None

param reg_cols_sim:

If set, overrides the preset’s modeled regression_cols dict.

type reg_cols_sim:

dict or None

param rep_conditions:

If set, partial-merged onto the preset’s rep_conditions at setup(). Top-level keys replace; the nested func dict is merged one level deep so users can override only a single variable’s aggregation.

type rep_conditions:

dict or None

param rep_cond_source:

Which CapData.rc is used by captest_results. Default "meas".

type rep_cond_source:

{“meas”, “sim”}

param sim_days:

Days of simulated data used for the test. Default 30.

type sim_days:

int

param shade_filter_start:

"HH:MM" between-time strings for shade filtering.

type shade_filter_start:

str or None

param shade_filter_end:

"HH:MM" between-time strings for shade filtering.

type shade_filter_end:

str or None

param ac_nameplate:

Nameplate AC power in watts.

type ac_nameplate:

float or None

param test_tolerance:

Tolerance string forwarded to pass/fail logic. Default "- 4".

type test_tolerance:

str

param min_irr:

Irradiance filter bounds (W/m^2).

type min_irr:

float

param max_irr:

Irradiance filter bounds (W/m^2).

type max_irr:

float

param clipping_irr:

Irradiance filter bounds (W/m^2).

type clipping_irr:

float

param rep_irr_filter:

Fractional reporting-irradiance filter band in [0, 1].

type rep_irr_filter:

float

param fshdbm:

Shade filter threshold in [0, 1].

type fshdbm:

float

param irrad_stability:

Irradiance stability strategy.

type irrad_stability:

{“std”, “filter_clearsky”, “contract”}

param irrad_stability_threshold:

Threshold value for irrad_stability.

type irrad_stability_threshold:

float

param hrs_req:

Hours of data required for a complete test. Default 12.5.

type hrs_req:

float

param bifaciality:

Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.

type bifaciality:

float

param power_temp_coeff:

Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.

type power_temp_coeff:

float

param base_temp:

Calc-params scalars propagated onto both CapData instances at setup(). See _downstream_attrs.

type base_temp:

float

param meas_loader:

Programmatic-only data-loader callables. Default resolution when None: captest.io.load_data and captest.io.load_pvsyst respectively. Not serialized to yaml.

type meas_loader:

callable or None

param sim_loader:

Programmatic-only data-loader callables. Default resolution when None: captest.io.load_data and captest.io.load_pvsyst respectively. Not serialized to yaml.

type sim_loader:

callable or None

param meas_load_kwargs:

Plain-dict kwargs splatted into the loaders.

type meas_load_kwargs:

dict or None

param sim_load_kwargs:

Plain-dict kwargs splatted into the loaders.

type sim_load_kwargs:

dict or None

_resolved_setup

The fully-resolved TEST_SETUPS entry after setup() has run. Plain instance attribute (not a param.*) so setup() can be called multiple times.

Type:

dict or None

rep_irr_filter_low

Read-only. Lower irradiance fraction bound derived from rep_irr_filter: 1 - rep_irr_filter. For example, when rep_irr_filter=0.2 this is 0.8. Pass as low to CapData.filter_irr together with a ref_val.

Type:

float

rep_irr_filter_high

Read-only. Upper irradiance fraction bound derived from rep_irr_filter: 1 + rep_irr_filter. For example, when rep_irr_filter=0.2 this is 1.2. Pass as high to CapData.filter_irr together with a ref_val.

Type:

float

Notes

The lhs key of the regression formula is always "power" across shipped presets, even when the formula regresses a derived quantity (e.g. temperature-corrected power).

ac_nameplate = None
base_temp = 25
bifaciality = 0.0
captest_results(check_pvalues=False, pval=0.05, print_res=True)

Compute the capacity test ratio for self.meas vs self.sim.

Picks reporting conditions from self.meas.rc or self.sim.rc based on self.rep_cond_source. Uses self.ac_nameplate for the tested-capacity printout and self.test_tolerance (via self.determine_pass_or_fail) for the pass/fail result.

Parameters:
  • check_pvalues (bool, default False) – When True, coefficients with a p-value above pval are zeroed before prediction.

  • pval (float, default 0.05) – P-value cutoff used when check_pvalues is True.

  • print_res (bool, default True) – When True, prints the formatted results.

Returns:

Capacity test ratio actual / expected.

Return type:

float

captest_results_check_pvalues(print_res=False, **kwargs)

Compute cap ratio with and without p-value filtering.

Parameters:
  • print_res (bool, default False) – Forwarded to both internal captest_results calls.

  • **kwargs – Forwarded to captest_results. Do not pass check_pvalues; this method sets it explicitly for each internal call.

Returns:

Styled DataFrame with p-values and parameter values for both self.meas and self.sim. P-values >= 0.05 are highlighted.

Return type:

pandas.io.formats.style.Styler

clipping_irr = 1000
determine_pass_or_fail(cap_ratio)

Determine a pass/fail result from a capacity ratio.

Uses self.test_tolerance and self.ac_nameplate. Replaces the pre-CapTest module-level capdata.determine_pass_or_fail.

Parameters:

cap_ratio (float) – Ratio of the measured-data regression result to the modeled-data regression result.

Returns:

Pass/fail flag and the tolerance bounds string.

Return type:

tuple of (bool, str)
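
One plausible reading of the tolerance string is sketched below. It rests on several assumptions that should be checked against the method itself: that the string is a sign ("-", "+/-", or "-/+") followed by a percentage, that "-" is a one-sided test, and that the bounds string lists nameplate-scaled limits. The function name pass_or_fail_sketch is hypothetical.

```python
def pass_or_fail_sketch(cap_ratio, tolerance, nameplate):
    """Illustrative reading of a tolerance string such as "- 4" or "+/- 4"."""
    sign, percent = tolerance.split()
    error = float(percent) / 100
    low = nameplate * (1 - error)
    high = nameplate * (1 + error)
    if sign == "-":
        # Assumption: one-sided, only a shortfall beyond the tolerance fails.
        return cap_ratio >= 1 - error, f"{low}, None"
    if sign in ("+/-", "-/+"):
        # Assumption: two-sided, the ratio must stay within the band.
        return abs(1 - cap_ratio) <= error, f"{low}, {high}"
    raise ValueError("sign must be '-', '+/-', or '-/+'")


one_sided_pass, bounds = pass_or_fail_sketch(0.97, "- 4", 125_000)
one_sided_fail, _ = pass_or_fail_sketch(0.95, "- 4", 125_000)
two_sided_fail, _ = pass_or_fail_sketch(1.05, "+/- 4", 125_000)
```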

classmethod from_mapping(sub, *, key='captest', base_dir=None, meas_loader=None, sim_loader=None)

Construct a CapTest from an already-parsed captest sub-mapping.

Direct-handoff constructor used by downstream wrappers that mutate the captest sub-mapping in memory – applying project-specific defaults, promoting fields, injecting paths – before asking captest to validate and build the CapTest. Exposes the same validate-and-construct pipeline that from_yaml runs after reading the file, without the file read.

Parameters:
  • sub (dict) – Captest sub-mapping. Typically obtained from load_config() or assembled by a downstream wrapper. Must contain test_setup. Supported keys are declared by _CAPTEST_YAML_KEYS / _CAPTEST_OVERRIDE_KEYS. sub is not mutated.

  • key (str, default 'captest') – Purely used in error messages (e.g. “Unknown key ‘x’ under the ‘captest’ sub-mapping”). Match the top-level yaml key under which this sub-mapping would normally live so error messages point users at the right place in their config file.

  • base_dir (str, Path, or None, default None) – Base directory used to resolve relative meas_path / sim_path values in sub. If the sub-mapping contains any relative path and base_dir is None, raises ValueError. URI-scheme values in the sub-mapping (e.g. s3://bucket/path) are treated as absolute and skip resolution even though pathlib.Path.is_absolute() returns False for them. URI-scheme base_dir values are joined to relative paths via string concatenation so the scheme is preserved; local base_dir values are joined via pathlib.Path.

  • meas_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Same semantics as from_yaml().

  • sim_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Same semantics as from_yaml().

Return type:

CapTest
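
The base_dir resolution rules described for from_mapping can be sketched as follows. The helper names are hypothetical, and detecting a URI by the presence of "://" is an assumption; captest's own check may differ.

```python
from pathlib import Path


def has_uri_scheme(value):
    # Assumption: "scheme://" marks a URI.
    return "://" in str(value)


def resolve_data_path(value, base_dir=None):
    if has_uri_scheme(value) or Path(value).is_absolute():
        return str(value)  # already absolute, returned unchanged
    if base_dir is None:
        raise ValueError(f"relative path {value!r} requires base_dir")
    if has_uri_scheme(base_dir):
        # String join keeps the scheme intact; pathlib would collapse
        # the double slash after the scheme.
        return str(base_dir).rstrip("/") + "/" + str(value)
    return str(Path(base_dir) / value)


remote = resolve_data_path("s3://bucket/meas.csv")
joined = resolve_data_path("meas.csv", "s3://bucket/project/")
local = resolve_data_path("data/meas.csv", "configs")
```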

classmethod from_params(**kwargs)

Construct a CapTest from parameter kwargs.

Recognizes the non-param kwargs meas, sim, meas_path, sim_path in addition to every declared param.*. If both meas and meas_path are supplied the pre-built instance wins and a warning is emitted (same for sim / sim_path).

When both meas and sim end up populated, setup() is called automatically. Otherwise the partially-initialized instance is returned and the caller finishes the workflow manually.

Parameters:

**kwargs – Any declared CapTest parameter, plus meas, sim, meas_path, sim_path.

Return type:

CapTest

classmethod from_yaml(path, key='captest', meas_loader=None, sim_loader=None)

Construct a CapTest from a yaml config file.

Reads the sub-mapping at the given top-level key of the yaml file and delegates to from_mapping() with base_dir=path.parent so relative meas_path / sim_path values resolve against the yaml’s directory.

Parameters:
  • path (str or Path) – Path to a yaml file.

  • key (str, default 'captest') – Top-level key whose value is the CapTest sub-mapping.

  • meas_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Supplied here because loader callables cannot be represented in yaml. Useful for downstream wrappers that drive yaml-based construction but need a custom measured-data loader. When None the default resolution applies.

  • sim_loader (callable or None, optional) – Programmatic-only loader callables that override the default resolution (captest.io.load_data / captest.io.load_pvsyst). Supplied here because loader callables cannot be represented in yaml. Useful for downstream wrappers that drive yaml-based construction but need a custom measured-data loader. When None the default resolution applies.

Return type:

CapTest

fshdbm = 1.0
get_summary()

Concatenate self.meas.get_summary() and self.sim.get_summary().

Returns:

Filter history for both CapData instances, stacked.

Return type:

pandas.DataFrame

hrs_req = 12.5
irrad_stability = 'std'
irrad_stability_threshold = 30
max_irr = 1400
meas = None
meas_load_kwargs = None
meas_loader = None
min_irr = 400
name = 'CapTest'
overlay_scatters(expected_label='PVsyst')

Overlay the final scatter plot from self.meas and self.sim.

Builds the scatter plot for each CapData instance via the resolved preset’s scatter_plots callable, then overlays the two first-panel scatters with labels.

Parameters:

expected_label (str, default "PVsyst") – Label used for the modeled-data scatter.

Return type:

hv.Overlay

power_temp_coeff = -0.32
reg_cols_meas = None
reg_cols_sim = None
reg_fml = None
rep_cond(which='meas', **overrides)

Call cd.rep_cond with the resolved preset’s rep_conditions.

The preset’s rep_conditions dict (after any self.rep_conditions overrides from setup()) is used as the default kwargs. overrides is partial-merged on top: top-level keys replace, the nested func dict merges one level deep.

Parameters:
  • which ({'meas', 'sim'}) – Which CapData instance’s rep_cond to call.

  • **overrides – Partial-merged onto the resolved rep_conditions dict.

Returns:

cd.rep_cond writes to cd.rc.

Return type:

None
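
The partial merge described above (top-level keys replace, the nested func dict merges one level deep) can be sketched as below. The function name merge_rep_conditions and the preset values are hypothetical and chosen only to show the merge behaviour.

```python
def merge_rep_conditions(base, overrides):
    """Top-level keys replace; the nested 'func' dict merges one level deep."""
    merged = dict(base)
    for key, value in overrides.items():
        if key == "func" and isinstance(value, dict) and isinstance(base.get("func"), dict):
            merged["func"] = {**base["func"], **value}
        else:
            merged[key] = value
    return merged


# Hypothetical preset values, chosen only to show the merge.
preset = {
    "percent_filter": 20,
    "func": {"poa": "perc_60", "t_amb": "mean", "w_vel": "mean"},
}
resolved = merge_rep_conditions(
    preset, {"func": {"poa": "perc_50"}, "percent_filter": None}
)
```

Only the poa aggregation changes inside func; the t_amb and w_vel entries carry over from the preset, and the preset dictionary itself is left untouched.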

rep_cond_source = 'meas'
rep_conditions = None
rep_irr_filter = 0.2
property rep_irr_filter_high

Upper irradiance fraction bound derived from rep_irr_filter.

Equal to 1 + rep_irr_filter. Updates automatically whenever rep_irr_filter is reassigned. Pass as the high argument to CapData.filter_irr with a ref_val to filter within the reporting-irradiance band.

property rep_irr_filter_low

Lower irradiance fraction bound derived from rep_irr_filter.

Equal to 1 - rep_irr_filter. Updates automatically whenever rep_irr_filter is reassigned. Pass as the low argument to CapData.filter_irr with a ref_val to filter within the reporting-irradiance band.
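
The relationship between rep_irr_filter and its two derived bounds can be sketched with a small stand-in class. BandConfig and the reporting irradiance value are hypothetical; CapTest itself is a param.Parameterized class, which this sketch does not reproduce.

```python
class BandConfig:
    """Hypothetical stand-in showing the derived read-only bounds."""

    def __init__(self, rep_irr_filter=0.2):
        self.rep_irr_filter = rep_irr_filter

    @property
    def rep_irr_filter_low(self):
        return 1 - self.rep_irr_filter

    @property
    def rep_irr_filter_high(self):
        return 1 + self.rep_irr_filter


cfg = BandConfig()
ref_val = 750.0  # hypothetical reporting irradiance in W/m^2
lower = cfg.rep_irr_filter_low * ref_val
upper = cfg.rep_irr_filter_high * ref_val

cfg.rep_irr_filter = 0.1  # the derived bounds follow the new value
```

Computing the bounds in properties rather than storing them means they cannot drift out of step with rep_irr_filter.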

residual_plot()

Overlaid residual plots for self.meas and self.sim.

Each regression exogenous variable gets its own panel showing the residuals of both CapData instances overlaid. The single-CapData helper plotting.get_resid_exog_frame stays where it is.

Return type:

hv.Layout

property resolved_setup

Return the resolved TEST_SETUPS entry or raise if setup() not run.

scatter_plots(which='meas', **kwargs)

Create the scatter plot for the active capacity-test setup.

This method is intended primarily to plot a power vs irradiance scatter plot that fits with a preset capacity test from the TEST_SETUPS defined in the captest module.

To create manual scatter plots and to see the complete list of accepted kwargs and their behavior, see the docstrings for captest.plotting.ScatterPlot and captest.plotting.ScatterBifiPowerTc. ScatterBifiPowerTc inherits most options from ScatterPlot but ignores tc_power because the bifi_power_tc regression power term is already temperature corrected.

The selected test_setup controls which plotting function is used. During setup(), the named setup is resolved from TEST_SETUPS; that resolved setup includes a scatter_plots callable matched to the setup’s regression formula. This method picks self.meas or self.sim and forwards it, plus any keyword arguments, to that callable.

Built-in setup behavior:

  • e2848_default, bifi_e2848_etotal, and e2848_spec_corrected_poa use ScatterPlot through the scatter_default / scatter_etotal wrappers. These create a formula-driven scatter of the regression left-hand-side variable against the first right-hand-side variable.

  • bifi_power_tc uses ScatterBifiPowerTc through the scatter_bifi_power_tc wrapper. This creates one panel for each right-hand-side variable in the bifacial temperature-corrected regression, typically power vs poa and power vs rpoa.

All keyword arguments are forwarded to the underlying plotting class. The most commonly used options are:

  • filtered: use data_filtered when True, otherwise data.

  • split_day and split_time: split points into AM and PM groups.

  • am_color, pm_color, am_marker, and pm_marker: customize AM / PM glyph style.

  • tc_power, tc_mode, tc_power_calc, and tc_force_recompute: show temperature-corrected power for setups whose regression still uses raw power. tc_mode can be "replace", "add_panel", or "overlay".

  • timeseries: add a linked timeseries panel below the scatter.

  • height and width: set plot dimensions.

Parameters:
  • which ({'meas', 'sim'}) – Which captest.capdata.CapData instance to plot.

  • **kwargs – Plotting options forwarded to the preset’s scatter callable.

Returns:

Scatter plot layout for the selected measured or modeled data.

Return type:

holoviews.Layout

Examples

Plot measured data with the default options:

ct.scatter_plots()

Plot modeled data, split points into AM and PM groups, and add a linked timeseries panel:

ct.scatter_plots(which="sim", split_day=True, timeseries=True)

Add a temperature-corrected power panel for a setup that uses raw power in the regression:

ct.scatter_plots(tc_power=True, tc_mode="add_panel")

setup(verbose=True)

Resolve TEST_SETUPS, propagate scalars, process regression cols.

Raises RuntimeError if meas or sim is unset. Assigns the resolved TEST_SETUPS entry to self._resolved_setup and returns self for fluent chaining.

Parameters:

verbose (bool, default True) – Forwarded to CapData.process_regression_columns.

Returns:

self, for fluent chaining.

Return type:

CapTest

shade_filter_end = None
shade_filter_start = None
sim = None
sim_days = 30
sim_load_kwargs = None
sim_loader = None
spectral_module_type = 'cdte'
test_setup = 'e2848_default'
test_tolerance = '- 4'
to_yaml(path, key='captest', merge_into_existing=True)

Serialize the curated CapTest configuration to a yaml file.

The written sub-mapping lives under the top-level key (default "captest") and contains every scalar param.* plus test_setup, any non-None override of reg_fml / reg_cols_meas / reg_cols_sim / rep_conditions, meas_path / sim_path (when the instance was constructed from paths), and non-empty meas_load_kwargs / sim_load_kwargs.

Percentile perc_wrap(N) callables inside rep_conditions['func'] are written back as "perc_N" strings so that from_yaml round-trips them. meas, sim, regression_results, _resolved_setup, and the loader callables are never serialized.

Parameters:
  • path (str or Path) – Destination yaml file.

  • key (str, default 'captest') – Top-level key under which the captest sub-mapping is written. Parametrizing this lets a single yaml hold multiple captest flavors (e.g. captest_e2848 and captest_bifi).

  • merge_into_existing (bool, default True) – When True and the destination file already exists and parses as a mapping, preserve the other top-level keys and overwrite only the sub-tree at key. When False, the destination is unconditionally replaced with a fresh mapping containing only key.

Return type:

None
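The effect of merge_into_existing can be sketched with plain dictionaries standing in for the parsed yaml file. The helper name merge_sub_mapping and the sample keys are hypothetical; this is an illustration of the documented behavior under those assumptions, not the library's implementation.

```python
def merge_sub_mapping(existing, key, sub_mapping, merge_into_existing=True):
    """Return the top-level mapping that would be written to the yaml file."""
    if merge_into_existing and isinstance(existing, dict):
        # Preserve the other top-level keys; overwrite only the sub-tree at key.
        merged = dict(existing)
        merged[key] = sub_mapping
        return merged
    # Otherwise start from a fresh mapping containing only key.
    return {key: sub_mapping}


# Hypothetical file contents, for illustration only.
existing = {"site": {"name": "demo"}, "captest": {"test_setup": "e2848_default"}}
new_sub = {"test_setup": "bifi_power_tc"}

merged = merge_sub_mapping(existing, "captest", new_sub)
replaced = merge_sub_mapping(existing, "captest", new_sub, merge_into_existing=False)
```

With merging enabled the site key survives; with it disabled only the captest key remains.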

captest.captest.highlight_pvals(s)

Highlight Series entries >= 0.05 with a yellow background.

Intended for use with pandas.io.formats.style.Styler.apply. Consumed by CapTest.captest_results_check_pvalues (ported in Unit 7).
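Functions passed to Styler.apply return one CSS string per entry of the Series they receive. A minimal sketch of that shape follows; the name highlight_pvals_sketch, the threshold argument, and the exact CSS string are assumptions for illustration, and a plain list stands in for the Series.

```python
def highlight_pvals_sketch(values, threshold=0.05):
    """Return one CSS declaration per entry; yellow where value >= threshold."""
    # An empty string leaves the corresponding cell unstyled.
    return ["background-color: yellow" if v >= threshold else "" for v in values]


styles = highlight_pvals_sketch([0.001, 0.05, 0.2])
```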

captest.captest.load_config(path, key='captest')

Load and lightly validate the captest sub-mapping from a yaml file.

Parameters:
  • path (str or Path) – Path to the yaml file. Relative paths in meas_path / sim_path are resolved by callers using Path(path).parent as the base.

  • key (str, default 'captest') – Top-level key whose value is the CapTest configuration sub-mapping.

Returns:

The sub-mapping at key with string shorthands resolved. Does NOT validate against CapTest param types; CapTest.from_yaml does that.

Return type:

dict

Raises:

KeyError – If key is not present at the top level of the yaml file.

captest.captest.perc_wrap(p)

Return a callable that computes the p-th percentile of a Series.

Used to build TEST_SETUPS[...]['rep_conditions']['func'] dicts for percentile-based reporting irradiance (e.g. 60th percentile POA).

Parameters:

p (numeric) – Percentile in [0, 100].

Returns:

Function that takes a pandas Series or array-like and returns the p-th percentile using method='nearest'.

Return type:

callable
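The closure pattern described above can be sketched in pure Python. perc_wrap_sketch is a hypothetical stand-in: it snaps to the nearest observation instead of interpolating, but its handling of exact ties may differ from the method='nearest' behavior the real function relies on.

```python
def perc_wrap_sketch(p):
    """Return a callable computing the p-th percentile, nearest observation."""
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")

    def percentile(values):
        ordered = sorted(values)
        if not ordered:
            raise ValueError("values must not be empty")
        # Position of the percentile on the 0..n-1 index scale, snapped to
        # the nearest actual observation rather than interpolated.
        position = p / 100 * (len(ordered) - 1)
        return ordered[int(round(position))]

    return percentile


perc_60 = perc_wrap_sketch(60)
poa = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
reporting_poa = perc_60(poa)
```

Because the result is always one of the observed values, the reporting irradiance is a condition that actually occurred in the data.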

captest.captest.print_results(test_passed, expected, actual, cap_ratio, capacity, bounds)

Print formatted results of a capacity test.

Parameters:
  • test_passed (tuple of (bool, str)) – Pass/fail flag and bounds string produced by CapTest.determine_pass_or_fail (or the legacy module-level determine_pass_or_fail in capdata.py until Unit 7 removes it).

  • expected (float) – Predicted modeled test output at reporting conditions.

  • actual (float) – Predicted measured test output at reporting conditions.

  • cap_ratio (float) – Capacity test ratio (actual / expected).

  • capacity (float) – Tested capacity (nameplate * cap_ratio).

  • bounds (str) – Human-readable bounds string for the test tolerance.

captest.captest.resolve_test_setup(name, overrides=None)

Resolve a preset by name plus optional overrides.

Parameters:
  • name (str) – Key into TEST_SETUPS or the literal "custom".

  • overrides (dict or None) – Optional dict with any of reg_cols_meas, reg_cols_sim, reg_fml, scatter_plots, rep_conditions to override the preset. rep_conditions is partial-merged; other keys replace. When name == "custom", reg_cols_meas, reg_cols_sim, and reg_fml are required in overrides.

Returns:

A fully-validated entry dict suitable for CapTest._resolved_setup.

Return type:

dict
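The override rule (rep_conditions is partial-merged, other keys replace) can be illustrated with plain dictionaries. The helper apply_overrides_sketch and the preset contents below are hypothetical, and the validation step performed by the real function is omitted.

```python
import copy


def apply_overrides_sketch(preset, overrides=None):
    """Return a copy of preset with overrides applied."""
    resolved = copy.deepcopy(preset)
    for key, value in (overrides or {}).items():
        if key == "rep_conditions":
            # Partial merge: keys absent from the override keep preset values.
            merged = dict(resolved.get("rep_conditions") or {})
            merged.update(value)
            resolved[key] = merged
        else:
            # Every other key replaces the preset value wholesale.
            resolved[key] = value
    return resolved


# Hypothetical preset contents, for illustration only.
preset = {
    "reg_fml": "power ~ poa + I(poa * poa) - 1",
    "rep_conditions": {"percent_filter": 20, "irr_bal": False},
}
resolved = apply_overrides_sketch(
    preset,
    {"reg_fml": "power ~ poa - 1", "rep_conditions": {"irr_bal": True}},
)
```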

captest.captest.scatter_bifi_power_tc(cd, **kwargs)

Two-panel layout: lhs vs. poa and lhs vs. rpoa.

Intended for the bifi_power_tc preset whose regression formula is power ~ poa + rpoa (with power resolved to the temperature-corrected calculated column). Thin wrapper around captest.plotting.ScatterBifiPowerTc; each rhs variable gets its own panel.

captest.captest.scatter_default(cd, **kwargs)

Formula-agnostic scatter of regression lhs vs. first rhs variable.

Thin wrapper around captest.plotting.ScatterPlot. Forwards every keyword argument through to the class constructor, so callers can opt into the AM/PM split, temperature-corrected power, and timeseries-pairing features without changing call sites.

Parameters:
  • cd (CapData) – Must have regression_formula set and regression_cols resolved (e.g. via CapTest.setup() or cd.process_regression_columns()).

  • **kwargs – Forwarded to ScatterPlot. See its docstring for the full parameter surface.

Returns:

A single-panel Layout wrapping the scatter plot.

Return type:

hv.Layout

captest.captest.scatter_etotal(cd, **kwargs)

Single scatter of regression lhs vs. the e_total column.

Intended for the bifi_e2848_etotal preset. Thin wrapper around captest.plotting.ScatterPlot; resolves the x column from cd.regression_cols['poa'] after process_regression_columns has materialized the calculated e_total column.

captest.captest.validate_test_setup(entry)

Validate a single TEST_SETUPS entry dict.

Raises:
  • KeyError – If required keys are missing or unknown keys are present.

  • ValueError – If reg_fml does not parse, lhs+rhs are not subsets of both reg_cols_meas and reg_cols_sim, scatter_plots is not callable, or rep_conditions / rep_conditions['func'] have an unexpected shape.

captest.columngroups module

class captest.columngroups.ColumnGroups(dict=None, /, **kwargs)

Bases: UserDict

captest.columngroups.group_columns(data)

Create a dict of raw column names paired to categorical column names.

Uses multiple type_def formatted dictionaries to determine the type, sub-type, and equipment type for data series of a dataframe. The determined types are concatenated to a string used as a dictionary key with a list of one or more original column names as the paired value.

Parameters:

data (DataFrame) – Data with columns to group.

Returns:

cg

Return type:

ColumnGroups

captest.columngroups.series_type(series, type_defs)

Assign columns to a category by analyzing the column names.

The type_defs parameter is a dictionary which defines search strings for each key, where the key is a categorical name and the search strings are possible related names. For example, an irradiance sensor has the key ‘irr’ with search strings ‘irradiance’, ‘plane of array’, ‘poa’, etc.

Parameters:
  • series (pandas series) – Row or column of dataframe passed by pandas.df.apply.

  • type_defs (dictionary) – Dictionary with the structure {‘category abbreviation’: [category search strings]}. See type_defs.

Returns:

Returns a string representing the category for the series.

Return type:

string
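The matching idea can be sketched as a case-insensitive substring search over the column name. Note the assumptions: series_type_sketch is a hypothetical helper that takes the column name directly (the real function receives a pandas Series through apply), and the first matching category wins.

```python
def series_type_sketch(column_name, type_defs, default=""):
    """Return the first category whose search strings match column_name."""
    name = column_name.lower()
    for category, search_strings in type_defs.items():
        if any(s.lower() in name for s in search_strings):
            return category
    return default  # no search string matched


type_defs = {
    "irr": ["irradiance", "plane of array", "poa"],
    "temp": ["temperature", "temp"],
}
irr_match = series_type_sketch("POA Irradiance W/m2", type_defs)
temp_match = series_type_sketch("Ambient Temp C", type_defs)
no_match = series_type_sketch("Wind Speed", type_defs)
```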

captest.io module

class captest.io.DataLoader(path: str = './data/', loc: dict | None = None, sys: dict | None = None, file_reader: object = <function file_reader>, files_to_load: list | None = None, failed_to_load: list | None = None)

Bases: object

Class to load SCADA data and return a CapData object.

Supports loading from local filesystems and S3 buckets. The optional s3fs package must be installed for S3 support.

drop_duplicate_rows()
failed_to_load: list | None = None
file_reader(**kwargs)

Read measured solar data from a csv file.

Utilizes pandas read_csv to import measured solar data from a csv file. Attempts a few different encodings, tries to determine the header end by looking for a date in the first column, and concatenates column headings to a single string.

Parameters:
  • path (Path) – Path to file to import.

  • **kwargs – Use to pass additional kwargs to pandas read_csv.

Return type:

pandas DataFrame

files_to_load: list | None = None
join_files()

Combine the DataFrames of loaded_files into a single DataFrame.

Checks whether the columns of each DataFrame in loaded_files match. If they all match, the DataFrames are combined vertically along the index.

If they do not match, the DataFrames are combined on a new datetime index that spans from the earliest to the latest datetime found in any of the indices, using the most common frequency across all the indices. The columns will be the set of all the columns.

Returns:

data – The combined data.

Return type:

DataFrame

load(extension='csv', summary=True, verbose=False, raise_errors=False, skip_dir_load=False, **kwargs)

Load file(s) of timeseries data from SCADA / DAS systems.

Set path to the path to a file to load a single file. Set path to the path to a directory of files to load all the files in the directory ending in “csv”. Or, set files_to_load to a list of specific files to load. Paths may be local filesystem paths or S3 URIs (e.g. s3://bucket/path/).

Multiple files will be joined together and may include files with different column headings. When multiple files with matching column headings are loaded, the individual files will be reindexed and then joined.

Missing time intervals within the individual files will be filled, but missing time intervals between the individual files will not be filled.

When loading multiple files they will be stored in loaded_files, a dictionary, mapping the file names to a dataframe for each file.

Parameters:
  • extension (str, default "csv") – Change the extension to allow loading different filetypes. Must also set the file_reader attribute to a function that will read that type of file. Do not include a period “.”.

  • summary (bool, default True) – By default prints path of each file attempted to load and then confirmation it was loaded or states it failed to load. Is only relevant if path is set to a directory not a file. Set to False to not print out any file loading status.

  • verbose (bool, default False) – Prints same output as if summary were True (sets summary True) and prints details of reindexing each file after loading.

  • raise_errors (bool, default False) – Set to True to raise an error if a file fails to load.

  • skip_dir_load (bool, default False) – Set to True to pass a custom file_reader that handles multiple files. This will skip the parsing of files in a directory and pass the path to the directory and kwargs to the file_reader function.

  • **kwargs – Are passed through to the file_reader callable, which by default will pass them on to pandas.read_csv.

Returns:

Resulting DataFrame of data is stored to the data attribute.

Return type:

None

loc: dict | None = None
path: str = './data/'
reindex()
reindex_loaded_files(verbose=False)

Reindex files to ensure no missing indices and find frequency for each file.

Parameters:

verbose (bool, default False) – Set to True for more detailed output.

Returns:

  • reindexed_dfs (dict) – Filenames mapped to reindexed DataFrames.

  • common_freq (str) – The index frequency most common across the reindexed DataFrames.

  • file_frequencies (list) – The index frequencies for each file.

set_files_to_load(extension='csv')

Set files_to_load attribute to a list of filepaths.

sort_data()
sys: dict | None = None
captest.io.file_reader(path, **kwargs)

Read measured solar data from a csv file.

Utilizes pandas read_csv to import measured solar data from a csv file. Attempts a few different encodings, tries to determine the header end by looking for a date in the first column, and concatenates column headings to a single string.

Parameters:
  • path (Path) – Path to file to import.

  • **kwargs – Use to pass additional kwargs to pandas read_csv.

Return type:

pandas DataFrame

captest.io.flatten_multi_index(columns)
captest.io.load_data(path, group_columns=<function group_columns>, file_reader=<function file_reader>, skip_dir_load=False, name='meas', sort=True, drop_duplicates=True, reindex=True, site=None, column_groups_template=False, verbose=False, **kwargs)

Load file(s) of timeseries data from SCADA / DAS systems.

This is a convenience function to generate an instance of DataLoader and call the load method.

A single file or multiple files can be loaded. Multiple files will be joined together and may include files with different column headings.

Parameters:
  • path (str) – Path to either a single file to load or a directory of files to load. Supports local paths and S3 URIs (e.g. s3://bucket/path/).

  • group_columns (function or str, default columngroups.group_columns) – Function to use to group the columns of the loaded data. Function should accept a DataFrame and return a dictionary with keys that are ids and values that are lists of column names. Will be set to the group_columns attribute of the CapData.DataLoader object. Provide a string to load column grouping from a json, yaml, or excel file. The json or yaml file should parse to a dictionary and the excel file should have two columns with the first column containing the group ids and the second column the column names. The first column may have missing values. See function load_excel_column_groups for more details.

  • file_reader (function, default io.file_reader) – Function to use to load an individual file. By default will use the built in file_reader function to try to load csv files. If passing a function to read other filetypes, the kwargs should include the filetype extension e.g. ‘parquet’.

  • skip_dir_load (bool, default False) – Set to True to pass a custom file_reader that handles multiple files. This will skip the parsing of files in a directory by DataLoader.load and allow the function passed to file_reader to handle multiple files in a directory.

  • name (str) – Identifier that will be assigned to the returned CapData instance.

  • sort (bool, default True) – By default sorts the data by the datetime index from old to new.

  • drop_duplicates (bool, default True) – By default drops rows of the joined data where all the columns are duplicates of another row. Keeps the first instance of the duplicated values. This is helpful if individual data files have overlapping rows with the same data.

  • reindex (bool, default True) – By default will create a new index for the data using the earliest datetime, latest datetime, and the most frequent time interval ensuring there are no missing intervals.

  • site (dict or str, default None) – Pass a dictionary or path to a json or yaml file containing site data, which will be used to generate modeled clear sky ghi and poa values. The clear sky irradiance values are added to the data and the column_groups attribute is updated to include these two irradiance columns. The site data dictionary should be {sys: {system data}, loc: {location data}}. See the capdata.csky documentation for the format of the system data and location data.

  • column_groups_template (bool, default False) – If True, will call CapData.data_columns_to_excel to save a file to use to manually create column groupings at path.

  • verbose (bool, default False) – Set to True to print status of file loading.

  • **kwargs – Passed to DataLoader.load. Any kwargs not used by DataLoader.load are passed to the file_reader function, which by default passes them to pandas.read_csv. DataLoader.load accepts a summary kwarg that prints the files loaded from a directory without the reindexing details printed when verbose is set to True.

captest.io.load_excel_column_groups(path)

Load column groups from an excel file.

The excel file should have two columns with no header. The first column contains the group names and the second column contains the column names of the data. The first column may have blanks rather than repeating the group name for each column in the group.

For example:

group1, col1
      , col2
      , col3
group2, col4
      , col5

Parameters:

path (str) – Path to file to import.

Returns:

Dictionary mapping column group names to lists of column names.

Return type:

dict
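The forward-fill of blank group names described above can be sketched on rows that have already been read from the sheet. column_groups_from_rows is a hypothetical helper; reading the excel file itself is left out.

```python
def column_groups_from_rows(rows):
    """Build {group name: [column names]} from (group, column) rows."""
    groups = {}
    current = None
    for group, column in rows:
        if group:
            current = group  # a non-blank cell starts a new group
        if current is None:
            raise ValueError("first row must name a group")
        # Blank group cells inherit the most recent group name.
        groups.setdefault(current, []).append(column)
    return groups


rows = [
    ("group1", "col1"),
    (None, "col2"),
    ("", "col3"),
    ("group2", "col4"),
    (None, "col5"),
]
groups = column_groups_from_rows(rows)
```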

captest.io.load_pvsyst(path, name='pvsyst', egrid_unit_adj_factor=None, set_regression_columns=True, **kwargs)

Load data from a PVsyst energy production model.

Will load day first or month first dates. Expects files that use a comma as a separator rather than a semicolon.

Parameters:
  • path (str) – Path to file to import.

  • name (str, default 'pvsyst') – Name to assign to returned CapData object.

  • egrid_unit_adj_factor (numeric, default None) – E_Grid will be divided by the value passed.

  • set_regression_columns (bool, default True) – By default sets power to E_Grid, poa to GlobInc, t_amb to T Amb, and w_vel to WindVel. Set to False to not set regression columns on load.

  • **kwargs – Use to pass additional kwargs to pandas read_csv. Pass sep=’;’ to load files that use semicolons instead of commas as the separator.

Return type:

CapData

Notes

Standardizes the ambient temperature column name to T_Amb. v6.63 of PVsyst used “T Amb”, v.6.87 uses “T_Amb”, and v7.2 uses “T_Amb”. Will change ‘T Amb’ or ‘TAmb’ to ‘T_Amb’ if found in the column names.

captest.prtest module

class captest.prtest.PrResults(*, dc_nameplate, expected_pr, input_data, pr, results_data, timestep, name)

Bases: Parameterized

Results from a PR calculation.

dc_nameplate = 0.0
expected_pr = 0.0
input_data = None
name = 'PrResults'
pr = 0.0
print_pr_result()

Print a summary of the PR result: passing or failing, and by how much.

results_data = None
timestep = (0, 0)
captest.prtest.perf_ratio(ac_energy, dc_nameplate, poa, unit_adj=1, degradation=0, year=1, availability=1)

Calculate performance ratio.

Parameters:
  • ac_energy (Series) – Measured energy production (Wh) from system meter.

  • dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.

  • poa (Series) – POA irradiance (W/m^2) for each time interval of the test.

  • unit_adj (numeric, default 1) – Scale factor to adjust units of ac_energy. For example, pass 1000 to convert measured energy from kWh to Wh within the PR calculation.

  • degradation (numeric, default 0) – Apply a derate (percent, e.g. 0.5 for 0.5%) for degradation to the expected power (denominator). Must also specify a value for the year argument. NOTE: The percent is divided by 100 to convert it to a decimal within the function.

  • year (numeric) – Year of operation to use in degradation calculation.

  • availability (numeric or Series, default 1) – Apply an adjustment for plant availability to the expected power (denominator).

Returns:

Instance of class PrResults.

Return type:

PrResults
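A minimal sketch of a performance ratio calculation consistent with the parameter descriptions above. Several points are assumptions, not confirmed by this reference: the helper name perf_ratio_sketch, one-hour intervals (so W/m^2 maps directly onto Wh), the compounding form of the degradation derate, scalar-only availability, and returning a bare float instead of a PrResults instance.

```python
def perf_ratio_sketch(ac_energy, dc_nameplate, poa, unit_adj=1,
                      degradation=0, year=1, availability=1):
    """Return measured energy divided by irradiance-scaled expected energy."""
    # Assumed derate form: percent per year, compounded over `year` years.
    derate = (1 - degradation / 100) ** year
    # Expected energy per interval: nameplate scaled by POA relative to
    # 1000 W/m^2, then adjusted for degradation and availability.
    expected = [availability * dc_nameplate * (irr / 1000) * derate for irr in poa]
    measured = [energy * unit_adj for energy in ac_energy]
    return sum(measured) / sum(expected)


# Two one-hour intervals for a 1000 W system.
pr = perf_ratio_sketch(ac_energy=[400, 800], dc_nameplate=1000, poa=[500, 1000])
```

Here the expected energy is 500 Wh + 1000 Wh and the measured energy is 1200 Wh, giving a ratio of 0.8.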

captest.prtest.perf_ratio_inputs_ok(ac_energy, dc_nameplate, poa, availability=1)

Check types of perf_ratio arguments.

Parameters:
  • ac_energy (Series) – Measured energy production (Wh) from system meter.

  • dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.

  • poa (Series) – POA irradiance (W/m^2) for each time interval of the test.

  • availability (numeric or Series, default 1) – Apply an adjustment for plant availability to the expected power (denominator).

captest.prtest.perf_ratio_temp_corr_nrel(ac_energy, dc_nameplate, poa, power_temp_coeff=None, temp_bom=None, temp_amb=None, single_irr_weighted_temp=False, wind_speed=None, base_temp=25, module_type='glass_cell_poly', racking='open_rack', unit_adj=1, degradation=None, year=None, availability=1)

Calculate performance ratio.

Parameters:
  • ac_energy (Series) – Measured energy production (kWh) from system meter.

  • dc_nameplate (numeric) – Summation of nameplate ratings (W) for all installed modules of system under test.

  • poa (Series) – POA irradiance (W/m^2) for each time interval of the test.

  • power_temp_coeff (numeric, default None) – Module power temperature coefficient as percent per degree celsius. Ex. -0.36

  • temp_bom (Series) – Measured back of module temperature. The temp_amb and wind_speed arguments are not used if this argument is not None; skips calculating BOM temps from ambient temperature, wind speed, and POA irradiance.

  • single_irr_weighted_temp (bool, default False) – Set to True to calculate a single irradiance weighted temperature to use when temperature correcting the power. Some contract language calls for this but it does not follow the calculation defined in the NREL paper.

  • temp_amb (Series) – Ambient temperature (degrees C) measurements.

  • wind_speed (Series) – Measured wind speed (m/sec) corrected to measurement height of 10 meters.

  • base_temp (numeric, default 25) – Base temperature (in Celsius) to correct power to. Default is the STC of 25 degrees Celsius. The NREL Weather-Corrected Performance Ratio technical report uses the term ‘Tcell_typ_avg’ for this value.

  • module_type (str, default 'glass_cell_poly') – Any of ‘glass_cell_poly’, ‘glass_cell_glass’, or ‘poly_tf_steel’.

  • racking (str, default 'open_rack') – Any of ‘open_rack’, ‘close_roof_mount’, or ‘insulated_back’.

  • unit_adj (numeric, default 1) – Scale factor to adjust units of ac_energy. For example, pass 1000 to convert measured energy from kWh to Wh within the PR calculation.

  • degradation (numeric, default None) – NOT IMPLEMENTED Apply a derate for degradation to the expected power (denominator). Must also specify a value for the year argument.

  • year (numeric) – NOT IMPLEMENTED Year of operation to use in degradation calculation.

  • availability (numeric or Series, default 1) – NOT IMPLEMENTED Apply an adjustment for plant availability to the expected power (denominator).

captest.util module

captest.util.append_tags(sel_tags, tags, regex_str)
captest.util.detect_solar_noon(data, ghi_col='ghi_mod_csky', default='12:30')

Estimate a single representative solar-noon clock time from clear-sky GHI.

Groups data[ghi_col] by the clock time of each timestamp (hour and minute, ignoring date), takes the mean of each clock-time bucket, and returns the bucket with the largest mean formatted as "HH:MM".

Used by plotting helpers that split observations into morning and afternoon at solar noon.

Parameters:
  • data (pandas.DataFrame) – DataFrame with a DatetimeIndex. Must contain ghi_col for the idxmax-based detection to apply.

  • ghi_col (str, default "ghi_mod_csky") – Column to use as the clear-sky GHI signal. ghi_mod_csky is the column added to CapData.data by captest.io.load_data when a site dictionary is provided.

  • default (str, default "12:30") – Fallback clock-time string returned when ghi_col is absent from data or when data is empty.

Returns:

Clock time formatted as "HH:MM".

Return type:

str

Warns:

UserWarning – Emitted when ghi_col is missing from data.columns or the index is empty; the default is then returned.
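The bucket-and-average approach described above can be sketched with the standard library. detect_solar_noon_sketch is a hypothetical helper that takes (timestamp, ghi) pairs instead of a DataFrame and returns the default silently where the real function emits a UserWarning.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detect_solar_noon_sketch(samples, default="12:30"):
    """Return the HH:MM clock time whose mean GHI is largest."""
    buckets = defaultdict(list)
    for timestamp, ghi in samples:
        # Bucket by clock time only, ignoring the date.
        buckets[timestamp.strftime("%H:%M")].append(ghi)
    if not buckets:
        return default
    means = {clock: sum(vals) / len(vals) for clock, vals in buckets.items()}
    return max(means, key=means.get)


# Two days of hourly synthetic clear-sky GHI peaking at 13:00.
start = datetime(2024, 6, 1)
samples = [
    (start + timedelta(days=d, hours=h), max(0, 900 - 25 * (h - 13) ** 2))
    for d in range(2)
    for h in range(24)
]
noon = detect_solar_noon_sketch(samples)
```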

captest.util.generate_irr_distribution(lowest_irr, highest_irr, rng=Generator(PCG64))

Create a list of increasing values similar to POA irradiance data.

Default parameters result in increasing values where the difference between each subsequent value is randomly chosen from the typical range of steps for a POA tracker.

Parameters:
  • lowest_irr (numeric) – Lowest value in the list of values returned.

  • highest_irr (numeric) – Highest value in the list of values returned.

  • rng (Numpy Random Generator) – Instance of the default Generator.

Returns:

irr_values

Return type:

list

captest.util.get_agg_column_name(group_id, agg_func)

Generate a column name for an aggregated column.

Parameters:
  • group_id (str) – Identifier for the group of columns being aggregated.

  • agg_func (str or callable) – Aggregation function used.

Returns:

Name for the aggregated column.

Return type:

str

captest.util.get_common_timestep(data, units='m', string_output=True)

Get the most commonly occurring timestep of data as a frequency string.

Parameters:
  • data (Series or DataFrame) – Data with a DateTimeIndex.

  • units (str, default 'm') – String representing date/time unit, such as (D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds.

  • string_output (bool, default True) – Set to False to return a numeric value.

Returns:

If the string_output is True and the most common timestep is an integer in the specified units then a valid pandas frequency or offset alias is returned. If string_output is false, then a numeric value is returned.

Return type:

str or numeric
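Finding the most common timestep amounts to counting the gaps between consecutive timestamps. The sketch below covers only the numeric (string_output=False) case in minutes; common_timestep_minutes is a hypothetical helper operating on a list of datetimes.

```python
from collections import Counter
from datetime import datetime, timedelta


def common_timestep_minutes(timestamps):
    """Return the most frequent gap between consecutive timestamps, in minutes."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    most_common_gap, _count = Counter(gaps).most_common(1)[0]
    return most_common_gap / timedelta(minutes=1)


start = datetime(2024, 6, 1)
# Five-minute data with one missing interval (the 10-minute mark is absent).
stamps = [start + timedelta(minutes=m) for m in (0, 5, 15, 20, 25)]
step = common_timestep_minutes(stamps)
```

The single 10-minute gap is outvoted by the three 5-minute gaps, so a missing record does not change the detected timestep.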

captest.util.parse_regression_formula(formula: str) → Tuple[List[str], List[str]]

Return (lhs_list, rhs_list) for formula.

Rules

  • Each list contains the unique raw variable names appearing on that side, sorted.

  • - 1 (intercept-removal) is ignored.

  • I(…) blocks are unwrapped; products like I(poa * t_amb) are split into their component symbols (poa, t_amb).

Parameters:

formula (str) – Regression formula to parse.

Returns:

Tuple of (lhs_list, rhs_list).

Return type:

Tuple[List[str], List[str]]
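The parsing rules can be approximated with regular expressions. parse_formula_sketch is a hypothetical, simplified stand-in that handles only plain formulas of the kind shown; it is not the library's parser.

```python
import re


def parse_formula_sketch(formula):
    """Return (lhs_list, rhs_list) of sorted unique variable names."""
    lhs, rhs = formula.split("~", 1)

    def names(side):
        # Unwrap I(...) so only its contents remain, then collect identifiers.
        unwrapped = re.sub(r"\bI\(", "(", side)
        # Identifiers must start with a letter or underscore, so the 1 in
        # "- 1" (intercept removal) is never picked up.
        return sorted(set(re.findall(r"\b[A-Za-z_]\w*\b", unwrapped)))

    return names(lhs), names(rhs)


lhs, rhs = parse_formula_sketch(
    "power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1"
)
```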

captest.util.process_reg_cols(original_calc_params, calc_params=None, key_id=None, dict_path=None, cd=None, agg_cache=None, verbose=True)

Recursively process a regression columns dictionary that includes calculated parameters.

The regression parameters dictionary attribute of CapData can be defined with a nested structure that includes two-value tuples, where the first value is a CapData method that calculates a new value (a column of the data attribute) and the second is a dictionary of the kwargs to be passed to that method.

An example tuple: (bom_temp, {'poa': 'irr_poa', 'temp_amb': 'temp_amb', 'wind_speed': 'wind_speed'})

Where bom_temp is a CapData method that accepts the kwargs poa, temp_amb, and wind_speed, which have the values (column group ids) irr_poa, temp_amb, wind_speed, respectively.

Additionally, column groups can be aggregated by specifying a tuple which contains two strings - the column group id (e.g., ‘irr_poa’) and the aggregation method (e.g. ‘mean’). This will result in the CapData.agg_group method being called and the first value in the tuple passed to the group_id kwarg and the second passed to the agg_func kwarg.

If a regression parameter key is paired with a column groups id for a column group with only a single column, then that column name will replace the column group id.

The dictionary passed to original_calc_params may be nested like this example:

calc_params_map = {
    'power_tc': (CapData.power_tc, {
        'power': 'real_pwr_mtr',
        'cell_temp': (CapData.cell_temp, {
            'poa': ('irr_poa', 'mean'),
            'bom': (CapData.bom_temp, {
                'poa': ('irr_poa', 'mean'),
                'temp_amb': ('temp_amb', 'mean'),
                'wind_speed': ('wind_speed', 'mean'),
            }),
        }),
    }),
}

This function will start at the bottom of nested dictionaries and progressively call the functions with the kwargs replacing the function tuples with the function names or the aggregated column names.

Parameters:
  • original_calc_params (dict) – The original dictionary to be modified

  • calc_params (dict or tuple) – Deprecated. Ignored if provided.

  • key_id (str) – Deprecated. Ignored if provided.

  • dict_path (list) – Deprecated. Ignored if provided.

  • cd (CapData) – CapData instance that functions in original_calc_params will act on.

  • agg_cache (dict, optional) – Cache of already aggregated column groups to avoid redundant calls to agg_group. Keys are tuples of (group_id, agg_func) and values are the aggregated column names.

  • verbose (bool, default True) – Passed to the group aggregations and the parameter calculations. Set to False to prevent all summary output.

Returns:

Modifies the original_calc_params and the data attribute of the CapData object passed to the cd argument.

Return type:

None

captest.util.read_json(path)
captest.util.read_yaml(path)
captest.util.reindex_datetime(data, file_name=None, report=False)

Find dataframe index frequency and reindex to add any missing intervals.

Sorts index of passed dataframe before reindexing.

Parameters:
  • data (DataFrame) – DataFrame to be reindexed.

  • file_name (str, default None) – Name of file being reindexed. Used for warning message.

Return type:

Reindexed DataFrame

captest.util.tags_by_regex(tag_list, regex_str)
captest.util.transform_calc_params(node, cd, agg_cache=None, verbose=True)

Recursively transform a calc_params node, returning resolved values.

This function processes a nested dictionary structure that defines regression parameters, executing aggregations and calculations as needed, and returns a flattened structure with resolved column names.

Node types handled:

  • dict: Transform each value recursively.

  • tuple (str, str): Aggregation; returns the aggregated column name.

  • tuple (callable, dict): Calculation; executes the function and returns the function name.

  • str: Column group ID; resolved to the column name if the group has a single column.

  • other: Passed through unchanged (e.g., numeric values).

Parameters:
  • node (dict, tuple, str, or other) – The current node in the calc_params structure.

  • cd (CapData) – CapData instance that functions will act on.

  • agg_cache (dict, optional) – Cache of already aggregated column groups to avoid redundant calls. Keys are tuples of (group_id, agg_func), values are aggregated column names.

  • verbose (bool, default True) – Passed to aggregations and calculations. Set to False to suppress output.

Returns:

The transformed node with all aggregations executed and calculations replaced by their function names.

Return type:

transformed
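The recursion over node types can be sketched without a CapData instance. Everything here is illustrative: transform_sketch is a hypothetical helper, a plain column_groups dict replaces the cd argument, the "<group>_<func>_agg" column name is a placeholder, not necessarily the name captest generates, and the stand-in calculation only records its arguments.

```python
def transform_sketch(node, column_groups, agg_cache=None):
    """Resolve a calc_params node to column names, depth first."""
    agg_cache = {} if agg_cache is None else agg_cache
    if isinstance(node, dict):
        return {
            key: transform_sketch(value, column_groups, agg_cache)
            for key, value in node.items()
        }
    if isinstance(node, tuple) and len(node) == 2:
        first, second = node
        if isinstance(first, str) and isinstance(second, str):
            # Aggregation: reuse the cached name if this pair was seen before.
            return agg_cache.setdefault(node, f"{first}_{second}_agg")
        if callable(first) and isinstance(second, dict):
            # Calculation: resolve kwargs first, run it, return its name.
            kwargs = transform_sketch(second, column_groups, agg_cache)
            first(**kwargs)
            return first.__name__
    if isinstance(node, str):
        columns = column_groups.get(node, [])
        # A group id with exactly one column resolves to that column name.
        return columns[0] if len(columns) == 1 else node
    return node  # numbers and anything else pass through unchanged


calls = []


def power_tc(power, poa):
    # Stand-in calculation that only records the resolved arguments.
    calls.append((power, poa))


column_groups = {"real_pwr_mtr": ["meter_kw"], "irr_poa": ["poa_1", "poa_2"]}
reg_cols = {
    "power": (power_tc, {"power": "real_pwr_mtr", "poa": ("irr_poa", "mean")}),
    "poa": ("irr_poa", "mean"),
    "t_amb": 25,
}
resolved = transform_sketch(reg_cols, column_groups)
```

The inner kwargs are resolved before the calculation runs, which is why the stand-in receives column names and never sees a tuple.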

captest.util.update_by_path(dictionary, path, new_value=None, convert_callable=False)

Update a nested dictionary value by following a path list.

Parameters:
  • dictionary (dict) – The dictionary to update

  • path (list) – A list representing the path to the target key

  • new_value (optional) – The new value to set (if None and convert_callable=True, will convert existing tuple to function name)

  • convert_callable (bool, optional) – If True and new_value is None, converts tuple to function name

Returns:

updated_dictionary – The updated dictionary

Return type:

dict
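Following a path list into a nested dictionary can be sketched as below. update_by_path_sketch is a hypothetical helper covering only the case where new_value is given; the convert_callable branch is omitted.

```python
def update_by_path_sketch(dictionary, path, new_value):
    """Set the value at the nested key path and return the same dictionary."""
    target = dictionary
    for key in path[:-1]:
        target = target[key]  # walk down to the parent of the final key
    target[path[-1]] = new_value
    return dictionary


params = {"power_tc": {"cell_temp": {"bom": "placeholder"}}}
updated = update_by_path_sketch(params, ["power_tc", "cell_temp", "bom"], "bom_temp")
```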

captest.calcparams module

Functions to calculate derived values from measured data.

For example, back-of-module temperature from poa, wind speed, and ambient temp with the Sandia module temperature model.

captest.calcparams.absolute_airmass(data, apparent_zenith=None, pressure=None, pressure_scale=100, airmass_model='kastenyoung1989', verbose=True)

Compute absolute (pressure-corrected) airmass from apparent zenith.

Uses pvlib.atmosphere.get_relative_airmass() with the kastenyoung1989 model by default, then passes the result to pvlib.atmosphere.get_absolute_airmass(). If pressure is None the pvlib default (101325 Pa) is used; otherwise the column data[pressure] is scaled by pressure_scale (default 100 to convert hPa/mbar to Pa) and passed through.

When a pressure column is supplied, the scaled pressure values are sanity-checked against global surface-pressure records (PRESSURE_MIN_MBAR to PRESSURE_MAX_MBAR). The 5th and 95th percentiles are used to ignore isolated outliers from bad data. A UserWarning is emitted if the central 90% of values falls outside that band, which typically indicates a unit mismatch between data[pressure] and pressure_scale.

Parameters:
  • data (DataFrame) – DataFrame containing the apparent_zenith (and optionally pressure) columns.

  • apparent_zenith (str) – Column name for apparent zenith angle (degrees).

  • pressure (str or None, default None) – Column name for station pressure. None falls back to pvlib’s default sea-level pressure.

  • pressure_scale (numeric, default 100) – Multiplier applied to data[pressure] before passing to pvlib. Default converts hPa/mbar to Pa.

  • airmass_model (str, default 'kastenyoung1989') – Model passed to pvlib.atmosphere.get_relative_airmass().

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

Absolute airmass indexed like data.index.

Return type:

Series
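
The calculation described above can be sketched for scalar inputs using only the standard library. This is an illustration under stated assumptions, not the captest or pvlib code: the function names are hypothetical, the relative airmass uses the published Kasten and Young (1989) formula, and the pressure correction is the ratio of station pressure to 101325 Pa. The percentile-based sanity check is omitted.

```python
import math

SEA_LEVEL_PA = 101325.0  # default sea-level pressure in Pa


def relative_airmass_ky1989(zenith_deg):
    """Kasten and Young (1989) relative airmass for an apparent zenith in degrees."""
    z = zenith_deg
    return 1.0 / (math.cos(math.radians(z)) + 0.50572 * (96.07995 - z) ** -1.6364)


def absolute_airmass_sketch(zenith_deg, pressure_mbar=None, pressure_scale=100):
    """Pressure-correct the relative airmass; pressure_scale converts mbar to Pa."""
    if pressure_mbar is None:
        pressure_pa = SEA_LEVEL_PA
    else:
        pressure_pa = pressure_mbar * pressure_scale
    return relative_airmass_ky1989(zenith_deg) * pressure_pa / SEA_LEVEL_PA


am_sea_level = absolute_airmass_sketch(60.0)
am_station = absolute_airmass_sketch(60.0, pressure_mbar=850.0)
print(round(am_sea_level, 3), round(am_station, 3))
```

If the pressure column were already in Pa and pressure_scale were left at 100, the result would be inflated by a factor of 100, which is the kind of unit mismatch the warning is meant to catch.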

captest.calcparams.apparent_zenith(data, site=None, altitude_override=0, verbose=True)

Compute apparent solar zenith angle at each timestamp in data.

Wraps pvlib.location.Location.get_solarposition() and returns the apparent_zenith column aligned to data.index. Designed for use inside a CapData.regression_cols calc tuple: site is auto-injected by CapData.custom_param from cd.site.

Per the pvlib First Solar spectral-correction reference, the absolute airmass is computed against zenith at sea level. altitude_override defaults to 0 so a deep copy of site has its loc.altitude forced to 0 before the Location is instantiated. The caller’s site dict is not mutated.

Night-time rows (apparent_zenith > 90) are set to NaN so downstream airmass / spectral-factor calls do not emit pvlib warnings on invalid geometry.

Parameters:
  • data (DataFrame) – DataFrame with a DatetimeIndex. The index may be tz-naive or tz-aware.

  • site (dict) – Nested {"loc": {...}, "sys": {...}} dict as produced by load_data(site=...). Only the loc sub-dict is consumed here. Auto-injected from cd.site by custom_param when used in a regression_cols calc tuple.

  • altitude_override (numeric, default 0) – Altitude (in meters) to use when building the pvlib.Location. Set to None to respect site['loc']['altitude'] unchanged.

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

Apparent zenith angle (degrees) indexed like data.index with a tz-naive index. NaN where the sun is below the horizon.

Return type:

Series

captest.calcparams.apparent_zenith_pvsyst(data, site=None, altitude_override=0, shift_minutes=30, verbose=True)

Apparent solar zenith at the mid-point of each PVsyst interval.

PVsyst reports hourly values labelled at the start of each interval but computes sun positions at the interval mid-point. To match that convention we shift data.index forward by shift_minutes before calling pvlib.location.Location.get_solarposition(), then shift the resulting Series index back by the same amount so the output aligns with the original data.index.

The site timezone should be a fixed-offset Etc/GMT±N string because PVsyst data is not DST-aware. CapTest.setup() auto-converts meas.site to an Etc/GMT±N variant when propagating it to sim.site.

Parameters:
  • data (DataFrame) – DataFrame with a tz-naive DatetimeIndex at the PVsyst cadence.

  • site (dict) – Same shape as apparent_zenith(). Auto-injected from cd.site.

  • altitude_override (numeric, default 0) – See apparent_zenith().

  • shift_minutes (int, default 30) – Interval mid-point offset applied to data.index before the pvlib solar-position call. Set to 0 to disable the shift.

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

Apparent zenith angle (degrees) indexed like data.index.

Return type:

Series
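
The shift-compute-relabel pattern can be sketched with plain datetime objects. Everything here is hypothetical: midpoint_values is an illustrative helper and hour_fraction stands in for the real solar-position call, so only the index handling is representative.

```python
from datetime import datetime, timedelta


def midpoint_values(index, func, shift_minutes=30):
    """Evaluate func at each interval mid-point but keep the original labels."""
    shift = timedelta(minutes=shift_minutes)
    shifted = [ts + shift for ts in index]   # move labels to the mid-points
    values = [func(ts) for ts in shifted]    # compute at the mid-points
    return dict(zip(index, values))          # relabel with the original stamps


def hour_fraction(ts):
    """Stand-in for a solar-position call: hour of day as a float."""
    return ts.hour + ts.minute / 60


index = [datetime(2024, 6, 1, h) for h in (10, 11, 12)]
result = midpoint_values(index, hour_fraction)
print(result[datetime(2024, 6, 1, 11)])
```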

captest.calcparams.avg_typ_cell_temp(data, poa, cell_temp, verbose=True)

Calculate irradiance weighted cell temperature.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.

  • poa (str) – Column name for POA irradiance (W/m^2).

  • cell_temp (str) – Column name for cell temperature for each interval (degrees C).

Returns:

Average irradiance-weighted cell temperature.

Return type:

float
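
An irradiance-weighted average can be sketched as the sum of irradiance times temperature divided by the sum of irradiance. The helper below is a hypothetical illustration on plain lists, assuming that definition; it is not the captest function.

```python
def irradiance_weighted_mean(poa, cell_temp):
    """Weight each cell temperature by the POA irradiance of the same interval."""
    total_irradiance = sum(poa)
    if total_irradiance == 0:
        raise ValueError("POA irradiance sums to zero; the weighted mean is undefined.")
    return sum(g * t for g, t in zip(poa, cell_temp)) / total_irradiance


poa = [200.0, 800.0, 1000.0]     # W/m^2
cell_temp = [20.0, 45.0, 50.0]   # degrees C
avg = irradiance_weighted_mean(poa, cell_temp)
print(avg)
```

High-irradiance intervals dominate the result, so the average sits closer to the temperatures seen when the array is producing the most power.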

captest.calcparams.bom_temp(data, poa=None, temp_amb=None, wind_speed=None, module_type='glass_cell_poly', racking='open_rack', verbose=True)

Calculate back of module temperature from measured weather data.

Calculate back of module temperature from POA irradiance, ambient temperature, wind speed (at height of 10 meters), and empirically derived heat transfer coefficients.

Equation from NREL Weather Corrected Performance Ratio Report.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.

  • poa (str) – Column name for POA irradiance in W/m^2.

  • temp_amb (str) – Column name for ambient temperature in degrees C.

  • wind_speed (str) – Column name for measured wind speed (m/sec) corrected to measurement height of 10 meters.

  • module_type (str, default 'glass_cell_poly') – Any of 'glass_cell_poly', 'glass_cell_glass', or 'poly_tf_steel'.

  • racking (str, default 'open_rack') – Any of 'open_rack', 'close_roof_mount', or 'insulated_back'.

Returns:

Back of module temperatures.

Return type:

numeric or Series
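
For scalar inputs, the Sandia module temperature model can be sketched as below. The function name is hypothetical, and the a and b coefficients are the values published with the Sandia model for an open-rack glass/cell/polymer-sheet module; it is an assumption, not something this reference confirms, that captest uses the same table.

```python
import math

# Published Sandia coefficients for open-rack glass/cell/polymer-sheet modules.
# Assumed, not confirmed, to match the values captest uses internally.
A_COEFF = -3.56
B_COEFF = -0.0750


def bom_temp_sketch(poa, temp_amb, wind_speed, a=A_COEFF, b=B_COEFF):
    """Back-of-module temperature (C) from POA (W/m^2), ambient temp (C), wind (m/s)."""
    return poa * math.exp(a + b * wind_speed) + temp_amb


t_bom = bom_temp_sketch(poa=1000.0, temp_amb=25.0, wind_speed=1.0)
t_bom_windy = bom_temp_sketch(poa=1000.0, temp_amb=25.0, wind_speed=5.0)
print(round(t_bom, 2), round(t_bom_windy, 2))
```

Because b is negative, higher wind speeds reduce the temperature rise above ambient.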

captest.calcparams.cell_temp(data, bom, poa, module_type='glass_cell_poly', racking='open_rack', verbose=True)

Calculate cell temp from BOM temp, POA, and heat transfer coefficient.

Equation from NREL Weather Corrected Performance Ratio Report.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.

  • bom (str) –

    Column name for back of module temperature (degrees C). Strictly following the NREL procedure this value would be obtained from the bom_temp() function.

    Alternatively, a measured BOM temperature may be used.

    Refer to p.7 of NREL Weather Corrected Performance Ratio Report.

  • poa (str) – Column name for POA irradiance in W/m^2.

  • module_type (str, default 'glass_cell_poly') – Any of 'glass_cell_poly', 'glass_cell_glass', or 'poly_tf_steel'.

  • racking (str, default 'open_rack') – Any of 'open_rack', 'close_roof_mount', or 'insulated_back'.

  • verbose (bool, default True) – By default prints explanation of calculation. Set to False for no output message.

Returns:

Cell temperatures.

Return type:

Series
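
A scalar sketch of the cell temperature step follows. It assumes the Sandia relation in which the cell runs hotter than the module back by a temperature difference that scales with irradiance relative to 1000 W/m^2. The function name is hypothetical, and the 3 degree C difference is the published value for open-rack glass/cell/polymer-sheet modules, assumed here rather than taken from captest.

```python
DELTA_T = 3.0  # degrees C between cell and module back at 1000 W/m^2 (assumed)


def cell_temp_sketch(bom, poa, delta_t=DELTA_T):
    """Cell temperature (C) from back-of-module temperature (C) and POA (W/m^2)."""
    return bom + poa / 1000.0 * delta_t


print(cell_temp_sketch(bom=50.0, poa=1000.0))
print(cell_temp_sketch(bom=50.0, poa=500.0))
```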

captest.calcparams.e_total(data, poa, rpoa, bifaciality=0.7, bifacial_frac=1, rear_shade=0, verbose=True)

Calculate total irradiance from POA and rear irradiance.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.

  • poa (str) – Column name for POA irradiance (W/m^2).

  • rpoa (str) – Column name for rear irradiance (W/m^2).

  • bifaciality (numeric, default 0.7) – Bifaciality factor.

  • bifacial_frac (numeric, default 1) – Fraction of total array nameplate power that is bifacial. Pass to calculate total plane of array irradiance for plants with a mix of monofacial and bifacial modules.

  • rear_shade (numeric, default 0) – Fraction of rear irradiance that is lost due to shading. Set to decimal fraction, e.g. 0.12, to include in calculation of e_total.

Returns:

Total plane of array irradiance.

Return type:

numeric or Series
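
One plausible reading of how these parameters combine is sketched below: the rear irradiance is scaled by the bifaciality factor, the bifacial fraction, and the unshaded share, then added to the front-side POA. The formula and the function name are assumptions made for illustration, not a statement of the library's exact implementation.

```python
def e_total_sketch(poa, rpoa, bifaciality=0.7, bifacial_frac=1.0, rear_shade=0.0):
    """Total irradiance (W/m^2) as front POA plus the effective rear contribution."""
    effective_rear = rpoa * bifaciality * bifacial_frac * (1 - rear_shade)
    return poa + effective_rear


unshaded = e_total_sketch(poa=900.0, rpoa=100.0)
shaded = e_total_sketch(poa=900.0, rpoa=100.0, rear_shade=0.12)
print(round(unshaded, 1), round(shaded, 1))
```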

captest.calcparams.multiply(data, a=None, b=None, verbose=True)

Elementwise multiplication of two columns.

Parameters:
  • data (DataFrame) – Source DataFrame.

  • a (str) – Name of the first column to multiply. Neither kwarg name (a or b) may collide with a column_groups id, per CapData.custom_param semantics.

  • b (str) – Name of the second column to multiply. The same restriction on kwarg names applies.

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

data[a] * data[b] indexed like data.index.

Return type:

Series

captest.calcparams.poa_spec_corrected(data, poa=None, spectral_correction=None, verbose=True)

Spectrally corrected plane-of-array irradiance.

Thin named alias that multiplies a POA column by a spectral-correction column. Primary use is the top-level node of a regression_cols calc tree whose spectral_correction kwarg is itself a calc subtree ending in spectral_factor_firstsolar().

Parameters:
  • data (DataFrame) – Source DataFrame.

  • poa (str) – Column name for plane-of-array irradiance (W/m^2).

  • spectral_correction (str) – Column name for the spectral correction factor.

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

data[poa] * data[spectral_correction] indexed like data.index.

Return type:

Series

captest.calcparams.power_temp_correct(data, power, cell_temp, power_temp_coeff=None, base_temp=25, verbose=True)

Apply temperature correction to PV power.

Divides power by the temperature correction factor. With a negative power_temp_coeff, power recorded at cell temperatures above base_temp is increased and power recorded at cell temperatures below base_temp is decreased.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance.

  • power (str) – The column name of the data attribute with the power to correct.

  • cell_temp (str) – Name of the column in data containing the cell temperature (in Celsius) used to calculate temperature differential from the base_temp.

  • power_temp_coeff (numeric) – Module power temperature coefficient as percent per degree Celsius, e.g. -0.36.

  • base_temp (numeric, default 25) – Base temperature (in Celsius) to correct power to. Default is the STC of 25 degrees Celsius.

Returns:

Power corrected for temperature.

Return type:

Series
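
A scalar sketch of the correction follows, assuming the correction factor is 1 plus the coefficient (converted from percent) times the difference between cell temperature and base temperature. The function name is hypothetical and the formula is an assumption consistent with the description above, not a copy of the library code.

```python
def power_temp_correct_sketch(power, cell_temp, power_temp_coeff, base_temp=25.0):
    """Divide power by 1 + (coeff / 100) * (cell_temp - base_temp)."""
    correction = 1 + (power_temp_coeff / 100.0) * (cell_temp - base_temp)
    return power / correction


hot = power_temp_correct_sketch(1000.0, cell_temp=45.0, power_temp_coeff=-0.36)
cold = power_temp_correct_sketch(1000.0, cell_temp=5.0, power_temp_coeff=-0.36)
print(round(hot, 1), round(cold, 1))
```

With a negative coefficient the hot reading is scaled up and the cold reading is scaled down, so both are expressed at the base temperature.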

captest.calcparams.precipitable_water_gueymard(data, temp_amb=None, rel_humidity=None, verbose=True)

Precipitable water (cm) from ambient temperature and relative humidity.

Wraps pvlib.atmosphere.gueymard94_pw().

Parameters:
  • data (DataFrame) – DataFrame containing the ambient-temperature and relative-humidity columns.

  • temp_amb (str) – Column name for ambient (dry-bulb) temperature in degrees Celsius.

  • rel_humidity (str) – Column name for relative humidity as a percentage (0-100).

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

Precipitable water (cm) indexed like data.index.

Return type:

Series

captest.calcparams.rpoa_pvsyst(data, globbak='GlobBak', backshd='BackShd', verbose=True)

Calculate the sum of PVsyst’s global rear irradiance and rear shading and IAM losses.

Parameters:
  • data (DataFrame) – DataFrame with the source data for calculations. Usually the data attribute of a CapData instance containing PVsyst 8760 data.

  • globbak (str, default 'GlobBak') – Column name for global rear irradiance (W/m^2).

  • backshd (str, default 'BackShd') – Column name for rear shading and IAM losses (W/m^2).

  • verbose (bool, default True) – Set to False to not print calculation explanation.

Returns:

Sum of global rear irradiance and rear shading and IAM losses.

Return type:

Series

captest.calcparams.scale(data, col=None, factor=1.0, verbose=True)

Multiply a single column by a scalar factor.

Generic unit-conversion / rescaling helper usable in regression_cols calc trees. Primary use in this module is converting PVsyst PrecWat from meters to centimeters with factor=100.

Parameters:
  • data (DataFrame) – Source DataFrame.

  • col (str) – Column name to scale.

  • factor (numeric, default 1.0) – Scalar multiplier applied elementwise to data[col].

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

data[col] * factor indexed like data.index.

Return type:

Series

captest.calcparams.spectral_factor_firstsolar(data, precipitable_water=None, absolute_airmass=None, spectral_module_type='cdte', verbose=True)

First Solar spectral correction factor.

Wraps pvlib.spectrum.spectral_factor_firstsolar(). spectral_module_type defaults to 'cdte' but can be overridden via a cd.spectral_module_type attribute which custom_param auto-injects when the kwarg is left unset. CapTest propagates its spectral_module_type param onto both CapData instances at setup().

The kwarg is named spectral_module_type (not module_type) to avoid collisions with the module_type kwarg used by bom_temp() and cell_temp(), which expects values like 'glass_cell_poly' rather than the pvlib First Solar module-type strings.

Parameters:
  • data (DataFrame) – DataFrame containing the precipitable-water and absolute-airmass columns.

  • precipitable_water (str) – Column name for precipitable water in cm.

  • absolute_airmass (str) – Column name for absolute airmass.

  • spectral_module_type (str, default 'cdte') – Passed through to pvlib.spectrum.spectral_factor_firstsolar() as its module_type argument.

  • verbose (bool, default True) – Set to False to suppress the explanatory print message.

Returns:

Spectral correction factor indexed like data.index.

Return type:

Series

captest.plotting module

class captest.plotting.ScatterBifiPowerTc(*, am_color, am_marker, cd, filtered, height, pm_color, pm_marker, split_day, split_time, tc_force_recompute, tc_mode, tc_power, tc_power_calc, timeseries, width, name)

Bases: ScatterPlot

Two-panel scatter for the bifi_power_tc preset.

The bifi_power_tc regression formula is power ~ poa + rpoa where power is already temperature-corrected. This subclass builds one panel per rhs variable (power vs poa and power vs rpoa). The tc_power parameter is ignored here because the regression power is already tc-corrected; setting it to True emits a UserWarning.

AM/PM splitting and timeseries pairing are inherited from ScatterPlot. When timeseries=True, only the first panel is paired with a linked timeseries view to keep the layout sane.

name = 'ScatterBifiPowerTc'
view()

Build a two-panel hv.Layout for the bifi_power_tc preset.

Return type:

holoviews.Layout

class captest.plotting.ScatterPlot(*, am_color, am_marker, cd, filtered, height, pm_color, pm_marker, split_day, split_time, tc_force_recompute, tc_mode, tc_power, tc_power_calc, timeseries, width, name)

Bases: Parameterized

Composable scatter plot for CapTest regression diagnostics.

Resolves x and y from cd.regression_formula (lhs vs first rhs) and optionally:

  • splits points into morning / afternoon glyphs (split_day=True),

  • swaps the y-axis to a temperature-corrected power column (tc_power=True, with mode replace / add_panel / overlay), and / or

  • pairs the scatter with a linked timeseries panel (timeseries=True).

Parameters:
  • cd (CapData or None) – CapData instance whose data / column_groups / regression_formula drive the plot. Required at view time.

  • filtered (bool, default True) – When True (default), pulls regression columns from cd.data_filtered; when False, from cd.data.

  • split_day (bool, default False) – Render morning and afternoon points as two distinct overlaid Scatters with different colors and markers.

  • split_time (str or None, default None) – Clock-time override ("HH:MM") for the AM/PM boundary. When None and split_day=True, the boundary is detected via captest.util.detect_solar_noon (idxmax of clock-time-binned ghi_mod_csky mean) with a 12:30 fallback.

  • am_color (str, default "#1f77b4") – Glyph color for the AM Scatter when split_day=True.

  • pm_color (str, default "#d62728") – Glyph color for the PM Scatter when split_day=True.

  • am_marker (str, default "circle") – Glyph marker for the AM Scatter when split_day=True.

  • pm_marker (str, default "triangle") – Glyph marker for the PM Scatter when split_day=True.

  • tc_power (bool, default False) – Plot against temperature-corrected power instead of (or in addition to) raw power.

  • tc_mode ({"replace", "add_panel", "overlay"}, default "replace") – Layout strategy when tc_power=True.

  • tc_power_calc (dict or None, default None) – Calc-params nested dict that produces the tc-power column. When None, DEFAULT_TC_POWER_CALC is used (tuned for measured DAS data; sim users must override).

  • tc_force_recompute (bool, default False) – When True, recomputes the tc-power column even if it already exists on cd.data.

  • timeseries (bool, default False) – Pair the principal scatter with a linked timeseries panel below. The timeseries panel overlays a thin gray curve of the full unfiltered y-series under the linked scatter of the filtered data so removed points remain visible as background context. Only valid for the single-panel tc_mode values (replace and overlay); raises ValueError if combined with tc_mode='add_panel'.

  • height (int, default 400) – Pixel height forwarded to the Scatter / Curve options.

  • width (int, default 500) – Pixel width forwarded to the Scatter / Curve options.

am_color = '#1f77b4'
am_marker = 'circle'
cd = None
filtered = True
height = 400
name = 'ScatterPlot'
pm_color = '#d62728'
pm_marker = 'triangle'
split_day = False
split_time = None
tc_force_recompute = False
tc_mode = 'replace'
tc_power = False
tc_power_calc = None
timeseries = False
view()

Build and return the hv.Layout for the configured options.

Returns:

A Layout whose first element is the principal scatter (a Scatter for the single-glyph case, an Overlay when split_day=True). Additional panels appear when tc_mode='add_panel' or timeseries=True.

Return type:

holoviews.Layout

Raises:
  • ValueError – If cd is unset, or if timeseries=True is combined with tc_mode='add_panel', or if timeseries=True is combined with tc_power=True and tc_mode='overlay' (the linked timeseries panel can only display a single y-series, so an overlaid raw + tc-power principal is ambiguous).

  • ImportError – If holoviews is not installed.

width = 500
captest.plotting.add_am_pm_dim(df, split_time)

Tag rows of df as morning or afternoon based on a clock-time split.

Parameters:
  • df (pandas.DataFrame) – DataFrame with a DatetimeIndex.

  • split_time (str) – Clock-time string in "HH:MM" format (24-hour, leading zeros optional, e.g. "12:30" or "9:05"). Rows whose index time is strictly before split_time are tagged "am"; rows at or after split_time are tagged "pm".

Returns:

Copy of df with a new period column whose values are "am" or "pm".

Return type:

pandas.DataFrame

Raises:

ValueError – If split_time does not match "HH:MM" or specifies an invalid hour/minute.
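
The tagging rule can be sketched on plain datetime objects. The helpers below are hypothetical and use only the standard library; in particular, the accepted pattern (one or two hour digits, two minute digits) is an assumption about what "HH:MM" with optional leading zeros permits.

```python
import re
from datetime import datetime, time


def parse_split_time(split_time):
    """Parse 'HH:MM' (hour leading zero optional) into a datetime.time."""
    match = re.fullmatch(r"(\d{1,2}):(\d{2})", split_time)
    if match is None:
        raise ValueError(f"split_time must look like 'HH:MM', got {split_time!r}")
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        raise ValueError(f"invalid hour/minute in {split_time!r}")
    return time(hour, minute)


def tag_period(timestamps, split_time):
    """Tag stamps strictly before the boundary 'am' and all others 'pm'."""
    boundary = parse_split_time(split_time)
    return ["am" if ts.time() < boundary else "pm" for ts in timestamps]


stamps = [datetime(2024, 6, 1, 12, 29), datetime(2024, 6, 1, 12, 30), datetime(2024, 6, 1, 13, 0)]
print(tag_period(stamps, "12:30"))
```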

captest.plotting.add_custom_plot(name, column_groups, group_tags, column_tags)

Append a new custom group to column groups for plotting.

captest.plotting.calc_tc_power_column(cd, tc_power_calc, col_name='power_tc_plot', verbose=False, force_recompute=False)

Materialize a temperature-corrected power column for plotting only.

Walks tc_power_calc (a calc-params nested dict using the same grammar as TEST_SETUPS reg_cols_* values) via captest.util.transform_calc_params and writes the resulting power_temp_correct Series to cd.data[col_name] and cd.data_filtered[col_name].

This helper is intentionally isolated from CapData.process_regression_columns: it does NOT touch cd.regression_cols, cd.regression_formula, cd.summary, cd.kept, or cd.removed.

Parameters:
  • cd (CapData) – The CapData instance whose data and data_filtered will be extended with col_name. power_temp_coeff and base_temp attributes (propagated by CapTest.setup for shipped presets) are auto-injected by CapData.custom_param if not present in tc_power_calc.

  • tc_power_calc (dict) – Calc-params nested dict mirroring the bifi_power_tc preset’s reg_cols_meas['power'] value. The outermost callable must produce a Series of temperature-corrected power values; in practice this is calcparams.power_temp_correct. The dict must contain a top-level "power" calculation tuple.

  • col_name (str, default TC_POWER_PLOT_COL) – Name of the column written to cd.data / cd.data_filtered.

  • verbose (bool, default False) – Forwarded to transform_calc_params.

  • force_recompute (bool, default False) – When False (default), short-circuits and returns col_name if the column already exists in cd.data. Pass True to recompute.

Returns:

col_name.

Return type:

str

Raises:
  • KeyError – When tc_power_calc references a column-group id that is missing from cd.column_groups.

  • ValueError – When tc_power_calc does not contain a top-level "power" calculation tuple that produces a column in cd.data.

captest.plotting.filter_list(text_input, ms_to_filter, names, event=None)

Filter a multi-select widget by a regex string.

Parameters:
  • text_input (pn.widgets.TextInput) – The text input widget to get the regex string from.

  • ms_to_filter (pn.widgets.MultiSelect) – The multi-select widget to update.

  • names (list of str) – The list of names to filter.

  • event (pn.widgets.event, optional) – Passed by the param.watch method. Not used.

Return type:

None

captest.plotting.find_default_groups(groups, default_groups)

Find the default groups in the list of groups.

Parameters:
  • groups (list of str) – The list of groups to search for the default groups.

  • default_groups (list of str) – List of regex strings to use to identify default groups.

Returns:

The default groups found in the list of groups.

Return type:

list of str
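
Regex-based group selection of this kind can be sketched with the standard re module. The helper name, the ordering of results, and the de-duplication are assumptions for illustration; the lookahead pattern mirrors the style of the default_groups entries shown for plot().

```python
import re


def match_groups(groups, patterns):
    """Return groups matched by any pattern, in pattern order, without repeats."""
    found = []
    for pattern in patterns:
        for group in groups:
            if re.search(pattern, group) and group not in found:
                found.append(group)
    return found


groups = ["real_pwr_inv", "real_pwr_mtr", "irr_poa", "temp_amb"]
patterns = ["(?=.*real)(?=.*pwr)(?=.*mtr)", "poa"]
print(match_groups(groups, patterns))
```

Each lookahead in the first pattern must succeed from the same starting position, so the pattern only matches names that contain all three fragments in any order.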

captest.plotting.get_resid_exog_frame(cd)

Get a DataFrame of residuals and exogenous variables from a CapData object.

Parameters:

cd (captest.CapData) – The CapData object.

Returns:

DataFrame with residuals and exogenous variables.

Return type:

pd.DataFrame

captest.plotting.group_tag_overlay(group_tags, column_tags)

Overlay curves of groups and individually selected columns.

Parameters:
  • group_tags (list of str) – The tags to plot from the groups selected.

  • column_tags (list of str) – The tags to plot from the individually selected columns.

captest.plotting.msel_from_column_groups(column_groups, groups=True)

Create a multi-select widget from a column groups object.

Parameters:
  • column_groups (ColumnGroups) – The column groups object.

  • groups (bool, default True) – By default creates list of groups i.e. the keys of column_groups, otherwise creates list of individual columns i.e. the values of column_groups concatenated together.

captest.plotting.parse_combine(combine, column_groups=None, data=None, cd=None)

Parse regex strings for identifying groups of columns or tags to combine.

Parameters:
  • combine (dict) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. Keys should be strings for names of new groups. Values should be either a string or a list of two strings. If a string, the string is used as a regex to identify groups to combine. If a list, the first string is used to identify groups to combine and the second is used to identify individual tags (columns) to combine.

  • column_groups (ColumnGroups, optional) – The column groups object to add new groups to. Required if cd is not provided.

  • data (pd.DataFrame, optional) – The data to use to identify groups and columns to combine. Required if cd is not provided.

  • cd (captest.CapData, optional) – The captest.CapData object with the data and column_groups attributes set. Required if column_groups and data are not provided.

Returns:

New column groups object with new groups added.

Return type:

ColumnGroups

captest.plotting.plot(cd=None, cg=None, data=None, combine={'ghi_csky': '(?=.*ghi)(?=.*irr)', 'inv_sum_mtr_pwr': ['(?=.*real)(?=.*pwr)(?=.*mtr)', '(?=.*pwr)(?=.*agg)'], 'poa_csky': '(?=.*poa)(?=.*irr)', 'poa_ghi': 'irr.*(poa|ghi)$', 'temp_amb_bom': '(?=.*temp)((?=.*amb)|(?=.*bom))'}, default_groups=['inv_sum_mtr_pwr', '(?=.*real)(?=.*pwr)(?=.*inv)', '(?=.*real)(?=.*pwr)(?=.*mtr)', 'poa_ghi', 'poa_csky', 'ghi_csky', 'temp_amb_bom'], group_width=1500, group_height=250, plot_defaults_path=None, **kwargs)

Create plotting dashboard.

NOTE: If a plot defaults JSON file exists in the current working directory, the default groups will be read from that file instead of using the default_groups argument. When a cd (CapData) object is provided, the file is named plot_defaults_{cd.name}.json to avoid conflicts between multiple CapData objects in the same session. Otherwise the file is named plot_defaults.json. Use the plot_defaults_path argument to override the path. Delete or manually edit the file to change the default groups. Columns in the file that are no longer present in the data are ignored with a warning.

Parameters:
  • cd (captest.CapData, optional) – The captest.CapData object.

  • cg (captest.ColumnGroups, optional) – The captest.ColumnGroups object. data must also be provided.

  • data (pd.DataFrame, optional) – The data to plot. cg must also be provided.

  • combine (dict, optional) – Dictionary of group names and regex strings to use to identify groups from column groups and individual tags (columns) to combine into new groups. See the parse_combine function for more details.

  • default_groups (list of str, optional) – List of regex strings to use to identify default groups to plot. See the find_default_groups function for more details.

  • group_width (int, optional) – The width of the plots on the Groups tab.

  • group_height (int, optional) – The height of the plots on the Groups tab.

  • plot_defaults_path (str or Path, optional) – Path to the plot defaults JSON file. Overrides the default naming scheme. When None and cd is provided, defaults to ./plot_defaults_{cd.name}.json. When None and cd is not provided, defaults to ./plot_defaults.json.

  • **kwargs (optional) – Pass additional keyword arguments to the holoviews options of the scatter plot on the ‘Scatter’ tab.
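
The file-naming rules for the plot defaults file can be sketched as a small resolver. The function is hypothetical and only mirrors the precedence described above: an explicit path wins, then the per-CapData name, then the generic name.

```python
from pathlib import Path


def resolve_plot_defaults_path(cd_name=None, plot_defaults_path=None):
    """Pick the plot defaults JSON path using the documented precedence."""
    if plot_defaults_path is not None:
        return Path(plot_defaults_path)               # explicit override
    if cd_name is not None:
        return Path(f"plot_defaults_{cd_name}.json")  # per-CapData file
    return Path("plot_defaults.json")                 # generic fallback


print(resolve_plot_defaults_path(cd_name="meas"))
print(resolve_plot_defaults_path())
```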

captest.plotting.plot_group_tag_overlay(data, group_tags, column_tags, width=1500, height=400)

Overlay curves of groups and individually selected columns.

Parameters:
  • data (pd.DataFrame) – The data to plot.

  • group_tags (list of str) – The tags to plot from the groups selected.

  • column_tags (list of str) – The tags to plot from the individually selected columns.

captest.plotting.plot_tag(data, tag, width=1500, height=250)
captest.plotting.plot_tag_groups(data, tags_to_plot, width=1500, height=250)

Plot groups of tags, one plot of overlaid curves per group.

Parameters:
  • data (pd.DataFrame) – The data to plot.

  • tags_to_plot (list) – List of lists of strings. One plot for each inner list.

captest.plotting.scatter_dboard(data, **kwargs)

Create a dashboard to plot any two columns of data against each other.

Parameters:
  • data (pd.DataFrame) – The data to plot.

  • **kwargs (optional) – Pass additional keyword arguments to the holoviews options of the scatter plot.

Returns:

The dashboard with a scatter plot of the data.

Return type:

pn.Column