This page was generated from docs/examples/complete_capacity_test.ipynb. Interactive online version:

Example Capacity Test using pvcaptest

This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the pvcaptest package.

Imports

[1]:

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import pandas as pd

# import captest as pvc
import captest as ct
from captest import capdata as pvc
from bokeh.io import output_notebook, show

# holoviews is an optional dependency required to use CapData.scatter_hv in the notebook
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')

# If working offline, the CapData.plot() method may fail.
# Run 'export BOKEH_RESOURCES=inline' at the command line before
# starting the Jupyter notebook.

output_notebook()


Load and Plot Measured Data

We begin by using the load_data function, which reads the file(s) specified by the path argument and returns an instance of the CapData class. In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.

When given the path to a file, as shown here, load_data will try to read that file. If you pass a path to a directory, load_data will look for and attempt to load all files ending with ‘.csv’ in the specified directory. Other file types can also be loaded by passing your own function to the file_reader argument and including the extension (e.g. ‘xlsx’) as a kwarg.

[2]:

das = ct.load_data('./data/example_measured_data.csv')

The load_data function loads the data into a pandas DataFrame, which it assigns to the data attribute of the CapData object. Here we use the pandas DataFrame head method to return the first three rows.

[3]:

das.data.head(3)

[3]:

	met1_amb_temp	met2_amb_temp	met1_mod_temp1	met1_mod_temp2	...	met2_windspeed	meter_power	inv1_power	inv2_power	inv3_power	inv4_power	inv5_power	inv6_power	inv7_power	inv8_power
1990-10-09 00:00:00	17.750666	17.770821	15.640355	15.663692	...	-0.007221	-8868.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0
1990-10-09 00:05:00	17.737545	17.753030	15.551920	15.676843	...	-0.007195	-8868.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0
1990-10-09 00:10:00	17.648090	17.689437	15.541516	15.414247	...	0.002557	-8868.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0	-150.0

3 rows × 23 columns

In addition to loading data, by default the load_data function attempts to parse the column headers and group the columns based on the type of measurement recorded in each column. For each inferred measurement type, group_columns creates an abbreviated name and a list of columns that contain measurements of that type. The python dictionary created by group_columns is stored in the column_groups.data attribute. column_groups is a dictionary that displays nicely and includes the groups as attributes for easy access, as shown below. If the column grouping returned is not correct, you can provide either your own function to group the columns or a yaml, json, or excel file mapping column group identifiers to the column headings.

[4]:

das.column_groups

[4]:

irr_poa_ref_cell:
    met1_poa_refcell
    met2_poa_refcell
irr_ghi_pyran:
    met1_ghi_pyranometer
    met2_ghi_pyranometer
irr_poa_pyran:
    met1_poa_pyranometer
    met2_poa_pyranometer
temp_amb_:
    met1_amb_temp
    met2_amb_temp
temp_mod_:
    met1_mod_temp1
    met1_mod_temp2
    met2_mod_temp1
    met2_mod_temp2
_mtr_:
    meter_power
wind__:
    met1_windspeed
    met2_windspeed
_inv_:
    inv1_power
    inv2_power
    inv3_power
    inv4_power
    inv5_power
    inv6_power
    inv7_power
    inv8_power
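To make the idea of grouping columns by measurement type concrete, here is a minimal sketch of keyword-based grouping. It is not the algorithm pvcaptest uses: the KEYWORDS table and the group_columns_sketch function are hypothetical, and the rules were chosen only so that the result resembles the groups shown above.

```python
# Hypothetical rules: (group id, substrings that must all appear in the header).
# Order matters: the first matching rule wins.
KEYWORDS = [
    ("irr_poa_ref_cell", ("poa", "refcell")),
    ("irr_poa_pyran", ("poa", "pyranometer")),
    ("irr_ghi_pyran", ("ghi", "pyranometer")),
    ("temp_amb_", ("amb", "temp")),
    ("temp_mod_", ("mod", "temp")),
    ("wind__", ("wind",)),
    ("_mtr_", ("meter",)),
    ("_inv_", ("inv",)),
]


def group_columns_sketch(columns):
    """Map each group id to the list of column headers matching its rule."""
    groups = {}
    for col in columns:
        name = col.lower()
        for group_id, required in KEYWORDS:
            if all(word in name for word in required):
                groups.setdefault(group_id, []).append(col)
                break  # stop at the first matching rule
    return groups


columns = [
    "met1_poa_refcell",
    "met1_poa_pyranometer",
    "met2_poa_pyranometer",
    "met1_amb_temp",
    "met1_mod_temp1",
    "met1_windspeed",
    "meter_power",
    "inv1_power",
]
groups = group_columns_sketch(columns)
for group_id, cols in groups.items():
    print(group_id, cols)
```

Rule order is a deliberate choice in this sketch: 'pyranometer' contains 'meter', so the pyranometer rules must be checked before the meter rule.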

When working in an environment with tab completion, like a Jupyter notebook, the group attributes are easy to access without needing to remember the group name.

[5]:

das.column_groups.irr_poa_pyran

[5]:

['met1_poa_pyranometer', 'met2_poa_pyranometer']

The CapData class has the methods loc and floc to select subsets of columns from the data and data_filtered DataFrames, respectively. These methods allow easy access to the groups of columns identified in column_groups using the column_groups keys, column names, regression_cols keys, or a combination of the three. The regression_cols attribute is introduced below. The column_groups dictionary also enables much of the functionality of CapData methods to perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions.

Using the loc method with the ‘irr_poa_ref_cell’ attribute key of column_groups to select data from the POA reference cell columns in the data DataFrame:

[6]:

das.loc[das.column_groups.irr_poa_ref_cell].iloc[100:103, :]

[6]:

	met1_poa_refcell	met2_poa_refcell
1990-10-09 08:20:00	534.845280	534.812723
1990-10-09 08:25:00	562.246349	553.728808
1990-10-09 08:30:00	539.686886	440.947099

Accessing the two irradiance columns of the ‘irr_poa_pyran’ group, the single column of the ‘_mtr_’ group, and the ‘met1_amb_temp’ column of the data_filtered DataFrame:

[7]:

das.floc[['irr_poa_pyran', '_mtr_', 'met1_amb_temp']]

[7]:

	met1_poa_pyranometer	met2_poa_pyranometer	meter_power	met1_amb_temp
1990-10-09 00:00:00	0.0	0.0	-8868.0	17.750666
1990-10-09 00:05:00	0.0	0.0	-8868.0	17.737545
1990-10-09 00:10:00	0.0	0.0	-8868.0	17.648090
1990-10-09 00:15:00	0.0	0.0	-8868.0	17.641772
1990-10-09 00:20:00	0.0	0.0	-8868.0	17.616870
...	...	...	...	...
1990-10-13 23:35:00	0.0	0.0	-8868.0	20.037996
1990-10-13 23:40:00	0.0	0.0	-8868.0	19.893458
1990-10-13 23:45:00	0.0	0.0	-8868.0	19.750967
1990-10-13 23:50:00	0.0	0.0	-8868.0	19.585205
1990-10-13 23:55:00	0.0	0.0	-8868.0	19.459186

1440 rows × 4 columns
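The way a mixed list of group keys, regression keys, and column names can be turned into a flat list of columns is sketched below. This is an illustration of the lookup order described above, not the pvcaptest implementation; resolve_labels and the small dictionaries are hypothetical stand-ins.

```python
# Hypothetical stand-ins for the CapData attributes used in this example.
column_groups = {
    "irr_poa_pyran": ["met1_poa_pyranometer", "met2_poa_pyranometer"],
    "temp_amb_": ["met1_amb_temp", "met2_amb_temp"],
    "_mtr_": ["meter_power"],
}
regression_cols = {"power": "_mtr_"}
data_columns = [col for cols in column_groups.values() for col in cols]


def resolve_labels(labels, column_groups, regression_cols, data_columns):
    """Expand group keys, regression keys, and column names into column names."""
    if isinstance(labels, str):
        labels = [labels]
    resolved = []
    for label in labels:
        # A regression key is first translated to whatever it points at.
        target = regression_cols.get(label, label)
        if target in column_groups:
            resolved.extend(column_groups[target])
        elif target in data_columns:
            resolved.append(target)
        else:
            raise KeyError(f"{label!r} is not a group, regression key, or column")
    return resolved


selected = resolve_labels(
    ["irr_poa_pyran", "_mtr_", "met1_amb_temp"],
    column_groups,
    regression_cols,
    data_columns,
)
print(selected)
```

With the resolved list in hand, the selection itself is an ordinary pandas column lookup such as df[selected].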

For datasets that have multiple measurements of the same value, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data. The agg_sensors method provides a convenient way to do this for all the groups of measurements in column_groups in one step.

The desired aggregations are specified by passing a dictionary to the agg_map argument where the keys are groups from column_groups and the values are aggregation functions. Here we are using string functions that are recognized by pandas. Most of the common aggregation functions (mean, median, max, sum, min, etc.) are available as string functions. If you would like to apply a different aggregation function, please refer to the pandas documentation for DataFrame.agg. By default, the agg_sensors method adds a new column to the dataframe in the data attribute for the results of each aggregation and copies over the data_filtered attribute with the new dataframe.

[8]:

das.agg_sensors(agg_map={'_inv_':'sum', 'irr_poa_pyran':'mean', 'temp_amb_':'mean', 'wind__':'mean'}, verbose=True)

Aggregating the below 8 columns of the _inv_ group using the sum function. New column name: _inv__sum_agg:
    inv1_power
    inv2_power
    inv3_power
    inv4_power
    inv5_power
    inv6_power
    inv7_power
    inv8_power


Aggregating the below 2 columns of the irr_poa_pyran group using the mean function. New column name: irr_poa_pyran_mean_agg:
    met1_poa_pyranometer
    met2_poa_pyranometer


Aggregating the below 2 columns of the temp_amb_ group using the mean function. New column name: temp_amb__mean_agg:
    met1_amb_temp
    met2_amb_temp


Aggregating the below 2 columns of the wind__ group using the mean function. New column name: wind___mean_agg:
    met1_windspeed
    met2_windspeed
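In plain pandas terms, each aggregation is a row-wise reduction over the columns of one group. The sketch below reproduces that idea on a toy DataFrame. The data values are made up, and the new column names follow the pattern visible in the messages above (group, function, then "agg"); this illustrates the concept and is not the agg_sensors source code.

```python
import pandas as pd

# Toy data standing in for a few of the measured columns.
df = pd.DataFrame(
    {
        "met1_poa_pyranometer": [500.0, 600.0, 700.0],
        "met2_poa_pyranometer": [520.0, 580.0, 710.0],
        "inv1_power": [100.0, 120.0, 140.0],
        "inv2_power": [110.0, 125.0, 135.0],
    }
)
groups = {
    "irr_poa_pyran": ["met1_poa_pyranometer", "met2_poa_pyranometer"],
    "_inv_": ["inv1_power", "inv2_power"],
}
agg_map = {"irr_poa_pyran": "mean", "_inv_": "sum"}

for group, func in agg_map.items():
    # axis=1 applies the string function across the columns of each row.
    df[f"{group}_{func}_agg"] = df[groups[group]].agg(func, axis=1)

print(df[["irr_poa_pyran_mean_agg", "_inv__sum_agg"]])
```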

Unless using a pre-defined test setup, pvcaptest does not attempt to determine which columns of data or groups of columns are the data to be used in the regressions. The link between regression variables and the imported data is made by a dictionary stored in the regression_cols attribute. This dictionary is also used to define any aggregations necessary to join multiple columns into a single column. Prior to v0.15.0, the aggregation step was performed with the agg_sensors method. From v0.15.0 on, the aggregation step is performed by the process_regression_columns method.

[9]:

das.regression_cols = {
    'power': '_mtr_',
    'poa': ('irr_poa_pyran', 'mean'),
    't_amb': ('temp_amb_', 'mean'),
    'w_vel': ('wind__', 'mean')
}

[10]:

das.process_regression_columns()

Reusing existing column 'irr_poa_pyran_mean_agg'; skipping aggregation of the irr_poa_pyran group.

Reusing existing column 'temp_amb__mean_agg'; skipping aggregation of the temp_amb_ group.

Reusing existing column 'wind___mean_agg'; skipping aggregation of the wind__ group.

The process_regression_columns method updates the regression columns dictionary to point to the aggregated columns.

[11]:

das.regression_cols

[11]:

{'power': 'meter_power',
 'poa': 'irr_poa_pyran_mean_agg',
 't_amb': 'temp_amb__mean_agg',
 'w_vel': 'wind___mean_agg'}
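The mapping from the dictionary entered in cell [9] to the one shown in cell [11] can be pictured with the following sketch. It is an illustration only: resolve_regression_cols is a hypothetical helper, and the naming pattern for aggregated columns is inferred from the output above rather than taken from the pvcaptest source.

```python
column_groups = {
    "_mtr_": ["meter_power"],
    "irr_poa_pyran": ["met1_poa_pyranometer", "met2_poa_pyranometer"],
    "temp_amb_": ["met1_amb_temp", "met2_amb_temp"],
    "wind__": ["met1_windspeed", "met2_windspeed"],
}
regression_cols = {
    "power": "_mtr_",
    "poa": ("irr_poa_pyran", "mean"),
    "t_amb": ("temp_amb_", "mean"),
    "w_vel": ("wind__", "mean"),
}


def resolve_regression_cols(regression_cols, column_groups):
    """Return a dict mapping each regression variable to a single column name."""
    resolved = {}
    for var, spec in regression_cols.items():
        if isinstance(spec, tuple):
            # (group, function) means: aggregate the group into one column.
            group, func = spec
            resolved[var] = f"{group}_{func}_agg"
        elif spec in column_groups and len(column_groups[spec]) == 1:
            # A group with a single column needs no aggregation.
            resolved[var] = column_groups[spec][0]
        else:
            # Otherwise assume the entry already names a column.
            resolved[var] = spec
    return resolved


resolved = resolve_regression_cols(regression_cols, column_groups)
print(resolved)
```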

Once the regression columns are set, the loc or floc methods will return the data for each type of sensor identified in the column_groups attribute. Because we’ve run process_regression_columns, accessing the poa data with loc now returns the aggregated result.

[12]:

das.loc['poa'].iloc[100:103, :]

[12]:

	irr_poa_pyran_mean_agg
1990-10-09 08:20:00	538.959351
1990-10-09 08:25:00	559.041911
1990-10-09 08:30:00	519.485970

The plot method creates a dashboard with a group of time series plots that are useful for visually inspecting the imported data.

plot uses the structure of the column_groups attribute to create a layout of plots. A single plot is generated for each measurement type, and each column with measurements of that type is plotted as a separate line on the plot. In this example there are two different weather stations, which each have pyranometers measuring plane of array and global horizontal irradiance. This arrangement of sensors results in two plots which each have two lines.

Note, the full functionality of the dashboard requires a live notebook. Try installing pvcaptest to run the notebook locally, or use the launch binder button at the top of the page.

[13]:

combine = {'inv_sum_mtr_pwr': ['mtr', 'inv.*agg'], 'irr_all':['irr_poa', 'irr_ghi'], 'temp_all':['temp_amb', 'temp_mod']}
default_groups = ['inv_sum_mtr_pwr', 'irr_all', 'temp_all']
das.plot(combine=combine, default_groups=default_groups, width=900)

[13]:

Filtering Measured Data

The CapData class provides a number of convenience methods to apply filtering steps as defined in ASTM E2848. The following section demonstrates the more commonly used filtering steps for removing measured data points.

[14]:

# Overwrite the filtered dataset with a copy of the unfiltered data.
das.reset_filter()

A common first step is to review the scatter plot of the POA irradiance against the power production.

If you have the optional dependency Holoviews installed, scatter_hv will return an interactive scatter plot. Additionally, scatter_hv includes an option to return a timeseries plot of power that is linked to the scatter plot, so points selected in the scatter plot will be highlighted in the time series.

[15]:

# Use scatter_hv with the linked time series plot
das.scatter_hv(timeseries=True)

[15]:

The filter_custom method provides a way to use your own filtering method within captest and update the summary data. The filter_custom method allows passing any function or method that takes a DataFrame as the first argument and returns a filtered dataframe with rows removed. Passed methods can be user-defined functions or Pandas DataFrame methods.

Below, we use the filter_custom method with the pandas DataFrame dropna method to remove missing data and update the summary data.

[16]:

das.filter_custom(pd.DataFrame.dropna)
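A user-defined function works the same way as long as it takes a DataFrame as its first argument and returns a DataFrame with rows removed. The sketch below shows one such function applied directly to a toy DataFrame; drop_negative_power and the sample values are hypothetical and exist only for illustration.

```python
import pandas as pd


def drop_negative_power(df, power_col="meter_power"):
    """Return only the rows where the power column is zero or positive."""
    return df[df[power_col] >= 0]


# Toy data: the first row mimics the negative night-time meter readings.
df = pd.DataFrame(
    {
        "meter_power": [-8868.0, 1500.0, 2500.0],
        "met1_amb_temp": [17.7, 20.1, 22.4],
    }
)
filtered = drop_negative_power(df)
print(filtered)
```

Based on the description above, a function like this could be passed as das.filter_custom(drop_negative_power), so that the step also appears in the filtering summary.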

The get_summary method returns a DataFrame summarizing the filtering steps that have been applied, the arguments passed to them, the number of points remaining after each step, and the number of points removed by each step.

[17]:

das.get_summary()

[17]:

		pts_after_filter	pts_removed	filter_arguments
meas	filter_custom	1424	16	DataFrame.dropna, ,

The filter_irr method provides a convenient way to remove data based on the irradiance measurements. Here we use it to remove periods of low irradiance (below 200 W/m2). Values greater than 2000 W/m2 will also be removed, if present.

[18]:

das.filter_irr(200, 2000)

[19]:

das.get_summary()

[19]:

		pts_after_filter	pts_removed	filter_arguments
meas	filter_custom	1424	16	DataFrame.dropna, ,
meas	filter_irr	552	872	200, 2000,

We can re-run the scatter_hv method to see the results of the filtering steps.

[20]:

das.scatter_hv()

[20]:

The filter_outliers method uses scikit-learn’s elliptic envelope to remove outlier points. A future release will include a way to interactively select points to be removed.

[21]:

das.filter_outliers()

[22]:

das.scatter_hv()

[22]:

The fit_regression method performs a regression on the data stored in data_filtered using the regression equation specified by the standard. The regression equation is stored in the regression_formula attribute as shown below. Regressions are performed using the statsmodels package.

Below, we set the filter argument of the fit_regression method to True to remove time periods when the residual exceeds two standard deviations of the mean residual.

[23]:

das.regression_formula

[23]:

'power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'

[24]:

das.fit_regression(filter=True, summary=False)

NOTE: Regression used to filter outlying points.
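The residual-based filter can be pictured with a small plain-Python sketch. It assumes one plausible reading of the rule: a point is kept when the absolute value of its residual is below two standard deviations of the residuals. The function name, the use of the population standard deviation, and the sample numbers are all assumptions made for illustration, and pvcaptest may define the threshold differently.

```python
from statistics import pstdev


def residual_filter(actual, predicted, n_std=2.0):
    """Return a list of booleans marking the points to keep."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    # Assumed threshold: n_std times the spread of the residuals.
    limit = n_std * pstdev(residuals)
    return [abs(r) < limit for r in residuals]


# Toy data: the last point sits far from the fitted value.
actual = [100.0, 102.0, 98.0, 101.0, 99.0, 130.0]
predicted = [100.0] * 6
keep = residual_filter(actual, predicted)
print(keep)
```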

[25]:

das.get_summary()

[25]:

		pts_after_filter	pts_removed	filter_arguments
meas	filter_custom	1424	16	DataFrame.dropna, ,
	filter_irr	552	872	200, 2000,
	filter_outliers	529	23	Default arguments
	fit_regression	493	36	filter: True, summary: False

Calculation of Reporting Conditions

The rep_cond method provides a variety of ways to calculate reporting conditions. Using rep_cond, the reporting conditions are always calculated from the data stored in the data_filtered attribute. Refer to the example notebook “Reporting Conditions Examples” for a thorough explanation of the rep_cond functionality. By default the reporting conditions are calculated following the guidance of ASTM E2939-13.

[26]:

das.rep_cond()

Reporting conditions saved to rc attribute.
          poa      t_amb     w_vel
0  631.334142  24.753909  2.132683

Previously we used the irradiance filter to filter out data below 200 W/m2. The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value. This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance. Passing the string ‘rep_irr’ to the keyword argument (kwarg) ref_val uses the reporting POA irradiance stored in the rc attribute.

[27]:

das.filter_irr(0.5, 1.5, ref_val='rep_irr')
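The two ways of specifying the irradiance limits differ only in how the bounds are computed, which the sketch below works through. The helper names are hypothetical, and treating the bounds as inclusive is an assumption; the reference value is the reporting irradiance printed by rep_cond above.

```python
def irradiance_bounds(low, high, ref_val=None):
    """Return absolute bounds; with ref_val, low and high are fractions of it."""
    if ref_val is None:
        return low, high
    return low * ref_val, high * ref_val


def filter_irradiance(values, low, high, ref_val=None):
    """Keep the values inside the bounds (inclusive bounds are an assumption)."""
    lower, upper = irradiance_bounds(low, high, ref_val)
    return [v for v in values if lower <= v <= upper]


rep_irr = 631.334142  # reporting POA irradiance from the rep_cond output
poa = [150.0, 320.0, 631.0, 900.0, 1000.0]  # made-up sample values in W/m2

absolute = filter_irradiance(poa, 200, 2000)
relative = filter_irradiance(poa, 0.5, 1.5, ref_val=rep_irr)
lower, upper = irradiance_bounds(0.5, 1.5, ref_val=rep_irr)
print(round(lower, 3), round(upper, 3))
```

For this reporting irradiance the band works out to roughly 315.7 to 947.0 W/m2, so the 1000 W/m2 sample passes the absolute filter but not the relative one.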

[28]:

das.scatter_hv()

[28]:

The fit_regression method is used again with the default arguments, which result in fitting the regression, printing and storing the results, but not filtering. The result of the regression is a statsmodels RegressionResultsWrapper object containing the regression coefficients and other information generated when performing the regression. This object is stored in the CapData regression_results attribute.

[29]:

das.fit_regression()

                                 OLS Regression Results
=======================================================================================
Dep. Variable:                  power   R-squared (uncentered):                   0.999
Model:                            OLS   Adj. R-squared (uncentered):              0.999
Method:                 Least Squares   F-statistic:                          9.872e+04
Date:                Wed, 24 Jun 2026   Prob (F-statistic):                        0.00
Time:                        16:38:11   Log-Likelihood:                         -5170.5
No. Observations:                 392   AIC:                                  1.035e+04
Df Residuals:                     388   BIC:                                  1.036e+04
Df Model:                           4
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
poa             8043.4648    102.486     78.484      0.000    7841.968    8244.962
I(poa * poa)      -0.2083      0.059     -3.514      0.000      -0.325      -0.092
I(poa * t_amb)   -71.8194      4.267    -16.831      0.000     -80.209     -63.430
I(poa * w_vel)    12.7656      8.998      1.419      0.157      -4.926      30.457
==============================================================================
Omnibus:                       46.720   Durbin-Watson:                   0.960
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               63.486
Skew:                          -0.837   Prob(JB):                     1.64e-14
Kurtosis:                       4.040   Cond. No.                     8.24e+03
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 8.24e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The regression coefficients and p-values for each term are attributes available in the regression_results.

[30]:

das.regression_results.params

[30]:

poa               8043.464802
I(poa * poa)        -0.208321
I(poa * t_amb)     -71.819421
I(poa * w_vel)      12.765637
dtype: float64

[31]:

das.regression_results.pvalues

[31]:

poa               3.409848e-240
I(poa * poa)       4.930125e-04
I(poa * t_amb)     4.061184e-48
I(poa * w_vel)     1.568012e-01
dtype: float64
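To see how these coefficients are used, the regression equation can be evaluated by hand at the reporting conditions calculated earlier. The sketch below copies the printed coefficients, p-values, and reporting conditions into plain dictionaries. The helper functions are hypothetical. Dropping terms whose p-value exceeds 0.05 is an assumption, chosen because it reproduces the "after pval check" figures printed in the Results section.

```python
params = {
    "poa": 8043.464802,
    "I(poa * poa)": -0.208321,
    "I(poa * t_amb)": -71.819421,
    "I(poa * w_vel)": 12.765637,
}
pvalues = {
    "poa": 3.409848e-240,
    "I(poa * poa)": 4.930125e-04,
    "I(poa * t_amb)": 4.061184e-48,
    "I(poa * w_vel)": 1.568012e-01,
}
rc = {"poa": 631.334142, "t_amb": 24.753909, "w_vel": 2.132683}


def predict_power(params, poa, t_amb, w_vel):
    """Evaluate power ~ poa + poa*poa + poa*t_amb + poa*w_vel (no intercept)."""
    return (
        params["poa"] * poa
        + params["I(poa * poa)"] * poa * poa
        + params["I(poa * t_amb)"] * poa * t_amb
        + params["I(poa * w_vel)"] * poa * w_vel
    )


def zero_insignificant(params, pvalues, alpha=0.05):
    """Set coefficients with a p-value above alpha to zero (assumed rule)."""
    return {k: (v if pvalues[k] <= alpha else 0.0) for k, v in params.items()}


power_all = predict_power(params, **rc)
power_sig = predict_power(zero_insignificant(params, pvalues), **rc)
print(f"{power_all:.0f} {power_sig:.0f}")
```

With these inputs the first value comes out near 3.89e6 and the second near 3.87e6, which agrees to within rounding with the "Actual test output" values reported in the Results section.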

Load and Filter PVsyst Data

To load and filter the modeled data, often from PVsyst, we use the load_pvsyst function, which returns a CapData object with the PVsyst data loaded.

[32]:

sim = ct.load_pvsyst('./data/pvsyst_example_HourlyRes_2.CSV')

[33]:

sim.column_groups

[33]:

pvsyt_losses__:
    IL Pmax
    IL Pmin
    IL Vmax
    IL Vmin
irr_ghi_:
    GlobHor
real_pwr__:
    E_Grid
index__:
    index
temp_amb_:
    T_Amb
temp_mod_:
    TArray
wind__:
    WindVel
_inv_:
    EOutInv
irr_poa_:
    GlobInc
shade__:
    FShdBm

[34]:

sim.set_regression_cols(power='real_pwr__', poa='irr_poa_', t_amb='temp_amb_', w_vel='wind__')

[35]:

# sim.plot()

[36]:

# Overwrite the data_filtered DataFrame with a copy of the original unfiltered data
sim.reset_filter()

As a first step, we use the filter_time method to select a 60-day period of data centered on the dates of the measured data.

[37]:

sim.filter_time(test_date='10/11/1990', days=60)
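The window selected by a centered time filter can be worked out with the standard library. This sketch assumes the date string is read as month/day/year and that both endpoints are included; centered_window is a hypothetical helper, not part of pvcaptest. Both assumptions are consistent with the 1441 hourly points reported for filter_time in the summary further down.

```python
from datetime import datetime, timedelta


def centered_window(test_date, days):
    """Return the start and end of a window of `days` centered on test_date."""
    center = datetime.strptime(test_date, "%m/%d/%Y")
    half = timedelta(days=days / 2)
    return center - half, center + half


start, end = centered_window("10/11/1990", 60)
# Count hourly timestamps from start to end, including both endpoints.
hourly_points = int((end - start) / timedelta(hours=1)) + 1
print(start.date(), end.date(), hourly_points)
```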

[38]:

sim.scatter_hv()

[38]:

[39]:

sim.filter_irr(200, 930)

[40]:

sim.scatter_hv()

[40]:

[41]:

sim.get_summary()

[41]:

		pts_after_filter	pts_removed	filter_arguments
pvsyst	filter_time	1441	7319	test_date: 10/11/1990, days: 60
pvsyst	filter_irr	397	1044	200, 930,

The filter_pvsyst method removes data for times when shade is present or the IL Pmin, IL Vmin, IL Pmax, IL Vmax output values are greater than 0.

[42]:

sim.filter_pvsyst()

[43]:

sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])

[44]:

sim.fit_regression()

                                 OLS Regression Results
=======================================================================================
Dep. Variable:                  power   R-squared (uncentered):                   1.000
Model:                            OLS   Adj. R-squared (uncentered):              1.000
Method:                 Least Squares   F-statistic:                          2.114e+06
Date:                Wed, 24 Jun 2026   Prob (F-statistic):                        0.00
Time:                        16:38:12   Log-Likelihood:                         -3683.6
No. Observations:                 319   AIC:                                      7375.
Df Residuals:                     315   BIC:                                      7390.
Df Model:                           4
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
poa             7620.3546     16.585    459.486      0.000    7587.724    7652.985
I(poa * poa)      -0.7783      0.013    -59.178      0.000      -0.804      -0.752
I(poa * t_amb)   -31.3177      0.535    -58.488      0.000     -32.371     -30.264
I(poa * w_vel)    -1.4710      1.252     -1.175      0.241      -3.935       0.993
==============================================================================
Omnibus:                       22.643   Durbin-Watson:                   2.100
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                9.032
Skew:                          -0.131   Prob(JB):                       0.0109
Kurtosis:                       2.218   Cond. No.                     5.85e+03
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 5.85e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Results

v0.15.0 API Changes - The capdata.get_summary function has been removed and replaced by the captest.CapTest.get_summary method.

The get_summary and captest_results_check_pvalues functions display the results of filtering on simulated and measured data and the final capacity test results comparing measured capacity to expected capacity, respectively.

For this example, which does not use a CapTest instance, you can instantiate one to call these results methods.

[45]:

# pvc.get_summary(das, sim) # As of v0.15.0 use equivalent method on CapTest class or pd.concat

# concat option
# pd.concat([
#     das.get_summary(),
#     sim.get_summary(),
# ])

[46]:

ts = ct.CapTest(meas=das, sim=sim, ac_nameplate=6_000, test_tolerance='+/- 7')

[47]:

ts.get_summary()

[47]:

		pts_after_filter	pts_removed	filter_arguments
meas	filter_custom	1424	16	DataFrame.dropna, ,
	filter_irr	552	872	200, 2000,
	filter_outliers	529	23	Default arguments
	fit_regression	493	36	filter: True, summary: False
	rep_cond	493	0	Default arguments
	filter_irr-1	392	101	0.5, 1.5, ref_val: 631.334
	fit_regression-1	392	0	Default arguments
pvsyst	filter_time	1441	7319	test_date: 10/11/1990, days: 60
	filter_irr	397	1044	200, 930,
	filter_pvsyst	397	0	Default arguments
	filter_irr-1	319	78	0.5, 1.5, ref_val: np.float64(631.334)
	fit_regression	319	0	Default arguments

[48]:

ts.captest_results_check_pvalues(print_res=True)

Using reporting conditions from meas.

Capacity Test Result:         PASS
Modeled test output:          4009358.528
Actual test output:           3889875.683
Tested output ratio:          0.970
Tested Capacity:              5821.194
Bounds:                       5580.0, 6420.0


Using reporting conditions from meas.

Capacity Test Result:         PASS
Modeled test output:          4011339.189
Actual test output:           3872687.576
Tested output ratio:          0.965
Tested Capacity:              5792.610
Bounds:                       5580.0, 6420.0


97.020% - Cap Ratio
96.540% - Cap Ratio after pval check

[48]:

	das_pvals	sim_pvals	das_params	sim_params
poa	0.00000	0.00000	8,043.46480	7,620.35460
I(poa * poa)	0.00049	0.00000	-0.20832	-0.77830
I(poa * t_amb)	0.00000	0.00000	-71.81942	-31.31767
I(poa * w_vel)	0.15680	0.24103	12.76564	-1.47104
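The headline numbers in the printed results can be reproduced with simple arithmetic, as sketched below using the values copied from the output above. The helper names are hypothetical. Reading the test tolerance as a symmetric percentage of nameplate, and the tested capacity as nameplate times the output ratio, are interpretations that match the printed values rather than statements about the pvcaptest source.

```python
def capacity_ratio(actual, modeled):
    """Ratio of measured to modeled output at the reporting conditions."""
    return actual / modeled


def tolerance_bounds(nameplate, tolerance_pct):
    """Symmetric +/- percentage band around the nameplate capacity."""
    delta = nameplate * tolerance_pct / 100
    return nameplate - delta, nameplate + delta


nameplate = 6_000

# Values copied from the printed results, before and after the p-value check.
ratio = capacity_ratio(3889875.683, 4009358.528)
ratio_pval = capacity_ratio(3872687.576, 4011339.189)

tested_capacity = nameplate * ratio
low, high = tolerance_bounds(nameplate, 7)
passed = low <= tested_capacity <= high

print(f"{ratio:.3f} {ratio_pval:.3f} {tested_capacity:.1f} {low} {high} {passed}")
```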

The cell below overlays the scatter plots from the measured and PVsyst data. This plot can also be generated using CapTest.overlay_scatters when running a test using an instance of CapTest.

[49]:

(
    das.scatter_hv().relabel('Measured') *
    sim.scatter_hv().relabel('PVsyst')
).opts(
    opts.Scatter(alpha=0.3, width=600)
)

[49]: