Example Capacity Test using pvcaptest
This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the pvcaptest package.
Imports
[1]:
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
# import captest as pvc
import captest as ct
from captest import capdata as pvc
from bokeh.io import output_notebook, show
# the holoviews imports and extension below are needed to use scatter_hv in the notebook
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
# If working offline, the CapData.plot() method may fail.
# Run 'export BOKEH_RESOURCES=inline' at the command line before
# starting the Jupyter notebook.
output_notebook()
Load and Plot Measured Data
We begin by using the load_data function, which reads the file(s) specified by the path argument and returns an instance of the CapData class. In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.
When given the path to a file, as shown here, load_data will try to read that file. If you pass a path to a directory, load_data will look for and attempt to load all files ending with ‘.csv’ in the specified directory. Other file types can also be loaded by passing your own function to the file_reader argument and including the extension (e.g. ‘xlsx’) as a kwarg.
[2]:
das = ct.load_data('./data/example_measured_data.csv')
before calling get common timestep
5min
The load_data function loads the data into a pandas DataFrame, which it assigns to the data attribute of the CapData object. Here we use the pandas DataFrame head method to return the first three rows.
[3]:
das.data.head(3)
[3]:
| | met1_poa_refcell | met2_poa_refcell | met1_poa_pyranometer | met2_poa_pyranometer | met1_ghi_pyranometer | met2_ghi_pyranometer | met1_amb_temp | met2_amb_temp | met1_mod_temp1 | met1_mod_temp2 | ... | met2_windspeed | meter_power | inv1_power | inv2_power | inv3_power | inv4_power | inv5_power | inv6_power | inv7_power | inv8_power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990-10-09 00:00:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.750666 | 17.770821 | 15.640355 | 15.663692 | ... | -0.007221 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
| 1990-10-09 00:05:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.737545 | 17.753030 | 15.551920 | 15.676843 | ... | -0.007195 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
| 1990-10-09 00:10:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.648090 | 17.689437 | 15.541516 | 15.414247 | ... | 0.002557 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
3 rows × 23 columns
In addition to loading data, by default the load_data function attempts to parse the column headers and group the columns based on the type of measurement recorded in each column. For each inferred measurement type, group_columns creates an abbreviated name and a list of columns that contain measurements of that type. The Python dictionary created by group_columns is stored in the column_groups.data attribute. column_groups is a dictionary that displays nicely and includes the groups as attributes for easy access, as shown below. If the column grouping returned is not correct, you can provide either your own function to group the columns or a YAML, JSON, or Excel file mapping column group identifiers to the column headings.
[4]:
das.column_groups
[4]:
irr_ghi_pyran:
met1_ghi_pyranometer
met2_ghi_pyranometer
wind__:
met1_windspeed
met2_windspeed
_inv_:
inv1_power
inv2_power
inv3_power
inv4_power
inv5_power
inv6_power
inv7_power
inv8_power
temp_mod_:
met1_mod_temp1
met1_mod_temp2
met2_mod_temp1
met2_mod_temp2
irr_poa_ref_cell:
met1_poa_refcell
met2_poa_refcell
temp_amb_:
met1_amb_temp
met2_amb_temp
_mtr_:
meter_power
irr_poa_pyran:
met1_poa_pyranometer
met2_poa_pyranometer
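As a rough illustration of the idea behind the grouping shown above, the sketch below assigns column headers to groups by keyword matching. The function name `group_columns_sketch` and the keyword table are hypothetical and were chosen for this example; pvcaptest's own grouping rules may differ.

```python
from collections import defaultdict

# Hypothetical keyword table: group name -> substrings that must all appear
# in the lower-cased column header. These are NOT pvcaptest's actual rules.
KEYWORDS = {
    'irr_poa_pyran': ('poa', 'pyran'),
    'irr_poa_ref_cell': ('poa', 'refcell'),
    'irr_ghi_pyran': ('ghi', 'pyran'),
    'temp_amb_': ('amb', 'temp'),
    'temp_mod_': ('mod', 'temp'),
    'wind__': ('wind',),
    '_inv_': ('inv',),
    '_mtr_': ('meter',),
}


def group_columns_sketch(columns):
    """Map each group name to the list of column headers it matches."""
    groups = defaultdict(list)
    for col in columns:
        name = col.lower()
        for group, words in KEYWORDS.items():
            if all(word in name for word in words):
                groups[group].append(col)
                break  # assign each column to the first matching group only
    return dict(groups)


columns = ['met1_poa_refcell', 'met1_poa_pyranometer', 'met1_ghi_pyranometer',
           'met1_amb_temp', 'met1_mod_temp1', 'met2_windspeed',
           'meter_power', 'inv1_power']
groups = group_columns_sketch(columns)
print(groups['irr_poa_pyran'])
```

A keyword table like this is easy to read but brittle, which is one reason to supply your own grouping function or mapping file when the inferred groups are wrong.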
When working in an environment with tab completion, like a Jupyter notebook, the group attributes are easy to access without needing to remember the group name.
[5]:
das.column_groups.irr_poa_pyran
[5]:
['met1_poa_pyranometer', 'met2_poa_pyranometer']
The CapData class has the loc and floc methods to select subsets of columns from the data and data_filtered DataFrames, respectively. These methods allow easy access to the groups of columns identified in column_groups using the column_groups keys, column names, regression_cols keys, or a combination of the three. The regression_cols attribute is introduced below. The column_groups dictionary also enables much of the functionality of the CapData methods that perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions.
Using the loc method with the ‘irr_poa_ref_cell’ attribute key of column_groups to select data from the POA reference cell columns in the data DataFrame:
[6]:
das.loc[das.column_groups.irr_poa_ref_cell].iloc[100:103, :]
[6]:
| | met1_poa_refcell | met2_poa_refcell |
|---|---|---|
| 1990-10-09 08:20:00 | 534.845280 | 534.812723 |
| 1990-10-09 08:25:00 | 562.246349 | 553.728808 |
| 1990-10-09 08:30:00 | 539.686886 | 440.947099 |
Accessing the two irradiance columns of the ‘irr_poa_pyran’ group, the single column of the ‘_mtr_’ group, and the ‘met1_amb_temp’ column of the data_filtered DataFrame:
[7]:
das.floc[['irr_poa_pyran', '_mtr_', 'met1_amb_temp']]
[7]:
| | met1_poa_pyranometer | met2_poa_pyranometer | meter_power | met1_amb_temp |
|---|---|---|---|---|
| 1990-10-09 00:00:00 | 0.0 | 0.0 | -8868.0 | 17.750666 |
| 1990-10-09 00:05:00 | 0.0 | 0.0 | -8868.0 | 17.737545 |
| 1990-10-09 00:10:00 | 0.0 | 0.0 | -8868.0 | 17.648090 |
| 1990-10-09 00:15:00 | 0.0 | 0.0 | -8868.0 | 17.641772 |
| 1990-10-09 00:20:00 | 0.0 | 0.0 | -8868.0 | 17.616870 |
| ... | ... | ... | ... | ... |
| 1990-10-13 23:35:00 | 0.0 | 0.0 | -8868.0 | 20.037996 |
| 1990-10-13 23:40:00 | 0.0 | 0.0 | -8868.0 | 19.893458 |
| 1990-10-13 23:45:00 | 0.0 | 0.0 | -8868.0 | 19.750967 |
| 1990-10-13 23:50:00 | 0.0 | 0.0 | -8868.0 | 19.585205 |
| 1990-10-13 23:55:00 | 0.0 | 0.0 | -8868.0 | 19.459186 |
1440 rows × 4 columns
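The selection above mixes group keys and a plain column name. A minimal sketch of how such a mixed list could be expanded into column names is shown below. The helper `resolve_columns` is hypothetical and only illustrates the lookup order assumed here (regression key, then group key, then column name); it is not the library's implementation.

```python
def resolve_columns(keys, column_groups, regression_cols, columns):
    """Expand a mix of group keys, regression keys, and column names."""
    if isinstance(keys, str):
        keys = [keys]
    resolved = []
    for key in keys:
        # A regression key such as 'power' is first translated to the
        # group key or column name it points at.
        key = regression_cols.get(key, key)
        if key in column_groups:
            resolved.extend(column_groups[key])  # every column in the group
        elif key in columns:
            resolved.append(key)                 # a plain column name
        else:
            raise KeyError(f'{key!r} is not a group, regression key, or column')
    return resolved


column_groups = {
    'irr_poa_pyran': ['met1_poa_pyranometer', 'met2_poa_pyranometer'],
    '_mtr_': ['meter_power'],
    'temp_amb_': ['met1_amb_temp', 'met2_amb_temp'],
}
regression_cols = {'power': '_mtr_'}
columns = [col for group in column_groups.values() for col in group]

selected = resolve_columns(['irr_poa_pyran', 'power', 'met1_amb_temp'],
                           column_groups, regression_cols, columns)
print(selected)
```

The resulting list could then be passed to an ordinary pandas column selection such as `df[selected]`.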
For datasets that have multiple measurements of the same value, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data. The agg_sensors method provides a convenient way to do this for all the groups of measurements in column_groups in one step.
The desired aggregations are specified by passing a dictionary to the agg_map argument where the keys are groups from column_groups and the values are aggregation functions. Here we are using string functions that are recognized by pandas. Most of the common aggregation functions (mean, median, max, sum, min, etc.) are available as string functions. If you would like to apply a different aggregation function, please refer to the pandas documentation for DataFrame.agg. By default, the agg_sensors method adds a new column to the dataframe in the data attribute for the results of each aggregation and overwrites the data_filtered attribute with a copy of the new dataframe.
[8]:
das.agg_sensors(agg_map={'_inv_':'sum', 'irr_poa_pyran':'mean', 'temp_amb_':'mean', 'wind__':'mean'}, verbose=True)
Aggregating the below 8 columns of the _inv_ group using the sum function. New column name: _inv__sum_agg:
inv1_power
inv2_power
inv3_power
inv4_power
inv5_power
inv6_power
inv7_power
inv8_power
Aggregating the below 2 columns of the irr_poa_pyran group using the mean function. New column name: irr_poa_pyran_mean_agg:
met1_poa_pyranometer
met2_poa_pyranometer
Aggregating the below 2 columns of the temp_amb_ group using the mean function. New column name: temp_amb__mean_agg:
met1_amb_temp
met2_amb_temp
Aggregating the below 2 columns of the wind__ group using the mean function. New column name: wind___mean_agg:
met1_windspeed
met2_windspeed
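The aggregation reported above can be approximated with plain pandas. The sketch below is an illustration only: the toy values are made up, and the `<group>_<function>_agg` naming simply follows the column names printed in the output above.

```python
import pandas as pd

# Toy data standing in for two POA pyranometers and two inverters.
df = pd.DataFrame({
    'met1_poa_pyranometer': [500.0, 600.0],
    'met2_poa_pyranometer': [520.0, 580.0],
    'inv1_power': [100.0, 120.0],
    'inv2_power': [110.0, 130.0],
})
column_groups = {
    'irr_poa_pyran': ['met1_poa_pyranometer', 'met2_poa_pyranometer'],
    '_inv_': ['inv1_power', 'inv2_power'],
}
agg_map = {'_inv_': 'sum', 'irr_poa_pyran': 'mean'}

for group, func in agg_map.items():
    new_name = f'{group}_{func}_agg'
    # axis=1 aggregates across the columns of the group for each timestamp
    df[new_name] = df[column_groups[group]].agg(func, axis=1)

print(df[['_inv__sum_agg', 'irr_poa_pyran_mean_agg']])
```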
Unless using a pre-defined test setup, pvcaptest does not attempt to determine which columns or groups of columns hold the data to be used in the regressions. The link between regression variables and the imported data is made by a dictionary stored in the regression_cols attribute. This dictionary is also used to define any aggregations necessary to combine multiple columns into a single column. Prior to v0.15.0, the aggregation step was performed with the agg_sensors method. From v0.15.0 on, the aggregation step is performed by the process_regression_columns method.
[9]:
das.regression_cols = {
'power': '_mtr_',
'poa': ('irr_poa_pyran', 'mean'),
't_amb': ('temp_amb_', 'mean'),
'w_vel': ('wind__', 'mean')
}
[10]:
das.process_regression_columns()
The process_regression_columns method updates the regression columns dictionary to point to the aggregated columns.
[11]:
das.regression_cols
[11]:
{'power': 'meter_power',
'poa': 'irr_poa_pyran_mean_agg',
't_amb': 'temp_amb__mean_agg',
'w_vel': 'wind___mean_agg'}
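The before and after states of regression_cols shown above suggest a simple rule: a `(group, function)` tuple is replaced by the name of the aggregated column, and a single-column group is replaced by its one column. The sketch below reproduces that mapping. The helper name is hypothetical and the rule is inferred from the printed output, not taken from the library source.

```python
def resolve_regression_cols(regression_cols, column_groups):
    """Return a copy of regression_cols that points at concrete column names."""
    resolved = {}
    for variable, spec in regression_cols.items():
        if isinstance(spec, tuple):
            group, func = spec
            # Naming pattern taken from the aggregation output: <group>_<func>_agg
            resolved[variable] = f'{group}_{func}_agg'
        else:
            cols = column_groups.get(spec, [spec])
            if len(cols) != 1:
                raise ValueError(f'{spec!r} has {len(cols)} columns; an aggregation is needed')
            resolved[variable] = cols[0]
    return resolved


column_groups = {
    '_mtr_': ['meter_power'],
    'irr_poa_pyran': ['met1_poa_pyranometer', 'met2_poa_pyranometer'],
    'temp_amb_': ['met1_amb_temp', 'met2_amb_temp'],
    'wind__': ['met1_windspeed', 'met2_windspeed'],
}
regression_cols = {
    'power': '_mtr_',
    'poa': ('irr_poa_pyran', 'mean'),
    't_amb': ('temp_amb_', 'mean'),
    'w_vel': ('wind__', 'mean'),
}
resolved = resolve_regression_cols(regression_cols, column_groups)
print(resolved)
```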
Once the regression columns are set, the loc or floc methods will return the data for each type of sensor identified in the column_groups attribute. Because we’ve run process_regression_columns, accessing the poa data with loc now returns the aggregated result.
[12]:
das.loc['poa'].iloc[100:103, :]
[12]:
| | irr_poa_pyran_mean_agg |
|---|---|
| 1990-10-09 08:20:00 | 538.959351 |
| 1990-10-09 08:25:00 | 559.041911 |
| 1990-10-09 08:30:00 | 519.485970 |
The plot method creates a dashboard with a group of time series plots that are useful for visually inspecting the imported data.
plot uses the structure of the column_groups attribute to create a layout of plots. A single plot is generated for each measurement type, and each column with measurements of that type is plotted as a separate line on the plot. In this example there are two different weather stations, each of which has pyranometers measuring plane-of-array and global horizontal irradiance. This arrangement of sensors results in two plots, each with two lines.
Note that the full functionality of the dashboard requires a live notebook. Try installing pvcaptest to run the notebook locally or using the launch binder button at the top of the page.
[13]:
combine = {'inv_sum_mtr_pwr': ['mtr', 'inv.*agg'], 'irr_all':['irr_poa', 'irr_ghi'], 'temp_all':['temp_amb', 'temp_mod']}
default_groups = ['inv_sum_mtr_pwr', 'irr_all', 'temp_all']
das.plot(combine=combine, default_groups=default_groups, width=900)
[13]:
Filtering Measured Data
The CapData class provides a number of convenience methods to apply filtering steps as defined in ASTM E2848. The following section demonstrates the more commonly used filtering steps to remove measured data points.
[14]:
# Overwrite the filtered dataset with a copy of the unfiltered data.
das.reset_filter()
A common first step is to review the scatter plot of the POA irradiance against the power production.
If you have the optional dependency Holoviews installed, scatter_hv will return an interactive scatter plot. Additionally, scatter_hv includes an option to return a timeseries plot of power that is linked to the scatter plot, so points selected in the scatter plot will be highlighted in the time series.
[15]:
# Use scatter_hv with a linked time series plot
das.scatter_hv(timeseries=True)
[15]:
The filter_custom method provides a way to use your own filtering method within captest and update the summary data. The filter_custom method allows passing any function or method that takes a DataFrame as the first argument and returns a filtered dataframe with rows removed. Passed methods can be user-defined functions or Pandas DataFrame methods.
Below, we use the filter_custom method with the pandas DataFrame dropna method to remove missing data and update the summary data.
[16]:
das.filter_custom(pd.DataFrame.dropna)
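To make the contract of a custom filter concrete, the sketch below applies a callable that takes a DataFrame and returns a DataFrame with rows removed, then counts the removed rows. The wrapper `apply_custom_filter` and the user-defined `keep_high_poa` are hypothetical stand-ins written for this example; they are not part of pvcaptest.

```python
import pandas as pd


def apply_custom_filter(df, func, *args, **kwargs):
    """Apply func to df and report how many rows it removed."""
    filtered = func(df, *args, **kwargs)
    return filtered, len(df) - len(filtered)


def keep_high_poa(df, threshold):
    """User-defined filter: keep rows with POA irradiance above threshold."""
    return df[df['poa'] > threshold]


df = pd.DataFrame({'poa': [500.0, None, 700.0, 650.0],
                   'power': [3.0, 4.0, None, 3.9]})

# A pandas DataFrame method works because its first argument is the DataFrame.
no_missing, removed_missing = apply_custom_filter(df, pd.DataFrame.dropna)

# A user-defined function works the same way; extra arguments are passed on.
high_poa, removed_low = apply_custom_filter(no_missing, keep_high_poa, 600)
print(removed_missing, removed_low)
```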
The get_summary method will return a dataframe summarizing the filtering steps that have been applied, the arguments passed to them, the number of points prior to filtering, and the number of points after filtering.
[17]:
das.get_summary()
[17]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_custom | 1424 | 16 | DataFrame.dropna, , |
The filter_irr method provides a convenient way to remove data based on the irradiance measurements. Here we use it to remove periods of low irradiance. Values greater than 2000 W/m² will also be removed, if present.
[18]:
das.filter_irr(200, 2000)
[19]:
das.get_summary()
[19]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_custom | 1424 | 16 | DataFrame.dropna, , |
| | filter_irr | 552 | 872 | 200, 2000, |
We can re-run the scatter method to see the results of the filtering steps.
[20]:
das.scatter_hv()
[20]:
The filter_outliers method uses scikit-learn’s elliptic envelope to remove outlier points. A future release will include a way to interactively select points to be removed.
[21]:
das.filter_outliers()
[22]:
das.scatter_hv()
[22]:
The fit_regression method performs a regression on the data stored in data_filtered using the regression equation specified by the standard. The regression equation is stored in the regression_formula attribute as shown below. Regressions are performed using the statsmodels package.
Below, we set the filter argument of the fit_regression method to True to remove time periods when the residual exceeds two standard deviations of the mean residual.
[23]:
das.regression_formula
[23]:
'power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'
[24]:
das.fit_regression(filter=True, summary=False)
NOTE: Regression used to filter outlying points.
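The regression formula above has no intercept and four terms, so it can be fit with ordinary least squares on a four-column design matrix. The sketch below does this with NumPy on synthetic data and then drops points whose residual is larger than two standard deviations of the residuals. The data, the coefficients, and the exact residual threshold are assumptions made for illustration; the library performs the fit with statsmodels and may define the threshold differently.

```python
import numpy as np


def design_matrix(poa, t_amb, w_vel):
    # Columns follow 'poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'
    return np.column_stack([poa, poa * poa, poa * t_amb, poa * w_vel])


def fit(poa, t_amb, w_vel, power):
    """Least-squares coefficients for the four regression terms."""
    coefs, *_ = np.linalg.lstsq(design_matrix(poa, t_amb, w_vel), power, rcond=None)
    return coefs


# Synthetic, noiseless data with made-up coefficients.
rng = np.random.default_rng(0)
n = 300
poa = rng.uniform(300, 900, n)
t_amb = rng.uniform(15, 30, n)
w_vel = rng.uniform(0, 5, n)
true_coefs = np.array([8000.0, -0.2, -70.0, 12.0])
power = design_matrix(poa, t_amb, w_vel) @ true_coefs
power[:3] -= 1.0e6  # inject three obvious outliers

# First pass: fit, then keep points with small residuals (assumed rule).
coefs = fit(poa, t_amb, w_vel, power)
resid = power - design_matrix(poa, t_amb, w_vel) @ coefs
keep = np.abs(resid) <= 2 * resid.std()

# Second pass: refit on the remaining points.
coefs_filtered = fit(poa[keep], t_amb[keep], w_vel[keep], power[keep])
print(coefs_filtered)
```

With the outliers removed, the second fit recovers the coefficients used to generate the data, which mirrors the two-pass use of fit_regression in this example.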
[25]:
das.get_summary()
[25]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_custom | 1424 | 16 | DataFrame.dropna, , |
| | filter_irr | 552 | 872 | 200, 2000, |
| | filter_outliers | 529 | 23 | Default arguments |
| | fit_regression | 493 | 36 | filter: True, summary: False |
Calculation of Reporting Conditions
The rep_cond method provides a variety of ways to calculate reporting conditions. With rep_cond, the reporting conditions are always calculated from the data stored in the data_filtered attribute. Refer to the example notebook “Reporting Conditions Examples” for a thorough explanation of the rep_cond functionality. By default, the reporting conditions are calculated following the guidance of ASTM E2939-13.
[26]:
das.rep_cond()
Reporting conditions saved to rc attribute.
poa t_amb w_vel
0 631.334142 24.753909 2.132683
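A minimal sketch of one way to compute reporting conditions from filtered data is shown below. It assumes the 60th percentile of POA irradiance and the means of ambient temperature and wind speed; that choice is an assumption made for this example, so check the rep_cond documentation for the defaults actually used. The toy values are made up.

```python
import pandas as pd

# Toy filtered data; column names follow the regression variable names.
filtered = pd.DataFrame({
    'poa': [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    't_amb': [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
    'w_vel': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
})

# Assumed rule: 60th percentile for irradiance, mean for the other variables.
rc = pd.DataFrame({
    'poa': [filtered['poa'].quantile(0.6)],
    't_amb': [filtered['t_amb'].mean()],
    'w_vel': [filtered['w_vel'].mean()],
})
print(rc)
```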
Previously we used the irradiance filter to filter out data below 200 W/m². The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value. This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance. Passing the string rep_irr to the keyword argument (kwarg) ref_val uses the reporting POA irradiance stored in the rc attribute.
[27]:
das.filter_irr(0.5, 1.5, ref_val='rep_irr')
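The two ways of calling the irradiance filter differ only in how the bounds are interpreted. The sketch below shows the arithmetic with plain pandas. The function `filter_irr_sketch` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import pandas as pd


def filter_irr_sketch(df, low, high, ref_val=None, col='poa'):
    """Keep rows whose irradiance lies between low and high.

    When ref_val is given, low and high are fractions of ref_val;
    otherwise they are absolute irradiance values.
    """
    if ref_val is not None:
        low, high = low * ref_val, high * ref_val
    return df[(df[col] >= low) & (df[col] <= high)]


df = pd.DataFrame({'poa': [150.0, 320.0, 631.0, 940.0, 1000.0]})

absolute = filter_irr_sketch(df, 200, 2000)
# With the reporting irradiance from above, the band is about 315.7 to 947.0.
band = filter_irr_sketch(df, 0.5, 1.5, ref_val=631.334142)
print(len(absolute), band['poa'].tolist())
```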
[28]:
das.scatter_hv()
[28]:
The fit_regression method is used again with the default arguments, which result in fitting the regression, printing and storing the results, but not filtering. The result of the regression is a statsmodels RegressionResultsWrapper object containing the regression coefficients and other information generated when performing the regression. This object is stored in the CapData regression_results attribute.
[29]:
das.fit_regression()
OLS Regression Results
=======================================================================================
Dep. Variable: power R-squared (uncentered): 0.999
Model: OLS Adj. R-squared (uncentered): 0.999
Method: Least Squares F-statistic: 9.872e+04
Date: Wed, 13 May 2026 Prob (F-statistic): 0.00
Time: 01:02:39 Log-Likelihood: -5170.5
No. Observations: 392 AIC: 1.035e+04
Df Residuals: 388 BIC: 1.036e+04
Df Model: 4
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
poa 8043.4648 102.486 78.484 0.000 7841.968 8244.962
I(poa * poa) -0.2083 0.059 -3.514 0.000 -0.325 -0.092
I(poa * t_amb) -71.8194 4.267 -16.831 0.000 -80.209 -63.430
I(poa * w_vel) 12.7656 8.998 1.419 0.157 -4.926 30.457
==============================================================================
Omnibus: 46.720 Durbin-Watson: 0.960
Prob(Omnibus): 0.000 Jarque-Bera (JB): 63.486
Skew: -0.837 Prob(JB): 1.64e-14
Kurtosis: 4.040 Cond. No. 8.24e+03
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 8.24e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The regression coefficients and p-values for each term are attributes available in the regression_results.
[30]:
das.regression_results.params
[30]:
poa 8043.464802
I(poa * poa) -0.208321
I(poa * t_amb) -71.819421
I(poa * w_vel) 12.765637
dtype: float64
[31]:
das.regression_results.pvalues
[31]:
poa 3.409848e-240
I(poa * poa) 4.930125e-04
I(poa * t_amb) 4.061184e-48
I(poa * w_vel) 1.568012e-01
dtype: float64
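The coefficients can be used to evaluate the regression equation at a set of conditions. The sketch below plugs the reporting conditions calculated earlier into the fitted equation using the values printed above. The function `predict_power` is a hypothetical helper written for this example; the result agrees with the "Actual test output" reported in the Results section to within the rounding of the displayed values.

```python
# Coefficients and reporting conditions copied from the outputs above.
params = {
    'poa': 8043.464802,
    'I(poa * poa)': -0.208321,
    'I(poa * t_amb)': -71.819421,
    'I(poa * w_vel)': 12.765637,
}
rc = {'poa': 631.334142, 't_amb': 24.753909, 'w_vel': 2.132683}


def predict_power(params, rc):
    """Evaluate power ~ poa + poa*poa + poa*t_amb + poa*w_vel (no intercept)."""
    poa, t_amb, w_vel = rc['poa'], rc['t_amb'], rc['w_vel']
    return (params['poa'] * poa
            + params['I(poa * poa)'] * poa * poa
            + params['I(poa * t_amb)'] * poa * t_amb
            + params['I(poa * w_vel)'] * poa * w_vel)


measured_output = predict_power(params, rc)
print(f'{measured_output:,.0f}')
```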
Load and Filter PVsyst Data
To load and filter the modeled data, often from PVsyst, we use the load_pvsyst function, which returns a CapData object with the PVsyst data loaded.
[32]:
sim = ct.load_pvsyst('./data/pvsyst_example_HourlyRes_2.CSV')
[33]:
sim.column_groups
[33]:
wind__:
WindVel
_inv_:
EOutInv
real_pwr__:
E_Grid
temp_mod_:
TArray
temp_amb_:
T_Amb
irr_poa_:
GlobInc
index__:
index
irr_ghi_:
GlobHor
pvsyt_losses__:
IL Pmax
IL Pmin
IL Vmax
IL Vmin
shade__:
FShdBm
[34]:
sim.set_regression_cols(power='real_pwr__', poa='irr_poa_', t_amb='temp_amb_', w_vel='wind__')
[35]:
# sim.plot()
[36]:
# Overwrite the filtered data with a copy of the original unfiltered data
sim.reset_filter()
As a first step, we use the filter_time method to select a 60-day period of data centered around the measured data.
[37]:
sim.filter_time(test_date='10/11/1990', days=60)
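The window selected by a centered time filter can be worked out with the standard library. The sketch below assumes the date string is month/day/year and that both endpoints are included; the helper `centered_window` is hypothetical. For hourly data, the resulting count matches the 1441 points that get_summary reports after filter_time.

```python
from datetime import datetime, timedelta


def centered_window(test_date, days):
    """Return the start and end of a window of `days` centered on test_date."""
    center = datetime.strptime(test_date, '%m/%d/%Y')
    half = timedelta(days=days / 2)
    return center - half, center + half


start, end = centered_window('10/11/1990', 60)
# Hourly timestamps from start to end, counting both endpoints.
hourly_points = int((end - start) / timedelta(hours=1)) + 1
print(start.date(), end.date(), hourly_points)
```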
[38]:
sim.scatter_hv()
[38]:
[39]:
sim.filter_irr(200, 930)
[40]:
sim.scatter_hv()
[40]:
[41]:
sim.get_summary()
[41]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| pvsyst | filter_time | 1441 | 7319 | test_date: 10/11/1990, days: 60 |
| | filter_irr | 397 | 1044 | 200, 930, |
The filter_pvsyst method removes data for times when shade is present or the IL Pmin, IL Vmin, IL Pmax, or IL Vmax output values are greater than 0.
[42]:
sim.filter_pvsyst()
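The rule described above can be sketched with a boolean mask in pandas. Two assumptions are made here and should be checked against the library: that a value of 1.0 in FShdBm means no beam shading, and that a row is dropped when any of the four IL columns is greater than 0. The function `filter_pvsyst_sketch` and the toy values are hypothetical.

```python
import pandas as pd

IL_COLUMNS = ['IL Pmin', 'IL Vmin', 'IL Pmax', 'IL Vmax']


def filter_pvsyst_sketch(df, shade_col='FShdBm'):
    """Keep rows with no shading and no inverter limitation losses."""
    no_inverter_limits = (df[IL_COLUMNS] <= 0).all(axis=1)
    unshaded = df[shade_col] == 1.0  # assumption: 1.0 means no beam shading
    return df[no_inverter_limits & unshaded]


df = pd.DataFrame({
    'E_Grid': [4000.0, 3500.0, 5900.0, 4200.0],
    'FShdBm': [1.0, 0.8, 1.0, 1.0],
    'IL Pmin': [0.0, 0.0, 0.0, 0.0],
    'IL Vmin': [0.0, 0.0, 0.0, 0.0],
    'IL Pmax': [0.0, 0.0, 5.0, 0.0],
    'IL Vmax': [0.0, 0.0, 0.0, 0.0],
})
filtered = filter_pvsyst_sketch(df)
print(filtered.index.tolist())
```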
[43]:
sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
[44]:
sim.fit_regression()
OLS Regression Results
=======================================================================================
Dep. Variable: power R-squared (uncentered): 1.000
Model: OLS Adj. R-squared (uncentered): 1.000
Method: Least Squares F-statistic: 2.114e+06
Date: Wed, 13 May 2026 Prob (F-statistic): 0.00
Time: 01:02:39 Log-Likelihood: -3683.6
No. Observations: 319 AIC: 7375.
Df Residuals: 315 BIC: 7390.
Df Model: 4
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
poa 7620.3546 16.585 459.486 0.000 7587.724 7652.985
I(poa * poa) -0.7783 0.013 -59.178 0.000 -0.804 -0.752
I(poa * t_amb) -31.3177 0.535 -58.488 0.000 -32.371 -30.264
I(poa * w_vel) -1.4710 1.252 -1.175 0.241 -3.935 0.993
==============================================================================
Omnibus: 22.643 Durbin-Watson: 2.100
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9.032
Skew: -0.131 Prob(JB): 0.0109
Kurtosis: 2.218 Cond. No. 5.85e+03
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 5.85e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Results
v0.15.0 API Changes - The capdata.get_summary function has been removed and replaced by the captest.CapTest.get_summary method.
The get_summary and captest_results_check_pvalues methods display, respectively, the results of filtering the simulated and measured data and the final capacity test results comparing measured capacity to expected capacity.
For this example, which does not use a CapTest instance, you can instantiate one to call these results methods.
[45]:
# pvc.get_summary(das, sim) # As of v0.15.0 use equivalent method on CapTest class or pd.concat
# concat option
# pd.concat([
# das.get_summary(),
# sim.get_summary(),
# ])
[46]:
ts = ct.CapTest(meas=das, sim=sim, ac_nameplate=6_000, test_tolerance='+/- 7')
[47]:
ts.get_summary()
[47]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_custom | 1424 | 16 | DataFrame.dropna, , |
| | filter_irr | 552 | 872 | 200, 2000, |
| | filter_outliers | 529 | 23 | Default arguments |
| | fit_regression | 493 | 36 | filter: True, summary: False |
| | rep_cond | 493 | 0 | Default arguments |
| | filter_irr-1 | 392 | 101 | 0.5, 1.5, ref_val: 631.334 |
| | fit_regression-1 | 392 | 0 | Default arguments |
| pvsyst | filter_time | 1441 | 7319 | test_date: 10/11/1990, days: 60 |
| | filter_irr | 397 | 1044 | 200, 930, |
| | filter_pvsyst | 397 | 0 | Default arguments |
| | filter_irr-1 | 319 | 78 | 0.5, 1.5, ref_val: np.float64(631.334) |
| | fit_regression | 319 | 0 | Default arguments |
[48]:
ts.captest_results_check_pvalues(print_res=True)
Using reporting conditions from meas.
Capacity Test Result: PASS
Modeled test output: 4009358.528
Actual test output: 3889875.683
Tested output ratio: 0.970
Tested Capacity: 5821.194
Bounds: 5580.0, 6420.0
Using reporting conditions from meas.
Capacity Test Result: PASS
Modeled test output: 4011339.189
Actual test output: 3872687.576
Tested output ratio: 0.965
Tested Capacity: 5792.610
Bounds: 5580.0, 6420.0
97.020% - Cap Ratio
96.540% - Cap Ratio after pval check
[48]:
| | das_pvals | sim_pvals | das_params | sim_params |
|---|---|---|---|---|
| poa | 0.00000 | 0.00000 | 8,043.46480 | 7,620.35460 |
| I(poa * poa) | 0.00049 | 0.00000 | -0.20832 | -0.77830 |
| I(poa * t_amb) | 0.00000 | 0.00000 | -71.81942 | -31.31767 |
| I(poa * w_vel) | 0.15680 | 0.24103 | 12.76564 | -1.47104 |
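The printed results can be reproduced by hand from the numbers above, which helps when checking a report. The sketch below computes the capacity ratio, tested capacity, and pass bounds, then repeats the ratio with the wind term dropped from both regressions. It assumes that '+/- 7' means plus or minus 7% of the nameplate and that the p-value check zeroes coefficients with p-values above 0.05; both assumptions are consistent with the printed output but are not taken from the library source.

```python
nameplate = 6_000
tolerance_pct = 7                       # from test_tolerance='+/- 7'
modeled, actual = 4009358.528, 3889875.683  # printed test outputs

ratio = actual / modeled
capacity = nameplate * ratio
margin = nameplate * tolerance_pct / 100
bounds = (nameplate - margin, nameplate + margin)
passed = bounds[0] <= capacity <= bounds[1]

# p-value check (assumed threshold): only the wind term exceeds it, for
# both the measured and the simulated regression.
P_THRESHOLD = 0.05
wind_term = 631.334142 * 2.132683       # poa * w_vel at reporting conditions
meas = {'coef': 12.76564, 'pval': 0.15680}
sim = {'coef': -1.47104, 'pval': 0.24103}


def drop_if_insignificant(output, term):
    """Remove the wind contribution when its p-value is above the threshold."""
    if term['pval'] > P_THRESHOLD:
        return output - term['coef'] * wind_term
    return output


actual_chk = drop_if_insignificant(actual, meas)
modeled_chk = drop_if_insignificant(modeled, sim)
ratio_chk = actual_chk / modeled_chk

print(f'{ratio:.3f} {capacity:.3f} {bounds} {passed}')
print(f'{ratio_chk:.3f}')
```

Both ratios fall inside the tolerance band, so the test passes with or without the wind term.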
The cell below overlays scatter plots of the measured and PVsyst data. This plot can also be generated using CapTest.overlay_scatters when running a test with an instance of CapTest.
[49]:
(
das.scatter_hv().relabel('Measured') *
sim.scatter_hv().relabel('PVsyst')
).opts(
opts.Scatter(alpha=0.3, width=600)
)
[49]: