Example Capacity Test using pvcaptest
This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the pvcaptest package.
Imports
[1]:
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
# import captest as pvc
import captest as ct
from captest import capdata as pvc
from bokeh.io import output_notebook, show
# the two lines below are needed to use CapData.scatter_hv in the notebook
import holoviews as hv
hv.extension('bokeh')
# If working offline, the CapData.plot() method may fail.
# Run 'export BOKEH_RESOURCES=inline' at the command line before
# starting the Jupyter notebook.
output_notebook()
Load and Plot Measured Data
We begin by using the load_data function, which reads the file(s) specified by the path argument and returns an instance of the CapData class. In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.
When given the path to a file, as shown here, load_data will try to read that file. If you pass a path to a directory, load_data will look for and attempt to load all files ending with ‘.csv’ in the specified directory. Other file types can also be loaded by passing your own function to the file_reader argument and including the extension (e.g. ‘xlsx’) as a kwarg.
[2]:
das = ct.load_data('./data/example_measured_data.csv')
The load_data function loads the data into a pandas DataFrame, which it assigns to the data attribute of the CapData object. Here we use the pandas DataFrame head method to return the first three rows.
[3]:
das.data.head(3)
[3]:
| | met1_poa_refcell | met2_poa_refcell | met1_poa_pyranometer | met2_poa_pyranometer | met1_ghi_pyranometer | met2_ghi_pyranometer | met1_amb_temp | met2_amb_temp | met1_mod_temp1 | met1_mod_temp2 | ... | met2_windspeed | meter_power | inv1_power | inv2_power | inv3_power | inv4_power | inv5_power | inv6_power | inv7_power | inv8_power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990-10-09 00:00:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.750666 | 17.770821 | 15.640355 | 15.663692 | ... | -0.007221 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
| 1990-10-09 00:05:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.737545 | 17.753030 | 15.551920 | 15.676843 | ... | -0.007195 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
| 1990-10-09 00:10:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.648090 | 17.689437 | 15.541516 | 15.414247 | ... | 0.002557 | -8868.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 | -150.0 |
3 rows × 23 columns
In addition to loading data, by default the load_data function attempts to parse the column headers and group the columns based on the type of measurement recorded in each column. For each inferred measurement type, group_columns creates an abbreviated name and a list of columns that contain measurements of that type. The Python dictionary created by group_columns is stored in the column_groups.data attribute. column_groups is a dictionary that displays nicely and includes the groups as attributes for easy access, as shown below. If the column grouping returned is not correct, you can provide either your own function to group the columns or a YAML, JSON, or Excel file mapping column group identifiers to the column headings.
[4]:
das.column_groups
[4]:
_inv_:
inv1_power
inv2_power
inv3_power
inv4_power
inv5_power
inv6_power
inv7_power
inv8_power
irr_ghi_pyran:
met1_ghi_pyranometer
met2_ghi_pyranometer
temp_mod_:
met1_mod_temp1
met1_mod_temp2
met2_mod_temp1
met2_mod_temp2
wind__:
met1_windspeed
met2_windspeed
irr_poa_ref_cell:
met1_poa_refcell
met2_poa_refcell
irr_poa_pyran:
met1_poa_pyranometer
met2_poa_pyranometer
temp_amb_:
met1_amb_temp
met2_amb_temp
_mtr_:
meter_power
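The grouping idea can be illustrated with a small keyword matcher. This is only a sketch of the concept: the function name `group_columns_sketch`, the `TYPE_KEYWORDS` table, and the resulting group names are hypothetical and do not reproduce pvcaptest's actual matching rules or group identifiers.

```python
# Hypothetical keyword table; pvcaptest's real matching rules may differ.
TYPE_KEYWORDS = {
    "irr_poa": ["poa"],
    "irr_ghi": ["ghi"],
    "temp_amb": ["amb"],
    "temp_mod": ["mod"],
    "wind": ["wind"],
    "power": ["power"],
}


def group_columns_sketch(columns):
    """Group column names by the first measurement-type keyword they contain."""
    groups = {}
    for col in columns:
        name = col.lower()
        for group, keywords in TYPE_KEYWORDS.items():
            if any(key in name for key in keywords):
                groups.setdefault(group, []).append(col)
                break
        else:
            # No keyword matched; keep the column visible rather than dropping it.
            groups.setdefault("unknown", []).append(col)
    return groups


columns = [
    "met1_poa_refcell", "met2_poa_pyranometer", "met1_ghi_pyranometer",
    "met1_amb_temp", "met1_mod_temp1", "met2_windspeed", "meter_power",
]
groups = group_columns_sketch(columns)
for name, cols in groups.items():
    print(name, cols)
```

A mapping of this shape (group identifier to list of column headings) is also what a hand-written YAML, JSON, or Excel grouping file would express.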
When working in an environment with tab completion, like a Jupyter notebook, the group attributes are easy to access without needing to remember the group name.
[5]:
das.column_groups.irr_poa_pyran
[5]:
['met1_poa_pyranometer', 'met2_poa_pyranometer']
The CapData class has the methods loc and floc to select subsets of columns from the data and data_filtered DataFrames, respectively. These methods allow easy access to the groups of columns identified in column_groups using column_groups keys, column names, regression_cols keys, or a combination of the three. The regression_cols attribute is introduced below. The column_groups dictionary also enables much of the functionality of the CapData methods that perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions.
Using the loc method with the ‘irr_poa_ref_cell’ attribute key of column_groups to select data from the POA reference cell columns in the data DataFrame:
[6]:
das.loc[das.column_groups.irr_poa_ref_cell].iloc[100:103, :]
[6]:
| | met1_poa_refcell | met2_poa_refcell |
|---|---|---|
| 1990-10-09 08:20:00 | 534.845280 | 534.812723 |
| 1990-10-09 08:25:00 | 562.246349 | 553.728808 |
| 1990-10-09 08:30:00 | 539.686886 | 440.947099 |
Accessing the two irradiance columns of the ‘irr_poa_pyran’ group, the single column of the ‘_mtr_’ group, and the ‘met1_amb_temp’ column of the data_filtered DataFrame:
[7]:
das.floc[['irr_poa_pyran', '_mtr_', 'met1_amb_temp']]
[7]:
| | met1_poa_pyranometer | met2_poa_pyranometer | meter_power | met1_amb_temp |
|---|---|---|---|---|
| 1990-10-09 00:00:00 | 0.0 | 0.0 | -8868.0 | 17.750666 |
| 1990-10-09 00:05:00 | 0.0 | 0.0 | -8868.0 | 17.737545 |
| 1990-10-09 00:10:00 | 0.0 | 0.0 | -8868.0 | 17.648090 |
| 1990-10-09 00:15:00 | 0.0 | 0.0 | -8868.0 | 17.641772 |
| 1990-10-09 00:20:00 | 0.0 | 0.0 | -8868.0 | 17.616870 |
| ... | ... | ... | ... | ... |
| 1990-10-13 23:35:00 | 0.0 | 0.0 | -8868.0 | 20.037996 |
| 1990-10-13 23:40:00 | 0.0 | 0.0 | -8868.0 | 19.893458 |
| 1990-10-13 23:45:00 | 0.0 | 0.0 | -8868.0 | 19.750967 |
| 1990-10-13 23:50:00 | 0.0 | 0.0 | -8868.0 | 19.585205 |
| 1990-10-13 23:55:00 | 0.0 | 0.0 | -8868.0 | 19.459186 |
1440 rows × 4 columns
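The way a mixed list of keys can be expanded into plain column names is sketched below with ordinary pandas. The helper `resolve_columns` is hypothetical and is not how loc and floc are implemented; the toy values are made up, and the two dictionaries stand in for the column_groups and regression_cols attributes.

```python
import pandas as pd


def resolve_columns(keys, column_groups, regression_cols):
    """Expand regression ids, group keys, and column names into column names."""
    if isinstance(keys, str):
        keys = [keys]
    columns = []
    for key in keys:
        key = regression_cols.get(key, key)             # regression id -> group key
        columns.extend(column_groups.get(key, [key]))   # group key -> column names
    return columns


# Toy data with made-up values, for illustration only.
data = pd.DataFrame({
    "met1_poa_pyranometer": [536.3, 558.7],
    "met2_poa_pyranometer": [541.6, 559.4],
    "meter_power": [3.1e6, 3.3e6],
    "met1_amb_temp": [21.0, 21.4],
})
column_groups = {
    "irr_poa_pyran": ["met1_poa_pyranometer", "met2_poa_pyranometer"],
    "_mtr_": ["meter_power"],
}
regression_cols = {"poa": "irr_poa_pyran", "power": "_mtr_"}

subset = data[resolve_columns(["poa", "_mtr_", "met1_amb_temp"],
                              column_groups, regression_cols)]
print(list(subset.columns))
```

Resolving in two steps (regression id first, then group key) is what lets a single call mix all three kinds of identifier.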
pvcaptest does not attempt to determine which columns of data or groups of columns are the data to be used in the regressions. The link between regression variables and the imported data is made by a dictionary stored in the regression_cols attribute. pvcaptest provides the convenience method set_regression_cols for this purpose. regression_cols should be set immediately after loading data, as many other CapData methods rely on this attribute.
[8]:
das.set_regression_cols(power='_mtr_', poa='irr_poa_pyran', t_amb='temp_amb_', w_vel='wind__')
Once the regression columns are set, the loc and floc methods also accept the regression variable ids and return the data for the sensors mapped to each one.
Here we are accessing the same POA irradiance data as above with loc and the group name, but now using the regression variable id.
[9]:
das.loc['poa'].iloc[100:103, :]
[9]:
| | met1_poa_pyranometer | met2_poa_pyranometer |
|---|---|---|
| 1990-10-09 08:20:00 | 536.301164 | 541.617538 |
| 1990-10-09 08:25:00 | 558.687967 | 559.395854 |
| 1990-10-09 08:30:00 | 475.935337 | 563.036604 |
For datasets that have multiple measurements of the same value, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data. The agg_sensors method provides a convenient way to do this for all the groups of measurements in column_groups in one step.
The desired aggregations are specified by passing a dictionary to the agg_map argument, where the keys are groups from column_groups and the values are aggregation functions. Here we are using string functions that are recognized by pandas. Most of the common aggregation functions (mean, median, max, sum, min, etc.) are available as string functions. If you would like to apply a different aggregation function, please refer to the pandas documentation for DataFrame.agg. By default, the agg_sensors method adds a new column to the DataFrame in the data attribute for the results of each aggregation and overwrites the data_filtered attribute with the new DataFrame.
There is also a method, filter_sensors, for filtering data based on comparisons between measurements of the same value, described below.
[10]:
das.agg_sensors(agg_map={'_inv_':'sum', 'irr_poa_pyran':'mean', 'temp_amb_':'mean', 'wind__':'mean'})
Regression variable 'poa' has been remapped: 'irr_poa_pyran' to 'irr_poa_pyran_mean_agg'
Regression variable 't_amb' has been remapped: 'temp_amb_' to 'temp_amb__mean_agg'
Regression variable 'w_vel' has been remapped: 'wind__' to 'wind___mean_agg'
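For a single group, the aggregation amounts to a row-wise pandas operation. The sketch below uses the three POA pyranometer rows shown earlier and a string function with DataFrame.agg; the new column name simply mirrors the remapped name printed above and is typed by hand here, not generated by pvcaptest.

```python
import pandas as pd

# The three POA pyranometer rows displayed earlier in this example.
data = pd.DataFrame(
    {
        "met1_poa_pyranometer": [536.301164, 558.687967, 475.935337],
        "met2_poa_pyranometer": [541.617538, 559.395854, 563.036604],
    },
    index=pd.to_datetime(
        ["1990-10-09 08:20", "1990-10-09 08:25", "1990-10-09 08:30"]
    ),
)

group_cols = ["met1_poa_pyranometer", "met2_poa_pyranometer"]
# axis=1 aggregates across the sensors within each timestamp.
data["irr_poa_pyran_mean_agg"] = data[group_cols].agg("mean", axis=1)
print(data)
```

Swapping "mean" for "sum" gives the kind of aggregation requested for the inverter group in agg_map.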
The plot method creates a dashboard with a group of time series plots that are useful for visually inspecting the imported data.
plot uses the structure of the column_groups attribute to create a layout of plots. A single plot is generated for each measurement type, and each column with measurements of that type is plotted as a separate line on the plot. In this example there are two different weather stations, which each have pyranometers measuring plane of array and global horizontal irradiance. This arrangement of sensors results in two plots which each have two lines.
Note that the full functionality of the dashboard requires a live notebook. Try installing pvcaptest to run the notebook locally, or use the launch binder button at the top of the page.
[11]:
combine = {'inv_sum_mtr_pwr': ['mtr', 'inv.*agg'], 'irr_all':['irr_poa', 'irr_ghi'], 'temp_all':['temp_amb', 'temp_mod']}
default_groups = ['inv_sum_mtr_pwr', 'irr_all', 'temp_all']
das.plot(combine=combine, default_groups=default_groups, width=900)
[11]:
Filtering Measured Data
The CapData class provides a number of convenience methods to apply filtering steps as defined in ASTM E2848. The following section demonstrates the more commonly used filtering steps for removing measured data points.
[12]:
# Overwrite the filtered dataset with a copy of the unfiltered data.
das.reset_filter()
A common first step is to review the scatter plot of the POA irradiance against the power production.
If you have the optional dependency Holoviews installed, scatter_hv will return an interactive scatter plot. Additionally, scatter_hv includes an option to return a timeseries plot of power that is linked to the scatter plot, so points selected in the scatter plot will be highlighted in the time series.
[13]:
# Use scatter_hv with a linked time series plot
das.scatter_hv(timeseries=True)
[13]:
In this example, we have multiple measurements of the same value from different sensors. In this case a common first step is to compare measurements from the different sensors and remove data for timestamps where the measurements differ by more than some acceptable threshold. The filter_sensors method provides a convenient way to accomplish this task for the groups of measurements identified as regression values.
[14]:
das.filter_sensors()
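The comparison between redundant sensors can be pictured as a row-wise spread check. The sketch below is an illustration only: the helper `sensors_agree`, the spread metric, and the 5% threshold are assumptions and are not taken from the filter_sensors defaults.

```python
import pandas as pd


def sensors_agree(df, max_pct_diff=0.05):
    """True where (max - min) / mean across the sensor columns is within the limit."""
    spread = (df.max(axis=1) - df.min(axis=1)) / df.mean(axis=1)
    # Rows with a zero mean (e.g. night-time) give NaN and therefore compare False.
    return spread <= max_pct_diff


# The three POA pyranometer rows displayed earlier in this example.
poa = pd.DataFrame(
    {
        "met1_poa_pyranometer": [536.301164, 558.687967, 475.935337],
        "met2_poa_pyranometer": [541.617538, 559.395854, 563.036604],
    },
    index=pd.to_datetime(
        ["1990-10-09 08:20", "1990-10-09 08:25", "1990-10-09 08:30"]
    ),
)

kept = poa[sensors_agree(poa)]
print(kept)
```

With this assumed threshold, the 08:30 row is dropped because the two pyranometers disagree by roughly 17% of their mean.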
The get_summary method will return a DataFrame summarizing the filtering steps that have been applied, the arguments passed to them, the number of points prior to filtering, and the number of points after filtering.
[15]:
das.get_summary()
[15]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_sensors | 1242 | 198 | Default arguments |
The filter_custom method provides a way to use your own filtering method within captest and update the summary data. The filter_custom method allows passing any function or method that takes a DataFrame as the first argument and returns a filtered dataframe with rows removed. Passed methods can be user-defined functions or Pandas DataFrame methods.
Below, we use the filter_custom method with the pandas DataFrame dropna method to remove missing data and update the summary data.
[16]:
das.filter_custom(pd.DataFrame.dropna)
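A user-defined function only needs to follow the same contract: take a DataFrame as its first argument and return a DataFrame with rows removed. The function `drop_negative_power` below is a hypothetical example of such a function, demonstrated on made-up toy data rather than through filter_custom itself.

```python
import pandas as pd


def drop_negative_power(df, power_col="meter_power"):
    """Return only the rows where the power column is non-negative."""
    return df[df[power_col] >= 0]


# Toy data with made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "meter_power": [-8868.0, 1200.0, 3400.0],
        "met1_amb_temp": [17.75, 18.10, 18.40],
    }
)

filtered = drop_negative_power(toy)
print(filtered)
```

Because the function has the same shape as pd.DataFrame.dropna (DataFrame in, filtered DataFrame out), it is the kind of callable the description above says filter_custom accepts.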
The filter_irr method provides a convenient way to remove data based on the irradiance measurements. Here we use it to remove periods of low irradiance. Values greater than 2000 W/m² will also be removed, if present.
[17]:
das.get_summary()
[17]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_sensors | 1242 | 198 | Default arguments |
| | filter_custom | 1233 | 9 | DataFrame.dropna, , |
[18]:
das.filter_irr(200, 2000)
We can re-run the scatter method to see the results of the filtering steps.
[19]:
das.scatter_hv()
[19]:
The filter_outliers method uses scikit-learn’s elliptic envelope to remove outlier points. A future release will include a way to interactively select points to be removed.
[20]:
das.filter_outliers()
[21]:
das.scatter_hv()
[21]:
The fit_regression method performs a regression on the data stored in data_filtered using the regression equation specified by the standard. The regression equation is stored in the regression_formula attribute as shown below. Regressions are performed using the statsmodels package.
Below, we set the filter argument of the fit_regression method to True to remove time periods when the residual exceeds two standard deviations of the mean residual.
[22]:
das.regression_formula
[22]:
'power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'
[23]:
das.fit_regression(filter=True, summary=False)
NOTE: Regression used to filter outlying points.
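The regression-based filter can be sketched without statsmodels by building the design matrix for the formula above and solving it with NumPy least squares. Everything below is an illustration on synthetic data: the coefficients, noise level, and injected outliers are made up, and reading the filter as "drop points whose residual is more than two standard deviations from the mean residual" is an interpretation of the description, not pvcaptest's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
poa = rng.uniform(400, 1000, n)
t_amb = rng.uniform(15, 30, n)
w_vel = rng.uniform(0, 5, n)

# Columns match 'power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'
# (the trailing '- 1' means no intercept column).
X = np.column_stack([poa, poa * poa, poa * t_amb, poa * w_vel])

true_coefs = np.array([7700.0, -0.5, -40.0, 10.0])  # made-up values
power = X @ true_coefs + rng.normal(0, 20_000, n)
power[:5] -= 500_000  # inject a few obvious outliers

# Ordinary least squares fit and residuals.
coefs, *_ = np.linalg.lstsq(X, power, rcond=None)
resid = power - X @ coefs

# Keep points whose residual lies within two standard deviations of the mean.
keep = np.abs(resid - resid.mean()) <= 2 * resid.std()
X_flt, power_flt = X[keep], power[keep]
print(f"kept {keep.sum()} of {n} points")
```

After such a filtering pass the regression is normally fit again on the remaining points, which is what the later call to fit_regression with default arguments does in this example.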
[24]:
das.get_summary()
[24]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_sensors | 1242 | 198 | Default arguments |
| | filter_custom | 1233 | 9 | DataFrame.dropna, , |
| | filter_irr | 418 | 815 | 200, 2000, |
| | filter_outliers | 401 | 17 | Default arguments |
| | fit_regression | 378 | 23 | filter: True, summary: False |
Calculation of Reporting Conditions
The rep_cond method provides a variety of ways to calculate reporting conditions. When using rep_cond, the reporting conditions are always calculated from the data stored in the data_filtered attribute. Refer to the example notebook “Reporting Conditions Examples” for a thorough explanation of the rep_cond functionality. By default the reporting conditions are calculated following the guidance of ASTM E2939-13.
[25]:
das.rep_cond()
Reporting conditions saved to rc attribute.
poa t_amb w_vel
0 768.613558 24.108496 1.991954
Previously we used the irradiance filter to filter out data below 200 W/m2. The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value. This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance.
[26]:
das.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
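The two ways of specifying the irradiance bounds can be compared in a few lines of pandas. The helper `irradiance_mask` is hypothetical, the sample irradiance values are made up, and treating the bounds as inclusive is an assumption; only the reporting irradiance is taken from the rep_cond output above.

```python
import pandas as pd


def irradiance_mask(poa, low, high, ref_val=None):
    """Mask for low <= poa <= high; the bounds are scaled by ref_val when given."""
    if ref_val is not None:
        low, high = low * ref_val, high * ref_val
    return poa.between(low, high)


poa = pd.Series([150.0, 384.4, 768.6, 1152.9, 1200.0])  # made-up sample values
rc_poa = 768.613558  # reporting irradiance printed by rep_cond above

absolute = poa[irradiance_mask(poa, 200, 2000)]            # W/m² limits
band = poa[irradiance_mask(poa, 0.5, 1.5, ref_val=rc_poa)]  # fraction of rc_poa

print(0.5 * rc_poa, 1.5 * rc_poa)
print(list(band))
```

With this reporting irradiance, the ±50% band works out to roughly 384.3 to 1152.9 W/m².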
[27]:
das.scatter_hv()
[27]:
The fit_regression method is used again with the default arguments, which result in fitting the regression, printing and storing the results, but not filtering. The result of the regression is a statsmodels RegressionResultsWrapper object containing the regression coefficients and other information generated when performing the regression. This object is stored in the CapData regression_results attribute.
[28]:
das.fit_regression()
OLS Regression Results
=======================================================================================
Dep. Variable: power R-squared (uncentered): 1.000
Model: OLS Adj. R-squared (uncentered): 1.000
Method: Least Squares F-statistic: 4.328e+05
Date: Fri, 06 Feb 2026 Prob (F-statistic): 0.00
Time: 18:00:00 Log-Likelihood: -3710.1
No. Observations: 298 AIC: 7428.
Df Residuals: 294 BIC: 7443.
Df Model: 4
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
poa 7757.9322 54.443 142.495 0.000 7650.784 7865.081
I(poa * poa) -0.4610 0.033 -13.877 0.000 -0.526 -0.396
I(poa * t_amb) -50.7082 2.302 -22.029 0.000 -55.238 -46.178
I(poa * w_vel) 13.1425 4.424 2.971 0.003 4.435 21.850
==============================================================================
Omnibus: 51.658 Durbin-Watson: 0.398
Prob(Omnibus): 0.000 Jarque-Bera (JB): 81.823
Skew: -1.017 Prob(JB): 1.71e-18
Kurtosis: 4.566 Cond. No. 9.88e+03
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 9.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The regression coefficients and p-values for each term are available as attributes of regression_results.
[29]:
das.regression_results.params
[29]:
poa 7757.932173
I(poa * poa) -0.460987
I(poa * t_amb) -50.708228
I(poa * w_vel) 13.142495
dtype: float64
[30]:
das.regression_results.pvalues
[30]:
poa 2.413291e-273
I(poa * poa) 5.054412e-34
I(poa * t_amb) 3.440838e-64
I(poa * w_vel) 3.217387e-03
dtype: float64
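As a worked check, the coefficients above can be evaluated by hand at the reporting conditions printed by rep_cond. The function `predict_power` is a hypothetical helper written for this illustration; the numbers are copied from the outputs above, so the result should land close to the "Actual test output" reported in the Results section, up to rounding of the displayed values.

```python
# Coefficients from das.regression_results.params, as displayed above.
params = {
    "poa": 7757.932173,
    "I(poa * poa)": -0.460987,
    "I(poa * t_amb)": -50.708228,
    "I(poa * w_vel)": 13.142495,
}
# Reporting conditions from das.rep_cond(), as displayed above.
rc = {"poa": 768.613558, "t_amb": 24.108496, "w_vel": 1.991954}


def predict_power(params, rc):
    """Evaluate the regression equation at one set of conditions."""
    poa, t_amb, w_vel = rc["poa"], rc["t_amb"], rc["w_vel"]
    return (
        params["poa"] * poa
        + params["I(poa * poa)"] * poa * poa
        + params["I(poa * t_amb)"] * poa * t_amb
        + params["I(poa * w_vel)"] * poa * w_vel
    )


power_at_rc = predict_power(params, rc)
print(f"{power_at_rc:,.0f}")
```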
Load and Filter PVsyst Data
To load and filter the modeled data, often from PVsyst, we use the load_pvsyst function, which returns a CapData object with the PVsyst data loaded.
[31]:
sim = ct.load_pvsyst('./data/pvsyst_example_HourlyRes_2.CSV')
[32]:
sim.column_groups
[32]:
shade__:
FShdBm
_inv_:
EOutInv
irr_ghi_:
GlobHor
pvsyt_losses__:
IL Pmax
IL Pmin
IL Vmax
IL Vmin
temp_mod_:
TArray
wind__:
WindVel
temp_amb_:
T_Amb
index__:
index
irr_poa_:
GlobInc
real_pwr__:
E_Grid
[33]:
sim.set_regression_cols(power='real_pwr__', poa='irr_poa_', t_amb='temp_amb_', w_vel='wind__')
[34]:
# sim.plot()
[35]:
# Overwrite sim.data_filtered with a copy of the original unfiltered dataframe
sim.reset_filter()
As a first step we use the filter_time method to select a 60 day period of data centered around the measured data.
[36]:
sim.filter_time(test_date='10/11/1990', days=60)
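A window centered on the test date can be reproduced with plain pandas slicing. The sketch below assumes an hourly, non-leap-year index like a typical PVsyst hourly output and an inclusive window of 30 days on either side of the test date; it is an illustration, not the filter_time implementation.

```python
import pandas as pd

# Hypothetical stand-in for a year of hourly simulation output.
index = pd.date_range(
    "1990-01-01 00:00", "1990-12-31 23:00", freq=pd.Timedelta(hours=1)
)
sim_data = pd.DataFrame({"E_Grid": 0.0}, index=index)

test_date = pd.Timestamp("1990-10-11")
half_window = pd.Timedelta(days=60) / 2

# Label-based slicing on a DatetimeIndex includes both end points.
window = sim_data.loc[test_date - half_window : test_date + half_window]
print(len(sim_data), len(window), len(sim_data) - len(window))
```

Under these assumptions the window holds 1441 hourly rows and excludes 7319, which is consistent with the filter_time row of the summary shown further down.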
[37]:
sim.scatter_hv()
[37]:
[38]:
sim.filter_irr(200, 930)
[39]:
sim.scatter_hv()
[39]:
[40]:
sim.get_summary()
[40]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| pvsyst | filter_time | 1441 | 7319 | test_date: 10/11/1990, days: 60 |
| | filter_irr | 397 | 1044 | 200, 930, |
The filter_pvsyst method removes data for times when shade is present or the IL Pmin, IL Vmin, IL Pmax, or IL Vmax output values are greater than 0.
[41]:
sim.filter_pvsyst()
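The rule described above can be written as two boolean masks in pandas. This sketch assumes that FShdBm is a beam shading factor where 1.0 means no shading; that reading, and the toy values, are assumptions made for illustration and are not taken from the pvcaptest source.

```python
import pandas as pd

# Toy PVsyst-style rows with made-up values.
sim_data = pd.DataFrame(
    {
        "FShdBm": [1.0, 0.85, 1.0, 1.0],
        "IL Pmin": [0.0, 0.0, 12.5, 0.0],
        "IL Vmin": [0.0, 0.0, 0.0, 0.0],
        "IL Pmax": [0.0, 0.0, 0.0, 0.0],
        "IL Vmax": [0.0, 0.0, 0.0, 0.0],
        "E_Grid": [4.1e6, 3.2e6, 0.9e6, 4.5e6],
    }
)

loss_cols = ["IL Pmin", "IL Vmin", "IL Pmax", "IL Vmax"]
no_shade = sim_data["FShdBm"] >= 1.0                    # assumed: 1.0 = unshaded
no_inverter_loss = (sim_data[loss_cols] <= 0).all(axis=1)

kept = sim_data[no_shade & no_inverter_loss]
print(kept.index.tolist())
```

In this toy set, row 1 is removed for shading and row 2 for a non-zero IL Pmin value.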
[42]:
sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
[43]:
sim.fit_regression()
OLS Regression Results
=======================================================================================
Dep. Variable: power R-squared (uncentered): 1.000
Model: OLS Adj. R-squared (uncentered): 1.000
Method: Least Squares F-statistic: 2.609e+06
Date: Fri, 06 Feb 2026 Prob (F-statistic): 0.00
Time: 18:00:00 Log-Likelihood: -3221.5
No. Observations: 282 AIC: 6451.
Df Residuals: 278 BIC: 6465.
Df Model: 4
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
poa 7667.3056 15.520 494.023 0.000 7636.754 7697.858
I(poa * poa) -0.8365 0.013 -64.861 0.000 -0.862 -0.811
I(poa * t_amb) -31.3616 0.484 -64.768 0.000 -32.315 -30.408
I(poa * w_vel) -1.2114 1.132 -1.070 0.286 -3.440 1.017
==============================================================================
Omnibus: 24.535 Durbin-Watson: 2.008
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9.460
Skew: -0.176 Prob(JB): 0.00883
Kurtosis: 2.175 Cond. No. 6.16e+03
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 6.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Results
The get_summary and captest_results_check_pvalues functions display the results of filtering on simulated and measured data and the final capacity test results comparing measured capacity to expected capacity, respectively.
[44]:
pvc.get_summary(das, sim)
[44]:
| | | pts_after_filter | pts_removed | filter_arguments |
|---|---|---|---|---|
| meas | filter_sensors | 1242 | 198 | Default arguments |
| | filter_custom | 1233 | 9 | DataFrame.dropna, , |
| | filter_irr | 418 | 815 | 200, 2000, |
| | filter_outliers | 401 | 17 | Default arguments |
| | fit_regression | 378 | 23 | filter: True, summary: False |
| | rep_cond | 378 | 0 | Default arguments |
| | filter_irr-1 | 298 | 80 | 0.5, 1.5, ref_val: np.float64(768.614) |
| | fit_regression-1 | 298 | 0 | Default arguments |
| pvsyst | filter_time | 1441 | 7319 | test_date: 10/11/1990, days: 60 |
| | filter_irr | 397 | 1044 | 200, 930, |
| | filter_pvsyst | 397 | 0 | Default arguments |
| | filter_irr-1 | 282 | 115 | 0.5, 1.5, ref_val: np.float64(768.614) |
| | fit_regression | 282 | 0 | Default arguments |
[45]:
pvc.captest_results_check_pvalues(sim, das, 6000, '+/- 7', print_res=True)
Using reporting conditions from das.
Capacity Test Result: PASS
Modeled test output: 4816002.984
Actual test output: 4771008.279
Tested output ratio: 0.991
Tested Capacity: 5943.944
Bounds: 5580.0, 6420.0
Using reporting conditions from das.
Capacity Test Result: PASS
Modeled test output: 4817857.760
Actual test output: 4771008.279
Tested output ratio: 0.990
Tested Capacity: 5941.655
Bounds: 5580.0, 6420.0
99.070% - Cap Ratio
99.030% - Cap Ratio after pval check
[45]:
| | das_pvals | sim_pvals | das_params | sim_params |
|---|---|---|---|---|
| poa | 0.00000 | 0.00000 | 7,757.93217 | 7,667.30564 |
| I(poa * poa) | 0.00000 | 0.00000 | -0.46099 | -0.83655 |
| I(poa * t_amb) | 0.00000 | 0.00000 | -50.70823 | -31.36155 |
| I(poa * w_vel) | 0.00322 | 0.28555 | 13.14250 | -1.21145 |
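The printed results can be reproduced with simple arithmetic from the modeled and actual test outputs. The sketch below reads the '+/- 7' argument as a symmetric percentage tolerance on the 6000 nameplate value, which matches the printed bounds; the variable names are chosen for this illustration only.

```python
nameplate = 6000
tolerance_pct = 7  # from the '+/- 7' argument

actual = 4771008.279  # "Actual test output" printed above
modeled_outputs = {
    "all coefficients": 4816002.984,   # first "Modeled test output"
    "after pval check": 4817857.760,   # second "Modeled test output"
}

lower = nameplate * (1 - tolerance_pct / 100)
upper = nameplate * (1 + tolerance_pct / 100)

results = {}
for label, modeled in modeled_outputs.items():
    ratio = actual / modeled
    tested_capacity = nameplate * ratio
    passed = lower <= tested_capacity <= upper
    results[label] = (ratio, tested_capacity, passed)
    print(f"{label}: ratio={ratio:.3f}, capacity={tested_capacity:.3f}, "
          f"{'PASS' if passed else 'FAIL'}")
```

The second modeled output is about 1855 higher than the first, which is the size of the simulated wind term, I(poa * w_vel), at the reporting conditions. This suggests the p-value check drops that term from the simulated regression because of its high p-value (0.28555); the exact threshold used is not shown in this example.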
Run the lines below to produce a scatter plot overlaying the final measured and PVsyst data.
[46]:
%%opts Scatter (alpha=0.3)
%%opts Scatter [width=600]
das.scatter_hv().relabel('Measured') * sim.scatter_hv().relabel('PVsyst')
[46]: