Example Capacity Test using pvcaptest

This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the pvcaptest package.

Imports

[1]:
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import pandas as pd

import captest as ct
from captest import capdata as pvc
from bokeh.io import output_notebook, show

# HoloViews with the bokeh backend is needed for CapData.scatter_hv in the notebook
import holoviews as hv
hv.extension('bokeh')

# If working offline, the CapData.plot() method may fail; run
# 'export BOKEH_RESOURCES=inline' at the command line before
# launching the Jupyter notebook.

output_notebook()

Load and Plot Measured Data

We begin by using the load_data function, which reads the file(s) specified by the path argument and returns an instance of the CapData class. In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.

When given the path to a file, as shown here, load_data will try to read that file. If you pass a path to a directory, load_data will look for and attempt to load all files ending with ‘.csv’ in the specified directory. Other file types can also be loaded by passing your own function to the file_reader argument and including the extension (e.g. ‘xlsx’) as a kwarg.
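The directory behavior described above can be sketched with the standard library. This is a hypothetical illustration, not pvcaptest's implementation: the helper load_csv_dir is invented for this example, it assumes the first column of every file holds a timestamp, and it merges rows from all files on that timestamp instead of building a DataFrame.

```python
import csv
import tempfile
from pathlib import Path


def load_csv_dir(path):
    """Hypothetical helper: read every '.csv' file in a directory.

    Rows are merged into one dict keyed by the first column (assumed to be a
    timestamp). Values are left as strings; files are read in name order.
    """
    merged = {}
    for csv_path in sorted(Path(path).glob('*.csv')):
        with open(csv_path, newline='') as f:
            reader = csv.reader(f)
            header = next(reader)
            for row in reader:
                merged.setdefault(row[0], {}).update(zip(header[1:], row[1:]))
    # Sort on the timestamp key so the result reads in time order.
    return dict(sorted(merged.items()))


# Usage with toy files in a temporary directory; the '.txt' file is ignored.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'day1.csv').write_text('time,poa\n1990-10-09 08:20,0.0\n')
    Path(tmp, 'day2.csv').write_text('time,poa\n1990-10-10 08:20,1.0\n')
    Path(tmp, 'notes.txt').write_text('not data')
    data = load_csv_dir(tmp)
```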

[2]:
das = ct.load_data('./data/example_measured_data.csv')

The load_data function loads the data into a pandas DataFrame and assigns it to the data attribute of the CapData object. Here we use the pandas DataFrame head method to show the first three rows.

[3]:
das.data.head(3)
[3]:
met1_poa_refcell met2_poa_refcell met1_poa_pyranometer met2_poa_pyranometer met1_ghi_pyranometer met2_ghi_pyranometer met1_amb_temp met2_amb_temp met1_mod_temp1 met1_mod_temp2 ... met2_windspeed meter_power inv1_power inv2_power inv3_power inv4_power inv5_power inv6_power inv7_power inv8_power
1990-10-09 00:00:00 0.0 0.0 0.0 0.0 0.0 0.0 17.750666 17.770821 15.640355 15.663692 ... -0.007221 -8868.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0
1990-10-09 00:05:00 0.0 0.0 0.0 0.0 0.0 0.0 17.737545 17.753030 15.551920 15.676843 ... -0.007195 -8868.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0
1990-10-09 00:10:00 0.0 0.0 0.0 0.0 0.0 0.0 17.648090 17.689437 15.541516 15.414247 ... 0.002557 -8868.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0 -150.0

3 rows × 23 columns

In addition to loading data, by default load_data parses the column headers (using the group_columns function) and groups the columns based on the type of measurement recorded in each column. For each inferred measurement type, group_columns creates an abbreviated name and a list of the columns that contain measurements of that type. The Python dictionary created by group_columns is stored in the column_groups.data attribute. column_groups is a dictionary-like object that displays nicely and exposes each group as an attribute for easy access, as shown below. If the column grouping returned is not correct, you can provide either your own function to group the columns or a YAML, JSON, or Excel file mapping column group identifiers to the column headings.
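To illustrate the idea behind header-based grouping, here is a minimal sketch that assigns each header to the first group whose keyword pattern it matches. The RULES patterns and group names are assumptions chosen to fit this dataset's headers; pvcaptest's group_columns uses its own rules and naming (for example, it separates ‘irr_poa_pyran’ from ‘irr_poa_ref_cell’), which this sketch does not reproduce.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ('irr_poa', re.compile(r'poa', re.I)),
    ('irr_ghi', re.compile(r'ghi', re.I)),
    ('temp_amb', re.compile(r'amb.*temp|temp.*amb', re.I)),
    ('temp_mod', re.compile(r'mod.*temp|temp.*mod', re.I)),
    ('wind', re.compile(r'wind', re.I)),
    ('inv', re.compile(r'inv', re.I)),
    ('mtr', re.compile(r'meter|mtr', re.I)),
]


def group_headers(headers, rules=RULES):
    """Map each header to the first matching group; unmatched headers are skipped."""
    groups = defaultdict(list)
    for header in headers:
        for name, pattern in rules:
            if pattern.search(header):
                groups[name].append(header)
                break
    return dict(groups)


groups = group_headers(['met1_poa_refcell', 'met1_ghi_pyranometer', 'met1_amb_temp',
                        'met1_mod_temp1', 'met1_windspeed', 'inv1_power', 'meter_power'])
```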

[4]:
das.column_groups
[4]:
_inv_:
    inv1_power
    inv2_power
    inv3_power
    inv4_power
    inv5_power
    inv6_power
    inv7_power
    inv8_power
irr_ghi_pyran:
    met1_ghi_pyranometer
    met2_ghi_pyranometer
temp_mod_:
    met1_mod_temp1
    met1_mod_temp2
    met2_mod_temp1
    met2_mod_temp2
wind__:
    met1_windspeed
    met2_windspeed
irr_poa_ref_cell:
    met1_poa_refcell
    met2_poa_refcell
irr_poa_pyran:
    met1_poa_pyranometer
    met2_poa_pyranometer
temp_amb_:
    met1_amb_temp
    met2_amb_temp
_mtr_:
    meter_power

When working in an environment with tab completion, like a Jupyter notebook, the group attributes are easy to access without needing to remember the group name.

[5]:
das.column_groups.irr_poa_pyran
[5]:
['met1_poa_pyranometer', 'met2_poa_pyranometer']

The CapData class provides the loc and floc indexers to select subsets of columns from the data and data_filtered DataFrames, respectively. They accept column_groups keys, column names, regression_cols keys, or a combination of the three; the regression_cols attribute is introduced below. The column_groups dictionary also underpins much of the functionality of the CapData methods that perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions.

Using loc with the ‘irr_poa_ref_cell’ attribute of column_groups to select the POA reference cell columns from the data DataFrame:

[6]:
das.loc[das.column_groups.irr_poa_ref_cell].iloc[100:103, :]
[6]:
met1_poa_refcell met2_poa_refcell
1990-10-09 08:20:00 534.845280 534.812723
1990-10-09 08:25:00 562.246349 553.728808
1990-10-09 08:30:00 539.686886 440.947099

Accessing the two irradiance columns of the ‘irr_poa_pyran’ group, the single column of the ‘_mtr_’ group, and the ‘met1_amb_temp’ column of the data_filtered DataFrame:

[7]:
das.floc[['irr_poa_pyran', '_mtr_', 'met1_amb_temp']]
[7]:
met1_poa_pyranometer met2_poa_pyranometer meter_power met1_amb_temp
1990-10-09 00:00:00 0.0 0.0 -8868.0 17.750666
1990-10-09 00:05:00 0.0 0.0 -8868.0 17.737545
1990-10-09 00:10:00 0.0 0.0 -8868.0 17.648090
1990-10-09 00:15:00 0.0 0.0 -8868.0 17.641772
1990-10-09 00:20:00 0.0 0.0 -8868.0 17.616870
... ... ... ... ...
1990-10-13 23:35:00 0.0 0.0 -8868.0 20.037996
1990-10-13 23:40:00 0.0 0.0 -8868.0 19.893458
1990-10-13 23:45:00 0.0 0.0 -8868.0 19.750967
1990-10-13 23:50:00 0.0 0.0 -8868.0 19.585205
1990-10-13 23:55:00 0.0 0.0 -8868.0 19.459186

1440 rows × 4 columns

pvcaptest does not attempt to determine which columns or groups of columns hold the data to be used in the regression. Instead, the link between the regression variables and the imported data is made by a dictionary stored in the regression_cols attribute, which the convenience method set_regression_cols populates. Set regression_cols immediately after loading data, as many other CapData methods rely on this attribute.

[8]:
das.set_regression_cols(power='_mtr_', poa='irr_poa_pyran', t_amb='temp_amb_', w_vel='wind__')

Once the regression columns are set, loc and floc also accept the regression variable ids (power, poa, t_amb, w_vel) and return the columns of the group mapped to each id.

Here we select the POA pyranometer data, which the floc example above selected by its group name ‘irr_poa_pyran’, using the regression variable id ‘poa’ instead.

[9]:
das.loc['poa'].iloc[100:103, :]
[9]:
met1_poa_pyranometer met2_poa_pyranometer
1990-10-09 08:20:00 536.301164 541.617538
1990-10-09 08:25:00 558.687967 559.395854
1990-10-09 08:30:00 475.935337 563.036604
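The key lookup behind loc and floc can be illustrated with plain dictionaries. This is a hypothetical sketch (resolve_columns is not part of pvcaptest) that assumes a key is first checked against regression_cols, then against column_groups, and is otherwise treated as a column name; with values copied from the outputs above, it reproduces the column selections of cells [7] and [9].

```python
# Plain-dict stand-ins for das.column_groups and das.regression_cols,
# trimmed to a few of the groups shown above.
column_groups = {
    'irr_poa_pyran': ['met1_poa_pyranometer', 'met2_poa_pyranometer'],
    'irr_poa_ref_cell': ['met1_poa_refcell', 'met2_poa_refcell'],
    '_mtr_': ['meter_power'],
    'temp_amb_': ['met1_amb_temp', 'met2_amb_temp'],
}
regression_cols = {'power': '_mtr_', 'poa': 'irr_poa_pyran', 't_amb': 'temp_amb_'}


def resolve_columns(keys, column_groups, regression_cols):
    """Hypothetical helper: expand regression ids, group ids, and column
    names into a flat list of column names, in the order given."""
    if isinstance(keys, str):
        keys = [keys]
    columns = []
    for key in keys:
        # A regression variable id points at a group id (or a single column).
        key = regression_cols.get(key, key)
        # A group id expands to its columns; anything else is a column name.
        columns.extend(column_groups.get(key, [key]))
    return columns


poa_cols = resolve_columns('poa', column_groups, regression_cols)
mixed = resolve_columns(['irr_poa_pyran', '_mtr_', 'met1_amb_temp'],
                        column_groups, regression_cols)
```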

For datasets that have multiple measurements of the same quantity, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data. The agg_sensors method provides a convenient way to do this for all the groups of measurements in column_groups in one step.

The desired aggregations are specified by passing a dictionary to the agg_map argument, where the keys are groups from column_groups and the values are aggregation functions. Here we pass function names as strings, which pandas recognizes; most of the common aggregation functions (mean, median, max, sum, min, etc.) are available this way. To apply a different aggregation function, refer to the pandas documentation for DataFrame.agg. By default, the agg_sensors method adds a new column to the DataFrame in the data attribute for the result of each aggregation and overwrites the data_filtered attribute with a copy of the new DataFrame. As the output below shows, it also remaps the affected regression_cols entries to the new aggregated columns.

There is also a method, filter_sensors, described below, for filtering data based on comparisons between measurements of the same quantity.

[10]:
das.agg_sensors(agg_map={'_inv_':'sum', 'irr_poa_pyran':'mean', 'temp_amb_':'mean', 'wind__':'mean'})
Regression variable 'poa' has been remapped: 'irr_poa_pyran' to 'irr_poa_pyran_mean_agg'
Regression variable 't_amb' has been remapped: 'temp_amb_' to 'temp_amb__mean_agg'
Regression variable 'w_vel' has been remapped: 'wind__' to 'wind___mean_agg'
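The aggregation and column-naming pattern can be sketched for a single timestamp in plain Python. This is an illustration only: agg_row and AGG_FUNCS are hypothetical names, and the real agg_sensors applies pandas aggregations to whole DataFrame columns. The ‘&lt;group&gt;_&lt;func&gt;_agg’ naming follows the remapping messages printed above.

```python
import statistics

# Aggregation functions looked up by name, loosely mirroring pandas' string names.
AGG_FUNCS = {'mean': statistics.fmean, 'median': statistics.median,
             'sum': sum, 'max': max, 'min': min}


def agg_row(row, column_groups, agg_map):
    """Return a copy of one timestamp's readings with a '<group>_<func>_agg'
    entry added for each group in agg_map."""
    out = dict(row)
    for group, func_name in agg_map.items():
        values = [row[col] for col in column_groups[group]]
        out[f'{group}_{func_name}_agg'] = AGG_FUNCS[func_name](values)
    return out


column_groups = {
    'irr_poa_pyran': ['met1_poa_pyranometer', 'met2_poa_pyranometer'],
    'temp_amb_': ['met1_amb_temp', 'met2_amb_temp'],
}
# Readings for 1990-10-09 00:00, taken from the outputs of cells [3] and [7].
row = {'met1_poa_pyranometer': 0.0, 'met2_poa_pyranometer': 0.0,
       'met1_amb_temp': 17.750666, 'met2_amb_temp': 17.770821}
agg = agg_row(row, column_groups, {'irr_poa_pyran': 'mean', 'temp_amb_': 'mean'})
```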

The plot method creates a dashboard with a group of time series plots that are useful for visually inspecting the imported data.

plot uses the structure of the column_groups attribute to create a layout of plots. A single plot is generated for each measurement type, and each column with measurements of that type is plotted as a separate line on that plot. In this example there are two weather stations, each with pyranometers measuring plane of array and global horizontal irradiance, so this arrangement of sensors results in two plots that each have two lines.

Note that the full functionality of the dashboard requires a live notebook. To try it, install pvcaptest and run this notebook, or use the launch Binder button at the top of the page.

[11]:
combine = {'inv_sum_mtr_pwr': ['mtr', 'inv.*agg'], 'irr_all':['irr_poa', 'irr_ghi'], 'temp_all':['temp_amb', 'temp_mod']}
default_groups = ['inv_sum_mtr_pwr', 'irr_all', 'temp_all']
das.plot(combine=combine, default_groups=default_groups, width=900)

Filtering Measured Data

The CapData class provides a number of convenience methods to apply the filtering steps defined in ASTM E2848. The following section demonstrates the most commonly used filtering steps for removing measured data points.

[12]:
# Reset data_filtered to a copy of the unfiltered data (undoes any previous filtering).
das.reset_filter()

A common first step is to review the scatter plot of the POA irradiance against the power production.

If the optional dependency HoloViews is installed, scatter_hv returns an interactive scatter plot. scatter_hv also includes an option to return a time series plot of power that is linked to the scatter plot, so points selected in the scatter plot are highlighted in the time series.

[13]:
# Interactive scatter plot with a linked time series of power
das.scatter_hv(timeseries=True)

In this example, we have multiple measurements of the same quantity from different sensors. In this case, a common first step is to compare the measurements from the different sensors and remove data for timestamps where they differ by more than an acceptable threshold. The filter_sensors method provides a convenient way to accomplish this task for the groups of measurements mapped to regression variables.
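A sketch of the comparison idea: keep a timestamp only when every sensor in a group reads within a relative tolerance of the group mean. The helper names and the 5% tolerance are assumptions for illustration; filter_sensors has its own defaults and comparison logic. The example rows reuse reference cell readings from cell [6].

```python
def sensors_agree(values, max_rel_diff=0.05):
    """True when every reading is within max_rel_diff of the mean reading."""
    mean = sum(values) / len(values)
    if mean == 0:
        # Avoid dividing by zero (e.g. night-time irradiance): agree only if all are zero.
        return all(v == 0 for v in values)
    return all(abs(v - mean) <= max_rel_diff * abs(mean) for v in values)


def filter_rows(rows, columns, max_rel_diff=0.05):
    """Keep only the rows whose readings in `columns` agree."""
    return [row for row in rows
            if sensors_agree([row[c] for c in columns], max_rel_diff)]


# Two timestamps of POA reference cell readings from cell [6] above.
rows = [
    {'time': '08:20', 'met1_poa_refcell': 534.845280, 'met2_poa_refcell': 534.812723},
    {'time': '08:30', 'met1_poa_refcell': 539.686886, 'met2_poa_refcell': 440.947099},
]
kept = filter_rows(rows, ['met1_poa_refcell', 'met2_poa_refcell'])
```

The 08:30 readings differ by roughly 100, about 10% from their mean, so only the 08:20 row is kept.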

[14]:
das.filter_sensors()

The get_summary method returns a DataFrame summarizing the filtering steps that have been applied, the arguments passed to them, the number of points remaining after each step, and the number of points each step removed.

[15]:
das.get_summary()
[15]:
pts_after_filter pts_removed filter_arguments
meas filter_sensors 1242 198 Default arguments

The filter_custom method provides a way to use your own filtering function within pvcaptest and update the summary data. It accepts any function or method that takes a DataFrame as its first argument and returns a filtered DataFrame with rows removed; passed methods can be user-defined functions or pandas DataFrame methods.

Below, we use filter_custom with the pandas DataFrame dropna method to remove rows with missing data and update the summary data.
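The bookkeeping that feeds get_summary can be sketched as a wrapper that counts points before and after each filter. The names apply_filter and drop_missing are hypothetical, rows are plain dicts rather than a DataFrame, and drop_missing stands in for DataFrame.dropna.

```python
def apply_filter(rows, summary, name, func, *args, **kwargs):
    """Apply func(rows, *args, **kwargs) and append a summary record."""
    before = len(rows)
    filtered = func(rows, *args, **kwargs)
    summary.append({'filter': name,
                    'pts_after_filter': len(filtered),
                    'pts_removed': before - len(filtered)})
    return filtered


def drop_missing(rows):
    """Stand-in for DataFrame.dropna: drop rows containing any None value."""
    return [row for row in rows if None not in row.values()]


# Toy rows; the second one has a missing POA value.
summary = []
rows = [{'poa': 500.0, 'power': 4000.0},
        {'poa': None, 'power': 3900.0},
        {'poa': 510.0, 'power': 4050.0}]
rows = apply_filter(rows, summary, 'filter_custom', drop_missing)
```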

[16]:
das.filter_custom(pd.DataFrame.dropna)

[17]:
das.get_summary()
[17]:
pts_after_filter pts_removed filter_arguments
meas filter_sensors 1242 198 Default arguments
filter_custom 1233 9 DataFrame.dropna, ,

The filter_irr method provides a convenient way to remove data based on the irradiance measurements. Here we use it to remove periods of low irradiance (below 200 W/m²); values greater than 2000 W/m² will also be removed, if present.

[18]:
das.filter_irr(200, 2000)

We can re-run the scatter_hv method to see the results of the filtering steps.

[19]:
das.scatter_hv()

The filter_outliers method uses scikit-learn’s EllipticEnvelope to remove outlier points. A future release will include a way to interactively select points to be removed.
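As a rough stand-in for the elliptic envelope, the sketch below flags points whose squared Mahalanobis distance from the mean exceeds a chi-square quantile. Two simplifications are worth naming: scikit-learn's EllipticEnvelope fits a robust (minimum covariance determinant) estimate and sets its cutoff from a contamination fraction, whereas this sketch uses the classical mean and covariance with an assumed 97.5% chi-square cutoff. Classical estimates can be pulled toward the very outliers they are meant to detect, which is why a robust fit is preferred in practice.

```python
import math


def mahalanobis_outliers(points, quantile=0.975):
    """Flag 2-D points whose squared Mahalanobis distance from the mean
    exceeds the chi-square (2 degrees of freedom) quantile.

    Uses the classical (non-robust) mean and covariance.
    Returns a list of booleans, True for outliers.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    threshold = -2.0 * math.log(1.0 - quantile)
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d using the explicit inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(d2 > threshold)
    return flags


# Toy (poa, power) data: 19 points near power = 8 * poa, plus one far below the line.
points = [(x, 8 * x + (20 if i % 2 == 0 else -20))
          for i, x in enumerate(range(100, 1001, 50))]
points.append((600, 1800))
flags = mahalanobis_outliers(points)
```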

[20]:
das.filter_outliers()
[21]:
das.scatter_hv()

The fit_regression method performs a regression on the data stored in data_filtered using the regression equation specified by ASTM E2848. The regression equation is stored in the regression_formula attribute, as shown below. Regressions are performed using the statsmodels package.

Below, we set the filter argument of the fit_regression method to True to remove time periods whose residuals are more than two standard deviations from the mean residual.

[22]:
das.regression_formula
[22]:
'power ~ poa + I(poa * poa) + I(poa * t_amb) + I(poa * w_vel) - 1'
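Written out, the formula above is the ASTM E2848 regression model, with P the power, E the POA irradiance, T_a the ambient temperature, v the wind speed, and a_1 to a_4 the fitted coefficients. Each I(...) term is a product of measured variables, and the trailing - 1 removes the intercept:

```latex
P = E\left(a_1 + a_2 E + a_3 T_a + a_4 v\right) = a_1 E + a_2 E^2 + a_3 E\,T_a + a_4 E\,v
```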
[23]:
das.fit_regression(filter=True, summary=False)
NOTE: Regression used to filter outlying points.


[24]:
das.get_summary()
[24]:
pts_after_filter pts_removed filter_arguments
meas filter_sensors 1242 198 Default arguments
filter_custom 1233 9 DataFrame.dropna, ,
filter_irr 418 815 200, 2000,
filter_outliers 401 17 Default arguments
fit_regression 378 23 filter: True, summary: False

Calculation of Reporting Conditions

The rep_cond method provides a variety of ways to calculate reporting conditions. rep_cond always calculates the reporting conditions from the data stored in the data_filtered attribute. Refer to the example notebook “Reporting Conditions Examples” for a thorough explanation of the rep_cond functionality. By default, the reporting conditions are calculated following the guidance of ASTM E2939-13.
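A minimal sketch of computing reporting conditions from filtered data. The specific statistics (the 60th percentile of POA irradiance and the means of ambient temperature and wind speed) are assumptions chosen for illustration; the defaults rep_cond actually uses are covered in the “Reporting Conditions Examples” notebook. The percentile interpolates linearly between ranked values, the same convention as numpy's default.

```python
import statistics


def percentile(values, pct):
    """Percentile by linear interpolation between ranked values (pct in 1..99)."""
    return statistics.quantiles(values, n=100, method='inclusive')[pct - 1]


def reporting_conditions(poa, t_amb, w_vel, poa_pct=60):
    """Illustrative reporting conditions: a POA percentile and the means of
    ambient temperature and wind speed (assumed statistics)."""
    return {'poa': percentile(poa, poa_pct),
            't_amb': statistics.fmean(t_amb),
            'w_vel': statistics.fmean(w_vel)}


# Toy filtered data.
rc = reporting_conditions(poa=[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                          t_amb=[20, 22, 24, 26], w_vel=[1, 2, 3])
```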

[25]:
das.rep_cond()
Reporting conditions saved to rc attribute.
          poa      t_amb     w_vel
0  768.613558  24.108496  1.991954

Previously we used the irradiance filter to filter out data below 200 W/m2. The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value. This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance.
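The two ways of calling filter_irr can be sketched as follows: with absolute bounds, or with bounds expressed as fractions of ref_val. The helper names are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def irr_bounds(low, high, ref_val=None):
    """Absolute (low, high) bounds; with ref_val, low and high are fractions of it."""
    if ref_val is not None:
        return low * ref_val, high * ref_val
    return low, high


def filter_irr_rows(rows, low, high, ref_val=None, col='poa'):
    """Keep rows whose irradiance lies within the bounds (inclusive, assumed)."""
    lo, hi = irr_bounds(low, high, ref_val)
    return [row for row in rows if lo <= row[col] <= hi]


# Toy rows; 768.613558 is the reporting irradiance from das.rc.
rows = [{'poa': 150.0}, {'poa': 450.0}, {'poa': 900.0}, {'poa': 1200.0}]
absolute = filter_irr_rows(rows, 200, 2000)
banded = filter_irr_rows(rows, 0.5, 1.5, ref_val=768.613558)
```

With ref_val=768.613558, the band 0.5 to 1.5 spans roughly 384 to 1153 W/m², so the 1200 W/m² row is dropped.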

[26]:
das.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
[27]:
das.scatter_hv()

The fit_regression method is used again with the default arguments, which fit the regression and print and store the results, but do not filter. The result of the regression is a statsmodels RegressionResultsWrapper object containing the regression coefficients and other information generated when performing the regression. This object is stored in the CapData regression_results attribute.

[28]:
das.fit_regression()
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                  power   R-squared (uncentered):                   1.000
Model:                            OLS   Adj. R-squared (uncentered):              1.000
Method:                 Least Squares   F-statistic:                          4.328e+05
Date:                Fri, 06 Feb 2026   Prob (F-statistic):                        0.00
Time:                        18:00:00   Log-Likelihood:                         -3710.1
No. Observations:                 298   AIC:                                      7428.
Df Residuals:                     294   BIC:                                      7443.
Df Model:                           4
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
poa             7757.9322     54.443    142.495      0.000    7650.784    7865.081
I(poa * poa)      -0.4610      0.033    -13.877      0.000      -0.526      -0.396
I(poa * t_amb)   -50.7082      2.302    -22.029      0.000     -55.238     -46.178
I(poa * w_vel)    13.1425      4.424      2.971      0.003       4.435      21.850
==============================================================================
Omnibus:                       51.658   Durbin-Watson:                   0.398
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               81.823
Skew:                          -1.017   Prob(JB):                     1.71e-18
Kurtosis:                       4.566   Cond. No.                     9.88e+03
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 9.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The regression coefficients and p-values for each term are available as the params and pvalues attributes of regression_results.

[29]:
das.regression_results.params
[29]:
poa               7757.932173
I(poa * poa)        -0.460987
I(poa * t_amb)     -50.708228
I(poa * w_vel)      13.142495
dtype: float64
[30]:
das.regression_results.pvalues
[30]:
poa               2.413291e-273
I(poa * poa)       5.054412e-34
I(poa * t_amb)     3.440838e-64
I(poa * w_vel)     3.217387e-03
dtype: float64

Load and Filter PVsyst Data

To load and filter the modeled data, often from PVsyst, we use the load_pvsyst function, which returns a CapData object with the PVsyst data loaded.

[31]:
sim = ct.load_pvsyst('./data/pvsyst_example_HourlyRes_2.CSV')
[32]:
sim.column_groups
[32]:
shade__:
    FShdBm
_inv_:
    EOutInv
irr_ghi_:
    GlobHor
pvsyt_losses__:
    IL Pmax
    IL Pmin
    IL Vmax
    IL Vmin
temp_mod_:
    TArray
wind__:
    WindVel
temp_amb_:
    T_Amb
index__:
    index
irr_poa_:
    GlobInc
real_pwr__:
    E_Grid
[33]:
sim.set_regression_cols(power='real_pwr__', poa='irr_poa_', t_amb='temp_amb_', w_vel='wind__')
[34]:
# sim.plot()
[35]:
# Overwrite sim.data_filtered with a copy of the original, unfiltered data
sim.reset_filter()

As a first step, we use the filter_time method to select a 60-day period of data centered on the test date, 10/11/1990, which falls in the middle of the measured data.
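A sketch of a centered time window, assuming the window runs days/2 before and after test_date with both endpoints included. That assumption is consistent with the 1441 hourly points (60 × 24 + 1) reported for filter_time in the summary below. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def time_window(test_date, days):
    """Return (start, end) for a window of `days` days centered on test_date."""
    center = datetime.strptime(test_date, '%m/%d/%Y')
    half = timedelta(days=days / 2)
    return center - half, center + half


def filter_time_rows(timestamps, test_date, days):
    """Keep timestamps inside the window, including both endpoints."""
    start, end = time_window(test_date, days)
    return [t for t in timestamps if start <= t <= end]


# One year of hourly timestamps, like an hourly PVsyst export.
year = [datetime(1990, 1, 1) + timedelta(hours=h) for h in range(8760)]
kept = filter_time_rows(year, '10/11/1990', days=60)
```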

[36]:
sim.filter_time(test_date='10/11/1990', days=60)
[37]:
sim.scatter_hv()
[38]:
sim.filter_irr(200, 930)
[39]:
sim.scatter_hv()
[40]:
sim.get_summary()
[40]:
pts_after_filter pts_removed filter_arguments
pvsyst filter_time 1441 7319 test_date: 10/11/1990, days: 60
filter_irr 397 1044 200, 930,

The filter_pvsyst method removes data for times when shade is present or when any of the IL Pmin, IL Vmin, IL Pmax, or IL Vmax output values are greater than 0.
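The row-level test can be sketched as below. It assumes FShdBm is the PVsyst beam shading factor, equal to 1.0 when there is no beam shading; the function name is hypothetical.

```python
IL_COLUMNS = ('IL Pmin', 'IL Vmin', 'IL Pmax', 'IL Vmax')


def keep_pvsyst_row(row):
    """True when there is no beam shading and no IL output is above 0.

    Assumes FShdBm is 1.0 for an unshaded hour.
    """
    unshaded = row['FShdBm'] == 1.0
    no_il_losses = all(row[col] <= 0 for col in IL_COLUMNS)
    return unshaded and no_il_losses


# Toy hourly rows: one clean hour, one shaded hour, one hour with IL Pmax losses.
clean = {'FShdBm': 1.0, 'IL Pmin': 0.0, 'IL Vmin': 0.0, 'IL Pmax': 0.0, 'IL Vmax': 0.0}
hours = [clean, {**clean, 'FShdBm': 0.8}, {**clean, 'IL Pmax': 12.5}]
kept = [h for h in hours if keep_pvsyst_row(h)]
```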

[41]:
sim.filter_pvsyst()
[42]:
sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
[43]:
sim.fit_regression()
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                  power   R-squared (uncentered):                   1.000
Model:                            OLS   Adj. R-squared (uncentered):              1.000
Method:                 Least Squares   F-statistic:                          2.609e+06
Date:                Fri, 06 Feb 2026   Prob (F-statistic):                        0.00
Time:                        18:00:00   Log-Likelihood:                         -3221.5
No. Observations:                 282   AIC:                                      6451.
Df Residuals:                     278   BIC:                                      6465.
Df Model:                           4
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
poa             7667.3056     15.520    494.023      0.000    7636.754    7697.858
I(poa * poa)      -0.8365      0.013    -64.861      0.000      -0.862      -0.811
I(poa * t_amb)   -31.3616      0.484    -64.768      0.000     -32.315     -30.408
I(poa * w_vel)    -1.2114      1.132     -1.070      0.286      -3.440       1.017
==============================================================================
Omnibus:                       24.535   Durbin-Watson:                   2.008
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                9.460
Skew:                          -0.176   Prob(JB):                      0.00883
Kurtosis:                       2.175   Cond. No.                     6.16e+03
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 6.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Results

The get_summary and captest_results_check_pvalues functions display the results of filtering on simulated and measured data and the final capacity test results comparing measured capacity to expected capacity, respectively.

[44]:
pvc.get_summary(das, sim)
[44]:
pts_after_filter pts_removed filter_arguments
meas filter_sensors 1242 198 Default arguments
filter_custom 1233 9 DataFrame.dropna, ,
filter_irr 418 815 200, 2000,
filter_outliers 401 17 Default arguments
fit_regression 378 23 filter: True, summary: False
rep_cond 378 0 Default arguments
filter_irr-1 298 80 0.5, 1.5, ref_val: np.float64(768.614)
fit_regression-1 298 0 Default arguments
pvsyst filter_time 1441 7319 test_date: 10/11/1990, days: 60
filter_irr 397 1044 200, 930,
filter_pvsyst 397 0 Default arguments
filter_irr-1 282 115 0.5, 1.5, ref_val: np.float64(768.614)
fit_regression 282 0 Default arguments
[45]:
pvc.captest_results_check_pvalues(sim, das, 6000, '+/- 7', print_res=True)
Using reporting conditions from das.

Capacity Test Result:         PASS
Modeled test output:          4816002.984
Actual test output:           4771008.279
Tested output ratio:          0.991
Tested Capacity:              5943.944
Bounds:                       5580.0, 6420.0


Using reporting conditions from das.

Capacity Test Result:         PASS
Modeled test output:          4817857.760
Actual test output:           4771008.279
Tested output ratio:          0.990
Tested Capacity:              5941.655
Bounds:                       5580.0, 6420.0


99.070% - Cap Ratio
99.030% - Cap Ratio after pval check
[45]:
  das_pvals sim_pvals das_params sim_params
poa 0.00000 0.00000 7,757.93217 7,667.30564
I(poa * poa) 0.00000 0.00000 -0.46099 -0.83655
I(poa * t_amb) 0.00000 0.00000 -50.70823 -31.36155
I(poa * w_vel) 0.00322 0.28555 13.14250 -1.21145
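The arithmetic behind these results can be reproduced from numbers printed in this notebook. The sketch below is an illustration, not pvcaptest's code: it evaluates each regression at the reporting conditions, takes the ratio of measured to modeled output, scales the expected capacity of 6000 by that ratio, and checks the result against the ‘+/- 7’ percent bounds. The p-value step assumes terms with a p-value above 0.05 are set to zero; this is consistent with the second set of results above, where only the simulated w_vel term (p ≈ 0.286) drops out. Because the printed coefficients are rounded, the outputs match the reported values only to within a few units.

```python
# Reporting conditions from das.rc and coefficients from the regression outputs
# above (sim values are rounded to 5 decimals in the results table).
RC = {'poa': 768.613558, 't_amb': 24.108496, 'w_vel': 1.991954}
MEAS = {'poa': 7757.932173, 'I(poa * poa)': -0.460987,
        'I(poa * t_amb)': -50.708228, 'I(poa * w_vel)': 13.142495}
SIM = {'poa': 7667.30564, 'I(poa * poa)': -0.83655,
       'I(poa * t_amb)': -31.36155, 'I(poa * w_vel)': -1.21145}
SIM_PVALS = {'poa': 0.0, 'I(poa * poa)': 0.0,
             'I(poa * t_amb)': 0.0, 'I(poa * w_vel)': 0.28555}


def predict(coef, rc):
    """Evaluate power ~ poa + poa^2 + poa*t_amb + poa*w_vel (no intercept)."""
    e = rc['poa']
    return (coef['poa'] * e
            + coef['I(poa * poa)'] * e * e
            + coef['I(poa * t_amb)'] * e * rc['t_amb']
            + coef['I(poa * w_vel)'] * e * rc['w_vel'])


def drop_insignificant(coef, pvals, alpha=0.05):
    """Zero out terms whose p-value exceeds alpha (alpha=0.05 is assumed)."""
    return {term: (value if pvals[term] <= alpha else 0.0)
            for term, value in coef.items()}


def capacity_test(meas_coef, sim_coef, rc, expected, tol_pct):
    """Return (result, actual, modeled, ratio, capacity, bounds)."""
    actual = predict(meas_coef, rc)
    modeled = predict(sim_coef, rc)
    ratio = actual / modeled
    capacity = expected * ratio
    bounds = (expected * (1 - tol_pct / 100), expected * (1 + tol_pct / 100))
    result = 'PASS' if bounds[0] <= capacity <= bounds[1] else 'FAIL'
    return result, actual, modeled, ratio, capacity, bounds


result, actual, modeled, ratio, capacity, bounds = capacity_test(MEAS, SIM, RC, 6000, 7)
checked = capacity_test(MEAS, drop_insignificant(SIM, SIM_PVALS), RC, 6000, 7)
```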

Run the cell below to produce a scatter plot overlaying the final measured and PVsyst data.

[46]:
%%opts Scatter (alpha=0.3)
%%opts Scatter [width=600]
das.scatter_hv().relabel('Measured') * sim.scatter_hv().relabel('PVsyst')