{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example Capacity Test using pvcaptest\n",
    "This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the pvcaptest package.\n",
    "\n",
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# import captest as pvc\n",
    "import captest as ct\n",
    "from captest import capdata as pvc\n",
    "from bokeh.io import output_notebook, show\n",
    "\n",
    "# uncomment below two lines to use cptest.scatter_hv in notebook\n",
    "import holoviews as hv\n",
    "from holoviews import opts\n",
    "hv.extension('bokeh')\n",
    "\n",
    "#if working offline with the CapData.plot() method may fail\n",
    "#run 'export BOKEH_RESOURCES=inline' at the command line before\n",
    "#running the jupyter notebook\n",
    "\n",
    "output_notebook()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load and Plot Measured Data\n",
    "\n",
    "We begin by using the `load_data` function, which reads the file(s) specified by the `path` argument and returns an instance of the `CapData` class.  In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.\n",
    "\n",
    "When given the path to a file, as shown here, `load_data` will try to read that file. If you pass a path to a directory, `load_data` will look for and attempt to load all files ending with '.csv' in the specified directory. Other file types can also be loaded by passing your own function to the `file_reader` argument and including the extension (e.g. 'xlsx') as a kwarg."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das = ct.load_data('./data/example_measured_data.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `load_data` method loads the data into a pandas DataFrame, which it assigns to the `data` attribute of the `CapData` object.  Here we use the pandas DataFrame `head` method to return the first three rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.data.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to loading data, by default the `load_data` function attempts to parse the column headers and group the columns based on the type of measurement recorded in each column.  For each inferred measurement type, `group_columns` creates an abbreviated name and a list of columns that contain measurements of that type. The python dictionary created by `group_columns` is stored in the `column_groups.data` attribute. `column_groups` is a dictionary that display nicely and includes the groups as attibutes for easy access as shown below. If the column grouping returned is not correct, you can provide either your own function to group the columns or a yaml, json, or excel file mapping column group identifiers to the column headings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.column_groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When working in an environment with table completion, like a Jupyter notebook, the group attributes are easy to access without needing to remember the group name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.column_groups.irr_poa_pyran"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The CapData has the methods`loc` and `floc` to select subsets of columns from the `data` and `data_filtered` DataFrames, respectively. These methods allow easy access to the groups of columns identified in `column_groups` using the `column_group` keys, column names, `regression_cols` keys, or a combination of the three. The `regression_cols` attribute is introduced below.  The `column_groups` dictionary also enables much of the functionality of `CapData` methods to perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions.\n",
    "\n",
    "Using the `loc` method with the 'irr_poa_ref_cell' attribute key of `column_groups` to select data from the POA reference cell columns in the `data` DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.loc[das.column_groups.irr_poa_ref_cell].iloc[100:103, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Accessing the two irradiance columns of the 'irr_poa_pyran' group, the single column of the '_mtr_' group, and the 'met1_amb_temp' column of the `data_filtered` DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.floc[['irr_poa_pyran', '_mtr_', 'met1_amb_temp']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For datasets that have multiple measurements of the same value, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data.  The `agg_sensors` method provides a convient way to do this for all the groups of measurements in `column_groups` in one step.\n",
    "\n",
    "The desired aggregations are specified by passing a dictionary to the `agg_map` argument where the keys are groups from `column_groups` and the values are aggregation functions.  Here we are using string functions that are recognized by pandas.  Most of the common aggregation functions (mean, median, max, sum, min, etc.) are available as string functions.  If you would like to apply a different aggregation function, please refer to the pandas documentation for `DataFrame.agg`. By default, the `agg_sensors` method adds a new column to the dataframe in the `data` attribute for the results of each aggregation and copies over the `data_filtered` attribute with the new dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.agg_sensors(agg_map={'_inv_':'sum', 'irr_poa_pyran':'mean', 'temp_amb_':'mean', 'wind__':'mean'}, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unless using a pre-defined test setup, pvcaptest does not attempt to determine which columns of data or groups of columns are the data to be used in the regressions. The link between regression variables and the imported data is made by a dictionary stored in the `regression_cols` attribute. This dictionary is also used to define any aggregations necessary to join multiple columns into a single column. Prior to `v0.15.0` the aggregation step was performed with the `agg_sensors` method. For `v0.15.0` on the aggregation step is performed by the `process_regression_columns` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.regression_cols = {\n",
    "    'power': '_mtr_',\n",
    "    'poa': ('irr_poa_pyran', 'mean'),\n",
    "    't_amb': ('temp_amb_', 'mean'),\n",
    "    'w_vel': ('wind__', 'mean')\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.process_regression_columns()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `process_regression_columns` method updates the regression columns dictionary to point to the aggregated columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.regression_cols"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the regression columns are set, the `loc` or `floc` methods will return the data for each type of sensor identified in the `column_groups` attribute. Because we've run the `process_regression_columns` accessing the `poa` data with the `loc` now returns the aggregated result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.loc['poa'].iloc[100:103, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot method creates a dashboard with a group of time series plots that are useful for visually inspecting the imported data.\n",
    "\n",
    "`plot` uses the structure of the `column_groups` attribute to create a layout of plots. A single plot is generated for each measurement type and each column with measurements of that type are plotted as a separate line on the plot. In this example there are two different weather stations, which each have pyranometers measuring plane of array and global horizontal irradiance. This arrangement of sensors results in two plots which each have two lines.\n",
    "\n",
    "Note, the full functionality of the dashboard requires a live notebook. Try installing to run or using the launch binder button at the top of the page. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "combine = {'inv_sum_mtr_pwr': ['mtr', 'inv.*agg'], 'irr_all':['irr_poa', 'irr_ghi'], 'temp_all':['temp_amb', 'temp_mod']}\n",
    "default_groups = ['inv_sum_mtr_pwr', 'irr_all', 'temp_all']\n",
    "das.plot(combine=combine, default_groups=default_groups, width=900)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering Measured Data\n",
    "The `CapData` class provides a number of convience methods to apply filtering steps as defined in ASTM E2848.  The following section demonstrates the use of the more commonly used filtering steps to remove measured data points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment and run to copy over the filtered dataset with the unfiltered data.\n",
    "das.reset_filter()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common first step is to review the scatter plot of the POA irradiance against the power production.  \n",
    "\n",
    "If you have the optional dependency Holoviews installed, `scatter_hv` will return an interactive scatter plot.  Additionally, `scatter_hv` includes an option to return a timeseries plot of power that is linked to the scatter plot, so points selected in the scatter plot will be highlighted in the time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment the below line to use scatter_hv with linked time series\n",
    "das.scatter_hv(timeseries=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter_custom` method provides a way to use your own filtering method within captest and update the summary data.  The `filter_custom` method allows passing any function or method that takes a DataFrame as the first argument and returns a filtered dataframe with rows removed.  Passed methods can be user-defined functions or Pandas DataFrame methods.\n",
    "\n",
    "Below, we use the `filter_custom` method with the pandas DataFrame `dropna` method to removing missing data and update the summary data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.filter_custom(pd.DataFrame.dropna)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `get_summary` method will return a dataframe summarizing the filtering steps that have been applied, the agruments passed to them, the number of points prior to filtering, and the number of points after filtering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.get_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter_irr` method provides a convient way to remove remove data based on the irradiance measurments.  Here we use it to remove periods of low irradiance. Values greater than 2000 W/m<sup>2</sup> will also be removed, if present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.filter_irr(200, 2000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.get_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can re-run the `scatter` method to see the results of the filtering steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.scatter_hv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter_outliers` method uses scikit-learn's elliptic envelope to remove outlier points.  A future release will include a way to interactively select points to be removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.filter_outliers()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.scatter_hv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `fit_regression` method performs a regression on the data stored in `data_filtered` using the regression equation specified by the standard.  The regression equation is stored in the `regression_formula` attribute as shown below.  Regressions are performed using the statsmodels package.\n",
    "\n",
    "Below, we set the filter argument of the `fit_regression` method to `True` to remove time periods when the residual exceeds two standard deviations of the mean residual."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.regression_formula"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.fit_regression(filter=True, summary=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.get_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "____\n",
    "#### Calculation of Reporting Conditions\n",
    "\n",
    "The `rep_cond` method provide a variety of ways to calculate reporting conditions.  Using `rep_cond` the reporting conditions are always calculated from the data store in the `data_filtered` attribute.  Refer to the example notebook \"Reporting Conditions Examples\" for a thourough explanation of the `rep_cond` functionality.  By default the reporting conditions are calcualted following the guidance of ASTM E2939-13."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.rep_cond()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Previously we used the irradiance filter to filter out data below 200 W/m<sup>2</sup>.  The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value.  This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance. Passing the string `rep_irr` to the key word argument (kwarg) `ref_val` uses the reporting POA irradiance stored int the `rc` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.filter_irr(0.5, 1.5, ref_val='rep_irr')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.scatter_hv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `fit_regression` method is used again with the default arguments, which result in fitting the regression, printing and storing the results, but not filtering. The result of the regression is a statsmodels `RegressionResultsWrapper` object containing the regression coefficients and other information generated when performing the regression.  This object is stored in the CapData `regression_results` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.fit_regression()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The regression coefficients and p-values for each term are attributes available in the `regression_results`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.regression_results.params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "das.regression_results.pvalues"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load and Filter PVsyst Data\n",
    "\n",
    "To load and filter the modeled data, often from PVsyst, we use the `load_pvsyst` method, which returns a CapData object with the pvsyst data loaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim = ct.load_pvsyst('./data/pvsyst_example_HourlyRes_2.CSV')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.column_groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.set_regression_cols(power='real_pwr__', poa='irr_poa_', t_amb='temp_amb_', w_vel='wind__')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sim.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write over cptest.flt_sim dataframe with a copy of the original unfiltered dataframe\n",
    "sim.reset_filter()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a first step we use the `filter_time` method to select a 60 day period of data centered around the measured data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.filter_time(test_date='10/11/1990', days=60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.scatter_hv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.filter_irr(200, 930)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.scatter_hv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.get_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter_pvsyt` method removes data for times when shade is present or the `IL Pmin`, `IL Vmin`, `IL Pmax`, `IL Vmax` output values are greater than 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.filter_pvsyst()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim.fit_regression()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results\n",
    "\n",
    "**v0.15.0 API Changes** - The `capdata.get_summary` function has been removed and replaced by the `captest.CapTest.get_summary` method. \n",
    "\n",
    "The `get_summary` and `captest_results_check_pvalues` functions display the results of filtering on simulated and measured data and the final capacity test results comparing measured capacity to expected capacity, respectively.\n",
    "\n",
    "For this example, which does not use a CapTest instance, you can instantiate one to call these results methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pvc.get_summary(das, sim) # As of v0.15.0 use equivalent method on CapTest class or pd.concat\n",
    "\n",
    "# concat option\n",
    "# pd.concat([\n",
    "#     das.get_summary(),\n",
    "#     sim.get_summary(),\n",
    "# ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts = ct.CapTest(meas=das, sim=sim, ac_nameplate=6_000, test_tolerance='+/- 7')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.get_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.captest_results_check_pvalues(print_res=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overlaying scatter plots from the measured and PVsyst data. This plot can be generated using `CapTest.overlay_scatters` when running a test using an instance of `CapTest`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "(\n",
    "    das.scatter_hv().relabel('Measured') *\n",
    "    sim.scatter_hv().relabel('PVsyst')\n",
    ").opts(\n",
    "    opts.Scatter(alpha=0.3, width=600)\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}