Statistical power is defined as the probability of detecting an effect, given that a true effect is present. An effect in this sense can be interpreted in a number of different ways and essentially represents some property of the data that one may be interested in. From a practical perspective, this is generally a trend over time, the impact of a particular driver variable, the difference between treatment groups in the population, or differences in responses over spatial domains. From a modelling perspective, it represents a particular coefficient (parameter) fitted to a corresponding covariate. In a standard linear regression with a single covariate, the effect is therefore given by the coefficient that represents the slope of the regression line. Understanding the power to detect such effects is important for understanding how robust any inference is. For example, if one were to fit a model to a data set and not find any significant effect of a covariate of interest, is that because there genuinely is no effect, or because the power to detect any effect was low, perhaps due to a small sample size? In the absence of any estimate of power it is impossible to untangle these possibilities. It is therefore desirable that any monitoring scheme has sufficiently high power, typically assumed to be of the order of 70-80%, to detect any effects. This ensures that any inference is robust: we can be more confident of detecting effects that are truly there, and correspondingly more confident that when we do not find an effect, its absence is genuine.
This tool is designed to be used to simulate data under different survey designs and expected change scenarios and then use this to calculate the expected power of the specified survey design in detecting the specified change. For a full description of how this tool works see the Explanation tab, for a worked example see the Example tab or to get started on using the app go to either the Site-based Data tab or the Data from Individuals tab.
This power analysis tool is based on a simulation approach, whereby pseudo data sets are generated according to a hypothetical design, a realistic distribution and some effect (which may be a trend over time, a difference between groups or a driver effect). These pseudo data are then analysed as if they had been obtained in reality, and a formal statistical test is conducted to assess whether the effect (trend or treatment) is significant. This process is repeated many times, storing the number of times that the effect was detected.
This is therefore based on the premise of simulating data that represents the particular metrics under consideration and the different scenarios relating to the monitoring intensity and extent as well as a hypothetical change to detect. Data are simulated using existing data sources to understand the distribution and variability over space and time of particular metrics. Critically, existing data are used to establish the key properties and parameters of the distribution of the data to enable realistic simulations to be derived. Important parameters to be established include the shape of the distribution, any upper or lower limits, variation over space, and variation over time. From these properties, pseudo data sets are generated according to the spatial and temporal monitoring scenarios in a parametric bootstrap approach.
The following steps indicate the processes involved in the proposed power analysis:
1. Use existing data, reviewed and quality assured to check for outliers and anomalies, to establish parameters for data distribution
2. Define monitoring intensity and extent (e.g. number of individuals per year, how many years)
3. Define hypothetical effect to detect (e.g. a change over time of a specific percentage rate)
4. Simulate data to produce a pseudo data set over the required length of time with the required sample size and effect size
5. Analyse the data using a modelling approach to estimate effect size and significance. Any dependence structure present within the data, such as repeat observations over time or spatial correlation, will be accounted for within this step, as will the data type (e.g. counts, proportions or concentrations).
6. Store the results from the fitted model and repeat steps 4 and 5.
7. Statistical power is then given by the proportion of the results stored in step 6 that indicate a significant effect.
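As a concrete illustration, the loop in steps 4-7 can be sketched in a few lines. This is a minimal sketch, not the app's actual code: it assumes a simple normal response with a linear trend tested by ordinary least-squares regression, and all parameter values (`n_per_year`, `sd`, and so on) are placeholder assumptions.

```python
import numpy as np
from scipy import stats

def estimate_power(n_per_year=30, n_years=8, trend=0.05,
                   sd=0.5, n_sims=500, alpha=0.05, seed=1):
    """Simulate pseudo data sets with a known linear trend (step 4),
    test the trend (step 5) and count detections (steps 6-7)."""
    rng = np.random.default_rng(seed)
    years = np.repeat(np.arange(n_years), n_per_year)
    detected = 0
    for _ in range(n_sims):
        # Step 4: pseudo data = linear trend + residual noise
        y = trend * years + rng.normal(0, sd, size=years.size)
        # Step 5: fit a simple linear regression and test the slope
        res = stats.linregress(years, y)
        # Step 6: store whether the effect was detected
        if res.pvalue < alpha:
            detected += 1
    # Step 7: power = proportion of significant results
    return detected / n_sims

power = estimate_power()
print(power)
```

Increasing the effect size, the sample size or the survey length pushes the estimated power towards 1, which is exactly the trade-off the app lets you explore.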
To simulate realistic data, or data that conforms to a particular hypothetical scenario, there are a number of features of the data that we need to define.
We start by considering the structure of the survey design. The two major components of the design are how many observations there are and what time period is being considered. For individual-based observations, this is simply the number of individuals observed per year and the total number of years of the (hypothetical) survey.
For site-based data, there is another aspect. There is the number of sites per year and the total time period (as with the individual case), but there is also the possibility of taking multiple measurements (often called replicates) from the same site. This is shown below.
The number of replicate observations taken at a particular site can be defined below and will define the default value in the power analyses tab for site-based data.
For site-based data, where the same unit (e.g. the site itself) can be repeatedly sampled, we also need to define how often samples are taken. In other words, how often we expect sites to be revisited. In simple cases, this may just be every year, but there may be other designs whereby each site is only repeated every few years but with some sampling of sites undertaken in every year of survey. This is shown in the figure below and is often referred to as a rotating panel design, where the panels in this sense represent a particular subset of sites.
In this example sites are revisited every 5 years - shown by the same subset of points coloured 5 years apart - but an equal number of sites is still monitored each year. The total number of unique sites is therefore given by the number of sites per year multiplied by the length of time between revisits. In this case, 20 sites in each year multiplied by a 5 year repeat frequency equals 100 unique sites.
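The rotating panel arithmetic can be sketched as follows. The function name and the simple modulo assignment of sites to panels are illustrative assumptions, not the app's internal logic.

```python
def panel_schedule(sites_per_year, revisit_every, n_years):
    """Return (schedule, n_unique) for a rotating panel design:
    each site is revisited every `revisit_every` years, so the
    pool of unique sites is sites_per_year * revisit_every."""
    n_unique = sites_per_year * revisit_every
    schedule = {}
    for year in range(n_years):
        panel = year % revisit_every  # which subset is surveyed this year
        start = panel * sites_per_year
        schedule[year] = list(range(start, start + sites_per_year))
    return schedule, n_unique

schedule, n_unique = panel_schedule(20, 5, 10)
print(n_unique)                     # 100 unique sites
print(schedule[0] == schedule[5])   # same panel revisited 5 years apart: True
```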
How often sites are revisited in this manner can be defined below (in years) and will define the default value in the power analyses tab for site based data. A value of 1 corresponds to the same sites being surveyed every year, a value of 5 has a 5 year delay between repeat samples as shown in the figure above.
Whilst the concept of repeat visits does not make sense when considering data from individuals (the same individual cannot be sampled more than once), we do consider the possibility that observations may not be taken every year.
We have therefore included a scenario that allows the user to specify the intensity of monitoring of individuals, be that every year or every few years. The figure below shows an example of taking measurements from individuals every 2 years.
This enables a comparison between scenarios of, for example, twice the number of samples every two years versus half the number of samples every year.
How often individuals are sampled in this manner can be defined below (in years) and will define the default value in the power analyses tab for data from individuals. A value of 1 corresponds to sampling individuals every year, whereas a value of 2 assumes samples are only taken every other year, as shown in the figure above.
The particular effect that we are interested in these series of power analyses is a trend over time.
The power analysis is therefore the power to detect a change in the response over time. The change is assumed to be a linear trend, and when generating the pseudo data sets we specify the hypothetical year-on-year change in the response metric, then test whether it can be significantly detected.
There are a number of key aspects to the site-based data simulated within the application. The structure follows the assumption that there are a number of sites monitored over time, within which a number of replicate samples are taken, and that the sites are repeatedly surveyed throughout the survey according to a specified frequency. The data are assumed to be normally distributed on the log scale, with the variance of this distribution representing the residual error in the data. The mean of this distribution is defined in a hierarchical manner, with each replicate sample having its own mean value, which depends on the site-level mean. This is shown diagrammatically in the figure below. Four parameters are used to simulate the data: an overall average site value (shown in orange); between-site variation around this, giving each site its own mean (blue); between-replicate variation distributing repeat observations around the corresponding site mean (grey); and, once the time trend (effect) has been imposed on the data (red dots), residual error around the trend (black squares). This structure was established based on a key exemplar data set.
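A rough sketch of this four-parameter hierarchy is given below, assuming normality on the log scale. The function and all parameter values are illustrative stand-ins for the app's actual generator.

```python
import numpy as np

def simulate_site_data(n_sites=20, n_reps=3, n_years=5, trend=0.1,
                       overall_mean=2.0, sd_site=0.5, sd_rep=0.3,
                       sd_resid=0.2, seed=1):
    """Hierarchical simulation on the log scale:
    overall mean -> site means -> replicate means -> trend -> residual."""
    rng = np.random.default_rng(seed)
    # between-site variation around the overall average site value
    site_means = overall_mean + rng.normal(0, sd_site, n_sites)
    rows = []
    for s, mu_s in enumerate(site_means):
        # each replicate within a site has its own mean
        rep_means = mu_s + rng.normal(0, sd_rep, n_reps)
        for r, mu_r in enumerate(rep_means):
            for year in range(n_years):
                # impose the trend, then add residual error
                value = mu_r + trend * year + rng.normal(0, sd_resid)
                rows.append((s, r, year, value))
    return rows

data = simulate_site_data()
print(len(data))  # 20 sites x 3 replicates x 5 years = 300 rows
```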
Within the site-based data we consider responses that can be either continuous or binary. A continuous variable is shown in the graph above, while an example of a binary variable is whether the concentration of some pollutant is above or below some detection limit. Whether the variable is continuous or binary has implications for how we interpret the effect size/year-on-year change. For continuous variables the effect size is simply the increase in the mean per year. For binary variables, however, the effect size is more complicated: it acts on the log-odds scale rather than directly on the percentage of sites in each category, which means a given effect size will lead to differing impacts depending on the initial conditions. Use the controls below to see what change a given effect size causes for different starting percentages.
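The log-odds behaviour can be made concrete with a short calculation that mirrors those controls. The function below is a hypothetical helper, not part of the app: it applies a yearly effect on the log-odds scale and converts the result back to a percentage.

```python
import math

def percentage_after(start_pct, effect_per_year, n_years):
    """Apply a yearly change on the log-odds scale and convert back
    to a percentage. The same effect size moves different starting
    percentages by different amounts."""
    logit = math.log(start_pct / (100 - start_pct))
    logit += effect_per_year * n_years
    return 100 / (1 + math.exp(-logit))

print(round(percentage_after(50, 0.2, 5), 1))  # 73.1
print(round(percentage_after(10, 0.2, 5), 1))  # 23.2
```

The same total change on the log-odds scale (here 0.2 per year for 5 years) moves a 50% starting value by about 23 percentage points but a 10% starting value by only about 13.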
Data from individuals
The type of data considered is assumed to come from individual animals opportunistically sampled over time. There is therefore no structure within the design such as repeated observations or so-called nested values, such as replicates within a site. It is therefore reasonable to assume, in this case, that observations are independent. There are just three components to the simulated data: the mean (red line in the figure below) and variance (the spread of the histogram below) of the concentrations across individuals, plus an additional residual variation term which adds some random noise around the imposed trend (the effect of interest) over time. Concentrations are assumed to follow a log-normal distribution.
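A minimal sketch of this individual-based generator is given below, assuming a log-normal response (normal on the log scale) with a linear trend imposed on the log-scale mean. The function and all parameter values are illustrative placeholders.

```python
import numpy as np

def simulate_individuals(n_per_year=30, n_years=8, trend=0.05,
                         mean_log=1.0, sd_resid=0.4, seed=1):
    """Independent individuals sampled each year; concentrations are
    log-normal, i.e. normal on the log scale, with a linear trend
    imposed on the log-scale mean."""
    rng = np.random.default_rng(seed)
    years = np.repeat(np.arange(n_years), n_per_year)
    log_conc = mean_log + trend * years + rng.normal(0, sd_resid, years.size)
    return years, np.exp(log_conc)  # back-transform to concentrations

years, conc = simulate_individuals()
print(conc.shape)  # (240,) - 30 individuals x 8 years
```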
On this page we go through two examples to demonstrate different usages of the app. The first uses site-based data, where we try to find what change over time can currently be detected by an existing monitoring scheme. The second uses data from individuals, where we try to design a scheme based on the effect size we want to detect.
Example of site-based data
Here we use an example from a hypothetical existing monitoring scheme where we are interested in what kinds of effects we can detect given the existing sampling intensity.
For this analysis we are looking at 5 years of survey, and say there are 100 sites visited per year, with only 1 replicate per site and where each site is visited every year. We want to look at a range of potential change values so we click the 'Multiple Change Scenarios' button and drag the range to go from 0.05 to 0.25. We also want to run a reasonable number of simulations so we can be more confident in the estimated power so we set the number of simulations to 200. See the image on the right to see what the selection panel looks like after all this.
We specify the input parameters according to data observed from the scheme to date, and simulate binary data representing observations above or below a specified threshold:
And now we are ready to click 'Update Analysis'!
This now takes a little while to calculate the results.
The top right figure on the page (and the figure to the left below) represents a visualisation of a single dataset that follows the rules we set out in the left-hand panel. Note that the effect size used within this figure (and also within the table) is the change over time in the box, not from the range of the Multiple Change Scenario slider. Here we can see the trend in the mean (plus standard error) over time in black, and individual sites are represented as coloured dots and lines. To make it clearer what is going on we can double click on one of the sites in the legend and the rest of the sites disappear - shown in the figure to the right for site 3. We can see that the site goes between having measurements above and below the detection limit over time.
What we are particularly interested in is what kind of change over time we can reliably detect with this data, and that is shown in the figure below the table:
We can see that this survey design can detect a change over time above around 0.2 per year around 80% of the time. Because this is binary data, unlike continuous data this 0.2 per year does not translate simply into a percentage change, but we can use the widget below to show the change we would expect based on the initial starting percentage of 16% (given in the parameter values). As this power analysis works for both positive and negative change, we can say that 80% of the time we would detect either a drop from 16% to 7% of sites being under the limit or an increase from 16% to 34% of sites being over the limit.
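The figures quoted above can be checked with the same log-odds arithmetic: a change of 0.2 per year over 5 years corresponds to a total log-odds shift of plus or minus 1.0. The function name is an illustrative assumption.

```python
import math

def shift_percentage(start_pct, total_logodds_change):
    """Shift a percentage on the log-odds scale and convert back."""
    logit = math.log(start_pct / (100 - start_pct)) + total_logodds_change
    return 100 / (1 + math.exp(-logit))

# 0.2 per year over 5 years = a total log-odds change of +/- 1.0
print(round(shift_percentage(16, +1.0)))  # 34
print(round(shift_percentage(16, -1.0)))  # 7
```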
Example using data from individuals
We can use this app not only to see the effect size that can be detected by a given survey design but also to see how we should design a survey to detect an effect we are interested in. Here we will use the tab titled 'Data from Individuals' to investigate how we might try to detect a change in the response of 0.05 per year - this is in log concentration of the pollutant in our case but could be any continuous variable.
We know that we have 8 years in which to detect this change and that we have variable levels of funding available to test individuals for the pollutant so we can try multiple individual scenarios. We set up the controls such that we run multiple individual scenarios from 10 to 70 and examine the plot output under the table:
From this we can see that we have to test over 50 individuals per year over the 8 year period in order to detect a change of 0.05 80% of the time.
Unfortunately, testing 50 individuals a year for 8 years (a total of 400 individuals) will exceed our budget. However, we know that we have individuals in the freezer that we could test now. We have 5 individuals collected at 2 year intervals for the past 7 years (a total of 20 individuals). So we can set up the app to account for this historic data, by selecting 'include historic data' and including a change in design at year 0. We can switch from sampling every two years to every year and up the number of samples taken. We can't use the multiple individual scenarios option to do this but we can do it manually by trying out a few different options ranging from 10 to 30 samples per year within the new survey. An example of how we set this up is shown to the right.
The results of our testing are shown in the table below; we have used the interactive sorting option to order the rows from highest to lowest power.
From these results we can see that adding the historic data allows us to detect change of 0.05 units 80% of the time just by sampling 25 individuals per year from now on - a total of 220 individuals to test, and a saving of around 45%.
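The arithmetic behind this saving can be checked directly; the sampling counts are taken from the example above.

```python
# Design without historic data: 50 individuals/year for 8 years
without_historic = 50 * 8
# Historic samples already in the freezer: 5 individuals at 2-year
# intervals over the past 7 years = 4 collection occasions
historic = 5 * 4
# New design: the historic samples plus 25 individuals/year for 8 years
with_historic = historic + 25 * 8
saving = 1 - with_historic / without_historic

print(with_historic)        # 220 individuals to test
print(round(saving * 100))  # 45 (% saving)
```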