In statistics, sample data is often used to estimate population parameters. Common parameters that experimenters try to estimate include population means, standard deviations, and proportions. A confidence interval is an interval estimate of such a parameter.
What Is a Confidence Interval?
Sample statistics (or point estimates), such as the mean, standard deviation, or proportion, are used to make inferences about a population based on a random sample from that population. A point estimate is unlikely to equal the population parameter it estimates exactly, but it should be close. A confidence interval is a range around the point estimate, constructed so that a specified share of such intervals, computed from repeated random samples, would contain the population parameter; that share is typically 0.95, giving a 95% confidence interval. The confidence interval is more informative than the point estimate alone because it indicates the range in which the population parameter plausibly lies.
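The repeated-sampling idea can be illustrated with a small simulation. The sketch below is only an illustration and is not part of STATISTICA: it assumes a normally distributed population with a known standard deviation, and the function names and the values 50, 10, and 30 are hypothetical choices made for the example. It draws many samples, builds a 95% interval for the mean from each, and reports how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval_for_mean(sample, sigma, conf_level=0.95):
    """Interval for a mean when the population sd (sigma) is assumed known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    center = fmean(sample)
    half_width = z * sigma / sqrt(len(sample))
    return center - half_width, center + half_width


def coverage_rate(true_mean=50.0, sigma=10.0, n=30, reps=2000, seed=0):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lower, upper = z_interval_for_mean(sample, sigma)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"Intervals containing the true mean: {rate:.1%}")
```

The printed share should land near 95%, with some variation from run to run if the seed is changed.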
Confidence Intervals for Single Means and Standard Deviations in STATISTICA
In STATISTICA, you can use the Descriptive Statistics analysis available via the Basic Statistics module to find confidence intervals for a single mean or single standard deviation. To access this analysis, first open a data file, and then select the Statistics tab. In the Base group, click Basic Statistics.
In the Basic Statistics and Tables Startup Panel, select Descriptive Statistics and click OK to display the Descriptive Statistics dialog box. The options for the confidence intervals for the mean and standard deviation are on the Advanced tab. You can specify the confidence level for each via the respective Interval edit box.
Click the Summary button to produce the requested statistics, which include these confidence intervals.
Using STATISTICA to Find a Confidence Interval for a Single Proportion
The Descriptive Statistics analysis is useful for summarizing continuous data. Proportions, however, are based on counts rather than continuous measurements. Tools such as Frequency Tables and Tables and Banners can find proportions, and you can find a confidence interval for a single proportion using the Power Analysis module. This module is often used to calculate statistical power for a given analysis, or the sample size required to attain a certain power level, but it can also calculate, for a given analysis type, specialized confidence intervals that are not generally available in general-purpose statistical packages.
Confidence Interval for a Single Proportion Example
In this example, researchers took a sample of 500 randomly selected subjects who had completed four years of college. They found that 75 of them smoked on a regular basis. Thus, the sample proportion (often designated as p̂) of four-year college graduates who smoked is 75/500 = 0.15 (or 15%). To estimate the true proportion (usually designated as p) of people with a four-year college education who smoke, we can construct a confidence interval for the proportion.
The simplest and most commonly used formula for this type of confidence interval relies on approximating the binomial distribution with a normal distribution (the proportion is binomial because each person sampled either smoked or did not smoke). The formula is:
p̂ - z₁-α⁄2*sqrt[p̂(1-p̂)/n] < p < p̂ + z₁-α⁄2*sqrt[p̂(1-p̂)/n]
where n is the sample size and z₁-α⁄2 is the 1-α⁄2 quantile (the 100(1-α⁄2)th percentile) of the standard normal distribution; α is the Type I error rate and is the complement of the confidence level. Thus, for a 95% confidence level, α is 5% or 0.05.
This z-score can be calculated within STATISTICA. On the Statistics tab in the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Probability calculator.
Click OK to display the Probability Distribution Calculator.
In the Distribution field, select Z (Normal). Select the Inverse, Two-tailed, and (1-cumulative p) check boxes. We are using α = 0.05, so enter this value for p. Click the Compute button to calculate the z critical value (which is given in the X edit field). It is found to be 1.959964, which is commonly rounded to 1.96.
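For readers who want to cross-check this critical value outside STATISTICA, the following minimal sketch uses Python's standard library. The variable names are illustrative only; `NormalDist().inv_cdf` returns the quantile of the standard normal distribution for a given cumulative probability.

```python
from statistics import NormalDist

alpha = 0.05  # complement of the 95% confidence level

# Two-tailed critical value: the 1 - alpha/2 quantile of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit, 6))  # 1.959964
```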
Thus, the confidence interval for the true proportion is 0.15 - 1.96*sqrt[(0.15)(0.85)/500] < p < 0.15 + 1.96*sqrt[(0.15)(0.85)/500], which works out to 0.11870131 < p < 0.18129869.
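The hand calculation can be reproduced with a short script. This is a sketch under the same normal-approximation assumption, not STATISTICA's own code; the function name `wald_interval` is a hypothetical label for the formula used here. Because it uses the unrounded z critical value, the result differs from the hand calculation only in the later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, conf_level=0.95):
    """Normal-approximation confidence interval for a single proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 75 smokers among 500 sampled college graduates, 95% confidence.
lower, upper = wald_interval(75, 500)
print(f"{lower:.4f} < p < {upper:.4f}")  # 0.1187 < p < 0.1813
```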
Finding the Confidence Interval in STATISTICA
As previously mentioned, we can find this same confidence interval for a single proportion using the Power Analysis module in STATISTICA.
With any data file opened, select the Statistics tab. In the Advanced/Multivariate group, click Power Analysis. In the Power Analysis and Interval Estimation Startup Panel, select Interval Estimation as the analysis category, and then select One Proportion, Z, Chi-Square Test as the analysis type.
In the Single Proportion: Interval Estimation dialog box, enter 0.15 for Observed Proportion p, 500 for Sample Size (N), and 0.95 for Conf. Level.
Click Compute to calculate the confidence interval.
The Pi (Crude) results should match what was calculated earlier by hand, because both use the normal approximation to the binomial distribution. The hand calculation may differ slightly in the later decimal places because the z critical value was rounded to 1.96; STATISTICA carries the value to more decimals for better accuracy.
The results in the Interval Estimation spreadsheet also include two other ways to calculate the confidence interval for a proportion: Pi (Exact), which gives the "exact, Clopper-Pearson" confidence interval, and Pi (Approximate), which employs a score method with a continuity correction. For more information on how these two methods are computed, see methods 4 and 5 from Robert Newcombe’s paper, Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods (1998, Statistics in Medicine, 17, 857-872).
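To give a feel for how these two alternatives work, here is a minimal sketch of both for the smoking example. It is an assumption-laden illustration, not STATISTICA's implementation: the Clopper-Pearson limits are found by bisection on binomial tail probabilities, the score interval uses the continuity-corrected formula as given in Newcombe's paper, and all function names are hypothetical. Results may differ from the STATISTICA spreadsheet in the later decimal places.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), in log space (needs 0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def upper_tail(x, n, p):
    """P(X >= x)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


def lower_tail(x, n, p):
    """P(X <= x)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))


def bisect_root(f, lo, hi, iterations=60):
    """Approximate the root of an increasing function f on (lo, hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf_level=0.95):
    """'Exact' interval: each tail probability equals alpha/2 at a limit."""
    half_alpha = (1 - conf_level) / 2
    eps = 1e-12  # keep p strictly inside (0, 1)
    lower = 0.0
    if x > 0:
        lower = bisect_root(lambda p: upper_tail(x, n, p) - half_alpha,
                            eps, 1 - eps)
    upper = 1.0
    if x < n:
        upper = bisect_root(lambda p: half_alpha - lower_tail(x, n, p),
                            eps, 1 - eps)
    return lower, upper


def score_interval_cc(x, n, conf_level=0.95):
    """Score (Wilson) interval with continuity correction."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p = x / n
    q = 1 - p
    denom = 2 * (n + z * z)
    lower = 0.0
    if x > 0:
        root = sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))
        lower = max(0.0, (2 * n * p + z * z - 1 - z * root) / denom)
    upper = 1.0
    if x < n:
        root = sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))
        upper = min(1.0, (2 * n * p + z * z + 1 + z * root) / denom)
    return lower, upper


cp_lower, cp_upper = clopper_pearson(75, 500)
sc_lower, sc_upper = score_interval_cc(75, 500)
print(f"Clopper-Pearson:  {cp_lower:.4f} < p < {cp_upper:.4f}")
print(f"Score, corrected: {sc_lower:.4f} < p < {sc_upper:.4f}")
```

Unlike the normal-approximation interval, neither of these intervals is symmetric around the sample proportion of 0.15.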
Sometimes a researcher wants to estimate the true proportion of a population of interest by finding the confidence interval for that proportion. In STATISTICA, the Power Analysis module provides the means to find this estimate.