# *STATISTICA* News and Blogs

## How to Find Confidence Intervals for a Single Proportion

In statistics, sample data are often used to estimate population parameters. Parameters that experimenters commonly estimate include population means, standard deviations, and proportions. Interval estimates called *confidence intervals* are used to estimate these parameters.

**What Is a Confidence Interval?**

Sample statistics (or point estimates), such as the mean, standard deviation, or proportion, are used to make inferences about a population based on a random sample from that population. A point estimate is unlikely to equal the population parameter it estimates exactly, but it should be close. A confidence interval is a range around the point estimate, constructed by a method that captures the population parameter with a specified probability over repeated sampling: typically 0.95 for a 95% confidence interval. A confidence interval is more informative than a point estimate alone because it shows the range of plausible values for the population parameter, and therefore how precise the estimate is.

**Confidence Intervals for Single Means and Standard Deviations in STATISTICA**

In *STATISTICA*, you can use the *Descriptive Statistics* analysis, available via the *Basic Statistics* module, to find confidence intervals for a single mean or a single standard deviation. To access this analysis, first open a data file, and then select the *Statistics* tab. In the *Base* group, click *Basic Statistics*. In the *Basic Statistics and Tables* Startup Panel, select *Descriptive Statistics*, and click *OK* to display the *Descriptive Statistics* dialog box. The options for the confidence intervals for the mean and standard deviation are on the *Advanced* tab. You can specify the confidence level for each via the respective *Interval* edit box.

Then click the *Summary* button to produce the requested statistics, including these confidence intervals.

**Using STATISTICA to Find a Confidence Interval for a Single Proportion**

The *Descriptive Statistics* analysis is useful for summarizing continuous data. Proportions, however, are derived from counts of a categorical outcome rather than from continuous measurements. Tools such as *Frequency Tables* and *Tables and Banners* can compute proportions, and you can find a confidence interval for a single proportion using the *Power Analysis* module. This module is often used to calculate statistical power for a given analysis, or the sample size required to attain a given power level, but it can also calculate, for a given analysis type, specialized confidence intervals that are not generally available in general-purpose statistical packages.

**Confidence Interval for a Single Proportion Example**

In this example, researchers took a random sample of 500 subjects who had completed four years of college and found that 75 of them smoked on a regular basis. Thus, the sample proportion of four-year college graduates who smoke (often designated *p̂*) is 75/500 = 0.15 (or 15%). To estimate the true proportion of four-year college graduates who smoke (usually designated *p*), we can construct a confidence interval for the proportion.

The simplest and most commonly used formula for this type of confidence interval approximates the binomial distribution with a normal distribution (the number of smokers in the sample is binomial because each person sampled either smoked or did not smoke). The formula is:

$$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where *n* is the sample size, $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile (that is, the $100(1-\alpha/2)$th percentile) of the standard normal distribution, and *α* is the Type I error rate, the complement of the confidence level. Thus, for a 95% confidence level, *α* is 5%, or 0.05.
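As a cross-check outside *STATISTICA*, here is a minimal Python sketch of this normal-approximation interval using only the standard library. The helper name `wald_interval` is our own (hypothetical) and is not part of *STATISTICA*; the critical value comes from `statistics.NormalDist.inv_cdf`, so it is not rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a single proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                     # sample proportion
    alpha = 1 - confidence                    # Type I error rate
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Smoking example: 75 regular smokers out of 500 college graduates.
lower, upper = wald_interval(75, 500)
print(f"{lower:.6f} < p < {upper:.6f}")
```

Applied to the smoking example, the bounds differ from the hand calculation by less than 0.000001, the small gap coming from the unrounded *z*.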

This *z*-score can be calculated within *STATISTICA*. On the *Statistics* tab, in the *Base* group, click *Basic Statistics* to display the *Basic Statistics and Tables* Startup Panel. Select *Probability calculator*, and click *OK* to display the *Probability Distribution Calculator*.

In the *Distribution* field, select *Z (Normal)*. Select the *Inverse*, *Two-tailed*, and *(1-cumulative p)* check boxes. We are using *α* = 0.05, so enter this value for *p*. Click the *Compute* button to calculate the *z* critical value, which is given in the *X* edit field. It is found to be 1.959964, which is commonly rounded to 1.96.
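If *STATISTICA* is not at hand, the same two-tailed critical value can be checked with Python's standard library. This sketch is only an independent cross-check of the calculator result, not a description of how *STATISTICA* computes it.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value: the point that leaves alpha/2 in the upper tail,
# i.e. the (1 - alpha/2) quantile of the standard normal distribution.
z_crit = NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - alpha / 2)
print(round(z_crit, 6))  # 1.959964
```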

Thus, with *p̂* = 0.15, *n* = 500, and *z* ≈ 1.96, the confidence interval for the true proportion is:

$$0.15 - 1.96\sqrt{\frac{(0.15)(0.85)}{500}} < p < 0.15 + 1.96\sqrt{\frac{(0.15)(0.85)}{500}} \;\Rightarrow\; 0.11870131 < p < 0.18129869$$

**Finding the Confidence Interval in STATISTICA**

As previously mentioned, we can find this same confidence interval for a single proportion using the *Power Analysis* module in *STATISTICA*.

With any data file open, select the *Statistics* tab. In the *Advanced/Multivariate* group, click *Power Analysis*. In the *Power Analysis and Interval Estimation* Startup Panel, select *Interval Estimation* as the analysis category, and then select *One Proportion, Z, Chi-Square Test* as the analysis type.

Click *OK*.

In the *Single Proportion: Interval Estimation* dialog box, enter 0.15 for *Observed Proportion p*, 500 for *Sample Size (N)*, and 0.95 for *Conf. Level*.

Click *Compute* to calculate the confidence interval.

The *Pi (Crude)* results should match the hand calculation above, because both use the normal approximation to the binomial distribution. The two may differ slightly in the later decimal places, because the hand calculation rounds the *z* critical value to 1.96, whereas *STATISTICA* carries it to more decimal places for better accuracy.
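A short Python sketch makes the size of that rounding effect concrete: it evaluates the same normal-approximation interval once with *z* = 1.96 and once with the full-precision critical value. It uses only the standard library; the dictionary keys are arbitrary labels.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.15, 500
se = sqrt(p_hat * (1 - p_hat) / n)  # standard error under the normal approximation

bounds = {}
for label, z in [("z rounded to 1.96", 1.96),
                 ("full-precision z", NormalDist().inv_cdf(0.975))]:
    bounds[label] = (p_hat - z * se, p_hat + z * se)
    print(f"{label}: {bounds[label][0]:.8f} < p < {bounds[label][1]:.8f}")
```

The two sets of bounds differ by less than 0.000001, so rounding to 1.96 matters only when many decimal places are reported.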

The results in the *Interval Estimation* spreadsheet also include two other ways to calculate the confidence interval for a proportion: *Pi (Exact)*, which gives the "exact" Clopper-Pearson confidence interval, and *Pi (Approximate)*, which uses a score method with a continuity correction. For more information on how these two methods are computed, see methods 4 and 5 in Robert Newcombe's paper, "Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods" (1998, *Statistics in Medicine, 17*, 857-872).
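For readers who want to experiment with these two alternatives, here is a rough, standard-library-only Python sketch following Newcombe's formulas for method 4 (score interval with continuity correction) and method 5 (Clopper-Pearson). The helper names are hypothetical, and we assume, without having checked the spreadsheet output, that *STATISTICA*'s *Pi (Approximate)* and *Pi (Exact)* follow these definitions. The Clopper-Pearson bounds are found here by bisection on the binomial tail probabilities rather than from beta-distribution quantiles; the two approaches define the same interval.

```python
from math import exp, lgamma, log, log1p, sqrt
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); each term is built in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_n_fact = log(p), log1p(-p), lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _solve_increasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on [0, 1] for f(p) = target, assuming f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """'Exact' interval (Newcombe's method 5) obtained by inverting two binomial tail tests."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) = alpha/2 (this tail grows with p).
    lower = 0.0 if x == 0 else _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2 (this tail shrinks with p,
    # so bisect on its negative, which grows with p).
    upper = 1.0 if x == n else _solve_increasing(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


def wilson_cc(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Score interval with continuity correction (Newcombe's method 4)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p, q = x / n, 1 - x / n
    denom = 2 * (n + z * z)
    if x == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z * z - 1 - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    if x == n:
        upper = 1.0
    else:
        upper = (2 * n * p + z * z + 1 + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    return max(lower, 0.0), min(upper, 1.0)


cp_lo, cp_hi = clopper_pearson(75, 500)
wc_lo, wc_hi = wilson_cc(75, 500)
print(f"Exact (Clopper-Pearson):        {cp_lo:.6f} < p < {cp_hi:.6f}")
print(f"Score with continuity correction: {wc_lo:.6f} < p < {wc_hi:.6f}")
```

Bisection was chosen over a beta-quantile routine only because Python's standard library has no beta distribution; with SciPy available, the usual closed form via beta quantiles would be the more direct route.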

**Conclusion**

Sometimes a researcher wants to estimate the true proportion of a population of interest by finding the confidence interval for that proportion. In *STATISTICA*, the *Power Analysis *module provides the means to find this estimate.
