STATISTICA Advanced Linear/Nonlinear Models offers a wide array of the most advanced linear and nonlinear modeling tools on the market; supports continuous and categorical predictors, interactions, and hierarchical models; includes automatic model selection facilities as well as variance components, time series, and many other methods; and all analyses incorporate extensive, interactive graphical support and built-in complete Visual Basic scripting.
It features the following modules:
Distributions and Simulation enables users to automatically fit a large number of distributions for continuous and categorical variables to lists of variables. Standard distributions are available (normal, half-normal, log-normal, Weibull, etc.), as well as specialized and general distributions (Johnson, Gaussian Mixture, Generalized Pareto, Generalized Extreme Value), and STATISTICA automatically ranks the quality of the fit for each selected distribution and variable.
In addition, the distributions fit to the list of selected variables and the covariance between the selected variables can be saved for deployment. The Distributions & Simulation module uses this deployment information to generate simulated data sets that not only faithfully reproduce the respective distributions, but also the covariances between variables. In short, in addition to facilitating efficient distribution fitting to large numbers of variables, this module enables users to fit general multivariate distributions, and simulate from those distributions, using cutting-edge simulation techniques (e.g., Latin Hypercube simulation). When data are not available for which to fit distributions, the Design Simulation tool allows you to generate data from a correlation matrix and a selection of distributions. These methods have proven useful in various domains such as modern DOE, reliability engineering, and risk modeling.
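The general workflow described above (fit several candidate distributions, rank them by goodness of fit, then simulate new data) can be illustrated outside STATISTICA. The following Python sketch uses SciPy, with made-up data and an arbitrary set of candidate distributions; it is not STATISTICA's API.

```python
# Illustrative sketch only (not STATISTICA's API): fit candidate distributions,
# rank them by Kolmogorov-Smirnov fit statistic, then simulate via Latin Hypercube.
import numpy as np
from scipy import stats
from scipy.stats import qmc

data = np.random.default_rng(1).weibull(1.5, size=500) * 10.0  # placeholder data

candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
    "gen. extreme value": stats.genextreme,
}

ranking = []
for name, dist in candidates.items():
    params = dist.fit(data)                         # maximum-likelihood fit
    ks = stats.kstest(data, dist.cdf, args=params)  # goodness-of-fit statistic
    ranking.append((ks.statistic, name, params))

ranking.sort()                                      # smaller KS statistic = better fit
best_ks, best_name, best_params = ranking[0]
print(f"best fit: {best_name} (KS = {best_ks:.3f})")

# Latin Hypercube simulation from the best-fitting distribution:
sampler = qmc.LatinHypercube(d=1, seed=1)
u = sampler.random(n=1000).ravel()                  # stratified uniforms on [0, 1)
simulated = candidates[best_name].ppf(u, *best_params)
```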
Variance Components and Mixed Model ANOVA/ANCOVA is a specialized module for designs with random effects and/or factors with many levels; options for handling random effects and for estimating variance components are also provided in the General Linear Models module. Random effects (factors) occur frequently in industrial research, when the levels of a factor represent values sampled from a random variable (as opposed to being deliberately chosen or arranged by the experimenter). The Variance Components module allows you to analyze designs with any combination of fixed effects, random effects, and covariates. Extremely large ANOVA/ANCOVA designs can be analyzed efficiently: factors can have several hundred levels. The program will analyze standard factorial (crossed) designs as well as hierarchically nested designs, and compute the standard Type I, II, and III analysis of variance sums of squares and mean squares for the effects in the model. In addition, you can compute the table of expected mean squares for the effects in the design, the variance components for the random effects in the model, the coefficients for the denominator synthesis, and the complete ANOVA table with tests based on synthesized error sums of squares and degrees of freedom (using Satterthwaite's method). Other methods for estimating variance components are also supported (e.g., MIVQUE0, Maximum Likelihood [ML], Restricted Maximum Likelihood [REML]). For maximum likelihood estimation, both the Newton-Raphson and Fisher scoring algorithms are used, and the model will not be arbitrarily changed (reduced) during estimation to handle situations where most components are at or near zero. Several options for reviewing the weighted and unweighted marginal means, and their confidence intervals, are also available. Extensive graphics options can be used to visualize the results.
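As a rough open-source analogue of the random-effects and REML estimation described above, the sketch below fits a random-intercept (variance components) model with statsmodels; the data frame, column names, and grouping factor are hypothetical and the sketch does not reproduce STATISTICA's denominator synthesis.

```python
# Sketch of a variance-components (random intercept) model fit by REML,
# using statsmodels rather than STATISTICA; all names below are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_batches, n_per_batch = 20, 8                          # "batch" acts as a random factor
batch = np.repeat(np.arange(n_batches), n_per_batch)
batch_effect = rng.normal(0, 2.0, n_batches)[batch]     # between-batch variance component
y = 50 + batch_effect + rng.normal(0, 1.0, batch.size)  # residual variance component
df = pd.DataFrame({"y": y, "batch": batch})

model = smf.mixedlm("y ~ 1", df, groups=df["batch"])
result = model.fit(reml=True)          # REML estimation, one of the methods named above
print(result.summary())                # reports the batch and residual variance estimates
```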
The Survival Analysis module features a comprehensive implementation of a variety of techniques for analyzing censored data from social, biological, and medical research, as well as procedures used in engineering and marketing (e.g., quality control, reliability estimation, etc.). In addition to computing life tables with various descriptive statistics and Kaplan-Meier product limit estimates, the user can compare the survivorship functions in different groups using a large selection of methods (including the Gehan test, Cox F-test, Cox-Mantel test, Log-rank test, and Peto & Peto generalized Wilcoxon test). Also, Kaplan-Meier plots can be computed for groups (uncensored observations are identified in graphs with different point markers). The program also features a selection of survival function fitting procedures (including the Exponential, Linear Hazard, Gompertz, and Weibull functions) based on either unweighted or weighted least squares methods (maximum-likelihood parameter estimates for various distributions, including Weibull, can also be computed via the STATISTICA Process Analysis module). Finally, the program offers full implementations of four general explanatory models (Cox's proportional hazards model, exponential regression model, log-normal and normal regression models) with extended diagnostics, including stratified analysis and graphs of survival for user-specified values of predictors. For Cox proportional hazards regression, the user can choose to stratify the sample to permit different baseline hazards in different strata (but a constant coefficient vector), or the user can allow for different baseline hazards as well as coefficient vectors. In addition, general facilities are provided to define one or more time-dependent covariates. Time-dependent covariates can be specified via a flexible formula interpreter that allows the user to define the covariates via arithmetic expressions which may include time, as well as the standard logical functions (e.g., timdep=age+age*log(t_)*(age>45), where t_ references survival time) and a wide variety of distribution functions. As in all other modules of STATISTICA, the user can access and change the technical parameters of all procedures (or accept dynamic defaults). The module also offers an extensive selection of graphics and specialized diagrams to aid in the interpretation of results (including plots of cumulative proportions surviving/failing, patterns of censored data, hazard and cumulative hazard functions, probability density functions, group comparison plots, distribution fitting plots, various residual plots, and many others). For engineering applications, see also Weibull Analysis.
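For readers who want to see the corresponding computations in an open-source setting, a minimal sketch with the lifelines package (hypothetical durations and censoring flags) estimates a Kaplan-Meier curve and compares two groups with a log-rank test; it illustrates the general techniques named above, not STATISTICA's interface.

```python
# Minimal Kaplan-Meier / log-rank sketch using the lifelines package
# (illustrative only; durations and event flags below are made up).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(42)
t_a = rng.exponential(10.0, 80)          # survival times, group A
t_b = rng.exponential(14.0, 80)          # survival times, group B
e_a = rng.random(80) < 0.7               # True = event observed, False = censored
e_b = rng.random(80) < 0.7

kmf = KaplanMeierFitter()
kmf.fit(t_a, event_observed=e_a, label="group A")
print(kmf.survival_function_.head())     # product-limit estimate of S(t)

# Compare survivorship functions across groups (log-rank test):
res = logrank_test(t_a, t_b, event_observed_A=e_a, event_observed_B=e_b)
print(f"log-rank p-value: {res.p_value:.4f}")
```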
The Cox Proportional Hazards Models module is a highly scalable tool that allows for flexible handling of censored data, categorical predictors, and designs that include interactions and/or nested effects. It offers model-building techniques such as best subsets and stepwise regression. Deployment of the survival functions on new data is available with STATISTICA Rapid Deployment.
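As a rough open-source counterpart to the Cox proportional hazards facilities described above, the sketch below fits a Cox model with lifelines on a hypothetical data frame; stratification is shown via the strata argument. It is an illustrative stand-in, not STATISTICA's implementation.

```python
# Cox proportional hazards sketch with lifelines (not STATISTICA's implementation);
# the data frame and column names are hypothetical.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "time": rng.exponential(12.0, n),         # survival time
    "event": rng.random(n) < 0.65,            # True = event, False = censored
    "age": rng.normal(55, 10, n),             # continuous predictor
    "treatment": rng.integers(0, 2, n),       # categorical predictor (0/1)
    "site": rng.integers(0, 3, n),            # stratification factor
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event", strata=["site"])
cph.print_summary()                            # hazard ratios, CIs, partial log-likelihood
```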
The Nonlinear Estimation module allows the user to fit essentially any type of nonlinear model. One of the unique features of this module is that (unlike traditional nonlinear estimation programs) it does not impose any limits on the size of data files that it can process.
The models can be fit using least squares or maximum-likelihood estimation, or any user-specified loss function. When using the least-squares criterion, the very efficient Levenberg-Marquardt and Gauss-Newton algorithms can be used to estimate the parameters for arbitrary linear and nonlinear regression problems. For large datasets or for difficult nonlinear regression problems (such as those rated "higher difficulty" among the Statistical Reference Datasets provided by the National Institute of Standards and Technology; see http://www.nist.gov/itl/div898/strd/index.html), these are the recommended methods for computing precise parameter estimates under the least-squares criterion. When using arbitrary loss functions, the user can choose from among four very different, powerful estimation procedures (quasi-Newton, Simplex, Hooke-Jeeves pattern moves, and Rosenbrock pattern search method of rotating coordinates) so that stable parameter estimates can be obtained in practically all cases, even in extremely numerically demanding conditions (see the Validation Benchmarks).
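The Levenberg-Marquardt approach mentioned above is also what SciPy's curve_fit uses by default when no bounds are supplied; the short sketch below fits a hypothetical exponential-decay model and reports parameter standard errors from the covariance matrix.

```python
# Levenberg-Marquardt nonlinear least squares, sketched with SciPy's curve_fit
# (the model and data are hypothetical, for illustration only).
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c               # user-specified nonlinear model

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 60)
y = model(x, 2.5, 0.7, 1.0) + rng.normal(0, 0.05, x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 0.0])  # method="lm" is the default here
stderr = np.sqrt(np.diag(pcov))                 # standard errors of the estimates
for name, est, se in zip("abc", popt, stderr):
    print(f"{name} = {est:.3f} +/- {se:.3f}")
```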
The user can specify any type of model by typing the respective equation into an equation editor. The equations may include logical operators; thus, discontinuous (piecewise) regression models and models including indicator variables can also be estimated. The equations may also include a wide selection of distribution functions and cumulative distribution functions (Beta, Binomial, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal, Pareto, Poisson, Rayleigh, t (Student), or Weibull distribution). The user has full control over all aspects of the estimation procedure (e.g., starting values, step sizes, convergence criteria, etc.). The most common nonlinear regression models are predefined in the Nonlinear Estimation module, and can be chosen simply as menu options. Those regression models include stepwise Probit and Logit regression, the exponential regression model, and linear piecewise (break point) regression. Note that STATISTICA also includes implementations of powerful algorithms for fitting generalized linear models, including probit and multinomial logit models, and generalized additive models; see the respective descriptions for additional details.
In addition to various descriptive statistics, standard results of the nonlinear estimation include the parameter estimates and their standard errors (computed independently of the estimation itself, via finite differencing to optimize precision; see the Validation Benchmarks); the variance/covariance matrix of parameter estimates, the predicted values, residuals, and appropriate measures of goodness-of-fit (e.g., log-likelihood of estimated/null models and Chi-square test of difference, proportion of variance accounted for, classification of cases and odds-ratios for Logit and Probit models, etc.). Predicted and residual values can be appended to the data file for further analyses. For Probit and Logit models, the incremental fit is also automatically computed when adding or deleting parameters from the regression model (thus, the user can explore the data via a stepwise nonlinear estimation procedure; options for automatic forward and backward stepwise regression as well as best-subset selection of predictors in logit and probit models are provided in the Generalized Linear Models module, below).
All output is integrated with extensive selections of graphs, including interactively-adjustable 2D and 3D (surface) arbitrary function fitting graphs which allow the user to visualize the quality of the fit and identify outliers or ranges of discrepancy between the model and the data; the user can interactively adjust the equation of the fitted function (as shown in the graph) without re-processing the data and visualize practically all aspects of the nonlinear fitting process. Many other specialized graphs are provided to evaluate the fitting process and visualize the results, such as histograms of all selected variables and residual values, scatterplots of observed versus predicted values and predicted versus residual values, normal and half-normal probability plots of residuals, and many others.
This module offers a complete implementation of log-linear modeling procedures for multi-way frequency tables. Note that STATISTICA also includes the Generalized Linear Models module, which provides options for analyzing binomial and multinomial logit models with coded ANOVA/ANCOVA-like designs. In the Log-Linear Analysis module, the user can analyze up to 7-way tables in a single run. Both complete and incomplete tables (with structural zeros) can be analyzed. Frequency tables can be computed from raw data, or may be entered directly into the program. The Log-Linear Analysis module provides a comprehensive selection of advanced modeling procedures in an interactive and flexible environment that greatly facilitates exploratory and confirmatory analyses of complex tables. The user may at all times review the complete observed table as well as marginal tables, and fitted (expected) values, and may evaluate the fit of all partial and marginal association models or select specific models (marginal tables) to be fitted to the observed data. The program also offers an intelligent automatic model selection procedure that first determines the necessary order of interaction terms required for a model to fit the data, and then, through backwards elimination, determines the best sufficient model to satisfactorily fit the data (using criteria determined by the user). The standard output includes G-square (Maximum-Likelihood Chi-square), the standard Pearson Chi-square with the appropriate degrees of freedom and significance levels, the observed and expected tables, marginal tables, and other statistics. Graphics options available in the Log-linear module include a variety of 2D and 3D graphs designed to visualize 2-way and multi-way frequency tables (including interactive, user-controlled cascades of categorized histograms and 3D histograms revealing "slices" of multi-way tables), plots of observed and fitted frequencies, plots of various residuals (standardized, components of Maximum-Likelihood Chi-square, Freeman-Tukey deviates, etc.), and many others.
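A log-linear model for a frequency table can be expressed as a Poisson regression on the cell counts. The sketch below (using statsmodels, with an invented 2x3 table) fits an independence model and reports the likelihood-ratio (G-square) and Pearson Chi-square statistics mentioned above; it is a minimal stand-in for the module's interactive model-selection facilities.

```python
# Log-linear (independence) model for a 2x3 frequency table, fit as a Poisson GLM
# with statsmodels; the counts below are invented for illustration.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

cells = pd.DataFrame({
    "row":   ["a", "a", "a", "b", "b", "b"],
    "col":   ["x", "y", "z", "x", "y", "z"],
    "count": [22, 30, 18, 35, 14, 27],
})

# Main-effects-only model == hypothesis of independence between row and col.
fit = smf.glm("count ~ C(row) + C(col)", data=cells,
              family=sm.families.Poisson()).fit()

print(f"G-square (deviance): {fit.deviance:.3f}")
print(f"Pearson Chi-square:  {fit.pearson_chi2:.3f}")
print(f"residual df:         {fit.df_resid}")
```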
The Time Series module contains a wide range of descriptive, modeling, decomposition, and forecasting methods for both time and frequency domain models. These procedures are integrated, that is, the results of one analysis (e.g., ARIMA residuals) can be used directly in subsequent analysis (e.g., to compute the autocorrelation of the residuals). Also, numerous flexible options are provided to review and plot single or multiple series. Analyses can be performed on even very long series. Multiple series can be maintained in the active work area of the program (e.g., multiple raw input data series or series resulting from different stages of the analysis); the series can be reviewed and compared. The program will automatically keep track of successive analyses, and maintain a log of transformations and other results (e.g., ARIMA residuals, seasonal components, etc.). Thus, the user can always return to prior transformations or compare (plot) the original series together with its transformations. Information about the consecutive transformations is maintained in the form of long variable labels, so if you save the newly created variables into a dataset, the "history" of each of the series will be permanently preserved. The specific Time Series procedures are described in the following subsections.
The available time series transformations allow the user to fully explore patterns in the input series, and to perform all common time series transformations, including: de-trending, removal of autocorrelation, moving average smoothing (unweighted and weighted, with user-defined or Daniell, Tukey, Hamming, Parzen, or Bartlett weights), moving median smoothing, simple exponential smoothing (see also the description of all exponential smoothing options below), differencing, integrating, residualizing, shifting, 4253H smoothing, tapering, Fourier (and inverse) transformations, and others. Autocorrelation, partial autocorrelation, and crosscorrelation analyses can also be performed.
The Time Series module offers a complete implementation of ARIMA. Models may include a constant, and the series can be transformed prior to the analysis; these transformations will automatically be "undone" when ARIMA forecasts are computed, so that the forecasts and their standard errors are expressed in terms of the values of the original input series. Parameters can be estimated via approximate (conditional sums of squares) or exact maximum likelihood, and the ARIMA implementation in the Time Series module is uniquely suited to fitting models with long seasonal periods (e.g., periods of 30 days). Standard results include the parameter estimates and their standard errors and the parameter correlations. Forecasts and their standard errors can be computed and plotted, and appended to the input series. In addition, numerous options for examining the ARIMA residuals (for model adequacy) are available, including a large selection of graphs. The implementation of ARIMA in the Time Series module also allows the user to perform interrupted time series (intervention) analysis. Several simultaneous interventions may be modeled, which can either be single-parameter abrupt-permanent interventions, or two-parameter gradual or temporary interventions (graphs of different impact patterns can be reviewed). Forecasts can be computed for all intervention models, which can be plotted (together with the input series) as well as appended to the original series.
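A comparable seasonal ARIMA fit, with forecasts and forecast intervals on the original scale, can be sketched with statsmodels; the orders, seasonal period, and data below are arbitrary assumptions for illustration.

```python
# Seasonal ARIMA sketch with statsmodels (orders and data are arbitrary assumptions).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2010-01-31", periods=120, freq="M")
y = pd.Series(np.cumsum(rng.normal(0, 1, 120))
              + 10 * np.sin(np.arange(120) * 2 * np.pi / 12), index=idx)

model = ARIMA(y, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12))
res = model.fit()
print(res.summary())                       # parameter estimates and standard errors

fc = res.get_forecast(steps=12)            # 12-step-ahead forecasts
print(fc.predicted_mean)
print(fc.conf_int())                       # forecast intervals
```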
The Time Series module contains a complete implementation of all 12 common exponential smoothing models. Models can be specified to contain an additive or multiplicative seasonal component and/or linear, exponential, or damped trend; thus, available models include the popular Holt-Winter linear trend models. The user may specify the initial value for the smoothing transformation, initial trend value, and seasonal factors (if appropriate). Separate smoothing parameters can be specified for the trend and seasonal components. The user can also perform a grid search of the parameter space in order to identify the best parameters; the respective results spreadsheet will report for all combinations of parameter values the mean error, mean absolute error, sum of squares error, mean square error, mean percentage error, and mean absolute percentage error. The smallest value for these fit indices will be highlighted in the spreadsheet. In addition, the user can also request an automatic search for the best parameters with regard to the mean square error, mean absolute error, or mean absolute percentage error (a general function minimization procedure is used for this purpose). The results of the respective exponential smoothing transformation, the residuals, as well as the requested number of forecasts, are available for further analyses and plots. A summary plot is also available to assess the adequacy of the respective exponential smoothing model; that plot will show the original series together with the smoothed values and forecasts, as well as the smoothing residuals plotted separately against the right-Y axis.
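The Holt-Winters family referred to above has a direct counterpart in statsmodels; this sketch (additive trend and seasonality, hypothetical monthly data) lets the optimizer search for the smoothing parameters and produces forecasts, roughly mirroring the automatic parameter search described in the paragraph above.

```python
# Holt-Winters exponential smoothing sketch with statsmodels
# (additive trend + additive seasonality; data and settings are illustrative).
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(1)
idx = pd.date_range("2012-01-31", periods=96, freq="M")
y = pd.Series(100 + 0.5 * np.arange(96)
              + 8 * np.sin(np.arange(96) * 2 * np.pi / 12)
              + rng.normal(0, 2, 96), index=idx)

fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                           seasonal_periods=12).fit(optimized=True)
print(fit.params)                          # optimized smoothing parameters
print(fit.forecast(12))                    # forecasts from the smoothed model
```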
The user may specify the length of the seasonal period, and choose either the additive or multiplicative seasonal model. The program will compute the moving averages, ratios or differences, seasonal factors, the seasonally adjusted series, the smoothed trend-cycle component, and the irregular component. Those components are available for further analysis; for example, the user may compute histograms, normal probability plots, etc. for any or all of these components (e.g., to test model adequacy).
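The additive/multiplicative decomposition just described corresponds closely to the classical decomposition available in statsmodels; the minimal sketch below (hypothetical monthly series) extracts the seasonal, trend-cycle, and irregular components mentioned above.

```python
# Classical seasonal decomposition sketch (statsmodels), analogous to the
# additive/multiplicative decomposition described above; data are illustrative.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(2)
idx = pd.date_range("2015-01-31", periods=72, freq="M")
y = pd.Series((50 + 0.3 * np.arange(72))
              * (1 + 0.1 * np.sin(np.arange(72) * 2 * np.pi / 12))
              + rng.normal(0, 1, 72), index=idx)

result = seasonal_decompose(y, model="multiplicative", period=12)
print(result.seasonal.head(12))            # seasonal factors
print(result.trend.dropna().head())        # smoothed trend-cycle component
print(result.resid.dropna().head())        # irregular component
```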
The Time Series module contains a full-featured implementation of the US Bureau of the Census X-11 variant of the Census Method II seasonal adjustment procedure. While the original X-11 algorithms were not year-2000 compliant (only data prior to January 2000 could be analyzed), the STATISTICA implementation of X-11 handles series dated before January 1, 2000, after that date, or series that span that date. The arrangement of options and dialogs closely follows the definitions and conventions described in the Bureau of the Census documentation. Additive and multiplicative seasonal models may be specified. The user may also specify prior trading-day factors and seasonal adjustment factors. Trading-day variation can be estimated via regression (controlling for extreme observations), and used to adjust the series (conditionally if requested). The standard options are provided for graduating extreme observations, for computing the seasonal factors, and for computing the trend-cycle component (the user can choose between various types of weighted moving averages; optimal lengths and types of moving averages can also automatically be chosen by the program). The final components (seasonal, trend-cycle, irregular) and the seasonally adjusted series are automatically available for further analyses and plots; those components can also be saved for further analyses with other programs. The program will produce the plots of the different components, including categorized plots by months (or quarters).
The implementation of the polynomial distributed lag methods in the Time Series module will estimate models with unconstrained lags as well as (constrained) Almon distributed lag models. A selection of graphs is available to examine the distributions of the model variables.
The Time Series module includes a full implementation of spectrum (Fourier decomposition) analysis and cross-spectrum analysis techniques. The program is particularly suited for the analysis of unusually long time series (e.g., with over 250,000 observations), and it will not impose any constraints on the length of the series (i.e., the length of the input series does not have to be a power of 2). However, the user may also choose to pad or truncate the series prior to the analysis. Standard pre-analysis transformations include tapering, subtraction of the mean, and detrending. For single spectrum analysis, the standard results include the frequency, period, sine and cosine coefficients, periodogram values, and spectral density estimates. The density estimates can be computed using Daniell, Hamming, Bartlett, Tukey, Parzen, or user-defined weights and user-defined window widths. An option that is particularly useful for long input series is to display only a user-defined number of the largest periodogram or density values in descending order; thus, the most salient periodogram or density peaks can be easily identified in long series. The user can compute the Kolmogorov-Smirnov d test for the periodogram values to test whether they follow an exponential distribution (i.e., whether the input is a white-noise series). Numerous plots are available to summarize the results; the user can plot the sine and cosine coefficients, periodogram values, log-periodogram values, spectral density values, and log-density values against the frequencies, period, or log-period. For long input series, the user can choose the segment (period) for which to plot the respective periodogram or density values, thus enhancing the "resolution" of the periodogram or density plot. For cross-spectrum analysis, in addition to the single spectrum results for each series, the program computes the cross-periodogram (real and imaginary part), co-spectral density, quadrature spectrum, cross-amplitude, coherency values, gain values, and the phase spectrum. All of these can also be plotted against the frequency, period, or log-period, either for all periods (frequencies) or only for a user-defined segment. A user-defined number of the largest cross-periodogram values (real or imaginary) can also be displayed in a spreadsheet in descending order of magnitude to facilitate the identification of salient peaks when analyzing long input series. As with all other procedures in the Time Series module, all of these result series can be appended to the active work area, and will be available for further analyses with other time series methods or other STATISTICA modules.
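For several of the spectrum and cross-spectrum quantities listed above, SciPy provides direct analogues; the sketch below computes a detrended, windowed periodogram and a cross-spectral density (whose real and imaginary parts correspond to the co-spectrum and quadrature spectrum) for two synthetic series.

```python
# Periodogram and cross-spectral density sketch with SciPy
# (single-spectrum and cross-spectrum analysis; the two series are synthetic).
import numpy as np
from scipy.signal import periodogram, csd

rng = np.random.default_rng(5)
n, fs = 2048, 1.0
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 0.05 * t) + rng.normal(0, 0.5, n)
y = np.sin(2 * np.pi * 0.05 * t + 0.8) + rng.normal(0, 0.5, n)   # phase-shifted copy

# Single-spectrum analysis: detrend, apply a window, compute the periodogram.
f, pxx = periodogram(x, fs=fs, detrend="linear", window="hamming")
top = np.argsort(pxx)[::-1][:5]
print("largest periodogram peaks at frequencies:", f[top])

# Cross-spectrum analysis: complex cross-spectral density of x and y.
f_xy, pxy = csd(x, y, fs=fs, nperseg=256)
print("co-spectrum (real part), first values:", np.real(pxy[:5]))
print("quadrature spectrum (imaginary part):", np.imag(pxy[:5]))
```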
Finally, STATISTICA offers regression-based time series techniques for lagged or non-lagged variables (including regression through the origin, nonlinear regression, and interactive what-if forecasting).
STATISTICA includes a comprehensive implementation of structural equation modeling techniques with flexible Monte Carlo simulation facilities (SEPATH). The module is a state-of-the-art program with an "intelligent" user-interface. It offers a comprehensive selection of modeling procedures integrated with unique user-interface tools allowing you to specify even complex models without using any command syntax. Via Wizards and Path Tools, you can define the analysis in simple functional terms using menus and dialog boxes (unlike other programs for structural equation modeling, no complex "language" must be mastered).
SEPATH is a complete implementation that includes numerous advanced features: The program can analyze correlation, covariance, and moment matrices (structured means, models with intercepts); all models can be specified via the Path Wizard, Factor Analysis Wizard, and General Path tools; these facilities are highly efficient and allow users to specify even complex models in minutes by making choices from dialogs. The SEPATH module will compute, using constrained optimization techniques, the appropriate standard errors for standardized models, and for models fitted to correlation matrices. The results options include a comprehensive set of diagnostic statistics including the standard fit indices as well as noncentrality-based indices of fit, reflecting the most recent developments in the area of structural equation modeling. The user may fit models to multiple samples (groups), and can specify for each group fixed, free, or constrained (to be equal across groups) parameters. When analyzing moment matrices, these facilities allow you to test complex hypotheses for structured means in different groups. The SEPATH module documentation contains numerous detailed descriptions of examples from the literature, including examples of confirmatory factor analysis, path analysis, test theory models for congeneric tests, multi-trait-multi-method matrices, longitudinal factor analysis, compound symmetry, structured means, etc.
The STATISTICA Structural Equation Modeling (SEPATH) module includes powerful simulation options: the user can generate (and save) datasets for predefined models, based on normal or skewed distributions. Bootstrap estimates can be computed, as well as distributions for various diagnostic statistics, parameter estimates, etc. over the Monte Carlo trials. Numerous flexible graphing options are available to visualize the results (e.g., distributions of parameters) from Monte Carlo runs.
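A stripped-down version of the Monte Carlo idea (simulate data from a known model, re-estimate a statistic, and inspect its sampling distribution) can be written in a few lines of NumPy. The one-factor model, loadings, and summary statistic below are arbitrary; this illustrates the general simulation idea, not SEPATH's estimation machinery.

```python
# Toy Monte Carlo for a one-factor measurement model (NumPy only):
# simulate data from known loadings, re-estimate a statistic, inspect its distribution.
import numpy as np

rng = np.random.default_rng(11)
loadings = np.array([0.8, 0.7, 0.6, 0.5])            # assumed population loadings
uniqueness = 1.0 - loadings**2
sigma = np.outer(loadings, loadings) + np.diag(uniqueness)   # implied covariance matrix

n_obs, n_trials = 200, 500
first_eigs = np.empty(n_trials)
for i in range(n_trials):
    data = rng.multivariate_normal(np.zeros(4), sigma, size=n_obs)
    sample_cov = np.cov(data, rowvar=False)
    first_eigs[i] = np.linalg.eigvalsh(sample_cov)[-1]        # largest eigenvalue as a summary

print("Monte Carlo mean / SD of largest eigenvalue:",
      first_eigs.mean().round(3), first_eigs.std().round(3))
```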
STATISTICA includes five powerful types of analyses for analyzing linear and nonlinear models: General Linear Models (GLM), General Regression Models (GRM), General Discriminant Analysis Models (GDA), Generalized Linear Models (GLZ), and General Partial Least Squares Models (PLS). Note that STATISTICA also includes implementations of Generalized Additive Models (GAM), Classification and Regression Trees (C&RT), and General CHAID (Chi-square Automatic Interaction Detection), available in STATISTICA Data Miner; these modules can also be used to fit nonlinear (ANOVA/ANCOVA-like) models to continuous or categorical dependent (criterion) variables.
All of these modules are extremely comprehensive and advanced implementations of the respective methods, and all of them share some general user interface solutions.
Three alternative user-interfaces: (1) Quick-specs dialogs, (2) Wizard, and (3) Syntax. All modules offer three alternative user-interfaces for specifying research designs (e.g., ANOVA/ANCOVA designs, regression designs, response surface designs, mixture designs, etc.; see the description of GLM for details):
Automatically generating the syntax statements. One of the unique features of this user-interface is that in the background STATISTICA will automatically generate the complete set of syntax statements for any design specified via the Quick-specs dialogs (see point 1 above) or the Wizard (see point 2). These "active" logs of even the most complex and customized designs can be re-run, saved for future use, modified, included in STATISTICA Visual Basic scripts to be routinely run on new datasets, etc. Because the syntax for specifying general linear model designs is shared by all of these modules, it is also easy to move specifications from one type of analysis to another, for example, in order to fit the same model in GLM and GLZ.
Computation (training) sample, cross-validation (verification) sample, and prediction sample. All five modules will compute detailed residual statistics that can be saved for further analyses with other modules. Another unique feature of these programs is that the predicted and residual statistics can be computed separately for those observations from which the respective results were computed (i.e., the computation or training sample), for observations explicitly excluded from the model fitting computations (the cross-validation or verification sample), and for cases without observed data for the dependent (response) variables (prediction sample). Moreover, all graphical results options (e.g., probability plots, histograms, scatterplots of selected predicted or residual statistics) can be requested for these samples. Thus, all five programs offer exceptionally thorough diagnostic methods for evaluating the quality of the fit of the model.
Comparing analyses; modifying analyses. Like all analytic facilities of STATISTICA, multiple instances of all modules can be kept open at the same time, so multiple analyses can simultaneously be performed on the same or on different datasets. This is extremely useful for comparing the results from different analyses of the same data or the same analyses of different data. Modifying an analysis does not require complete respecification of the analysis; only desired changes need to be specified. Results from different modifications of an analysis can be easily compared. STATISTICA GLM, GRM, GDA, GLZ, and PLS can take what-if analyses to a new level, by allowing comparisons of different data and different analyses at the same time.
STATISTICA General Linear Models (GLM) analyzes responses on one or more continuous dependent variables as a function of one or more categorical or continuous independent variables. GLM is not only the most computationally advanced GLM tool currently on the market, but it is also the most comprehensive and complete application available, offering a larger selection of options, graphs, accompanying statistics, and extended diagnostics than any other program. Designed with a "no compromise" approach, GLM offers the most extensive selection of options to handle GLM's so-called "controversial problems" that do not have any widely agreed-upon solutions.
The following sections summarize some of the most important specific advantages of GLM over other programs, and the unique features and facilities offered in this module.
Designs. The user can choose simple or highly customized one-way, main-effect, factorial, or nested ANOVA or MANOVA designs, repeated measures designs, simple, multiple and polynomial regression designs, response surface designs (with or without blocking), mixture surface designs, simple or complex analysis of covariance designs (e.g., with separate slopes), or general multivariate MANCOVA designs. Factors can be fixed or random (in which case synthesized error terms will be computed). All of these designs can be efficiently specified via any of the three types of user interfaces described above, and customized in various ways (e.g., you can drop effects, specify custom hypotheses, etc.). Also, GLM can handle extremely large analysis designs; for example, repeated measures factors with 1000 levels can be specified, models may include 1000 covariates, or you can very efficiently analyze extremely large between-group designs.
The overparameterized and sigma-restricted model. A detailed discussion is beyond the scope of this summary; most programs only offer the overparameterized model, and a few only the sigma-restricted model; STATISTICA GLM is the only program available on the market that offers both. Note that each of the two models has its advantages and disadvantages; however, both approaches are necessary to offer a truly comprehensive GLM computational platform, capable of properly handling even the most advanced and demanding analytic problems. For example, nested designs and separate slope designs are best analyzed using the overparameterized model; the most common methods for estimating variance components and computing synthesized error terms in mixed-model ANOVA are based on the overparameterized model. Factorial designs with large numbers of factors are best analyzed using the sigma-restricted model; for example, a 2-way interaction of two two-level factors requires only a single column in the design matrix using the sigma-restricted parameterization, but 4 columns in the overparameterized model; as a result, analyzing an 8-way full factorial design with GLM requires only a few seconds.
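The difference between the two parameterizations is easiest to see in the design matrix itself; the sketch below builds both codings for a single two-level factor by hand (pandas/NumPy, with hypothetical factor levels).

```python
# Design-matrix columns for one two-level factor under the two parameterizations
# discussed above (hand-built with pandas/NumPy; factor levels are hypothetical).
import numpy as np
import pandas as pd

factor = pd.Series(["low", "low", "high", "high", "low", "high"], name="A")

# Sigma-restricted (effect) coding: one column per two-level factor, coded +1 / -1.
sigma_restricted = pd.DataFrame({
    "intercept": 1,
    "A_effect": np.where(factor == "low", 1, -1),
})

# Overparameterized coding: one indicator column per level.
overparameterized = pd.DataFrame({
    "intercept": 1,
    "A_low":  (factor == "low").astype(int),
    "A_high": (factor == "high").astype(int),
})

print(sigma_restricted)
print(overparameterized)
# A 2-way interaction of two two-level factors would add one sigma-restricted column
# (the product of the two effect-coded columns) but four overparameterized columns.
```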
Handling missing cell designs. STATISTICA GLM will compute the customary Type I through IV sums of squares for unbalanced and incomplete designs; however, as is widely acknowledged (e.g., Searle, 1987; Milliken & Johnson, 1986), applying these methods to "messy" designs with missing cells in more or less random locations in the design can lead to misleading, and even blatantly nonsensical, results. STATISTICA GLM therefore also offers two additional methods for analyzing missing cell designs: Hocking's (1985) "effective hypothesis decomposition," and a method that will automatically drop effects that cannot be fully estimated (e.g., when the least squares means do not exist for all levels of the respective main effect or interaction effect). The latter method is the one commonly applied to the analysis of highly fractionalized designs in industrial experimentation (see also STATISTICA DOE). This method leads to results that are unique (not dependent on the ordering of factor levels), easily interpretable, and consistent with the industrial experimentation literature. This highly useful feature is unique to GLM.
Results statistics. GLM will compute all the standard results, including ANOVA tables with univariate and multivariate tests, descriptive statistics, etc. GLM also offers a large number of results options, and in particular graphics options, that are usually not available in other programs. For example, GLM includes: a comprehensive selection of plots of means (observed, least squares, weighted) for higher-order interactions, with error bars (standard errors) for effects involving between-group factors as well as repeated measures factors; extensive residual analyses and plots (for the "training" or computation sample, for a cross-validation or "verification" sample, or for a prediction sample without observed values for the dependent or response variables); plots of variance components; a desirability profiler and response optimization for any model; and adjusted means for traditional analysis of covariance designs. Extensive and flexible options for specifying planned comparisons are provided, including facilities to specify contrasts using either the traditional command syntax or an extremely simple-to-use (Wizard-style) sequence of "intelligent" contrast dialogs (you can enter contrast coefficients for clearly labeled levels of factors or cells in the design; the program will then evaluate the comparison for the least squares ("predicted") means, i.e., for the means as predicted by and consistent with the current model; this is a unique solution to the problem of planned comparisons in complex and incomplete designs). Also provided are simple ways to test linear combinations of parameter estimates (e.g., to test for the equality of specific regression coefficients); specifications of custom error terms and effects; comprehensive post-hoc comparison methods for between-group effects as well as repeated measures effects, and the interactions between repeated measures and between effects, including the Fisher LSD, Bonferroni, Scheffé, Tukey HSD, Unequal N HSD, Newman-Keuls, Duncan, and Dunnett tests (with flexible options for estimating the appropriate error terms for those tests); and tests of assumptions (e.g., Levene's test, plots of means vs. standard deviations, etc.).
STATISTICA General Regression Models (GRM) provides a highly flexible implementation of the general linear model, together with a comprehensive set of stepwise regression and best-subset model-building techniques that support both continuous and categorical predictor variables. Stepwise and best-subset methods can be used in GRM to build models for highly complex designs, including designs with effects for categorical predictor variables. Thus, the "general" in General Regression Models refers both to the use of the general linear model and to the fact that, unlike most other stepwise regression programs, GRM is not limited to designs that contain only continuous predictor variables.
Stepwise and best-subset selection for continuous and categorical predictors (ANOVA models) for models with multiple dependent variables. GRM is a "sister program" to the STATISTICA General Linear Models (GLM) module. In addition to the large number of unique analytic options available in GLM (including planned comparisons, custom hypotheses, a wide selection of post-hoc tests, residual analyses options, etc.), the General Regression Models (GRM) module allows you to build models via stepwise and best-subset methods. GRM makes these techniques available not only for traditional analytic problems with a single dependent variable, but extends them to analyses of problems with multiple dependent variables; thus, in a sense, GRM can be considered a (very unique) stepwise and best-subset canonical analysis program. These methods can be used with designs that include continuous and/or categorical predictor variables (i.e., ANOVA or ANCOVA designs), and the techniques used in GRM will ensure that multiple degree of freedom effects are considered (moved in or out of the model) in blocks. Specifically, GRM allows you to build models via forward- or backward-only selection (effects can only be entered or removed once during the selection process), standard forward or backward selection (effects can be moved in or out of the model at each step, according to F or p to enter or remove criteria), or via best subset selection; this latter method gives the user flexible options to control the models considered during the subset search (e.g., maximum and minimum subset sizes; Mallows' Cp, R-square, and adjusted R-square criteria for best subset selection; etc.).
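A bare-bones version of best-subset search (evaluate every candidate subset and keep the one with the best criterion) can be written directly. The sketch below uses statsmodels OLS and adjusted R-square on hypothetical predictors; it does not reproduce GRM's handling of multi-degree-of-freedom effects or multiple dependent variables.

```python
# Bare-bones best-subset regression: try every predictor subset, keep the best
# adjusted R-square. Illustrative only; predictors and data are hypothetical.
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 120
X = rng.normal(size=(n, 5))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(0, 1, n)   # only x0, x2 matter
names = [f"x{i}" for i in range(5)]

best = (-np.inf, ())
for k in range(1, 6):
    for subset in combinations(range(5), k):
        design = sm.add_constant(X[:, subset])
        fit = sm.OLS(y, design).fit()
        if fit.rsquared_adj > best[0]:
            best = (fit.rsquared_adj, subset)

print("best subset:", [names[i] for i in best[1]],
      "adjusted R^2 =", round(best[0], 3))
```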
Results. The General Regression Models (GRM) module offers all standard and unique results options described in the context of the GLM module in the previous section (including desirability profiling, predicted and residual statistics for the computation or training sample, cross-validation or verification sample, and prediction sample; tests of assumptions, means plots, etc.). In addition, unique regression-specific results options are also available, including Pareto charts of parameter estimates, whole model summaries (tests) with various methods for evaluating no-intercept models, partial and semi-partial correlations, etc.
The Generalized Linear Models (GLZ) module allows the user to search for both linear and nonlinear relationships between a response variable and categorical or continuous predictor variables (including multinomial logit and probit, signal detection models, and many others). Special applications of generalized linear models include a number of widely used types of analyses, such as binomial and multinomial logit and probit regression, Signal Detection Theory (SDT) models, or Tweedie models.
The Tweedie distribution is actually a family of distributions belonging to the class of exponential dispersion models, with variance of the form Var(Y) = φμ^p, where φ > 0 is the dispersion/scale parameter, μ is the mean, and the power parameter p must lie in the interval (-∞, 0] ∪ [1, ∞).
Note that STATISTICA Data Miner also includes an implementation of Generalized Additive Models (GAM). The user-interfaces, methods for specifying designs, and "touch-and-feel" of the GLZ module are similar to GLM, GRM, and PLS. The user can easily specify ANOVA- or ANCOVA-like designs, response surface designs, mixture surface designs, etc.; thus, even novice users will have no difficulty applying generalized linear models to analyze their data.
Models and link functions. A wide range of distributions (from the exponential family) can be specified for the response variable: Normal, Poisson, gamma, binomial, multinomial, ordinal multinomial, and inverse Gaussian. Further, the nature of the relationship between the predictors and the responses can be specified by choosing a so-called link function from a comprehensive list of (common and special-purpose) functions. Available link functions include: log, power, identity, logit, probit, complementary log-log, and log-log links. Unlike other nonlinear models, these models can be fitted via fast estimation procedures and allow meaningful interpretations (similar to general linear models); hence, they are extensively employed in the analysis of non-linear relationships in science as well as applied research.
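These distribution-plus-link models map directly onto the GLM interface in statsmodels; the sketch below fits a binomial model with the default (logit) link to hypothetical data, which is the kind of fast, interpretable fit the paragraph above refers to.

```python
# Binomial GLM (logistic regression) sketch with statsmodels; the formula,
# columns, and data are hypothetical stand-ins for a real design.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 300
df = pd.DataFrame({
    "dose":  rng.uniform(0, 5, n),
    "group": rng.choice(["ctrl", "trt"], n),
})
lin_pred = -2.0 + 0.9 * df["dose"] + 0.7 * (df["group"] == "trt")
df["response"] = (rng.random(n) < 1 / (1 + np.exp(-lin_pred))).astype(int)

fit = smf.glm("response ~ dose + C(group)", data=df,
              family=sm.families.Binomial()).fit()     # logit is the canonical link
print(fit.summary())                                   # Wald tests, estimates, CIs
```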
Stepwise and best-subset selection for continuous and categorical predictors (ANOVA-like models). In addition to the standard model fitting techniques, STATISTICA GLZ also provides unique options for exploratory analyses, including model building facilities like forward- or backward-only selection of effects (effects can only be selected for inclusion or removal once during the selection process), standard forward or backward stepwise selection of effects (effects can be entered or removed at each step, using a p to enter or remove criterion), and best subset regression methods (using the likelihood score statistic, model likelihood, or Akaike information criterion). These powerful methods can be applied to categorical predictors (ANOVA-like designs; effects will be moved in or out of the model as multiple-parameter blocks) as well as continuous predictors, and will save significant amounts of time when building appropriate models for complex data.
Results. The Generalized Linear Model module will compute all standard results statistics, including likelihood ratio tests, and Wald and score tests for significant effects, parameter estimates and their standard errors and confidence intervals, etc. In addition, for ANOVA-like designs, tables and plots of predicted means (the equivalent of least squares means computed in the general linear model) with their standard errors can be computed, to aid in the interpretation of results. GLZ also includes a comprehensive selection of model checking tools such as Spreadsheets and graphs for various residuals and outlier detection statistics, including raw residuals, Pearson residuals, deviance residuals, studentized Pearson residuals, studentized deviance residuals, likelihood residuals, differential Chi-square statistics, differential deviance, and generalized Cook distances, etc. As described earlier, predicted and residual statistics can be requested for observations that were used for fitting the model, and those that were not (i.e., for the cross-validation sample).
Partial Least Squares (PLS) includes a comprehensive selection of algorithms for univariate and multivariate partial least squares problems. Because PLS offers an identical selection of flexible user interfaces to that of GLM, GRM, and GLZ, it is very easy to set up models in one module and quickly analyze the data using the same model in PLS. This flexibility allows even novice users to apply these powerful techniques to their analysis problems. The partial least squares method is a powerful data mining technique, particularly well suited for extracting a smaller number of dimensions from a large number of predictor and response variables. These methods for analyzing linear systems have become popular only in the last few years; thus, many of the algorithms and statistics are still the subject of ongoing research.
The overparameterized and sigma-restricted model for categorical predictors. Like GLM and GLZ, PLS offers both the overparameterized and sigma-restricted parameterization methods for categorical predictors (ANOVA-like models). In partial least squares models, the sigma-restricted solution can be particularly useful, because it may produce less complex results (explain more variability with fewer components, made up of design vectors coded in sigma-restricted form).
Algorithms. STATISTICA PLS implements the two most general algorithms for partial least squares analysis: SIMPLS and NIPALS.
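scikit-learn's PLSRegression implements a NIPALS-style algorithm (it is not the SIMPLS implementation mentioned above) and can serve as a quick illustration of partial least squares on hypothetical multivariate data.

```python
# Partial least squares sketch with scikit-learn's NIPALS-based PLSRegression
# (illustrative; the predictor and response matrices are synthetic).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
n, p, q = 150, 12, 3
latent = rng.normal(size=(n, 2))                      # two underlying components
X = latent @ rng.normal(size=(2, p)) + rng.normal(0, 0.3, (n, p))
Y = latent @ rng.normal(size=(2, q)) + rng.normal(0, 0.3, (n, q))

pls = PLSRegression(n_components=2)
pls.fit(X, Y)

print("X loadings shape:", pls.x_loadings_.shape)     # p x n_components
print("R^2 of the fit:", round(pls.score(X, Y), 3))   # coefficient of determination
T = pls.transform(X)                                  # component scores for X
print("score matrix shape:", T.shape)
```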
Results. PLS will compute all the standard results for a partial least squares analysis, and also offers a large number of results options and in particular graphics options that are usually not available in other implementations; for example, graphs of parameter values as a function of the number of components, two-dimensional plots for all output statistics (parameters, factor loadings, etc.), two-dimensional plots for all residuals statistics, etc. Also, like GLM, GRM, and GLZ, the Partial Least Squares module offers extensive residual analysis options, and predicted and residual statistics can be requested for observations that were used for fitting the model (the "training" sample), those that were not (i.e., the cross-validation or verification sample), and for cases without observed data on the dependent (response) variables (the prediction sample).
STATISTICA Advanced Linear/Non-Linear Models is compatible with Windows XP, Windows Vista, and Windows 7.
Native 64-bit versions and highly optimized multiprocessor versions are available.