BayesAnova

class robusta.groupwise.BayesAnova(which_models='withmain', iterations=10000, scale_prior_fixed='medium', scale_prior_random='nuisance', r_scale_effects=rpy2.rinterface.NULL, multi_core=False, method='auto', no_sample=False, include_subject=False, **kwargs)

Bases: robusta.groupwise.models.Anova

Run a between, within or mixed Bayesian ANOVA.

Parameters
  • data (pd.DataFrame) – Contains the subject, dependent, between and within variables as columns.

  • formula (str, optional) – An R-style formula describing the statistical model, in the form dependent ~ between + within | subject. If used, the parsed formula overrides the dependent, between, within and subject arguments.

  • dependent (key in data, optional) – The name of the column identifying the dependent variable (i.e., response variable) in the data. The column data type should be numeric or a string that can be coerced to numeric. Overridden by formula if specified. Required if formula is not specified.

  • between (key(s) in data (str or array-like), optional) – The name of the column(s) identifying the independent variable(s) (i.e., predictor variables) that are manipulated between different subject units (i.e., exogenous variables). Overridden by formula if specified. If formula is not specified, between may be omitted as long as within is specified.

  • within (key(s) in data (str or array-like), optional) – The name of the column(s) identifying the independent variable(s) (i.e., predictor variables) that are manipulated within subject units (i.e., endogenous variables). Overridden by formula if specified. If formula is not specified, within may be omitted as long as between is specified.

  • subject (str or key in data, optional) – The name of the column identifying the sampling unit in the data (i.e., subject). Overridden by formula if specified. Required if formula is not specified.

  • agg_func (str (name of pandas aggregation function) or callable, optional) – Specifies how to aggregate multiple observations within the same sampling unit (e.g., repeated trials of one subject in one condition).

  • which_models (str, optional) – Setting which_models to ‘all’ will test all models that can be created by including or not including a main effect or interaction. ‘top’ will test all models that can be created by removing or leaving in a main effect or interaction term from the full model. ‘bottom’ creates models by adding single factors or interactions to the null model. ‘withmain’ will test all models, with the constraint that if an interaction is included, the corresponding main effects are also included. Default value is ‘withmain’.

  • iterations (int, optional) – Number of iterations used to estimate Bayes factor. Default value is 10000.

  • scale_prior_fixed ([float, str], optional) – Controls the scale of the prior distribution for fixed factors. A value of 1.0 yields a standard Cauchy prior. Instead of a float, it is also possible to pass ‘medium’, ‘wide’ or ‘ultrawide’, matching the values \(\frac{1}{2}, \frac{\sqrt{2}}{2}, 1\), respectively. Default value is ‘medium’.

  • scale_prior_random ([float, str], optional) – Similar to scale_prior_fixed but applies to random factors. Accepts all the values listed above and also ‘nuisance’, for variance in the data that may stem from variables which are irrelevant to the model, such as participants. Default value is ‘nuisance’.

  • r_scale_effects (optional) – Default value is rpy2.rinterface.NULL.

  • multi_core (bool, optional) – Whether to use multiple cores for estimation. Not available on Windows. Default value is False.

  • method (str, optional) – The method used to estimate the Bayes factor. “simple” is most accurate for small to moderate sample sizes, and uses the Monte Carlo sampling method described in Rouder et al. (2012). “importance” uses an importance sampling algorithm with an importance distribution that is multivariate normal on log(g). “laplace” does not sample, but uses a Laplace approximation to the integral; it is expected to be more accurate for large sample sizes, where MC sampling is slow. If method=“auto”, an initial run with both samplers is done, and the sampling method that yields the least-variable samples is chosen. The number of initial test iterations is determined by options(BFpretestIterations). Default value is “auto”.

  • no_sample (bool, optional) – Prevents sampling when possible (i.e., relies on the calculation performed by BayesFactor). Default value is False. Setting it to True is currently not implemented and may lead to an error.
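The data, formula, between, within, subject and agg_func parameters describe how the long-format data is read. The sketch below is a minimal, hypothetical illustration of that preparation step, not robusta's implementation: it assumes the formula joins terms with + only, infers a term to be within-subject when it takes more than one value inside any subject (one plausible way to split between from within), and collapses repeated observations per subject-by-condition cell with an agg_func such as the mean. The names parse_formula, classify_terms and aggregate are made up for this example, and plain dicts stand in for a pd.DataFrame.

```python
from collections import defaultdict
from statistics import mean


def parse_formula(formula):
    """Hypothetical parser: split 'dependent ~ a + b | subject' into its parts."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    terms_part, subject = (part.strip() for part in rhs.split("|"))
    terms = [t.strip() for t in terms_part.split("+")]
    return lhs, terms, subject


def classify_terms(rows, terms, subject):
    """Label a term 'within' if it takes more than one value inside any subject."""
    between, within = [], []
    for term in terms:
        levels_per_subject = defaultdict(set)
        for row in rows:
            levels_per_subject[row[subject]].add(row[term])
        if any(len(levels) > 1 for levels in levels_per_subject.values()):
            within.append(term)
        else:
            between.append(term)
    return between, within


def aggregate(rows, dependent, factors, subject, agg_func=mean):
    """Collapse repeated observations to one value per subject x factor cell."""
    cells = defaultdict(list)
    for row in rows:
        key = (row[subject],) + tuple(row[f] for f in factors)
        cells[key].append(row[dependent])
    return {key: agg_func(values) for key, values in cells.items()}


# Toy long-format data: subject 1 has two trials in condition "x".
rows = [
    {"id": 1, "group": "a", "cond": "x", "score": 1.0},
    {"id": 1, "group": "a", "cond": "x", "score": 3.0},
    {"id": 1, "group": "a", "cond": "y", "score": 4.0},
    {"id": 2, "group": "b", "cond": "x", "score": 2.0},
    {"id": 2, "group": "b", "cond": "y", "score": 6.0},
]
dependent, terms, subject = parse_formula("score ~ group + cond | id")
between, within = classify_terms(rows, terms, subject)
cells = aggregate(rows, dependent, between + within, subject)
```

Here group never varies within a subject and is treated as a between factor, while cond is treated as a within factor; subject 1's two trials in condition "x" are averaged into a single cell value.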
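The four which_models settings differ only in which combinations of main effects and interactions are compared. The following sketch enumerates candidate models by following the textual descriptions of ‘all’, ‘top’, ‘bottom’ and ‘withmain’ literally; it is an illustration under that assumption, not BayesFactor's own enumeration code, and model_terms and enumerate_models are hypothetical helper names. Models are tuples of terms, with interactions written R-style as A:B.

```python
from itertools import combinations


def model_terms(factors):
    """All main effects and interactions of the factors, e.g. A, B, A:B."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def enumerate_models(factors, which_models="withmain"):
    """Return candidate models (tuples of terms) for a which_models setting."""
    terms = model_terms(factors)
    if which_models == "bottom":
        # Null model plus exactly one main effect or interaction.
        return [(t,) for t in terms]
    if which_models == "top":
        # Full model minus exactly one main effect or interaction.
        return [tuple(t for t in terms if t != dropped) for dropped in terms]
    # Every non-empty combination of terms.
    subsets = [
        combo
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    ]
    if which_models == "all":
        return subsets
    if which_models == "withmain":
        def has_main_effects(model):
            # Each interaction's constituent main effects must be in the model.
            present = set(model)
            return all(
                set(term.split(":")) <= present for term in model if ":" in term
            )

        return [m for m in subsets if has_main_effects(m)]
    raise ValueError(f"unknown which_models: {which_models!r}")
```

For two factors A and B, ‘all’ yields seven models, whereas ‘withmain’ keeps only A, B, A + B and A + B + A:B, because A:B may appear only together with both of its main effects.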
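To make the iterations and method parameters concrete, the toy below estimates a Bayes factor by plain Monte Carlo (averaging the likelihood ratio over prior draws) and by importance sampling, and mimics method=“auto” by running a short pilot with each sampler and keeping the one whose samples have the lower variance. This is an assumed, deliberately simplified setting and not the JZS ANOVA model of Rouder et al. (2012): a single normal mean with unit noise variance, H1: mu ~ N(0, tau²) versus H0: mu = 0, chosen because its Bayes factor has a closed form to check against. All function names are hypothetical, and pretest only plays the role of BFpretestIterations.

```python
import math
import random
from statistics import fmean, pvariance


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def exact_bf10(ybar, n, tau):
    """Closed-form BF10 for H1: mu ~ N(0, tau^2) vs H0: mu = 0 (unit noise variance)."""
    v = 1 / n  # variance of the sample mean
    return normal_pdf(ybar, 0, tau**2 + v) / normal_pdf(ybar, 0, v)


def simple_draws(ybar, n, tau, k, rng):
    """Plain Monte Carlo terms: likelihood ratio evaluated at prior draws of mu."""
    v = 1 / n
    null_lik = normal_pdf(ybar, 0, v)
    return [normal_pdf(ybar, rng.gauss(0, tau), v) / null_lik for _ in range(k)]


def importance_draws(ybar, n, tau, k, rng):
    """Importance-sampling terms with a normal proposal centred on the posterior."""
    v = 1 / n
    post_var = 1 / (1 / tau**2 + n)
    post_mean = post_var * n * ybar
    prop_var = 2 * post_var  # wider than the posterior so the weights stay bounded
    null_lik = normal_pdf(ybar, 0, v)
    terms = []
    for _ in range(k):
        mu = rng.gauss(post_mean, math.sqrt(prop_var))
        weight = normal_pdf(mu, 0, tau**2) / normal_pdf(mu, post_mean, prop_var)
        terms.append(weight * normal_pdf(ybar, mu, v) / null_lik)
    return terms


def estimate_bf10(ybar, n, tau, iterations=10000, method="auto", pretest=1000, seed=1):
    """Return (BF10 estimate, sampler used); 'auto' picks the less variable sampler."""
    rng = random.Random(seed)
    samplers = {"simple": simple_draws, "importance": importance_draws}
    if method == "auto":
        # Short pilot run with each sampler; keep the one with the smaller variance.
        method = min(
            samplers,
            key=lambda name: pvariance(samplers[name](ybar, n, tau, pretest, rng)),
        )
    terms = samplers[method](ybar, n, tau, iterations, rng)
    return fmean(terms), method


if __name__ == "__main__":
    bf, chosen = estimate_bf10(ybar=0.5, n=20, tau=0.5)
    print(f"BF10 ~ {bf:.3f} via {chosen}; exact {exact_bf10(0.5, 20, 0.5):.3f}")
```

Both samplers are unbiased here, so the choice only affects precision: more iterations shrink the Monte Carlo error of either estimator, while a better-matched proposal shrinks it per iteration, which is the trade-off the “auto” pilot run is meant to resolve.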

Notes

Based on the R function anovaBF (https://www.rdocumentation.org/packages/BayesFactor/versions/0.9.12-4.2/topics/anovaBF) from the BayesFactor package [1].

References

[1] Morey, R. D., Rouder, J. N., Jamil, T., & Morey, M. R. D. (2015). Package ‘BayesFactor’.