# Approximately Sufficient Statistics and Bayesian Computation

@article{Joyce2008ApproximatelySS, title={Approximately Sufficient Statistics and Bayesian Computation}, author={Paul Joyce and Paul Marjoram}, journal={Statistical Applications in Genetics and Molecular Biology}, year={2008}, volume={7} }

The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are…
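To make the question in the abstract concrete, the following is a minimal sketch of approximate Bayesian computation by rejection, run once with a single summary statistic and once with a second statistic added. It is not the paper's scoring scheme, whose details are not given here. The toy model (normal data with unknown mean), the flat prior on [-5, 5], the tolerance `eps`, and the function names `simulate` and `abc_rejection` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy model (an assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, summaries, eps, n_draws, rng):
    """Keep prior draws whose simulated summaries all lie within eps of the observed ones."""
    s_obs = [s(observed) for s in summaries]
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed flat prior on [-5, 5]
        data = simulate(theta, len(observed), rng)
        s_sim = [s(data) for s in summaries]
        if all(abs(a - b) <= eps for a, b in zip(s_sim, s_obs)):
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # hypothetical "observed" data, true mean 2.0

# Approximate posterior samples using the sample mean only.
post_mean_only = abc_rejection(observed, [statistics.fmean], 0.2, 5000, rng)

# Same analysis with the sample variance added as a second statistic.
post_mean_and_var = abc_rejection(
    observed, [statistics.fmean, statistics.variance], 0.2, 5000, rng
)

for label, post in [("mean only", post_mean_only), ("mean + variance", post_mean_and_var)]:
    print(label, len(post), round(statistics.fmean(post), 3))
```

In this toy model the sample variance carries essentially no information about the mean, so the two sets of accepted values should look similar while the second run accepts fewer draws. Comparing the two approximate posteriors is one informal way to ask whether an extra statistic is worth its cost.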

#### 235 Citations

Choosing summary statistics by least angle regression for approximate Bayesian computation

- Mathematics
- 2016

Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been a lot of…

Local dimension reduction of summary statistics for likelihood-free inference

- Computer Science, Mathematics
- Stat. Comput.
- 2020

A localization strategy is introduced for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space, to improve the estimation accuracy for localized versions of linear regression and partial least squares.

Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

- Mathematics, Computer Science
- ArXiv
- 2018

This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem, which allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics.

On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

- Medicine, Mathematics
- Statistical applications in genetics and molecular biology
- 2010

It was found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally-optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.

Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC

- Mathematics, Computer Science
- 2010

This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that optimal summary statistics are the posterior means of the parameters, while these cannot be calculated analytically.

A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

- Biology, Medicine
- Genetics
- 2012

An approach for choosing summary statistics based on boosting, a technique from the machine-learning literature, is proposed and it is found that ABC with summary statistics chosen locally via boosting with the L2-loss performs best.

Choice of Summary Statistic Weights in Approximate Bayesian Computation

- Computer Science, Medicine
- Statistical applications in genetics and molecular biology
- 2011

In this paper, we develop a Genetic Algorithm that can address the fundamental problem of how one should weight the summary statistics included in an approximate Bayesian computation analysis built…

Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation (with Discussion)

- Computer Science, Mathematics
- 2012

This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner, and shows that optimal summary statistics are the posterior means of the parameters.

Simulation-based bayesian analysis of complex data

- Computer Science, Medicine
- SummerSim
- 2015

This paper argues for the advantage of a simulation-based approximate Bayesian method that remains tractable when tractability of other methods is lost, and demonstrates the utility of simulation-based analyses of large datasets within a rigorous statistical framework.

#### References

Approximate Bayesian computation in population genetics.

- Biology, Medicine
- Genetics
- 2002

A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty.

Partition structures and sufficient statistics

- Mathematics
- 1998

Is the Ewens distribution the only one-parameter family of partition structures where the total number of types sampled is a sufficient statistic? In general, the answer is no. It is shown that all…

Monte Carlo Sampling Methods Using Markov Chains and Their Applications

- Mathematics
- 1970

A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and…

Approximate Bayesian Computation and MCMC

- Computer Science
- 2004

Methods for simulating observations from posterior distributions without the use of likelihoods are discussed, using an example concerning inference in the fossil record and a novel Markov chain Monte Carlo approach.

Markov chain Monte Carlo without likelihoods

- Computer Science, Medicine
- Proceedings of the National Academy of Sciences of the United States of America
- 2003

A Markov chain Monte Carlo method for generating observations from a posterior distribution without the use of likelihoods is presented, which can be used in frequentist applications, in particular for maximum-likelihood estimation.

Sequential Monte Carlo without likelihoods

- Computer Science, Medicine
- Proceedings of the National Academy of Sciences
- 2007

This work proposes a sequential Monte Carlo sampler that convincingly overcomes inefficiencies of existing methods and demonstrates its implementation through an epidemiological study of the transmission rate of tuberculosis.

Modern computational approaches for analysing molecular genetic variation data

- Biology, Medicine
- Nature Reviews Genetics
- 2006

This work outlines some of these model-based approaches, including the coalescent, and discusses the applicability of the computational methods that are necessary given the highly complex nature of current and future data sets.

Statistical Tests of the Coalescent Model Based on the Haplotype Frequency Distribution and the Number of Segregating Sites

- Biology, Medicine
- Genetics
- 2005

A “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution is developed and the utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

Coalescent Theory

- 1986

The coalescent process is a powerful modeling tool for population genetics. The allelic states of all homologous gene copies in a population are determined by the genealogical and mutational history…

The sampling theory of selectively neutral alleles.

- Mathematics, Medicine
- Theoretical population biology
- 1972

This paper considers deductive and subsequently inductive questions relating to a sample of genes from a selectively neutral locus, and the test of the hypothesis that the alleles being sampled are indeed selectively neutral will be considered.