Statistics (Conference papers)

Recent Submissions

  • Publication
    Program quality with pair programming in CS1
    (2005) Krnjajic, Milovan
    In several regression applications, a different structural relationship might be anticipated for the higher or lower responses than the average responses. In such cases, quantile regression analysis can uncover important features that would likely be overlooked by mean regression. We develop two distinct Bayesian approaches to fully nonparametric model-based quantile regression. The first approach utilizes an additive regression framework with Gaussian process priors for the quantile regression functions and a scale uniform Dirichlet process mixture prior for the error distribution, which yields flexible unimodal error density shapes. Under the second approach, the joint distribution of the response and the covariates is modeled with a Dirichlet process mixture of multivariate normals, with posterior inference for different quantile curves emerging through the conditional distribution of the response given the covariates. The proposed nonparametric prior probability models allow the data to uncover non-linearities in the quantile regression function and non-standard distributional features in the response distribution. Inference is implemented using a combination of posterior simulation methods for Dirichlet process mixtures. We illustrate the performance of the proposed models using simulated and real data sets.
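    The mechanism behind the second approach can be sketched in a few lines of base R: when the joint distribution of response and covariate is a mixture of bivariate normals, the conditional distribution of the response given the covariate is again a normal mixture, and quantile curves are obtained by inverting its CDF. A fixed two-component mixture with illustrative parameter values stands in for the Dirichlet process mixture; this is not the authors' implementation.

        ## Conditional quantile curves from a finite mixture of bivariate
        ## normals (a fixed-weight stand-in for the DP mixture in the paper).
        cond_quantile <- function(x, p, w, mx, my, sx, sy, rho) {
          wx <- w * dnorm(x, mx, sx)             # weights of y given x
          wx <- wx / sum(wx)
          myx <- my + rho * sy / sx * (x - mx)   # conditional component means
          syx <- sy * sqrt(1 - rho^2)            # conditional component sds
          cdf <- function(y) sum(wx * pnorm(y, myx, syx)) - p
          uniroot(cdf, c(min(myx) - 6 * max(syx), max(myx) + 6 * max(syx)))$root
        }
        ## two-component example: median and 90th-percentile curves over a grid
        w  <- c(0.6, 0.4); mx <- c(-1, 1.5); my <- c(0, 2)
        sx <- c(0.8, 0.6); sy <- c(1, 0.5); rho <- c(0.7, -0.3)
        xs <- seq(-3, 3, length.out = 50)
        q50 <- sapply(xs, cond_quantile, p = 0.5, w = w, mx = mx, my = my,
                      sx = sx, sy = sy, rho = rho)
        q90 <- sapply(xs, cond_quantile, p = 0.9, w = w, mx = mx, my = my,
                      sx = sx, sy = sy, rho = rho)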
  • Publication
    Bayesian Model Specification: Some problems related to model choice and calibration
    (2011) Krnjajic, Milovan
    In the development of Bayesian model specification for inference and prediction we focus on the conditional distributions p(θ | B) and p(D | θ, B), with data D and background assumptions B, and consider calibration (an assessment of how often we get the right answers) as an integral step of the model development. We compare several predictive model-choice criteria and present related calibration results. In particular, we have implemented a simulation study comparing the predictive model-choice criteria LS_CV, a log score based on cross-validation, and LS_FS, a full-sample log score, with the deviance information criterion, DIC. We show that for several classes of models DIC and LS_CV are (strongly) negatively correlated, and that LS_FS has better small-sample model discrimination performance than either DIC or LS_CV. We further demonstrate that, when validating the model-choice results, the standard use of posterior predictive tail areas for hypothesis testing can be poorly calibrated, and we present a method for its proper calibration.
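    As a concrete illustration of the quantities being compared, the R sketch below computes LS_CV and DIC for a conjugate normal model, y_i ~ N(θ, 1) with prior θ ~ N(0, τ²), where the leave-one-out predictive density is available in closed form. The model, prior settings, and simulated data are chosen for simplicity and are not the paper's simulation study.

        ## LS_CV and DIC for y_i ~ N(theta, 1), theta ~ N(0, tau2).
        log_score_cv <- function(y, tau2 = 100) {
          n <- length(y)
          mean(sapply(seq_len(n), function(i) {
            yi <- y[-i]
            v <- 1 / (length(yi) + 1 / tau2)   # posterior variance given y[-i]
            m <- v * sum(yi)                   # posterior mean given y[-i]
            dnorm(y[i], m, sqrt(v + 1), log = TRUE)  # predictive is N(m, v + 1)
          }))
        }
        dic <- function(y, tau2 = 100, nsim = 5000) {
          v <- 1 / (length(y) + 1 / tau2); m <- v * sum(y)
          dev <- function(th) -2 * sum(dnorm(y, th, 1, log = TRUE))
          dbar <- mean(sapply(rnorm(nsim, m, sqrt(v)), dev))  # posterior mean deviance
          dbar + (dbar - dev(m))                              # DIC = Dbar + pD
        }
        y <- rnorm(50, 0.3, 1)
        c(LS_CV = log_score_cv(y), DIC = dic(y))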
  • Publication
    Quantifying the Price of Uncertainty in Bayesian Models
    (2013) Krnjajic, Milovan
    During the exploratory phase of a typical statistical analysis it is natural to look at the data in order to narrow down the scope of the subsequent steps, mainly by selecting a set of families of candidate models (parametric, for example). One needs to exercise caution when using the same data both to estimate the parameters of a specific model and to decide how to search the model space: failing to account for this second-order randomness involved in exploring the modelling space typically leads to underestimating the overall uncertainty. To rank the models based on their fit or predictive performance we use practical tools such as Bayes factors, log scores and the deviance information criterion. The price of model uncertainty can be paid automatically by using a Bayesian nonparametric (BNP) specification, adopting weak priors on the (functional) space of possible models, or in a version of cross-validation, where only part of the observed sample is used to fit and validate the model, and the assessment of the calibration of the overall modelling process is based on the as-yet unused part of the data set. It is interesting to ask how much data needs to be set aside for calibration in order to obtain an assessment of uncertainty approximately equivalent to that of the BNP approach.
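    The question in the last sentence can be mocked up in a few lines of R: vary the size of the held-out calibration set and track the average out-of-sample log score of a simple plug-in normal model. The data and model below are invented for illustration and carry none of the paper's specifics.

        ## Average held-out log score as a function of calibration-set size.
        set.seed(1)
        y <- rnorm(200, 1, 2)                    # simulated sample
        holdout_score <- function(n_cal, y) {
          fit <- y[seq_len(length(y) - n_cal)]   # fitting portion
          cal <- y[-seq_len(length(y) - n_cal)]  # held-out calibration portion
          mean(dnorm(cal, mean(fit), sd(fit), log = TRUE))
        }
        sapply(c(20, 50, 100), holdout_score, y = y)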
  • Publication
    Bioassays with natural mortality: handling overdispersion using random effects
    (2011) Hinde, John
    In fitting dose-response models to entomological data it is often necessary to take account of natural mortality and/or overdispersion. The standard approach to handling natural mortality is Abbott's formula (Abbott, 1925), which allows for a constant underlying mortality rate. Standard overdispersion models include beta-binomial, logistic-normal, and discrete mixture models. We extend the standard model (Morgan, 1992) by including a random effect at the dose level, using the approach described in Aitkin et al. (2009). We apply this model to data from an experiment on the use of a virus (PhopGV) for the biological control of larvae of the potato tuber moth (Phthorimaea operculella) in potatoes, using a procedure implemented in R. The model with random effects at the dose level gave a better fit than the standard model.
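    A minimal R sketch of the fixed-effects starting point: Abbott's formula enters the binomial likelihood as p(d) = c + (1 - c) * logit^{-1}(a + b log d), with the natural mortality rate c estimated jointly by maximum likelihood. The data are invented for illustration, and the dose-level random effect of the paper is omitted here.

        ## Abbott-type dose-response with natural mortality, fitted by ML.
        dose <- c(0.1, 0.5, 1, 5, 10)            # invented data
        n    <- rep(50, 5)
        dead <- c(6, 10, 18, 35, 46)
        nll <- function(par) {
          c0 <- plogis(par[1])                   # natural mortality in (0, 1)
          p  <- c0 + (1 - c0) * plogis(par[2] + par[3] * log(dose))
          -sum(dbinom(dead, n, p, log = TRUE))
        }
        fit <- optim(c(-2, 0, 1), nll)
        plogis(fit$par[1])                       # estimated natural mortality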
  • Publication
    A random effects continuation-ratio model for replicated toxicological data
    (2011) Martinez, Marie-Jose; Hinde, John
  • Publication
    Analysis of an Observational Study
    (2011) Dooley, Cara; Hinde, John; Conesa, D.; Forte, A.; Lopez-Quilez, A.; Munoz, F.
    The study presented below aimed to compare survival of colorectal cancer patients against survival of a sub-population with a secondary disease, inflammatory bowel disease (IBD). The data were taken from an observational study, that is, there was no explicit design. The study had many complications, but the most significant aspect was that the number of controls was much greater than the number of cases of interest. Several techniques are used to overcome these obstacles, including: matching of the dataset, to make the controls and cases as similar as possible at the time of diagnosis, effectively fitting a design retrospectively; and weighting of the data, using both the propensity score and the number of similar patients found in matching.
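    The weighting step can be illustrated with a small R sketch: a logistic-regression propensity score for case status, turned into inverse-probability weights. All variable names and data here are simulated placeholders, not the study's data.

        ## Inverse-probability weighting with a logistic propensity score.
        set.seed(2)
        n    <- 500
        age  <- rnorm(n, 60, 10)                        # simulated covariate
        case <- rbinom(n, 1, plogis(-6 + 0.08 * age))   # IBD case indicator
        surv <- rnorm(n, 5 - 0.03 * age - 0.8 * case)   # survival time (years)
        ps <- fitted(glm(case ~ age, family = binomial))  # propensity score
        w  <- ifelse(case == 1, 1 / ps, 1 / (1 - ps))     # IPW weights
        ## weighted difference in mean survival, cases minus controls
        weighted.mean(surv[case == 1], w[case == 1]) -
          weighted.mean(surv[case == 0], w[case == 0])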
  • Publication
    Analysis of an Observational Study - An Example Using Data from the Irish Cancer Registry
    (2011) Dooley, Cara; Hinde, John
    The study presented below aimed to compare survival of colorectal cancer patients against survival of a sub-population with a secondary disease, inflammatory bowel disease (IBD). The data were taken from an observational study, that is, there was no explicit design. The study had many complications, but the most significant aspect was that the number of controls was much greater than the number of cases of interest. Several techniques are used to overcome these obstacles, including: matching of the dataset, to make the controls and cases as similar as possible at the time of diagnosis, effectively fitting a design retrospectively; and weighting of the data, using both the propensity score and the number of similar patients found in matching.