Sessions

Size of Datasets for Analytics and Implications for R

2 years ago
With so much hype about "big data" and the industry pushing for distributed computing vs traditional single-machine tools, one wonders about the future of R. In this talk I will argue that most data analysts/data scientists don’t actually work with big data the majority of the time; therefore, using immature "big data" tools is in […]
permuter: An R package for randomization inference

2 years ago
Software packages for randomization inference are few and far between. This forces researchers either to rely on specialized stand-alone programs or to use classical statistical tests that may require implausible assumptions about their data-generating process. The absence of a flexible and comprehensive package for randomization inference is an obstacle for researchers from a wide range […]
Using Spark with Shiny and R Markdown

2 years ago
R is well suited to handling data that fits in memory, but additional tools are needed when the amount of data you want to analyze grows beyond the limits of your machine’s RAM. A variety of solutions have emerged over the years that aim to solve this problem in […]
Compiling parts of R using the NIMBLE system for programming algorithms

2 years ago
The NIMBLE R package provides a flexible system for programming statistical algorithms for hierarchical models specified using the BUGS language. As part of the system, we compile R code for algorithms and seamlessly link the compiled objects back into R, with our focus being on mathematical operations. Our compiler first generates C++, including Eigen code […]
Implementing R in old economy companies: From proof-of-concept to production

2 years ago
In old economy companies, the introduction of R is typically a bottom-up process that follows a pattern of three major stages of maturity: At the first stage, guerrilla projects use R in parallel to the "official" IT environment. The usage of R is often initiated by interns, student assistants or newly recruited graduates. At the second […]
Two-sample testing in high dimensions

2 years ago
Estimation for high-dimensional models has been widely studied. However, uncertainty quantification remains challenging. We put forward novel methodology for two-sample testing in high dimensions (Städler and Mukherjee, JRSSB, 2016). The key idea is to exploit sparse structure in the construction of the test statistics and in p-value calculation. This renders the test effective but leads […]
Connecting R to the OpenML project for Open Machine Learning

2 years ago
OpenML is an online machine learning platform where researchers can automatically log and share data, code, and experiments, and organize them online to work and collaborate more effectively. We present an R package to interface with the OpenML platform and illustrate its usage both as a stand-alone package and in combination with the mlr machine learning […]
Bayesian analysis of generalized linear mixed models with JAGS

2 years ago
BUGS is a language, syntactically similar to R, for describing hierarchical Bayesian models. BUGS allows large complex models to be built from smaller components. JAGS is a BUGS interpreter written in C++ which enables Bayesian inference using Markov Chain Monte Carlo (MCMC). Several R packages provide interfaces to JAGS (e.g. rjags, runjags, R2jags, bayesmix, iBUGS, […]
R/qtl: Just Barely Sustainable

2 years ago
R/qtl is an R package for mapping quantitative trait loci (genetic loci that contribute to variation in quantitative traits, such as blood pressure) in experimental crosses (such as in mice). I began its development in 2000; there have been 46 software releases since 2001. The latest version contains 39k lines of R code, 24k lines […]
trackeR: Infrastructure for running and cycling data from GPS-enabled tracking devices in R

2 years ago
The use of GPS-enabled tracking devices and heart rate monitors is becoming increasingly common in sports and fitness activities. The trackeR package aims to fill the gap between the routine collection of data from such devices and their analyses in a modern statistical environment like R. The package provides methods to read tracking data and […]
Efficient tabular data ingestion and manipulation with MonetDBLite

2 years ago
We present "MonetDBLite", a new R package containing an embedded version of MonetDB. MonetDB is a free and open source relational database focused on analytical applications. MonetDBLite provides fast complex query answers and unprecedented speeds for data availability and data transfer to and from R. MonetDBLite greatly simplifies database installation, setup and maintenance. It is […]
On the emergence of R as a platform for emergency outbreak response

2 years ago
The recent Ebola virus disease outbreak in West Africa has been a terrible reminder of the necessities of rapid evaluation and response to emerging infectious disease threats. For such response to be fully informed, complex epidemiological data including dates of symptom onsets, locations of the cases, hospitalisation, contact tracing information and pathogen genome sequences have […]