Erik Igelström

Curated list of R packages

There are often several R packages that do more or less the same thing, and it can take a bit of hunting to work out which one is most worth using. To save myself time spent googling the same things over and over, I've started making notes whenever I have to pick a package to do a thing. I'll keep those notes here, and maybe it can help save you time too.

I'll update this page from time to time as I think of more things to add. My aim is to keep these notes pretty objective and uncontroversial, but if I accidentally end up taking a side in any major debates, I hope you'll excuse me!

Layout inspired by blessed.rs.

Reading and writing files

Use case | Package
General purpose
  • rio: Provides a single, unified interface (import() and export()) for lots of file formats
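To illustrate the import()/export() idea, here is a minimal sketch of a round trip through two formats. The temporary file paths and the object names are made up for the example, and it assumes rio is installed with its default format support.

```r
library(rio)

# Hypothetical temporary paths, used only for this illustration
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# The format is inferred from the file extension in both directions
export(mtcars, csv_path)
cars_csv <- import(csv_path)

# Same two functions, different format
export(cars_csv, rds_path)
cars_rds <- import(rds_path)
```

Note that row names are not written to the CSV, so the car names in mtcars are dropped on the way through.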

Microsoft Excel
  • readxl: Read only. Most popular, fastest, and part of the tidyverse.
  • openxlsx: Read and write.
  • xlsx: Read and write; historically popular but has a dependency on Java (unlike openxlsx).
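As a rough sketch of how these fit together, the example below writes a small workbook with openxlsx and reads it back with readxl. The data frame and the temporary path are invented for the example.

```r
library(openxlsx)
library(readxl)

# Hypothetical example data and temporary file path
scores <- data.frame(name = c("a", "b", "c"), score = c(1.5, 2.5, 3.5))
xlsx_path <- tempfile(fileext = ".xlsx")

# openxlsx handles the writing (no Java needed)
write.xlsx(scores, xlsx_path)

# readxl handles the reading and returns a tibble
scores_back <- read_excel(xlsx_path)
```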

Stata/SPSS/SAS/etc.
  • haven: Most widely used, part of the tidyverse.
  • foreign: Older and supports some formats that haven doesn't support.
  • readstata13: Behaves more like foreign than haven, and reads the Stata 13 and later file formats, which foreign's read.dta() doesn't support.
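A minimal sketch of a Stata round trip with haven, assuming a small made-up data frame and a temporary path. Variable names are kept free of dots, since Stata doesn't allow them.

```r
library(haven)

# Hypothetical example data and temporary file path
survey <- data.frame(id = 1:3, age = c(34, 51, 27))
dta_path <- tempfile(fileext = ".dta")

# Write a Stata .dta file, then read it back as a tibble
write_dta(survey, dta_path)
survey_back <- read_dta(dta_path)
```

haven has matching read_sav()/write_sav() functions for SPSS files, which work the same way.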

JSON

XML
  • xml2: Most modern and useful.
  • XML: Older but has some specialised features (e.g. xmlToDataFrame for tabular data, and readKeyValueDB for plist files).
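For a flavour of xml2, here is a minimal sketch that parses a small inline document and pulls out attributes and text with XPath. The XML snippet itself is invented for the example.

```r
library(xml2)

# A made-up XML document, passed in as a literal string
doc <- read_xml(
  "<packages>
     <package name='xml2'>modern</package>
     <package name='XML'>older</package>
   </packages>"
)

# Select every <package> node, then extract an attribute and the text content
nodes <- xml_find_all(doc, "//package")
pkg_names <- xml_attr(nodes, "name")
pkg_notes <- xml_text(nodes)
```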

API endpoints

Some packages for reading specific text formats (e.g. jsonlite, xml2) can also take URLs as input, bypassing the need for a separate package.

Visualisation and plots

Use case | Package
Combine multiple plots
  • patchwork: Modern solution, neat syntax. If you're not tied to any particular solution, this would be my first choice.
  • cowplot: plot_grid() was historically more popular and still works fine.
  • gridExtra: grid.arrange() also exists but is less popular.
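To show what the patchwork syntax looks like, here is a minimal sketch that puts two plots side by side above a third. The plots are arbitrary examples built from mtcars, and the cowplot and gridExtra calls in the comments are only rough equivalents for a simple side-by-side layout.

```r
library(ggplot2)
library(patchwork)

# Three arbitrary example plots
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# `|` places plots side by side, `/` stacks them vertically
combined <- (p1 | p2) / p3

# Rough equivalents of a simple two-plot row in the other packages:
# cowplot::plot_grid(p1, p2)
# gridExtra::grid.arrange(p1, p2, ncol = 2)
```

Printing combined (or passing it to ggsave()) draws the whole layout as one figure.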

Specialised plots
  • pROC: ROC curves
  • GGally: Pairwise correlation plots
  • treemapify: Treemaps
  • waffle: Waffle plots (package seems to be abandoned)
  • ggwaffle: Waffle plots (package seems to be abandoned)

Data analysis and modelling

Use case | Package
Time series
  • zoo:
  • xts:
  • tidyverts: A family of packages (tsibble, fable, feasts) rather than a single one. Less popular than zoo and xts; intended to follow "tidy data" principles a bit more (see introduction vignette for the tsibble package).
  • R Cookbook chapter: Not a package, but has a lot of practical information on how to do things you might want to do with time series.

Both zoo and xts are widely used and recommended. Each provides an object class to represent time series data, along with a variety of functions for operating on it. xts is built on top of zoo (xts objects are also zoo objects), so the two interoperate well. There is also a built-in ts class in R, but both zoo and xts seem preferable and more widely used.
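As a small illustration of how zoo and xts work together, the sketch below builds a zoo series, applies a zoo function to it, and converts it to xts. The dates and values are made up for the example.

```r
library(zoo)
library(xts)

# Six made-up daily observations
dates <- as.Date("2024-01-01") + 0:5
z <- zoo(c(1, 2, 3, 4, 5, 6), order.by = dates)

# Centred 3-day moving average, using zoo's rollmean()
z_ma <- rollmean(z, k = 3)

# Convert to xts; the result still inherits from the zoo class
x <- as.xts(z)
x_is_zoo <- inherits(x, "zoo")

# xts adds ISO 8601 style date-range subsetting
x_from_jan3 <- x["2024-01-03/"]
```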

