R package: ggplot2

The ggplot2 package implements a graphics paradigm called the »grammar of graphics«.
The package was created by Hadley Wickham, the author of »ggplot2: Elegant Graphics for Data Analysis« (Springer 2009).
ggplot2 operates on data frames.

Installing ggplot2

Because ggplot2 is one of the core members of tidyverse, the easiest way to install ggplot2 is to install the entire tidyverse:
install.packages('tidyverse')
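If only ggplot2 itself is needed, it can also be installed on its own. The following is a minimal sketch that installs the package only when it is missing; the repos URL is an assumption (any CRAN mirror should work) and is given explicitly because a non-interactive R session may not have a mirror configured.

```r
# Install ggplot2 on its own, but only if it is not already available.
# The repos URL is an assumption (a commonly used CRAN mirror); any mirror works.
if (!requireNamespace('ggplot2', quietly = TRUE)) {
  install.packages('ggplot2', repos = 'https://cloud.r-project.org')
}

library(ggplot2)
```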

The seven parameters of the grammar of graphics

The grammar of graphics has seven parameters with which a graphic is created and adjusted (data, geom, mapping, stat, position, coord and facet):
library(ggplot2)

ggplot(data = a_data_frame) +
  geom_XXX(
    mapping  = aes(x = var_1, y = var_2),
    stat     = …,
    position = …
  )                +
  coord_XXX ( … )  +
  facet_XXX ( … )
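As an illustration, the template above might be filled in as follows. This is only a sketch: the mpg data set (shipped with ggplot2) and the choice of geom_bar(), coord_flip() and facet_wrap() are assumptions made for the example, not part of the template.

```r
library(ggplot2)

# One concrete choice for each of the seven parameters.
p <- ggplot(data = mpg) +                    # data
  geom_bar(                                  # geom
    mapping  = aes(x = class, fill = drv),   # mapping
    stat     = 'count',                      # stat (the default of geom_bar)
    position = 'dodge'                       # position: bars side by side
  ) +
  coord_flip() +                             # coord: horizontal bars
  facet_wrap(~ year)                         # facet: one panel per year
```

Typing p (or print(p)) draws the plot.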

ggplot()

In ggplot2, a plot is started with the function ggplot(data = ds). The data argument specifies the dataset to use in the graph.
This function creates a coordinate system to which layers can be added (for example with geom_point(…)).
Layers are added with the + operator.
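A small sketch of this two-step construction, using the built-in mtcars data set as an assumed example (the variable names base and scatter are made up for the illustration):

```r
library(ggplot2)

# ggplot() alone sets up the plot and its coordinate system, but draws no data.
base <- ggplot(data = mtcars)

# Adding a layer with + returns a new plot object; base itself is unchanged.
scatter <- base + geom_point(mapping = aes(x = wt, y = mpg))

# print(base)     # empty panel
# print(scatter)  # one point per row of mtcars
```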

geom_…()

geom_…() defines how a plot represents data.
ggplot2 provides over 30 geom_*() functions. More extensions are found on ggplot2-exts.org.
A geom_…() function takes a mapping argument whose value is created with aes(x = …, y = …).
Every geom has its default stat (for example the default stat of geom_bar() is stat_count()).
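The relationship between a geom and its default stat can be checked with layer_data(), which returns the data a layer actually draws. The sketch below uses a made-up data frame and compares geom_bar() with a direct call to stat_count():

```r
library(ggplot2)

df <- data.frame(item = c('foo', 'bar', 'foo', 'baz', 'foo', 'bar'))

# geom_bar() uses stat_count() by default ...
p_geom <- ggplot(df) + geom_bar(mapping = aes(x = item))
# ... so calling stat_count() directly (its default geom is bar) computes the same values.
p_stat <- ggplot(df) + stat_count(mapping = aes(x = item))

# layer_data() returns the data frame of a layer, including the
# count column that was computed by the stat.
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count

identical(counts_geom, counts_stat)
```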

Plotting multiple geoms

Multiple geoms can be displayed in one plot by adding them one after another:
ggplot(data = …) +
  geom_…(mapping = aes(x=…, y=…)) +
  geom_…(mapping = aes(x=…, y=…))
If both mapping arguments are equal, they can be factored out into the call of ggplot():
ggplot(data = … , mapping = aes(x=…,y=…)) +
  geom_…() +
  geom_…()
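A concrete sketch of the factored-out form, with a small made-up data frame and geom_line() plus geom_point() chosen as the two geoms:

```r
library(ggplot2)

df <- data.frame(
  year = c(2015, 2016, 2017, 2018, 2019),
  val  = c(   7,    5,    5,    6,    7)
)

# The mapping is given once in ggplot() and inherited by both layers.
p <- ggplot(data = df, mapping = aes(x = year, y = val)) +
  geom_line()  +
  geom_point()
```

Both layers draw the same five (year, val) pairs: once as a connected line, once as dots.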

aes()

The x= and y= arguments of aes(…) specify which variables of the dataset passed to ggplot(…) are mapped to the x and y coordinates.
The value returned by aes() can be passed either to ggplot() or to a layer. If passed to ggplot(), the aesthetics become the defaults for every layer.
The parameter names of base R graphics are replaced with more descriptive ones, for example shape for pch and size for cex.
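The following sketch combines these points, again with mtcars as an assumed example data set: x and y are set as defaults in ggplot(), one layer adds a colour mapping of its own, and shape and size are set to constant values outside of aes().

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # defaults for all layers
  geom_point(
    mapping = aes(colour = factor(cyl)),  # added for this layer only
    shape   = 17,                         # constant; base R would use pch = 17
    size    = 3                           # constant; plays the role of cex in base R
  )
```

Values inside aes() are mapped from the data (one colour per cyl value), while values outside aes() apply unchanged to every point.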

Line chart (year)

X11()

library(ggplot2)

df = data.frame (
      name   = c('abc', 'abc', 'abc', 'abc', 'abc',
                 'def', 'def', 'def', 'def', 'def',
                 'ghi', 'ghi', 'ghi', 'ghi', 'ghi',
                 'jkl', 'jkl', 'jkl', 'jkl', 'jkl',
                 'mno', 'mno', 'mno', 'mno', 'mno',
                 'pqr', 'pqr', 'pqr', 'pqr', 'pqr'),

      year   = c(2015 , 2016 , 2017 , 2018 , 2019 ,
                 2015 , 2016 , 2017 , 2018 , 2019 ,
                 2015 , 2016 , 2017 , 2018 , 2019 ,
                 2015 , 2016 , 2017 , 2018 , 2019 ,
                 2015 , 2016 , 2017 , 2018 , 2019 ,
                 2015 , 2016 , 2017 , 2018 , 2019 
                 ),

      val_1  = c(   8 ,   10,    11 ,    9 ,    8 ,  # abc
                    7 ,    5,     5 ,    6 ,    7 ,  # def
                    6 ,    8,     7 ,    4 ,    9 ,  # ghi
                    8 ,    7,     6 ,    7 ,    6 ,  # jkl
                    5 ,    4,     6 ,    8 ,    9 ,  # mno
                    4 ,    6,     5,     3 ,    2 ), # pqr

      val_2  = c(   7 ,   11,     9 ,    9 ,    8 ,  # abc
                    6 ,    7,     7 ,    4 ,    9 ,  # def
                    7 ,    5,     5 ,    9 ,   10 ,  # ghi
                    8 ,    7,     8 ,    6 ,    3 ,  # jkl
                    4 ,    5,     6 ,    7 ,    8 ,  # mno
                    5 ,    7,     6 ,    8 ,    4 )  # pqr

    )

s <- subset(df, name %in% c('def', 'ghi', 'jkl'))

ggplot(data = s, aes(x = year, y = val_1, group = name, color = name)) + geom_line()

# ggsave('img/line-chart-year.png', width=12, height=12, units='cm', dpi=72)

cat ("Press enter...")
readLines("stdin", n = 1)
Github repository about-r, path: /packages/ggplot2/line-chart-year.R

Multiple lines

The following example plots values «in parallel».
The data frame is in wide format (one column per fruit). Before it can be plotted, it needs to be brought into long format with gather(…) of the tidyr package.
library(ggplot2)
library(tidyr  )
library(dplyr  )

data <- data.frame(
   year   = c(2001, 2002, 2003, 2004, 2005, 2006),
   apple  = c(  11,    9,   10,    7,    7,    8),
   banana = c(  14,   15,   12,   12,   11,   14),
   cherry = c(   7,    7,    9,   10,    8,    9),
   junk   = c(   3,    2,    4,    6,    3,    8)
);

gathered_data <- data                             %>%
   select (year, apple, banana, cherry)           %>%  # get rid of junk
   gather (key = 'fruit', value = 'val', -year);


ggplot(
  gathered_data,
  aes(
    x = year,
    y = val
  )
) +
geom_line(
     aes(color = fruit),
     linewidth = 2   # ggplot2 >= 3.4.0; older versions use size = 2
) +
scale_color_manual(
    values = c(
       'apple' =  '#d35c23',
       'banana' = '#b2b112',
       'cherry' = '#340289')
) +
theme_minimal(
)
Github repository about-r, path: /packages/ggplot2/geom_line/multiple-lines.R
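In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following sketch shows what the same reshaping step might look like with pivot_longer(); the name long_data is made up for the example, and the unwanted column is dropped with base R indexing instead of dplyr's select().

```r
library(tidyr)

data <- data.frame(
   year   = c(2001, 2002, 2003, 2004, 2005, 2006),
   apple  = c(  11,    9,   10,    7,    7,    8),
   banana = c(  14,   15,   12,   12,   11,   14),
   cherry = c(   7,    7,    9,   10,    8,    9),
   junk   = c(   3,    2,    4,    6,    3,    8)
)

# Wide to long: every column except year becomes a (fruit, val) pair.
long_data <- pivot_longer(
  data[, c('year', 'apple', 'banana', 'cherry')],  # get rid of junk
  cols      = -year,
  names_to  = 'fruit',
  values_to = 'val'
)
```

The rows come out in a different order than with gather(), which does not matter for the plot.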

Bar chart

Count the occurrences of each item:
X11()

library(ggplot2)

df = data.frame (
       item = c('foo', 'bar', 'foo', 'bar', 'foo', 'baz', 'foo', 'bar', 'baz'),
       val  = c(   9 ,    6 ,    4 ,    7 ,    6 ,    7 ,    3 ,    8 ,    6 )
)

ggplot(data = df) + geom_bar(mapping = aes(x = item))
ggsave('img/geom_bar.png', width=12, height=12, units='cm', dpi=72)

cat ("Press enter...")
readLines("stdin", n = 1)
Github repository about-r, path: /packages/ggplot2/geom_bar.R
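The example above ignores the val column. If the bar height should instead reflect a value from the data frame, one possible approach is to aggregate first and then use geom_col(), which plots the y values as they are. The sketch sums val per item with aggregate() of base R; the name sums is made up for the example.

```r
library(ggplot2)

df <- data.frame(
  item = c('foo', 'bar', 'foo', 'bar', 'foo', 'baz', 'foo', 'bar', 'baz'),
  val  = c(   9 ,    6 ,    4 ,    7 ,    6 ,    7 ,    3 ,    8 ,    6 )
)

# Sum val per item, then draw one bar per item whose height is that sum.
sums <- aggregate(val ~ item, data = df, FUN = sum)

p <- ggplot(data = sums) + geom_col(mapping = aes(x = item, y = val))
```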

Counting combinations

The following example first creates a data frame. dplyr is then used to count the occurrences of every combination of val_one and val_two. Finally, geom_count() is used to plot the counts of these combinations.
X11()

library(ggplot2)
library(dplyr  )

df <- data.frame(
       val_one = c('foo', 'foo', 'bar', 'foo', 'bar', 'baz', 'bar', 'foo', 'baz', 'bar', 'baz', 'bar'),
       val_two = c('B'  , 'A'  , 'A'  , 'A'  , 'C'  , 'A'  , 'B'  , 'A'  , 'B'  , 'C'  , 'B'  , 'A'  )
)

df %>% count(val_one, val_two)
#
# val_one val_two     n
# <fct>   <fct>   <int>
# bar     A           2
# bar     B           1
# bar     C           2
# baz     A           1
# baz     B           2
# foo     A           3
# foo     B           1

ggplot(data = df) +
  geom_count(mapping = aes(x=val_one, y=val_two))

# ggsave('img/geom_count.png', width=12, height=12, units='cm', dpi=72)

cat ("Press enter...")
readLines("stdin", n = 1)
Github repository about-r, path: /packages/ggplot2/geom_count.R

Using count in a data frame

The following example produces the same plot as above, but uses the count column of the data frame to specify the dot size in the plot:
X11()
library(ggplot2)

df <- data.frame(
  val_one = c('bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo'),
  val_two = c('A'  , 'B'  , 'C'  , 'A'  , 'B'  , 'A'  , 'B'  ),
  count   = c( 2   ,  1   ,  2   ,  1   ,  2   ,  3   ,  1   )
)

ggplot(
     data    = df,
     mapping = aes(x = val_one, y = val_two)
   ) +
   geom_point(aes(size = count))


# ggsave('img/geom_point-size.png', width=12, height=12, units='cm', dpi=72)

cat ("Press enter...")
readLines("stdin", n = 1)
Github repository about-r, path: /packages/ggplot2/geom_point-size.R
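The count column does not have to be typed in by hand. As a sketch, it can be derived from the raw data of the previous example with count() of dplyr and then passed to geom_point(); the names raw and counts are made up for the illustration.

```r
library(ggplot2)
library(dplyr)

raw <- data.frame(
  val_one = c('foo', 'foo', 'bar', 'foo', 'bar', 'baz', 'bar', 'foo', 'baz', 'bar', 'baz', 'bar'),
  val_two = c('B'  , 'A'  , 'A'  , 'A'  , 'C'  , 'A'  , 'B'  , 'A'  , 'B'  , 'C'  , 'B'  , 'A'  )
)

# One row per combination; name = 'count' names the new column (default: n).
counts <- count(raw, val_one, val_two, name = 'count')

p <- ggplot(data = counts, mapping = aes(x = val_one, y = val_two)) +
  geom_point(aes(size = count))
```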

See also

Creating a heatmap with ggplot2
An example that plots 3 variables in a data frame using ggplot2.
The next iteration of ggplot2 seems to be ggvis: it has the pipe (%>%).
R functions for graphics
qqplot
The ggformula package provides a formula interface to ggplot2 graphics.
R packages