hist(data, col = "green") # data is a vector
Fundamentals of Data Science for NHS using R
Unlike most other graphics packages, ggplot2 has an underlying grammar, based on the Grammar of Graphics, that allows you to compose graphs by combining independent components. This makes ggplot2 powerful. Rather than being limited to sets of pre-defined graphics, you can create novel graphics that are tailored to your specific problem.
Hadley Wickham,
ggplot2: Elegant Graphics for Data Analysis
Theoretical foundation of graphical applications and packages including ggplot2.
Should you read it? Maybe in the future!
To have an intuition (and other things about ggplot2) watch
R for Data Science work-in-progress 2nd edition by Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund.
This is still the best place to start, including for data visualisation! (See chapters 2 and 10.)
R Graphics Cookbook (2e) by Winston Chang.
A lot of recipes to produce plots!
ggplot2 (3e) by Hadley Wickham & Danielle Navarro & Thomas Lin Pedersen.
This is what you should read to fully understand how ggplot2 works.
Books:
Data Visualization - A practical introduction by Kieran Healy
Data Visualization with R by Rob Kabacoff
Websites:
Install and load tidyverse and palmerpenguins in your environment!
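A minimal sketch of this setup step. The install line is commented out on the assumption that you only need to run it once per machine; `glimpse()` is just one convenient way to check that the `penguins` data frame is available.

```r
# Run once per machine; left commented out because it needs network access.
# install.packages(c("tidyverse", "palmerpenguins"))

library(tidyverse)       # attaches ggplot2, dplyr, tidyr and other packages
library(palmerpenguins)  # provides the `penguins` data frame

# Quick look at the columns used in the exercises.
glimpse(penguins)
```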
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis.
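One possible way to draw this first scatter plot, shown as a sketch rather than the official solution. ggplot2 will probably warn that two rows with missing bill measurements were removed.

```r
library(ggplot2)         # also attached by library(tidyverse)
library(palmerpenguins)

# Map the two bill measurements to the axes, then add a point layer.
p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

p
```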
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different shapes for different islands.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different shapes for different islands. Increase the size of the points to 7. We will keep this size for the rest of the session unless otherwise stated.
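A sketch of the shape and size exercises. The key distinction it tries to show: `shape` depends on the data, so it goes inside `aes()`, while the point size is a constant and is set directly in `geom_point()`.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins,
            aes(x = bill_length_mm, y = bill_depth_mm, shape = island)) +
  geom_point(size = 7)  # constant size, so it is set outside aes()

p
```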
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different islands.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different values of the continuous variable flipper_length_mm.
Draw a scatter plot using the variable flipper_length_mm for the x-axis, and the variable body_mass_g on the y-axis. Use different colours for different values of the new continuous variable obtained from the ratio of bill_length_mm to bill_depth_mm.
Repeat the previous exercise using the dplyr verb mutate, calling the new variable ratio.
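Both variants of the ratio exercise can be sketched as follows. The object names `p_inline`, `penguins_ratio` and `p_mutate` are illustrative choices, and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Variant 1: compute the ratio directly inside aes().
p_inline <- ggplot(penguins,
                   aes(x = flipper_length_mm, y = body_mass_g,
                       colour = bill_length_mm / bill_depth_mm)) +
  geom_point(size = 7)

# Variant 2: create the variable first with mutate(), then map it.
penguins_ratio <- penguins |>
  mutate(ratio = bill_length_mm / bill_depth_mm)

p_mutate <- ggplot(penguins_ratio,
                   aes(x = flipper_length_mm, y = body_mass_g,
                       colour = ratio)) +
  geom_point(size = 7)

p_mutate
```

The second variant gives a cleaner legend title, because the legend is named after the mapped variable.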
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different species.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Highlight the penguins that belong to the “Chinstrap” species.
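One way to highlight a single species, among several, is to map colour to a logical condition. This is a sketch: the colours "red" and "grey70" and the legend title are arbitrary choices, not part of the exercise.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins,
            aes(x = bill_length_mm, y = bill_depth_mm,
                colour = species == "Chinstrap")) +
  geom_point(size = 7) +
  # Assumed colours: red for Chinstrap, grey for every other species.
  scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "grey70"),
                      name = "Chinstrap")

p
```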
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different species, and different shapes for different islands.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different species. Change opacity to 0.4.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use a different colour for each sex. Change opacity to 0.4.
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use different colours for different species. Change opacity to 0.4. Use a separate facet (panel) for each species.
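A sketch combining opacity and faceting. Opacity is a constant, so `alpha` is set outside `aes()`; `facet_wrap()` then splits the plot into one panel per species.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins,
            aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point(size = 7, alpha = 0.4) +  # semi-transparent points
  facet_wrap(~ species)                # one panel per species

p
```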
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use a different colour for each sex. Change opacity to 0.4. Use a separate facet (panel) for each species.
Find the mean of bill_length_mm and bill_depth_mm for every group identified by species and sex.
# A tibble: 6 × 4
# Groups: species [3]
species sex bill_length_mm bill_depth_mm
<fct> <fct> <dbl> <dbl>
1 Adelie female 37.3 17.6
2 Adelie male 40.4 19.1
3 Chinstrap female 46.6 17.6
4 Chinstrap male 51.1 19.3
5 Gentoo female 45.6 14.2
6 Gentoo male 49.5 15.7
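A summary like the one above could be produced along these lines. Dropping the rows with missing `sex` is an assumption, made because the printed table has no NA group; the name `bill_means` is illustrative.

```r
library(dplyr)
library(palmerpenguins)

bill_means <- penguins |>
  filter(!is.na(sex)) |>          # assumed: penguins of unknown sex are excluded
  group_by(species, sex) |>
  summarise(
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    bill_depth_mm  = mean(bill_depth_mm, na.rm = TRUE)
  )

bill_means
```

By default `summarise()` drops only the last grouping level, which is why the result is still grouped by species.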
Draw a scatter plot using the variable bill_length_mm for the x-axis, and the variable bill_depth_mm on the y-axis. Use a different colour for each sex. Change opacity to 0.4. Use a separate facet (panel) for each combination of species and sex.
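Faceting on two variables can be sketched with `facet_grid()`, with rows for sex and columns for species. Filtering out missing `sex` values first is an assumption; without it, an extra row of panels for NA would appear.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

p <- penguins |>
  filter(!is.na(sex)) |>   # assumed: skip penguins of unknown sex
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, colour = sex)) +
  geom_point(size = 7, alpha = 0.4) +
  facet_grid(sex ~ species)  # rows = sex, columns = species

p
```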
Draw a histogram with the distribution of the continuous variable flipper_length_mm.
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species.
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.4.
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.4. Use the option position = "stack". What changed compared to the previous exercise?
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.4. Use the option position = "identity". What changed compared to the previous exercise?
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.4. Use the option position = "identity". This time, also use a different colour for the borders of the histogram.
Draw a histogram with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.4. Use the option position = "dodge". What changed compared to the previous exercise?
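The histogram variants differ only in the `position` argument, so they can be compared with a small helper. This is a sketch: `hist_by_position()` is a hypothetical function written for illustration, `bins = 30` simply makes the ggplot2 default explicit, and mapping the border colour to species is one reading of the border exercise (a constant such as `colour = "black"` would be another).

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: same histogram, different position adjustment.
hist_by_position <- function(position) {
  ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
    geom_histogram(alpha = 0.4, position = position, bins = 30)
}

p_stack    <- hist_by_position("stack")     # bars piled on top of each other
p_identity <- hist_by_position("identity")  # bars overlap, all start from zero
p_dodge    <- hist_by_position("dodge")     # bars placed side by side

# Border variant: colour controls the outline, fill controls the inside.
p_border <- ggplot(penguins,
                   aes(x = flipper_length_mm, fill = species, colour = species)) +
  geom_histogram(alpha = 0.4, position = "identity", bins = 30)

p_identity
```

Since "stack" is the default position for `geom_histogram()`, setting it explicitly should give the same picture as leaving it out.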
Draw a density plot with the distribution of the continuous variable flipper_length_mm.
Draw a density plot with the distribution of the continuous variable flipper_length_mm. Use different colours for different species.
Draw a density plot with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.6.
Draw a density plot with the distribution of the continuous variable flipper_length_mm. Use different colours for different species. Change opacity to 0.6. Also change the colour and the size of the border of the density plot.
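A sketch of the final density exercise. The border colour "navy" and the width 1.5 are arbitrary choices. It assumes ggplot2 3.4.0 or later, where line thickness is set with `linewidth`; in older versions the same argument was called `size`.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  geom_density(alpha = 0.6,       # opacity of the filled area
               colour = "navy",   # assumed border colour
               linewidth = 1.5)   # assumed border thickness

p
```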
Reproduce the following plot. Observe the values on the x-axis.
Repeat the previous exercise using the function seq.
Repeat the previous exercise, also changing the values on the y-axis.
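The target plot is not reproduced here, so the following is only a sketch of the technique: axis tick values are controlled through the `breaks` argument of `scale_x_continuous()` and `scale_y_continuous()`. The choice of variables and the break values are assumptions made for illustration.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(size = 7, alpha = 0.4) +
  # seq() avoids typing c(30, 35, 40, ...) by hand; values are hypothetical.
  scale_x_continuous(breaks = seq(30, 60, by = 5)) +
  scale_y_continuous(breaks = seq(12, 22, by = 2))

p
```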
Reproduce the following plot.
Change the ratio between the x-axis and the y-axis. Keep the ratio fixed even if you resize the picture.
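One way to fix the aspect ratio is `coord_fixed()`, sketched below. The variables and the value `ratio = 2` are assumptions; `ratio` is the length of one unit on the y-axis relative to one unit on the x-axis.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(size = 7, alpha = 0.4) +
  # One unit on the y-axis is drawn twice as long as one unit on the x-axis,
  # and this stays true when the plot window is resized.
  coord_fixed(ratio = 2)

p
```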
# A tibble: 44 × 3
group x y
<chr> <dbl> <dbl>
1 1 10 8.04
2 1 8 6.95
3 1 13 7.58
4 1 9 8.81
5 1 11 8.33
6 1 14 9.96
7 1 6 7.24
8 1 4 4.26
9 1 12 10.8
10 1 7 4.82
11 1 5 5.68
12 2 10 9.14
13 2 8 8.14
14 2 13 8.74
15 2 9 8.77
16 2 11 9.26
17 2 14 8.1
18 2 6 6.13
19 2 4 3.1
20 2 12 9.13
21 2 7 7.26
22 2 5 4.74
23 3 10 7.46
24 3 8 6.77
25 3 13 12.7
26 3 9 7.11
27 3 11 7.81
28 3 14 8.84
29 3 6 6.08
30 3 4 5.39
31 3 12 8.15
32 3 7 6.42
33 3 5 5.73
34 4 8 6.58
35 4 8 5.76
36 4 8 7.71
37 4 8 8.84
38 4 8 8.47
39 4 8 7.04
40 4 8 5.25
41 4 19 12.5
42 4 8 5.56
43 4 8 7.91
44 4 8 6.89
Obtain the following numerical summaries
# A tibble: 4 × 6
group mean_x var_x mean_y var_y cor_xy
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 9 11 7.50 4.13 0.816
2 2 9 11 7.50 4.13 0.816
3 3 9 11 7.5 4.12 0.816
4 4 9 11 7.50 4.12 0.817
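The values in the table match the `anscombe` data set that ships with R, so the summaries could be obtained along these lines. Treat this as a sketch: the reshaping into long format and the names `anscombe_long` and `anscombe_summary` are assumptions about how the data were prepared.

```r
library(dplyr)

# Stack the four x/y column pairs of the built-in `anscombe` data frame.
anscombe_long <- data.frame(
  group = rep(as.character(1:4), each = nrow(anscombe)),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

anscombe_summary <- anscombe_long |>
  group_by(group) |>
  summarise(
    mean_x = mean(x),
    var_x  = var(x),
    mean_y = mean(y),
    var_y  = var(y),
    cor_xy = cor(x, y)
  )

anscombe_summary
```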
Plot the data using an appropriate technique we have seen today.
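Faceting is one technique from this session that fits here: a scatter plot with one panel per group shows how different the four data sets are despite their near-identical summaries. A sketch, again assuming the data come from R's built-in `anscombe` data frame:

```r
library(ggplot2)

# Same long-format reshaping of `anscombe` as assumed for the summaries.
anscombe_long <- data.frame(
  group = rep(as.character(1:4), each = nrow(anscombe)),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

p <- ggplot(anscombe_long, aes(x = x, y = y)) +
  geom_point(size = 7, alpha = 0.4) +
  facet_wrap(~ group)  # one panel per group

p
```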
More Data Visualisation!