NHS_intro_DS

Fundamentals of Data Science for NHS using R

Slides

| Day | Slides |
|-----|--------|
| 1 | Introduction to Tidyverse |
| 2 | Data Manipulation |
| 3 | Categorical Variables |
| 4 | Relational Data |
| 5 | Data Visualisation I |
| 6 | Data Visualisation II |
| 7 | Exploratory Data Analysis I |
| 8 | Exploratory Data Analysis II |

Summary of the course

This course introduces the basic ideas of data science and implements them in the R programming language. We will use the Tidyverse, a collection of R packages that facilitate data import, manipulation, encoding, exploration and visualisation.

Learning outcomes

Detailed Programme

Day 1: Introduction to Tidyverse

Introduction to R and RStudio. Workflow. Tidy data. The Tidyverse ecosystem. Data import. Tibbles. dplyr basics. Pipes.

Day 2: Data Manipulation

dplyr verbs. Numerical summaries. SQL and dplyr.
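
As a taste of the Day 2 material, the sketch below chains a few dplyr verbs into a grouped numerical summary. The `admissions` table, its column names and its values are invented for illustration and are not course data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: wards and waiting times are made up.
admissions <- tibble(
  ward = c("A", "A", "B", "B", "B"),
  wait_minutes = c(30, 50, 20, 40, 60)
)

wait_summary <- admissions %>%
  filter(wait_minutes >= 30) %>%          # keep rows with a wait of 30+ minutes
  group_by(ward) %>%                      # one group per ward
  summarise(
    n = n(),                              # rows per ward after filtering
    mean_wait = mean(wait_minutes),       # average wait per ward
    .groups = "drop"
  ) %>%
  arrange(desc(mean_wait))                # longest average wait first

print(wait_summary)
```

Readers who know SQL may recognise the shape: `filter()` plays the role of `WHERE`, `group_by()` plus `summarise()` that of `GROUP BY` with aggregates, and `arrange()` that of `ORDER BY`.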

Day 3: Categorical Variables

Factors. The forcats package. Modifying factor order. Modifying factor levels.
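
The following is a minimal sketch of the two kinds of factor modification named above, using forcats. The `triage` vector and its categories are hypothetical, chosen only to make the effect visible.

```r
library(forcats)

# Hypothetical triage categories, invented for illustration.
triage <- factor(c("low", "high", "medium", "low", "low", "high"))

# By default, levels are sorted alphabetically.
default_levels <- levels(triage)

# Modify the order by hand: list the levels in the order wanted.
triage_ordered <- fct_relevel(triage, "low", "medium", "high")

# Modify the order by frequency: most common level first.
triage_by_freq <- fct_infreq(triage)

# Modify the levels themselves: rename "high" to "urgent" (new = "old").
triage_renamed <- fct_recode(triage, urgent = "high")

print(levels(triage_ordered))
print(levels(triage_by_freq))
print(levels(triage_renamed))
```

Reordering changes only how the levels are arranged (for example, in plot axes and tables), whereas recoding changes the labels attached to the values.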

Day 4: Relational Data

Mutating joins. Filtering joins. Set operations.
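
To illustrate the difference between the three families of operations, here is a small sketch with dplyr. The `patients` and `visits` tables, including every name and identifier in them, are made up for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical tables linked by the key column patient_id.
patients <- tibble(patient_id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
visits <- tibble(
  patient_id = c(1, 1, 3, 4),
  clinic = c("ENT", "Eye", "ENT", "Eye")
)

# Mutating join: adds columns from visits to patients.
# Patient 1 appears twice (two visits); patient 2 gets NA for clinic.
joined <- left_join(patients, visits, by = "patient_id")

# Filtering joins: keep or drop rows of patients, adding no columns.
with_visits <- semi_join(patients, visits, by = "patient_id")
without_visits <- anti_join(patients, visits, by = "patient_id")

# Set operations: treat whole rows as set elements (same columns required).
patient_ids <- select(patients, patient_id)
visit_ids <- select(visits, patient_id)
ids_in_both <- intersect(patient_ids, visit_ids)
ids_only_in_visits <- setdiff(visit_ids, patient_ids)

print(joined)
```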

Day 5: Data Visualisation I

Introduction to ggplot2. Creating a ggplot. Aesthetic mappings. Geometric objects.
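
A first ggplot can be sketched in a few lines. This example assumes the `mpg` dataset that ships with ggplot2; the course itself may use different data.

```r
library(ggplot2)

# Start from the data, map variables to aesthetics, then add a geometric object.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()   # one point per car: engine size vs highway fuel economy

# In an interactive session, print(p) (or just p) draws the plot.
```

The call to `aes()` declares the aesthetic mappings (which variable drives position and colour), while `geom_point()` chooses the geometric object used to draw each observation.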

Day 6: Data Visualisation II

More geometric objects. Themes.

Day 7: Exploratory Data Analysis I

Visualising distributions. Typical vs unusual values. Missing values.

Day 8: Exploratory Data Analysis II

Covariation. A categorical and a continuous variable. Two categorical variables. Two continuous variables.