R Workshops for Data Science

University of KwaZulu-Natal

Author

Aubrey Mpungose

An Invitation to R for Data Science

These workshops aim to introduce participants to the foundations of data science using the R programming language. As you advance in your studies and career, you will learn that the ability to collect, clean, transform and analyse data, and to use it for predictive analytics, is one of the most sought-after skills in the labour market. In the digital age, where big data has become a commodity, learning to handle data will be one of the most important investments in your career.

In this course, we will use R for programming and data analysis. R, like Python, is flexible when working with data, especially large datasets. The majority of academics, researchers and students who analyse quantitative data use programs such as SPSS, Stata and Excel. However, these programs are expensive! Universities spend large sums of money on licences, which disadvantages students and academics in the developing world. These programs are also limited in handling various data formats, such as big data, text data and geospatial data. But if you insist on learning them, good for you.

On the other hand, R and Python are free and come with ABSOLUTELY NO WARRANTY 😎. They are very flexible and can handle very large datasets. They are among the most widely used programming languages for data work in the labour market around the world. R and Python have thousands of libraries that can handle and analyse almost any type of data, covering basic data cleaning and wrangling, data transformation, regression, visualisation, text analysis and natural language processing, statistical analysis, machine learning, and geospatial analysis and visualisation. In academia, researchers are encouraged to make their research outputs reproducible; that is, to share their code, data and analysis when submitting papers to journals. This practice is called reproducible science.

Both R and Python are awesome programming languages. Here, we will use R because I find many of its libraries more user-friendly than their Python counterparts. However, I will also share Python code that corresponds to the R code. By the end of this course, you will have the option to move on to more advanced data science topics.

Learning Objectives

  • Understand R and its functions

  • Conduct basic programming using R

  • Learn to wrangle, clean and transform data

  • Learn the basics of data visualisation using ggplot2

  • Learn how to conduct exploratory data analysis

  • Learn to communicate and tell stories using data

Materials

There are tons of materials available online; below I share compulsory and recommended materials we will be using. There will be additional materials and slides for each section:

Schedule

Week     Topic                                     Presenter
Week 1   Introduction: Basics of R functions       Aubrey
Week 1   Data Structures and Types                 Aubrey
Week 2   Data Visualisation                        Aubrey
Week 3   Data Manipulation                         Aubrey
Week 4   Data Manipulation Part 2                  Aubrey
Week 5   Importing Data, Reproducible Workflows    Aubrey
Week 6   Communicating and Reporting Data          Aubrey

Let’s Get Started