Introduction to Basic Statistical Concepts

Time to complete: 40 minutes
What will this topic cover?
This topic forms part of a wider learning pathway and is designed to help you explore fundamental digital skills and think about how you can use them to enhance your daily working practices and approaches. This learning topic, within the Advanced Data pathway, introduces you to the concept of advanced data literacy and approaches to using data within the University.
Please note that the topics within the Advanced Data pathway involve more in-depth analysis, mathematical approaches and discussion. These pathways are designed so that you can jump to the appropriate steps when needed, but they may take more time to complete.
This topic will focus on explaining basic statistical concepts and terminology that are often used within data reporting. The aim of this topic is to help you understand what this terminology means in context as well as how to apply this within your own work.
By the end of this topic, you will be able to:
- Understand core data terminology
- Identify and understand key opportunities to use data within your role
- Discuss data terminology within your own context
How to use this topic page
This topic page is split into sections. Each section has a step and an activity to complete. These include scenarios and links to instructions so that you can try things for yourself. Each topic also has a reflective section to help you think about how you will use this within your own practice.
Step 1: Statistical Terminology & Approaches
There are several key terms that can help you when understanding or discussing data analysis. Understanding them will also help you see how they can be applied within your role. You may already be aware of some of these terms, although they are sometimes referred to by slightly different names.
Mean (Average)
The mean is just the average of a group of numbers. You add up all the numbers and then divide by how many numbers there are. It’s a way to find the central value.
Example: An academic would like to calculate the average grade of students in a university course to understand overall performance. If the grades are 70, 75, 80, 85, and 90, the mean is calculated as follows:
- Sum of grades: 70 + 75 + 80 + 85 + 90 = 400
- Number of grades: 5
- Mean = sum of grades / number of grades = 400 / 5 = 80
How to achieve in Excel: Use the =AVERAGE(range) function to calculate the mean. For more details, see the Excel guide on calculating the mean.
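For readers who prefer a scripting language to a spreadsheet, the same calculation can be sketched in Python. This is a minimal illustration using the grades from the example above; the variable names are our own, and Python is an assumption here rather than a tool this topic requires.

```python
from statistics import mean

# Grades taken from the worked example above.
grades = [70, 75, 80, 85, 90]

# Step by step: add the grades up, then divide by how many there are.
total = sum(grades)
count = len(grades)
mean_by_hand = total / count

# The standard library gives the same result in one call.
mean_grade = mean(grades)

print(total, count, mean_by_hand, mean_grade)
```

Both approaches give a mean of 80, matching the worked example.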
Median
The median is the middle number in a list of numbers when you arrange them from smallest to largest. If there is an odd number of values, it’s the one right in the center. If there is an even number of values, it’s the average of the two middle numbers.
Example: Determine the median research grant amount awarded to faculty members to understand the typical funding level. If the grant amounts are £10,000, £15,000, £20,000, £25,000, and £30,000:
- Median: £20,000 (middle value)
How to achieve in Excel: Use the =MEDIAN(range) function. For more details, see the Excel guide on calculating the median.
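The odd and even cases described above can be sketched in Python as follows. The first list uses the grant amounts from the example; the sixth grant of £35,000 is a hypothetical value added only to show what happens with an even number of values.

```python
from statistics import median

# Grant amounts (in pounds) from the example above: five values, so an odd count.
grants = [10_000, 15_000, 20_000, 25_000, 30_000]
median_grant = median(grants)  # the middle (third) value

# Hypothetical sixth grant, added purely to illustrate the even-count case.
grants_even = grants + [35_000]
# With six values, the median is the average of the third and fourth values.
median_even = median(grants_even)

print(median_grant, median_even)
```

With five grants the median is £20,000. With the hypothetical sixth grant it becomes £22,500, the average of £20,000 and £25,000.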
Mode
The mode is the number that appears the most often in a group of numbers. It’s the value you see the most frequently.
Example: An administrator wants to identify the most common student enrollment status in a university course (e.g., full-time or part-time).
- Enrollment statuses: Full-time, Part-time, Full-time, Full-time, Part-time
- Mode: Full-time (most frequent status)
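How to achieve in Excel: Use the =MODE.SNGL(range) function for numeric data. Note that this function ignores text values, so for categories such as enrollment status you would instead count each category, for example with =COUNTIF(range, "Full-time"), and compare the counts. For more details, see the Excel guide on calculating the mode.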
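As a rough sketch, the mode of a list of categories can also be found in Python, where the same approach works for text as well as numbers. The list below is taken from the example above; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

# Enrollment statuses from the example above.
statuses = ["Full-time", "Part-time", "Full-time", "Full-time", "Part-time"]

# mode() returns the most frequent value and accepts text categories.
most_common_status = mode(statuses)

# Counter shows how often each status appears, which makes the result easy to check.
status_counts = Counter(statuses)

print(most_common_status, dict(status_counts))
```

Full-time appears three times and Part-time twice, so the mode is Full-time.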
Activity
Try it yourself
You have been given the satisfaction scores for a class of students. Your task is to analyse this data to understand trends and variability in student satisfaction scores, on a scale from 1 (low satisfaction) to 100 (high satisfaction).
Here are the satisfaction scores for the students in the class:
Student Name | Satisfaction Score |
---|---|
Alice | 85 |
Bob | 90 |
Charlie | 78 |
David | 88 |
Eve | 78 |
Using the scores provided:
- Calculate the mean (average) score.
- Determine the mode of the satisfaction scores.
- Determine the median of the satisfaction scores.
Calculating the average score
The average score (mean) is found by adding all the scores together and then dividing by the number of students.
Steps:
- Add the scores: 85 + 90 + 78 + 88 + 78 = 419
- Divide by the number of students: 419 / 5 = 83.8
So, the average score is 83.8.
Finding the most common score
The mode is the score that appears most frequently.
Steps:
- List the scores: 85, 90, 78, 88, 78
- Identify the score that appears most often: 78 appears twice, while the others appear only once.
So, the most common score is 78.
Identifying the median score
The median is the middle value when the scores are arranged in ascending order.
Steps:
- Arrange the scores in ascending order: 78, 78, 85, 88, 90
- Find the middle score: The third score in this list is 85.
So, the middle score is 85.
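If you would like to check your answers to this activity outside Excel, the three measures can be computed together. The sketch below uses the satisfaction scores from the table above and assumes Python's built-in statistics module; the variable names are our own.

```python
from statistics import mean, median, mode

# Satisfaction scores from the activity table.
scores = {"Alice": 85, "Bob": 90, "Charlie": 78, "David": 88, "Eve": 78}
values = list(scores.values())

mean_score = mean(values)      # total of 419 divided by 5 students
median_score = median(values)  # middle value once the scores are sorted
mode_score = mode(values)      # the score that appears most often

print(mean_score, median_score, mode_score)
```

The results should agree with the worked answers: a mean of 83.8, a median of 85 and a mode of 78.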
Can you think of anywhere within your role where you would be able to apply these approaches?
Step 2: Standard Deviation & Correlation
Standard Deviation
Think of standard deviation as a way to see how spread out the numbers are in a group. If the standard deviation is small, most numbers are close to the average. If it’s large, the numbers are more spread out. It tells you how much the numbers differ from the average, but not the exact numbers themselves.
Example: If you have a set of scores with a mean of 80 and a standard deviation of 5, the scores typically differ from the mean by about 5 points. For data that follows a roughly bell-shaped pattern, around two-thirds of the scores fall within one standard deviation of the mean (i.e., between 75 and 85). If the standard deviation were 10, the scores would be more spread out, with the same proportion falling between 70 and 90.
A lower standard deviation indicates that the scores are close to the mean, suggesting consistency in student performance.
A higher standard deviation indicates more variability in the scores, suggesting differing levels of understanding among students.
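How to achieve in Excel: Use the =STDEV.P(range) function when your data covers the whole group you are interested in (for example, every student in a class). If your data is only a sample from a larger group, use =STDEV.S(range) instead. For more details, see the Excel guide on calculating the standard deviation.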
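To show what the standard deviation calculation does behind the scenes, here is a minimal Python sketch. It reuses the five grades from the mean example in Step 1 and treats them as the whole group (the population version, matching =STDEV.P). The sample version is included for comparison; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

# Grades reused from the mean example in Step 1.
grades = [70, 75, 80, 85, 90]

# 1. Find the mean.
mean_grade = sum(grades) / len(grades)

# 2. Square each grade's distance from the mean.
squared_deviations = [(g - mean_grade) ** 2 for g in grades]

# 3. Average the squared distances, then take the square root.
population_sd = sqrt(sum(squared_deviations) / len(grades))

# Library equivalents: pstdev divides by n, stdev divides by n - 1.
library_population_sd = pstdev(grades)
sample_sd = stdev(grades)

print(round(population_sd, 2), round(sample_sd, 2))
```

For these grades the population standard deviation is about 7.07 and the sample version is about 7.91. The sample version is always slightly larger because it divides by one fewer than the number of values.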
Correlation
Correlation tells us how two things are related. It shows if they move together in a certain way. The Correlation Coefficient is a number that describes how strong and in what direction this relationship is. If the number is close to 1 or -1, the relationship is strong. If it’s close to 0, the relationship is weak. A positive number indicates a positive correlation, and a negative number indicates a negative correlation. This helps us identify and understand relationships between two strands of data.
Example: A member of library staff would like to analyse the correlation between the number of hours spent in the library and final exam scores, to understand whether more study hours are associated with higher scores.
A positive correlation (closer to a score of 1) means both variables increase together, indicating that more library hours are associated with higher exam scores.
A negative correlation (closer to a score of -1) means one variable increases while the other decreases, indicating that more library hours are associated with lower exam scores.
A correlation close to 0 means there is little or no relationship between library hours and exam scores.
An example of how this looks visually, and a more detailed overview of what the scores mean, can be found on the Maths and Stats Help (MASH) website.
How to achieve in Excel: Use the =CORREL(array1, array2) function, where array1 is the first set of numbers and array2 is the second set. For more details, see the Excel guide on calculating correlation.
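The correlation coefficient described above (Pearson's correlation, which is what =CORREL calculates) can be sketched in a few lines of Python. The function name pearson_r and the three small data sets are our own illustrations, chosen so that the results are easy to verify by eye; they are not real library or exam data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two lists move together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list varies on its own (must be non-zero).
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

# Illustrative toy data only.
rises_together = pearson_r([1, 2, 3], [2, 4, 6])      # both increase together
moves_opposite = pearson_r([1, 2, 3], [6, 4, 2])      # one rises as the other falls
no_pattern = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # no straight-line pattern

print(rises_together, moves_opposite, no_pattern)
```

The three toy cases give 1, -1 and 0 respectively, matching the strong positive, strong negative and no-relationship descriptions above. Real data will usually fall somewhere in between.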
Activity
Try it yourself
You have the test scores for a class of students and their study hours. Your task is to analyse this data to understand the relationship between study hours and test scores, and to measure the variability in test scores.
Here is the data for the students:
Student Name | Test Score | Study Hours |
---|---|---|
Alice | 85 | 10 |
Bob | 90 | 12 |
Charlie | 78 | 8 |
David | 88 | 11 |
Eve | 78 | 7 |
Using this data:
- Calculate the correlation between test scores and study hours.
- Calculate the standard deviation of the test scores.
- The correlation should be approximately 0.98. This is close to 1, which means there is a strong positive relationship between the two sets of data: students who studied for more hours tended to achieve higher test scores.
- The standard deviation of the test scores is approximately 5.0 (4.996 using the population formula, =STDEV.P). This means the scores typically differ from the average score (83.8) by about 5 points. In this small class, 85 and 88 fall within 5 points of the average, while 78, 78 and 90 fall just outside it.
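The two answers above can be checked with a short Python sketch. It uses the test scores and study hours from the activity table and the population standard deviation (the equivalent of =STDEV.P); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

# Data from the activity table, in the order Alice, Bob, Charlie, David, Eve.
test_scores = [85, 90, 78, 88, 78]
study_hours = [10, 12, 8, 11, 7]

# Population standard deviation of the test scores.
score_sd = pstdev(test_scores)

# Pearson correlation between test scores and study hours.
mean_score = mean(test_scores)
mean_hours = mean(study_hours)
together = sum((s - mean_score) * (h - mean_hours)
               for s, h in zip(test_scores, study_hours))
spread_scores = sum((s - mean_score) ** 2 for s in test_scores)
spread_hours = sum((h - mean_hours) ** 2 for h in study_hours)
correlation = together / sqrt(spread_scores * spread_hours)

# Which scores lie within one standard deviation of the mean?
within_one_sd = [s for s in test_scores if abs(s - mean_score) <= score_sd]

print(round(correlation, 2), round(score_sd, 2), within_one_sd)
```

This gives a correlation of about 0.98 and a standard deviation of about 5.0. Only 85 and 88 lie within one standard deviation of the mean, which is a reminder that with very small data sets the "most scores fall within one standard deviation" rule of thumb is only approximate.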
Can you think of any areas within your role where you would be able to apply these approaches?
Step 3: Variables
In higher education research, understanding the distinction between independent, dependent, and confounding variables is crucial for accurate data analysis. A variable is something that is measured when looking at data, for example student grades, age, or mode of study (part-time or full-time). There are three main types of variables to consider.
An independent variable is something you change or control to observe its effect on another factor. For example, if you introduce a new teaching method (independent variable), you might want to see how it affects student grades (dependent variable).
The dependent variable is what you measure to see the impact of the independent variable. In this case, it would be the students’ grades after the new teaching method is applied.
Confounding variables are other factors that can affect both the independent and dependent variables, potentially skewing the results. For instance, students’ prior knowledge or motivation levels could influence both their participation in the new teaching method and their grades.
By identifying and controlling for confounding variables, researchers can ensure their findings more accurately reflect the true relationship between the independent and dependent variables in an educational study.
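One simple way to control for a confounding variable is to compare like with like. The Python sketch below uses entirely hypothetical records, invented for illustration only: it assumes the new teaching method was mostly tried with students who had low prior knowledge, and compares grades overall and then within each prior-knowledge group.

```python
from statistics import mean

# HYPOTHETICAL records, invented for illustration:
# (used_new_method, prior_knowledge, grade)
records = [
    (True, "low", 58), (True, "low", 60), (True, "low", 62), (True, "high", 76),
    (False, "low", 54), (False, "high", 70), (False, "high", 72), (False, "high", 74),
]

def mean_grade(used_new_method, prior=None):
    """Mean grade for one teaching method, optionally within one prior-knowledge group."""
    grades = [g for method, p, g in records
              if method == used_new_method and (prior is None or p == prior)]
    return mean(grades)

# Naive comparison: ignores prior knowledge entirely.
naive_difference = mean_grade(True) - mean_grade(False)

# Controlled comparison: compare the methods within each prior-knowledge group.
difference_by_group = {
    prior: mean_grade(True, prior) - mean_grade(False, prior)
    for prior in ("low", "high")
}

print(naive_difference, difference_by_group)
```

In this made-up data the naive comparison suggests the new method lowers grades by 3.5 points, yet within each prior-knowledge group the new method is associated with higher grades (6 points for the low group, 4 for the high group). The difference arises because prior knowledge influences both who received the new method and the grades achieved, which is exactly what makes it a confounding variable.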
Activity
Case study
Below we have an example of a case study. Read through it and see if you can identify the three different types of variables: independent, dependent and confounding.
Examining the Impact of Peer Tutoring on Student Performance in Higher Education
A university implemented a peer tutoring programme to enhance student learning and academic performance. The programme paired struggling students with high-achieving peers who assisted them with coursework and exam preparation. The academic aim was to evaluate if the peer tutoring programme led to improved student grades while considering other influencing factors like prior academic performance and attendance rates. However, there may be many confounding variables which will need to be considered.
- Can you identify the independent and dependent variables from the case study?
- Can you think of any confounding variables that may be involved?
The independent variable is the peer tutoring programme, since this is the change that has been implemented.
The dependent variable is the student grades as this is what the academic wanted to see a change in.
The academic needed to be aware of confounding variables that could have an impact on the results, such as prior academic performance and attendance rates. There may also be other confounding variables to consider.
Can you think of any more?
Step 4: Reflection
What have I discovered from this learning topic?
This step is designed to help you think about what you have learned and how this applies to your own practice and context. This step's activity will ask you some questions to help you with this reflection.
Activity
Reflect
Use the following questions to help you think about your own practice.
- Can you think of any data that you use regularly to which this learning can be applied?
- Is there any data that you collect for which you need to consider the three different types of variables?
Other Advanced Data Learning Pathways
Consideration for Data Analysis
This topic will focus on the importance of clear questions and suggestable outpu
Common Data Analysis Approaches
This topic will focus on the four key approaches to data analysis, although ther
Storytelling & Reporting Data
This particular pathway is more in-depth and can take longer to complete as it w