Free course

Foundations: Maths Without Fear

The maths you need to start data science - numbers, algebra, and statistics - explained clearly. No jargon, no rush. Do these lessons before you move on to the serious stuff.

Lesson 1 - Numbers, Variables and Functions: The Language of Data

What This Lesson Is About

Before any machine learning model, any algorithm, any dataset, there are numbers. And before numbers make sense, you need to understand how mathematicians talk about them. This lesson is not about memorising formulas. It is about learning to read and speak the language that underpins everything in data science. Think of it as learning the alphabet before writing sentences.

1.1 - Types of Numbers

In data science, not all numbers behave the same way, and knowing the difference matters when you clean data, build models, or interpret results.

Natural numbers are the counting numbers: 0, 1, 2, 3, 4… (some textbooks start at 1 rather than 0). When you count rows in a dataset, you are using natural numbers.
Integers extend natural numbers into the negatives: …, -3, -2, -1, 0, 1, 2, 3… Whole-number temperature differences, profit and loss figures, and elevations are integers, because they can fall below zero.
Real numbers include everything in between: 3.14, -0.001, 2.71828… Most measurements in data science (salaries, probabilities, model weights) are real numbers.
Rational vs irrational numbers: a rational number can be written as a fraction of two integers (1/2, 3/4). An irrational number cannot (pi, the square root of 2). In practice, computers store numbers with finite precision, so irrational numbers (and even some simple fractions, such as 1/10) are only approximated, which is why you sometimes see rounding errors in code.
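
To make the number types concrete, here is a minimal Python sketch. The variable names and values (row_count, temperature_change, salary) are made up for illustration and are not part of the lesson.

```python
import math
from fractions import Fraction

row_count = 150            # natural number: a count (hypothetical value)
temperature_change = -3    # integer: can be negative
salary = 41250.75          # real number, stored as a float

# A rational number can be stored exactly as a fraction of two integers.
exact_sum = Fraction(1, 2) + Fraction(3, 4)
print(exact_sum)                      # 5/4

# Floats have finite precision, so math.pi is only an approximation of pi,
# and even 0.1 cannot be stored exactly in binary.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)               # False
print(math.isclose(float_sum, 0.3))   # True
```

This is why comparisons between floats are usually done with a tolerance (for example math.isclose) rather than with ==.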

1.2 - Variables: Giving Names to the Unknown

A variable is simply a placeholder for a value. In algebra, you write x or y. In data science, you write feature, weight, or loss. The concept is identical.

When you see: y = 2x + 3, it means "the output y depends on the input x in this specific way." If x = 5, then y = 13. If x = 10, then y = 23.

In data science, you will constantly write things like: predicted_salary = 500 × years_experience + 20000. This is exactly the same structure, just with meaningful names.
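
The two formulas above translate almost word for word into Python. This is only a sketch: the function names line and predicted_salary are chosen here for illustration, and the 4 years of experience is an arbitrary input.

```python
def line(x):
    """The rule y = 2x + 3 from the text."""
    return 2 * x + 3


def predicted_salary(years_experience):
    """The rule predicted_salary = 500 * years_experience + 20000."""
    return 500 * years_experience + 20000


print(line(5))              # 13
print(line(10))             # 23
print(predicted_salary(4))  # 22000
```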

1.3 - Functions: Rules That Transform Inputs Into Outputs

A function is a rule. You give it an input, it gives you exactly one output. Nothing more, nothing less.

Written formally: f(x) = x² means "square whatever you put in." So f(3) = 9, f(-2) = 4, f(0) = 0.

Why functions matter in data science: every machine learning model is a function. You feed it data (input), it returns a prediction (output). Understanding functions means understanding models at their core.

Key function properties to know:

A function is increasing when a bigger input gives a bigger output.
A function is decreasing when a bigger input gives a smaller output.
A function has a minimum or maximum where it stops going down or up and turns around. Finding such a turning point, usually the minimum of a function that measures the model's error, is essentially what model training does.
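
As a rough illustration of these properties, the sketch below defines the squaring function from the text and a second, hypothetical bowl-shaped function h (an assumption made here as a stand-in for a model's error). It then looks for the turning point of h by simply trying a grid of inputs. Real training uses cleverer search methods; this only shows the idea.

```python
def f(x):
    """The squaring function from the text: f(x) = x²."""
    return x ** 2


def h(x):
    """Hypothetical bowl-shaped function, standing in for a model's error."""
    return (x - 2) ** 2 + 1


print(f(3), f(-2), f(0))  # 9 4 0

# Evaluate h on a grid of inputs from -1 to 5 in steps of 0.5.
xs = [i / 2 for i in range(-2, 11)]
ys = [h(x) for x in xs]

# The turning point is where the output is smallest.
turn = ys.index(min(ys))

# Left of the turning point the outputs fall; right of it they rise.
decreasing_before = all(a > b for a, b in zip(ys[:turn + 1], ys[1:turn + 1]))
increasing_after = all(a < b for a, b in zip(ys[turn:], ys[turn + 1:]))

print(xs[turn], ys[turn])                   # 2.0 1.0
print(decreasing_before, increasing_after)  # True True
```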

1.4 - Graphs: Seeing Functions Visually

A graph is just a picture of a function. The horizontal axis (x-axis) represents the input. The vertical axis (y-axis) represents the output. Every point on the curve is a pair (input, output).

When you plot model performance over training time, you are drawing a function. When you visualise a cost curve, you are reading a graph. Learning to interpret graphs is as important as reading the formula.

Exercises - Lesson 1

Exercise 1.1 Classify each of the following as natural, integer, or real number: -7, 3.14, 0, 1000, -0.5, 42.
Exercise 1.2 Given f(x) = 3x - 5, calculate: f(0), f(2), f(-1), f(10).
Exercise 1.3 You are told that a data science model predicts house prices using the function: price = 300 × size + 50000, where size is in square metres. What is the predicted price for an 80 sq m flat? For a 120 sq m flat? What does the number 300 represent intuitively?
Exercise 1.4 Sketch (by hand or describe) what the function f(x) = x² looks like. Is it increasing everywhere? Where is its minimum?
Exercise 1.5 (Challenge) You are given two functions: f(x) = 2x + 1 and g(x) = x². Calculate f(g(3)) and g(f(3)). Are the results the same? What does this tell you about the order in which you apply functions?

Lesson 2 - Algebra Refresher: Rearranging, Solving, Thinking Logically

What This Lesson Is About

Algebra is the skill of rearranging and solving. In data science, you will constantly manipulate equations: deriving update rules for neural networks, solving for unknown parameters, or simplifying cost functions. This lesson rebuilds your algebraic instincts without pain.

2.1 - The Golden Rule of Algebra

Whatever you do to one side of an equation, you must do to the other. This single rule is the foundation of all algebraic manipulation.

If x + 5 = 12, then x = 12 - 5 = 7. You subtracted 5 from both sides.
If 3x = 18, then x = 18 ÷ 3 = 6. You divided both sides by 3.
If x ÷ 4 = 9, then x = 9 × 4 = 36. You multiplied both sides by 4.
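
One habit worth building early is checking a solution by substituting it back into the original equation. A small Python sketch of the three examples above (the names x1, x2, x3 are just labels used here):

```python
# x + 5 = 12: subtract 5 from both sides.
x1 = 12 - 5

# 3x = 18: divide both sides by 3.
x2 = 18 / 3

# x / 4 = 9: multiply both sides by 4.
x3 = 9 * 4

print(x1, x2, x3)    # 7 6.0 36

# Substitute each answer back into its original equation.
print(x1 + 5 == 12)  # True
print(3 * x2 == 18)  # True
print(x3 / 4 == 9)   # True
```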

2.2 - Working With Multiple Variables

Most real equations have more than one variable. You do not always need to solve for a single number. Sometimes you rearrange to express one variable in terms of others.

Example: the equation y = mx + b (a straight line). If you know y, m, and b, and m is not zero, you can solve for x: x = (y - b) / m.
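
The rearranged formula can be written once and reused for any line. The sketch below assumes a helper named solve_for_x (a name invented here) and uses arbitrary example values. It refuses m = 0, because dividing by zero is not allowed.

```python
def solve_for_x(y, m, b):
    """Rearrange y = m*x + b to get x = (y - b) / m."""
    if m == 0:
        raise ValueError("m must be non-zero: a flat line has no unique x")
    return (y - b) / m


x = solve_for_x(y=11, m=2, b=3)
print(x)          # 4.0

# Check by substituting back into y = m*x + b.
print(2 * x + 3)  # 11.0
```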

In machine learning, you will encounter equations with dozens of variables. The algebra is the same, just larger.

2.3 - Inequalities

An inequality does not pin down an exact value; it describes a range of values. Common symbols: < (strictly less than), > (strictly greater than), ≤ (less than or equal to), ≥ (greater than or equal to).

Example: if a model's accuracy must be at least 80%, you write: accuracy ≥ 0.80.

Solving inequalities works like equations, with one important exception: if you multiply or divide by a negative number, the direction of the inequality flips.
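
To see the flip in action, take the illustrative inequality -2x < 6 (chosen here, not from the lesson). Dividing both sides by -2 gives x > -3, not x < -3. The sketch below checks this by brute force over a small range of whole numbers.

```python
candidates = range(-10, 11)

# Values that satisfy the original inequality -2x < 6.
original = [x for x in candidates if -2 * x < 6]

# Dividing by -2 and flipping the sign gives x > -3.
flipped = [x for x in candidates if x > -3]

# Forgetting to flip gives x < -3, which is wrong.
not_flipped = [x for x in candidates if x < -3]

print(original == flipped)      # True
print(original == not_flipped)  # False
```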

2.4 - Summation Notation

In data science, you constantly add up many values. Instead of writing x₁ + x₂ + x₃ + … + x₁₀₀, mathematicians use the sigma symbol (Σ) with a subscript and superscript to write the same thing compactly: Σ xᵢ, with i = 1 written below the sigma and 100 written above it. The bottom of the sigma tells you where to start, the top tells you where to stop, and the expression to the right tells you what to add each time.

When you see this notation in a loss function or a statistical formula, do not panic. It simply means "add all of these up."
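
In code, a sigma is just a loop that keeps a running total. A minimal sketch, using a made-up list of values:

```python
values = [4, 8, 15, 16, 23, 42]  # hypothetical data points x_1 ... x_6

# Sigma written out as a loop: start at the first value, stop at the last,
# and add each one to a running total.
total = 0
for x_i in values:
    total += x_i

print(total)                 # 108
print(total == sum(values))  # True

# The expression to the right of the sigma can be anything, e.g. x_i squared.
sum_of_squares = sum(x_i ** 2 for x_i in values)
print(sum_of_squares)        # 2854
```

Python's built-in sum does the same job as the loop, which is why it appears so often in data science code.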

Exercises - Lesson 2

Exercise 2.1 Solve for x: (a) 2x + 7 = 19, (b) 5x - 3 = 2x + 9, (c) x/3 + 4 = 10.
Exercise 2.2 Rearrange the formula E = mc² to express m in terms of E and c.
Exercise 2.3 A data scientist writes: loss = (predicted - actual)². If the actual value is 50 and the loss is 25, what are the possible predicted values?
Exercise 2.4 A model must have a precision above 0.75 and a recall above 0.60. Write these two conditions as inequalities. If precision = 0.80 and recall = 0.55, does the model meet both criteria?
Exercise 2.5 (Challenge) You have 5 data points with values 3, 7, 2, 9, and 5. Write out the sum using summation notation, then calculate the result. Now calculate the mean (average) by dividing this sum by 5.

Lesson 3 - Statistics for Humans: Understanding Data Before Modelling It

What This Lesson Is About

Before you build any model, you must understand your data. Statistics is the science of summarising and interpreting collections of numbers. This lesson covers the intuitive foundations that every data scientist uses daily, often without realising they are doing statistics.

3.1 - Descriptive Statistics: Summarising a Dataset

Mean (average): add all values, divide by the count. The mean tells you the "centre of gravity" of your data. It is sensitive to extreme values (outliers).
Median: the middle value when data is sorted. More robust than the mean when outliers exist. If a dataset has an even number of values, the median is the average of the two middle ones.
Mode: the most frequently occurring value. Useful for categorical data (e.g. the most common job title in a dataset).

When to use which: suppose a tech company has a CEO earning £2,000,000 and 50 employees earning £35,000 each. The mean salary is about £73,500, more than double what everyone other than the CEO earns, so reporting it as the "average salary" is misleading. The median, £35,000, tells the real story.
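
The company example can be checked directly with Python's built-in statistics module. This is a sketch: the list below is simply the 50 employees and one CEO described above.

```python
import statistics

# 50 employees on £35,000 and one CEO on £2,000,000.
salaries = [35_000] * 50 + [2_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

print(round(mean_salary))  # 73529
print(median_salary)       # 35000
print(mode_salary)         # 35000
```

A single extreme value pulls the mean far away from what almost everyone earns, while the median and mode do not move.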

3.2 - Spread: How Spread Out Is Your Data?

Range: maximum minus minimum. Simple, but fragile (very sensitive to a single extreme value).
Variance: measures how far, on average, each value is from the mean. Specifically, it is the average of the squared differences from the mean. Squaring ensures negative and positive differences do not cancel each other out.
Standard deviation: the square root of the variance. It brings the measurement back to the same units as the original data. A small standard deviation means the data is clustered tightly around the mean. A large one means it is spread widely.
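
The three measures of spread can be computed step by step. The sketch below uses a small made-up dataset and follows the definition given above (divide by the number of values). Note that some tools, such as statistics.variance, divide by one less than the number of values instead, so results can differ slightly.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Range: maximum minus minimum.
value_range = max(values) - min(values)

# Variance: the average of the squared differences from the mean.
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)

# Standard deviation: square root of the variance, back in the original units.
std_dev = math.sqrt(variance)

print(value_range)  # 7
print(variance)     # 4.0
print(std_dev)      # 2.0

# The statistics module's "population" functions use the same definition.
print(math.isclose(variance, statistics.pvariance(values)))  # True
print(math.isclose(std_dev, statistics.pstdev(values)))      # True
```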

3.3 - Distributions: The Shape of Data

A distribution describes how often each value appears in your data. When you plot a histogram, you are visualising the distribution.

The normal distribution (bell curve) is the most famous. It is symmetric around the mean, with most values clustered near the centre and fewer values at the extremes. Many natural phenomena roughly follow this shape (heights, measurement errors, test scores).

Skewed distributions: if the tail stretches to the right, the distribution is right-skewed (income data is typically right-skewed). If it stretches to the left, it is left-skewed.
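
One way to get a feel for these shapes is to generate random data and compare summary statistics. The sketch below is an illustration under assumptions made here: it draws a normal sample with mean 0 and standard deviation 1, and uses an exponential distribution as a stand-in for right-skewed data. The exact numbers depend on the random seed, so none are shown.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

normal_sample = [random.gauss(0, 1) for _ in range(n)]
skewed_sample = [random.expovariate(1.0) for _ in range(n)]

# Symmetric data: the mean and median sit almost on top of each other.
normal_gap = abs(statistics.mean(normal_sample) - statistics.median(normal_sample))

# Share of the normal sample within one standard deviation of the mean
# (expected to be roughly 0.68).
within_one_sd = sum(1 for v in normal_sample if -1 <= v <= 1) / n

# Right-skewed data: the long right tail drags the mean above the median.
skewed_mean = statistics.mean(skewed_sample)
skewed_median = statistics.median(skewed_sample)

print(normal_gap, within_one_sd)
print(skewed_mean, skewed_median)
```

Comparing the mean with the median is a quick first check for skew before plotting a histogram.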

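Understanding distribution shapes is critical before applying any statistical or machine learning technique, because many statistical methods assume that the data (or the model's errors) are roughly normally distributed, and they can give misleading results when that assumption fails.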

Exercises - Lesson 3

Exercise 3.1 You have the following salaries in a small company (in £000s): 28, 32, 35, 40, 42, 45, 200. Calculate the mean, median, and mode. Which measure best represents the "typical" salary? Why?
Exercise 3.2 Two machine learning models have the following accuracy scores across 5 experiments. Model A: 0.80, 0.81, 0.79, 0.80, 0.80. Model B: 0.70, 0.90, 0.85, 0.75, 0.80. Both have the same mean. Which model would you trust more and why? Calculate the standard deviation of each to support your answer.
Exercise 3.3 Describe in plain English what it means for a dataset to be normally distributed. Name two real-world datasets you believe might follow a roughly normal distribution, and two that definitely would not.
Exercise 3.4 (Challenge) A dataset has a mean of 50 and a standard deviation of 10. In a normal distribution, approximately 68% of values fall within one standard deviation of the mean, and 95% within two. What range of values would cover 95% of this dataset? If a new data point arrives with value 85, should you be surprised? Why?

Luxley Digital College - Ready for the next step? Our Data Science & AI programme builds on these foundations with real projects and expert support.
