Why Statistics for Data Science?
Statistics is the mathematical foundation of data science. Every model, every A/B test, and every business insight relies on statistical reasoning. Without it, you're guessing — not analysing.
Measures of Central Tendency
- Mean — arithmetic average, sensitive to outliers
- Median — middle value, robust to outliers
- Mode — most frequent value, useful for categorical data
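As a rough illustration of the three measures above, the sketch below uses Python's standard `statistics` module on a made-up salary list. The figures (in thousands of pounds) and the variable names are invented for this example. It shows how a single outlier pulls the mean upwards while the median stays put.

```python
import statistics

# Hypothetical salaries in thousands of pounds; 250 is a deliberate outlier.
salaries = [30, 32, 35, 35, 38, 40, 250]
salaries_without_outlier = salaries[:-1]

mean_salary = statistics.mean(salaries)      # pulled upwards by the outlier
median_salary = statistics.median(salaries)  # middle value of the sorted list
mode_salary = statistics.mode(salaries)      # most frequent value

mean_without_outlier = statistics.mean(salaries_without_outlier)
median_without_outlier = statistics.median(salaries_without_outlier)

print(f"With outlier:    mean={mean_salary:.1f}, median={median_salary:.1f}")
print(f"Without outlier: mean={mean_without_outlier:.1f}, "
      f"median={median_without_outlier:.1f}")

# The mode also works on categorical data, where a mean is meaningless.
colours = ["red", "blue", "red", "green"]
most_common_colour = statistics.mode(colours)
print(f"Most common colour: {most_common_colour}")
```

Removing one value moves the mean from roughly 65.7 to 35, while the median stays at 35 in both cases.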
Measures of Spread
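Standard deviation and variance quantify how spread out your data is. Variance is the average squared deviation from the mean, and standard deviation is its square root, so it is expressed in the same units as the data. A low standard deviation means values cluster near the mean; a high one means they are widely dispersed.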
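To make the two quantities concrete, here is a minimal sketch that computes the variance by hand and compares it with the standard library. The data set is illustrative only. The split between population (`pvariance`, `pstdev`) and sample (`variance`, `stdev`) versions is not discussed in the text above; it is shown here because the library exposes both, and which one applies depends on whether your data is the whole population or a sample from it.

```python
import math
import statistics

# Illustrative values chosen so the arithmetic comes out round.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance by hand: the mean of the squared deviations.
mean_value = sum(values) / len(values)
squared_deviations = [(x - mean_value) ** 2 for x in values]
manual_variance = sum(squared_deviations) / len(values)
manual_std = math.sqrt(manual_variance)

# The same quantities from the standard library.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample versions divide by n - 1 instead of n, so they come out slightly larger.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(f"mean={mean_value}, variance={manual_variance}, std={manual_std}")
print(f"sample variance={sample_variance:.3f}, sample std={sample_std:.3f}")
```

The variance here is 4 in squared units, while the standard deviation is 2 in the original units, which is why the standard deviation is usually the easier of the two to interpret.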
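Always report both a measure of central tendency and a measure of spread. Saying 'average salary is £50k' is incomplete without knowing the standard deviation: the same figure could describe a team where everyone earns close to £50k, or one where a single very high salary masks many much lower ones.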
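The point can be sketched with two invented teams whose mean salary is identical. The helper `summarise` and both data sets are hypothetical, written only for this illustration. The interquartile range is taken from `statistics.quantiles`, whose exact values depend on the interpolation method it uses by default.

```python
import statistics

def summarise(data):
    """Return central tendency and spread measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "iqr": q3 - q1,                 # interquartile range
    }

# Hypothetical salaries in thousands of pounds; both teams average 50.
team_a = [48, 49, 50, 51, 52]
team_b = [20, 25, 30, 35, 140]

summary_a = summarise(team_a)
summary_b = summarise(team_b)

for name, summary in (("Team A", summary_a), ("Team B", summary_b)):
    print(name, {key: round(value, 1) for key, value in summary.items()})
```

Both teams report the same mean, yet Team B's much larger standard deviation and its lower median reveal that one high salary is doing most of the work.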
Key Takeaway
Descriptive statistics summarise your data. Always pair a measure of central tendency (mean or median) with a measure of spread (standard deviation or interquartile range, IQR) for a complete picture.