Statistical Glossary

Comprehensive definitions of statistical terms, concepts, and methods.

Basic Concepts

Population

The entire group of individuals, objects, or measurements of interest for a statistical study.

Examples: All registered voters in a country, Every star in a galaxy

Related terms: Sample, Parameter

Sample

A subset of the population selected for study.

Examples: 1000 randomly selected voters, 100 stars observed through a telescope

Related terms: Population, Random Sampling

Variable

A characteristic or attribute that can be measured or categorized.

Examples: Height, Age, Color, Income

Related terms: Quantitative Variable, Qualitative Variable

Data Types and Measurement Scales

Nominal Scale

Categorical data without any inherent order.

Examples: Gender, Blood type

Related terms: Ordinal Scale, Qualitative Variable

Ordinal Scale

Categorical data with a meaningful order but no fixed intervals.

Examples: Survey responses (e.g., poor, fair, good), Rankings in a race

Related terms: Nominal Scale, Quantitative Variable

Interval Scale

Numeric data with meaningful intervals but no true zero.

Examples: Temperature in Celsius, Calendar years

Related terms: Ratio Scale, Quantitative Variable

Ratio Scale

Numeric data with meaningful intervals and a true zero.

Examples: Height, Weight, Income

Related terms: Interval Scale, Quantitative Variable

Descriptive Statistics

Mean

The arithmetic average of a set of numbers, calculated by summing all values and dividing by the count of values.

Formula: x̄ = (Σx) / n

Examples: Average test score, Mean household income

Related terms: Median, Mode
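
The mean formula can be sketched in a few lines of Python. This is a minimal illustration; the function name `mean` and the test scores are made up for the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [70, 80, 90, 100]   # hypothetical test scores
average = mean(scores)       # (70 + 80 + 90 + 100) / 4 = 85.0
```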

Median

The middle value when the data are arranged in order; with an even number of values, the average of the two middle values.

Examples: Median home price, Median age

Related terms: Mean, Quartile
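
A short sketch of how the median might be computed, assuming the common convention of averaging the two middle values when the count is even. The function name and the ages are illustrative only.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([31, 25, 47, 38, 29])   # sorted: 25, 29, 31, 38, 47 -> 31
even_case = median([31, 25, 38, 29])      # sorted: 25, 29, 31, 38 -> (29 + 31) / 2 = 30.0
```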

Standard Deviation

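A measure of variability that indicates how far data points typically lie from their mean, calculated as the square root of the variance. The formula below is the sample standard deviation, which divides by n - 1.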

Formula: s = √[Σ(x - x̄)² / (n-1)]

Related terms: Variance, Mean
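
The sample standard deviation formula can be followed step by step in code. The sketch below uses a hypothetical function name and made-up data; Python's built-in `statistics.stdev` computes the same quantity.

```python
import math


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
s = sample_std(data)              # sqrt(32 / 7), roughly 2.14
```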

Data Collection Methods

Survey

A method of collecting data from a sample, typically by asking questions, in order to draw inferences about the population.

Examples: Questionnaires, Phone interviews

Related terms: Sampling, Population

Experiment

A controlled study where variables are manipulated to determine effects.

Examples: Clinical trials, Lab-based studies on reaction time

Related terms: Observational Study, Randomized Controlled Trial

Observational Study

A study where data is collected without intervention, observing naturally occurring variables.

Examples: Cohort study, Case-control study

Related terms: Experiment, Survey

Probability

A numerical measure of the likelihood of an event occurring.

Examples: P(heads) = 0.5 for a fair coin, Probability of rain tomorrow

Related terms: Conditional Probability

Independence

Two events are independent if the occurrence of one does not affect the probability of the other.

Examples: Consecutive coin flips, Drawing cards with replacement

Related terms: Conditional Probability, Joint Probability
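
Independence is often checked through the product rule P(A and B) = P(A) × P(B). As a small illustration, the sketch below enumerates two fair coin flips and verifies the rule exactly; the event names and the helper `probability` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def probability(event):
    """Share of equally likely outcomes for which the event holds."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


p_a = probability(lambda o: o[0] == "H")                     # first flip is heads
p_b = probability(lambda o: o[1] == "H")                     # second flip is heads
p_both = probability(lambda o: o[0] == "H" and o[1] == "H")  # both flips are heads

independent = p_both == p_a * p_b   # 1/4 == 1/2 * 1/2
```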

Bayes' Theorem

A formula for updating probabilities based on new information.

Formula: P(A|B) = [P(B|A) * P(A)] / P(B)

Examples: Medical testing, Spam email detection

Related terms: Conditional Probability, Independence
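
To make Bayes' Theorem concrete, here is a sketch of the medical testing case. All numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are invented for illustration, and P(B) is expanded with the law of total probability.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with
    P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical screening test: A = "has the condition", B = "tests positive".
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
# 0.0095 / (0.0095 + 0.0495), roughly 0.16
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare.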

Central Limit Theorem

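The theorem that the distribution of sample means from independent draws approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has finite variance.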

Related terms: Normal Distribution, Law of Large Numbers
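
The Central Limit Theorem can be explored with a small simulation. The sketch below draws repeated samples from a strongly right-skewed exponential distribution (mean 1, standard deviation 1) and collects their means; the sample size, trial count, and seed are arbitrary choices for the example.

```python
import random
import statistics


def sample_means(draw, sample_size, trials, seed=0):
    """Return the mean of each of `trials` samples of size `sample_size`."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(trials)
    ]


means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, trials=2000)

center = statistics.fmean(means)   # expected to be near the population mean, 1.0
spread = statistics.stdev(means)   # expected to be near 1 / sqrt(50), about 0.14
```

A histogram of `means` should look roughly bell-shaped even though the individual draws are far from normal.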

Distributions

Normal Distribution

Also: Gaussian distribution, bell curve

A symmetric, bell-shaped distribution characterized by its mean and standard deviation.

Related terms: Standard Normal Distribution, Central Limit Theorem
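
Python's standard library models the normal distribution with `statistics.NormalDist`. The sketch below uses a hypothetical score scale (mean 100, standard deviation 15) to show how the two parameters determine probabilities and z-scores.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)   # hypothetical scale, chosen for illustration

# Probability of falling within one standard deviation of the mean (about 0.68).
within_one_sd = scores.cdf(115) - scores.cdf(85)

# A z-score expresses a value in standard deviations from the mean.
z = (130 - scores.mean) / scores.stdev   # (130 - 100) / 15 = 2.0
```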

Skewness

A measure of the asymmetry of a probability distribution.

Examples: Right-skewed: income distribution, Left-skewed: exam scores on an easy test

Related terms: Kurtosis, Normal Distribution
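
One common way to quantify skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (other, bias-adjusted versions exist) and uses made-up data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population form, no bias adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3, 4, 5])          # 0.0: deviations cancel out
right_skewed = skewness([1, 1, 2, 2, 3, 10])   # positive: long tail to the right
left_skewed = skewness([-10, -3, -2, -2, -1, -1])   # negative: long tail to the left
```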

Inferential Statistics

Hypothesis Testing

A statistical method for making decisions using data, involving null and alternative hypotheses.

Related terms: P-value, Significance Level, Type I Error

Confidence Interval

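A range of values, computed from sample data, constructed so that a specified proportion of such intervals (the confidence level, e.g., 95%) would contain the true population parameter over repeated sampling.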

Formula: CI = point estimate ± (critical value × standard error)

Related terms: Margin of Error, Standard Error
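
The confidence interval formula can be sketched for a sample mean. This example assumes a normal (z) critical value, which is a large-sample approximation; for small samples a t critical value would give a somewhat wider interval. The function name and the height data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """CI = point estimate ± (critical value × standard error), using a z critical value."""
    n = len(values)
    point_estimate = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    critical_value = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin_of_error = critical_value * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


heights_cm = [168, 172, 171, 165, 174, 169, 170, 173, 167, 171]   # sample mean is 170
low, high = mean_confidence_interval(heights_cm)
```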

Statistical Tests

t-test

A statistical test used to determine whether a sample mean differs significantly from a reference value, or whether the means of two groups differ significantly from each other.

Types: One-sample t-test, Independent samples t-test, Paired t-test

Related terms: P-value, Degrees of Freedom
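
As an illustration of the independent samples t-test, the sketch below computes the test statistic and degrees of freedom under the equal-variance (pooled) assumption. It stops short of the p-value, which requires the t distribution and is typically obtained from a statistics library or a table. The function name and data are made up.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """t statistic and degrees of freedom for two independent samples (equal variances assumed)."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / standard_error, df


treatment = [5, 6, 7, 8, 9]   # mean 7, sample variance 2.5
control = [3, 4, 5, 6, 7]     # mean 5, sample variance 2.5
t, df = pooled_t_statistic(treatment, control)   # t is 2.0 with 8 degrees of freedom
```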

ANOVA

Analysis of Variance: a statistical test used to analyze differences among the means of three or more groups.

Related terms: F-test, Multiple Comparisons, Post-hoc Tests
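
One-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic only (the p-value requires the F distribution); the function name and the three groups are illustrative.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # group means 2, 3, 6
f_statistic, df_between, df_within = one_way_anova_f(groups)
# ss_between = 26, ss_within = 6, so F = (26 / 2) / (6 / 6) = 13
```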

Chi-square Test

A test for relationships between categorical variables, often used in contingency tables.

Related terms: Categorical Data, Hypothesis Testing
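
For a contingency table, the chi-square statistic sums (observed - expected)² / expected over all cells, where each expected count is (row total × column total) / grand total. The sketch below uses an invented 2×2 table and a hypothetical function name.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are two groups, columns are two response categories.
table = [[30, 10],
         [20, 40]]
chi2, df = chi_square_statistic(table)   # expected counts 20, 20, 30, 30; chi2 = 50/3
```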

Errors and Power in Hypothesis Testing

Type I Error

A false positive, rejecting the null hypothesis when it is actually true.

Examples: Declaring a new drug effective when it is not

Related terms: Significance Level, P-value

Type II Error

A false negative, failing to reject the null hypothesis when it is actually false.

Examples: Failing to detect a real effect of a drug

Related terms: Statistical Power, Hypothesis Testing

Statistical Power

The probability of correctly rejecting a false null hypothesis, equal to 1 - β, where β is the Type II error probability.

Examples: High power in drug trials reduces the chance of missing real effects

Related terms: Type II Error, Sample Size
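
The link between power and sample size can be sketched for the simplest case: a one-sided, one-sample z-test with known standard deviation. That setting, the function name, and the numbers are assumptions chosen for illustration; real power analyses usually rely on dedicated software.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided z-test.

    effect_size is (true mean - null mean) / standard deviation.
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha)
    return 1 - standard_normal.cdf(critical_value - effect_size * math.sqrt(n))


power_n25 = z_test_power(effect_size=0.5, n=25)   # roughly 0.80
power_n50 = z_test_power(effect_size=0.5, n=50)   # higher, since n is larger
```

With no true effect (`effect_size=0`), the function returns the significance level, which is the Type I error rate.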

Regression and Correlation

Correlation

A measure of the strength and direction of the linear relationship between two variables, usually expressed as a coefficient between -1 and 1.

Examples: Height and weight, Income and education level

Related terms: Covariance, Regression Analysis
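
The most widely used correlation measure is the Pearson coefficient, sketched below. The function name and the data are illustrative; the data were chosen to be perfectly linear so the result is easy to check.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual variation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
rising = pearson_r(hours, [2, 4, 6, 8, 10])    # perfectly increasing: r close to 1
falling = pearson_r(hours, [10, 8, 6, 4, 2])   # perfectly decreasing: r close to -1
```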

Simple Linear Regression

A method to predict a response variable based on one predictor variable.

Formula: y = a + bx, where a is the intercept and b is the slope

Related terms: Correlation
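
The coefficients in y = a + bx are usually estimated by ordinary least squares. The sketch below assumes that method; the function name `fit_line` and the data points are made up, and the points lie exactly on a line so the result can be verified by hand.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # points on y = 1 + 2x, so a = 1.0, b = 2.0
prediction = a + b * 5                        # predicted response at x = 5 is 11.0
```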

Multiple Regression

A regression model with multiple predictor variables to predict an outcome.

Examples: Predicting house prices using area, number of rooms, etc.

Related terms: Simple Linear Regression, R-squared

Multivariate Analysis

Principal Component Analysis (PCA)

A technique for reducing dimensionality of large data sets while retaining most variation.

Examples: Image compression, Genomics data analysis

Related terms: Eigenvalues, Factor Analysis

Cluster Analysis

A method for grouping data points into clusters based on similarities.

Examples: Market segmentation, Species classification

Related terms: K-means, Hierarchical Clustering

Effect Size and Practical Significance

Effect Size

A measure of the strength or magnitude of an effect, useful for understanding practical significance.

Examples: Cohen's d = 0.5 is conventionally interpreted as a medium effect size

Related terms: Cohen's d, Practical Significance

Cohen's d

An effect size measure for the difference between two means.

Formula: d = (M1 - M2) / SD, where M1 and M2 are the group means and SD is the pooled standard deviation

Related terms: Effect Size, t-test
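
Cohen's d can be sketched with the pooled standard deviation as the SD in the formula, which is the usual convention for two independent groups. The function name and the data are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """d = (M1 - M2) / SD, with SD taken as the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_variance)


treatment = [5, 6, 7, 8, 9]   # mean 7, sample variance 2.5
control = [3, 4, 5, 6, 7]     # mean 5, sample variance 2.5
d = cohens_d(treatment, control)   # 2 / sqrt(2.5), roughly 1.26
```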