Understanding tests in statistics, with a working example in Python

3 min read · May 15, 2022

Before reading this article, I highly recommend going through the first part, where I explain the theory and the framework steps.

Understanding tests in statistics, everyone should know this

People often confuse statistical significance with a number of samples that is wrong and specify n=<some value> under…

amitb0007.medium.com

I am just reiterating the steps here :

Framework for implementing any statistical test :

Step 1: State H0 and HA, i.e. the null hypothesis and the alternative hypothesis respectively.
Step 2: State the confidence level or level of significance (alpha = 0.05, 0.01, etc.) based on the sensitivity.
Step 3: Select the appropriate test and find the test statistic.
Step 4: Establish or draw the critical region.
Step 5: Make a decision based on the calculations. This step can be done by any of the following approaches:
1. Critical value approach (reject H0 if |test_statistic| > critical value, else fail to reject H0)
2. p-value approach (reject H0 if p-value < alpha, else fail to reject H0)
3. Confidence interval approach
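The three decision approaches in Step 5 can be sketched as small Python helpers. This is only a minimal illustration for a two-sided test: the function names are made up for this sketch, and the numbers passed in at the bottom are illustrative, not taken from any real data.

```python
def reject_by_critical_value(test_statistic, critical_value):
    """Two-sided rule: reject H0 when |test statistic| exceeds the critical value."""
    return abs(test_statistic) > critical_value


def reject_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is smaller than the level of significance."""
    return p_value < alpha


def reject_by_confidence_interval(hypothesized_value, lower, upper):
    """Reject H0 when the hypothesized value falls outside the confidence interval."""
    return not (lower <= hypothesized_value <= upper)


# Illustrative inputs only (hypothetical values)
print(reject_by_critical_value(-2.5, 1.96))            # |-2.5| > 1.96
print(reject_by_p_value(0.20, alpha=0.05))             # 0.20 is not below 0.05
print(reject_by_confidence_interval(10.0, 8.0, 12.0))  # 10 lies inside [8, 12]
```

For the same test and the same alpha, all three rules should lead to the same decision; they are just different ways of looking at one calculation.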

Let’s hop on to an example here real quick :

A company claims that its machine runs for 300 minutes at a go on one unit of regular gas.
A random sample of 50 machines is tested.
The machines run for an average of 295 minutes, with a standard deviation of 20 minutes.
Test the hypothesis that the mean run-time of a machine is 300 minutes.
Use a 0.05 level of significance. What is the region of acceptance?

Solution :

Since we are solving this problem in Python we have to import some statistical libraries for getting started.

# Only these two libraries are needed for this example
import numpy as np           # square root for the standard error
import scipy.stats as stats  # normal and t distributions

Given values (μ denotes the mean and α the level of significance):

μ = 300      # hypothesized population mean (minutes)
n = 50       # sample size
σ = 20       # standard deviation of the sample (minutes)
x_bar = 295  # sample mean (minutes)
α = 0.05     # level of significance

Step 1: Define null and alternative hypotheses

H0: The mean run-time of a machine is 300 minutes (μ = 300)
H1: The mean run-time of a machine is not 300 minutes (μ ≠ 300)

Step 2: Decide the significance level

Given α = 0.05

Step 3: Find the confidence interval for the chosen significance level

# The test is two-sided, so α = 0.05 is split across both tails
z = stats.norm.isf(0.05 / 2)  # about 1.96
Standard_Error = σ / np.sqrt(n)
UB = x_bar + z * Standard_Error
LB = x_bar - z * Standard_Error
print('Upper limit is :', UB, '\n', 'Lower limit is :', LB)
# The 95% confidence interval runs from about 289.46 to 300.54 and contains the claimed mean of 300
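The question also asks for the region of acceptance. One common way to read that, assuming a two-sided test and the normal approximation, is the range of sample means around the claimed mean of 300 for which H0 would not be rejected. The sketch below uses the numbers from the problem statement; the variable names are my own.

```python
import numpy as np
from scipy import stats

mu_0 = 300    # claimed mean run-time (minutes)
n = 50        # sample size
s = 20        # sample standard deviation (minutes)
x_bar = 295   # observed sample mean (minutes)
alpha = 0.05  # level of significance

standard_error = s / np.sqrt(n)
z_crit = stats.norm.isf(alpha / 2)  # two-sided critical value

# Sample means inside this range do not lead to rejecting H0
accept_low = mu_0 - z_crit * standard_error
accept_high = mu_0 + z_crit * standard_error
in_region = accept_low <= x_bar <= accept_high

print(f"Region of acceptance: {accept_low:.2f} to {accept_high:.2f}")
print("Sample mean inside the region:", in_region)
```

Centering the interval on the sample mean or on the claimed mean gives the same decision; only the point of view changes.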

Step 4: Select appropriate tests and find test statistics.

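We do not know the standard deviation of the population; we only know the standard deviation of the sample.

The sample is large (n > 30), so the t distribution is already very close to the normal distribution here.

Because the population standard deviation is unknown, we use the t distribution with n - 1 degrees of freedom and the t_stat test statistic.

t_stat = (x_bar - μ) / Standard_Error    # about -1.77
t_critical = stats.t.isf(0.025, n - 1)   # two-sided critical value, about 2.01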
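The test statistic and critical value can also be wrapped in one small helper that works directly from summary statistics. This is a sketch under the assumption of a two-sided one-sample t-test; the function name `t_test_from_summary` is hypothetical and not part of SciPy.

```python
import math
from scipy import stats


def t_test_from_summary(x_bar, mu_0, s, n, alpha=0.05):
    """Two-sided one-sample t-test computed from summary statistics."""
    standard_error = s / math.sqrt(n)
    t_stat = (x_bar - mu_0) / standard_error
    t_critical = stats.t.isf(alpha / 2, n - 1)  # n - 1 degrees of freedom
    reject_h0 = abs(t_stat) > t_critical
    return t_stat, t_critical, reject_h0


# Numbers from the machine run-time example
t_stat, t_critical, reject_h0 = t_test_from_summary(x_bar=295, mu_0=300, s=20, n=50)
print(f"t_stat = {t_stat:.4f}, t_critical = {t_critical:.4f}, reject H0: {reject_h0}")
```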

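Calculate the p-value :

p_value = 2 * stats.t.cdf(-abs(t_stat), n - 1)  # two-sided p-value, about 0.08

Inferences :

The two-sided p-value (about 0.08) is greater than α (0.05), so we fail to reject H0.
The absolute value of t_stat (about 1.77) is less than t_critical (about 2.01), so we again fail to reject H0.
Both approaches agree: we fail to reject H0.
Therefore, at the 0.05 level of significance, there is not enough evidence to say that the mean run-time of a machine differs from 300 minutes. Note that failing to reject H0 is not the same as proving that μ = 300.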
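As a sanity check, the same result can be reproduced with `scipy.stats.ttest_1samp`, which needs raw observations rather than summary statistics. Since the raw run-times are not given, the sketch below builds a synthetic sample (an assumption for illustration only) that is rescaled to have exactly the stated mean of 295 and standard deviation of 20.

```python
import numpy as np
from scipy import stats

# Synthetic data: random draws rescaled to match the stated summary statistics.
# These are NOT the real machine run-times, which the problem does not provide.
rng = np.random.default_rng(0)
raw = rng.normal(size=50)
sample = 295 + 20 * (raw - raw.mean()) / raw.std(ddof=1)

# Two-sided one-sample t-test against the claimed mean of 300 minutes
result = stats.ttest_1samp(sample, popmean=300)

print("sample mean:", round(sample.mean(), 2))
print("sample std :", round(sample.std(ddof=1), 2))
print("t statistic:", round(result.statistic, 4))
print("p-value    :", round(result.pvalue, 4))
print("reject H0 at 0.05:", result.pvalue < 0.05)
```

Because the t statistic depends only on the sample mean, the sample standard deviation, and the sample size, any sample with these summary values gives the same statistic and p-value as the manual calculation.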

Hope this helps!
Please let me know if you would like more Python statistics articles like this one.

Thanks !
