Sahithyan's S3 — Applied Statistics

Two-Sample Mean Test

Suppose two independent random samples of sizes $n_1$ and $n_2$ are drawn from two populations with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$, respectively.

$$H_0:\ \mu_1 - \mu_2 = d_0$$

$$H_1:\ \mu_1 - \mu_2 < d_0\ \text{ or }\ \mu_1 - \mu_2 \neq d_0\ \text{ or }\ \mu_1 - \mu_2 > d_0$$

The appropriate test statistic depends on whether $\sigma_1$ and $\sigma_2$ are known or unknown.

  • Independence: samples are independent within and across groups
  • Distribution: each population is normal, or samples are large (CLT)
  • Known vs unknown variances: choose Z or t accordingly
  • Equal-variance assumption only for pooled t; otherwise use Welch t

Both $\sigma_1$ and $\sigma_2$ are known.

$$Z \;=\; \frac{(\bar X_1 - \bar X_2) - d_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \;\sim\; N(0,1)\ \text{ under } H_0$$

Decision

  • Two-tailed: reject $H_0$ if $|Z| \ge z_{1-\alpha/2}$
  • Right-tailed ($>$): reject if $Z \ge z_{1-\alpha}$
  • Left-tailed ($<$): reject if $Z \le -z_{1-\alpha}$

$(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$ (the difference that $H_0$ sets to $d_0$):

$$(\bar X_1 - \bar X_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
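The known-variance case can be sketched in Python using only the standard library. The function name `two_sample_z` and the summary figures below are hypothetical, chosen for illustration; the sketch covers the two-tailed test and the matching confidence interval.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0, alpha=0.05):
    """Two-tailed z test and CI for mu1 - mu2 when sigma1, sigma2 are known."""
    # Standard error of (Xbar1 - Xbar2)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    z = (diff - d0) / se
    # z_{1 - alpha/2}, the upper critical value of N(0, 1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) >= z_crit
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, reject, ci


# Hypothetical summary data (not taken from the notes)
z, reject, ci = two_sample_z(xbar1=52.0, xbar2=50.0,
                             sigma1=4.0, sigma2=3.0, n1=40, n2=45)
print(f"z = {z:.3f}, reject H0: {reject}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For a one-tailed test, the same `z` would be compared against `NormalDist().inv_cdf(1 - alpha)` instead.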

Both $\sigma_1$ and $\sigma_2$ are unknown.

Variances are equal and unknown. Pooled t statistic is used.

The pooled variance $s_p^2$ is defined as:

$$s_p^2 \;=\; \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{\nu}$$

Here, $\nu = n_1 + n_2 - 2$ is the degrees of freedom, and $s_1^2$, $s_2^2$ are the sample variances.

The test statistic for this use case is:

$$t \;=\; \frac{(\bar X_1 - \bar X_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \;\sim\; t_{\nu}\ \text{ under } H_0$$

Decision

  • Two-tailed: reject if $|t| \ge t_{\nu,\,1-\alpha/2}$
  • Right-tailed: reject if $t \ge t_{\nu,\,1-\alpha}$
  • Left-tailed: reject if $t \le -t_{\nu,\,1-\alpha}$

$(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$:

$$(\bar X_1 - \bar X_2) \pm t_{\nu,\,1-\alpha/2}\, s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}$$
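A minimal sketch of the pooled procedure, again with the standard library only. Python's standard library has no t quantile function, so this sketch assumes the caller supplies the critical value $t_{\nu,\,1-\alpha/2}$ from a t-table; the function name `pooled_t_test` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample1, sample2, t_crit, d0=0.0):
    """Two-tailed pooled t test and CI for mu1 - mu2 (equal, unknown variances).

    t_crit is t_{nu, 1-alpha/2}, supplied by the caller (e.g. from a t-table).
    """
    n1, n2 = len(sample1), len(sample2)
    nu = n1 + n2 - 2
    # statistics.variance uses the n-1 divisor, i.e. the sample variance s^2
    sp_sq = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / nu
    se = sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    diff = mean(sample1) - mean(sample2)
    t = (diff - d0) / se
    reject = abs(t) >= t_crit
    ci = (diff - t_crit * se, diff + t_crit * se)
    return t, nu, sp_sq, reject, ci


# Hypothetical samples; 2.571 is the tabulated t_{5, 0.975} (alpha = 0.05)
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 5.0]
t, nu, sp_sq, reject, ci = pooled_t_test(a, b, t_crit=2.571)
print(f"t = {t:.4f}, nu = {nu}, sp^2 = {sp_sq:.2f}, reject H0: {reject}")
print(f"CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Passing the critical value in keeps the sketch dependency-free; with a statistics library available, the quantile could be computed instead of looked up.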

Unknown and unequal variances. Welch t statistic is used.

The approximate degrees of freedom are:

$$\nu \;=\; \frac{\big(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\big)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

If $\nu > 30$, the statistic may be compared against the standard normal distribution (a z-test) as an approximation. Otherwise the test statistic is:

$$t \;=\; \frac{(\bar X_1 - \bar X_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \;\sim\; t_{\nu}\ \text{ under } H_0$$

$(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$:

$$(\bar X_1 - \bar X_2) \pm t_{\nu,\,1-\alpha/2}\, \sqrt{\tfrac{s_1^2}{n_1} + \tfrac{s_2^2}{n_2}}$$
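The Welch statistic and its approximate degrees of freedom can be computed as in the following sketch. The function name `welch_t` and the data are hypothetical; the critical value is left to a table or statistics library, since $\nu$ is generally not an integer (rounding it down before a table lookup is a common conservative convention, stated here as an assumption rather than something the notes prescribe).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2, d0=0.0):
    """Welch t statistic and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(sample1), len(sample2)
    # Per-sample variance of the mean: s_i^2 / n_i
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    se = sqrt(v1 + v2)
    t = ((mean(sample1) - mean(sample2)) - d0) / se
    # Approximate degrees of freedom, as in the formula above
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu, se


# Hypothetical samples (not taken from the notes)
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 5.0]
t, nu, se = welch_t(a, b)
print(f"t = {t:.4f}, nu = {nu:.4f}, se = {se:.4f}")
```

The confidence interval follows as `diff ± t_crit * se`, with `t_crit` taken at the computed `nu`.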

For any computed statistic $T$ with observed value $t_{\text{obs}}$, the p-value is:

• Two-tailed $H_1: \mu_1 - \mu_2 \neq d_0$

$$p\text{-value} = 2\,\min\{\Pr(T \le t_{\text{obs}}),\ \Pr(T \ge t_{\text{obs}})\}$$

• Right-tailed $H_1: \mu_1 - \mu_2 > d_0$

$$p\text{-value} = \Pr(T \ge t_{\text{obs}})$$

• Left-tailed $H_1: \mu_1 - \mu_2 < d_0$

$$p\text{-value} = \Pr(T \le t_{\text{obs}})$$

Decision: reject $H_0$ if $p\text{-value} \le \alpha$.
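The three p-value rules can be written once for any continuous null distribution by passing in its CDF. This is a sketch: the helper name `p_value` and the `alternative` labels are hypothetical, and the example uses the standard normal CDF from the standard library, which corresponds to the known-variance Z statistic.

```python
from statistics import NormalDist


def p_value(t_obs, cdf, alternative="two-sided"):
    """p-value of an observed statistic, given the null distribution's CDF."""
    lower = cdf(t_obs)       # Pr(T <= t_obs)
    upper = 1.0 - lower      # Pr(T >= t_obs), valid for a continuous T
    if alternative == "two-sided":
        return 2 * min(lower, upper)
    if alternative == "greater":   # right-tailed H1
        return upper
    if alternative == "less":      # left-tailed H1
        return lower
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical observed z statistic
z_obs = 1.96
std_normal_cdf = NormalDist().cdf
alpha = 0.05
for alt in ("two-sided", "greater", "less"):
    p = p_value(z_obs, std_normal_cdf, alt)
    print(f"{alt:>9}: p = {p:.4f}, reject H0: {p <= alpha}")
```

For the t-based tests, the same helper would take the CDF of $t_{\nu}$ from a statistics library in place of `std_normal_cdf`.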