
T.test


Data Science Day 20:

When we watch soccer games, the screen shows the basic info for each team at the beginning of the match. Suppose we want to know whether there is any difference in average age between Real Madrid and Barcelona players. What statistical test should we use?

RonnyK / Pixabay

kappilrinesh / Pixabay

Answer:

We can use a t-test to determine whether there is a significant difference between the means of two groups.

T-test assumptions:

  • The dependent variable is approximately normally distributed within each group
    Note: normality is what lets us identify the probability of a particular outcome
  • Independent observations
  • The dependent variable is continuous
  • No significant outliers
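
The normality and outlier assumptions can be checked before running the test. Below is a minimal sketch, assuming a Shapiro-Wilk test for normality and the common 1.5 * IQR rule for outliers. Neither check comes from the original analysis, the helper name check_assumptions is hypothetical, and the ages are made up for illustration.

```python
import numpy as np
from scipy import stats


def check_assumptions(sample):
    """Return a Shapiro-Wilk p-value and the values flagged by the 1.5 * IQR rule."""
    sample = np.asarray(sample, dtype=float)

    # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed,
    # so a small p-value is evidence AGAINST normality.
    _, shapiro_p = stats.shapiro(sample)

    # 1.5 * IQR rule: flag points far outside the middle 50% of the data.
    q1, q3 = np.percentile(sample, [25, 75])
    iqr = q3 - q1
    mask = (sample < q1 - 1.5 * iqr) | (sample > q3 + 1.5 * iqr)
    return shapiro_p, sample[mask]


# Made-up ages for illustration only (not taken from the FIFA dataset).
ages = [20, 21, 22, 23, 24, 25, 26, 27, 28, 60]
shapiro_p, outliers = check_assumptions(ages)
print("Shapiro-Wilk p-value:", round(shapiro_p, 4))
print("Flagged outliers:", outliers)
```

With real data, the same helper could be called once per club, for example on each club's age column.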

Example: Kaggle FIFA 2018 dataset

Null Hypothesis H0: There is NO significant difference between the mean age of Real Madrid’s and Barcelona’s players.

  1. We choose the variables Age and Club (Real Madrid, Barcelona).

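# import packages
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import statistics as st
import seaborn as sns

# `data` is assumed to be the Kaggle FIFA 2018 dataset, already loaded as a pandas DataFrame
# keep only the club and age columns
data1 = data[["club", "age"]]
# keep only the players of the two clubs we want to compare
data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona"])]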
  2. **Histogram Graph for Age**

# split the players by club
data3 = data1.loc[data1["club"].isin(["Real Madrid CF"])]
data4 = data1.loc[data1["club"].isin(["FC Barcelona"])]

# overlay the two age histograms
plt.hist(data3.age, bins="auto", color="c", edgecolor="k", alpha=0.5, label="Real Madrid CF")
plt.hist(data4.age, bins="auto", color="r", alpha=0.5, label="FC Barcelona")
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution in Barcelona vs Real Madrid')
plt.legend()

plt.show()

  3. **Density Plot of Age**

# kde plot
df = pd.DataFrame({"real_madrid": data3.age, "barcelona": data4.age})
ax = df.plot.kde()
plt.title("Density Plot for Players' Age in Barcelona vs Real Madrid")
plt.show()

  4. **Statistical T-test**
# equal_var=False runs Welch's t-test, which does not assume equal variances in the two groups
stats.ttest_ind(data3.age, data4.age, equal_var=False)
Ttest_indResult(statistic=-1.9061510499479299, pvalue=0.062416380021536121)
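
To see what sits behind these two numbers, here is a minimal sketch that computes Welch's t statistic, its degrees of freedom (Welch-Satterthwaite), and the two-sided p-value by hand, then compares them with scipy. The function name welch_t is hypothetical and the two samples are synthetic, not the FIFA ages, so the values will differ from the result above.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t statistic, degrees of freedom, and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Squared standard error of each group mean (sample variance / group size).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size

    # Difference of means divided by the combined standard error.
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

    # Two-sided p-value from the t distribution.
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Synthetic ages for illustration only.
rng = np.random.default_rng(0)
group_a = rng.normal(25.0, 4.0, size=30)
group_b = rng.normal(27.0, 3.0, size=28)

t, df, p = welch_t(group_a, group_b)
reference = stats.ttest_ind(group_a, group_b, equal_var=False)
print("by hand:", t, p)
print("scipy:  ", reference.statistic, reference.pvalue)
```

The hand-computed values should agree with scipy up to floating-point error.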

Conclusion:

Although the histogram does not show a clearly normal distribution, the density plot shows some features of normality in the age distribution. Since the p-value is 0.06, which is greater than the 0.05 significance level, we fail to reject the null hypothesis:
There is no significant difference in players’ ages between Real Madrid and Barcelona.

Additional Info:

We used a non-directional (two-sided) t-test to generate the results, but one question we can ask ourselves is: how sure are we about the results?

  1. Type 1 error: rejecting a null hypothesis that is true
    We predict there is a difference when in reality there is none.
    With a significance level of 0.05, there is a 5% chance of making a Type 1 error when the null hypothesis is true.
  2. Type 2 error: failing to reject a null hypothesis that is false
    We predict there is no difference when in reality there is one.
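
Both error types can be made concrete with a small simulation. The sketch below is an illustration under assumed settings (normal ages with mean 26 and standard deviation 4, 25 players per group), and the function name rejection_rate is hypothetical. When the true difference is 0, the share of rejections estimates the Type 1 error rate; when the true difference is not 0, one minus the share of rejections estimates the Type 2 error rate.

```python
import numpy as np
from scipy import stats


def rejection_rate(true_diff, n_trials=2000, n=25, alpha=0.05, seed=0):
    """Share of simulated Welch t-tests that reject H0 at level alpha."""
    rng = np.random.default_rng(seed)

    # Each row is one simulated experiment with n players per group.
    a = rng.normal(26.0, 4.0, size=(n_trials, n))
    b = rng.normal(26.0 + true_diff, 4.0, size=(n_trials, n))

    # One two-sided Welch t-test per row.
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False)
    return float(np.mean(p < alpha))


# H0 is true (no difference): rejections here are Type 1 errors.
type1_rate = rejection_rate(true_diff=0.0)

# H0 is false (a 4-year difference): non-rejections here are Type 2 errors.
type2_rate = 1.0 - rejection_rate(true_diff=4.0)

print("Estimated Type 1 error rate:", type1_rate)
print("Estimated Type 2 error rate:", type2_rate)
```

The Type 1 estimate should land close to the chosen alpha, while the Type 2 estimate depends on the sample size and on how large the true difference is.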

In the previous example, we have a 2-level independent variable, Club (Barcelona, Real Madrid), and one dependent variable, Age.

What if we have an independent variable with more than 2 levels,
such as AC Milan, Barcelona, and Real Madrid?

That will be ANOVA’s show!
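
As a small preview, here is a minimal sketch of a one-way ANOVA using scipy's f_oneway. The three age samples are synthetic stand-ins for the three clubs, with assumed means and spreads, not values from the FIFA dataset.

```python
import numpy as np
from scipy import stats

# Synthetic ages for illustration only (assumed means and spread).
rng = np.random.default_rng(0)
milan = rng.normal(27.0, 4.0, size=25)
barcelona = rng.normal(26.0, 4.0, size=25)
real_madrid = rng.normal(25.0, 4.0, size=25)

# H0: all three group means are equal.
f_stat, p_value = stats.f_oneway(milan, barcelona, real_madrid)
print("F statistic:", f_stat)
print("p-value:", p_value)
```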

Happy Studying! 
