
1. Use a classic dataset: Titanic passenger survival prediction

import pandas as pd
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")

2. Implement the decision tree algorithm

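from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def decisioncls(titan):
    """
    Predict passenger survival with a decision tree
    :return: accuracy on the test set
    """
    # 1. Get the data
    #titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")

    # 2. Process the data
    # .copy() so that filling missing values does not write to a slice of titan
    x = titan[['pclass', 'age', 'sex']].copy()

    y = titan['survived']

    # print(x , y)
    # Missing values must be handled: fill missing ages with the mean age
    x['age'] = x['age'].fillna(x['age'].mean())

    # Categorical features are encoded with dict feature extraction
    # Convert x to a list of dicts with x.to_dict(orient="records")
    # [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]

    # Named vectorizer rather than dict, which would shadow the builtin
    vectorizer = DictVectorizer(sparse=False)

    x = vectorizer.fit_transform(x.to_dict(orient="records"))

    # get_feature_names() was removed in scikit-learn 1.2
    print(vectorizer.get_feature_names_out())
    print(x)

    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

    # Build the decision tree and predict
    dc = DecisionTreeClassifier(max_depth=5)

    dc.fit(x_train, y_train)

    # print("Prediction accuracy:", dc.score(x_test, y_test))
    # export_graphviz(dc, out_file="./tree.dot", feature_names=['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'female', 'male'])

    return dc.score(x_test, y_test)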

3. Average the accuracy over multiple runs

import matplotlib.pyplot as plt
def score_avg(func,num):
    score_sum=0
    a=[]  # run index
    b=[]  # average accuracy after each run
    for i in range(1,num+1):
        score_sum += func(titan)
        # Check the accuracy every 10 runs
        #if i%10==0:
        score_ave = score_sum/(i)
        a.append(i)
        b.append(score_ave)
    # Print the final average accuracy
    print(score_ave)
    # Plot how the average accuracy changes over the runs
    plt.plot(a,b)
    plt.grid()
    plt.show()
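
The running average that score_avg plots can be sketched on its own, separate from training and plotting. The helper name running_mean and the accuracy values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def running_mean(scores):
    """Return the mean of the first i scores for each i from 1 to len(scores)."""
    means = []
    total = 0.0
    for i, score in enumerate(scores, start=1):
        total += score
        means.append(total / i)
    return means

# Hypothetical accuracy values, for illustration only
example_scores = [0.80, 0.84, 0.82, 0.82]
example_means = running_mean(example_scores)
print(example_means)
```

Keeping the individual scores in a list, rather than only their sum, would also make it possible to look at their spread between runs, which a single averaged curve hides.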

4. Results

from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction import DictVectorizer
num=100
score_avg(decisioncls,num)


From the 10th run onward, the average accuracy has already converged to 0.820

5. Export the decision tree as an image

1) sklearn.tree.export_graphviz() exports the tree in DOT format
tree.export_graphviz(estimator, out_file='tree.dot', feature_names=[' ', ' '])
Add the following code to a single run of the program

from sklearn.tree import export_graphviz
export_graphviz(dc, out_file="./tree.dot", feature_names=['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'female', 'male'])

2) The graphviz tool: converts dot files to pdf or png
Ubuntu: sudo apt-get install graphviz
Mac: brew install graphviz
3) Run the command

dot -Tpng tree.dot -o tree.png
