Author: mobiledu2502906891 | Source: Internet | 2022-12-13 15:42
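Besides the loadDataSet function covered in the previous post, one more spot in this algorithm needs attention: the information gain ratio formula in the textbook's getBestFeat() function. The textbook divides one array by another there; that division is reported not to run under py3, with dot(A, B.T) as the suggested replacement. The replacement is not equivalent, though. On two 1-D arrays, dot(A, B.T) returns their inner product, a single number, whereas the gain ratio needs one quotient per feature. NumPy's / operator on two arrays of equal length computes exactly that element-wise quotient in Python 3, so if the division fails, the likelier cause is an operand that is not a numeric array.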
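The difference between the two expressions can be checked with a few lines of NumPy. This is only a sketch: the three gain values and split-information values below are made up for illustration and do not come from the textbook's data set.

```python
from numpy import argsort, array, dot, ndim

# Hypothetical values for three features (illustration only)
infoGainArray = array([0.25, 0.03, 0.15])
splitInfo = [1.5, 1.0, 0.5]

# Element-wise division: one gain ratio per feature
ratio = infoGainArray / array(splitInfo)
best_by_ratio = int(argsort(-ratio)[0])  # index 2 with these numbers

# dot() of two 1-D arrays is their inner product, a 0-dimensional scalar
inner = dot(infoGainArray, array(splitInfo).T)

print(ratio.shape, best_by_ratio)
print(ndim(inner), float(inner))
```

With the division, the result keeps one entry per feature and argsort can rank them. The dot product collapses everything into one number, so there is nothing left to rank.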
The full getBestFeat() code:
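# Module-level imports this method relies on
from numpy import argsort, array, ones

# Method of the decision-tree class; computeEntropy, computeSplitInfo and
# splitDataSet are other methods of the same class.
def getBestFeat(self, dataSet):
    Num_Feats = len(dataSet[0][:-1])  # number of features (last column is the label)
    totality = len(dataSet)
    BaseEntropy = self.computeEntropy(dataSet)
    ConditionEntropy = []  # was misspelled COnditionEntropy, which raised NameError below
    splitInfo = []
    allFeatVList = []
    for f in range(Num_Feats):
        featList = [example[f] for example in dataSet]
        [splitI, featureValueList] = self.computeSplitInfo(featList)
        allFeatVList.append(featureValueList)
        splitInfo.append(splitI)
        # Conditional entropy of the labels given feature f
        resultGain = 0.0
        for value in featureValueList:
            subSet = self.splitDataSet(dataSet, f, value)
            appearNum = float(len(subSet))
            subEntropy = self.computeEntropy(subSet)
            resultGain += (appearNum / totality) * subEntropy
        ConditionEntropy.append(resultGain)
    infoGainArray = BaseEntropy * ones(Num_Feats) - array(ConditionEntropy)
    # The gain ratio is an element-wise quotient, one value per feature.
    # Array division works in Python 3 once both operands are float arrays;
    # dot(infoGainArray, array(splitInfo).T) would instead collapse to a
    # single scalar that cannot rank the features.
    infoGainRatio = infoGainArray / array(splitInfo, dtype=float)
    bestFeatureIndex = argsort(-infoGainRatio)[0]
    return bestFeatureIndex, allFeatVList[bestFeatureIndex]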
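The class's helper methods are not shown in this post, so the method above cannot be run on its own. For readers who want to check the gain-ratio calculation by hand, here is a minimal standalone sketch using only the standard library. The functions entropy and gain_ratios and the toy rows are hypothetical and are not the textbook's computeEntropy or computeSplitInfo. It assumes the usual definitions: split information is the entropy of the feature's own value distribution, and a feature with zero split information gets a ratio of 0.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratios(rows):
    """Gain ratio of each feature column; the last column is the class label."""
    labels = [row[-1] for row in rows]
    base = entropy(labels)
    ratios = []
    for f in range(len(rows[0]) - 1):
        column = [row[f] for row in rows]
        # Conditional entropy: weighted label entropy within each feature value
        cond = 0.0
        for value in set(column):
            subset = [row[-1] for row in rows if row[f] == value]
            cond += len(subset) / len(rows) * entropy(subset)
        split_info = entropy(column)  # entropy of the feature's own values
        # Assumption: a feature with a single value (split_info == 0) scores 0
        ratios.append((base - cond) / split_info if split_info > 0 else 0.0)
    return ratios

# Toy data (made up): feature 0 determines the label, feature 1 does not
rows = [["a", "x", "yes"], ["a", "y", "yes"], ["b", "x", "no"], ["b", "y", "no"]]
ratios = gain_ratios(rows)
best = max(range(len(ratios)), key=ratios.__getitem__)
print(ratios, best)
```

On this toy data, feature 0 separates the labels perfectly and feature 1 carries no information, so the first ratio is the larger one and index 0 is selected.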