Machine Learning Illustrated: Text Classification with Neural Networks and TensorFlow

# Setup for this snippet; vocab is assumed to be a Counter,
# and the text is the one discussed below
import numpy as np
from collections import Counter

vocab = Counter()
text = "Hi from Brazil"

# Get all words
for word in text.split(' '):
    vocab[word] += 1

# Convert words to indexes
def get_word_2_index(vocab):
    word2index = {}
    for i, word in enumerate(vocab):
        word2index[word] = i
    return word2index

# Now we have an index
word2index = get_word_2_index(vocab)
total_words = len(vocab)

# This is how we create a numpy array (our matrix)
matrix = np.zeros((total_words), dtype=float)

# Now we fill the values
for word in text.split():
    matrix[word2index[word]] += 1
print(matrix)
>>> [ 1.  1.  1.]

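In the example above the text is 'Hi from Brazil' and the matrix is [ 1.  1.  1.]. What happens if the text is only 'Hi'?

Tip: modify the values we defined to see how the changes affect training time and model accuracy.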

matrix = np.zeros((total_words), dtype=float)
text = "Hi"
for word in text.split():
    matrix[word2index[word]] += 1
print(matrix)
>>> [ 1.  0.  0.]
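
The two examples above differ only in the input text, so the counting step can be wrapped in a small function. The following is a minimal sketch rather than code from the article: the name `text_to_vector` is hypothetical, and skipping words that are missing from the index is an assumption.

```python
import numpy as np

def text_to_vector(text, word2index):
    """Count how often each indexed word occurs in text (hypothetical helper)."""
    vector = np.zeros(len(word2index), dtype=float)
    for word in text.split():
        index = word2index.get(word)
        if index is not None:  # assumption: words outside the index are ignored
            vector[index] += 1
    return vector

# Same index as in the example above: Hi -> 0, from -> 1, Brazil -> 2
word2index = {"Hi": 0, "from": 1, "Brazil": 2}
full_vector = text_to_vector("Hi from Brazil", word2index)
hi_vector = text_to_vector("Hi", word2index)
print(full_vector)
print(hi_vector)
```

Every vector has the same length as the vocabulary, whatever the length of the text, which is what lets texts of different sizes feed the same input layer.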

The same is done for the labels (the categories of the texts), but this time using one-hot encoding:

y = np.zeros((3), dtype=float)
if category == 0:
    y[0] = 1.        # [ 1.  0.  0.]
elif category == 1:
    y[1] = 1.        # [ 0.  1.  0.]
else:
    y[2] = 1.        # [ 0.  0.  1.]
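
The if/elif chain above can also be written as a single indexed assignment, which extends to any number of categories. This is a sketch under assumptions: `one_hot` and `num_classes` are hypothetical names that do not appear in the article.

```python
import numpy as np

def one_hot(category, num_classes=3):
    """Return a vector of zeros with a 1 at position `category` (hypothetical helper)."""
    y = np.zeros(num_classes, dtype=float)
    y[category] = 1.0
    return y

# One label vector per category id used in the article
label_vectors = [one_hot(category) for category in (0, 1, 2)]
for y in label_vectors:
    print(y)
```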

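Running the graph and getting the results

Now comes the best part: getting results from the model. First, let's take a closer look at the input dataset.

We will use the 20 Newsgroups dataset, which contains about 18,000 posts on 20 topics. To load it we will use the scikit-learn library. We will use only 3 categories: comp.graphics, sci.space and rec.sport.baseball. scikit-learn provides two subsets: one for training and one for testing. It is best not to look at the test data, because doing so can bias your choices while you build the model. You don't want a model that predicts this specific test data; you want a model that generalizes well.

Here is the code that loads the dataset: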

from sklearn.datasets import fetch_20newsgroups 
categories = ["comp.graphics","sci.space","rec.sport.baseball"] 
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories) 	
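
Once loaded, the dataset exposes the raw posts and their integer category ids (in scikit-learn, as `newsgroups_train.data` and `newsgroups_train.target`). As a rough sketch of how such a collection could be turned into the input matrix and one-hot labels described earlier, the example below uses a tiny made-up stand-in for those two attributes so that it runs without downloading anything. The names `texts`, `targets` and `build_batch` are hypothetical, and the texts and category ids are invented for illustration.

```python
import numpy as np
from collections import Counter

# Made-up stand-ins for newsgroups_train.data and newsgroups_train.target
texts = [
    "the shuttle reached orbit",
    "the pitcher threw the ball",
    "render the polygon mesh",
]
targets = [1, 2, 0]  # invented category ids, one per text

# Build the vocabulary over the whole collection, then index it
vocab = Counter()
for text in texts:
    for word in text.split():
        vocab[word] += 1
word2index = {word: i for i, word in enumerate(vocab)}

def build_batch(texts, targets, word2index, num_classes=3):
    """Return (word-count matrix, one-hot label matrix) for the given texts."""
    inputs = np.zeros((len(texts), len(word2index)), dtype=float)
    labels = np.zeros((len(texts), num_classes), dtype=float)
    for row, (text, category) in enumerate(zip(texts, targets)):
        for word in text.split():
            inputs[row, word2index[word]] += 1
        labels[row, category] = 1.0
    return inputs, labels

inputs, labels = build_batch(texts, targets, word2index)
print(inputs.shape, labels.shape)
```

Each row of `inputs` is the word-count vector of one text, and the matching row of `labels` is its one-hot category.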
URL: http://www.17bianji.com/lsqh/34909.html