How to Build a Chatbot Project with Python

Author: YI恐龙_554 | Source: Internet | 2023-10-10 19:31

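Chatbots are very helpful for both business organizations and customers. Most people prefer to talk directly from a chat box rather than call a service center. Data published by Facebook demonstrates the value of bots: more than 2 billion messages are sent between people and companies every month. HubSpot research shows that 71% of people want to get customer support from messaging apps. It is a fast way to get problems solved, so chatbots have a bright future in organizations.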

Today we are going to build an exciting chatbot project. We will implement a chatbot from scratch that can understand what the user is saying and give an appropriate response.

Prerequisites

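To implement the chatbot, we will use Keras (a deep learning library), NLTK (a natural language processing toolkit), and a few other helpful libraries. Run the command below to make sure all the libraries are installed. The pickle module ships with Python's standard library, so it does not need to be installed with pip.

pip install tensorflow keras nltk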

How do chatbots work?

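A chatbot is simply intelligent software that can interact and communicate with people the way humans do. Interesting, isn't it? Let's now look at how they actually work. All chatbots fall under the concept of NLP (Natural Language Processing). NLP consists of two parts: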

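NLU (Natural Language Understanding): the ability of a machine to understand human language, such as English.
NLG (Natural Language Generation): the ability of a machine to generate text that resembles sentences written by humans.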

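Imagine a user asking a chatbot a question such as "Hey, what's on the news today?" The chatbot breaks the user's sentence into two things: an intent and an entity. The intent of this sentence could be get_news, because it refers to the action the user wants to perform. The entity gives specific details about the intent, so "today" would be the entity. In this way, a machine learning model can be used to recognize the intents and entities in a chat.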
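To make the intent and entity split concrete, here is a minimal sketch that uses plain keyword lookups instead of a trained model. The keyword tables and the parse_message helper are hypothetical and exist only for illustration; the project in this article learns intents from data and does not extract entities at all.

```python
# Hypothetical keyword tables; a trained model would replace these lookups.
INTENT_KEYWORDS = {
    "get_news": {"news", "headlines"},
    "greeting": {"hello", "hi"},
}
ENTITY_WORDS = {"today", "tomorrow", "yesterday"}


def parse_message(message):
    """Return the first matching intent and any known entity words."""
    # Lowercase each whitespace-separated token and strip edge punctuation.
    tokens = [t.strip("?!.,").lower() for t in message.split()]
    intent = None
    for name, keywords in INTENT_KEYWORDS.items():
        if keywords.intersection(tokens):
            intent = name
            break
    entities = [t for t in tokens if t in ENTITY_WORDS]
    return {"intent": intent, "entities": entities}


result = parse_message("Hey, what's on the news today?")
print(result)
```

For the example sentence, the sketch reports get_news as the intent and today as the only entity.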

Project file structure

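Once the project is complete, you will be left with all of the files below. Let's go through each of them quickly; it will give you an idea of how the project is implemented.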

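train_chatbot.py - In this file, we build and train the deep learning model that classifies and identifies what the user is asking the bot.
gui_chatbot.py - This file is where we build a graphical user interface for chatting with our trained chatbot.
intents.json - The intents file contains all the data we use to train the model. It holds a collection of tags with their corresponding patterns and responses.
chatbot_model.h5 - This is a hierarchical data format (HDF5) file in which we store the weights and architecture of the trained model.
classes.pkl - This pickle file stores all the tag names used to classify a message when we make a prediction.
words.pkl - This pickle file contains all the unique words that make up the vocabulary of our model.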

How to build your own chatbot?

I have simplified building this chatbot into 5 steps:

Step 1. Import libraries and load the data - Create a new Python file and name it train_chatbot, then import all the required modules. After that, we read the JSON data file into our Python program.

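import json
import pickle
import random

import nltk
import numpy as np
from keras.layers import Dense, Activation, Dropout
from keras.models import Sequential
from keras.optimizers import SGD
from nltk.stem import WordNetLemmatizer

# One-time setup if the NLTK data is missing (newer NLTK releases may also ask for 'punkt_tab'):
# nltk.download('punkt')
# nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# read the JSON data file that holds the training intents
with open('intents.json') as f:
    intents_file = f.read()
intents = json.loads(intents_file)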

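The intents file is a JSON object with an "intents" list. Each entry in that list has a tag, a list of patterns, and a list of responses.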
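As a rough illustration of that layout, the sketch below builds a tiny intents structure in Python and round-trips it through JSON. The tags, patterns, and responses are made up for this example and are not taken from the project's dataset; only the keys intents, tag, patterns, and responses come from the code in this article.

```python
import json

# Illustrative content only; the tags, patterns, and responses are made up.
example_intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hello", "Is anyone there?"],
            "responses": ["Hello!", "Hi there, how can I help?"],
        },
        {
            "tag": "goodbye",
            "patterns": ["Bye", "See you later"],
            "responses": ["Goodbye!", "Talk to you soon."],
        },
    ]
}

# Serialize to the text that would live in intents.json, then parse it back.
text = json.dumps(example_intents, indent=2)
loaded = json.loads(text)
tags = [intent["tag"] for intent in loaded["intents"]]
print(text)
print(tags)
```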

Step 2. Preprocess the data

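The model cannot take in raw data. The data has to go through a lot of preprocessing before a machine can make sense of it. For text data, many preprocessing techniques are available. The first technique is tokenization, in which we break sentences into words.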
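The idea of tokenization can be sketched without any external data downloads. The simple_tokenize function below is a hypothetical regex-based stand-in, not the tokenizer this project uses; nltk.word_tokenize handles many more cases (contractions, abbreviations) and can split some inputs differently.

```python
import re


def simple_tokenize(sentence):
    # Match a word (optionally with an internal apostrophe part)
    # or any single non-space punctuation character.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", sentence)


tokens = simple_tokenize("Hello, how are you?")
print(tokens)
```

Punctuation comes out as separate tokens, which is why the training script later filters them with an ignore list.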

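Looking at the intents file, we can see that each tag contains a list of patterns and a list of responses. We tokenize each pattern and add the words to a list. We also create a list of classes and a list of documents to record the intent associated with each pattern.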

words = []
classes = []
documents = []
ignore_letters = ['!', '?', ',', '.']

for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each word
        word = nltk.word_tokenize(pattern)
        words.extend(word)
        # add documents in the corpus
        documents.append((word, intent['tag']))
        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

print(documents)

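Another technique is lemmatization. We convert words into their lemma form so that all variants of a word collapse into one canonical form. For example, the words play, playing, plays, played, and so on are all meant to be replaced with play. This reduces the total number of words in our vocabulary. Note that WordNetLemmatizer treats every word as a noun unless a part of speech is given, so verb forms such as playing or played are only reduced when pos='v' is passed to lemmatize(). We now lemmatize each word and remove the duplicates.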
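The effect of lemmatizing and de-duplicating can be sketched with a small lookup table. LEMMA_TABLE and toy_lemmatize are hypothetical stand-ins for a real lemmatizer and cover only the words in this example; the sample word list is made up.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMA_TABLE = {"playing": "play", "plays": "play", "played": "play"}
ignore_letters = ["!", "?", ",", "."]


def toy_lemmatize(word):
    # Lowercase first, then map known variants to their base form.
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


raw_words = ["Play", "playing", "plays", "played", "games", "?", "Games"]

# Drop punctuation, lemmatize, remove duplicates, and sort the vocabulary.
vocabulary = sorted(set(toy_lemmatize(w) for w in raw_words if w not in ignore_letters))
print(vocabulary)
```

Seven raw tokens shrink to a two-word vocabulary, which is the reduction the paragraph above describes.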

# lemmatize and lower each word and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_letters]
words = sorted(list(set(words)))
# sort classes
classes = sorted(list(set(classes)))

# documents = combination between patterns and intents
print(len(documents), "documents")
# classes = intents
print(len(classes), "classes", classes)
# words = all words, vocabulary
print(len(words), "unique lemmatized words", words)

# save the vocabulary and the class names for use at prediction time
with open('words.pkl', 'wb') as f:
    pickle.dump(words, f)
with open('classes.pkl', 'wb') as f:
    pickle.dump(classes, f)

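In the end, words contains the vocabulary of our project and classes contains all the intent tags to classify. To save these Python objects to files, we use the pickle.dump() method. The files will be useful after training is done, when we predict responses to chat messages.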
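A short sketch of the save and reload round trip follows. It writes to a temporary directory so it leaves nothing behind; the word list is a made-up sample, not the project's real vocabulary.

```python
import os
import pickle
import tempfile

words = ["hello", "help", "thanks"]  # made-up sample vocabulary

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "words.pkl")
    # Save the list the same way the training script does.
    with open(path, "wb") as f:
        pickle.dump(words, f)
    # Load it back, as the GUI script does before predicting.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == words)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.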

Step 3. Create training and testing data

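To train the model, we convert each input pattern into numbers. First, we lemmatize each word of the pattern and create a list of zeros with the same length as the total number of words in the vocabulary. We set the value to 1 only at the indexes of words that appear in the pattern. In the same way, we create the output by setting a 1 at the position of the class the input pattern belongs to.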
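The encoding described above can be sketched on a toy vocabulary. The words and classes lists here are made up for illustration, and encode is a hypothetical helper name.

```python
# Made-up vocabulary and class names, already sorted like in the training script.
words = ["bye", "hello", "help", "thanks", "you"]
classes = ["goodbye", "greeting", "thanks"]


def encode(pattern_words, tag):
    # Bag of words: 1 where a vocabulary word occurs in the pattern, else 0.
    bag = [1 if w in pattern_words else 0 for w in words]
    # One-hot output: 1 at the index of the pattern's class, 0 elsewhere.
    output_row = [0] * len(classes)
    output_row[classes.index(tag)] = 1
    return bag, output_row


bag, output_row = encode(["hello", "you"], "greeting")
print(bag)
print(output_row)
```

Word order and repetition are discarded by this representation; only the presence of each vocabulary word is kept.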

# create the training data
training = []
# create empty array for the output
output_empty = [0] * len(classes)

# training set, bag of words for every sentence
for doc in documents:
    # initializing bag of words
    bag = []
    # list of tokenized words for the pattern
    word_patterns = doc[0]
    # lemmatize each word - create base word, in attempt to represent related words
    word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns]
    # create the bag of words array with 1, if word is found in current pattern
    for word in words:
        bag.append(1 if word in word_patterns else 0)
    # output is a '0' for each tag and '1' for current tag (for each pattern)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])

# shuffle the features
random.shuffle(training)
# create training lists. X - patterns, Y - intents
# (plain list indexing avoids building a ragged NumPy array from rows of unequal length)
train_x = [row[0] for row in training]
train_y = [row[1] for row in training]
print("Training data is created")

Step 4. Train the model

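The architecture of our model is a neural network made of 3 dense layers. The first layer has 128 neurons, the second has 64, and the last layer has as many neurons as there are classes. Dropout layers are added to reduce overfitting. We use the SGD optimizer and fit the data to start training the model. After 200 epochs of training, we save the trained model with the Keras model.save("chatbot_model.h5") function.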
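To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. Each dense layer has one weight per input-output pair plus one bias per unit, and dropout layers add no parameters. The vocabulary size of 88 and the 9 classes are assumed values chosen only for the example; the real numbers depend on your intents file.

```python
def dense_params(n_inputs, n_units):
    # Weights (n_inputs * n_units) plus one bias per unit.
    return n_inputs * n_units + n_units


def count_params(vocab_size, n_classes):
    # 128-unit layer, 64-unit layer, then one output unit per class.
    return (
        dense_params(vocab_size, 128)
        + dense_params(128, 64)
        + dense_params(64, n_classes)
    )


# Assumed sizes for illustration only.
total = count_params(vocab_size=88, n_classes=9)
print(total)  # 11392 + 8256 + 585 = 20233
```

With a small intents file the parameter count is far larger than the number of training patterns, which is one reason dropout is useful here.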

# deep neural network model
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))

# Compiling model. SGD with Nesterov accelerated gradient gives good results for this model.
# The original call was SGD(lr=0.01, decay=1e-6, ...); recent Keras releases use
# learning_rate instead of lr and no longer accept the decay argument.
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Training and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
# save() takes only the file path here; the training history is not part of the saved file
model.save('chatbot_model.h5')
print("model is created")

Step 5. Interacting with the chatbot

Our model is ready to chat, so let's now create a nice graphical user interface for the chatbot in a new file. You can name the file gui_chatbot.py.

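In our GUI file, we use the Tkinter module to build the structure of the desktop application. We then capture the user's message and again apply some preprocessing before feeding the message into the trained model.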

The model then predicts the tag of the user's message, and we randomly pick a response from the list of responses for that tag in the intents file.
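The selection logic can be sketched without a trained model by starting from a list of class probabilities. The 0.25 threshold matches the GUI code in this article; the probabilities, the sample intents, the helper names, and the FALLBACK reply are assumptions made for this example. The fallback covers the case where no class clears the threshold.

```python
import random

ERROR_THRESHOLD = 0.25
FALLBACK = "Sorry, I did not understand that."  # hypothetical fallback text


def rank_intents(probabilities, classes):
    # Keep classes above the threshold, strongest first.
    results = [(classes[i], p) for i, p in enumerate(probabilities) if p > ERROR_THRESHOLD]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def choose_response(ranked, intents_json):
    # No confident prediction: return the fallback instead of failing.
    if not ranked:
        return FALLBACK
    top_tag = ranked[0][0]
    for intent in intents_json["intents"]:
        if intent["tag"] == top_tag:
            return random.choice(intent["responses"])
    return FALLBACK


# Made-up classes, responses, and model output for illustration.
classes = ["goodbye", "greeting"]
intents_json = {
    "intents": [
        {"tag": "goodbye", "responses": ["Bye!"]},
        {"tag": "greeting", "responses": ["Hello!", "Hi there!"]},
    ]
}
ranked = rank_intents([0.1, 0.9], classes)
reply = choose_response(ranked, intents_json)
print(ranked, reply)
```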

Here is the full source code for the GUI file.

import json
import pickle
import random

import nltk
import numpy as np
from keras.models import load_model
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
model = load_model('chatbot_model.h5')

intents = json.loads(open('intents.json').read())
words = pickle.load(open('words.pkl', 'rb'))
classes = pickle.load(open('classes.pkl', 'rb'))


def clean_up_sentence(sentence):
    # tokenize the pattern - splitting words into array
    sentence_words = nltk.word_tokenize(sentence)
    # lemmatize every word - reducing to base form
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words


# return bag of words array: 0 or 1 for words that exist in sentence
def bag_of_words(sentence, words, show_details=True):
    # tokenizing patterns
    sentence_words = clean_up_sentence(sentence)
    # bag of words - vocabulary matrix
    bag = [0] * len(words)
    for s in sentence_words:
        for i, word in enumerate(words):
            if word == s:
                # assign 1 if current word is in the vocabulary position
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % word)
    return np.array(bag)


def predict_class(sentence):
    # filter below threshold predictions
    p = bag_of_words(sentence, words, show_details=False)
    res = model.predict(np.array([p]))[0]
    ERROR_THRESHOLD = 0.25
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]
    # sorting strength probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append({"intent": classes[r[0]], "probability": str(r[1])})
    return return_list


def getResponse(ints, intents_json):
    # added guard: default reply when no intent clears the threshold or no tag matches
    result = "Sorry, I did not understand that."
    if not ints:
        return result
    tag = ints[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result


# Creating tkinter GUI
import tkinter
from tkinter import *


def send():
    msg = EntryBox.get("1.0", 'end-1c').strip()
    EntryBox.delete("0.0", END)

    if msg != '':
        ChatBox.config(state=NORMAL)
        ChatBox.insert(END, "You: " + msg + '\n\n')
        ChatBox.config(foreground="#446665", font=("Verdana", 12))

        ints = predict_class(msg)
        res = getResponse(ints, intents)

        ChatBox.insert(END, "Bot: " + res + '\n\n')
        ChatBox.config(state=DISABLED)
        ChatBox.yview(END)


root = Tk()
root.title("Chatbot")
root.geometry("400x500")
root.resizable(width=FALSE, height=FALSE)

# Create Chat window
ChatBox = Text(root, bd=0, bg="white", height="8", width="50", font="Arial",)
ChatBox.config(state=DISABLED)

# Bind scrollbar to Chat window
scrollbar = Scrollbar(root, command=ChatBox.yview, cursor="heart")
ChatBox['yscrollcommand'] = scrollbar.set

# Create Button to send message
SendButton = Button(root, font=("Verdana", 12, 'bold'), text="Send", width="12", height=5,
                    bd=0, bg="#f9a602", activebackground="#3c9d9b", fg='#000000',
                    command=send)

# Create the box to enter message
EntryBox = Text(root, bd=0, bg="white", width="29", height="5", font="Arial")
#EntryBox.bind("", send)

# Place all components on the screen
scrollbar.place(x=376, y=6, height=386)
ChatBox.place(x=6, y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)

root.mainloop()

Running the chatbot

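We now have two separate files. One is train_chatbot.py, which we run first to train the model. The other is gui_chatbot.py, which opens the chat window once the trained model has been saved.

python train_chatbot.py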

Translated from: https://hackernoon.com/python-chatbot-project-build-your-first-python-project-5mt30mi
