
FileNotFoundError: [Errno 2] No such file or directory: 'errors.out' (error in the final example of Section 5.6 of Natural Language Processing with Python)


Running the final example of Chapter 5 of Natural Language Processing with Python under Python 3.7:

from nltk.tbl import demo as brill_demo
brill_demo.demo()
print(open("errors.out").read())

produces the following error:


Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 332, in <module>
    print(open("errors.out").read())
FileNotFoundError: [Errno 2] No such file or directory: 'errors.out'

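Taken literally, the file does not exist, and a look in the current directory confirmed that it was not there. Searching turned up no ready-made fix, so I asked on StackOverflow. My suspicion was a version difference in the nltk.tbl.demo module: perhaps the newer module generates a file like errors.out through some other function?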
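Before digging into library internals, it can help to confirm where Python is actually looking for the file. The following is a minimal sketch; the helper name `describe_missing_file` is made up for illustration and is not part of NLTK. It reports the absolute path that a relative file name resolves to and whether anything exists there.

```python
from pathlib import Path


def describe_missing_file(name):
    """Return (resolved_path, exists) for a relative file name.

    Hypothetical helper for illustration only. Relative names are resolved
    against the current working directory, which is where open() looks.
    """
    path = Path.cwd() / name
    return path, path.exists()


path, exists = describe_missing_file("errors.out")
print("Looking for:", path)
print("Exists:", exists)
```

If the script is launched from an IDE, the working directory may differ from the script's own folder, so printing the resolved path rules out that explanation first.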

So I read the source of nltk/tbl/demo.py and did find a similar function:

def demo_error_analysis():
    """
    Writes a file with context for each erroneous word after tagging testing data
    """
    postag(error_output="errors.txt")
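A module can also be surveyed from the interpreter instead of by opening the file by hand. The sketch below is an assumption-laden illustration: `public_functions` is a made-up helper, and the standard-library module `textwrap` stands in for `nltk.tbl.demo` so that the snippet runs without NLTK installed.

```python
import inspect
import textwrap  # stdlib module, used here only as a stand-in


def public_functions(module):
    """Map each public function defined in `module` to the first line of its docstring."""
    result = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported from elsewhere.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(obj) or ""
        result[name] = doc.splitlines()[0] if doc else ""
    return result


# With NLTK installed, the brill_demo module imported earlier could be
# passed here instead of textwrap.
for name, summary in public_functions(textwrap).items():
    print(f"{name}: {summary}")
```

Scanning the one-line summaries this way is a quick route to spotting a function whose docstring mentions writing an error file.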

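According to its docstring, demo_error_analysis() does exactly what is needed: it writes a file like errors.out (here named errors.txt). The natural idea is to call demo_error_analysis() first and then read the file it generates: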

brill_demo.demo_error_analysis()

Things are rarely that simple, though. Running it fails with:

Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 331, in <module>
    brill_demo.demo_error_analysis()
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 124, in demo_error_analysis
    postag(error_output="errors.txt")
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 322, in postag
    u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n" #
TypeError: can't concat str to bytes

Following the path in the traceback leads to the offending source:

# writing error analysis to file
if error_output is not None:
    with open(error_output, "w") as f:
        f.write("Errors for Brill Tagger %r\n\n" % serialize_output)
        f.write(
            u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"
        )
    print("Wrote tagger errors including context to {0}".format(error_output))

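So the error comes from the line below. The problem is one of types: the joined text is encoded to bytes and then concatenated with the str "\n".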

u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"

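An article on this error explained the cause: encode() returns a bytes object, which cannot be concatenated directly with a str. Calling decode() on the result converts the bytes back to a str.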

The fix is therefore simple. Change the line to:

u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8").decode() + "\n" #add .decode()

(Your editor may ask you to confirm the modification. Go ahead and make it; if that makes you uneasy, add a comment noting what you changed, as I did above.)
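The error and the fix can be reproduced outside NLTK in a few lines. This is a minimal sketch: the `lines` list is made-up data standing in for the result of error_list(gold_data, taggedtest). It also shows that encoding and then immediately decoding is a round trip, so leaving out the encode() call gives the same string.

```python
lines = ["Soon/NN->RB", "that/IN->WDT"]  # made-up stand-in for error_list(...)

encoded = "\n".join(lines).encode("utf-8")  # a bytes object in Python 3

try:
    encoded + "\n"  # same shape as the failing line: bytes + str
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# The fix described above: decode back to str before concatenating.
fixed = encoded.decode() + "\n"

# Equivalent and shorter: never encode in the first place.
simpler = "\n".join(lines) + "\n"

print(fixed == simpler)
```

Either form yields a str, which is what a file opened in text mode ("w") expects in its write() call.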

After this small change, run it again:

brill_demo.demo_error_analysis()

This time it works:

Loading tagged data from treebank...
Read testing data (200 sents/5251 wds)
Read training data (800 sents/19933 wds)
Read baseline data (800 sents/19933 wds) [reused the training set]
Trained baseline tagger
Accuracy on test set: 0.8366
Training tbl tagger...
TBL train (fast) (seqs: 800; tokens: 19933; tpls: 24; min score: 3; min acc: None)
Finding initial useful rules...
Found 12799 useful rules.
           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  23  23   0   0  | POS->VBZ if Pos:[email protected][-2,-1]
  18  19   1   0  | NN->VB if Pos:[email protected][-2] & Pos:[email protected][-1]
  14  14   0   0  | VBP->VB if Pos:[email protected][-2,-1]
  12  12   0   0  | VBP->VB if Pos:[email protected][-1]
  11  11   0   0  | VBD->VBN if Pos:[email protected][-1]
  11  11   0   0  | IN->WDT if Pos:[email protected][1] & Pos:[email protected][2]
  10  11   1   0  | VBN->VBD if Pos:[email protected][-1]
   9  10   1   0  | VBD->VBN if Pos:[email protected][-1]
   8   8   0   0  | NN->VB if Pos:[email protected][-1]
   7   7   0   1  | VB->NN if Pos:[email protected][-1]
   7   7   0   0  | VB->VBP if Pos:[email protected][-1]
   7   7   0   0  | IN->WDT if Pos:[email protected][1] & Pos:[email protected][2]
   7   8   1   0  | IN->RB if Word:[email protected][2]
   6   6   0   0  | VBD->VBN if Pos:[email protected][-2,-1]
   6   6   0   1  | IN->WDT if Pos:[email protected][1] & Pos:[email protected][2]
   5   5   0   0  | POS->VBZ if Pos:[email protected][-1]
   5   5   0   0  | VB->VBP if Pos:[email protected][-1]
   5   5   0   0  | VBD->VBN if Word:[email protected][-2,-1]
   4   4   0   0  | POS->VBZ if Pos:``@[-2]
   4   4   0   0  | VBP->VB if Pos:[email protected][-2,-1]
   4   6   2   3  | RP->RB if Pos:[email protected][1,2]
   4   4   0   0  | RB->JJ if Pos:[email protected][-1] & Pos:[email protected][1]
   4   4   0   0  | NN->VBP if Pos:[email protected][-2] & Pos:[email protected][-1]
   4   5   1   0  | VBN->VBD if Pos:[email protected][-2] & Pos:[email protected][-1]
   4   4   0   0  | IN->WDT if Pos:[email protected][1] & Pos:[email protected][2]
   4   8   4   0  | VBD->VBN if Word:*@[1]
   4   4   0   0  | JJS->RBS if Word:[email protected][0] & Word:[email protected][-1] & Pos:[email protected][-1]
   3   3   0   0  | VBD->VBN if Pos:[email protected][-1]
   3   4   1   0  | VBN->VB if Pos:[email protected][-1]
   3   4   1   1  | IN->RB if Pos:[email protected][1]
   3   3   0   0  | JJ->RB if Pos:[email protected][1]
   3   3   0   0  | PRP$->PRP if Pos:[email protected][1]
   3   3   0   0  | NN->VBP if Pos:[email protected][-1] & Pos:[email protected][1]
   3   3   0   0  | VBP->VB if Word:n'[email protected][-2,-1]
Trained tbl tagger in 2.45 seconds
Accuracy on test set: 0.8572
Tagging the test data
Wrote tagger errors including context to errors.txt

A new errors.txt file now appears in the current directory.

The last step is to read and print the file:

print(open("errors.txt").read())

The output looks like this (excerpt):

Errors for Brill Tagger None
left context | word/test->gold | right context
--------------------------+------------------------+--------------------------
| Soon/NN->RB | ,/, T-shirts/NNS *ICH*-1/
n/IN the/DT corridors/NNS | that/IN->WDT | *T*-2/-NONE- carried/VBD
NNS that/WDT *T*-2/-NONE- | carried/VBN->VBD | the/DT school/NN 's/POS f
D the/DT school/NN 's/POS | familiar/NN->JJ | red-and-white/JJ GHS/NNP
ool/NN 's/POS familiar/JJ | red-and-white/NN->JJ | GHS/NNP logo/NN on/IN the
iliar/JJ red-and-white/JJ | GHS/NN->NNP | logo/NN on/IN the/DT fron
/NN ,/, the/DT shirts/NNS | read/VBP->VBD | ,/, ``/`` We/PRP have/VBP
,/, ``/`` We/PRP have/VBP | all/DT->PDT | the/DT answers/NNS ./. ''
JJ colleagues/NNS are/VBP | angry/NN->JJ | at/IN Mrs./NNP Yeargin/NN
n/NNP Rice/NNP ,/, who/WP | *T*-100/NN->-NONE- | had/VBD discovered/VBN th
VBD discovered/VBN the/DT | crib/JJ->NN | notes/NNS ./.
``/`` We/PRP | work/NN->VBP | damn/RB hard/RB at/IN wha
``/`` We/PRP work/VBP | damn/NN->RB | hard/RB at/IN what/WP we/
/IN what/WP we/PRP do/VBP | *T*-101/NN->-NONE- | for/IN damn/RB little/JJ
VBP *T*-101/-NONE- for/IN | damn/NN->RB | little/JJ pay/NN ,/, and/
...
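The read step can also be written so that the file handle is closed promptly and a missing file is reported instead of raising. This is a sketch only: `read_head` is a made-up helper name, and the default `encoding=None` assumes the file is read with the same platform default encoding it was written with.

```python
from pathlib import Path


def read_head(filename, max_lines=10, encoding=None):
    """Return up to `max_lines` lines from a text file, or None if it is missing."""
    path = Path(filename)
    if not path.exists():
        return None
    # The with-block closes the file even if reading fails part-way.
    with path.open(encoding=encoding) as f:
        return [line.rstrip("\n") for _, line in zip(range(max_lines), f)]


head = read_head("errors.txt")
if head is None:
    print("errors.txt not found; run demo_error_analysis() first")
else:
    print("\n".join(head))
```

Limiting the number of lines keeps the console readable, since the full error listing has one row per mistagged word.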

With that, the original problem is solved.

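I'm writing this up at the tail end of Singles' Day (November 11) to record a problem that cost me two or three hours. I hope it helps whoever runs into it next.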


广东庚舞飞扬