Importing txt files into MongoDB in code: storing the full text of a txt file in MongoDB

I have created a python script that automates a workflow converting PDF to txt files. I want to be able to store and query these files in MongoDB. Do I need to turn the .txt file into JSON/BSON? Should I be using a program like PyMongo?

I am just not sure what the steps of such a project would be, let alone which tools would help with it.

I've looked at the post "How can one add text files in Mongodb?", which makes me think I need to convert the file to a JSON file, and possibly integrate GridFS?

Solution

You don't need to JSON/BSON encode it if you're using a driver; the driver handles the encoding for you. If you're using the MongoDB shell, you'd need to worry about it when pasting the contents.

You'd likely want to use the Python MongoDB driver:

from pymongo import MongoClient

client = MongoClient()  # connects to localhost:27017 by default
db = client.test_database  # use a database called "test_database"
collection = db.files  # and inside that DB, a collection called "files"

# read the entire contents; the file should be UTF-8 text
with open('test_file_name.txt', encoding='utf-8') as f:
    text = f.read()

# build a document to be inserted
text_file_doc = {"file_name": "test_file_name.txt", "contents": text}

# insert the document into the "files" collection
collection.insert_one(text_file_doc)

(Untested code)
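
Since the workflow in the question produces many .txt files, the same idea extends to a batch import. A minimal sketch, assuming the files sit in one directory; `build_docs` and `import_directory` are illustrative names rather than part of PyMongo, and `collection` is expected to be a PyMongo collection such as the `db.files` used above:

```python
from pathlib import Path


def build_docs(directory):
    """Return one document (a plain dict) per .txt file in `directory`."""
    docs = []
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps the import going if a PDF conversion
        # produced bytes that are not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        docs.append({"file_name": path.name, "contents": text})
    return docs


def import_directory(directory, collection):
    """Insert every .txt file in `directory` into `collection`."""
    docs = build_docs(directory)
    if docs:  # insert_many raises on an empty list
        collection.insert_many(docs)
    return len(docs)
```

With a running server this could be called as `import_directory("txt_output", MongoClient().test_database.files)`, where `txt_output` is a hypothetical directory name.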

If the file names are guaranteed to be unique, you could set the document's _id to the file name and retrieve it like:

text_file_doc = collection.find_one({"_id": "test_file_name.txt"})
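
Using the file name as `_id` can also make re-imports idempotent: replacing by `_id` with `upsert=True` overwrites an earlier copy instead of raising a duplicate-key error. A sketch under that assumption; `doc_for` and `save_text_file` are hypothetical helper names, and `collection` is any PyMongo collection:

```python
from pathlib import Path


def doc_for(path):
    """Build a document whose _id is the file name (assumed unique)."""
    path = Path(path)
    return {"_id": path.name, "contents": path.read_text(encoding="utf-8")}


def save_text_file(path, collection):
    """Insert the file, or overwrite the stored copy with the same _id."""
    doc = doc_for(path)
    # replace_one with upsert=True inserts when no document matches the filter
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return doc["_id"]
```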

Or, you could ensure the file_name property as shown above is indexed and do:

text_file_doc = collection.find_one({"file_name": "test_file_name.txt"})
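
The index on file_name could be created once, for example at import time. A sketch, assuming file names are meant to be unique (drop `unique=True` otherwise); `ensure_file_name_index` and `find_by_name` are illustrative names, while `create_index` and `find_one` are PyMongo collection methods:

```python
def ensure_file_name_index(collection):
    """Create an ascending index on file_name and return its name."""
    # unique=True additionally rejects a second document with the same file_name;
    # creating an identical index again is a no-op
    return collection.create_index("file_name", unique=True)


def find_by_name(collection, file_name):
    """Return the stored text for file_name, or None if it is not stored."""
    doc = collection.find_one({"file_name": file_name})
    return None if doc is None else doc["contents"]
```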

Your other option is to use GridFS, although it's often not recommended for small files.

There's a starter here for Python and GridFS.
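
To choose between the two options programmatically, one approach is to compare the encoded size of the text against MongoDB's 16 MiB per-document limit and fall back to GridFS only when it does not fit. A sketch, where `needs_gridfs`, `store_text`, and the 1 MiB headroom are assumptions for illustration; `gridfs.GridFS`, `put`, and `insert_one` are the PyMongo calls:

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit
HEADROOM_BYTES = 1024 * 1024       # assumed margin for other fields and BSON overhead


def needs_gridfs(text):
    """Return True if the UTF-8 encoded text is too large for one document."""
    return len(text.encode("utf-8")) > MAX_BSON_BYTES - HEADROOM_BYTES


def store_text(db, file_name, text):
    """Store text as a plain document when it fits, otherwise in GridFS."""
    if needs_gridfs(text):
        import gridfs  # ships with PyMongo

        # GridFS stores bytes, so encode the text first
        return gridfs.GridFS(db).put(text.encode("utf-8"), filename=file_name)
    result = db.files.insert_one({"file_name": file_name, "contents": text})
    return result.inserted_id
```

Text extracted from a typical PDF is far below this threshold, so in most cases the plain-document branch is the one that runs.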


