You might want to look at batch() too. The reason multi() would be slower is that it's transactional: if anything failed, nothing would be executed. That may be what you want, but you do have a choice for speed here.
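Something like this with the node_redis client (just a sketch; the keys and values are placeholders):

var redis = require('redis');
var client = redis.createClient();

// multi() wraps the queued commands in MULTI/EXEC, so they run as one
// transaction - if anything fails, none of them take effect.
client.multi()
    .set('key1', 'value1')
    .set('key2', 'value2')
    .exec(function (err, replies) { /* ... */ });

// batch() queues and pipelines the same way but skips the MULTI/EXEC
// transaction, which is why it tends to be faster for bulk writes.
client.batch()
    .set('key1', 'value1')
    .set('key2', 'value2')
    .exec(function (err, replies) { /* ... */ });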
The redis-stream package doesn't seem to make use of Redis' mass insert functionality, so it's also slower than the mass insert approach that Redis' site goes on to describe with redis-cli.
Another idea would be to use redis-cli and give it a file to stream from, which this NPM package does: https://github.com/almeida/redis-mass
Not keen on writing to a file on disk first? This repo: https://github.com/eugeneiiim/node-redis-pipe/blob/master/example.js
...also streams to Redis, but without writing to file. It streams to a spawned process and flushes the buffer every so often.
On Redis' site under mass insert (http://redis.io/topics/mass-insert) you can see a little Ruby example. The repo above basically ported that to Node.js and then streamed it directly to the spawned redis-cli process.
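A rough Node.js equivalent of that Ruby gen_redis_proto function (a sketch, not the repo's exact code) just encodes each command in the Redis protocol before writing it out:

function genRedisProto(args) {
    // Encode one command in the Redis protocol (RESP), e.g.
    // genRedisProto(['SET', 'key1', 'value1'])
    // => "*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$6\r\nvalue1\r\n"
    var proto = '*' + args.length + '\r\n';
    args.forEach(function (arg) {
        arg = String(arg);
        proto += '$' + Buffer.byteLength(arg) + '\r\n' + arg + '\r\n';
    });
    return proto;
}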
So in Node.js, we have:
var spawn = require('child_process').spawn;
var redisPipe = spawn('redis-cli', ['--pipe']);
spawn() returns a reference to a child process that you can pipe to via its stdin. For example: redisPipe.stdin.write().
You can just keep writing to a buffer, streaming it to the child process, and then clearing it every so often. The buffer then won't fill up, so this should be a bit lighter on memory than perhaps the node_redis package (whose docs literally say that data is held in memory), though I haven't looked into it that deeply, so I don't know what the memory footprint ends up being. It could be doing the same thing.
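A sketch of that idea, reusing redisPipe from the snippet above and genRedisProto from earlier (records is just a stand-in for wherever your data actually comes from):

var buffer = '';
var FLUSH_EVERY = 10000; // commands per flush - tune to taste

records.forEach(function (record, i) {
    buffer += genRedisProto(['SET', record.key, record.value]);
    if ((i + 1) % FLUSH_EVERY === 0) {
        redisPipe.stdin.write(buffer); // stream this chunk to redis-cli
        buffer = '';                   // clear it so memory stays bounded
    }
});

redisPipe.stdin.write(buffer); // flush whatever is left over
redisPipe.stdin.end();         // let redis-cli finish and report back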
Of course keep in mind that if something goes wrong, it all fails. That's what tools like fluentd were created for (and that's yet another option: http://www.fluentd.org/plugins/all - it has several Redis plugins)...But again, it means you're backing data on disk somewhere to some degree. I've personally used Embulk to do this too (which required a file on disk), but it did not support mass inserts, so it was slow. It took nearly 2 hours for 30,000 records.
One benefit of a streaming approach (not backed by disk) is when you're doing a huge insert from another data source: if that source returns a lot of data and your server doesn't have the hard disk space to hold all of it, you can stream it instead. Again, you risk failures.
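For instance, if the source exposes a readable object stream (sourceStream below is hypothetical), you could pipe it straight into redis-cli and let backpressure do the flushing - again just a sketch, assuming genRedisProto from above:

var spawn = require('child_process').spawn;
var Transform = require('stream').Transform;

var redisPipe = spawn('redis-cli', ['--pipe']);

// Convert each incoming row to Redis protocol; row.key / row.value are
// assumptions about the source's shape.
var toRedisProto = new Transform({
    objectMode: true,
    transform: function (row, encoding, callback) {
        callback(null, genRedisProto(['SET', row.key, row.value]));
    }
});

sourceStream.pipe(toRedisProto).pipe(redisPipe.stdin);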
I find myself in this position as I'm building a Docker image that will run on a server with not enough disk space to accommodate large data sets. Of course it's a lot easier if you can fit everything on the server's hard disk... But if you can't, streaming to redis-cli may be your only option.
If you are really pushing a lot of data around on a regular basis, I would probably recommend fluentd, to be honest. It comes with many great features for ensuring your data makes it to where it's going, and if something fails, it can resume.
One problem with all of these Node.js approaches is that if something fails, you either lose it all or have to insert it all over again.