Ideas on this alternative to ORM+RDBMS?

Author: 手机用户2502938297 | Source: Internet | 2023-05-19 09:59

I am currently developing a proof of concept for an alternative data store. The reason is that I need to enhance a read-mostly clustered webapp, but also that I want to free myself from the pain of the sometimes overly complex ORM+RDBMS solution.

Overall, the idea is quite similar to a distributed cache with persistence (letting the cluster be the system of record, SoR). However:

- I want to be able to retrieve any object along with its children by id (providing class and id). Only that to start with, as the main querying part is already handled by Lucene in my app.
- I need a map of maps of types (roughly tables in the relational world), and therein distributed maps of 'dehydrated' stored objects (flattening the object graph via reflection-based deep cloning).
- A bin log (like Prevayler's, for example) for:
  - eventual recovery if the whole cluster goes down
  - development (and the ability to refactor code / change structure)
  - perhaps asynchronous processing for other purposes (reporting, whatever)
- Later on, I would try to integrate a statically typed query mechanism, like LINQ, Jaque or H2's JaQu / see ODBs / Lucene (?)
- It has to be transaction-aware (not sure about "JTA type" though).
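
As a rough illustration of the first two points, here is a minimal single-JVM sketch of a "map of maps of types" holding dehydrated objects. Everything in it is an assumption made for illustration: `TypeStore` and `Customer` are hypothetical names, `ConcurrentHashMap` merely stands in for whatever distributed map the cluster would provide, and the dehydration is shallow (only the fields declared on the concrete class are copied, and child objects are not yet replaced by class+id references).

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical example entity; any class with a no-arg constructor would do. */
class Customer {
    String name;
    int visits;
}

/** Hypothetical store: type name -> (id -> snapshot of field values). */
final class TypeStore {
    // The outer map plays the role of "tables"; in a real cluster the inner
    // maps would be distributed maps rather than local ConcurrentHashMaps.
    private final Map<String, Map<Object, Map<String, Object>>> tables = new ConcurrentHashMap<>();

    /** Dehydrates the entity into a field-name -> value map and stores it under (class, id). */
    void put(Object id, Object entity) {
        Map<String, Object> row = new HashMap<>(); // HashMap so that null field values are allowed
        try {
            for (Field field : entity.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) continue; // skip static fields
                field.setAccessible(true);
                row.put(field.getName(), field.get(entity));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        tables.computeIfAbsent(entity.getClass().getName(), k -> new ConcurrentHashMap<>()).put(id, row);
    }

    /** Rehydrates a fresh instance from the stored snapshot, or returns null if nothing is stored. */
    <T> T get(Class<T> type, Object id) {
        Map<Object, Map<String, Object>> table = tables.get(type.getName());
        Map<String, Object> row = (table == null) ? null : table.get(id);
        if (row == null) return null;
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T entity = constructor.newInstance();
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(entity, entry.getValue());
            }
            return entity;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Usage would look like `store.put(42L, customer)` followed by `store.get(Customer.class, 42L)`, which returns a new `Customer` instance carrying the stored field values rather than the original reference.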

I'm planning to implement this idea with Hazelcast (I love its super-simple API) or Terracotta (which I have never used, but I'm aware of its 'sweet spot', medium-term data). If you will, my aim is to do more or less what Jonas once blogged about. Using either of these, the stored data would roughly have to fit in the sum of the JVM heaps of the cluster.

This should be pretty simple to scale, and would avoid the relational impedance mismatch (i.e. the same as with an ODB) as well as the JDBC + I/O overhead.

Do you know of other tools/frameworks, or a combination thereof, that already provide similar functionality and that I'm not aware of? Can you suggest other ways of tackling this 'getting rid of the DB'? What flaws do you already see in this idea? Concurrency-wise, would it make sense to consider Scala instead of Java?

How about non-relational data stores such as CouchDB, Neo4j, Hypertable, HBase?

A similar question was asked one month ago - but there was no concrete solution.

BTW I just stumbled upon the concept of Enterprise Data Fabric, which, to my surprise, describes a lot of these ideas.

6 solutions

#1

Definitely give Terracotta a try. It's free (unless you go Enterprise, which comes with an SLA and support). It is a JVM-level cluster, so to speak, so you don't have the issues associated with sessions on multiple boxes behind disparate JK workers (assuming you're using this for a J2EE app).

I'm just rambling, so have a look here: http://en.wikipedia.org/wiki/Terracotta_Cluster

UPDATE: There are numerous bits of info on Terracotta on the web too, e.g. http://blog.terracottatech.com/2007/12/fud_of_the_week_terracotta_doe.html

UPDATE 2: A bit of background on my views: I work for a company with a fairly big audience. We have an enterprise MySQL setup running with a master and about 5 slaves (times 2, considering we have 2 channels, with 4 app servers per channel), using MySQL's JDBC replication driver (for which we've already submitted various patches). We use Spring 2.5/Hibernate 3 with Spring's declarative JTA transaction management, so read-only transactions go to the slaves. With the advent of numerous Ajax enhancements in a future version of our site, our DB servers' load has gone up: we create pricing summaries for thousands of products for all countries, taking into account duties/tax rules for all these countries (plus promotions and real-time auctions running all the time), so that the Ajax services have the latest prices in a blink. Terracotta takes the load off the DB and app servers by making these prices available to all app servers at the JVM layer, with all the JVMs across the boxes linked. So server A can update the prices every few minutes, and if an Ajax request hits server B, the prices are available immediately. I know there are people/companies out there with similar businesses who probably have better ideas and implementations, so I'm always open for discussion, but this is my two cents.

I get inspiration from the guys at Facebook too, for instance this very informative article: http://www.facebook.com/note.php?note_id=23844338919

They talk about memcached, which you should also definitely check out.

#2

As Neo4j is mentioned in the question, I'm chiming in with a few thoughts on using a graph database in this case. (I'm part of the Neo4j team)

- Retrieving children is trivial in a graph DB.
- There is a map implementation for Neo4j.
- As graphs are native to a graph DB, you could consider not flattening the object graph, and instead persisting data in nodes and edges/relationships (this gives you more flexibility in handling the data).
- Neo4j is fully transactional.
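
To picture the "nodes and edges/relationships" idea without flattening, here is a tiny in-memory sketch in plain Java. It is deliberately not the Neo4j API: `GraphNode`, `Edge`, `relateTo` and `related` are hypothetical names chosen for illustration, and there is no persistence or transaction handling here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node: a bag of properties plus its outgoing relationships. */
final class GraphNode {
    final Map<String, Object> properties = new HashMap<>();
    private final List<Edge> outgoing = new ArrayList<>();

    /** Creates a typed, directed relationship from this node to the target. */
    Edge relateTo(GraphNode target, String type) {
        Edge edge = new Edge(this, target, type);
        outgoing.add(edge);
        return edge;
    }

    /** Returns the nodes reachable over outgoing relationships of the given type, in insertion order. */
    List<GraphNode> related(String type) {
        List<GraphNode> result = new ArrayList<>();
        for (Edge edge : outgoing) {
            if (edge.type.equals(type)) result.add(edge.end);
        }
        return result;
    }
}

/** Hypothetical directed, typed relationship between two nodes. */
final class Edge {
    final GraphNode start;
    final GraphNode end;
    final String type;

    Edge(GraphNode start, GraphNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}
```

With a structure like this, "retrieve an object along with its children" becomes a walk over the relationships of one type, e.g. `order.related("LINE_ITEM")`, instead of a join or a rehydration step.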

With the new DB technologies emerging today, there's really no need to stay with an RDBMS if your data isn't a good fit for the relational paradigm.

#3

Seems to me Terracotta is a perfect fit for your requirements:

- Cluster a map to retrieve children via keys (e.g. a clustered Map).
- Map of maps: no problem.
- No explicit bin log, but Terracotta already persists everything to disk, so a full cluster restart is already supported.
- Already integrated with Compass, Hibernate Search, and Lucene for search.
- Transactions? Too slow. Use the cache as a datastore. With persistence you won't lose data when writing to (clustered) memory, and the writes can trickle back to the DB.
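
The last point (write to memory first, let the writes trickle back to the DB) is commonly called write-behind. A minimal single-JVM sketch of the pattern follows; it is not Terracotta's API. `WriteBehindMap` is a hypothetical name, the `sink` callback stands in for the real DB write, and `flush()` is called explicitly here where a real setup would presumably run it on a background thread.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

/** Hypothetical write-behind map: reads and writes hit memory, the sink is updated later. */
final class WriteBehindMap<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final LinkedBlockingQueue<Map.Entry<K, V>> pending = new LinkedBlockingQueue<>();
    private final BiConsumer<K, V> sink; // assumption: stands in for the slow DB write

    WriteBehindMap(BiConsumer<K, V> sink) {
        this.sink = sink;
    }

    /** Stores the value in memory immediately and queues it for the backing store. */
    void put(K key, V value) {
        memory.put(key, value);
        pending.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    /** Reads are always served from memory. */
    V get(K key) {
        return memory.get(key);
    }

    /** Pushes all queued writes to the sink in the order they were made; returns how many were written. */
    int flush() {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (Map.Entry<K, V> entry : batch) {
            sink.accept(entry.getKey(), entry.getValue());
        }
        return batch.size();
    }
}
```

The trade-off this makes visible: between `put` and `flush`, the backing store is stale, so durability depends entirely on the in-memory (or clustered) copy surviving until the queue is drained.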

In addition, Terracotta does the "reflection" thing you ask for, although it doesn't use reflection, as that is far too slow. It uses bytecode manipulation (BCM). Only changes are propagated on the network.

Hazelcast, by the way, requires serialization, so it will be slow and will not do well at all with a map-of-maps data structure (every put will result in a full deep-clone copy across the network), and it doesn't have any kind of persistence built in.
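
To make the cost described in that claim concrete, here is what a serialization-based copy of a map of maps amounts to. The sketch uses plain `java.io` serialization as a stand-in and says nothing about how Hazelcast itself serializes or ships data; `SerialCopy` and `deepCopy` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical helper: deep-copies a value by a serialization round trip. */
final class SerialCopy {
    private SerialCopy() {}

    /** Serializes the whole object graph to bytes and reads it back as an independent copy. */
    static <T extends Serializable> T deepCopy(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value); // walks and writes every reachable object
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Copying an outer `HashMap` this way duplicates every nested map and every entry in it, so the work grows with the size of the whole structure rather than with the size of the change. That is the behaviour the answer contrasts with propagating only changes.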

#4

Interesting.

I have a view that we all develop a zoo which comprises all the abstraction layers we habitually use in our projects. And each abstraction layer is a completely different animal.

My goal is to minimize the amount of time spent on just care and feeding of the animals whenever it diverts me from solving the problem at hand - it's overhead - wasted resources. So the fewer, simpler abstraction layers we can get away with, the more productive we are.

I can usually do just fine with two beasties, OOP and RDBMS, coupled through a nice, simple, minimal, hand-crafted DAL. For me, ORM is mostly overhead: one abstraction too many, and a pretty hungry one.

Don't discount the option of treating stored procedures as an abstraction tool, either. If you're real comfortable with SQL, it can be a useful resource for implementing a light-weight BL facade that means not needing to think about the ORM problem.

And this post suggests the emergence of alternatives to RDBMS for some requirements, anyway.

#5

Thanks for your answers.

Actually, you talk about DBs which is something I want to completely take out of the picture.

The use case I'm targeting is a startup's small/medium-sized clustered webapp (boxes in a LAN, or in the cloud). It needs to retrieve objects at ~RAM-speed levels and scale fairly easily. As a side effect, one wouldn't have to think about DB server installations, impedance mismatch, JDBC, caches, polluting domain models with annotations, etc.

Again, what I want to accomplish is something like what is described here, and I would love to have some more feedback on ideas concerning the actual implementation (why use Terracotta instead of Hazelcast, whether to use serialization or deep cloning via reflection or something else, and also the major drawbacks of an approach like this, e.g. why you wouldn't replace your current ORM/DB setup with it).

It has to be super simple to integrate, so it'll feature a really neat Java API, improving code readability. No other software (DB, memcached) will be required.

#6

Try GigaSpaces. I think they have exactly what you require, and if I'm not mistaken there's a free version for startups.

Some concepts:

- A "Space" is a place where you can store and retrieve objects.
- A Space can be backed by any JDBC-compliant DB automatically (no code, only configuration).
- A Space can be started in your Java process, so all accesses are at RAM speed.
- A Space can be clustered/partitioned in any way you want (full mirror, partial, grid).
- A Space supports distributed or local transactions.