Author: mobile user 2502938297 | Source: Internet | 2023-05-19 09:59
I am currently developing a proof of concept for an alternative data store. The reason is that I need to enhance a read-mostly clustered webapp, and also that I want to free myself from the pain of the sometimes overly complex ORM+RDBMS solution.
Overall the idea is quite similar to a distributed cache with persistence (letting the cluster be the SoR), however:
- I want to be able to retrieve any object along with its children by id (providing class & id). That is all to start with, as the main querying part is already handled with Lucene in my app.
- I need a map of maps of types (~ tables in the relational world), and within those, distributed maps of 'dehydrated' stored objects (flattening the object graph via reflection-based deep cloning)
- a bin log (like Prevayler, for example) for
  - eventual recovery if the whole cluster goes down
  - development (and the ability to refactor code / change structure)
  - perhaps asynchronous processing for other purposes (reporting, whatever)
- later on, possibly integrate a statically typed query mechanism, like LINQ, Jaque or H2's JaQu / see ODBs / Lucene (?)
- it has to be transaction-aware (not sure "JTA type" though)
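To make the list above more concrete, here is a minimal single-JVM sketch of the "map of maps of types" plus an append-only journal. Everything in it is an assumption for illustration: the names `TypedStore`, `Put` and `replay` are hypothetical, `ConcurrentHashMap` merely stands in for a distributed map (such as the ones Hazelcast or Terracotta would provide), the journal lives in memory rather than on disk, and child objects and transactions are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one map per type, each holding flattened ("dehydrated") objects by id. */
class TypedStore {

    /** One journal entry: the type, the id and the flattened field values that were written. */
    static final class Put {
        final String type;
        final long id;
        final Map<String, Object> fields;

        Put(String type, long id, Map<String, Object> fields) {
            this.type = type;
            this.id = id;
            this.fields = fields;
        }
    }

    // type name -> (id -> dehydrated object); a local stand-in for a distributed map of maps
    private final Map<String, Map<Long, Map<String, Object>>> tables = new ConcurrentHashMap<>();

    // append-only list of writes, in the spirit of a command log (in memory only in this sketch)
    private final List<Put> journal = new ArrayList<>();

    /** Records the write in the journal, then stores an immutable copy of the fields. */
    synchronized void put(String type, long id, Map<String, Object> fields) {
        Map<String, Object> copy = Collections.unmodifiableMap(new HashMap<>(fields));
        journal.add(new Put(type, id, copy));
        tables.computeIfAbsent(type, t -> new ConcurrentHashMap<>()).put(id, copy);
    }

    /** Looks up a dehydrated object by type and id; returns null when nothing is stored. */
    Map<String, Object> get(String type, long id) {
        Map<Long, Map<String, Object>> table = tables.get(type);
        return table == null ? null : table.get(id);
    }

    /** Returns a copy of the journal so callers cannot modify the original. */
    synchronized List<Put> journal() {
        return new ArrayList<>(journal);
    }

    /** Rebuilds a store by replaying journal entries in order, e.g. after a full restart. */
    static TypedStore replay(List<Put> entries) {
        TypedStore store = new TypedStore();
        for (Put entry : entries) {
            store.put(entry.type, entry.id, entry.fields);
        }
        return store;
    }
}

class TypedStoreDemo {
    public static void main(String[] args) {
        TypedStore store = new TypedStore();
        Map<String, Object> customer = new HashMap<>();
        customer.put("name", "Ada");
        store.put("Customer", 1L, customer);

        // Simulate recovery: build a second store purely from the journal.
        TypedStore recovered = TypedStore.replay(store.journal());
        System.out.println(recovered.get("Customer", 1L));
    }
}
```

Journaling the write before applying it is the design choice that makes recovery possible: the in-memory maps can always be rebuilt from the log, which is also what would allow the log to be consumed asynchronously for reporting.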
I'm planning to implement this idea with Hazelcast (I love its super-simple API) or Terracotta (which I have never used, though I'm aware of its 'sweet spot': medium-term data). If you will, my aim is to do more or less what Jonas once blogged about. With either of these, the stored data would roughly have to fit in the sum of the JVM heaps of the cluster.
This should be pretty simple to scale, and it would avoid the relational impedance mismatch (i.e. save as with an ODB) as well as the JDBC + I/O overhead.
Do you know of other tools/frameworks, or combinations thereof, that already provide similar functionality and that I'm overlooking? Can you suggest other ways of tackling this 'getting rid of the DB'? What flaws do you already see in this idea? Concurrency-wise, would it make sense to consider Scala instead of Java?
How about non-relational data stores such as CouchDB, Neo4j, Hypertable, HBase?
A similar question was asked one month ago - but there was no concrete solution.
BTW I just stumbled upon the concept of Enterprise Data Fabric, which, to my surprise, describes a lot of these ideas.
6 solutions
2
Definitely give Terracotta a try. It's free (unless you go Enterprise, which comes with an SLA and support). It is a JVM-level cluster, so to speak, so you don't have the issues associated with sessions on multiple boxes behind disparate JK workers (assuming you're using this for a J2EE app).
I'm just rambling, so have a look here: http://en.wikipedia.org/wiki/Terracotta_Cluster
UPDATE: There are numerous bits of info on Terracotta on the web too, e.g. http://blog.terracottatech.com/2007/12/fud_of_the_week_terracotta_doe.html
UPDATE 2: A bit of background on my views: I work for a company with a fairly big audience. We have an enterprise MySQL setup running with a master and about 5 slaves (times 2, considering we have 2 channels, with 4 app servers per channel), using MySQL's JDBC Replication driver (for which we've already submitted various patches). We use Spring 2.5/Hibernate 3 with Spring's declarative JTA transaction management, so read-only transactions go to the slaves. With the advent of numerous Ajax enhancements on a future version of our site, our DB servers' load has gone up: we create pricing summaries for thousands of products for all countries, taking into account duties/tax rules for all these countries (plus promotions and real-time auctions running all the time), so that the Ajax services have the latest prices in a blink. Terracotta takes the load off the DB and app servers by making these prices available to all app servers at the JVM layer, with all the JVMs across the boxes linked. So server A can update the prices every few minutes, and if an Ajax request hits server B, the prices are available immediately. I know there are people/companies out there with similar businesses who probably have better ideas and implementations, so I'm always open to discussion, but this is my two cents.
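The "server A updates, server B reads" pattern described above can be sketched roughly as follows. This is only a single-JVM illustration under stated assumptions: `PriceBoard`, `publish` and `priceOf` are hypothetical names, and an `AtomicReference` holding an immutable snapshot stands in for the cluster-wide shared state that Terracotta would provide across JVMs. It is not Terracotta's API.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: a price table that is replaced wholesale and read without locking. */
class PriceBoard {

    // The current immutable snapshot; in a clustered setup this would be the shared object.
    private final AtomicReference<Map<String, BigDecimal>> snapshot =
            new AtomicReference<>(Collections.<String, BigDecimal>emptyMap());

    /** Writer side ("server A"): publish a freshly computed set of prices in one step. */
    void publish(Map<String, BigDecimal> freshPrices) {
        snapshot.set(Collections.unmodifiableMap(new HashMap<>(freshPrices)));
    }

    /** Reader side ("server B"): return the latest published price, or null if unknown. */
    BigDecimal priceOf(String productId) {
        return snapshot.get().get(productId);
    }
}

class PriceBoardDemo {
    public static void main(String[] args) {
        PriceBoard board = new PriceBoard();

        // The periodic job would compute these from duties, taxes and promotions.
        Map<String, BigDecimal> computed = new HashMap<>();
        computed.put("sku-1", new BigDecimal("9.99"));
        board.publish(computed);

        // An Ajax request handler only reads the snapshot; it never touches the DB.
        System.out.println(board.priceOf("sku-1"));
    }
}
```

Swapping a complete snapshot, rather than updating entries one by one, means a reader never sees a half-updated price table; the job that runs every few minutes would simply call `publish` with its latest results.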
I get inspiration from the guys at Facebook too, for instance this very informative article: http://www.facebook.com/note.php?note_id=23844338919
They talk about memcached, which you should definitely check out as well.