I am currently developing a proof of concept for an alternative data store. I need to enhance a read-mostly clustered webapp, and I also want to free myself from the pain of the sometimes overly complex ORM+RDBMS solution.


Overall the idea is quite similar to a distributed cache with persistence (letting the cluster be the SoR), however:


  • I want to be able to retrieve any object along with its children by id (providing class and id). Only that to start with, as the main querying part is already handled by Lucene in my app.
  • I need a map of maps keyed by type (roughly tables in the relational world), holding distributed maps of 'dehydrated' stored objects (the object graph flattened via reflection-based deep cloning).
  • a bin log (like Prevayler's, for example) for
    • eventual recovery if the whole cluster goes down
    • development (and the ability to refactor code / change structure)
    • perhaps asynchronous processing for other purposes (reporting, whatever)
  • later on, try to integrate a statically typed query mechanism, like LINQ, Jaque or H2's JaQu / see ODBs / Lucene (?)
  • it has to be transaction-aware (not sure about "JTA type", though)
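
As a rough illustration of the first three requirements, here is a minimal single-JVM sketch. Everything in it is an assumption made for illustration: `TypedStore`, `Put` and `replay` are hypothetical names, a plain `ConcurrentHashMap` stands in for a distributed map, and the journal lives in memory where a real bin log would be appended to disk. Children are assumed to be referenced by id inside the flattened state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a map of maps keyed by type, then id, with a replayable journal.
final class TypedStore {
    // One journal entry per write; a real bin log would append these to disk.
    static final class Put {
        final Class<?> type;
        final Object id;
        final Map<String, Object> state;

        Put(Class<?> type, Object id, Map<String, Object> state) {
            this.type = type;
            this.id = id;
            // Defensive copy so later changes by the caller do not alter the log.
            this.state = Collections.unmodifiableMap(new HashMap<>(state));
        }
    }

    // Outer map: type -> "table"; inner map: id -> dehydrated (flattened) state.
    private final Map<Class<?>, Map<Object, Map<String, Object>>> tables = new ConcurrentHashMap<>();
    private final List<Put> journal = new ArrayList<>();

    synchronized void put(Class<?> type, Object id, Map<String, Object> state) {
        Put entry = new Put(type, id, state);
        journal.add(entry); // log first, then apply
        tables.computeIfAbsent(type, t -> new ConcurrentHashMap<>()).put(id, entry.state);
    }

    // Lookup by class and id; returns null when nothing is stored under that key.
    Map<String, Object> get(Class<?> type, Object id) {
        Map<Object, Map<String, Object>> table = tables.get(type);
        return table == null ? null : table.get(id);
    }

    // Snapshot of the journal entries recorded so far.
    synchronized List<Put> journal() {
        return new ArrayList<>(journal);
    }

    // Rebuilds a fresh store by replaying journal entries in order.
    static TypedStore replay(List<Put> entries) {
        TypedStore store = new TypedStore();
        for (Put e : entries) {
            store.put(e.type, e.id, e.state);
        }
        return store;
    }
}
```

Replaying the journal into an empty store is the recovery path after a full cluster restart; the same entries could be consumed asynchronously for reporting. Transactions are deliberately left out of this sketch.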

I'm planning to implement this idea with Hazelcast (I love its super-simple API) or Terracotta (which I have never used, but I'm aware of its 'sweet spot', medium-term data). If you will, my aim is to do more or less what Jonas once blogged about. With either of these, the stored data would roughly have to fit in the sum of the JVM heaps of the cluster.


This should be pretty simple to scale, and it would avoid the relational impedance mismatch (i.e. save as with an ODB) as well as the JDBC + I/O overhead.


Do you know of other tools/frameworks or combination thereof already providing similar functionality, that I'm ignoring? Can you suggest other ways of tackling this 'getting rid of the DB'? What flaws do you already see in this idea? Concurrency-wise would it make sense to consider Scala instead of Java?


How about non-relational data stores such as CouchDB, Neo4j, Hypertable, HBase?


A similar question was asked one month ago - but there was no concrete solution.


BTW I just stumbled upon the concept of Enterprise Data Fabric, which, to my surprise, describes a lot of these ideas.


6 answers



Definitely give Terracotta a try. It's free (unless you go Enterprise which has an SLA and support). It is a JVM-level cluster, so to speak, so you don't have the issues associated with sessions on multiple boxes behind disparate JK workers (assuming you're using this for a J2EE app).


I'm just rambling, so have a look here: http://en.wikipedia.org/wiki/Terracotta_Cluster


UPDATE: There are numerous bits of info on Terracotta on the web too, e.g. http://blog.terracottatech.com/2007/12/fud_of_the_week_terracotta_doe.html


UPDATE 2: A bit of background on my views: I work for a company with a fairly big audience. We have an enterprise MySQL running with a master and about 5 slaves (times 2, considering we have 2 channels, with 4 app servers per channel), using MySQL's JDBC Replication driver (for which we've already submitted various patches). We use Spring 2.5/Hibernate 3 with Spring's declarative JTA transaction management, so read-only transactions go to the slaves. With the advent of numerous Ajax enhancements in a future version of our site, our DB servers' load has gone up - we create pricing summaries for thousands of products for all countries, taking into account duties/tax rules for all these countries (plus promotions and real-time auctions running all the time), so that the Ajax services have the latest prices in a blink. Terracotta takes the load off the DB and app servers by making these prices available to all app servers at the JVM layer, with all the JVMs across the boxes linked. So server A can update the prices every few minutes, and if Ajax hits server B, the prices are available immediately. I know there are people/companies out there with similar businesses who probably have better ideas and implementations, so I'm always open to discussion, but this is my two cents.


I get inspiration from the guys at Facebook too, for instance this very informative article: http://www.facebook.com/note.php?note_id=23844338919


They talk about memcached, which you should also definitely check out.




As Neo4j is mentioned in the question, I'm chiming in with a few thoughts on using a graph database in this case. (I'm part of the Neo4j team)


  • retrieving children is trivial in a graph DB
  • there is a map implementation for Neo4j
  • as graphs are native to a graph DB, you could consider not flattening the object graph and instead persisting data in nodes and edges/relationships (this gives you more flexibility in handling the data)
  • Neo4j is fully transactional
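
To make the nodes-and-relationships point concrete, here is a tiny in-memory sketch. It is not the Neo4j API: `GraphNode`, `relateTo` and `children` are hypothetical names, and the sketch has no persistence or transactions. It only shows that children fall out of following one relationship type, with no flattening step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of nodes and typed relationships; NOT the Neo4j API.
final class GraphNode {
    final Map<String, Object> properties = new HashMap<>();
    // Relationship type -> target nodes, e.g. "HAS_ITEM" -> [item1, item2].
    private final Map<String, List<GraphNode>> outgoing = new HashMap<>();

    // Sets a property and returns this node so calls can be chained.
    GraphNode set(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Adds a directed relationship of the given type from this node to the target.
    void relateTo(String type, GraphNode target) {
        outgoing.computeIfAbsent(type, t -> new ArrayList<>()).add(target);
    }

    // Children are simply the nodes reachable over one relationship type.
    List<GraphNode> children(String type) {
        return outgoing.getOrDefault(type, Collections.emptyList());
    }
}
```

For example, an order node related to its item nodes via a `"HAS_ITEM"` relationship returns those items from `order.children("HAS_ITEM")`.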

With the new DB technologies emerging today, there's really no need to stay with an RDBMS if your data isn't a good fit for the relational paradigm.




Seems to me Terracotta is a perfect fit for your requirements:


  • cluster a map to retrieve children via keys (e.g. a clustered Map)
  • map of maps - no problem
  • no explicit bin log - but Terracotta already persists everything to disk, so a full cluster restart is already supported
  • already integrated with Compass, Hibernate Search, and Lucene for search
  • Transactions? Too slow. Use the cache as a datastore. With persistence you won't lose data when writing to (clustered) memory, and it can trickle back to the DB.

In addition, Terracotta does the "reflection" thing you ask for, although it doesn't use reflection, as that is far too slow. It uses bytecode manipulation (BCM). Only changes are propagated over the network.


Hazelcast, by the way, requires serialization, so it will be slow and will not do well at all with a map-of-maps data structure (every put will result in a full deep-clone copy across the network), and it doesn't have any kind of persistence built in.
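
The serialization cost is easy to see with plain Java serialization. The sketch below does not touch Hazelcast at all; `SerializedSize` and its sample data are hypothetical, and it only measures how many bytes are produced when a whole inner map is written out as one value, which is what happens if each "table" is stored as a single entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Sketch: measures how many bytes standard Java serialization produces for a value.
final class SerializedSize {
    static int of(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Builds an inner "table" with the given number of rows (hypothetical sample data).
    static HashMap<Long, String> table(int rows) {
        HashMap<Long, String> table = new HashMap<>();
        for (long id = 0; id < rows; id++) {
            table.put(id, "row-" + id);
        }
        return table;
    }

    public static void main(String[] args) {
        // If the whole inner map is one value, every put re-serializes all of its rows.
        System.out.println("10 rows:    " + of(table(10)) + " bytes");
        System.out.println("10000 rows: " + of(table(10_000)) + " bytes");
    }
}
```

One way around this, assuming the store allows it, is to keep one distributed map per type and store each object under its own id, so that a write serializes a single object rather than the whole table.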






I have a view that we all develop a zoo which comprises all the abstraction layers we habitually use in our projects. And each abstraction layer is a completely different animal.


My goal is to minimize the amount of time spent on just care and feeding of the animals whenever it diverts me from solving the problem at hand - it's overhead - wasted resources. So the fewer, simpler abstraction layers we can get away with, the more productive we are.


I can usually do just fine with two beasties - OOP and RDBMS, coupled through a nice, simple, minimal, hand-crafted DAL. For me, ORM is mostly overhead - one abstraction too many, and a pretty hungry one.


Don't discount the option of treating stored procedures as an abstraction tool, either. If you're real comfortable with SQL, it can be a useful resource for implementing a light-weight BL facade that means not needing to think about the ORM problem.


And this post suggests the emergence of alternatives to RDBMS for some requirements, anyway.




Thanks for your answers.


Actually, you talk about DBs, which is something I want to take out of the picture completely.


The use case I'm targeting is a startup's small/medium-sized clustered webapp (boxes in a LAN, or in the cloud). It needs to retrieve objects at ~RAM-speed levels and scale fairly easily. As a side effect, one wouldn't have to think about DB server installations, impedance mismatch, JDBC, caches, polluting domain models with annotations, etc.


Again, what I want to accomplish is something like what is described here, and I would love to have some more feedback on ideas concerning the actual implementation (why use Terracotta instead of Hazelcast, whether to use serialization or deep cloning via reflection or something else, and also the major drawbacks of an approach like this, e.g. why you wouldn't swap your current ORM/DB setup for it).
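
For the reflection option, this is roughly what I mean by 'dehydrating' an object. It is only a sketch under assumptions: `Dehydrator` and the `Customer` sample class are hypothetical, it copies field values as they are (references to children are not followed, where a real version would replace them with ids or recurse), and a field that shadows a superclass field of the same name would collide in the map.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flattens an object's instance fields into a name -> value map.
final class Dehydrator {
    // Sample domain class, free of persistence annotations (illustration only).
    static final class Customer {
        private final long id;
        private final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static Map<String, Object> dehydrate(Object source) {
        Map<String, Object> state = new LinkedHashMap<>();
        // Walk up the class hierarchy so inherited fields are captured too.
        for (Class<?> c = source.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip class-level and compiler-generated fields
                }
                field.setAccessible(true); // may be refused for classes in closed JDK modules
                try {
                    state.put(field.getName(), field.get(source));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field, e);
                }
            }
        }
        return state;
    }
}
```

The resulting map is what would be stored per id; rehydrating would do the reverse, and serialization would replace this step with writing the object out as bytes.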


It has to be super simple to integrate, so it'll feature a really neat Java API, improving code readability. No other software (DB, memcached) will be required.




Try GigaSpaces. I think they have exactly what you require, and if I'm not mistaken there's a free version for startups.


Some concepts:


  • "Space" is some place where you can store and retrieve objects
  • Space can be backed by any JDBC-compliant DB, automatically (no code, only configuration)
  • Space can be started in your Java process, so all accesses are at RAM speed
  • Space can be clustered/partitioned in any way you want (full mirror, partial, grid)
  • Space supports distributed or local transactions

Check their wiki (but only the "programmer's guide"; all the rest is marketing BS).


