
Neo4j: Cypher - Detecting duplicates using relationships

Author: 囡囡需要嗳 | Source: Internet | 2023-08-20 10:49


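For the past few months I've been building a graph of computer science papers, and now that I've loaded a few thousand of them I've realised that there are quite a few duplicates.

They aren't duplicates in the sense of multiple entries sharing the same identifier. Rather, they have different identifiers but appear to be the same paper!

For example, there are a couple of papers titled "Authentication in the Taos operating system":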

http://dl.acm.org/citation.cfm?id=174614


http://dl.acm.org/citation.cfm?id=168640


As far as I can tell, this is the same paper published in two different journals.

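Now, in this case it's easy to run a string similarity comparison on the titles of these papers and realise that they're identical. I've previously used the excellent dedupe library to do this, and there's also a great talk from Berlin Buzzwords 2014 in which the author uses locality-sensitive hashing to achieve a similar outcome.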
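As a rough illustration of the title comparison, here is a minimal sketch that uses Python's standard-library `difflib` rather than the dedupe library or locality-sensitive hashing mentioned above. The function name `title_similarity` and the normalisation step are assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lower-case the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1; 1.0 means the normalised titles match exactly."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


# Titles taken from the article; the scoring approach is only a sketch.
same = title_similarity(
    "Authentication in the Taos operating system",
    "Authentication in the Taos operating system",
)
near = title_similarity(
    "Performance of Firefly RPC",
    "Performance of the Firefly RPC",
)
print(same, near)
```

Identical titles score 1.0, while titles that differ by a single word score slightly lower, so any cut-off for "probably the same paper" would need tuning on real data.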

But I was curious whether I could use any of the relationships these papers have to detect duplicates, rather than relying on string matching alone.

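The graph looks like this: Resource nodes (the papers) are connected to the papers they cite by REFERENCES relationships, and authors are connected to their papers by AUTHORED relationships.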

We'll start by writing a query to see how many references the different Taos papers have in common:

    // Collect the papers that paper 168640 references
    MATCH (r:Resource {id: "168640"})-[:REFERENCES]->(other)
    WITH r, COLLECT(other) AS myReferences

    // For each of those references, find every paper that also cites it
    UNWIND myReferences AS reference
    OPTIONAL MATCH path = (other)-[:REFERENCES]->(reference)
    WITH other, COUNT(path) AS otherReferences, SIZE(myReferences) AS myReferences

    // Fraction of the starting paper's references that the other paper shares
    WITH other, 1.0 * otherReferences / myReferences AS similarity
    WHERE similarity > 0.5

    RETURN other.id, other.title, similarity
    ORDER BY similarity DESC
    LIMIT 10

    ╒════════╤═══════════════════════════════════════════╤══════════╕
    │other.id│other.title                                │similarity│
    ╞════════╪═══════════════════════════════════════════╪══════════╡
    │168640  │Authentication in the Taos operating system│1         │
    ├────────┼───────────────────────────────────────────┼──────────┤
    │174614  │Authentication in the Taos operating system│1         │
    └────────┴───────────────────────────────────────────┴──────────┘

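This query:

picks one of the Taos papers and finds its references
finds other papers that reference those same papers
calculates a similarity score based on how many references they have in common
returns the papers that share more than 50% of those references, with the most similar ones at the top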
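The same steps can be sketched outside the database. The example below is a minimal in-memory version, assuming the graph has been reduced to a dictionary that maps each paper id to the set of ids it references. The function name `similar_by_references` and the toy data are hypothetical and are not taken from the real graph.

```python
def similar_by_references(references, paper_id, threshold=0.5, limit=10):
    """Rank papers by the share of `paper_id`'s references that they also cite."""
    mine = references[paper_id]
    if not mine:
        # Mirrors the Cypher query: no references means no rows at all.
        return []
    scores = []
    for other_id, theirs in references.items():
        similarity = len(mine & theirs) / len(mine)
        if similarity > threshold:
            scores.append((other_id, similarity))
    # Most similar first; ties keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:limit]


# Hypothetical toy data: paper id -> ids of the papers it references.
references = {
    "A": {"r1", "r2", "r3", "r4"},
    "B": {"r1", "r2", "r3", "r4", "r5"},
    "C": {"r1", "r2", "r3"},
    "D": {"r1", "r9"},
}
result = similar_by_references(references, "A")
print(result)
```

As in the query output, the starting paper matches itself with a score of 1. The score is measured against the starting paper's reference count, so it can differ depending on which of two papers the comparison starts from.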

I tried it on a few other papers to see how it fared:

Performance of Firefly RPC

    ╒════════╤════════════════════════════════════════════════════════════════╤══════════════════╕
    │other.id│other.title                                                     │similarity        │
    ╞════════╪════════════════════════════════════════════════════════════════╪══════════════════╡
    │74859   │Performance of Firefly RPC                                      │1                 │
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │77653   │Performance of the Firefly RPC                                  │0.8333333333333334│
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │110815  │The X-Kernel: An Architecture for Implementing Network Protocols│0.6666666666666666│
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │96281   │Experiences with the Amoeba distributed operating system        │0.6666666666666666│
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │74861   │Lightweight remote procedure call                               │0.6666666666666666│
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │106985  │The interaction of architecture and operating system design     │0.6666666666666666│
    ├────────┼────────────────────────────────────────────────────────────────┼──────────────────┤
    │77650   │Lightweight remote procedure call                               │0.6666666666666666│
    └────────┴────────────────────────────────────────────────────────────────┴──────────────────┘

Authentication in distributed systems: theory and practice

    ╒════════╤══════════════════════════════════════════════════════════╤══════════════════╕
    │other.id│other.title                                               │similarity        │
    ╞════════╪══════════════════════════════════════════════════════════╪══════════════════╡
    │121160  │Authentication in distributed systems: theory and practice│1                 │
    ├────────┼──────────────────────────────────────────────────────────┼──────────────────┤
    │138874  │Authentication in distributed systems: theory and practice│0.9090909090909091│
    └────────┴──────────────────────────────────────────────────────────┴──────────────────┘

Sadly, it's not as simple as looking for a 100% match on references! I expect that later revisions of a paper add more content, and therefore more references.
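A small worked example shows why a revision with extra references stops being a 100% match. The reference sets below are hypothetical. The actual reference counts behind the 0.909 score are not shown in the output, although a ratio of 10/11 would be consistent with it.

```python
def overlap(mine, theirs):
    """Share of the references in `mine` that also appear in `theirs`."""
    return len(mine & theirs) / len(mine)


# Hypothetical data: the revision cites everything the original does, plus one more paper.
original = {f"ref{i}" for i in range(10)}
revision = original | {"ref10"}

from_revision = overlap(revision, original)  # 10 shared out of 11
from_original = overlap(original, revision)  # 10 shared out of 10
print(from_revision, from_original)
```

Starting from the longer reference list gives a score below 1, while starting from the shorter one still gives exactly 1, which is why a strict 100% threshold would miss some duplicates.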

What if we look for author similarity as well?

    // Reference similarity, as before, keeping r in scope for the author step
    MATCH (r:Resource {id: "121160"})-[:REFERENCES]->(other)
    WITH r, COLLECT(other) AS myReferences

    UNWIND myReferences AS reference
    OPTIONAL MATCH path = (other)-[:REFERENCES]->(reference)
    WITH r, other, COUNT(path) AS otherReferences, SIZE(myReferences) AS myReferences
    WITH r, other, 1.0 * otherReferences / myReferences AS referenceSimilarity
    WHERE referenceSimilarity > 0.5

    // Author similarity: share of r's authors who also wrote the other paper
    MATCH (r)<-[:AUTHORED]-(author)
    WITH other, referenceSimilarity, COLLECT(author) AS myAuthors

    UNWIND myAuthors AS author
    OPTIONAL MATCH path = (other)<-[:AUTHORED]-(author)
    WITH other, referenceSimilarity, COUNT(path) AS otherAuthors, SIZE(myAuthors) AS myAuthors
    WITH other, referenceSimilarity, 1.0 * otherAuthors / myAuthors AS authorSimilarity
    WHERE authorSimilarity > 0.5

    RETURN other.id, other.title, referenceSimilarity, authorSimilarity
    ORDER BY (referenceSimilarity + authorSimilarity) DESC
    LIMIT 10

    ╒════════╤══════════════════════════════════════════════════════════╤═══════════════════╤════════════════╕
    │other.id│other.title                                               │referenceSimilarity│authorSimilarity│
    ╞════════╪══════════════════════════════════════════════════════════╪═══════════════════╪════════════════╡
    │121160  │Authentication in distributed systems: theory and practice│1                  │1               │
    ├────────┼──────────────────────────────────────────────────────────┼───────────────────┼────────────────┤
    │138874  │Authentication in distributed systems: theory and practice│0.9090909090909091 │1               │
    └────────┴──────────────────────────────────────────────────────────┴───────────────────┴────────────────┘

    ╒════════╤══════════════════════════════╤═══════════════════╤════════════════╕
    │other.id│other.title                   │referenceSimilarity│authorSimilarity│
    ╞════════╪══════════════════════════════╪═══════════════════╪════════════════╡
    │74859   │Performance of Firefly RPC    │1                  │1               │
    ├────────┼──────────────────────────────┼───────────────────┼────────────────┤
    │77653   │Performance of the Firefly RPC│0.8333333333333334 │1               │
    └────────┴──────────────────────────────┴───────────────────┴────────────────┘
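The combined check can be sketched in memory as well. This is a minimal illustration, assuming each paper is reduced to a set of reference ids and a set of author ids. The names `shared_fraction` and `likely_duplicates`, and the toy data, are hypothetical.

```python
def shared_fraction(mine, theirs):
    """Share of `mine` that also appears in `theirs` (0.0 when `mine` is empty)."""
    return len(mine & theirs) / len(mine) if mine else 0.0


def likely_duplicates(papers, paper_id, threshold=0.5, limit=10):
    """Keep papers that pass both thresholds, ordered by the sum of the two scores."""
    me = papers[paper_id]
    rows = []
    for other_id, other in papers.items():
        reference_similarity = shared_fraction(me["references"], other["references"])
        author_similarity = shared_fraction(me["authors"], other["authors"])
        if reference_similarity > threshold and author_similarity > threshold:
            rows.append((other_id, reference_similarity, author_similarity))
    rows.sort(key=lambda row: row[1] + row[2], reverse=True)
    return rows[:limit]


# Hypothetical toy data, not taken from the real graph.
papers = {
    "p1": {"references": {"a", "b", "c", "d"}, "authors": {"x", "y"}},
    "p2": {"references": {"a", "b", "c"}, "authors": {"x", "y"}},
    "p3": {"references": {"a", "b", "c", "d"}, "authors": {"z"}},
}
rows = likely_duplicates(papers, "p1")
print(rows)
```

Here "p3" shares every reference with "p1" but none of its authors, so it is filtered out. That is the effect the author filter has on the unrelated papers in the earlier Firefly RPC results.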

I'm sure I could find other papers for which neither of these similarity measures works well, but it's an interesting start.

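I think the next step is to build a training set of document pairs, some that are similar to each other and some that aren't. We could then train a classifier to determine whether two documents are the same.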
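One possible shape for such a training set is sketched below. It assumes each labelled pair is turned into three features: reference overlap, author overlap, and title similarity. The `Paper` type, the feature choice, and the reference and author sets are all made up for illustration. Only the titles come from the article, and no classifier is trained here.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Paper(NamedTuple):
    title: str
    references: frozenset
    authors: frozenset


def shared_fraction(mine, theirs):
    """Share of `mine` that also appears in `theirs` (0.0 when `mine` is empty)."""
    return len(mine & theirs) / len(mine) if mine else 0.0


def pair_features(a: Paper, b: Paper) -> list:
    """Feature vector for one pair: reference overlap, author overlap, title ratio."""
    return [
        shared_fraction(a.references, b.references),
        shared_fraction(a.authors, b.authors),
        SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio(),
    ]


# Hypothetical papers: the reference and author sets are invented.
p1 = Paper("Performance of Firefly RPC", frozenset({"a", "b", "c"}), frozenset({"x", "y"}))
p2 = Paper("Performance of the Firefly RPC", frozenset({"a", "b", "c", "d"}), frozenset({"x", "y"}))
p3 = Paper("Lightweight remote procedure call", frozenset({"a", "b"}), frozenset({"z"}))

# Hand-labelled pairs: 1 = same paper, 0 = different papers.
labelled_pairs = [(p1, p2, 1), (p1, p3, 0)]
X = [pair_features(a, b) for a, b, _ in labelled_pairs]
y = [label for _, _, label in labelled_pairs]
print(X, y)
```

Feature rows like `X` and labels like `y` are the usual input for an off-the-shelf classifier, and a useful model would need far more than two labelled pairs.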

But that's for another day!

Translated from: https://www.javacodegeeks.com/2016/07/neo4j-cypher-detecting-duplicates-using-relationships.html
