data.table roll="nearest" returns multiple results

Author: 我心永恒2602922374_902 | Source: Internet | 2023-09-25 18:46

I'm attempting to use data.table to match the nearest decimal value in a vector but am running into a situation where more than one result is returned. The simplified example below returns two values, 0.1818182 0.2727273, but using a less precise value for x (e.g. 0.0275) returns a single match (0.1818182).

library(data.table)

x = 0.0275016249293408
dt = data.table(rnk = c(0, 0.0909090909090909, 
                        0.181818181818182, 0.272727272727273),
                val = c(0.0233775088495975, 0.0270831481152598, 
                        0.0275016216267234, 0.0275016249293408),
                key="val")
dt[J(x), roll="nearest"][, ifelse(is.na(val), NA_real_, rnk)]

I'm assuming the problem is related to the precision of the numeric values I'm using for this comparison. Is there a limitation to the decimal precision that can be used for a nearest match (i.e. do I need to round the data points)? Is there a better way to accomplish this nearest match?

2 Answers

#1

Referring to Matt's answer, there is an easy way to use all 15 significant digits that a double offers in order to select the closest matching row properly. Instead of working on the original values, one can scale the values up so that all 15 significant digits lie above the 10^(-8) level. This could be done as follows:

orig_vals <- dt[,val]
scale_fact <- max(10^(trunc(log10(abs(orig_vals)))+8))
scaled_vals <- orig_vals * scale_fact
dt[,scaled_val:=scaled_vals]
setkey(dt,scaled_val)
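
To see why scaling helps, the following base-R sketch compares the gap between the two closest values, before and after scaling, against the tolerance sqrt(.Machine$double.eps) quoted in answer #2. It assumes the tolerance acts on the absolute difference, which may not match data.table's internals exactly; the names tol, gap_orig and gap_scaled are illustrative only.

```r
# Values of the val column from the question
orig_vals <- c(0.0233775088495975, 0.0270831481152598,
               0.0275016216267234, 0.0275016249293408)

# Tolerance quoted for data.table v1.8.10 (assumed here to be absolute)
tol <- sqrt(.Machine$double.eps)

# Same scaling factor as above: every value is of order 10^-2, so this is 10^7
scale_fact  <- max(10^(trunc(log10(abs(orig_vals))) + 8))
scaled_vals <- orig_vals * scale_fact

# Gap between the two closest values, before and after scaling
gap_orig   <- orig_vals[4] - orig_vals[3]
gap_scaled <- scaled_vals[4] - scaled_vals[3]

gap_orig < tol     # TRUE: the two values fall within tolerance of each other
gap_scaled > tol   # TRUE: after scaling they are clearly separated
```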

Now, performing the rolling join

scaled_x <- x*scale_fact
dt[J(scaled_x), roll="nearest"][, ifelse(is.na(val), NA_real_, rnk)]

# [1] 0.2727273

yields - as desired - a single value.

If also in the case of two identical key values only one row should be selected, the mult="first" argument can be added to the above data.table call.
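
As a cross-check that does not depend on data.table's join tolerance, the strictly nearest value can also be found in base R with which.min() on the absolute differences. This is only a sketch of an alternative, not the method proposed in this answer, and it scans the whole vector instead of using a keyed binary search. The names nearest_idx and nearest_rnk are illustrative.

```r
# Data from the question
x   <- 0.0275016249293408
rnk <- c(0, 0.0909090909090909, 0.181818181818182, 0.272727272727273)
val <- c(0.0233775088495975, 0.0270831481152598,
         0.0275016216267234, 0.0275016249293408)

# Index of the value with the smallest absolute distance to x;
# which.min() returns the first index if several distances tie
nearest_idx <- which.min(abs(val - x))
nearest_rnk <- rnk[nearest_idx]
nearest_rnk
```

Because which.min() always returns a single index, this approach yields one row even when two values are identical, much like adding mult="first" to the join.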

#2

Yes, data.table automatically applies a tolerance when joining and grouping numeric columns. The tolerance in v1.8.10 is sqrt(.Machine$double.eps) == 1.490116e-08. This comes directly from ?base::all.equal.

To illustrate, consider grouping :

> dt
          rnk        val
1: 0.00000000 0.02337751
2: 0.09090909 0.02708315
3: 0.18181818 0.02750162
4: 0.27272727 0.02750162

> dt[,.N,by=val]
          val N
1: 0.02337751 1
2: 0.02708315 1
3: 0.02750162 2    # one group, size two
>

When you joined using dt[J(x), roll="nearest"], that x value matched to within tolerance and you got the group it matched to, as usual when a matching value occurs in a rolling join. roll="nearest" only applies to the values that don't match, outside tolerance.
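
The behaviour described above can be reproduced by hand in base R. The sketch below assumes the tolerance is applied as an absolute difference between x and each key value, which is a simplification of whatever data.table does internally; within_tol, x_coarse and nearest_coarse are illustrative names.

```r
tol <- sqrt(.Machine$double.eps)
val <- c(0.0233775088495975, 0.0270831481152598,
         0.0275016216267234, 0.0275016249293408)

# Precise x from the question: rows 3 and 4 both lie within tolerance,
# so both are treated as exact matches and both are returned
x <- 0.0275016249293408
within_tol <- abs(val - x) < tol
within_tol

# Less precise x: nothing lies within tolerance, so roll="nearest"
# takes over and picks the single closest key (row 3)
x_coarse <- 0.0275
within_tol_coarse <- abs(val - x_coarse) < tol
any(within_tol_coarse)
nearest_coarse <- which.min(abs(val - x_coarse))
nearest_coarse
```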

data.table considers the values in rows 3 and 4 of val to be equal. The thinking behind this is for convenience, since most of the time key values are really a fixed precision such as prices ($1.23) or recorded measurements to a specified precision (1.234567). We'd like to join and group such numerics even after multiplying them for example, without needing to code for machine accuracy ourselves. And we'd like to avoid confusion when numeric data displays as though it's equal in a table, but isn't due to very tiny differences in the bit representation.

See ?unique.data.table for this example :

DT = data.table(a=tan(pi*(1/4 + 1:10)), b=rep(1,10))   # example from ?all.equal
length(unique(DT$a))         # 10 strictly unique floating point values
all.equal(DT$a,rep(1,10))    # TRUE, all within tolerance of 1.0
DT[,which.min(a)]            # row 10, the strictly smallest floating point value
identical(unique(DT),DT[1])  # TRUE, stable within tolerance
identical(unique(DT),DT[10]) # FALSE

data.table is also stable within tolerance; i.e., when you group by a numeric, the original order of the items within that group is maintained as usual.

> dt$val[3] < dt$val[4]   # row 3 is strictly less than row 4, but within tolerance
[1] TRUE
> dt[, row:=1:4]  # add a row number to illustrate
> dt[, list(.N, list(row)), by=val]
          val N  V2
1: 0.02337751 1   1
2: 0.02708315 1   2
3: 0.02750162 2 3,4
> dt[3:4, val:=rev(val)]   # swap the two values around
> dt$val[3] > dt$val[4]
[1] TRUE
> dt[, list(.N, list(row)), by=val]
          val N  V2
1: 0.02337751 1   1
2: 0.02708315 1   2
3: 0.02750162 2 3,4    # same result, consistent. stable within tolerance
