Yes, `data.table` automatically applies a tolerance when joining and grouping numeric columns. The tolerance in v1.8.10 is `sqrt(.Machine$double.eps) == 1.490116e-08`. This comes directly from `?base::all.equal`.
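As a rough illustration of what that tolerance means, here is a minimal base R sketch. It only uses `all.equal` from base R to mimic the idea; it is not `data.table`'s internal comparison code, and the values `a`, `b` and `c` are made up for the example.

```r
# Base R only: compare exact equality with equality-within-tolerance.
# The variables a, b, c are hypothetical values chosen for illustration.
tol <- sqrt(.Machine$double.eps)
print(tol)                      # [1] 1.490116e-08

a <- 1
b <- 1 + 1e-10                  # differs from a by much less than tol
c <- 1 + 1e-6                   # differs from a by more than tol

print(a == b)                   # FALSE: the bit patterns differ
print(isTRUE(all.equal(a, b)))  # TRUE: equal within the default tolerance
print(isTRUE(all.equal(a, c)))  # FALSE: outside the default tolerance
```

Note that `all.equal` returns a character description rather than `FALSE` when values differ, which is why it is wrapped in `isTRUE()`.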
To illustrate, consider grouping:
> dt
          rnk        val
1: 0.00000000 0.02337751
2: 0.09090909 0.02708315
3: 0.18181818 0.02750162
4: 0.27272727 0.02750162
> dt[,.N,by=val]
          val N
1: 0.02337751 1
2: 0.02708315 1
3: 0.02750162 2   # one group, size two
When you joined using `dt[J(x), roll="nearest"]`, that `x` value matched to within tolerance and you got the group it matched to, as usual when a matching value occurs in a rolling join. `roll="nearest"` only applies to the values that don't match, i.e. those outside tolerance.
`data.table` considers the values in rows 3 and 4 of `val` to be equal. The thinking behind this is convenience: most of the time key values really have a fixed precision, such as prices ($1.23) or measurements recorded to a specified precision (1.234567). We'd like to join and group such numerics even after multiplying them, for example, without needing to code for machine accuracy ourselves. And we'd like to avoid confusion when numeric data displays as though it's equal in a table but isn't, due to very tiny differences in the bit representation.
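The kind of surprise this is meant to avoid can be reproduced in base R. The following is only a sketch with made-up values (`tick`, `computed` and `typed` are hypothetical names), showing a fixed-precision number that prints identically after multiplication but is not bit-for-bit equal:

```r
# Base R only: a fixed-precision value multiplied up no longer matches
# the "same" value typed in directly, although both print alike.
tick     <- 0.1
computed <- tick * 3            # arithmetic result
typed    <- 0.3                 # the value a user would key in

print(computed)                 # displays as 0.3
print(typed)                    # displays as 0.3
print(computed == typed)        # FALSE: tiny difference in the bits
print(isTRUE(all.equal(computed, typed)))  # TRUE: equal within tolerance

# Show enough digits to reveal the difference
print(sprintf("%.17g", c(computed, typed)))
```

Grouping or joining on such a column with exact comparison would split `computed` and `typed` into separate groups even though the printed table shows the same number for both.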
See `?unique.data.table` for this example:
DT = data.table(a=tan(pi*(1/4 + 1:10)), b=rep(1,10)) # example from ?all.equal
length(unique(DT$a)) # 10 strictly unique floating point values
all.equal(DT$a,rep(1,10)) # TRUE, all within tolerance of 1.0
DT[,which.min(a)] # row 10, the strictly smallest floating point value
identical(unique(DT),DT[1]) # TRUE, stable within tolerance
identical(unique(DT),DT[10]) # FALSE
`data.table` is also stable within tolerance; i.e., when you group by a numeric column, the original order of the items within that group is maintained as usual.
> dt$val[3] < dt$val[4]
[1] TRUE
> dt[, row:=1:4]   # add a row number to illustrate
> dt[, list(.N, list(row)), by=val]
          val N  V2
1: 0.02337751 1   1
2: 0.02708315 1   2
3: 0.02750162 2 3,4
> dt[3:4, val:=rev(val)]   # swap the two values around
> dt$val[3] > dt$val[4]
[1] TRUE
> dt[, list(.N, list(row)), by=val]
          val N  V2
1: 0.02337751 1   1
2: 0.02708315 1   2
3: 0.02750162 2 3,4   # same result, consistent. stable within tolerance