I want to do correlation analysis on many data frames

Author: liuyidii | Source: Internet | 2023-08-18 20:45

I have about 13 files, and I want to compute three correlations for each of them. All the files have the same structure; only the values differ.

For example, each file has the columns:

v1 v2 v3 v4 v5 v6 v7 v8 ... v50

The first correlation is between v6 and v20, the second between v7 and v21, and the third between v8 and v22.

My data have missing values.

Doing it manually for each file would make the script far too long, so I want to loop over all the files. Unfortunately I'm not an expert in loops, and I have tried a lot without success. I need help, please.

2 solutions

#1

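If 'd1', 'd2', ..., 'd13' are the datasets and their columns are in the same order, we can put the datasets in a list and compute cor for the specified columns. The use argument described in ?cor controls how missing values are handled. Here I used na.or.complete, which drops every row that has a missing value in any of the selected columns. Change it as needed; for example, use = 'pairwise.complete.obs' drops a row only from the pairs in which that row has a missing value.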

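# gather d1 ... d13 into a named list; for each data frame, correlate columns 6:8
# with columns 20:22 and keep the diagonal (v6-v20, v7-v21, v8-v22)
lapply(mget(paste0('d', 1:13)), function(x)
      diag(cor(x[, 6:8], x[, 20:22], use = 'na.or.complete')))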
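To see what the use argument changes, here is a small illustrative check on simulated data. The data frame df, its values and the position of the NA are made up for the example; they are not from the question. cor() on two sets of three columns gives a 3x3 matrix, and diag() keeps only the three matched pairs.

```r
# Hypothetical data: 20 rows, columns v1..v22, one missing value in v6 (row 3)
set.seed(1)
df <- data.frame(matrix(rnorm(20 * 22), ncol = 22))
names(df) <- paste0("v", 1:22)
df$v6[3] <- NA

# Casewise deletion: row 3 is dropped from all three pairs
r_case <- diag(cor(df[, 6:8], df[, 20:22], use = "na.or.complete"))

# Pairwise deletion: row 3 is dropped only from the v6-v20 pair
r_pair <- diag(cor(df[, 6:8], df[, 20:22], use = "pairwise.complete.obs"))

# Each diagonal entry is the correlation of one matched pair
all.equal(unname(r_pair[2]), cor(df$v7, df$v21))          # TRUE
all.equal(unname(r_case[2]), cor(df$v7[-3], df$v21[-3]))  # TRUE
```

With a single NA in v6, the casewise results for v7-v21 and v8-v22 are computed on 19 rows instead of 20, so they generally differ from the pairwise ones; which behaviour is right depends on whether all three correlations should be based on the same rows.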

It may be better to read the files directly into a list rather than create individual data.frame objects in the global environment. This assumes that all the files are in the working directory.

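files <- list.files(pattern = 'file\\d+\\.txt$')  # change the pattern as needed
res <- lapply(files, function(x) {
  x1 <- read.table(x, header = TRUE)
  # correlations v6-v20, v7-v21, v8-v22 (the diagonal of the 3x3 matrix)
  diag(cor(x1[, 6:8], x1[, 20:22], use = 'na.or.complete'))
})
names(res) <- files  # label each result with its file name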
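A self-contained sketch of the file-based version, assuming hypothetical files named file1.txt, file2.txt, ... that are generated here with write.table() in a temporary directory (in practice they would be your 13 files). It passes full.names = TRUE so the working directory does not matter, and stacks the per-file results into one matrix.

```r
# Write three hypothetical files (file1.txt ... file3.txt) to a temporary directory
set.seed(42)
demo_dir <- file.path(tempdir(), "cor_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  d <- data.frame(matrix(rnorm(30 * 50), ncol = 50))
  names(d) <- paste0("v", 1:50)
  d$v7[i] <- NA  # a missing value, written as "NA" by write.table()
  write.table(d, file.path(demo_dir, paste0("file", i, ".txt")), row.names = FALSE)
}

# Same lapply() approach as in the answer, but with full paths
files <- list.files(demo_dir, pattern = "^file\\d+\\.txt$", full.names = TRUE)
res <- lapply(files, function(f) {
  x1 <- read.table(f, header = TRUE)
  diag(cor(x1[, 6:8], x1[, 20:22], use = "na.or.complete"))
})
names(res) <- basename(files)

# One row per file, one column per pair
res_mat <- do.call(rbind, res)
colnames(res_mat) <- c("v6_v20", "v7_v21", "v8_v22")
res_mat
```

Note that list.files() returns names sorted alphabetically, so with 13 files file10.txt to file13.txt will come before file2.txt; reorder files if the order matters.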

#2

Here's a brute-force version (with data generation included). It'll probably work for your purpose, but a little more information about the structure of your data/task could help make this more efficient:

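N <- 10  # rows in the simulated data
k <- 50  # columns (data.frame() names them X1 ... X50)

d <- data.frame(matrix(runif(N * k), ncol = k))

# pair column col with column col - 14: 20 with 6, 21 with 7, 22 with 8
sapply(20:22, function(col) cor(d[, col - 14], d[, col], use = 'complete.obs'))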

Edit: The question has been edited, so I'm not sure this is still what you're after.
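The pairing used above (each column matched with the one 14 positions earlier) can also be combined with a loop over several data frames. A minimal sketch, assuming the 13 data frames are already in a named list; the names dfs and d1 ... d13 and the simulated values are hypothetical. use = "complete.obs" on two single columns drops only the rows missing in that particular pair.

```r
# Hypothetical example: 13 simulated data frames standing in for the 13 files
set.seed(7)
dfs <- lapply(1:13, function(i) {
  d <- data.frame(matrix(runif(10 * 50), ncol = 50))
  names(d) <- paste0("v", 1:50)
  d$v20[1] <- NA  # one missing value per data frame
  d
})
names(dfs) <- paste0("d", 1:13)

# Column pairs to correlate: v6-v20, v7-v21, v8-v22
x_cols <- paste0("v", 6:8)
y_cols <- paste0("v", 20:22)

# For each data frame, correlate each pair using rows where both values exist
res <- sapply(dfs, function(d) {
  mapply(function(a, b) cor(d[[a]], d[[b]], use = "complete.obs"), x_cols, y_cols)
})
rownames(res) <- paste(x_cols, y_cols, sep = "_")
res  # 3 rows (pairs) by 13 columns (data frames)
```

Selecting columns by name rather than by position keeps the code correct even if a file has its columns in a different order.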