R-按行X中的信息平均列

提问者：小点点

R-按行X中的信息平均列

我有一个data. frame，其中前13行包含站点/观察信息。每列代表1个人，但是大多数人都有A和B观察（尽管有些人只有A，而少数人有A、B和C观察）。我想平均每一行每个人，并根据这些信息创建一个新的data.frame。

示例（第1行、第7行、第13行和第56-61行的小子集）：

      OriginalID        Tree003A        Tree003B        Tree008B        Tree013A    
1           Township              LY              LY              LY              LY    
7         COFECHA ID        LY1A003A        LY1A003B        LY1A008B        LY1A013A    
13        PathLength         37.5455         54.8963         57.9732         64.0679    
56              2006           1.538           1.915           0.827           2.722    
57              2007           1.357           1.923           0.854           2.224    
58              2008           1.311           2.204           0.669           2.515    
59              2009           0.702           1.125           0.382           2.413    
60              2010           0.937           1.556           0.907           2.315    
61              2011           0.942           1.268           1.514           1.858

我想创建一个新的data. frame来平均每个人的年度观测值，无论他们有A、A和B，还是A B和C观测值。个人的ID在第7行（COFECHAID）：

预期输出：

      OriginalID           Tree003avg      Tree008avg        Tree013avg    
1           Township              LY              LY              LY    
7         COFECHA ID        LY1A003avg     LY1A008avg        LY1A013avg    
13        PathLength         46.2209         57.9732         64.0679    
56              2006           1.727           0.827           2.722    
57              2007           1.640           0.854           2.224    
58              2008           1.758           0.669           2.515    
59              2009           0.914           0.382           2.413    
60              2010           1.247           0.907           2.315    
61              2011           1.105           1.514           1.858

关于如何平均列的任何想法都很棒。我一直在尝试修改以下代码，但由于data. frame顶部有13行附加信息，我不知道如何指定仅平均行14:61。

rowMeans（子集（LY011B， select=c（"LY1A003A"，"LY1A003B"）），na.rm=TRUE）

我正在使用的一组更大的数据的代码是：

> dput(LY011B)
structure(list(OriginalTreeID = structure(c(58L, 53L, 57L, 59L, 
51L, 61L, 50L, 55L, 56L, 60L, 54L, 49L, 52L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 
45L, 46L, 47L, 48L), .Label = c("1964", "1965", "1966", "1967", 
"1968", "1969", "1970", "1971", "1972", "1973", "1974", "1975", 
"1976", "1977", "1978", "1979", "1980", "1981", "1982", "1983", 
"1984", "1985", "1986", "1987", "1988", "1989", "1990", "1991", 
"1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", 
"2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", 
"2008", "2009", "2010", "2011", "AnalysisDateTime", "COFECHA ID", 
"CoreLetter", "PathLength", "Plot#", "RingCount", "SiteID", "SP", 
"Subplot#", "Township", "Tree#", "YearLastRing", "YearLastWhiteWood"
), class = "factor"), Tree003A = structure(c(35L, 8L, 34L, 7L, 
34L, 21L, 36L, 31L, 37L, 30L, 32L, 29L, 33L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 23L, 22L, 25L, 28L, 27L, 24L, 26L, 20L, 16L, 
15L, 6L, 18L, 12L, 10L, 3L, 9L, 11L, 19L, 17L, 14L, 13L, 2L, 
4L, 5L), .Label = c("", "0.702", "0.803", "0.937", "0.942", "0.961", 
"003", "1", "1.09", "1.116", "1.124", "1.224", "1.311", "1.357", 
"1.471", "1.509", "1.538", "1.649", "1.679", "1.782", "1999", 
"2.084", "2.148", "2.162", "2.214", "2.313", "2.429", "2.848", 
"2/19/2014 11:06", "2011", "23017323011sp1", "24", "37.5455", 
"A", "LY", "LY1A003A", "sp1"), class = "factor"), Tree003B = structure(c(56L, 
19L, 54L, 18L, 55L, 49L, 57L, 51L, 58L, 50L, 52L, 48L, 53L, 1L, 
1L, 1L, 1L, 10L, 7L, 8L, 6L, 5L, 4L, 3L, 2L, 11L, 9L, 30L, 15L, 
24L, 20L, 23L, 33L, 37L, 42L, 13L, 44L, 36L, 12L, 16L, 21L, 27L, 
35L, 41L, 38L, 26L, 40L, 14L, 46L, 32L, 28L, 17L, 31L, 22L, 39L, 
43L, 45L, 47L, 25L, 34L, 29L), .Label = c("", "0.073", "0.092", 
"0.173", "0.174", "0.358", "0.413", "0.425", "0.58", "0.697", 
"0.719", "0.843", "0.883", "0.896", "0.937", "0.941", "0.964", 
"003", "1", "1.048", "1.067", "1.075", "1.097", "1.119", "1.125", 
"1.176", "1.207", "1.267", "1.268", "1.27", "1.297", "1.402", 
"1.429", "1.556", "1.662", "1.693", "1.704", "1.735", "1.76", 
"1.792", "1.816", "1.881", "1.915", "1.92", "1.923", "2.155", 
"2.204", "2/19/2014 11:06", "2000", "2011", "23017323011sp1", 
"48", "54.8963", "A", "B", "LY", "LY1A003B", "sp1"), class = "factor"), 
    Tree008B = structure(c(59L, 24L, 57L, 23L, 58L, 52L, 60L, 
    54L, 61L, 53L, 55L, 51L, 56L, 19L, 14L, 13L, 22L, 7L, 8L, 
    9L, 4L, 6L, 3L, 1L, 2L, 10L, 25L, 47L, 43L, 49L, 46L, 40L, 
    50L, 48L, 44L, 17L, 36L, 31L, 27L, 30L, 39L, 37L, 34L, 45L, 
    38L, 32L, 41L, 29L, 42L, 33L, 28L, 26L, 21L, 11L, 15L, 16L, 
    18L, 12L, 5L, 20L, 35L), .Label = c("0.302", "0.31", "0.318", 
    "0.357", "0.382", "0.412", "0.452", "0.476", "0.5", "0.539", 
    "0.591", "0.669", "0.673", "0.787", "0.79", "0.827", "0.835", 
    "0.854", "0.879", "0.907", "0.917", "0.967", "008", "1", 
    "1.027", "1.037", "1.141", "1.152", "1.172", "1.263", "1.383", 
    "1.411", "1.446", "1.498", "1.514", "1.611", "1.671", "1.685", 
    "1.695", "1.719", "1.783", "1.879", "1.884", "1.927", "1.97", 
    "2.019", "2.069", "2.35", "2.696", "2.979", "2/19/2014 11:06", 
    "2000", "2011", "23017323011sp1", "48", "57.9732", "A", "B", 
    "LY", "LY1A008B", "sp1"), class = "factor"), Tree013A = structure(c(45L, 
    6L, 44L, 5L, 44L, 38L, 46L, 40L, 47L, 39L, 42L, 37L, 43L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 10L, 
    13L, 8L, 22L, 14L, 18L, 24L, 4L, 11L, 25L, 7L, 36L, 41L, 
    33L, 29L, 17L, 28L, 23L, 21L, 16L, 26L, 15L, 3L, 20L, 12L, 
    2L, 9L, 34L, 35L, 27L, 32L, 31L, 30L, 19L), .Label = c("", 
    "0.608", "0.916", "0.945", "013", "1", "1.125", "1.18", "1.388", 
    "1.423", "1.493", "1.498", "1.554", "1.579", "1.619", "1.629", 
    "1.719", "1.756", "1.858", "1.867", "1.869", "1.876", "1.9", 
    "1.916", "2.023", "2.089", "2.224", "2.246", "2.247", "2.315", 
    "2.413", "2.515", "2.547", "2.645", "2.722", "2.785", "2/19/2014 11:11", 
    "2002", "2011", "23017323011sp1", "3.375", "34", "64.0679", 
    "A", "LY", "LY1A013A", "sp1"), class = "factor")), .Names = c("OriginalTreeID", 
"Tree003A", "Tree003B", "Tree008B", "Tree013A"), row.names = c(NA, 
61L), class = "data.frame")

共2个答案

匿名用户

这是另一种方法，其中大部分工作是通过使用reshape包重新排列数据来完成的。数据被“munging”后，可以使用cast函数将其重新排列成几乎任何你想要的东西。

# I'm used to the transpose
y = t(x)

# Make the first row the column names
# Also get rid of hashes. They make things difficult
library(stringr)
colnames(y) = str_replace( y[1,], "#", "" )
y = data.frame(y[-1,],check.names=FALSE)

# reshape the data by defining the "ID" variables
library(reshape)
z = melt(y,id.vars=c("Township","Plot","Subplot","Tree",
                     "CoreLetter","COFECHA ID","SiteID","SP","AnalysisDateTime"))
z$value = as.numeric(as.character(z$value))

# Now you can do lots of things!
# All the info you wanted is there, but it's in a different format
# than your "intended output"
cast( z, Tree ~ variable, mean, na.rm=TRUE )

# To get to your "intended output"
out = cast( z, Township + Plot + Subplot + Tree ~ variable, mean, na.rm=TRUE )
out[["COFECHA ID"]] = with(out,paste0(Township,Plot,Subplot,Tree,"avg"))
out2 = out[,c(1,ncol(out),8:(ncol(out)-1))]
out3 = cbind(colnames(out2),t(out2))
colnames(out3) = c("OriginalID",paste0("Tree",out$Tree,"avg"))

# For kicks, here are some other things. Have fun!
cast(z, Tree ~ variable, median, na.rm=TRUE ) # the median instead of the mean
cast(z, Tree + CoreLetter ~ variable ) # back to your original data
cast(z, CoreLetter ~ variable, length ) # How many measurements from each core?
cast(z, CoreLetter ~ variable, mean ) # The average across different cores

为了更多的乐趣！

library(ggplot2)
d = z[-c(1:16), ] # A not so pretty hack
colnames(d)[10] = "Year"
d$Year = as.integer(as.character(d$Year))
ggplot(d, aes(x=Year, y=value, group=Tree, color=Tree, shape=CoreLetter)) + 
  geom_point() + geom_smooth(method="loess",span=0.3)

这是否意味着21世纪初是干燥的？

匿名用户

试试这个……

d.f <- your data structure...above

子集数据

d.f <- d[-(1:13), -1]

c.n <- colnames(d.f)

构建分组变量

~~f~~

f <- d[4, 2:ncol(d)]

~~将数据帧拆分为子数据帧~~

d.f <- apply(d.f, 2, as.numeric)
d.f[is.na(d.f)] <- 0
d.f.g <- as.data.frame(t(d.f))
a <- split(d.f.g, f)

~~将分组平均值计算为colMeans（因为转置）~~

grp.means <- lapply(a, colMeans)

~~grp. means是一个数据帧列表，每个数据帧都包含每个grp的日期平均值。根据需要重新形成它，您可能需要再次转置。~~

相关问题 Android：在模块jefied-play-services-测量和jefied-play-services-测量-impl中发现重复类在Hashmap中查找匹配的键/值对如何迭代Hashmap并与同一Hashmap中的其他键进行组合以比较它们的对象 HashCode-如果相等的对象碰巧在同一个桶中散列会发生什么？如何防止对数组中类对象的重复引用？如何以及何时在HashMap中完成重新散列在hashmap或hashtable中重新散列的成本 HashMap如何识别内部数组中的哪些位置包含元素？当HashMap增加其大小时，HashMap中值的索引会发生什么？ @BeforeClass在ktor测试类中不工作 Jest vanilla JavaScript JSDOM刷新失败，切换beforeAll到before每一个后的第二次测试中断在笑话中，定义全局变量是否与在BeforeAll中定义相同？使用Junit重新运行失败的测试静态编程语言中@BeforeAll的正确解决方法是什么线程“main”java. lang.NoClassDefFoundError中的异常：在Intellij[Spring boot]中线程“main”java. lang.NoClassDefFoundError中的异常：org/apache/log4j/ProvisionNode 异步管道是否从服务中定义并从组件变量指向的可观察对象取消订阅？结合主体时不更新在模板中的异步管道可观察 AngularJs-RXJS可观察退订组件中的Angular 2重复订阅

~~热门标签 Java JavaScript Python PHP C# Android Html jQuery C++ Css IOS MySQL NodeJS~~

R-按行X中的信息平均列

共2个答案

相关问题

热门标签

微信关注