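Good afternoon,
I need to create a list of students who got all As this semester. I also need another list of students who got all As and Bs this semester. I'm not sure how to do this with the data I have. Below is what I have and what I'm looking for. Any ideas?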
library(tibble)
original_df <-
  tribble(~id, ~subject, ~grade,
          "001", "ela", "A+",
          "001", "math", "A",
          "001", "science", "A-",
          "002", "ela", "A",
          "002", "math", "B+",
          "002", "science", "B-",
          "003", "ela", "A",
          "003", "math", "A",
          "003", "science", "A-",
          "004", "ela", "C",
          "004", "math", "C",
          "004", "science", "A+"
  )
summarized_df <-
  tribble(~id, ~all_As, ~As_and_Bs,
          "001", 1, 0,
          "002", 0, 1,
          "003", 1, 0,
          "004", 0, 0
  )
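One option is to group by 'id' and then check that `all` grades match 'A' with a regex, or extract the letter by removing the +/- sign and check whether both 'A' and 'B' are present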
library(dplyr)
library(stringr)
original_df %>%
  group_by(id) %>%
  summarise(all_As = +(all(str_detect(grade, 'A'))),
            As_and_Bs = +(all(c('A', 'B') %in% str_remove(grade, '[-+]'))),
            .groups = 'drop')
-output
# A tibble: 4 x 3
# id all_As As_and_Bs
#* <chr> <int> <int>
#1 001 1 0
#2 002 0 1
#3 003 1 0
#4 004 0 0
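Since the question asks for lists of students, here is a minimal sketch of pulling the ids out of the summary. It assumes the same data and summary logic as above; the names `summary_df`, `all_A_ids` and `A_and_B_ids` are just illustrative and not from the original post.

```r
library(dplyr)
library(stringr)
library(tibble)

original_df <- tribble(
  ~id, ~subject, ~grade,
  "001", "ela", "A+", "001", "math", "A",  "001", "science", "A-",
  "002", "ela", "A",  "002", "math", "B+", "002", "science", "B-",
  "003", "ela", "A",  "003", "math", "A",  "003", "science", "A-",
  "004", "ela", "C",  "004", "math", "C",  "004", "science", "A+"
)

# Same per-student summary as in the answer above
summary_df <- original_df %>%
  group_by(id) %>%
  summarise(all_As = +(all(str_detect(grade, "A"))),
            As_and_Bs = +(all(c("A", "B") %in% str_remove(grade, "[-+]"))),
            .groups = "drop")

# Keep the flagged rows and extract the ids as plain character vectors
all_A_ids   <- summary_df %>% filter(all_As == 1) %>% pull(id)
A_and_B_ids <- summary_df %>% filter(As_and_Bs == 1) %>% pull(id)
```

With the sample data, `all_A_ids` holds "001" and "003", and `A_and_B_ids` holds "002".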
Or, as @BenBolker mentioned in the comments:
original_df %>%
  group_by(id) %>%
  summarise(all_As = all(grepl("^A", grade)),
            As_and_Bs = !all_As && all(grepl("^[AB]", grade)))
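Note that the two versions define `As_and_Bs` slightly differently: the first checks that both letters are present, while the comment version checks that every grade is an A or a B and that they are not all As. They agree on the sample data but can differ on other inputs. A small base R sketch, using two hypothetical students that are not in the original data:

```r
# Hypothetical grade vectors (assumptions for illustration only)
straight_Bs <- c("B+", "B", "B-")
with_a_C    <- c("A", "B", "C")

# Definition 1: both "A" and "B" occur among the letter grades
both_present <- function(grade) {
  all(c("A", "B") %in% substr(grade, 1, 1))
}

# Definition 2: only As and Bs, but not all As
only_AB_not_all_A <- function(grade) {
  !all(grepl("^A", grade)) && all(grepl("^[AB]", grade))
}

both_present(straight_Bs)       # FALSE: no A present
only_AB_not_all_A(straight_Bs)  # TRUE: every grade is a B
both_present(with_a_C)          # TRUE: A and B both occur, the C is ignored
only_AB_not_all_A(with_a_C)     # FALSE: the C fails the [AB] check
```

Which definition is right depends on whether a straight-B student, or a student with an A, a B and a C, should count.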
A data.table option
library(data.table)
setDT(original_df)[
  ,
  .(
    # all() is TRUE only when every grade starts with "A"
    all_As = +all(startsWith(grade, "A")),
    As_and_Bs = +all(c("A", "B") %in% substr(grade, 1, 1))
  ), id
]
gives
id all_As As_and_Bs
1: 001 1 0
2: 002 0 1
3: 003 1 0
4: 004 0 0
Another data.table option, which tries to keep the functions and their inputs as separate as possible so that it stays flexible.
library(data.table)
setDT(original_df)
only <- function(x, y) all(x == y)
incl <- function(x, y) all(x %in% y)
original_df[
  ,
  Map(
    function(l, f) f(l, substr(grade, 1, 1)),
    list(all_as = "A", all_bs = "B", as_and_bs = c("A", "B")),
    c(only, only, incl)
  ),
  by = id
]
# id all_as all_bs as_and_bs
#1: 001 TRUE FALSE FALSE
#2: 002 FALSE FALSE TRUE
#3: 003 TRUE FALSE FALSE
#4: 004 FALSE FALSE FALSE
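To see what the two helpers do on their own, here is a quick base R check on plain vectors. The vectors `letters_001` and `letters_002` are illustrative names; their values are the first letters of the grades for ids 001 and 002 in the sample data.

```r
only <- function(x, y) all(x == y)    # every element of y equals x
incl <- function(x, y) all(x %in% y)  # every element of x occurs in y

letters_001 <- c("A", "A", "A")  # id 001: A+, A, A-
letters_002 <- c("A", "B", "B")  # id 002: A, B+, B-

only("A", letters_001)           # TRUE
only("A", letters_002)           # FALSE
incl(c("A", "B"), letters_001)   # FALSE: no B present
incl(c("A", "B"), letters_002)   # TRUE
```

Adding another category is then a matter of adding one entry to the input list and one function to the function list in the `Map()` call.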
A tidyverse translation:
original_df %>%
  group_by(id) %>%
  mutate(subgrade = substr(grade, 1, 1)) %>%
  summarise(
    across(
      c(subgrade),
      list(
        all_as = ~only(x = "A", y = .x),
        all_bs = ~only(x = "B", y = .x),
        as_and_bs = ~incl(x = c("A", "B"), y = .x)
      ),
      .names = "{fn}"
    )
  )
#`summarise()` ungrouping output (override with `.groups` argument)
## A tibble: 4 x 4
# id all_as all_bs as_and_bs
# <chr> <lgl> <lgl> <lgl>
#1 001 TRUE FALSE FALSE
#2 002 FALSE FALSE TRUE
#3 003 TRUE FALSE FALSE
#4 004 FALSE FALSE FALSE