Asked by: 小点点

Create dummy variables in R for students with all As OR As and Bs (Tidyverse)


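Good afternoon,

I need to create a list of students who have all As for the semester. I need to create another list of students who have all As and Bs for the semester. I can't figure out how to do this with the data I have. Below is what I have and what I'm looking for. Any ideas?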

library(tibble)  # provides tribble()

original_df <- 
  tribble(~id, ~subject, ~grade,
          "001", "ela", "A+",
          "001", "math", "A",
          "001", "science", "A-",
          "002", "ela", "A",
          "002", "math", "B+",
          "002", "science", "B-",
          "003", "ela", "A",
          "003", "math", "A",
          "003", "science", "A-",
          "004", "ela", "C",
          "004", "math", "C",
          "004", "science", "A+",
          )

summarized_df <- 
  tribble(~id, ~all_As, ~As_and_Bs,
          "001", 1, 0, 
          "002", 0, 1, 
          "003", 1, 0,
          "004", 0, 0
          )

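3 answers

Anonymous user

One approach is to group by 'id' and then either check for "A" with a regular expression, or strip the + and - modifiers to get the bare letter and check that both "A" and "B" are present.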

library(dplyr)
library(stringr)
original_df %>%
   group_by(id) %>% 
   summarise(all_As = +(all(str_detect(grade, 'A'))),
     As_and_Bs = +(all(c('A', 'B') %in% str_remove(grade, '[-+]'))),
        .groups = 'drop')

-output

# A tibble: 4 x 3
#  id    all_As As_and_Bs
#* <chr>  <int>     <int>
#1 001        1         0
#2 002        0         1
#3 003        1         0
#4 004        0         0

Or, as @BenBolker mentioned in the comments:

original_df %>%
   group_by(id) %>% 
   summarise(all_As=all(grepl("^A",grade)),
             As_and_Bs=!all_As && all(grepl("^[AB]",grade)))
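
The two snippets in this answer agree on the sample data, but they encode different rules: the first checks that both an A and a B are present, while the anchored-regex version checks that every grade is an A or a B. A minimal sketch of the difference, using a made-up `edge_df` (ids 005 and 006 are hypothetical and not part of the question's data; the column names `has_A_and_B` and `only_A_or_B` are also just for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical extra students, not part of the question's data:
# 005 has an A, a B and a C; 006 has only Bs.
edge_df <- tribble(
  ~id,   ~subject,  ~grade,
  "005", "ela",     "A",
  "005", "math",    "B+",
  "005", "science", "C",
  "006", "ela",     "B",
  "006", "math",    "B-",
  "006", "science", "B+"
)

edge_result <- edge_df %>%
  group_by(id) %>%
  summarise(
    # presence check: is there at least one A and at least one B?
    has_A_and_B = all(c("A", "B") %in% substr(grade, 1, 1)),
    # membership check: is every grade an A or a B?
    only_A_or_B = all(grepl("^[AB]", grade)),
    .groups = "drop"
  )

edge_result
```

Here 005 passes the presence check despite the C, and 006 passes only the membership check. Which rule is the right one depends on how such students should be counted, which the question does not spell out.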

Anonymous user

A data.table option

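library(data.table)

setDT(original_df)[
  ,
  .(
    # every grade must start with "A"
    all_As = +all(startsWith(grade, "A")),
    # both "A" and "B" must appear among the first letters
    As_and_Bs = +all(c("A", "B") %in% substr(grade, 1, 1))
  ), id
]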

    id all_As As_and_Bs
1: 001      1         0
2: 002      0         1
3: 003      1         0
4: 004      0         0

Anonymous user

Another data.table option, trying to keep the functions and inputs as separate as possible so that it stays flexible.

library(data.table)
setDT(original_df)

only <- function(x,y) all(x == y)
incl <- function(x,y) all(x %in% y)

original_df[
  , 
  Map(
    function(l,f) f(l, substr(grade, 1, 1)),
    list(all_as = "A", all_bs = "B", as_and_bs = c("A","B")),
    c(only, only, incl)
  ),
  by=id
]

#    id all_as all_bs as_and_bs
#1: 001   TRUE  FALSE     FALSE
#2: 002  FALSE  FALSE      TRUE
#3: 003   TRUE  FALSE     FALSE
#4: 004  FALSE  FALSE     FALSE
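
To see what the `Map()` call does inside each group, here is a minimal base R sketch that applies the same label/function pairing to a single student. The vector `one_student` is a hypothetical stand-in for `substr(grade, 1, 1)` within one `id`; it is not taken from the question's data.

```r
only <- function(x, y) all(x == y)
incl <- function(x, y) all(x %in% y)

# Hypothetical first letters of one student's grades
one_student <- c("A", "B", "B")

# Map() walks the two lists in parallel: each target letter set is paired
# with the rule that should be applied to it. The names of the first list
# become the names of the result.
flags <- Map(
  function(l, f) f(l, one_student),
  list(all_as = "A", all_bs = "B", as_and_bs = c("A", "B")),
  c(only, only, incl)
)

str(flags)
```

Adding another rule then only needs one more entry in each of the two lists, which is the flexibility this answer is aiming for.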

tidyverse translation:

original_df %>%
  group_by(id) %>%
  mutate(subgrade = substr(grade,1,1)) %>%
  summarise(
    across(
      c(subgrade),
      list(
        all_as    = ~only(x="A", y=.x),
        all_bs    = ~only(x="B", y=.x),
        as_and_bs = ~incl(x=c("A","B"), y=.x)
      ),
      .names="{fn}"
    )
  )

#`summarise()` ungrouping output (override with `.groups` argument)
## A tibble: 4 x 4
#  id    all_as all_bs as_and_bs
#  <chr> <lgl>  <lgl>  <lgl>    
#1 001   TRUE   FALSE  FALSE    
#2 002   FALSE  FALSE  TRUE     
#3 003   TRUE   FALSE  FALSE    
#4 004   FALSE  FALSE  FALSE
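
Since the question asks for lists of students, the flag columns still need to be turned into id vectors. A minimal sketch of one way to do that with dplyr, starting from the `summarized_df` table given in the question (the names `all_A_students` and `A_and_B_students` are made up here for illustration):

```r
library(dplyr)
library(tibble)

# Desired flag table, copied from the question
summarized_df <- tribble(
  ~id,   ~all_As, ~As_and_Bs,
  "001", 1, 0,
  "002", 0, 1,
  "003", 1, 0,
  "004", 0, 0
)

# Keep the rows where a flag is set, then extract the id column as a vector
all_A_students <- summarized_df %>%
  filter(all_As == 1) %>%
  pull(id)

A_and_B_students <- summarized_df %>%
  filter(As_and_Bs == 1) %>%
  pull(id)

all_A_students
A_and_B_students
```

With logical flags, as in the last answer's output, the filter conditions can be the column itself, for example `filter(all_as)`.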