dropStop removes a specified set of words, such as stop words, from text data
dropStop(x, drop, lower = TRUE, splitby = " ")
x: character vector of strings to be cleaned
drop: character vector of words to remove from each tokenized string
lower: logical; if TRUE, x is converted to lower case with tolower
splitby: character; the split pattern passed to strsplit to tokenize x
a list with one element per input string, holding that string's tokens (split by splitby) with any elements in drop removed
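The steps described above (optional lower-casing, tokenizing with strsplit, then discarding tokens found in drop) can be sketched in base R. This is a hypothetical re-implementation for illustration only, not the package source; the name drop_stop_sketch and the small stop-word vector are assumptions.

```r
# Hypothetical sketch of the documented behaviour; NOT the package's own code.
drop_stop_sketch <- function(x, drop, lower = TRUE, splitby = " ") {
  if (lower) x <- tolower(x)          # normalize case before tokenizing
  tokens <- strsplit(x, splitby)      # one character vector per input string
  # keep tokens not in drop, preserving their order and any repeats
  lapply(tokens, function(tok) tok[!(tok %in% drop)])
}

stops <- c("with", "to", "a", "by")  # hypothetical stop-word list
res <- drop_stop_sketch(
  c("P493-6 treated with KJ-Pyr-9 and/or Doxycycline", "Osteosarcoma Genomics"),
  drop = stops
)
res
```

Filtering with `%in%` rather than `setdiff` keeps repeated tokens and their original order; `setdiff` would silently deduplicate each string's tokens.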
data(minicorpus)
minicorpus[1:3]
#> [1] "P493-6 treated with KJ-Pyr-9 and/or Doxycycline"
#> [2] "Enhanced MyoD-Induced Transdifferentiation to a Myogenic Lineage by Fusion to a Potent Transactivation Domain"
#> [3] "Osteosarcoma Genomics"
dropStop(minicorpus)[1:3]
#> [[1]]
#> [1] "p493-6" "treated" "kj-pyr-9" "and/or" "doxycycline"
#>
#> [[2]]
#> [1] "enhanced" "myod-induced" "transdifferentiation"
#> [4] "myogenic" "lineage" "fusion"
#> [7] "potent" "transactivation" "domain"
#>
#> [[3]]
#> [1] "osteosarcoma" "genomics"
#>
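The default splitby = " " splits on a single space, so runs of whitespace or tabs produce empty or merged tokens. Assuming splitby reaches strsplit unchanged (where it is treated as a regular expression by default), a whitespace pattern avoids this. The base-R comparison below uses only strsplit and tolower; the input string is made up for illustration.

```r
# Made-up input containing a double space and a tab
x <- "Osteosarcoma  Genomics\tdata"

naive  <- strsplit(tolower(x), " ")[[1]]            # split on one literal space
robust <- strsplit(tolower(x), "[[:space:]]+")[[1]] # split on any whitespace run

naive   # c("osteosarcoma", "", "genomics\tdata")
robust  # c("osteosarcoma", "genomics", "data")
```

Under that assumption, the same pattern could be passed as dropStop(minicorpus, splitby = "[[:space:]]+").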