dropStop removes a specified set of words (such as stop words) from character strings

dropStop(x, drop, lower = TRUE, splitby = " ")

Arguments

x

character vector of strings to be cleaned

drop

character vector of words to remove from x; the example below omits it, so a default appears to apply when it is missing

lower

logical; if TRUE, x is converted to lower case with tolower

splitby

character string passed to strsplit as the delimiter used to tokenize x

Value

a list with one element per input string; each element is a character vector of the tokens obtained by splitting that string on splitby, with any tokens found in drop removed
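
The behaviour described by the arguments and value above can be approximated in base R. What follows is only a sketch under stated assumptions: dropStopSketch is a hypothetical name and not the package's implementation, lower-casing is assumed to happen before tokenizing, and drop is assumed to be matched exactly against whole tokens.

```r
# Hypothetical re-implementation for illustration only; not the package source.
dropStopSketch <- function(x, drop, lower = TRUE, splitby = " ") {
  if (lower) x <- tolower(x)       # assumption: case folding precedes splitting
  tokens <- strsplit(x, splitby)   # one character vector of tokens per input string
  # keep only the tokens that do not appear in drop
  lapply(tokens, function(tok) tok[!(tok %in% drop)])
}

dropStopSketch("Fusion to a Potent Transactivation Domain", drop = c("to", "a"))
```

Because strsplit treats its split argument as a regular expression by default, a splitby value containing regex metacharacters would be interpreted as a pattern in this sketch.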

Examples

data(minicorpus)
minicorpus[1:3]
#> [1] "P493-6 treated with KJ-Pyr-9 and/or Doxycycline"                                                              
#> [2] "Enhanced MyoD-Induced Transdifferentiation to a Myogenic Lineage by Fusion to a Potent Transactivation Domain"
#> [3] "Osteosarcoma Genomics"                                                                                        
dropStop(minicorpus)[1:3]
#> [[1]]
#> [1] "p493-6"      "treated"     "kj-pyr-9"    "and/or"      "doxycycline"
#> 
#> [[2]]
#> [1] "enhanced"             "myod-induced"         "transdifferentiation"
#> [4] "myogenic"             "lineage"              "fusion"              
#> [7] "potent"               "transactivation"      "domain"              
#> 
#> [[3]]
#> [1] "osteosarcoma" "genomics"    
#>
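
In the first result above, "and/or" survives as a single token because the text is split on spaces only. Assuming splitby is handed to strsplit as a regular expression (the argument description suggests this, but this page does not confirm it), a character class would also split on the slash. The effect can be seen with strsplit directly, without calling dropStop:

```r
# Base R only; shows how the choice of delimiter changes the tokens.
s <- tolower("P493-6 treated with KJ-Pyr-9 and/or Doxycycline")

by_space <- strsplit(s, " ")[[1]]     # "and/or" stays one token
by_either <- strsplit(s, "[ /]")[[1]] # "and" and "or" become separate tokens

by_space
by_either
```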