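dropStop is a utility for removing stop words and other unwanted words from text data. It returns a list with one character vector of remaining tokens per input string.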
Arguments
- x
character vector of strings to be cleaned
- drop
character vector of words to scrub
- lower
logical; if TRUE, x is converted to lower case with tolower
- splitby
character; the split pattern passed to strsplit to tokenize x
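The arguments suggest a simple pipeline: optionally lower-case, split each string with strsplit, then discard tokens found in drop. The sketch below is a hypothetical re-implementation (drop_stop_sketch is our name, not the package's) written under that assumption. It gives drop no default because the real default word list is not documented here, and it uses exact matching via %in%.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
drop_stop_sketch <- function(x, drop, lower = TRUE, splitby = " ") {
  # Optionally normalise case so "The" and "the" match the same drop entry
  if (lower) x <- tolower(x)
  # Tokenize: strsplit returns a list with one character vector per element of x
  tokens <- strsplit(x, splitby)
  # Keep only tokens that do not appear (exactly) in the drop vector
  lapply(tokens, function(tok) tok[!(tok %in% drop)])
}

titles <- c("P493-6 treated with KJ-Pyr-9 and/or Doxycycline",
            "Osteosarcoma Genomics")
# A small, hand-picked drop list (assumed; not the package default)
res <- drop_stop_sketch(titles, drop = c("with", "and", "to", "a", "by"))
res[[1]]
# "p493-6" "treated" "kj-pyr-9" "and/or" "doxycycline"
```

Note that with lower = TRUE the drop vector should itself be lower case, since matching happens after x is converted.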
Examples
data(minicorpus)
minicorpus[1:3]
#> [1] "P493-6 treated with KJ-Pyr-9 and/or Doxycycline"
#> [2] "Enhanced MyoD-Induced Transdifferentiation to a Myogenic Lineage by Fusion to a Potent Transactivation Domain"
#> [3] "Osteosarcoma Genomics"
dropStop(minicorpus)[1:3]
#> [[1]]
#> [1] "p493-6" "treated" "kj-pyr-9" "and/or" "doxycycline"
#>
#> [[2]]
#> [1] "enhanced" "myod-induced" "transdifferentiation"
#> [4] "myogenic" "lineage" "fusion"
#> [7] "potent" "transactivation" "domain"
#>
#> [[3]]
#> [1] "osteosarcoma" "genomics"
#>
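In the output above, hyphenated identifiers such as "kj-pyr-9" and the compound "and/or" survive as single tokens, which is consistent with splitting on spaces. The base-R illustration below shows how a different split pattern changes tokenization. Whether dropStop hands splitby to strsplit as a regular expression (strsplit's default, fixed = FALSE) or as a fixed string is an assumption here.

```r
title <- "P493-6 treated with KJ-Pyr-9 and/or Doxycycline"

# Splitting on a single space keeps "and/or" and hyphenated names intact
space_tokens <- strsplit(tolower(title), " ")[[1]]
# "p493-6" "treated" "with" "kj-pyr-9" "and/or" "doxycycline"

# A regex character class splitting on space OR slash breaks "and/or" apart,
# so "and" and "or" could then be matched individually against a drop list
slash_tokens <- strsplit(tolower(title), "[ /]")[[1]]
# "p493-6" "treated" "with" "kj-pyr-9" "and" "or" "doxycycline"
```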