If you need to match a string in a file, then grep is your friend. The Linux grep command is a string and pattern matching utility that displays matching lines from multiple files. It is well known in Linux and Unix circles, and it also works with piped output from other commands. Searching for multiple strings or patterns is useful when your files contain several different strings and you want to find the lines that match any of them.

The syntax is:

Use single quotes in the pattern:

    grep 'pattern' file1 file2

Next use extended regular expressions:

    grep -E 'pattern1|pattern2' file1

An example of using grep

Let's take a file called somefile.

The same kind of matching is available in R through grepl. Joining the patterns with "|", as in filter(grepl(paste(mypatterns, collapse = "|"), mycolumn)), filters the data frame for rows where the value in the column called mycolumn contains one of the string patterns in the vector called mypatterns. The following question and answer show how this plays out in practice.

I have a rather unorganized dataframe which has varying names for the same categories in one column. I want to summarize across those messy names using dplyr. Here's a simplified dataset, of tree species and their traits:

    df <- [...]

A plain summary treats the same traits as separate, since their names don't match completely. I want to summarize across these using fuzzy matching for many different traits, but don't know how to implement this across many traits at the same time.

So far I have tried using grepl to create a vector of 'required' strings to filter by:

    lmass <- [...]
    df %>% filter(grepl(lmass, tr, ignore.case=T)) %>% summarize(ave = mean(val))

But this is using 'or', whereas I want 'and': requiring both strings, so that the final dataframe is a single average across all rows containing both nitrogen and mass (in column tr).

Additionally I have many of these trait strings, and I want a dataframe at the end with averages for each of these traits per species. So far I have tried combining the different search strings, but this doesn't work:

    wood <- [...]
    df %>% filter(grepl(alltr, tr, ignore.case=T)) %>% summarize(ave = mean(val))  # gives an error, only takes first element in alltr

I know you're asking for dplyr, but unfortunately some of the issues I ran into exceeded my dplyr skills (e.g. create multiple columns with mutate), so this uses data.table:

    library(data.table)
    DT <- as.data.table(df)

    # setup regular expressions, etc.
    traits <- c(nm="nitrogen.*mass", wd="wood den", ca="carbon.*area")
    trait.nm <- names(traits)

    DT[,  # Add a column for each trait, indicating whether row matches the trait
      (trait.nm) := data.frame(sapply(trait.nm, function(x) grepl(traits[[x]], tr)))
    ]
    melt(DT, id.vars=names(df))[          # transform to long format
      value == TRUE,                      # filter for trait-val combinations that match
      sum(val), by=.(species, variable)   # group by standardized trait
    ]

In order to solve your "OR" problem with nitrogen mass, I just changed the regular expression to "nitrogen.*mass", which matches only when "nitrogen" is followed somewhere later in the string by "mass". One big caveat with this is that you need to make sure that each trait can only match one regular expression, otherwise you will end up with the trait counted multiple times in different trait categories.
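The OR-versus-AND distinction discussed here can also be tried directly with grep. The following is a minimal sketch, not taken from the original post: the sample trait names and the temporary file are invented for illustration. It counts lines that match either of two patterns (alternation with -E), lines where "nitrogen" is followed by "mass", and lines that contain both words in any order (two greps chained with a pipe).

```shell
# Hypothetical sample data: these trait names are invented for illustration.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
leaf nitrogen per mass
Nitrogen mass (leaf)
mass-based nitrogen content
wood density
leaf carbon per area
nitrogen fixation rate
EOF

# OR: count lines matching either pattern (alternation needs -E).
or_count=$(grep -E -i -c 'nitrogen|wood den' "$tmpfile")

# AND, fixed order: "nitrogen" followed later on the same line by "mass".
and_count=$(grep -E -i -c 'nitrogen.*mass' "$tmpfile")

# AND, any order: filter on one word, then pipe into a second grep.
and_any_count=$(grep -i 'mass' "$tmpfile" | grep -i -c 'nitrogen')

echo "OR: $or_count  AND (ordered): $and_count  AND (any order): $and_any_count"
rm -f "$tmpfile"
```

With this sample data the script reports 5 lines for the OR pattern, 2 for the ordered AND pattern, and 3 for the any-order AND. The difference between the last two comes from "mass-based nitrogen content", where "mass" precedes "nitrogen", so a pattern like "nitrogen.*mass" is only a safe stand-in for AND when the word order in the data is consistent.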