How to turn str_extract_all into multiple columns

Here is the text:

  data$charge[1]
  [1] "Count #1 as Filed: In Violation of; 21 O.S. 645; Count #2 as Filed: In Violation of; 21 O.S. 1541.1;Docket 1"

I am currently trying to extract statutes from legal data. My code looks like this:

str_extract_all(data$charge[1:3], "(?<=Violation of;)(\\D|\\d){4,20}(?=;Count |;Docket)") 

[[1]]
[1] "21 O.S. 645"      "21 O.S. 1541.1"

[[2]]
[1] "21 O.S. 1435"       "21 O.S. 1760(A)(1)"

[[3]]
[1] "21 O.S. 1592"

And I'd like to add them as columns to a data frame like this:

id           name           statute1           statute2           statute3
1           BLACK, JOHN     21 O.S. 645        21 O.S. 1541.1     NA
2           DOE, JANE       21 O.S. 1435       21 O.S. 1760(A)(1) NA
3           ROSS, BOB       21 O.S. 1592       NA                 NA

Thank you! Does that make sense?

Since you haven't included a reproducible example of your data or expected output, I can't be sure, but I think what you're looking for is the simplify = TRUE argument for str_extract_all.

From the examples on ?str_extract_all:

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# without simplify = TRUE
str_extract_all(shopping_list, "\\b[a-z]+\\b")
[[1]]
[1] "apples"

[[2]]
[1] "bag"   "of"    "flour"

[[3]]
[1] "bag"   "of"    "sugar"

[[4]]
[1] "milk"

# with simplify = TRUE
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
     [,1]     [,2] [,3]   
[1,] "apples" ""   ""     
[2,] "bag"    "of" "flour"
[3,] "bag"    "of" "sugar"
[4,] "milk"   ""   ""     
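To turn that matrix into data frame columns, which the asker asks about in the comments, one option is a minimal base R sketch like the one below. It reuses the shopping_list example. The column names word1, word2, ... are arbitrary choices for illustration, and converting the "" padding to NA is optional.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# simplify = TRUE returns a character matrix, padding short rows with ""
m <- str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)

# Convert the matrix to a data frame and give the columns predictable names
words <- as.data.frame(m, stringsAsFactors = FALSE)
names(words) <- paste0("word", seq_len(ncol(words)))

# Optional: treat the "" padding as missing values
words[words == ""] <- NA

# Put the original text next to the new columns
result <- data.frame(item = shopping_list, words, stringsAsFactors = FALSE)
result
```

The number of columns comes from the row with the most matches, so it adapts automatically if the data changes.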

Using your added example:

dat <- "Count #1 as Filed: In Violation of; 21 O.S. 645; Count #2 as Filed: In Violation of; 21 O.S. 1541.1;Docket 1"

str_extract_all(dat, "(?<=Violation of;)(\\D|\\d){4,20}(?=;Count |;Docket)",
                simplify = TRUE)

     [,1]             
[1,] " 21 O.S. 1541.1"
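Only the second statute is returned here because the lookahead (?=;Count |;Docket) needs the match to be followed immediately by ";Count " or ";Docket". The first statute is followed by "; Count" (with a space), so it never matches. The sketch below uses a different, assumed pattern: it takes everything after "Violation of; " up to the next semicolon or the end of the string. It should be checked against the real data. The second and third strings are the sample charges used in the other answer.

```r
library(stringr)

dat <- "Count #1 as Filed: In Violation of; 21 O.S. 645; Count #2 as Filed: In Violation of; 21 O.S. 1541.1;Docket 1"

# Assumed pattern: a statute starts after "Violation of; " and runs up to
# the next semicolon (or the end of the string)
pattern <- "(?<=Violation of; )[^;]+"

first <- str_extract_all(dat, pattern, simplify = TRUE)
first

charges <- c(
  dat,
  "Count #3 as Filed: In Violation of; 21 O.S. 1435; Count #4 as Filed: In Violation of; 21 O.S. 1760(A)(1)",
  "Count #2 as Filed: In Violation of; 21 O.S. 1592"
)

# One row per charge string, one column per statute; "" pads short rows
m <- str_extract_all(charges, pattern, simplify = TRUE)
colnames(m) <- paste0("statute", seq_len(ncol(m)))
m
```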

You can do this with the tidyverse packages. The regex pattern from your sample doesn't work for some of the sample text because its lookahead requires each statute to be followed immediately by ";Count " or ";Docket", and some of the sample strings end without either. The pattern used below is simpler, but it may need some tweaking depending on the actual text.

library(tidyverse)

df %>% 
  mutate(charges = str_extract_all(charge, "(?<=Violation of;\\s).+?(?=(;|$))")) %>% # extracts the different charges
  select(-charge) %>%  # optional: drops the raw charge text
  unnest(charges) %>%  # puts each charge for a name on its own row
  group_by(name) %>%   # in this sample there is only a name, but hopefully the real data has some sort of unique id; there could be lots of Jane Does in this data
  mutate(statute = paste0('statute', row_number())) %>% # adds a statute number to each charge
  spread(statute, charges) # shift the data from long to wide

# A tibble: 3 x 3
# Groups:   name [3]
  name       statute1     statute2
  <chr>      <chr>        <chr>
1 BLACK,JOHN 21 O.S. 645  21 O.S. 1541.1
2 DOE, JANE  21 O.S. 1435 21 O.S. 1760(A)(1)
3 ROSS, BOB  21 O.S. 1592 NA

Sample data:

df <- tibble(name = c('BLACK,JOHN', 'DOE, JANE', 'ROSS, BOB'), 
                 charge = c('Count #1 as Filed: In Violation of; 21 O.S. 645; Count #2 as Filed: In Violation of; 21 O.S. 1541.1;Docket 1',
                            'Count #3 as Filed: In Violation of; 21 O.S. 1435; Count #4 as Filed: In Violation of; 21 O.S. 1760(A)(1)',
                            'Count #2 as Filed: In Violation of; 21 O.S. 1592'))
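spread() still works, but tidyr 1.0.0 and later supersede it with pivot_wider(). The sketch below assumes tidyr >= 1.0 and writes the same pipeline with pivot_wider(). It adds a row-number id column as a stand-in for a real unique id (an assumption, following the comment in the pipeline about duplicate names), and it ungroups before reshaping so the result is not left grouped.

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- tibble(
  name = c("BLACK,JOHN", "DOE, JANE", "ROSS, BOB"),
  charge = c(
    "Count #1 as Filed: In Violation of; 21 O.S. 645; Count #2 as Filed: In Violation of; 21 O.S. 1541.1;Docket 1",
    "Count #3 as Filed: In Violation of; 21 O.S. 1435; Count #4 as Filed: In Violation of; 21 O.S. 1760(A)(1)",
    "Count #2 as Filed: In Violation of; 21 O.S. 1592"
  )
)

wide <- df %>%
  mutate(
    id = row_number(),  # hypothetical id; use the real unique id if there is one
    charges = str_extract_all(charge, "(?<=Violation of;\\s).+?(?=(;|$))")
  ) %>%
  select(-charge) %>%
  unnest(charges) %>%                 # one row per extracted statute
  group_by(id) %>%
  mutate(statute = paste0("statute", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = statute, values_from = charges)  # long to wide

wide
```

Missing statutes come out as NA, just as they do with spread().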

This is far from the most efficient solution, but compared with the others it is one I could understand:

library(tidyverse)

df <- tribble(
  ~foo,
  "1,2",
  "3,4"
)

# Extract the numbers and keep the first and second match as separate columns
df %>% mutate(
  col1 = str_extract_all(foo, "\\d+", simplify = TRUE)[, 1],
  col2 = str_extract_all(foo, "\\d+", simplify = TRUE)[, 2]
)

Returns:

# A tibble: 2 x 3
  foo   col1  col2 
  <chr> <chr> <chr>
1 1,2   1     2    
2 3,4   3     4 
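The version above runs the extraction once per column. A sketch that extracts only once and creates as many columns as the longest row needs, using a base data frame instead of a tibble for simplicity (the col1, col2, ... names are arbitrary):

```r
library(stringr)

df <- data.frame(foo = c("1,2", "3,4"), stringsAsFactors = FALSE)

# Extract once; simplify = TRUE gives one matrix column per match
m <- str_extract_all(df$foo, "\\d+", simplify = TRUE)
colnames(m) <- paste0("col", seq_len(ncol(m)))

# Append the matrix columns to the original data frame
df <- data.frame(df, m, stringsAsFactors = FALSE)
df
```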

Comments
  • I think we could use a reproducible example.
  • Do you mean the text I'm extracting from?
  • Yes, we can't solve your problem if we can't recreate it. Read How to make a great R reproducible example
  • Actually this worked! Thank you! Now would you know how to turn this output into data frame columns?
  • Are you sure there are no typos and that you're using str_extract_all? That error happens when you use an argument that's not recognized by the function, usually because of a spelling error or a misplaced parenthesis that associates the argument with a different function than you intended.
  • No you were right, the simplify=TRUE worked! I just need to turn the output into data frame columns
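If you start from the list that str_extract_all() returns without simplify = TRUE, another base R option is to pad each element with NA to a common length and then stack the elements as rows. The sketch below hard-codes the statutes from the question's output (with the quoting fixed) and the id/name values from the desired table as stand-ins. With real data, these would come from str_extract_all() and your own data frame. It only creates as many statute columns as the longest element needs, so there is no all-NA statute3 here.

```r
# Stand-in for the list returned by str_extract_all() without simplify = TRUE
statutes <- list(
  c("21 O.S. 645", "21 O.S. 1541.1"),
  c("21 O.S. 1435", "21 O.S. 1760(A)(1)"),
  "21 O.S. 1592"
)

# Pad every element with NA up to the longest one, then stack them as rows
n <- max(lengths(statutes))
padded <- lapply(statutes, function(x) {
  length(x) <- n  # extending a vector's length fills with NA
  x
})
m <- do.call(rbind, padded)
colnames(m) <- paste0("statute", seq_len(n))

# Hypothetical id/name columns copied from the question's desired output
out <- data.frame(
  id = 1:3,
  name = c("BLACK, JOHN", "DOE, JANE", "ROSS, BOB"),
  m,
  stringsAsFactors = FALSE
)
out
```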