Remove dots from data column

remove dot from column names in r
how to remove dot after number in excel
str_detect remove
remove in r
string remove stringr
how to remove dot and comma in excel
stringr remove words
stringr remove r

I'm beginner dealing with R and working with strings. I've been trying to remove periods from data but unfortunately I can't find a solution.

This is the data I'm working on in a dataframe df:

df <- read.table(text = " n   mesAno          receita
                 97   1/2009 3.812.819.062,06
                 98   2/2009 4.039.362.599,36
                 99   3/2009 3.652.885.587,18
                 100  4/2009 3.460.247.960,02
                 101  5/2009 3.465.677.403,12
                 102  6/2009 3.131.903.622,55
                 103  7/2009 3.204.983.361,46
                 104  8/2009 3.811.786.009,24
                 105  9/2009 3.180.864.095,05
                 106 10/2009 3.352.535.553,88
                 107 11/2009 5.214.148.756,95
                 108 12/2009 4.491.795.201,50
                 109  1/2010 4.333.557.619,30
                 110  2/2010 4.808.488.277,86
                 111  3/2010 4.039.347.179,81
                 112  4/2010 3.867.676.530,69
                 113  5/2010 6.356.164.873,94
                 114  6/2010 3.961.793.391,19
                 115  7/2010    3797656130.81
                 116  8/2010    4709949715.37
                 117  9/2010    4047436592.12
                 118 10/2010    3923484635.28
                 119 11/2010    4821729985.03
                 120 12/2010    5024757038.22", 
header = TRUE, 
stringsAsFactors = TRUE)

My objective is to transform receita column to numeric as it's is being stored as factor. But applying conversion functions like as.numeric(as.factor(x)) does not work in the interval 97:114 (it coerces to NA's).

I suppose that this is because of the periods separating billion/million/thousands in this column. The mentioned conversion functions will work only if I have something like 3812819062.06 as in 115:120.

I tried mutating the dataset adding another column and modelling. I don't really know if what i'm doing is fine, but i also tried extracting the anomalous numbers to a variable, and applying sub/gsub on them but without success.

Is there some straight forward way of doing this, that is, instruct it to remove the 2 first occurrences of '.' and then replace the comma with a '.'? I'm very confident that the function i'm needing is gsub but i'm having a hard time finding the correct usage. Any help will be appreciated.

Edit: My approach using dplyr::mutate(). Ugly but works.

df <- df %>% 
mutate(receita_temp = receita) %>% 
mutate(dot_count = str_count(receita, '\\.')) %>% 
mutate(receita_temp = ifelse(dot_count == 3, 
                             gsub('\\.', '', as.factor(receita_temp)), 
                             gsub('\\,', '.',as.factor(receita_temp))
                             )) %>% 
mutate(receita_temp = ifelse(dot_count == 3,
                             gsub('\\,', '.',as.factor(receita_temp)),
                                  receita_temp)) %>% 
select(-c(dot_count, receita)) %>% 
rename(., receita = receita_temp)

You can use the following: first create a function that will be used for replacement:

repl = function(x)setNames(c("","."),c(".",","))[x]

This function takes in either "." or "," and returns "" or '.' respectively

Now use this function to replace

stringr::str_replace_all(as.character(df[,3]), "[.](?!\\d+$)|,", repl)

[1] "3812819062.06" "4039362599.36" "3652885587.18" "3460247960.02" "3465677403.12" "3131903622.55"
[7] "3204983361.46" "3811786009.24" "3180864095.05" "3352535553.88" "5214148756.95" "4491795201.50"
[13] "4333557619.30" "4808488277.86" "4039347179.81" "3867676530.69" "6356164873.94" "3961793391.19"
[19] "3797656130.81" "4709949715.37" "4047436592.12" "3923484635.28" "4821729985.03" "5024757038.22"

Of course you can do the rest. ie calling as.numeric() etc.

To do this in base R:

sub(',','.',gsub('[.](?!\\d+$)','',as.character(df[,3]),perl=T))

or If you know the exact number of . and , in your data, you could do

a = as.character(df[,3])
regmatches(a,gregexpr('[.](?!\\d+$)|,',df[,3],perl = T)) = list(c("","","","."))
a

Remove dots from data column, You can use the following: first create a function that will be used for replacement: repl = function(x)setNames(c("","."),c(".",","))[x]. This function  I am using SQL Server 2008. I migrated some data from MySql; the migration of data was successful but some values contain data with dots and commas. What is the best way to remove these characters from my data? In SQL SERVER you could use REPLACE.

I'm using regex and some stringr functions to remove all the periods except those followed by two digits and the end of the string. That way, periods denoting separation like in 3.811.786.009,24 are removed, but periods denoting the start of a decimal like in 4821729985.03 are not. Using str_remove_all rather than str_remove lets me not have to worry about removing the matches repeatedly or about how well it will scale. Then replace the remaining commas with periods, and make it numeric.

library(tidyverse)

df2 <- df %>%
  mutate(receita = str_remove_all(receita, "\\.(?!\\d{2,}$)") %>% 
           str_replace_all(",", ".") %>%
           as.numeric())

print(head(df2), digits = 12)
#>     n mesAno       receita
#> 1  97 1/2009 3812819062.06
#> 2  98 2/2009 4039362599.36
#> 3  99 3/2009 3652885587.18
#> 4 100 4/2009 3460247960.02
#> 5 101 5/2009 3465677403.12
#> 6 102 6/2009 3131903622.55

Created on 2018-09-04 by the reprex package (v0.2.0).

How to remove dot "." after the numerical values in Excel data file , Easy to solve. Go to variable view. click on Type. Change from numeric to string and the dots will disappear. The missing values in your output will also  Removes a column from the collection. Examples. The following code example illustrates the use of this method. For more information, see How to: Remove Autogenerated Columns from a Windows Forms DataGridView Control.

df$num <- as.numeric(sapply(as.character(si), function(x) gsub("\\,","\\.",ifelse(grepl("\\,", x), gsub("\\.","",x),x))))

should do the trick.

First, the function searches for rows with ",", removes "." in these rows, and last it converts all occurring "," into ".", so that it can be converted without problems to numeric.

Use print(df$num, digits = 12) to see your data with 2 decimals.

Removing any text after a dot in excel, Select the column with your data. On Data tab, click "text to columns", "delimited", and use a period as your delimiter. This will split all the segments  @Greg_Deckler: Text.Trim only removes characters at the start or at the end of a text.. My suggestion would be to convert the text to a list of characters, remove the unwanted characters and return the result back to a text.

Removing dots from a String, Removing dots from a String. Hi All, Please find the requirement. Excel sheet Input >> Database table output. I have a column called LEVEL in  I have a csv file with a "Prices" column. Right now entries look like 1,000 or 12,456. I could probably remove them in Excel and re-save but I want to know how I can transform the column to remove non-numeric characters so 'objects' like $1,299.99 will become 'float' 1299.99. Pandas has a built in replace method for "object" columns.

Remove Dots from text, help me with a macro to remove the dots (period) from Texts Column Example There are 1000 of Rows with data (given below is a random  Use this method to remove the specified DataGridColumn-derived column object from a DataGridColumnCollection collection. The DataGrid control does not store the contents of its Columns collection in the view state. To add or remove a column dynamically, you must programmatically add or remove the column every time the page is refreshed.

Remove dot at the end of many dates in column, Post a small Excel sheet (not a picture) showing realistic & representative sample data WITHOUT confidential information (10-20 rows, not thousands) and some​  First, I used the str.replace ('$','') method on the entire column. This is the most straightforward method, as it simply replaces the ‘$’ with a blank space for each item in the column. This was the slowest option, as you can see, but it still relatively quick like I mentioned above. That means it would only take about a second to do this

Comments
  • I'm sure there's a function that will do this more prettily, but you could always use gsub twice on the first 114 rows. Substitute "." for "" the first time, "," for "." the second and you'll be left with numbers matching your later entries. Just make sure you don't apply it to the whole column or you'll mess up the ones that are already formatted properly
  • Please add the code you tried to mutate the text.
  • @WiktorStribiżew, added as an edit.