Removing sequences from the data (Pandas, Python, NumPy)

I have tried the following:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_csv("training.csv")
>>> data_raw = df.values
>>> data = []
>>> seq_len = 5
>>> for index in range(len(data_raw) - seq_len):
...     data.append(data_raw[index: index + seq_len])
...
>>> len(data)
1994
>>> len(data_raw)
1999
>>> del data[0]

The data is available here: training.csv. I have seen that del removes the first element from the list and shifts the remaining values: what was at position 1 is now at position 0, and so on. I want to remove the values at indices 0, 4, 5, 9, 10, 14, and so on. This is not possible with the current del statement, because every deletion shifts the positions of the elements that follow. Please help me find the missing part.
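To make the shifting concrete, here is a minimal sketch on a ten-element list. The list is only a stand-in for data (not the real training.csv windows), and each value equals its original position:

```python
# Stand-in for the list of windows: each value equals its original position.
data = list(range(10))

del data[0]       # every remaining element moves one position to the left
print(data[4])    # 5 -> the element that started at position 5 now sits at 4

del data[4]       # so this removes original position 5, not original position 4
print(data)       # [1, 2, 3, 4, 6, 7, 8, 9]
```

After the first deletion, the positions written down beforehand no longer point at the intended elements.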

To start with, the desired removal indices (0, 4, 5, 9, 10, 14, 15, 19, 20, 24, 25, 29, ...) can be generated as follows:

indices = []
for i in range(1,401):
    indices.append(5*(i-1))
    indices.append(5*i-1)
del indices[-1]  # Drop 1999: the 1999 rows only have valid positions 0 to 1998
print(indices[:12])
[0, 4, 5, 9, 10, 14, 15, 19, 20, 24, 25, 29]
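The same list can also be read as "every position whose remainder modulo 5 is 0 or 4", which makes it possible to build it for any array length. Below is a minimal sketch under that reading. The name removal_indices and its seq_len parameter are made up for this illustration:

```python
def removal_indices(n_rows, seq_len=5):
    """Positions below n_rows that are first or last in a block of seq_len.

    Hypothetical helper: it assumes the pattern 0, 4, 5, 9, 10, 14, ...
    simply repeats every seq_len positions.
    """
    return [i for i in range(n_rows) if i % seq_len in (0, seq_len - 1)]


indices = removal_indices(1999)
print(indices[:12])  # [0, 4, 5, 9, 10, 14, 15, 19, 20, 24, 25, 29]
print(indices[-1])   # 1995
```

Because the list is built from range(n_rows), no index can exceed the array length, so no trailing entry has to be deleted afterwards.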

Then using np.delete:

data_raw = np.random.randint(0, 10, size=(1999, 10))
new_data = np.delete(data_raw, indices, axis=0)  # np.delete returns a new array; data_raw is not modified

Validation:

np.array_equal(new_data[:6],data_raw[[1,2,3,6,7,8]])
                                      # Where 0,4,5,9 is removed
# True
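As an alternative to building an index list, a boolean mask can select the rows to keep. This is a sketch that assumes the removal pattern is "remainder 0 or 4 modulo 5" and uses random stand-in data of the same shape as above:

```python
import numpy as np

rng = np.random.default_rng(0)
data_raw = rng.integers(0, 10, size=(1999, 10))  # stand-in data, not training.csv

positions = np.arange(len(data_raw))
# True at positions 0, 4, 5, 9, 10, 14, ... (assumed pattern)
drop = (positions % 5 == 0) | (positions % 5 == 4)

new_data = data_raw[~drop]  # boolean indexing returns a copy; data_raw is unchanged
print(new_data.shape)       # (1200, 10)
```

The result should match what np.delete produces for the same positions, and the DataFrame is never touched.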

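You can do it like this: shift each index down by the number of deletions that come before it, so the indices stay valid while the list shrinks.

Example code: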

index = [0,4,5,9,10,14]
for i, x in enumerate(index):
    index[i] -= i

print(index)


for i in index:
    del data[i]
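The offset trick only works when the indices are unique and in ascending order. A small sketch that wraps it in a function and leaves the index list itself untouched (delete_ascending is a made-up name for this illustration):

```python
def delete_ascending(items, indices):
    """Delete the given original positions from items, in place.

    Hypothetical helper: assumes the indices are unique and valid for items.
    """
    for offset, idx in enumerate(sorted(indices)):
        # 'offset' earlier deletions have already shifted this position left.
        del items[idx - offset]


data = list(range(15))
delete_ascending(data, [0, 4, 5, 9, 10, 14])
print(data)  # [1, 2, 3, 6, 7, 8, 11, 12, 13]
```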

Here's a simple way to overcome this:

a = list(range(10))
remove = [0,4,5]

Say you want to remove the indices in remove from a. Sort the elements of remove in descending order, then delete them from a in a for loop. Deleting from the end first means the positions still waiting to be deleted never shift:

for i in sorted(remove, reverse=True):
    del a[i] 

Output

[1, 2, 3, 6, 7, 8, 9]
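If the original list should stay intact, a new list can be built instead of deleting in place. A minimal sketch with the same a and remove values:

```python
a = list(range(10))
remove = {0, 4, 5}  # a set makes the membership test cheap

# Keep every value whose position is not marked for removal.
kept = [value for position, value in enumerate(a) if position not in remove]
print(kept)  # [1, 2, 3, 6, 7, 8, 9]
```

Here a keeps all ten elements, and the order of the positions in remove does not matter.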

Another way to do that:

a = list(range(10))

print(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

to_drop = [0,4,5,9] #indices to drop

values = [a[i] for i in to_drop] # values corresponding to the indices

for v in values:  # list.remove deletes in place and returns None, so loop instead of building a list
    a.remove(v)

print(a)

Output

[1, 2, 3, 6, 7, 8]
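One caveat with removing by value: list.remove deletes the first matching element, so the approach relies on the values being unique. A small sketch with a made-up list that contains a duplicate:

```python
a = [7, 1, 7, 3]
to_drop = [2]  # intent: drop the second 7, at position 2

values = [a[i] for i in to_drop]
for v in values:
    a.remove(v)  # removes the first 7 (position 0) instead

print(a)  # [1, 7, 3], whereas dropping position 2 would give [7, 1, 3]
```

For data such as rows of a CSV file, where repeated values are likely, deleting by position is the safer choice.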

I mean remove = [0,4,5,9]: this should be the sequence in the remove list if the array has 10 values. How can I create it dynamically?

This is for an array of 100 values. I generated the indices to drop for a batch size of 10. Do correct me if I have interpreted this wrongly.

to_drop = [[j+(i*10) for j in [0,4,5,9]] for i in range(10)]

Output

[[0, 4, 5, 9],
 [10, 14, 15, 19],
 [20, 24, 25, 29],
 [30, 34, 35, 39],
 [40, 44, 45, 49],
 [50, 54, 55, 59],
 [60, 64, 65, 69],
 [70, 74, 75, 79],
 [80, 84, 85, 89],
 [90, 94, 95, 99]]
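The nested list has to be flattened before it can be used for deletion. A minimal sketch, assuming a NumPy array of 100 values as a stand-in for the real data:

```python
from itertools import chain

import numpy as np

arr = np.arange(100)  # stand-in array; each value equals its position

to_drop = [[j + (i * 10) for j in [0, 4, 5, 9]] for i in range(10)]
flat = list(chain.from_iterable(to_drop))  # [0, 4, 5, 9, 10, 14, ...]

result = np.delete(arr, flat)  # returns a new array; arr is unchanged
print(len(result))             # 60
print(result[:6])              # [1 2 3 6 7 8]
```

np.delete receives all positions in a single call, so no index shifting has to be handled by hand.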

Comments
  • If you don't want to change your list size, you should replace the deleted indices with a constant value. Am I right?
  • What is your rule for generating indices? Is it 5(n-1) and 5n - 1?
  • @Chris actually that is becoming a mystery for me :P. The sequence in which I found a pattern is 0,4,5,9,10,14,15,19,20,24,25,29 and so on. I am unable to figure out which formula the pattern follows.
  • @JafferWilson Gotcha. I'll upload a post accordingly.
  • Thank you everyone for the answers. It was amazing.
  • No sir, what you have suggested will edit the dataframe and not the array. I want the sequence from array to get removed and not from the dataframe. Do not want to disturb original frames.
  • I was thinking you could first remove the rows and then do df.values to get the array.
  • Do you want to preserve the initial dataframe?
  • No sir. Not from the dataframes. I want the sequence removed from the array. If you are saying that it will be the same, then I guess you are wrong.
  • I do not want to copy a dataframe, as it will be safer if I operate on the arrays only and not at the dataframe level.
  • Sir, is there a way to create the sequence list to be removed up to the data range? Say I have 10 elements: how can I create the remove list up to the array range?
  • You mean how can you modify remove so that no elements are greater than the length of a?
  • Yes, sir. I mean I need to create the array on my own, because the data can be huge and I need the sequence to be removed. Can you help me?
  • No sir, I mean remove = [0,4,5,9]: this should be the sequence in the remove list if the array has 10 values. How can I create it dynamically? Imagine, sir, I have an array of 100: how can I manually create the remove list? Please sir, can you add that part to the answer? It will be great.
  • But I don't understand what criteria you are following to create the sequence remove?