Efficient way to replace strings in a list of lists based on a dict


I have a list of lists that contain classificatory labels for a certain domain. Example:

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

The list has 331 lists, and each of them can contain anywhere from one to all of the possible labels. The number of possible labels is 20.

I need to feed the list of lists of labels to a sklearn.neighbors.KNeighborsClassifier and was thinking of converting each possible label to a number (e.g. 0-19).

I was wondering what the most efficient way to perform this conversion would be.

I guess the 'stupid' way could be that of creating a dictionary with each unique label and the corresponding value, as in:

{'polmone': 0, 'linfonodi': 1, ..., 'label_19': 19}

...and then iterate over each element of the list and perform a str.replace().
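For reference, a minimal sketch of that dictionary approach, assuming the labels are numbered in sorted order (an arbitrary choice). A dictionary lookup replaces each label directly, so str.replace() is not needed, and the whole conversion is a single pass over the labels:

```python
data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

# Build the label -> integer mapping once; sorting makes the codes reproducible.
unique_labels = sorted({label for row in data for label in row})
label_to_id = {label: i for i, label in enumerate(unique_labels)}

# Replace every label by a dictionary lookup; no string operations involved.
encoded = [[label_to_id[label] for label in row] for row in data]
print(encoded)
```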

I feel there should be a more efficient solution. Can you recommend one?

Thanks in advance.

P.S. I searched for a similar topic, but couldn't find one. If I mistakenly didn't notice it, feel free to close this thread and send me to hell.

Edit:

First of all, I'd like to thank everyone for their answers, as each of them has helped with different issues I have encountered or will encounter.

Now I want to share another solution I just found when dealing with KNeighborsClassifier and a multi-output target. When feeding it the encoded labels (either as strings or as integers, and either as plain lists or as numpy arrays), I got the following error:

Traceback (most recent call last):
  File "embedding_gensim.py", line 111, in <module>
    neigh.fit(doc_train, labls_train)
  File "/home/matteo/anaconda3/envs/deep_l/lib/python3.7/site-packages/sklearn/neighbors/base.py", line 906, in fit
    check_classification_targets(y)
  File "/home/matteo/anaconda3/envs/deep_l/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 169, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'

I found that MultiLabelBinarizer solves the problem of feeding the classifier with a multi-label list of lists (or numpy arrays).

So, following @Alexander Rossa's solution:

from sklearn.preprocessing import MultiLabelBinarizer

binarized_labels = MultiLabelBinarizer().fit_transform(encoded_labels_list)

binarized_labels then looks like:

[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0]
...

MultiLabelBinarizer() actually also works directly on the lists of strings in split_labels, so perhaps I was tackling the problem from the wrong perspective.
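A minimal sketch of that idea, using only scikit-learn and NumPy: MultiLabelBinarizer is fitted on the raw string lists, and the resulting 0/1 matrix is passed to KNeighborsClassifier as a multilabel target. The feature matrix X is a hypothetical placeholder standing in for real document vectors such as doc_train.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

mlb = MultiLabelBinarizer()
# One column per distinct label (sorted); 1 marks that the label is present.
Y = mlb.fit_transform(data)

# Hypothetical placeholder features: one number per sample instead of real vectors.
X = np.arange(len(data), dtype=float).reshape(-1, 1)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, Y)  # a 2-D 0/1 indicator matrix is accepted as a multilabel target

pred = knn.predict(X[:1])
# Map the predicted indicator row back to label names.
decoded = mlb.inverse_transform(pred)
print(decoded)
```

Each column of Y is treated as a separate binary output, and inverse_transform turns a predicted row back into label names.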


What you can do is use a LabelEncoder to create the dictionary:

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
labels = ['polmone', 'fegato', 'linfonodi']
encoded_labels = label_encoder.fit_transform(labels)
# pair each label with its code; tolist() turns NumPy ints into plain ints
labels_dict = dict(zip(labels, encoded_labels.tolist()))

print(labels_dict)

That gives you {'polmone': 2, 'fegato': 0, 'linfonodi': 1}.

This can be especially helpful when you have many more labels to encode and replace and doing this by hand is not feasible.
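If you only need the mapping, the fitted encoder already holds it: classes_ stores the distinct labels in sorted order, and each label's position in that array is its code. A small sketch:

```python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(['polmone', 'fegato', 'linfonodi'])

# classes_ is sorted; the index of each label is the code fit_transform assigns.
labels_dict = {label: code for code, label in enumerate(label_encoder.classes_.tolist())}
print(labels_dict)  # {'fegato': 0, 'linfonodi': 1, 'polmone': 2}
```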

Your whole solution can then look something like this:

from sklearn.preprocessing import LabelEncoder

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

# get the labels programmatically from your data (the set drops duplicates)
labels = sorted({label for nested_list in data for label in nested_list})

label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)
labels_dict = dict(zip(labels, encoded_labels.tolist()))

encoded_data = []
for labels_list in data:
    # for each label in a nested list, look up its encoded value in the dict
    encoded_data_list = [str(labels_dict[l]) for l in labels_list]
    encoded_data.append(encoded_data_list)

The encoded data for the data you supplied will look like this:

>>> encoded_data
[['4', '1'], ['2'], ['4'], ['1', '2', '4'], ['3', '2'], ['0'], ['4', '1'], ['2'], ['2', '0']]
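To go back from codes to label names (for example, to read a classifier's predictions), the same encoder can decode them with inverse_transform. A sketch, assuming the encoder is fitted on all labels in data as above; since the codes in encoded_data are digit strings, they are converted to int first:

```python
from sklearn.preprocessing import LabelEncoder

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

label_encoder = LabelEncoder()
label_encoder.fit([label for row in data for label in row])

# The string codes shown above.
encoded_data = [['4', '1'], ['2'], ['4'], ['1', '2', '4'], ['3', '2'],
                ['0'], ['4', '1'], ['2'], ['2', '0']]

# Convert each digit string to int, then decode one nested list at a time.
decoded = [label_encoder.inverse_transform([int(code) for code in row]).tolist()
           for row in encoded_data]
print(decoded == data)
```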



This works for me:

import pandas

pandas.factorize(['B', 'C', 'D', 'B'])[0]

Output:

[0, 1, 2, 0]
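For the nested data in the question, one way (a sketch, not the only one) is to factorize the flattened labels so that every list shares one code table, then split the codes back into the original list lengths. Note that factorize numbers labels in order of first appearance, not alphabetically:

```python
import numpy as np
import pandas as pd

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

# Flatten so all labels are coded against a single table of uniques.
flat = pd.Series([label for row in data for label in row])
codes, uniques = pd.factorize(flat)

# Cut the flat code array back into pieces matching each nested list's length.
lengths = [len(row) for row in data]
encoded = [part.tolist() for part in np.split(codes, np.cumsum(lengths)[:-1])]
print(uniques.tolist())
print(encoded)
```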

Also look into one-hot encoding and, more generally, transforming categorical variables into numeric ones.
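As a sketch of the one-hot route in pandas, assuming no label contains the '|' character, each list can be joined into one string and expanded into indicator columns with Series.str.get_dummies:

```python
import pandas as pd

data = [
    ['polmone', 'linfonodi'],
    ['osso'],
    ['polmone'],
    ['linfonodi', 'osso', 'polmone'],
    ['peritoneo', 'osso'],
    ['fegato'],
    ['polmone', 'linfonodi'],
    ['osso'],
    ['osso', 'fegato'],
]

# Join each nested list with '|', then split it back into one 0/1 column per label.
one_hot = pd.Series(data).str.join('|').str.get_dummies(sep='|')
print(one_hot)
```

This gives the same kind of 0/1 matrix as MultiLabelBinarizer, with one column per label in sorted order.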



I think a LabelEncoder is what you need.

As described in the documentation, it efficiently transforms your labels into integers from 0 to n_classes - 1.

What you should do is something like:

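from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
# fit once on every label that occurs in any of the lists
labelencoder.fit([label for labels in all_labels_list for label in labels])
res = []
for curr_labels_list in all_labels_list:
    # transform (not fit_transform), so all lists share the same mapping
    res.append(labelencoder.transform(curr_labels_list))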
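One caveat with this approach: transform only knows the labels passed to fit, so a list containing a label that was never fitted raises ValueError. A short check, with a deliberately incomplete set of fitted labels:

```python
from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
# Deliberately fit on only three of the labels.
labelencoder.fit(['polmone', 'linfonodi', 'osso'])

codes = labelencoder.transform(['osso', 'polmone']).tolist()
print(codes)  # [1, 2]

unseen_error = None
try:
    labelencoder.transform(['fegato'])  # 'fegato' was not seen during fit
except ValueError as err:
    unseen_error = err
print(unseen_error)
```

Fitting on the flattened labels of the whole dataset, as above, avoids this for the training data.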
