Normalization vs. the NumPy way to normalize?

I'm supposed to normalize an array. I've read about normalization and came across this formula:

x_normalized = (x - min(x)) / (max(x) - min(x))

I wrote the following function for it:

def normalize_list(list):
    max_value = max(list)
    min_value = min(list)
    for i in range(0, len(list)):
        list[i] = (list[i] - min_value) / (max_value - min_value)

That is supposed to normalize an array of elements.

Then I came across this answer: https://stackoverflow.com/a/21031303/6209399, which says you can normalize an array by simply doing this:

import numpy as np

def normalize_list_numpy(list):
    normalized_list = list / np.linalg.norm(list)
    return normalized_list

If I normalize this test array test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9] with my own function and with the numpy method, I get these answers:

My own function: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
The numpy way: [0.059234887775909233, 0.11846977555181847, 0.17770466332772769, 0.23693955110363693, 0.29617443887954614, 0.35540932665545538, 0.41464421443136462, 0.47387910220727386, 0.5331139899831830]

Why do the functions give different answers? Are there other ways to normalize an array of data? What does numpy.linalg.norm(list) do? What am I getting wrong?

There are different types of normalization. You are using min-max normalization. The min-max normalization from scikit-learn is as follows.

import numpy as np
from sklearn.preprocessing import minmax_scale

# your function
def normalize_list(list_normal):
    max_value = max(list_normal)
    min_value = min(list_normal)
    for i in range(len(list_normal)):
        list_normal[i] = (list_normal[i] - min_value) / (max_value - min_value)
    return list_normal

# scikit-learn version
def normalize_list_numpy(list_numpy):
    normalized_list = minmax_scale(list_numpy)
    return normalized_list

test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test_array_numpy = np.array(test_array)

print(normalize_list(test_array))
print(normalize_list_numpy(test_array_numpy))

Output:

[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]    
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]

minmax_scale uses exactly your formula for normalization/scaling: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html
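
minmax_scale also takes a feature_range argument if you want an interval other than the default [0, 1]. A quick sketch, using the same test array as above:

import numpy as np
from sklearn.preprocessing import minmax_scale

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# Scale into [-1, 1] instead of the default [0, 1]
print(minmax_scale(test_array, feature_range=(-1, 1)))
# [-1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75  1.  ]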

@OuuGiii: NOTE: It is not a good idea to use Python built-in function names as variable names. list() is a Python builtin, so its use as a variable name should be avoided.
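
A quick, runnable illustration of why shadowing the builtin bites:

# 'list' normally refers to the builtin constructor
print(list((1, 2)))      # [1, 2]

list = [1, 2, 3]         # this assignment shadows the builtin

try:
    list((1, 2))
except TypeError as err:
    print(err)           # 'list' object is not callable

del list                 # remove the shadow; the builtin is reachable again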

The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.

One NumPy solution would be this:

import numpy as np

def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / np.ptp(x)

print(normalize(test_array))    
# [ 0.     0.125  0.25   0.375  0.5    0.625  0.75   0.875  1.   ]

Here np.ptp is peak-to-peak, i.e.

Range of values (maximum - minimum) along an axis.
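
For example, including the axis argument (the array here is made up for demonstration):

import numpy as np

a = np.array([[1, 9, 4],
              [6, 2, 8]])

print(np.ptp(a))          # 8 -- over the flattened array (9 - 1)
print(np.ptp(a, axis=0))  # [5 7 4] -- column-wise ranges
print(np.ptp(a, axis=1))  # [8 6] -- row-wise ranges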

This approach scales the values to the interval [0, 1] as pointed out by @phg.

The more traditional definition of normalization would be to scale to a 0 mean and unit variance:

x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0

Or use sklearn.preprocessing.scale as a pre-canned function for this. (Note that sklearn.preprocessing.normalize does something different: it scales vectors to unit norm, which is the linear-algebra sense discussed next.)
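
A minimal sketch of the pre-canned version, assuming scikit-learn is installed; it matches the mean/std snippet above:

import numpy as np
from sklearn.preprocessing import scale

test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]

res = scale(test_array)        # subtract the mean, divide by the std
print(res.mean(), res.std())   # 0.0 1.0 (up to floating-point error)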

Using test_array / np.linalg.norm(test_array) creates a result that is of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're talking about two different fields here, one being statistics and the other being linear algebra.
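
To make that concrete: by default np.linalg.norm computes the Euclidean (L2) norm, the square root of the sum of squares, so dividing by it rescales the vector to length 1:

import numpy as np

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

l2 = np.linalg.norm(test_array)
print(l2)                                # sqrt(285), about 16.8819
print(np.sqrt(np.sum(test_array ** 2)))  # same computation spelled out

unit = test_array / l2
print(np.linalg.norm(unit))              # 1.0 -- a unit vector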

The power of NumPy is broadcasting, which lets you do vectorized array operations without explicit looping, so you do not need to write a function with an explicit for loop, which is slow, especially if your dataset is big.

The Pythonic way of doing min-max normalization is

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
normalized_test_array = (test_array - test_array.min()) / (test_array.max() - test_array.min())

output >> [0. 0.125 0.25 0.375 0.5 0.625 0.75 0.875 1.]
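
The same broadcasting idea covers the "normalize rows/columns" case for 2-D arrays: compute the minima and ranges along an axis and let broadcasting line them up. A sketch with made-up data (keepdims=True keeps the per-row results shaped as column vectors so the division broadcasts correctly):

import numpy as np

data = np.array([[1.0, 10.0, 100.0],
                 [2.0, 20.0, 200.0],
                 [3.0, 30.0, 300.0]])

# Min-max normalize each column independently
cols = (data - data.min(axis=0)) / np.ptp(data, axis=0)

# Min-max normalize each row independently
rows = (data - data.min(axis=1, keepdims=True)) / np.ptp(data, axis=1, keepdims=True)

print(cols[:, 0])  # [0.  0.5 1. ] -- first column scaled to [0, 1]
print(rows[0])     # [0.         0.09090909 1.        ] -- first row scaled to [0, 1]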

Comments
  • Just so you're aware, this isn't the traditional formula for normalization, which is usually expressed as (x - x_mean) / stdev(x), which standardizes x to be normally distributed. (stdev is standard deviation.)
  • Agree with Brad. Your formula scales the values to the interval [0, 1], while "normalization" more often means transforming to have mean 0 and variance 1 (in statistics), or scaling a vector to have unit length with respect to some norm (usually L2).
  • Isn't that called 'Standardization'? @phg
  • @OuuGiii No, without having an official reference to cite I would say that both "normalization" and "standardization" refer to subtracting out a mean and dividing by a standard deviation to get the data to have an N~(0,1) distribution. Maybe normalization could take on the meaning you mention in linear algebra contexts, but I would say phg's is the dominant usage.
  • @OuuGiii yes, according to this answer at least, normalization refers to a [0,1] range, and standardization refers to a mean 0 variance 1.
  • Didn't know this existed, +1. @OuuGiii directly from the docs for this function, "This transformation is often used as an alternative to zero mean, unit variance scaling."
  • @BradSolomon It's used quite often in sklearn for feature scaling before features are fed to scale-sensitive classifiers such as SVM or kNN.
  • thank you, but what then does the function normalize_list_numpy() do?
  • @OuuGiii it makes the vector have length 1.
  • @OuuGiii see the result of np.linalg.norm(test_array / np.linalg.norm(test_array)) to understand @phg's comment.