## Pandas - compute new column based on the relative value in other rows

Given data like the following:

```python
data = """
Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
```

Each row contains data pertaining to one location. For each location, I need to find the distance to every other location (row), using the following formula (simplified for ease):

```
distance = sqrt((Long1-Long2)^2 + (Lat1-Lat2)^2)
```
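As a quick sanity check of the formula, here is a minimal sketch that evaluates it for a single pair, ABC11 and ABC20, with the coordinates copied from the sample data. The variable names are only illustrative.

```python
import math

# Coordinates of ABC11 and ABC20, copied from the sample data
long1, lat1 = 139.6295542, 35.61144069
long2, lat2 = 139.630596, 35.61045559

# math.hypot(dx, dy) computes sqrt(dx**2 + dy**2)
distance = math.hypot(long1 - long2, lat1 - lat2)
print(distance)  # roughly 0.00143
```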

If I did this outside pandas, I would do it as follows:

```python
import math

rows = df.to_dict('records')

# distance from each location to every other location, excluding itself
results = {}
for row in rows:
    loc = row['Location']
    results[loc] = {}
    # build a new list that excludes the current row
    nrows = [r for r in rows if r['Location'] != loc]
    for nrow in nrows:
        dist = math.sqrt((row["Long"] - nrow["Long"])**2 + (row["Lat"] - nrow["Lat"])**2)
        results[loc][nrow["Location"]] = dist

# for each location, find the location at the minimum distance
fin_results = {}
for k, v in results.items():
    fin_results[k] = {}
    minValKey = min(v, key=v.get)
    fin_results[k]["location"] = minValKey
    fin_results[k]["dist"] = v[minValKey]
```

This gives output like the following: for each location, the nearest other location and the distance to it.

```
{'ABC11': {'location': 'ABC20', 'dist': 0.001433795400325211},
 'ABC20': {'location': 'ABC11', 'dist': 0.001433795400325211},
 'ABC03': {'location': 'ABC11', 'dist': 0.001897909941062068},
 'ABC54': {'location': 'ABC06', 'dist': 0.006614555169662396},
 'ABC05': {'location': 'ABC06', 'dist': 0.002631545857463665},
 'ABC06': {'location': 'ABC05', 'dist': 0.002631545857463665},
 'ABC24': {'location': 'ABC06', 'dist': 0.004054030973106164}}
```

While this works functionally, I would like to know the idiomatic `pandas` way of doing it.

The desired output:

```
+----------+-------------------+----------------------------+
| location |  nearest_location |  nearest_location_distance |
+----------+-------------------+----------------------------+
| 'ABC11'  | 'ABC20'           | 0.001433795400325211       |
| 'ABC20'  | 'ABC11'           | 0.001433795400325211       |
| 'ABC03'  | 'ABC11'           | 0.001897909941062068       |
| 'ABC54'  | 'ABC06'           | 0.006614555169662396       |
| 'ABC05'  | 'ABC06'           | 0.002631545857463665       |
| 'ABC06'  | 'ABC05'           | 0.002631545857463665       |
| 'ABC24'  | 'ABC06'           | 0.004054030973106164       |
+----------+-------------------+----------------------------+
```

You can use `numpy` broadcasting to build the full matrix of pairwise distances:

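```python
import numpy as np
import pandas as pd

long_ = df.Long.to_numpy()
lat = df.Lat.to_numpy()

# shape (n,) minus shape (n, 1) broadcasts to an (n, n) matrix of pairwise differences
distances = np.sqrt((long_ - long_[:, None]) ** 2 + (lat - lat[:, None]) ** 2)

dist_df = pd.DataFrame(distances, index=df.Location, columns=df.Location)
```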

```
Location     ABC11     ABC20     ABC03     ABC54     ABC05     ABC06     ABC24

ABC11     0.000000  0.001434  0.001898  0.167940  0.167348  0.165044  0.163559
ABC20     0.001434  0.000000  0.002878  0.167472  0.166822  0.164528  0.163012
ABC03     0.001898  0.002878  0.000000  0.166680  0.166151  0.163836  0.162385
ABC54     0.167940  0.167472  0.166680  0.000000  0.007295  0.006615  0.010666
ABC05     0.167348  0.166822  0.166151  0.007295  0.000000  0.002632  0.004558
ABC06     0.165044  0.164528  0.163836  0.006615  0.002632  0.000000  0.004054
ABC24     0.163559  0.163012  0.162385  0.010666  0.004558  0.004054  0.000000
```

```python
# mask the zero self-distances on the diagonal so they are ignored
m = dist_df[dist_df > 0]
pd.concat([m.idxmin(axis=1).rename('nearest_location'),
           m.min(axis=1).rename('nearest_location_distance')], axis=1)
```

The output data frame would be something like

```
         nearest_location  nearest_location_distance
Location
ABC11               ABC20                   0.001434
ABC20               ABC11                   0.001434
ABC03               ABC11                   0.001898
ABC54               ABC06                   0.006615
ABC05               ABC06                   0.002632
ABC06               ABC05                   0.002632
ABC24               ABC06                   0.004054
```

This finds the distance from each row to all the others. That is how I interpreted the question; I'm not sure whether that is your goal.
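One caveat with the `dist_df > 0` mask: it also hides any other location that happens to have exactly the same coordinates. A minimal sketch of an alternative, assuming you only want to exclude each row's distance to itself, is to set the diagonal to infinity with `np.fill_diagonal` before taking the minimum. The `df` below is rebuilt from the question's data so the block runs by itself, and the name `result` is only illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
df = pd.read_csv(StringIO(data))

coords = df[["Long", "Lat"]].to_numpy()

# pairwise coordinate differences via broadcasting: shape (n, n, 2)
diff = coords[:, None, :] - coords[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

# exclude only the self-distance; genuine zero distances between
# different rows (duplicate coordinates) are kept
np.fill_diagonal(distances, np.inf)

nearest = distances.argmin(axis=1)
result = pd.DataFrame({
    "location": df["Location"],
    "nearest_location": df["Location"].to_numpy()[nearest],
    "nearest_location_distance": distances.min(axis=1),
})
print(result)
```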

You can use `scipy`'s `distance_matrix`, which is actually what @rafaelc coded:

```python
import pandas as pd
from scipy.spatial import distance_matrix

dist_mat = distance_matrix(df[['Long', 'Lat']], df[['Long', 'Lat']])

# label the rows and columns of the distance matrix with the location names
dist_mat = pd.DataFrame(dist_mat,
                        index=df.Location,
                        columns=df.Location)

# mask the zero self-distances, take the nearest location and its distance,
# and convert the result to a dict
(dist_mat.where(dist_mat > 0)
         .agg(('idxmin', 'min'))
         .to_dict()
)
```

Output:

```
{'ABC11': {'idxmin': 'ABC20', 'min': 0.001433795400325211},
 'ABC20': {'idxmin': 'ABC11', 'min': 0.001433795400325211},
 'ABC03': {'idxmin': 'ABC11', 'min': 0.001897909941062068},
 'ABC54': {'idxmin': 'ABC06', 'min': 0.006614555169662396},
 'ABC05': {'idxmin': 'ABC06', 'min': 0.002631545857463665},
 'ABC06': {'idxmin': 'ABC05', 'min': 0.002631545857463665},
 'ABC24': {'idxmin': 'ABC06', 'min': 0.004054030973106164}}
```

If you want the dataframe only:

```python
(dist_mat.where(dist_mat > 0)
         .agg(('idxmin', 'min'))
         .T
)
```

Output:

```
      idxmin         min
ABC11  ABC20   0.0014338
ABC20  ABC11   0.0014338
ABC03  ABC11  0.00189791
ABC54  ABC06  0.00661456
ABC05  ABC06  0.00263155
ABC06  ABC05  0.00263155
ABC24  ABC06  0.00405403
```
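The full distance matrix needs memory proportional to the square of the number of locations, which may become a problem for large inputs. As a sketch of a different technique (not part of the answer above), `scipy.spatial.cKDTree` can look up the two nearest neighbours of every point. This assumes no two locations share identical coordinates, so that the first hit is the point itself and the second is the nearest other location. The `df` is rebuilt from the question's data, and `result` is an illustrative name.

```python
from io import StringIO

import pandas as pd
from scipy.spatial import cKDTree

data = """Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
df = pd.read_csv(StringIO(data))

coords = df[["Long", "Lat"]].to_numpy()
tree = cKDTree(coords)

# k=2: column 0 is the point itself (distance 0, assuming unique coordinates),
# column 1 is the nearest other point
dist, idx = tree.query(coords, k=2)

result = pd.DataFrame({
    "location": df["Location"],
    "nearest_location": df["Location"].to_numpy()[idx[:, 1]],
    "nearest_location_distance": dist[:, 1],
})
print(result)
```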

You can also use `df.iterrows`:

```python
distance_min = []
location_min = []
output_df = df.copy()
for i, row in df.iterrows():
    # distance from this row to every row (the distance to itself is 0)
    dist = ((row['Long'] - df['Long']).pow(2) + (row['Lat'] - df['Lat']).pow(2)).pow(1/2)
    # ignore zero distances, then record the nearest location and its distance
    location_min.append(df.at[dist[dist > 0].idxmin(), 'Location'])
    distance_min.append(dist[dist > 0].min())

output_df['nearest_location'] = location_min
output_df['nearest_location_distance'] = distance_min
output_df = output_df.reindex(columns=['Location', 'nearest_location', 'nearest_location_distance'])
print(output_df)
```

```
 Location  nearest_location  nearest_location_distance
0    ABC11            ABC20                   0.001434
1    ABC20            ABC11                   0.001434
2    ABC03            ABC11                   0.001898
3    ABC54            ABC06                   0.006615
4    ABC05            ABC06                   0.002632
5    ABC06            ABC05                   0.002632
6    ABC24            ABC06                   0.004054
```

This is the same solution that ansev proposed, in a slightly more finished form:

```import pandas as pd
from io import StringIO