What is the most efficient algorithm/data structure for finding the smallest range containing a point?

Given a data set of a few million price ranges, we need to find the smallest range that contains a given price. The following rules apply:

  • Ranges can be fully nested (i.e., 1-10 and 5-10 is valid)
  • Ranges cannot be partially nested (i.e., 1-10 and 5-15 is invalid)

Example: Given the following price ranges:

  • 1-100
  • 50-100
  • 100-120
  • 5-10
  • 5-20

The result for searching price 7 should be 5-10. The result for searching price 100 should be 100-120 (the smallest range containing 100).

What's the most efficient algorithm/data structure to implement this? Searching the web, I have only found solutions for searching ranges within ranges. I've been looking at Morton codes and the Hilbert curve, but I can't wrap my head around how to use them for this case. Thanks.

Because you did not mention this ad hoc algorithm, I'll propose it as a simple answer to your question:

This is a Python function, but it's fairly easy to understand and convert to another language.

import math

def min_range(ranges, value):
    # ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]
    # value = 100

    # INIT
    best_range = None
    best_range_len = math.inf

    # LOOP THROUGH ALL RANGES
    for b, e in ranges:

        # PICK THE SMALLEST
        if b <= value <= e and e - b < best_range_len:
            best_range = (b, e)
            best_range_len = e - b

    print(f'Minimal range containing {value} = {best_range}')
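
For example, calling it on the ranges from the question for price 100:

min_range([(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)], 100)
# prints: Minimal range containing 100 = (100, 120)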

I believe there are more efficient and more complicated solutions (if you can do some precomputation, for example), but this is the first step to take.

EDIT: Here is a better solution, probably O(log(n)) per query, but it's not trivial. It is a tree where each node is an interval and has a child list of all the strictly non-overlapping intervals contained inside it. Preprocessing is done in O(n log(n)) time; queries are O(n) in the worst case (when no two ranges are disjoint) and probably O(log(n)) on average.

Two classes: tree, which holds the tree and answers queries:

class tree:
    def __init__(self, ranges):
        # sort the ranges by lowest start and then greatest end
        ranges = sorted(ranges, key=lambda i: (i[0], -i[1]))
        # recursive building -> might want to optimize that in python
        self.node = node( (-float('inf'), float('inf')) , ranges)

    def __str__(self):
        return str(self.node)

    def query(self, value):
        # bisect is for binary search
        import bisect
        curr_sol = self.node.inter
        node_list = self.node.child_list

        while True:
            # which of the child ranges could include our value?
            i = bisect.bisect_left(node_list, (value, float('inf'))) - 1
            # does it include it?
            if i < 0 or i == len(node_list):
                return curr_sol
            if value > node_list[i].inter[1]:
                return curr_sol
            else:
                # if it does then go deeper
                curr_sol = node_list[i].inter
                node_list = node_list[i].child_list

And node, which holds the structure and the information:

class node:
    def __init__(self, inter, ranges):
        # all elements in ranges will be descendants of this node
        import bisect

        self.inter = inter
        self.child_list = []

        for i, r in enumerate(ranges):
            if len(self.child_list) == 0:
                # append a new child when list is empty
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))

            else:
                # the current range r is included in a previous range 
                # r is not a child of self but a descendant !
                if r[0] < self.child_list[-1].inter[1]:
                    continue
                # else -> this is a new child
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))

    def __str__(self):
        # fancy
        return f'{self.inter} : [{", ".join([str(n) for n in self.child_list])}]'

    def __lt__(self, other):
        # this is '<' operator -> for bisect to compare our items
        return self.inter < other

And to test it:

ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20), (50, 51)]
t = tree(ranges)
print(t)
print(t.query(10))
print(t.query(5))
print(t.query(40))
print(t.query(50))
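
With these ranges, the four queries should return (5, 10), (5, 10), (1, 100) and (50, 51), respectively.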

Preprocessing that generates disjoint intervals (I call the source segments "ranges" and the resulting segments "intervals"):

For every range border (both start and end), make a tuple (value, start/end field, range length, id) and put them all in an array/list.

Sort these tuples by the first field. In case of a tie, the longer range's start goes first and the longer range's end goes last.

Make a stack
Make a StartValue variable
Walk through the list:
    if the current tuple is a start:
        if an interval is open:                    // we close it
            if current value > StartValue:         // interval is not empty
                make interval with                 // note: id remains on the stack
                    (start = StartValue, end = current value, id = stack.peek)
                add interval to result list
        StartValue = current value                 // we open a new interval
        push id from current tuple onto stack
    else:                                          // end of a range
        if current value > StartValue:             // interval is not empty
            make interval with                     // note: id is removed from the stack
                (start = StartValue, end = current value, id = stack.pop)
            add interval to result list
        if stack is not empty:
            StartValue = current value             // we open a new interval
After that we have a sorted list of disjoint intervals, each carrying its start/end values and the id of its source range (note that many intervals may correspond to the same source range), so we can use binary search easily.

If we add source ranges one by one in nested order (each nested range after its parent), we can see that every new range generates at most two new intervals, so the overall number of intervals is M <= 2*N and the overall complexity is O(N log N + Q * log N), where Q is the number of queries.

Edit: Added the "if stack is not empty" section.

The result for your example 1-100, 50-100, 100-120, 5-10, 5-20 is:

1-5(0), 5-10(3), 10-20(4), 20-50(0), 50-100(1), 100-120(2) 
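
Below is a minimal Python sketch of this preprocessing and of the binary-search query (my own code, not part of the original answer). It assumes the ids are the positions of the ranges in the input list, and it resolves a query that falls exactly on a shared border to the interval that starts there, which matches the expected answer of 100-120 for price 100.

import bisect

def build_intervals(ranges):
    # One event per range border: (value, kind, tie-break, range index).
    # kind 0 = end, kind 1 = start, so at equal values ends are processed
    # before starts; the tie-break field puts the longer range's start
    # first and its end last, as described above.
    events = []
    for idx, (lo, hi) in enumerate(ranges):
        events.append((lo, 1, -(hi - lo), idx))   # start of range idx
        events.append((hi, 0, hi - lo, idx))      # end of range idx
    events.sort()

    intervals = []    # (start, end, index of source range)
    stack = []        # indices of currently open ranges, innermost on top
    start_value = None

    for value, kind, _, idx in events:
        if kind == 1:                              # start of a range
            if stack and value > start_value:      # close the current non-empty interval
                intervals.append((start_value, value, stack[-1]))
            start_value = value
            stack.append(idx)
        else:                                      # end of a range
            if value > start_value:                # close the current non-empty interval
                intervals.append((start_value, value, stack[-1]))
            stack.pop()
            if stack:
                start_value = value
    return intervals

def query(ranges, intervals, value):
    # binary search for the rightmost interval whose start is <= value
    i = bisect.bisect_right(intervals, (value, float('inf'), float('inf'))) - 1
    if i >= 0 and intervals[i][0] <= value <= intervals[i][1]:
        return ranges[intervals[i][2]]
    return None

ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]
intervals = build_intervals(ranges)
print(intervals)                      # [(1, 5, 0), (5, 10, 3), (10, 20, 4), (20, 50, 0), (50, 100, 1), (100, 120, 2)]
print(query(ranges, intervals, 7))    # (5, 10)
print(query(ranges, intervals, 100))  # (100, 120)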

Since pLOPeGG already covered the ad hoc case, I will answer the question under the premise that preprocessing is performed in order to support multiple queries efficiently.

General data structures for efficient queries on intervals are the Interval Tree and the Segment Tree.
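
For illustration only (my own sketch, not part of this answer), here is how the query could look with the third-party Python intervaltree package, assuming it is installed: a point query returns every stored interval containing the point in roughly O(log n + k) time, where k is the number of hits, and the smallest hit is the answer.

# pip install intervaltree   (third-party package, assumed available)
from intervaltree import IntervalTree

ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]
tree = IntervalTree()
for lo, hi in ranges:
    tree.addi(lo, hi + 1)            # the package uses half-open intervals, so extend the end by 1

def smallest_containing(tree, value):
    hits = tree[value]               # all stored intervals containing value
    if not hits:
        return None
    best = min(hits, key=lambda iv: iv.end - iv.begin)
    return (best.begin, best.end - 1)

print(smallest_containing(tree, 7))    # (5, 10)
print(smallest_containing(tree, 100))  # (100, 120)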

Comments
  • There is an obvious O(n) algorithm; is that what you are looking for?
  • If your problem is to index the data for efficient querying then you should look at en.wikipedia.org/wiki/Interval_tree and en.wikipedia.org/wiki/Segment_tree
  • Are you going to search for multiple price points? Are the segments already ordered? Will each queried price point have different segment inputs as well?
  • @pLOPeGG Of course, O(n) is the obvious answer, but I'm looking for a more efficient one (maybe O(log n)?)
  • @SaiBot An interval tree is an interesting solution, however it starts to lose its efficiency when there are many nested ranges
  • This is an obvious algorithm, but it's O(n) in the worst case. We need a more efficient method.
  • Oh okay, then consider SaiBot's method. Another solution: because there is no partial overlapping, you can build a tree where each node is a range and has a list of child ranges. In this list put only the ranges that are directly contained in the parent range, and so on. To query the tree, start at the root and binary search its child list for the right child range. If there is none, the current range is your answer; otherwise go deeper and repeat. It might be faster (or slower) depending on your data.
  • This is log(m) worst case, where m is the number of ranges left after making them disjoint. If you can prove that m is of the same order as n, I think this is the best solution.
  • @juvian Yes, new intervals can appear only at the start and end points of the source ranges, so m ~ n. Added to the answer.
  • I think this might not work, or might be harder to implement than you think. The problem is that once you've found the largest starting value smaller than the query, the final result might not start with this largest value. Here is an example: ranges = [(1, 100), (5, 20), (5, 10), (50, 100)]. If we query for 30, then the largest starting value is 5, but the right answer is (1, 100). This implies some difficulties that might prevent us from getting this O(log(n)) computation time.
  • @pLOPeGG You are right, I have to go back to the whiteboard ;-) I am removing the second part of my answer for now.