LeetCode – 295 : Find Median from Data Stream

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.For example,

[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) – Add a integer number from the data stream to the data structure.
  • double findMedian() – Return the median of all elements so far.

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3) 
findMedian() -> 2

Follow up:

  1. If all integer numbers from the stream are between 0 and 100, how would you optimize it?
  2. If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
[code lang="java"]
class MedianFinder {

    /** initialize your data structure here. */
    public MedianFinder() {
        
    }
    
    public void addNum(int num) {
        
    }
    
    public double findMedian() {
        
    }
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder obj = new MedianFinder();
 * obj.addNum(num);
 * double param_2 = obj.findMedian();
 */
[code]

Idea – 1

We could use a list, addNum() takes time O(n). To find the median, we could sort the list, thus findMedian() takes time O(n\lg{n}). Space complexity is O(n).
[code lang="java"]
class MedianFinder {

    List<Integer> data;
    
    /** initialize your data structure here. */
    public MedianFinder() {
        data = new ArrayList<>();
    }
    
    public void addNum(int num) {
        data.add(num);
    }
    
    public double findMedian() {
        int n = data.size();
        
        Collections.sort(data);
        
        // assume n > 0
        if(n%2 == 1)
        {
            return data.get(n/2);
        }
        else
        {
            // assume no overflow or underflow
            return (data.get(n/2-1)+data.get(n/2))*0.5;
        }
    }
}
[code]

Runtime: 483 ms, faster than 7.04% of Java online submissions for Find Median from Data Stream.Memory Usage: 65.4 MB, less than 66.73% of Java online submissions for Find Median from Data Stream.

Idea – 2

In findMedian() we could use the quicksort partitioning idea to find an element of rank x, which should make the average runtime O(n) — worst case quadratic. Space is linear.
[code lang="java"]
class MedianFinder {

    List<Integer> data;
    
    static final Random rng = new Random( System.currentTimeMillis()%Integer.MAX_VALUE );
    
    /** initialize your data structure here. */
    public MedianFinder() {
        data = new ArrayList<>();
    }
    
    public void addNum(int num) {
        data.add(num);
    }
    
    public double findMedian() {
        int n = data.size();
        
        int rightRank = n/2;
        double rightElement = select(rightRank, 0, n-1);
        
        if(n%2==1)
        {
            return rightElement;
        }
        else
        {
            double leftElement = select(rightRank-1, 0, rightRank);
            
            return (leftElement+rightElement)*0.5;
        }
    }
    
    private int select(int rank, int lo, int hi)
    {
        if(lo==hi)
        {
            return data.get(lo);
        }
        
        int pi = lo + rng.nextInt(hi-lo+1);
        int pivot = data.get(pi);
        
        int i = lo-1, k = lo;
        int j = hi;
        
        while(k <= j)
        {
            if(data.get(k) > pivot)
            {
                Collections.swap(data, k, j--);
            }
            else if(data.get(k) < pivot)
            {
                Collections.swap(data, ++i, k++);
            }
            else
            {
                ++k;
            }
        }
        
        int pivotRank = i+1;
        if(pivotRank==rank)
        {
            return pivot;
        }
        else if(pivotRank < rank)
        {
            return select(rank, i+2, hi);
        }
        else
        {
            return select(rank, lo, i);
        }
    }
}
[code]

Unfortunately that one had time limit.

Idea – 3

We could use a binary search tree to get both addNum and findMedian run in time proportional to height or for random input on average O(\lg{n}). Space is linear.
[code lang="java"]
class BST
{
    class Node
    {
        int key;
        int size;
        Node left, right;
        
        public Node(int key)
        {
            this.key = key;
            this.size = 1;
        }
    }
    
    Node root;
    
    public int size()
    {
        return size(root);
    }
    
    private int size(Node x)
    {
        return x==null ? 0 : x.size;
    }
    
    public int select(int rank)
    {
        return select(root, rank);
    }
    
    private int select(Node x, int rank)
    {
        int leftSize = size(x.left);
        if(leftSize==rank)
        {
            return x.key;
        }
        else if(rank < leftSize)
        {
            return select(x.left, rank);
        }
        else
        {
            return select(x.right, rank-leftSize-1);
        }
    }
    
    public void put(int key)
    {
        root = put(root, key);
    }
    
    private Node put(Node x, int key)
    {
        if(x==null)
        {
            return new Node(key);
        }
        
        if(x.key <= key)
        {
            x.left = put(x.left, key);
        }
        else if(x.key > key)
        {
            x.right = put(x.right, key);
        }
        
        x.size = size(x.left)+size(x.right)+1;
        
        return x;
    }
}


class MedianFinder {

    BST data;
    
    /** initialize your data structure here. */
    public MedianFinder() {
        data = new BST();
    }
    
    public void addNum(int num) {
        data.put(num);
    }
    
    public double findMedian() {
        int n = data.size();
        
        double rightElement = data.select(n/2);
        if(n%2==1)
        {
            return rightElement;
        }
        
        double leftElement = data.select(n/2-1);
        
        return (leftElement+rightElement)*0.5;
    }
}
[code]

Runtime: 112 ms, faster than 77.72% of Java online submissions for Find Median from Data Stream.Memory Usage: 57.8 MB, less than 87.98% of Java online submissions for Find Median from Data Stream.

Idea – 4

If we can keep two groups: smaller \frac{n+1}{2} in one group and the rest in the other group, we can find the median in O(1) time – if total number of element is odd, we take the max from left half, otherwise we take the average of leftmax and rightmin. To maintain the two groups we use a maxheap for the left half and minheap for the right half. Add takes O(\lg{n}) time and findMedian() takes O(1) time.
[code lang="java"]
class MedianFinder {
    
    PriorityQueue<Integer> leftq;
    PriorityQueue<Integer> rightq;

    /** initialize your data structure here. */
    public MedianFinder() {
        leftq = new PriorityQueue<Integer>(new Comparator<Integer>()
                                           {
                                               public int compare(Integer a, Integer b)
                                               {
                                                   return b-a;
                                               }
                                           });
        
        rightq = new PriorityQueue<>();
    }
    
    public void addNum(int num) {
        if(leftq.size() == rightq.size())
        {
            leftq.offer(num);
        }
        else
        {
            rightq.offer(num);
        }
        
        if(!rightq.isEmpty())
        {
            leftq.offer(rightq.poll());
            rightq.offer(leftq.poll());
        }
    }
    
    public double findMedian() {
        int n = leftq.size()+rightq.size();
        if(n%2==1)
        {
            return leftq.peek();
        }
        else
        {
            return (leftq.peek()+rightq.peek())*0.5;
        }
    }
}
[code]

Runtime: 124 ms, faster than 39.99% of Java online submissions for Find Median from Data Stream.Memory Usage: 65.6 MB, less than 66.73% of Java online submissions for Find Median from Data Stream.

Follow up

If all integers are between 0 and 100, using counting sort idea and an array of size 100 we can keep the numbers sorted, add takes O(1) time. Binary search can be used to find the median in O(\lg{n}) time. If 99% of the numbers are in [0, 100], we can use the two heap idea for < 0 and > 100 and for the rest of the numbers we use the array idea.