DSC 40B – Theoretical Foundations of Data Science II


Problems tagged with "hashing"

Problem #095

Tags: hashing

Suppose a hash table is constructed so that collisions are resolved through chaining. The hash table has \(10\) bins, and 100 items are inserted into the hash table.

True or False: it is possible for one of the bins to contain all 100 items.

True False
Solution

True.

Problem #096

Tags: hashing

Suppose a hash table in constructed so that collisions are resolved through chaining. The hash table has a number of bins \(k\) that is \(\Theta(n)\), where \(n\) is the number of elements being stored in the hash table; that is, the hash table grows as more elements are inserted.

Suppose the hash function uses only the first \(k / 2\) bins of the hash table, but appears to hash items into these bins evenly.

What is the expected time complexity of a membership query on this hash table? State your answer in asymptotic notation in terms of the number of elements in the hash table, \(n\).

Solution

\(\Theta(1)\) Since only \(k/2\) bins are being used, the expected number of elements within each bin is \(n / (k/2) = 2n / k\). But since \(k = \Theta(n)\), this is just \(\Theta(1)\).

Problem #097

Tags: hashing

Consider the code below:


def intersection(A, B):
    common = set()
    for x in A:
        if x in B:
            common.add(x)
    return common

What is the expected time complexity of this code if A and B are Python sets (hash tables) with \(n\) elements each? State your answer as a function of \(n\) using asymptotic notation. You may assume that the hash function used is ``good'' in that it hashes evenly (that is, it satisfies the simple universal hashing assumption).

Solution

\(\Theta(n)\)

Problem #120

Tags: hashing

What is the expected time complexity of the code shown below, assuming that numbers is a Python list? Express your answer as a function of \(n\) using asymptotic notation (e.g., \(\Theta(n)\)).


def foo(numbers):
    """Assume that `numbers` is a list containing n numbers."""
    # counts is a Python dictionary, which is implemented using hash table
    counts = {}
    for x in numbers:
        for y in numbers:
            diff = abs(x - y)
            if diff not in counts:
                counts[diff] = 1
            else:
                counts[diff] = counts[diff] + 1
    return counts

foo([1, 2, 3, 4])

Solution

\(\Theta(n^2)\)

Problem #121

Tags: hashing

Suppose a hash table with \(n/2\) bins stores \(n\) elements, and suppose that when the next element is inserted into the hash table, a ``grow'' will be triggered. That is, the insertion will cause the hash table to resize to \(n\) bins containing the same \(n + 1\) elements, plus the newly-added element.

What will be the time complexity of this single insertion operation that causes the table to grow?

Solution

\(\Theta(n)\)

Problem #140

Tags: hashing

Suppose \(T\) is a hash table of size 100 (that is, it has 100 bins). 300 elements are inserted into the hash table using a hash function that satisfies the uniform hashing assumption (that is, any element is equally-likely to hash into any bin). What is the expected number of elements in the first bin?

Solution

3

Problem #141

Tags: hashing

Suppose a hash table is constructed using linked lists to store the data within each bin, and that collisions are resolved by chaining. What is the worst case time complexity of inserting a new element into the hash table? You may assume that the hash table will not need to be resized, and that evaluating the hash function takes \(\Theta(1)\) time.

Solution

\(\Theta(1)\). Inserting an item into a hash table takes two steps: first, we hash the item to find its bin -- this takes \(\Theta(1)\) time. Next, we actually insert the item into that bin. When we use chaining, we do this by appending the item to the linked list containing that bins elements. Appending to a linked list takes \(\Theta(1)\) time, and so overall this takes \(\Theta(1)\).