DSC 40B – Theoretical Foundations of Data Science II


Problems tagged with "hashing"

Problem #095

Tags: hashing

Suppose a hash table is constructed so that collisions are resolved through chaining. The hash table has \(10\) bins, and 100 items are inserted into the hash table.

True or False: it is possible for one of the bins to contain all 100 items.

True False
Solution

True.

Problem #096

Tags: hashing

Suppose a hash table in constructed so that collisions are resolved through chaining. The hash table has a number of bins \(k\) that is \(\Theta(n)\), where \(n\) is the number of elements being stored in the hash table; that is, the hash table grows as more elements are inserted.

Suppose the hash function uses only the first \(k / 2\) bins of the hash table, but appears to hash items into these bins evenly.

What is the expected time complexity of a membership query on this hash table? State your answer in asymptotic notation in terms of the number of elements in the hash table, \(n\).

Solution

\(\Theta(1)\) Since only \(k/2\) bins are being used, the expected number of elements within each bin is \(n / (k/2) = 2n / k\). But since \(k = \Theta(n)\), this is just \(\Theta(1)\).

Problem #097

Tags: hashing

Consider the code below:


def intersection(A, B):
    common = set()
    for x in A:
        if x in B:
            common.add(x)
    return common

What is the expected time complexity of this code if A and B are Python sets (hash tables) with \(n\) elements each? State your answer as a function of \(n\) using asymptotic notation. You may assume that the hash function used is ``good'' in that it hashes evenly (that is, it satisfies the simple universal hashing assumption).

Solution

\(\Theta(n)\)

Problem #120

Tags: hashing

What is the expected time complexity of the code shown below, assuming that numbers is a Python list? Express your answer as a function of \(n\) using asymptotic notation (e.g., \(\Theta(n)\)).


def foo(numbers):
    """Assume that `numbers` is a list containing n numbers."""
    # counts is a Python dictionary, which is implemented using hash table
    counts = {}
    for x in numbers:
        for y in numbers:
            diff = abs(x - y)
            if diff not in counts:
                counts[diff] = 1
            else:
                counts[diff] = counts[diff] + 1
    return counts

foo([1, 2, 3, 4])

Solution

\(\Theta(n^2)\)

Problem #121

Tags: hashing

Suppose a hash table with \(n/2\) bins stores \(n\) elements, and suppose that when the next element is inserted into the hash table, a ``grow'' will be triggered. That is, the insertion will cause the hash table to resize to \(n\) bins containing the same \(n + 1\) elements, plus the newly-added element.

What will be the time complexity of this single insertion operation that causes the table to grow?

Solution

\(\Theta(n)\)

Problem #140

Tags: hashing

Suppose \(T\) is a hash table of size 100 (that is, it has 100 bins). 300 elements are inserted into the hash table using a hash function that satisfies the uniform hashing assumption (that is, any element is equally-likely to hash into any bin). What is the expected number of elements in the first bin?

Solution

3

Problem #141

Tags: hashing

Suppose a hash table is constructed using linked lists to store the data within each bin, and that collisions are resolved by chaining. What is the worst case time complexity of inserting a new element into the hash table? You may assume that the hash table will not need to be resized, and that evaluating the hash function takes \(\Theta(1)\) time.

Solution

\(\Theta(1)\). Inserting an item into a hash table takes two steps: first, we hash the item to find its bin -- this takes \(\Theta(1)\) time. Next, we actually insert the item into that bin. When we use chaining, we do this by appending the item to the linked list containing that bins elements. Appending to a linked list takes \(\Theta(1)\) time, and so overall this takes \(\Theta(1)\).

Problem #212

Tags: hashing

In the problem below, suppose arr is an array of \(n\) integers. What is the expected time complexity of the code below? Write your answer using asymptotic notation as a function of \(n\).

Remember: set is implemented using a hash table. You may assume that the hash table is large enough to store all of th numbers without needing to resize.



def find_duplicates(arr):
    seen = set()
    duplicates = set()
    for num in arr:
        if num in seen:
            duplicates.add(num)
        else:
            seen.add(num)
    return duplicates

Problem #224

Tags: hashing

Assume that a hash table is implemented using chaining to resolve collisions. The hash table stores \(n\) numbers.

Suppose a membership query is made for 42. True or False: more than one bin in the hash table may be checked with a linear search during the query.

True False
Solution

False

Problem #247

Tags: hashing

What is the expected time complexity of the code shown below? Assume that \(n\) is the length of the list of numbers, and write your answer in asymptotic notation in terms of \(n\).


def foo(numbers):
    """Assume that `numbers` is a list containing n numbers."""
    # counts is a Python dictionary, which is implemented using a hash table
    counts = {}
    for x in numbers:
        for y in numbers:
            diff = abs(x - y)
            if diff not in counts:
                counts[diff] = 1
            else:
                counts[diff] = counts[diff] + 1
    return counts