Skip to content
Sean Raven edited this page Oct 28, 2015 · 1 revision

A Set is a data structure that can not contain duplicate elements. Sets usually have very efficient membership testing, making them useful for determining whether an item has a particular property (e.g. checking if a vertex has been visited when traversing a graph).

There are two common implementations of sets: binary search trees and hashes.

Binary search trees keep their elements in order, using either their natural ordering or a comparator. In C++, natural ordering is specified by overloading operator< and operator== for a class, and a comparator is a function which takes two instances of the class and returns bool. In Java, natural ordering for a class C is specified by C implementing Comparable<C>, and a comparator is a (different) class which implements Comparator<C>.

Hashes use a hash() function to generate a unique index for an item. Two items which are considered "equal" must have the same hash() value. In C++11, two items which are not equal must not have the same hash() value. Java does not have this restriction, but it is encouraged. In order to implement a hash() function, it is first necessary to determine what makes two instances of the same class "equal". This is done by overriding operator== in C++ or Object.equals(Object) in Java. Then you must write a hash function which names the same fields as the equals function. This is done by writing a specialization of hash in C++11, or overriding Object.hashCode() in Java. The simplest implementation in Java is to use Objects.hash(); the exact algorithm can be found in the Javadocs. In C++11, you must implement the hash() function yourself. The simplest thing to do is to bitwise-xor the hashcodes of the named fields; the best thing to do is re-implement the Java algorithm in C++.

Nota Bene: hashsets are only available in C++11

Examples using the Student class above:

Java:

public class Student{
    String name;
    double gpa;
    @Override
    public boolean equals(Object other){
        if(!(other instanceof Student)){
            return false;
        }
        Student o = (Student)other;
        return name.equals(o.name) && gpa == o.gpa;
    }
    @Override
    public int hashCode(){
        return Objects.hash(name, gpa);
    }
}

C++:

#include <functional>
struct Student{
    string name;
    float gpa;
}
bool operator==(Student left, Student right){
    return left.name == right.name && left.gpa == right.gpa;
}
template<>
struct hash<Student> {
    size_t operator()(const Student& val) const{
        hash<string> str_hash;
        hash<float> flt_hash;
        return str_hash(val.name) ^ flt_hash(val.gpa);
        //or
        //return hash<string>()(val.name) ^ hash<float>()(val.gpa);
    }
}

Clone this wiki locally