Set
set is a mutable, unordered collection of unique objects. frozenset is similar to set, but immutable. See docs.python: set, frozenset for documentation.
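As a minimal sketch of what immutability allows: a frozenset is hashable, so it can serve as a dict key or as an element of another set. The names primary, palette_names and groups below are made up for illustration.

```python
# frozenset is hashable, so it can be a dict key or an element of another set
primary = frozenset(['red', 'green', 'blue'])

# usable as a dictionary key (a regular set would raise TypeError here)
palette_names = {primary: 'primary colors'}

# usable as an element of a regular set
groups = {primary, frozenset(['black', 'white'])}

# mutating methods such as add() are not available on frozenset
try:
    primary.add('yellow')
except AttributeError as e:
    error = str(e)

# element order doesn't matter when looking up the key
print(palette_names[frozenset(['blue', 'green', 'red'])])  # primary colors
print(len(groups))  # 2
```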
Initialization
A set is declared as a collection of objects separated by commas within {} curly brace characters. Use the set() function to initialize an empty set (since {} creates an empty dict) and to convert other iterables to a set.
>>> empty_set = set()
>>> empty_set
set()
>>> nums = {-0.1, 3, 2, -5, 7, 1, 6.3, 5}
# note that the order is not the same as declaration
>>> nums
{-0.1, 1, 2, 3, 5, 6.3, 7, -5}
# duplicates are automatically removed
>>> set([3, 2, 11, 3, 5, 13, 2])
{2, 3, 5, 11, 13}
>>> set('initialize')
{'a', 'n', 't', 'l', 'e', 'i', 'z'}
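Two details of initialization are easy to trip over, so here is a short illustrative sketch (variable names are arbitrary): empty braces give a dict, and set() iterates over its argument whereas braces take the object as a single element.

```python
# {} creates an empty dict, not an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# set() iterates over its argument, braces take the object as one element
chars = set('aba')   # two elements: 'a' and 'b'
word = {'aba'}       # one element: the whole string
print(len(chars), len(word))  # 2 1
```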
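set doesn't allow mutable objects as elements. More precisely, every element must be hashable, which rules out built-in mutable types like list, dict and set.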
>>> {1, 3, [1, 2], 4}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> {1, 3, (1, 2), 4}
{3, 1, (1, 2), 4}
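One way to check in advance whether an object can go into a set is to try hashing it. The is_hashable() helper below is a hypothetical function written for this illustration, not something provided by Python. Note that a tuple is only hashable if all of its items are.

```python
def is_hashable(obj):
    """Return True if obj can be used as a set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2)))             # True
print(is_hashable([1, 2]))             # False
print(is_hashable((1, [2, 3])))        # False, the tuple contains a list
print(is_hashable(frozenset({1, 2})))  # True
```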
Set methods and operations
The in operator checks if a value is present in the given set. Since set uses a hash table (similar to dict keys), the lookup time is constant on average, which is much faster for large data sets than scanning a sequence like list or tuple.
>>> colors = {'red', 'blue', 'green'}
>>> 'blue' in colors
True
>>> 'orange' in colors
False
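When the same collection is searched many times, a common approach is to build a set once and test membership against it. The sketch below assumes the elements are hashable; common_items() and its parameter names are made up for this example.

```python
def common_items(haystack, needles):
    """Return the needles found in haystack, in the order of needles."""
    lookup = set(haystack)   # built once, elements must be hashable
    return [n for n in needles if n in lookup]

big_list = list(range(100_000))
queries = [5, -1, 99_999, 100_000]
print(common_items(big_list, queries))  # [5, 99999]
```

Building the set takes time proportional to the size of the input, so this pays off only when there are several lookups.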
Here are some examples of set operations like union, intersection, etc. You can use either methods or operators. Both give you a new set object instead of modifying in-place. The difference is that set methods accept any iterable, whereas the operators work only with set or set-like objects.
>>> color_1 = {'teal', 'light blue', 'green', 'yellow'}
>>> color_2 = {'light blue', 'black', 'dark green', 'yellow'}
# union of two sets: color_1 | color_2
>>> color_1.union(color_2)
{'light blue', 'green', 'dark green', 'black', 'teal', 'yellow'}
# common items: color_1 & color_2
>>> color_1.intersection(color_2)
{'light blue', 'yellow'}
# items from color_1 not present in color_2: color_1 - color_2
>>> color_1.difference(color_2)
{'teal', 'green'}
# items from color_2 not present in color_1: color_2 - color_1
>>> color_2.difference(color_1)
{'dark green', 'black'}
# items present in one of the sets, but not both
# i.e. union of previous two operations: color_1 ^ color_2
>>> color_1.symmetric_difference(color_2)
{'green', 'dark green', 'black', 'teal'}
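The difference between methods and operators can be seen with a list argument. This is a small sketch with an arbitrary list named extra. The results are sorted before printing because set order isn't predictable.

```python
color_1 = {'teal', 'light blue', 'green', 'yellow'}
extra = ['yellow', 'black']   # a list, not a set

# methods accept any iterable
merged = color_1.union(extra)
print(sorted(merged))
# ['black', 'green', 'light blue', 'teal', 'yellow']

# the operator rejects a list operand
try:
    color_1 | extra
except TypeError:
    operator_failed = True
else:
    operator_failed = False
print(operator_failed)  # True

# methods also accept multiple arguments
shared = color_1.intersection(extra, ('yellow', 'pink'))
print(sorted(shared))  # ['yellow']

# the original set is unchanged
print(len(color_1))  # 4
```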
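As mentioned in the Dict chapter, methods like keys(), values() and items() return view objects. The keys() view is set-like, and so is the items() view when all the values are hashable, so you can apply set operators on them. The values() view is not set-like, since values need not be unique.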
>>> marks_1 = dict(Rahul=86, Ravi=92, Rohit=75)
>>> marks_2 = dict(Jo=89, Rohit=78, Joe=75, Ravi=100)
>>> marks_1.keys() & marks_2.keys()
{'Ravi', 'Rohit'}
>>> marks_1.keys() - marks_2.keys()
{'Rahul'}
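Set operators also work on items() views when the values are hashable, whereas a values() view has to be converted to a set first. The sketch below uses slightly different marks from the example above (Rohit has the same score in both dicts) so that the intersection isn't empty.

```python
marks_1 = dict(Rahul=86, Ravi=92, Rohit=75)
marks_2 = dict(Jo=89, Rohit=75, Joe=75, Ravi=100)

# items() views support set operators when the values are hashable
same_entries = marks_1.items() & marks_2.items()
print(same_entries)  # {('Rohit', 75)}

# values() views do not support set operators, convert explicitly
common_marks = set(marks_1.values()) & set(marks_2.values())
print(common_marks)  # {75}

# the result of a view operation is a plain set
all_names = marks_1.keys() | marks_2.keys()
print(type(all_names).__name__)  # set
```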
Methods like add(), update(), symmetric_difference_update(), intersection_update() and difference_update() modify the set in-place.
>>> color_1 = {'teal', 'light blue', 'green', 'yellow'}
>>> color_2 = {'light blue', 'black', 'dark green', 'yellow'}
# union
>>> color_1.update(color_2)
>>> color_1
{'light blue', 'green', 'dark green', 'black', 'teal', 'yellow'}
# adding a single value
>>> color_2.add('orange')
>>> color_2
{'black', 'yellow', 'dark green', 'light blue', 'orange'}
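The in-place methods have augmented assignment counterparts: |= for update(), &= for intersection_update(), -= for difference_update() and ^= for symmetric_difference_update(). As with the other operators, the right-hand side has to be a set. A short sketch, where alias is just a second name used to show that the same object is modified:

```python
color_1 = {'teal', 'light blue', 'green', 'yellow'}
color_2 = {'light blue', 'black', 'dark green', 'yellow'}
alias = color_1              # second name for the same set object

color_1 |= color_2           # same effect as color_1.update(color_2)
print(alias is color_1)      # True, the object was modified in-place
print(len(alias))            # 6

color_1 &= {'teal', 'black', 'pink'}   # intersection_update()
print(sorted(color_1))       # ['black', 'teal']

color_1 -= {'black'}         # difference_update()
color_1 ^= {'teal', 'red'}   # symmetric_difference_update()
print(color_1)               # {'red'}
```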
The pop() method removes and returns an arbitrary element (you cannot choose which one). Use the remove() method if you want to delete an element based on its value. The discard() method is similar to remove(), but it will not generate an error if the element doesn't exist. The clear() method will delete all the elements.
>>> colors = {'red', 'blue', 'green'}
>>> colors.pop()
'blue'
>>> colors
{'green', 'red'}
# you'll get KeyError if you use 'remove()' method here
>>> colors.discard('black')
>>> colors.clear()
>>> colors
set()
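The error cases can be sketched as follows: remove() raises KeyError for a missing element, and so does pop() on an empty set. The variable names used to record the outcomes are arbitrary.

```python
colors = {'red', 'blue', 'green'}

colors.remove('red')          # element exists, removed silently
try:
    colors.remove('black')    # element is missing
except KeyError as e:
    missing = e.args[0]
print(missing)  # black

# pop() removes an arbitrary element, don't rely on which one
popped = colors.pop()
print(popped in {'blue', 'green'})  # True
print(len(colors))  # 1

# pop() on an empty set raises KeyError
colors.clear()
try:
    colors.pop()
except KeyError:
    empty_pop_failed = True
else:
    empty_pop_failed = False
print(empty_pop_failed)  # True
```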
Here are some examples of comparison operations.
>>> names_1 = {'Ravi', 'Rohit'}
>>> names_2 = {'Ravi', 'Ram', 'Rohit', 'Raj'}
>>> names_1 == names_2
False
# same as: names_1 <= names_2
>>> names_1.issubset(names_2)
True
# same as: names_2 >= names_1
>>> names_2.issuperset(names_1)
True
# disjoint means there are no common elements
# same as: not names_1 & names_2
>>> names_1.isdisjoint(names_2)
False
>>> names_1.isdisjoint({'Jo', 'Joe'})
True
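The < and > operators check for a proper subset and a proper superset respectively, i.e. the two sets must not be equal. A brief sketch reusing the names from the example above:

```python
names_1 = {'Ravi', 'Rohit'}
names_2 = {'Ravi', 'Ram', 'Rohit', 'Raj'}

print(names_1 <= names_1)  # True, every set is a subset of itself
print(names_1 < names_1)   # False, not a proper subset
print(names_1 < names_2)   # True
print(names_2 > names_1)   # True

# issubset() accepts any iterable, the operator needs a set
in_list = names_1.issubset(['Ravi', 'Rohit', 'Ram'])
print(in_list)  # True

# order of elements doesn't matter for equality
print({'Rohit', 'Ravi'} == names_1)  # True
```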
Exercises
Write a function that checks whether an iterable has duplicate values or not.
>>> has_duplicates('pip')
True
>>> has_duplicates((3, 2))
False
What does the above function return for has_duplicates([3, 2, 3.0])?