Python Quick Guide - Step 1 Basic Syntax and Data Types (10) - Sets
- This course is designed to help you learn the basics of Python programming as quickly as possible through hands-on practice.
- The “Style Guide” sections primarily cover guidelines from PEP8 for writing clean Python code.
- You can run and see the results of each code example.
Feel free to experiment with the code - reloading the page will reset the content.
This is a continuation of “Step 1: Basic Syntax and Data Types”.
1.5. Grouping Data
1.5.4. Sets
Following lists, tuples, and dictionaries, another data type for grouping data is the set (set
).
A set is a collection of unique elements with no defined order.
Creating Sets
- Sets are created by placing elements separated by commas (
,
) inside curly braces ({
,}
). - Unlike dictionaries, sets contain only values, not
key: value
pairs.
- An empty set is created using
set()
.- Note that
{}
creates an empty dictionary, not an empty set.
- Note that
- You can also create a set from a list or tuple using the
set
function.
- Add a space after each comma (
,
).
- For long sets, it’s more readable to split them across multiple lines.
- Indent each element with 4 spaces at the beginning of the line.
- Include a trailing comma after the last element.
Create a set named my_fruits
that includes at least 3 of your favorite fruits.
Also, create an identical set using the set()
function.
Set Characteristics
Sets have the following characteristics:
- No duplicate elements: The same value can only be stored once
- No order: Elements have no order, and can’t be accessed by index
- Mutable: Elements can be added and removed
- Only hashable values can be elements: Like dictionary keys, only immutable objects can be elements in a set
No Duplicate Elements
No Order
Mutable
Like lists and dictionaries, sets are mutable (can be changed).
Set operations will be introduced below.
Only Hashable Values Can Be Elements
Similar to dictionary keys, sets can only contain hashable values.
- The following immutable types can be used:
- Numbers (
int
,float
) - Strings (
str
) - Tuples (
tuple
) - only if all contained elements are also hashable - Booleans (
bool
) None
- Numbers (
- The following mutable types cannot be used as set elements:
- Lists (
list
) - Dictionaries (
dict
) - Sets (
set
)
- Lists (
Basic Set Operations
Getting the Size
Like lists, you can use the len
function to get the size of a set.
Adding Elements (add
)
- Use the
add
method to add elements to a set.
Removing Elements (remove
, discard
)
- The
remove
method removes the specified element but raises an error if the element doesn’t exist. - The
discard
method also removes elements but doesn’t raise an error if the element doesn’t exist.
Popping Elements (pop
)
- The
pop
method removes and returns an arbitrary element from the set. - Since sets have no order, you cannot predict which element will be popped.
Attempting to pop
from an empty set (set()
) raises a KeyError
.
Checking for Existence (in
, not in
)
- Like lists and dictionaries, you can check for the existence of elements using the
in
andnot in
operators.
Set Operations
Sets support various set operations.
Union (|
operator)
- Creates a new set containing all elements from both sets.
Intersection (&
operator)
- Creates a new set containing only the common elements of both sets.
Difference (-
operator)
- Creates a new set containing elements that are in the first set but not in the second set.
Symmetric Difference (^
operator)
- Creates a new set containing elements that are in either set, but not in both.
Other Set Methods and Operators
Copying Sets (copy
)
- Like lists and dictionaries, you can create a (shallow) copy of a set using the
copy
method.
Updating Sets (update
, |=
)
- Like dictionaries, the
update
method updates the existing set with the union of itself and another set.
- You can also use the
|=
operator to perform the same operation.
(This is a shorthand similar to+=
for addition)
Other Shorthand Operators (&=
, -=
, ^=
)
&=
updates the set to contain only elements that are common to itself and another set.
-=
removes all elements of another set from the current set.
^=
updates the set to contain only elements that are in either the set or the other set, but not both.
Converting to Lists
- Use the
list
function to convert a set to a list.
- Use the
sorted
function to convert a set to a sorted list
(only if the elements can be ordered).
Type Conversion with bool
- Applying the
bool
function to a set returnsFalse
for an empty set, andTrue
otherwise.
Each of the data types we’ve studied has different characteristics and appropriate uses.
-
Sets (
set
) are appropriate when:- You need to eliminate duplicates
- You frequently check for the existence of elements
- You need to find common elements or differences between multiple data sets
- Order is not important
-
Lists (
list
) are appropriate when:- Order matters
- You need to add, remove, or modify elements
- You want to allow duplicate values
- You need to access elements by index
-
Tuples (
tuple
) are appropriate when:- You don’t want the data to be modified
- You want to use it as a dictionary key or a set element
- You want to group related data together
- Order matters
-
Dictionaries (
dict
) are appropriate when:- You need to manage data as key-value pairs
- You want to quickly retrieve values by key
- You want to label data with meaningful names
Understanding these characteristics helps you choose the appropriate data type for each situation.
Summary of Sets
Sets (set
) are data types that represent collections of unique elements with various defined operations.
Main characteristics:
- Cannot contain duplicate elements
- Have no defined order
- Are mutable (elements can be added or removed)
- Can only contain hashable values
Main operations:
.add()
: Add an element.remove()
: Remove an element (raises an error if the element doesn’t exist).discard()
: Remove an element (doesn’t raise an error if the element doesn’t exist).pop()
: Remove and return an arbitrary element
Set operations:
- Union (
|
): A set containing all elements from both sets - Intersection (
&
): A set containing only elements common to both sets - Difference (
-
): A set containing elements in the first set but not in the second - Symmetric difference (
^
): A set containing elements in either set but not both
Let’s solve the following problems using set operations.
-
We have two sets of fruit names:
fruits1 = {"apple", "banana", "orange", "grape"}
fruits2 = {"grape", "kiwi", "melon", "banana"}
-
Create a set that combines both sets
-
Create a set containing fruits that are in
fruits1
but not infruits2
-
Create a set containing fruits that are in one set but not in both
Sample Solution
You are an analyst for a website. You have lists of visitor IDs from last week and this week.
Perform the following analysis:
- Output the number and list of “unique” visitors for last week and this week
- ID
101
is your own ID, so exclude it from further analysis - Output the number of users who visited in both last week and this week
- Output the number of users who visited only this week (not last week)
- Output how many unique visitors there were in total over the two weeks
- Output the set of users who visited only last week (not this week)
Sample Solution
Let’s look at the time complexity of set operations in detail.
Python sets use hash tables internally, so like dictionaries, many operations are fast.
In particular, checking for the existence of elements (using the in
operator) is very efficient compared to lists.
Note on time complexity notation:
O(1)
: Processed in constant time regardless of input sizeO(n)
: In the worst case, takes time proportional to the number of elementsn
in the setO(m)
: Takes time proportional to the number of elementsm
in the second input set
Operation | Time Complexity | Description |
---|---|---|
x in s |
O(1) (average) | Usually constant time, but could be O(n) if many hash collisions occur |
len(s) |
O(1) | Always constant time as size is maintained by internal counter |
s.add(x) |
O(1) (average) | Usually constant time, but could be O(n) if hash collisions or internal resizing is needed |
s.remove(x) , s.discard(x) |
O(1) (average) | Element search and removal is usually constant time, but could be O(n) with hash collisions |
s.pop() |
O(1) (average) | Extracting an arbitrary element is usually constant time, but could be O(n) depending on internal state |
s | t |
O(n+m) | Needs to examine elements of both sets, proportional to the sum of both sizes |
s & t |
O(min(n,m)) | Checks each element of the smaller set against the larger one, proportional to the size of the smaller set |
s - t |
O(n) | Checks each element of the first set against the second, proportional to the first set’s size |
s ^ t |
O(n+m) | Needs to examine both sets, proportional to the sum of both sets’ sizes |
s.update(t) |
O(m) (average) | Proportional to the size of the set being added, but could be O(n+m) if rehashing is needed |
s.copy() |
O(n) | Creates a shallow copy of all elements, proportional to the number of elements |
In particular, when removing duplicates from large datasets or checking for the existence of elements,
using sets instead of lists can significantly improve performance.