Contents

Python Quick Guide - Step 1 Basic Syntax and Data Types (10) - Sets

Info
  • This course is designed to help you learn the basics of Python programming as quickly as possible through hands-on practice.
  • The “Style Guide” sections primarily cover guidelines from PEP8 for writing clean Python code.
  • You can run and see the results of each code example.
    Feel free to experiment with the code - reloading the page will reset the content.

This is a continuation of “Step 1: Basic Syntax and Data Types”.

Following lists, tuples, and dictionaries, another data type for grouping data is the set (set).
A set is a collection of unique elements with no defined order.

  • Sets are created by placing elements separated by commas (,) inside curly braces ({, }).
  • Unlike dictionaries, sets contain only values, not key: value pairs.
  • An empty set is created using set().
    • Note that {} creates an empty dictionary, not an empty set.
  • You can also create a set from a list or tuple using the set function.
Style Guide
  • Add a space after each comma (,).
  • For long sets, it’s more readable to split them across multiple lines.
    • Indent each element with 4 spaces at the beginning of the line.
    • Include a trailing comma after the last element.
📚 Exercise

Create a set named my_fruits that includes at least 3 of your favorite fruits.
Also, create an identical set using the set() function.

Sets have the following characteristics:

  1. No duplicate elements: The same value can only be stored once
  2. No order: Elements have no order, and can’t be accessed by index
  3. Mutable: Elements can be added and removed
  4. Only hashable values can be elements: Like dictionary keys, only immutable objects can be elements in a set

Like lists and dictionaries, sets are mutable (can be changed).
Set operations will be introduced below.

Similar to dictionary keys, sets can only contain hashable values.

  • The following immutable types can be used:
    • Numbers (int, float)
    • Strings (str)
    • Tuples (tuple) - only if all contained elements are also hashable
    • Booleans (bool)
    • None
  • The following mutable types cannot be used as set elements:
    • Lists (list)
    • Dictionaries (dict)
    • Sets (set)

Like lists, you can use the len function to get the size of a set.

  • Use the add method to add elements to a set.
  • The remove method removes the specified element but raises an error if the element doesn’t exist.
  • The discard method also removes elements but doesn’t raise an error if the element doesn’t exist.
  • The pop method removes and returns an arbitrary element from the set.
  • Since sets have no order, you cannot predict which element will be popped.

Attempting to pop from an empty set (set()) raises a KeyError.

  • Like lists and dictionaries, you can check for the existence of elements using the in and not in operators.

Sets support various set operations.

  • Creates a new set containing all elements from both sets.
  • Creates a new set containing only the common elements of both sets.
  • Creates a new set containing elements that are in the first set but not in the second set.
  • Creates a new set containing elements that are in either set, but not in both.
  • Like lists and dictionaries, you can create a (shallow) copy of a set using the copy method.
  • Like dictionaries, the update method updates the existing set with the union of itself and another set.
  • You can also use the |= operator to perform the same operation.
    (This is a shorthand similar to += for addition)
  • &= updates the set to contain only elements that are common to itself and another set.
  • -= removes all elements of another set from the current set.
  • ^= updates the set to contain only elements that are in either the set or the other set, but not both.
  • Use the list function to convert a set to a list.
  • Use the sorted function to convert a set to a sorted list
    (only if the elements can be ordered).
  • Applying the bool function to a set returns False for an empty set, and True otherwise.
When to Use Sets vs Lists, Tuples, and Dictionaries

Each of the data types we’ve studied has different characteristics and appropriate uses.

  • Sets (set) are appropriate when:

    • You need to eliminate duplicates
    • You frequently check for the existence of elements
    • You need to find common elements or differences between multiple data sets
    • Order is not important
  • Lists (list) are appropriate when:

    • Order matters
    • You need to add, remove, or modify elements
    • You want to allow duplicate values
    • You need to access elements by index
  • Tuples (tuple) are appropriate when:

    • You don’t want the data to be modified
    • You want to use it as a dictionary key or a set element
    • You want to group related data together
    • Order matters
  • Dictionaries (dict) are appropriate when:

    • You need to manage data as key-value pairs
    • You want to quickly retrieve values by key
    • You want to label data with meaningful names

Understanding these characteristics helps you choose the appropriate data type for each situation.

Sets (set) are data types that represent collections of unique elements with various defined operations.

Main characteristics:

  • Cannot contain duplicate elements
  • Have no defined order
  • Are mutable (elements can be added or removed)
  • Can only contain hashable values

Main operations:

  • .add(): Add an element
  • .remove(): Remove an element (raises an error if the element doesn’t exist)
  • .discard(): Remove an element (doesn’t raise an error if the element doesn’t exist)
  • .pop(): Remove and return an arbitrary element

Set operations:

  • Union (|): A set containing all elements from both sets
  • Intersection (&): A set containing only elements common to both sets
  • Difference (-): A set containing elements in the first set but not in the second
  • Symmetric difference (^): A set containing elements in either set but not both
📚 Exercise 1: Set Operations

Let’s solve the following problems using set operations.

  1. We have two sets of fruit names:

    • fruits1 = {"apple", "banana", "orange", "grape"}
    • fruits2 = {"grape", "kiwi", "melon", "banana"}
  2. Create a set that combines both sets

  3. Create a set containing fruits that are in fruits1 but not in fruits2

  4. Create a set containing fruits that are in one set but not in both

Sample Solution
📚 Exercise 2: Website Visitor Analysis

You are an analyst for a website. You have lists of visitor IDs from last week and this week.
Perform the following analysis:

  1. Output the number and list of “unique” visitors for last week and this week
  2. ID 101 is your own ID, so exclude it from further analysis
  3. Output the number of users who visited in both last week and this week
  4. Output the number of users who visited only this week (not last week)
  5. Output how many unique visitors there were in total over the two weeks
  6. Output the set of users who visited only last week (not this week)
Sample Solution
Column: Time Complexity of Set Operations

Let’s look at the time complexity of set operations in detail.

Python sets use hash tables internally, so like dictionaries, many operations are fast.
In particular, checking for the existence of elements (using the in operator) is very efficient compared to lists.

Note on time complexity notation:

  • O(1): Processed in constant time regardless of input size
  • O(n): In the worst case, takes time proportional to the number of elements n in the set
  • O(m): Takes time proportional to the number of elements m in the second input set
Operation Time Complexity Description
x in s O(1) (average) Usually constant time, but could be O(n) if many hash collisions occur
len(s) O(1) Always constant time as size is maintained by internal counter
s.add(x) O(1) (average) Usually constant time, but could be O(n) if hash collisions or internal resizing is needed
s.remove(x), s.discard(x) O(1) (average) Element search and removal is usually constant time, but could be O(n) with hash collisions
s.pop() O(1) (average) Extracting an arbitrary element is usually constant time, but could be O(n) depending on internal state
s | t O(n+m) Needs to examine elements of both sets, proportional to the sum of both sizes
s & t O(min(n,m)) Checks each element of the smaller set against the larger one, proportional to the size of the smaller set
s - t O(n) Checks each element of the first set against the second, proportional to the first set’s size
s ^ t O(n+m) Needs to examine both sets, proportional to the sum of both sets’ sizes
s.update(t) O(m) (average) Proportional to the size of the set being added, but could be O(n+m) if rehashing is needed
s.copy() O(n) Creates a shallow copy of all elements, proportional to the number of elements

In particular, when removing duplicates from large datasets or checking for the existence of elements,
using sets instead of lists can significantly improve performance.

Related Content