Advanced and scientific Python

Materials to brush up your Python skills

View the Project on GitHub xoolive/pyclass

Advanced data structures

Warning    This section is optional. If you are running short of time, feel free to skip to the next section.

defaultdict: dictionaries with default values

A defaultdict is a dictionary with a default factory, which is called to create a value when accessing a key that is not present in the dictionary.

The idea behind the structure is to avoid checks like:

# Look Before You Leap (LBYL) style
if elt in references:
    references[elt].add(something)
else:
    references[elt] = {something}

or

# Often considered more Pythonic; fast when the key is usually present
# Easier to Ask for Forgiveness than Permission (EAFP) style
try:
    references[elt].add(something)
except KeyError:
    references[elt] = {something}

and to write only the code of the default case:

references[elt].add(something)

This is made possible by giving the dictionary a factory, a callable that creates the default value for a missing key. Here, the default value is an empty set created by set():

from collections import defaultdict

references = defaultdict(set)
Exercise    Implement the word count exercise from the previous section with the proper defaultdict instance.
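
To see the pattern at work outside of the exercise, here is a minimal sketch that groups keywords by author with defaultdict(set). The pairs list and its contents are invented for illustration; only defaultdict itself comes from the standard library.

```python
from collections import defaultdict

# Hypothetical sample data: (author, keyword) pairs, invented for illustration.
pairs = [
    ("alice", "python"),
    ("bob", "numpy"),
    ("alice", "pandas"),
    ("alice", "python"),  # duplicate, absorbed by the set
]

references = defaultdict(set)  # a missing key starts as an empty set
for elt, something in pairs:
    references[elt].add(something)  # no membership test, no try/except

print(dict(references))

# Caveat: merely reading a missing key also inserts it.
size_before = len(references)
references["carol"]
size_after = len(references)
print(size_before, size_after)
```

The last lines point at a common surprise: looking up a missing key is enough to create it, so use `"carol" in references` or `references.get("carol")` when you only want to test for presence.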

Counter: dictionaries for counting objects

A Counter is a dict subclass that behaves much like a defaultdict(int): missing keys count as 0. It has been designed for counting objects passed as an iterable:

>>> import random
>>> from collections import Counter
>>> Counter(random.randint(0, 5) for _ in range(100))
Counter({4: 22, 2: 19, 1: 18, 3: 18, 5: 15, 0: 8})
Exercise    Implement the word count exercise from the previous section with the proper Counter instance.
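
The random example above changes at every run, so here is a small deterministic sketch of the same structure, counting the letters of a fixed string. The input string is arbitrary; most_common and update are standard Counter methods.

```python
from collections import Counter

letters = Counter("mississippi")  # any iterable of hashable objects works

top_two = letters.most_common(2)  # the two most frequent letters, with counts
print(top_two)

missing = letters["z"]            # a missing key counts as 0...
still_absent = "z" not in letters # ...and the lookup does not insert it
print(missing, still_absent)

letters.update("zip")             # add more observations to the same counter
print(letters["i"], letters["z"], letters["p"])
```

Unlike a defaultdict, a Counter does not insert a key when you only read it, which keeps the counts free of spurious zero entries.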

dataclass

Dataclasses are a facility to create classes that mostly hold data. Instances look a bit like dictionaries with a fixed set of typed attributes, and you control whether they are mutable or immutable.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
>>> Person()  # constructor automatically generated for you
Traceback (most recent call last):
  ...
TypeError: __init__() missing 2 required positional arguments: 'name' and 'age'
>>> p = Person("John", 30)
>>> p
Person(name='John', age=30)
>>> p.name
'John'
>>> p.name = "Peter"  # mutable
>>> p
Person(name='Peter', age=30)
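
The constructor is not the only generated method: with the default options, the decorator also provides the representation shown above and an equality test comparing fields. A minimal sketch, reusing the same Person class (the variable names john and twin are only for illustration):

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


john = Person(name="John", age=30)  # keyword arguments are accepted too
twin = Person("John", 30)

print(john == twin)  # generated __eq__ compares (name, age) field by field
print(john is twin)  # they remain two distinct objects
```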

By default, dataclasses are mutable and compare by field values, so their instances cannot be hashed and used in sets or as keys in dictionaries:

>>> {p}
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'Person'

However, dataclasses provide a frozen option, where all fields become immutable: this allows instances to be hashed.

@dataclass(frozen=True)
class Person:
    name: str
    age: int
>>> p = Person("John", 30)
>>> {p}  # hashable
{Person(name='John', age=30)}
>>> p.name = "Peter"  # immutable
Traceback (most recent call last):
  ...
dataclasses.FrozenInstanceError: cannot assign to field 'name'
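
As a sketch of what hashability buys you, the frozen Person below serves as a dictionary key. Since a frozen instance cannot be modified, dataclasses.replace is one way to build a modified copy instead. The city mapping and its value are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    name: str
    age: int


john = Person("John", 30)

# Hashable instances can serve as dictionary keys (the value is made up).
city = {john: "Toulouse"}
print(city[Person("John", 30)])  # equal fields give an equal hash

# A frozen instance is never modified; replace() builds a modified copy.
peter = replace(john, name="Peter")
print(john, peter)
```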

A dataclass can be extended with usual methods:

@dataclass
class Person:
    name: str
    age: int

    def underage(self) -> bool:
        return self.age <= 18
>>> p = Person("John", 30)
>>> p.underage()
False

Fields may be hidden from the default representation.
They may also be given a default factory, which builds a fresh default value for each new instance.

from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int = field(repr=False)
    friends: list[str] = field(default_factory=list)
>>> p = Person("John", 30)
>>> p.friends.append("Peter")
>>> p  # age is hidden here
Person(name='John', friends=['Peter'])
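
The reason for a factory rather than a plain default is that a mutable default would be shared by all instances. The sketch below, which assumes Python 3.9 or later for the list[str] annotation, shows that each instance gets its own list, and that dataclass rejects a bare list default. The Broken class is hypothetical and only there to trigger the error.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int = field(repr=False)
    friends: list[str] = field(default_factory=list)  # list() is called per instance


john = Person("John", 30)
mary = Person("Mary", 25)
john.friends.append("Peter")

print(john.friends, mary.friends)  # mary's list is untouched

# A bare mutable default is rejected when the class is created.
try:
    @dataclass
    class Broken:
        friends: list[str] = []
except ValueError as exc:
    print(exc)
```
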
Shameless advertising    You will find more details and examples (in French) about advanced structures in Chapter 4 of the reference book.