
Introduction
Python's standard library is extensive, offering a wide range of modules for performing common tasks efficiently.
Among these, the collections module is a standout example. It provides specialized container data types that can serve as alternatives to Python's general-purpose built-in containers like dict, list, set, and tuple. While many developers are familiar with some of its components, the module hosts a variety of functionalities that are surprisingly useful and can simplify code, improve readability, and boost performance.
This tutorial explores ten practical, and perhaps surprising, applications of the Python collections module.
1. Counting Hashable Objects Effortlessly with Counter
A common task in almost any data analysis project is counting the occurrences of items in a sequence. The collections.Counter class is designed specifically for this. It's a dictionary subclass where elements are stored as keys and their counts are stored as values.
from collections import Counter
# Count the frequency of words in a list
words = ['galaxy', 'nebula', 'asteroid', 'comet', 'gravitas', 'galaxy', 'stardust', 'quasar', 'galaxy', 'comet']
word_counts = Counter(words)
# Find the two most common words
most_common = word_counts.most_common(2)
# Output results
print(f"Word counts: {word_counts}")
print(f"Most common words: {most_common}")
Output:
Word counts: Counter({'galaxy': 3, 'comet': 2, 'nebula': 1, 'asteroid': 1, 'gravitas': 1, 'stardust': 1, 'quasar': 1})
Most common words: [('galaxy', 3), ('comet', 2)]
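Two further behaviors of Counter are worth a quick sketch: looking up an item that was never counted returns 0 rather than raising a KeyError, and update() adds to existing counts instead of replacing them. The data and variable names below are illustrative assumptions, not part of the example above.

```python
from collections import Counter

# Illustrative data only
inventory = Counter(["apple", "pear", "apple"])

# A missing key returns 0 instead of raising KeyError
missing_count = inventory["banana"]

# update() adds to existing counts rather than overwriting them
inventory.update(["apple", "banana"])

print(missing_count)        # 0
print(inventory["apple"])   # 3
print(inventory["banana"])  # 1
```

This makes Counter convenient for tallying data that arrives in batches, since each batch can be folded in with a single update() call.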
2. Creating Lightweight Classes with namedtuple
When you need a simple class just for grouping data, without methods, a namedtuple is a useful, memory-efficient option. It allows you to create tuple-like objects whose fields are accessible by attribute lookup, while remaining indexable and iterable. This makes your code more readable than using a regular tuple.
from collections import namedtuple
# Define a Book namedtuple
# Fields: title, author, year_published, isbn
Book = namedtuple('Book', ['title', 'author', 'year_published', 'isbn'])
# Create an instance of the Book
my_book = Book(
    title="The Hitchhiker's Guide to the Galaxy",
    author="Douglas Adams",
    year_published=1979,
    isbn='978-0345391803'
)
# Access fields by name
print(f"Book Title: {my_book.title}")
print(f"Author: {my_book.author}")
print(f"Year Published: {my_book.year_published}")
print(f"ISBN: {my_book.isbn}")
# Access the same fields by position
print("\n--- Accessing by index ---")
print(f"Title (by index): {my_book[0]}")
print(f"Author (by index): {my_book[1]}")
print(f"Year Published (by index): {my_book[2]}")
print(f"ISBN (by index): {my_book[3]}")
Output:
Book Title: The Hitchhiker's Guide to the Galaxy
Author: Douglas Adams
Year Published: 1979
ISBN: 978-0345391803

--- Accessing by index ---
Title (by index): The Hitchhiker's Guide to the Galaxy
Author (by index): Douglas Adams
Year Published (by index): 1979
ISBN (by index): 978-0345391803
You can think of a namedtuple as similar to an immutable C struct, or as a data class without methods. They definitely have their uses.
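Because a namedtuple is a tuple underneath, its fields cannot be reassigned after creation. The sketch below, which uses a hypothetical Point type chosen only for illustration, shows that behavior along with two helper methods the class provides: _replace() for building a modified copy and _asdict() for converting to a dictionary.

```python
from collections import namedtuple

# Point is a hypothetical type used only for illustration
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# Fields cannot be reassigned; attempting it raises AttributeError
try:
    p.x = 10
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance with the given fields changed
moved = p._replace(x=10)

# _asdict() converts the instance to a dictionary of field names to values
as_dict = p._asdict()

print(mutated)  # False
print(moved)    # Point(x=10, y=2)
print(p)        # Point(x=1, y=2), the original is unchanged
```

If you need fields that can be changed in place, a regular class or a dataclass is likely the better fit.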
3. Handling Missing Dictionary Keys Gracefully with defaultdict
A common frustration when working with dictionaries is the KeyError that occurs when you try to access a key that doesn't exist. The collections.defaultdict is the perfect solution. It's a subclass of dict that calls a factory function to supply a default value for missing keys. This is especially useful for grouping items.
from collections import defaultdict
# Group a list of tuples by the first element
scores_by_round = [('contestantA', 8), ('contestantB', 7), ('contestantC', 5),
                   ('contestantA', 7), ('contestantB', 7), ('contestantC', 6),
                   ('contestantA', 9), ('contestantB', 5), ('contestantC', 4)]
# list() supplies an empty list the first time each key is seen
grouped_scores = defaultdict(list)
for key, value in scores_by_round:
    grouped_scores[key].append(value)
print(f"Grouped scores: {grouped_scores}")
Output:
Grouped scores: defaultdict(<class 'list'>, {'contestantA': [8, 7, 9], 'contestantB': [7, 7, 5], 'contestantC': [5, 6, 4]})
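To make the contrast with a plain dict concrete, here is a minimal sketch. It assumes an int factory, which supplies 0 for unseen keys, and uses illustrative variable names.

```python
from collections import defaultdict

# A plain dict raises KeyError when a missing key is incremented
plain = {}
try:
    plain["missing"] += 1
    raised = False
except KeyError:
    raised = True

# int() returns 0, so every new key starts at zero
letter_counts = defaultdict(int)
for letter in "banana":
    letter_counts[letter] += 1

print(raised)               # True
print(dict(letter_counts))  # {'b': 1, 'a': 3, 'n': 2}
```

Any zero-argument callable can serve as the factory, so list, set, int, or a custom function all work.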
4. Implementing Fast Queues and Stacks with deque
Python lists can be used as stacks and queues, even though they are not optimized for these operations. Appending and popping from the end of a list is fast, but doing the same from the beginning is slow because all other elements have to be shifted. The collections.deque (double-ended queue) is designed for fast appends and pops from both ends.
First, here's an example of a queue using deque.
from collections import deque
# Create a queue
d = deque([1, 2, 3])
print(f"Original queue: {d}")
# Add to the right
d.append(4)
print("Adding item to queue: 4")
print(f"New queue: {d}")
# Remove from the left
print(f"Popping queue item (from left): {d.popleft()}")
# Output final queue
print(f"Final queue: {d}")
Output:
Original queue: deque([1, 2, 3])
Adding item to queue: 4
New queue: deque([1, 2, 3, 4])
Popping queue item (from left): 1
Final queue: deque([2, 3, 4])
And now let's use deque to create a stack:
from collections import deque
# Create a stack
d = deque([1, 2, 3])
print(f"Original stack: {d}")
# Add to the right
d.append(5)
print("Adding item to stack: 5")
print(f"New stack: {d}")
# Remove from the right
print(f"Popping stack item (from right): {d.pop()}")
# Output final stack
print(f"Final stack: {d}")
Output:
Original stack: deque([1, 2, 3])
Adding item to stack: 5
New stack: deque([1, 2, 3, 5])
Popping stack item (from right): 5
Final stack: deque([1, 2, 3])
5. Remembering Insertion Order with OrderedDict
Before Python 3.7, standard dictionaries were not guaranteed to preserve the order in which items were inserted. To solve this, the collections.OrderedDict was used. While standard dicts now maintain insertion order, OrderedDict still has unique features, like the move_to_end() method, which is useful for tasks like creating a simple cache.
from collections import OrderedDict
# An OrderedDict remembers the order of insertion
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(f"Start order: {list(od.keys())}")
# Move 'a' to the end
od.move_to_end('a')
print(f"Final order: {list(od.keys())}")
Output:
Start order: ['a', 'b', 'c']
Final order: ['b', 'c', 'a']
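The simple cache mentioned above could be sketched roughly as follows. The LRUCache class, its method names, and the capacity parameter are all hypothetical and written only for illustration; the sketch relies on move_to_end() to mark an entry as recently used and on popitem(last=False) to evict the oldest one.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical least-recently-used cache; an illustrative sketch only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the key as most recently used
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # last=False removes the oldest (least recently used) entry
            self._items.popitem(last=False)

    def keys(self):
        return list(self._items)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # 'a' becomes the most recently used entry
cache.put("c", 3)    # exceeds capacity, so 'b' is evicted
print(cache.keys())  # ['a', 'c']
```

For caching function results in real code, the standard library's functools.lru_cache decorator is usually the simpler choice.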
6. Combining Multiple Dictionaries with ChainMap
The collections.ChainMap class provides a way to link multiple dictionaries together so they can be treated as a single unit. It's often much faster than creating a new dictionary and running multiple update() calls. Lookups search the underlying mappings one by one until a key is found.
Let's create a ChainMap named chain and query it for keys.
from collections import ChainMap
# Create dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
# Create a ChainMap
chain = ChainMap(dict1, dict2)
# Print dictionaries
print(f"dict1: {dict1}")
print(f"dict2: {dict2}")
# Query ChainMap for keys and return values
print("\nQuerying ChainMap for keys")
print(f"a: {chain['a']}")
print(f"c: {chain['c']}")
print(f"b: {chain['b']}")
Output:
dict1: {'a': 1, 'b': 2}
dict2: {'b': 3, 'c': 4}

Querying ChainMap for keys
a: 1
c: 4
b: 2
Note that, in the above scenario, 'b' is found first in dict1, the first dictionary in chain, and so it is the value associated with this key in dict1 that is returned.
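Lookups are only half of the picture. The sketch below, using hypothetical settings dictionaries, illustrates three related behaviors: writes go to the first mapping only, changes to an underlying dictionary show through the chain, and new_child() puts a fresh mapping in front of the existing ones.

```python
from collections import ChainMap

# Hypothetical configuration layers, for illustration only
defaults = {"color": "blue", "debug": False}
overrides = {"debug": True}
settings = ChainMap(overrides, defaults)

# Writes go to the first mapping only; defaults is left untouched
settings["color"] = "red"

# The chain holds references, so later changes to a dict are visible
defaults["timeout"] = 30

# new_child() returns a new ChainMap with an extra mapping at the front
scoped = settings.new_child({"debug": False})

print(overrides)            # {'debug': True, 'color': 'red'}
print(defaults["color"])    # blue
print(settings["timeout"])  # 30
print(scoped["debug"], settings["debug"])  # False True
```

This layering is a natural fit for configuration that combines defaults with user or command-line overrides.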
7. Keeping a Limited History with deque's maxlen
A deque can be created with a fixed maximum length using the maxlen argument. If more items are added than the maximum length, items are automatically discarded from the opposite end. This is perfect for keeping a history of the last N items.
from collections import deque
# Keep a history of the last 3 items
history = deque(maxlen=3)
history.append("cd ~")
history.append("ls -l")
history.append("pwd")
print(f"Start history: {history}")
# Add a new item, pushing out the left-most item
history.append("mkdir data")
print(f"Final history: {history}")
Output:
Start history: deque(['cd ~', 'ls -l', 'pwd'], maxlen=3)
Final history: deque(['ls -l', 'pwd', 'mkdir data'], maxlen=3)
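The "opposite end" rule applies in both directions. As a small sketch with illustrative values, append() on a full deque drops the left-most item, while appendleft() drops the right-most one.

```python
from collections import deque

recent = deque([1, 2, 3], maxlen=3)

# append() adds on the right, so the left-most item (1) is dropped
recent.append(4)
after_append = list(recent)

# appendleft() adds on the left, so the right-most item (4) is dropped
recent.appendleft(0)
after_appendleft = list(recent)

print(after_append)      # [2, 3, 4]
print(after_appendleft)  # [0, 2, 3]
```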
8. Creating Nested Dictionaries Easily with defaultdict
Building on defaultdict, you can create nested or tree-like dictionaries with ease. By providing a factory function that returns another defaultdict, you can create dictionaries of dictionaries on the fly.
from collections import defaultdict
import json
# A function that returns a defaultdict whose default values are more trees
def tree():
    return defaultdict(tree)
# Create a nested dictionary
nested_dict = tree()
nested_dict['users']['user1']['name'] = 'Felix'
nested_dict['users']['user1']['email'] = 'user1@example.com'
nested_dict['users']['user1']['phone'] = '515-KL5-5555'
# Output formatted JSON to console
print(json.dumps(nested_dict, indent=2))
Output:
{
  "users": {
    "user1": {
      "name": "Felix",
      "email": "user1@example.com",
      "phone": "515-KL5-5555"
    }
  }
}
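One caveat with this pattern is that merely reading a missing key creates it. The sketch below shows that side effect and one possible way to guard against it: converting the finished tree into plain dictionaries. The to_plain_dict helper is hypothetical and written only for this illustration.

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_plain_dict(node):
    """Hypothetical helper: recursively convert nested defaultdicts to dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

config = tree()
config["db"]["host"] = "localhost"

# Merely reading a missing key inserts an empty subtree
_ = config["cache"]
keys_after_read = sorted(config.keys())

# After conversion, missing keys raise KeyError again
frozen = to_plain_dict(config)

print(keys_after_read)  # ['cache', 'db']
print(frozen)           # {'db': {'host': 'localhost'}, 'cache': {}}
```

Converting once the structure is built keeps the convenience during construction while restoring ordinary dict behavior for later lookups.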
9. Performing Arithmetic Operations on Counters
News flash: you can perform arithmetic operations, such as addition, subtraction, intersection, and union, on Counter objects. This is a powerful tool for comparing and combining frequency counts from different sources.
from collections import Counter
c1 = Counter(a=4, b=2, c=0, d=-2)
c2 = Counter(a=1, b=2, c=3, d=4)
# Add counters -> adds counts for common keys
print(f"c1 + c2 = {c1 + c2}")
# Subtract counters -> keeps only positive counts
print(f"c1 - c2 = {c1 - c2}")
# Intersection -> takes minimum of counts
print(f"c1 & c2 = {c1 & c2}")
# Union -> takes maximum of counts
print(f"c1 | c2 = {c1 | c2}")
Output:
c1 + c2 = Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2})
c1 - c2 = Counter({'a': 3})
c1 & c2 = Counter({'b': 2, 'a': 1})
c1 | c2 = Counter({'a': 4, 'd': 4, 'c': 3, 'b': 2})
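The operators always discard zero and negative results. When negative counts matter, for example to see a shortfall, the subtract() method keeps them. A minimal sketch with hypothetical stock and order data:

```python
from collections import Counter

# Hypothetical stock levels and orders, for illustration only
stock = Counter(apples=3, pears=1)
orders = Counter(apples=1, pears=4)

# The - operator drops zero and negative results
operator_result = stock - orders

# subtract() works in place and keeps negative counts
remaining = Counter(stock)
remaining.subtract(orders)

# Unary + removes zero and negative counts again
positive_only = +remaining

print(operator_result)  # Counter({'apples': 2})
print(remaining)        # Counter({'apples': 2, 'pears': -3})
print(positive_only)    # Counter({'apples': 2})
```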
10. Efficiently Rotating Elements with deque
The deque object has a rotate() method that allows you to rotate the elements efficiently. A positive argument rotates elements to the right; a negative argument rotates them to the left. This is much faster than slicing and re-joining lists or tuples.
from collections import deque
d = deque([1, 2, 3, 4, 5])
print(f"Original deque: {d}")
# Rotate 2 steps to the right
d.rotate(2)
print(f"After rotating 2 to the right: {d}")
# Rotate 3 steps to the left
d.rotate(-3)
print(f"After rotating 3 to the left: {d}")
Output:
Original deque: deque([1, 2, 3, 4, 5])
After rotating 2 to the right: deque([4, 5, 1, 2, 3])
After rotating 3 to the left: deque([2, 3, 4, 5, 1])
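One practical use of rotate() is round-robin assignment: take whoever is at the front, then rotate left by one so the next in line moves up. The worker and task names below are hypothetical, and this is a sketch of the idea rather than a full scheduler.

```python
from collections import deque

# Hypothetical workers and tasks, for illustration only
workers = deque(["alice", "bob", "carol"])
tasks = ["t1", "t2", "t3", "t4"]

assignments = []
for task in tasks:
    # The worker at the front takes the task
    assignments.append((task, workers[0]))
    # Rotating left by one moves that worker to the back
    workers.rotate(-1)

print(assignments)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'carol'), ('t4', 'alice')]
```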
Wrapping Up
The collections module in Python is a killer collection of specialized, high-performance container datatypes. From counting items with Counter to building efficient queues with deque, these tools can make your code cleaner, more efficient, and more Pythonic. By familiarizing yourself with these surprising and powerful features, you can solve common programming problems in a more elegant and effective way.
Matthew Mayo (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.