Python’s iteration capabilities are among its most powerful features, enabling developers to write clean, efficient, and expressive code. Whether you’re processing data, building algorithms, or working with collections, understanding iteration is fundamental to writing Pythonic code. This comprehensive guide will take you from basic loop concepts to advanced iterator patterns, helping you master one of Python’s core strengths.
Table of Contents #
- Understanding Basic Loops
- Advanced Iteration Techniques
- List Comprehensions and Generator Expressions
- Custom Iterators and Generators
- The itertools Module
- Performance Optimization and Best Practices
- Common Pitfalls and How to Avoid Them
- Practical Examples and Exercises
Understanding Basic Loops #
Python’s iteration model is built on the concept of iterables—objects that can return their elements one at a time. Let’s explore the fundamental iteration constructs that every Python developer should master.
The for Loop: Python’s Primary Iteration Tool #
The for loop is the most common way to iterate in Python. Unlike many other languages that use index-based iteration, Python’s for loop works directly with iterable objects, making code more readable and less error-prone.
# Iterating over a list
fruits = ['apple', 'banana', 'cherry', 'date']
for fruit in fruits:
    print(f"Processing {fruit}")
# Output: Processing apple, Processing banana, etc.
# Iterating over a string (strings are iterable!)
word = "Python"
for character in word:
    print(character.upper())
# Output: P, Y, T, H, O, N
# Iterating over a dictionary
person = {'name': 'Alice', 'age': 30, 'city': 'Beijing'}
for key, value in person.items():
    print(f"{key}: {value}")
# Iterating over dictionary keys only
for key in person:
    print(key)
# Iterating over dictionary values only
for value in person.values():
    print(value)
Understanding range(): Generating Numeric Sequences #
The range() function is a memory-efficient way to generate sequences of numbers. It doesn’t create a list in memory: a range object stores only its start, stop, and step, and computes each number on demand as you iterate.
# Basic range usage
for i in range(5):
    print(i)  # Prints: 0, 1, 2, 3, 4
# range with start and stop
for i in range(2, 7):
    print(i)  # Prints: 2, 3, 4, 5, 6
# range with start, stop, and step
for i in range(1, 10, 2):
    print(i)  # Prints: 1, 3, 5, 7, 9
# Counting backwards
for i in range(10, 0, -1):
    print(i)  # Prints: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
# Creating a list from range (if you really need it)
numbers = list(range(5))  # [0, 1, 2, 3, 4]
# Using range with len() for index-based iteration
# (Note: usually enumerate() is better for this)
items = ['a', 'b', 'c']
for i in range(len(items)):
    print(f"Index {i}: {items[i]}")
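The examples above only loop over a range, but a range object is also a full sequence. The following sketch shows a few of those sequence operations; the variable names are illustrative, and the size comparison reflects CPython behaviour.

```python
import sys

evens = range(0, 1_000_000, 2)

# A range is a sequence: it knows its length and supports indexing and slicing
length = len(evens)
tenth = evens[10]
last = evens[-1]
head = evens[:3]  # slicing a range gives another range

# Integer membership is computed arithmetically, without walking the values
has_even = 123_456 in evens
has_odd = 123_457 in evens

# In CPython the object's size does not depend on how many numbers it represents
same_size = sys.getsizeof(range(10)) == sys.getsizeof(evens)

# Unlike a generator, a range can be iterated repeatedly
small = range(3)
passes = (list(small), list(small))
```

Because of this, converting a range to a list is rarely needed just to index into it or test membership.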
The while Loop: Condition-Based Iteration #
While for loops are more common in Python, while loops are essential when you need to iterate based on a condition rather than a sequence.
# Basic while loop
count = 0
while count < 5:
    print(f"Count is {count}")
    count += 1
# while loop with break
while True:
    user_input = input("Enter 'quit' to exit: ")
    if user_input == 'quit':
        break
    print(f"You entered: {user_input}")
# while loop with continue
number = 0
while number < 10:
    number += 1
    if number % 2 == 0:
        continue  # Skip even numbers
    print(number)  # Only prints odd numbers
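A typical case for condition-based iteration is a computation that repeats until it converges, where the number of passes is not known in advance. The sketch below uses Newton's method for square roots as an illustration; the function name, tolerance, and iteration cap are assumptions chosen for the example.

```python
def newton_sqrt(value, rel_tolerance=1e-12, max_iterations=100):
    """Approximate sqrt(value) for value > 0; returns (estimate, iterations used)."""
    guess = max(value, 1.0)
    iterations = 0
    # Loop on a condition: stop once the estimate is close enough,
    # with an iteration cap as a safety net against endless looping
    while (abs(guess * guess - value) > rel_tolerance * value
           and iterations < max_iterations):
        guess = (guess + value / guess) / 2
        iterations += 1
    return guess, iterations

root, steps = newton_sqrt(2.0)
print(f"sqrt(2) is approximately {root:.10f} after {steps} iterations")
```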
Advanced Iteration Techniques #
Once you’ve mastered basic loops, Python offers several powerful built-in functions that make iteration more elegant and efficient.
enumerate(): Access Both Index and Value #
The enumerate() function is essential when you need both the index and the value during iteration. It returns tuples containing the index and value for each element.
colors = ['red', 'green', 'blue', 'yellow']
# Basic enumeration (index starts at 0)
for index, color in enumerate(colors):
    print(f"Color {index}: {color}")
# Output: Color 0: red, Color 1: green, etc.
# Start enumeration at a custom number
for index, color in enumerate(colors, start=1):
    print(f"Color #{index}: {color}")
# Output: Color #1: red, Color #2: green, etc.
# Practical example: finding positions of specific elements
text = "hello world"
vowels = 'aeiou'
vowel_positions = [(idx, char) for idx, char in enumerate(text) if char in vowels]
print(vowel_positions) # [(1, 'e'), (4, 'o'), (7, 'o')]
# Creating a dictionary with enumeration
fruits = ['apple', 'banana', 'cherry']
fruit_dict = {idx: fruit for idx, fruit in enumerate(fruits)}
print(fruit_dict) # {0: 'apple', 1: 'banana', 2: 'cherry'}
zip(): Parallel Iteration Over Multiple Sequences #
The zip() function allows you to iterate over multiple sequences simultaneously, pairing up elements from each sequence.
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
cities = ['Shanghai', 'London', 'Paris']
# Zip two sequences
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
# Zip three or more sequences
for name, age, city in zip(names, ages, cities):
    print(f"{name} is {age} years old and lives in {city}")
# Create a dictionary from two lists
person_dict = dict(zip(names, ages))
print(person_dict) # {'Alice': 25, 'Bob': 30, 'Charlie': 35}
# Important: zip stops at the shortest sequence
numbers = [1, 2, 3, 4, 5]
letters = ['a', 'b', 'c']
pairs = list(zip(numbers, letters))
print(pairs) # [(1, 'a'), (2, 'b'), (3, 'c')] - stops at 3
# Using zip_longest for different length sequences
from itertools import zip_longest
numbers = [1, 2]
letters = ['a', 'b', 'c', 'd']
for num, letter in zip_longest(numbers, letters, fillvalue=None):
    print(f"Number: {num}, Letter: {letter}")
# Output includes: Number: None, Letter: c
# Unzipping: splitting a list of pairs back into separate sequences
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = zip(*pairs)
print(numbers) # (1, 2, 3)
print(letters) # ('a', 'b', 'c')
reversed(): Iterate in Reverse Order #
The reversed() function returns a reverse iterator without creating a new list in memory.
numbers = [1, 2, 3, 4, 5]
# Iterate in reverse
for num in reversed(numbers):
    print(num)  # Prints: 5, 4, 3, 2, 1
# Works with strings
word = "Python"
for char in reversed(word):
    print(char)  # Prints: n, o, h, t, y, P
# Create a reversed list if needed
reversed_list = list(reversed(numbers))
print(reversed_list) # [5, 4, 3, 2, 1]
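One limitation worth knowing: reversed() needs a sequence (or an object that defines __reversed__), so it cannot be applied to a generator directly. The sketch below illustrates this and shows that a range can still be reversed lazily.

```python
squares = (n * n for n in range(5))

# A generator has no length or indexing, so reversed() rejects it
try:
    reversed(squares)
    generator_supported = True
except TypeError:
    generator_supported = False

# Materialize the values first if they must be walked backwards
backwards = list(reversed([n * n for n in range(5)]))

# A range is a sequence, so reversing it stays lazy even when it is large
lazy = reversed(range(1_000_000))
first_three = [next(lazy) for _ in range(3)]
```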
sorted(): Iterate Over Sorted Sequences #
The sorted() function returns a new sorted list from any iterable, allowing you to iterate over elements in order.
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
# Iterate over sorted sequence
for num in sorted(numbers):
    print(num)  # Prints in ascending order
# Sort in reverse
for num in sorted(numbers, reverse=True):
    print(num)  # Prints in descending order
# Sort with a custom key
words = ['apple', 'pie', 'a', 'longer']
for word in sorted(words, key=len):
    print(word)  # Sorts by length: a, pie, apple, longer
# Sort dictionary by values
scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78}
for name, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(f"{name}: {score}")  # Prints in descending score order
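When several items share the same sort key, two details matter: sorted() is stable, and a tuple key can express a tie-breaker. The following sketch uses invented records to show both.

```python
records = [('Alice', 85), ('Bob', 92), ('Charlie', 85), ('Dana', 92)]

# Stable sort: records with equal scores keep their original relative order
by_score = sorted(records, key=lambda record: record[1], reverse=True)

# Tuple key: score descending (negated), then name ascending as a tie-breaker
by_score_then_name = sorted(records, key=lambda record: (-record[1], record[0]))

print(by_score)
print(by_score_then_name)
```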
List Comprehensions and Generator Expressions #
Comprehensions are one of Python’s most elegant features, allowing you to create new sequences in a single, readable line of code.
List Comprehensions: Elegant List Creation #
List comprehensions provide a concise way to create lists based on existing sequences or ranges.
# Basic list comprehension
numbers = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in numbers]
print(squares) # [1, 4, 9, 16, 25]
# List comprehension with conditional
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares) # [4, 16]
# List comprehension with if-else
parity = ['even' if n % 2 == 0 else 'odd' for n in numbers]
print(parity) # ['odd', 'even', 'odd', 'even', 'odd']
# Nested list comprehension
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [num for row in matrix for num in row]
print(flattened) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Creating a matrix with nested comprehension
matrix = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(matrix) # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
# String manipulation with list comprehension
words = ['hello', 'world', 'python']
capitalized = [word.capitalize() for word in words]
print(capitalized) # ['Hello', 'World', 'Python']
Dictionary and Set Comprehensions #
Python also supports comprehensions for dictionaries and sets.
# Dictionary comprehension
numbers = [1, 2, 3, 4, 5]
square_dict = {n: n ** 2 for n in numbers}
print(square_dict) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Dictionary comprehension with conditional
even_square_dict = {n: n ** 2 for n in numbers if n % 2 == 0}
print(even_square_dict) # {2: 4, 4: 16}
# Swapping keys and values
original = {'a': 1, 'b': 2, 'c': 3}
swapped = {value: key for key, value in original.items()}
print(swapped) # {1: 'a', 2: 'b', 3: 'c'}
# Set comprehension
sentence = "hello world"
unique_letters = {char.lower() for char in sentence if char.isalpha()}
print(unique_letters) # {'h', 'e', 'l', 'o', 'w', 'r', 'd'}
# Set comprehension for filtering
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_evens = {n for n in numbers if n % 2 == 0}
print(unique_evens) # {2, 4}
Generator Expressions: Memory-Efficient Iteration #
Generator expressions look like list comprehensions but use parentheses instead of square brackets. They generate values on-the-fly rather than creating a list in memory.
# Generator expression
numbers = range(1000000)
squares_gen = (n ** 2 for n in numbers)
# Generators are memory efficient - they don't store all values
print(squares_gen)  # <generator object <genexpr> at 0x...>
# You can iterate over a generator
for square in (n ** 2 for n in range(5)):
    print(square)  # Prints: 0, 1, 4, 9, 16
# Generators can only be iterated once
gen = (n for n in range(3))
print(list(gen)) # [0, 1, 2]
print(list(gen)) # [] - generator is exhausted!
# Use generators with functions that accept iterables
sum_of_squares = sum(n ** 2 for n in range(1000000))
max_value = max(n for n in range(100) if n % 7 == 0)
# Generator expression with conditional
even_numbers = (n for n in range(20) if n % 2 == 0)
print(list(even_numbers)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
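Laziness also means a generator expression does only as much work as its consumer asks for. The sketch below records which inputs are actually evaluated when any() stops at the first match; expensive_check is a hypothetical stand-in for a costly test.

```python
calls = []

def expensive_check(n):
    """Hypothetical costly predicate; records every value it is asked about."""
    calls.append(n)
    return n > 2

# any() stops pulling values as soon as one check returns True
found = any(expensive_check(n) for n in range(1_000_000))

print(found)
print(calls)
```

With a list comprehension in place of the generator expression, all one million checks would run before any() saw the first result.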
Custom Iterators and Generators #
Understanding how to create your own iterators and generators gives you powerful control over iteration behavior.
Creating Custom Iterator Classes #
To create a custom iterator, you need to implement the __iter__() and __next__() methods.
class CountDown:
    """Iterator that counts down from a starting number."""
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        # Return the iterator object (self)
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # Signal end of iteration
        result = self.current
        self.current -= 1
        return result
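An iterator class like CountDown returns itself from __iter__(), so it is exhausted after one pass. If the object should support repeated loops, one option is to make it an iterable whose __iter__() hands out a fresh iterator each time. A minimal sketch, where CountDownRange is a hypothetical name introduced only for this illustration:

```python
class CountDownRange:
    """Re-iterable countdown: every loop gets its own fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Writing __iter__ as a generator creates a new iterator per call,
        # so no iteration state is stored on the object itself
        current = self.start
        while current > 0:
            yield current
            current -= 1

numbers = CountDownRange(3)
first_pass = list(numbers)
second_pass = list(numbers)
```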
# Usage (DateRange is the iterator class from "Example 1: Custom Date Range Iterator")
from datetime import date, timedelta
start = date(2024, 1, 1)
end = date(2024, 1, 7)
for day in DateRange(start, end):
    print(day.strftime('%Y-%m-%d'))
# Generator version (simpler!)
def date_range(start_date, end_date):
    """Generator function for date ranges."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)
# Usage
for day in date_range(date(2024, 1, 1), date(2024, 1, 7)):
    print(day.strftime('%Y-%m-%d'))
Example 2: Processing CSV-Like Data #
def parse_csv_lines(lines):
    """Generator that parses CSV lines efficiently."""
    for line in lines:
        yield line.strip().split(',')
# Simulating CSV data
csv_data = [
    'Name,Age,City',
    'Alice,25,Shanghai',
    'Bob,30,Beijing',
    'Charlie,35,Shenzhen'
]
# Process header and data separately
lines = iter(csv_data)
header = next(lines).split(',')
# Process each row
for row in parse_csv_lines(lines):
    person = dict(zip(header, row))
    print(person)
# Output: {'Name': 'Alice', 'Age': '25', 'City': 'Shanghai'}, etc.
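Splitting on commas is fine for simple data like the sample above, but it breaks when a field itself contains a quoted comma. For real files, the standard library's csv module handles quoting and is also lazy. A short sketch with invented sample data:

```python
import csv
import io

# The second line has a quoted field that contains a comma
raw = 'Name,Age,City\nAlice,25,"Shanghai, China"\nBob,30,Beijing\n'

# DictReader reads the header row itself and yields one dict per data row
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

for row in rows:
    print(row)
```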
Example 3: Batch Processing with Iterators #
def batch(iterable, batch_size):
    """
    Generator that yields batches of items from an iterable.
    Useful for processing large datasets in chunks.
    """
    batch_list = []
    for item in iterable:
        batch_list.append(item)
        if len(batch_list) == batch_size:
            yield batch_list
            batch_list = []
    # Don't forget the last partial batch
    if batch_list:
        yield batch_list
# Usage: process data in batches of 3
data = range(10)
for batch_items in batch(data, 3):
    print(f"Processing batch: {batch_items}")
# Output: [0, 1, 2], [3, 4, 5], [6, 7, 8], [9]
# Alternative using itertools
from itertools import islice
def batch_itertools(iterable, batch_size):
    """More efficient batch generator using itertools."""
    iterator = iter(iterable)
    while True:
        batch_list = list(islice(iterator, batch_size))
        if not batch_list:
            break
        yield batch_list
# Usage
for batch_items in batch_itertools(range(10), 3):
    print(f"Processing batch: {batch_items}")
Example 4: Sliding Window Iterator #
from collections import deque
def sliding_window(iterable, window_size):
    """
    Generator that yields sliding windows of items.
    Useful for analyzing sequences and time series data.
    """
    iterator = iter(iterable)
    window = deque(maxlen=window_size)
    # Fill the initial window
    for _ in range(window_size):
        try:
            window.append(next(iterator))
        except StopIteration:
            return
    yield list(window)
    # Slide the window
    for item in iterator:
        window.append(item)
        yield list(window)
# Usage: moving average calculation
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for window in sliding_window(numbers, 3):
    average = sum(window) / len(window)
    print(f"Window: {window}, Average: {average:.2f}")
# Output: Window: [1, 2, 3], Average: 2.00
#         Window: [2, 3, 4], Average: 3.00, etc.
Example 5: Filtering and Transforming Data Pipeline #
def read_numbers(filename):
    """Generator: read numbers from file."""
    with open(filename) as f:
        for line in f:
            try:
                yield int(line.strip())
            except ValueError:
                continue  # Skip invalid lines
def filter_positive(numbers):
    """Generator: filter positive numbers."""
    for num in numbers:
        if num > 0:
            yield num
def square(numbers):
    """Generator: square each number."""
    for num in numbers:
        yield num ** 2
def take_first_n(iterable, n):
    """Generator: take first n items."""
    for i, item in enumerate(iterable):
        if i >= n:
            break
        yield item
# Create a processing pipeline
# pipeline = take_first_n(square(filter_positive(read_numbers('data.txt'))), 5)
# Or using a more readable approach
def process_numbers(filename, n):
    """Process numbers through a pipeline."""
    numbers = read_numbers(filename)
    positive = filter_positive(numbers)
    squared = square(positive)
    result = take_first_n(squared, n)
    return list(result)
# This entire pipeline is memory-efficient because it uses generators!
Example 6: Tree Traversal Iterator #
class TreeNode:
    """Simple binary tree node."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
def traverse_inorder(node):
    """Generator for in-order tree traversal."""
    if node is not None:
        # Traverse left subtree
        yield from traverse_inorder(node.left)
        # Visit current node
        yield node.value
        # Traverse right subtree
        yield from traverse_inorder(node.right)
def traverse_preorder(node):
    """Generator for pre-order tree traversal."""
    if node is not None:
        yield node.value
        yield from traverse_preorder(node.left)
        yield from traverse_preorder(node.right)
def traverse_postorder(node):
    """Generator for post-order tree traversal."""
    if node is not None:
        yield from traverse_postorder(node.left)
        yield from traverse_postorder(node.right)
        yield node.value
# Build a sample tree
#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
root = TreeNode(4,
    TreeNode(2, TreeNode(1), TreeNode(3)),
    TreeNode(6, TreeNode(5), TreeNode(7))
)
# Traverse the tree
print("In-order:", list(traverse_inorder(root)))  # [1, 2, 3, 4, 5, 6, 7]
print("Pre-order:", list(traverse_preorder(root)))  # [4, 2, 1, 3, 6, 5, 7]
print("Post-order:", list(traverse_postorder(root)))  # [1, 3, 2, 5, 7, 6, 4]
Example 7: Infinite Sequence Generators #
def fibonacci_infinite():
    """Generate Fibonacci numbers infinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
def primes_infinite():
    """Generate prime numbers infinitely."""
    def is_prime(n):
        if n < 2:
            return False
        for i in range(2, int(n ** 0.5) + 1):
            if n % i == 0:
                return False
        return True
    n = 2
    while True:
        if is_prime(n):
            yield n
        n += 1
def collatz_sequence(n):
    """Generate Collatz sequence for a given number."""
    while n != 1:
        yield n
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
    yield 1
# Usage with islice to get finite results
from itertools import islice
# First 15 Fibonacci numbers
fibs = list(islice(fibonacci_infinite(), 15))
print(f"Fibonacci: {fibs}")
# First 10 primes
primes = list(islice(primes_infinite(), 10))
print(f"Primes: {primes}")
# Collatz sequence for 13
collatz = list(collatz_sequence(13))
print(f"Collatz(13): {collatz}")  # [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
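islice() limits an infinite generator by count. When the stopping point depends on the values themselves, itertools.takewhile() is the matching tool. A self-contained sketch, with the generator redefined here under the hypothetical name fib_forever so the block runs on its own:

```python
from itertools import takewhile

def fib_forever():
    """Endless Fibonacci generator (same idea as fibonacci_infinite above)."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Consume values only while they stay below 100, then stop pulling
below_100 = list(takewhile(lambda value: value < 100, fib_forever()))
print(below_100)
```

Note that takewhile() stops at the first failing value, so it suits increasing sequences like this one; it is not a general filter.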
Practice Exercises #
Challenge yourself with these exercises to reinforce your understanding:
Exercise 1: Chunk Text into Fixed-Length Pieces #
Write a generator that splits text into chunks of a specified length, preserving word boundaries when possible.
def chunk_text(text, chunk_size):
    """
    Split text into chunks of approximately chunk_size characters,
    preserving word boundaries.
    """
    words = text.split()
    current_chunk = []
    current_length = 0
    for word in words:
        word_length = len(word) + 1  # +1 for space
        if current_length + word_length > chunk_size and current_chunk:
            yield ' '.join(current_chunk)
            current_chunk = []
            current_length = 0
        current_chunk.append(word)
        current_length += word_length
    if current_chunk:
        yield ' '.join(current_chunk)
# Test
text = "Python iteration is powerful and elegant. It allows you to write clean and efficient code."
for i, chunk in enumerate(chunk_text(text, 30), 1):
    print(f"Chunk {i}: {chunk}")
Exercise 2: Flatten Nested Iterables #
Write a generator that flattens arbitrarily nested iterables.
def flatten(iterable):
    """
    Recursively flatten nested iterables.
    """
    for item in iterable:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item
# Test
nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
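The solution above only descends into lists and tuples. One possible extension, sketched here under the hypothetical name flatten_any, accepts any iterable (ranges, generators, and so on) while treating strings and bytes as single values, since iterating a one-character string yields itself and would recurse forever.

```python
from collections.abc import Iterable

def flatten_any(items):
    """Flatten arbitrarily nested iterables, keeping str and bytes whole."""
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_any(item)
        else:
            yield item

mixed = [1, (2, [3, 4]), 'ab', range(5, 7), (n for n in (7, 8))]
flattened = list(flatten_any(mixed))
print(flattened)
```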
Exercise 3: Running Statistics Generator #
Create a generator that yields running statistics (mean, min, max) as it processes numbers.
def running_stats(numbers):
    """
    Generator that yields running statistics for a sequence of numbers.
    """
    total = 0
    count = 0
    min_val = float('inf')
    max_val = float('-inf')
    for num in numbers:
        total += num
        count += 1
        min_val = min(min_val, num)
        max_val = max(max_val, num)
        mean = total / count
        yield {
            'count': count,
            'mean': mean,
            'min': min_val,
            'max': max_val,
            'current': num
        }
# Test
numbers = [10, 5, 20, 15, 8]
for stats in running_stats(numbers):
    print(f"After {stats['count']} numbers: "
          f"mean={stats['mean']:.2f}, "
          f"min={stats['min']}, "
          f"max={stats['max']}")
Exercise 4: Pairwise Iterator #
Create a generator that yields consecutive pairs from an iterable.
def pairwise(iterable):
    """
    Generate consecutive pairs from an iterable.
    s -> (s0,s1), (s1,s2), (s2, s3), ...
    """
    iterator = iter(iterable)
    try:
        prev = next(iterator)
    except StopIteration:
        return
    for item in iterator:
        yield prev, item
        prev = item
# Test
numbers = [1, 2, 3, 4, 5]
for pair in pairwise(numbers):
    print(pair)
# Output: (1, 2), (2, 3), (3, 4), (4, 5)
# Using itertools (Python 3.10+)
from itertools import pairwise as itertools_pairwise
pairs = list(itertools_pairwise(numbers))
print(pairs)
Exercise 5: Custom groupby Implementation #
Implement a simplified version of itertools.groupby.
def simple_groupby(iterable, key=None):
    """
    Simplified groupby that groups consecutive elements by key.
    """
    if key is None:
        key = lambda x: x
    iterator = iter(iterable)
    try:
        current_item = next(iterator)
    except StopIteration:
        return
    current_key = key(current_item)
    group = [current_item]
    for item in iterator:
        item_key = key(item)
        if item_key == current_key:
            group.append(item)
        else:
            yield current_key, group
            current_key = item_key
            group = [item]
    yield current_key, group
# Test
data = ['a', 'a', 'b', 'b', 'b', 'c', 'a', 'a']
for key, group in simple_groupby(data):
    print(f"{key}: {group}")
# Output: a: ['a', 'a']
#         b: ['b', 'b', 'b']
#         c: ['c']
#         a: ['a', 'a']
Summary and Best Practices #
As you’ve learned throughout this guide, Python’s iteration capabilities are both powerful and elegant. Here are the key takeaways to remember:
When to Use Each Iteration Tool #
- for loops: Default choice for most iteration tasks
- enumerate(): When you need both index and value
- zip(): For parallel iteration over multiple sequences
- List comprehensions: For creating new lists with transformations
- Generator expressions: For memory-efficient transformations
- Generator functions: For complex iteration logic or infinite sequences
- itertools: For advanced iteration patterns and combinations
Performance Guidelines #
- Use generators for large datasets or when memory is a concern
- Avoid modifying sequences while iterating over them
- Prefer comprehensions over manual append loops for better performance
- Use itertools functions for complex iteration patterns instead of manual loops
- Batch process large datasets to balance memory and performance
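Guidelines like these are easiest to trust when measured on your own workload. The sketch below uses the standard timeit module to compare an append loop with the equivalent comprehension; the function names, input size, and repeat count are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import timeit

def squares_with_append(n):
    """Build the list with an explicit loop and append()."""
    result = []
    for x in range(n):
        result.append(x ** 2)
    return result

def squares_with_comprehension(n):
    """Build the same list with a comprehension."""
    return [x ** 2 for x in range(n)]

# Time 100 runs of each variant on the same input size
append_seconds = timeit.timeit(lambda: squares_with_append(10_000), number=100)
comprehension_seconds = timeit.timeit(lambda: squares_with_comprehension(10_000), number=100)

print(f"append loop:   {append_seconds:.4f} s")
print(f"comprehension: {comprehension_seconds:.4f} s")
```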
Code Quality Tips #
- Write clear, readable code - Python’s iteration tools enable expressive solutions
- Choose the most appropriate tool for each task rather than forcing one approach
- Use meaningful variable names in iteration (avoid single letters unless in mathematical contexts)
- Add docstrings to custom iterators and generators explaining their behavior
- Consider memory implications when choosing between lists and generators
Common Patterns to Remember #
# Iterate with index
for i, item in enumerate(items):
    pass
# Parallel iteration
for x, y in zip(list1, list2):
    pass
# Iterate in reverse
for item in reversed(items):
    pass
# Iterate over sorted items
for item in sorted(items):
    pass
# Filter and transform
result = [transform(x) for x in items if condition(x)]
# Generator for memory efficiency
gen = (transform(x) for x in large_dataset if condition(x))
Python’s iteration model is one of the language’s greatest strengths, enabling you to write code that is both efficient and readable. By mastering these concepts and patterns, you’ll be well-equipped to handle any iteration challenge that comes your way.
Remember: the best code is not just correct, but also clear, maintainable, and efficient. Python’s iteration tools help you achieve all three goals simultaneously.
Additional Resources #
- Python Official Documentation: Iterators
- Python Official Documentation: Generators
- itertools Documentation
- PEP 255 - Simple Generators
- PEP 289 - Generator Expressions
Happy coding, and may your iterations always be efficient and elegant!
# Using the custom iterator
for num in CountDown(5):
    print(num)  # Prints: 5, 4, 3, 2, 1
# More complex example: Fibonacci iterator
class Fibonacci:
    """Iterator that generates Fibonacci numbers."""
    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0
        self.a, self.b = 0, 1
    def __iter__(self):
        return self
    def __next__(self):
        if self.count >= self.max_count:
            raise StopIteration
        result = self.a
        self.a, self.b = self.b, self.a + self.b
        self.count += 1
        return result
# Generate first 10 Fibonacci numbers
for num in Fibonacci(10):
    print(num)  # Prints: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
Generator Functions: The Elegant Approach #
Generator functions use the yield keyword to produce values one at a time, making them much simpler to write than iterator classes.
def countdown(start):
    """Generator function that counts down from start."""
    while start > 0:
        yield start
        start -= 1
# Using the generator
for num in countdown(5):
    print(num)  # Prints: 5, 4, 3, 2, 1
# Fibonacci generator
def fibonacci(n):
    """Generate first n Fibonacci numbers."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
# Using the generator
fib_numbers = list(fibonacci(10))
print(fib_numbers)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# Infinite generator (use with caution!)
def count_forever(start=0, step=1):
    """Generate an infinite sequence of numbers."""
    current = start
    while True:
        yield current
        current += step
# Using with itertools.islice to get finite results
from itertools import islice
first_ten = list(islice(count_forever(), 10))
print(first_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Generator with input processing
def process_lines(filename):
    """Generator that reads and processes lines from a file."""
    with open(filename, 'r') as f:
        for line in f:
            # Process each line without loading entire file into memory
            yield line.strip().upper()
# Prime number generator
def primes():
    """Generate an infinite sequence of prime numbers."""
    def is_prime(n):
        if n < 2:
            return False
        for i in range(2, int(n ** 0.5) + 1):
            if n % i == 0:
                return False
        return True
    n = 2
    while True:
        if is_prime(n):
            yield n
        n += 1
# Get first 10 primes
first_ten_primes = list(islice(primes(), 10))
print(first_ten_primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Generator Expressions vs Generator Functions #
Both create generators, but they have different use cases.
# Generator expression: concise, single-line
squares = (x ** 2 for x in range(10))
# Generator function: more control, complex logic
def squares_gen(n):
    for x in range(n):
        print(f"Generating square of {x}")
        yield x ** 2
# Generator functions can have setup and teardown code
def file_reader(filename):
    print(f"Opening {filename}")
    with open(filename) as f:
        for line in f:
            yield line.strip()
    print(f"Closing {filename}")
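One caveat about teardown code: statements placed after the loop run only if the consumer exhausts the generator. If the consumer stops early, cleanup is guaranteed only when it sits in a finally clause (or a with block), which runs when the generator is closed. A small sketch with a hypothetical generator named managed:

```python
events = []

def managed():
    """Generator whose cleanup runs even if the consumer stops early."""
    events.append('setup')
    try:
        yield 1
        yield 2
    finally:
        # Runs on normal exhaustion and also when close() is called
        events.append('teardown')

gen = managed()
first = next(gen)   # runs the setup and pauses at the first yield
gen.close()         # consumer stops early; the finally block still runs

print(events)
```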
The itertools Module #
The itertools module provides a collection of fast, memory-efficient tools for working with iterators.
Infinite Iterators #
from itertools import count, cycle, repeat
# count: infinite counter
for i in count(10, 2):
    if i > 20:
        break
    print(i)  # Prints: 10, 12, 14, 16, 18, 20
# cycle: cycle through an iterable infinitely
colors = ['red', 'green', 'blue']
color_cycle = cycle(colors)
for i, color in enumerate(color_cycle):
    if i >= 7:
        break
    print(color)  # Prints: red, green, blue, red, green, blue, red
# repeat: repeat an object
for item in repeat('Hello', 3):
    print(item)  # Prints: Hello, Hello, Hello
Combinatoric Iterators #
from itertools import (
    combinations, combinations_with_replacement,
    permutations, product
)
items = ['A', 'B', 'C']
# combinations: all r-length combinations
for combo in combinations(items, 2):
    print(combo)
# Output: ('A', 'B'), ('A', 'C'), ('B', 'C')
# combinations_with_replacement: combinations allowing repeated elements
for combo in combinations_with_replacement(items, 2):
    print(combo)
# Output: ('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')
# permutations: all r-length permutations
for perm in permutations(items, 2):
    print(perm)
# Output: ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')
# product: cartesian product
for pair in product(['A', 'B'], [1, 2]):
    print(pair)
# Output: ('A', 1), ('A', 2), ('B', 1), ('B', 2)
# product with repeat
for triplet in product([0, 1], repeat=3):
    print(triplet)
# Output: All binary triplets from (0,0,0) to (1,1,1)
Filtering and Grouping Iterators #
from itertools import (
    chain, compress, dropwhile, filterfalse,
    groupby, islice, takewhile, tee
)
# chain: combine multiple iterables
list1 = [1, 2, 3]
list2 = [4, 5, 6]
for item in chain(list1, list2):
    print(item)  # Prints: 1, 2, 3, 4, 5, 6
# compress: filter based on boolean selectors
data = ['A', 'B', 'C', 'D']
selectors = [True, False, True, False]
result = list(compress(data, selectors))
print(result) # ['A', 'C']
# dropwhile: drop elements while predicate is true
numbers = [1, 3, 5, 2, 4, 6]
result = list(dropwhile(lambda x: x % 2 == 1, numbers))
print(result) # [2, 4, 6]
# takewhile: take elements while predicate is true
result = list(takewhile(lambda x: x % 2 == 1, numbers))
print(result) # [1, 3, 5]
# filterfalse: opposite of filter
numbers = [1, 2, 3, 4, 5, 6]
odd_numbers = list(filterfalse(lambda x: x % 2 == 0, numbers))
print(odd_numbers) # [1, 3, 5]
# islice: slice an iterator
numbers = range(100)
subset = list(islice(numbers, 5, 15, 2))
print(subset) # [5, 7, 9, 11, 13]
# groupby: group consecutive elements by key
data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('C', 5)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")
# Output: A: [('A', 1), ('A', 2)], B: [('B', 3), ('B', 4)], C: [('C', 5)]
# tee: create independent iterators from one
iterator = iter([1, 2, 3])
iter1, iter2 = tee(iterator, 2)
print(list(iter1)) # [1, 2, 3]
print(list(iter2)) # [1, 2, 3]
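The groupby example above produces one group per key only because its data is already ordered by key. groupby() merges consecutive items only, so unsorted input yields repeated keys; sorting by the same key first gives one group per key. A sketch with invented data:

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']

def first_letter(word):
    return word[0]

# Unsorted input: the same key shows up again whenever the run is interrupted
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]

# Sorting by the grouping key first gives exactly one group per key
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}

print(unsorted_keys)
print(grouped)
```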
Performance Optimization and Best Practices #
Understanding performance implications helps you write efficient iteration code.
Memory Efficiency: Generators vs Lists #
import sys
# Lists consume memory proportional to their size
large_list = [x ** 2 for x in range(1000000)]
print(f"List size: {sys.getsizeof(large_list)} bytes")
# Generators consume constant memory
large_gen = (x ** 2 for x in range(1000000))
print(f"Generator size: {sys.getsizeof(large_gen)} bytes")
# Example: Processing large files
# Bad: loads entire file into memory
def read_file_bad(filename):
    with open(filename) as f:
        lines = f.readlines()  # All lines in memory
    return [line.strip() for line in lines]
# Good: processes line by line
def read_file_good(filename):
    with open(filename) as f:
        for line in f:  # One line at a time
            yield line.strip()
# Example: Chaining operations efficiently
# Bad: creates intermediate lists
numbers = range(1000000)
squares = [x ** 2 for x in numbers]
even_squares = [x for x in squares if x % 2 == 0]
result = sum(even_squares)
# Good: uses generators throughout
numbers = range(1000000)
squares = (x ** 2 for x in numbers)
even_squares = (x for x in squares if x % 2 == 0)
result = sum(even_squares)
Choosing the Right Iteration Tool #
# Use enumerate when you need indices
items = ['apple', 'banana', 'cherry']
for index, item in enumerate(items):
    print(f"{index}: {item}")
# Use zip for parallel iteration
names = ['Alice', 'Bob']
ages = [25, 30]
for name, age in zip(names, ages):
    print(f"{name}: {age}")
# Use map for simple transformations
numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
# Use filter for simple filtering
evens = list(filter(lambda x: x % 2 == 0, numbers))
# Use comprehensions for complex transformations
result = [x ** 2 for x in numbers if x % 2 == 0]
# Use itertools for advanced iteration patterns
from itertools import accumulate
cumulative = list(accumulate(numbers)) # [1, 3, 6, 10]
Loop Optimization Techniques #
# Avoid redundant work inside loops
# range(len(my_list)) calls len() only once, when the range object is created,
# so this loop visits just the original elements even though the list grows
for i in range(len(my_list)):
    my_list.append(my_list[i] * 2)
# Storing the length first is equivalent; it only makes the snapshot explicit
list_length = len(my_list)
for i in range(list_length):
    my_list.append(my_list[i] * 2)
# Use local variables for frequently accessed values
import math
# Bad
for item in items:
    result.append(math.sqrt(item))  # Attribute lookup on every iteration
# Good
sqrt = math.sqrt
for item in items:
    result.append(sqrt(item))
# Prefer list comprehensions over append loops
# Bad
squares = []
for x in range(100):
    squares.append(x ** 2)
# Good
squares = [x ** 2 for x in range(100)]
Common Pitfalls and How to Avoid Them #
Learn from common mistakes to write more robust iteration code.
Don’t Modify Sequences While Iterating #
# WRONG: Modifying list while iterating
numbers = [1, 2, 3, 4, 5, 6]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Shifts later items left, so the loop skips the next element
# Correct: Create a new list
numbers = [1, 2, 3, 4, 5, 6]
numbers = [num for num in numbers if num % 2 != 0]
# Or: Iterate over a copy
numbers = [1, 2, 3, 4, 5, 6]
for num in numbers[:]:  # Iterate over a copy
    if num % 2 == 0:
        numbers.remove(num)
# For dictionaries: iterate over a copy of keys
person = {'name': 'Alice', 'age': 30, 'city': 'Beijing'}
for key in list(person.keys()):  # Create a list copy
    if key.startswith('c'):
        del person[key]
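With alternating odd and even values, the flawed loop happens to produce the right answer, which makes the bug easy to miss. The sketch below uses consecutive even numbers to make the skipped elements visible, and shows that a dictionary refuses the same mistake outright. The sample data is invented for illustration.

```python
# Lists: removing during iteration silently skips elements
numbers = [2, 4, 6, 8]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)
# 4 and 8 were never visited, so they are still in the list

# Dicts: changing the size during iteration raises an error instead
scores = {'a': 1, 'b': 2}
try:
    for key in scores:
        del scores[key]
    dict_error = None
except RuntimeError as exc:
    dict_error = type(exc).__name__

print(numbers)
print(dict_error)
```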
Exhausted Generators #
# Generators can only be iterated once
gen = (x ** 2 for x in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]
print(list(gen)) # [] - Generator is exhausted!
# Solution: recreate the generator or convert to a list
def make_gen():
    return (x ** 2 for x in range(5))
gen1 = make_gen()
print(list(gen1)) # [0, 1, 4, 9, 16]
gen2 = make_gen()
print(list(gen2)) # [0, 1, 4, 9, 16]
# Or convert to list if you need multiple iterations
gen = (x ** 2 for x in range(5))
values = list(gen)
print(values) # [0, 1, 4, 9, 16]
print(values) # [0, 1, 4, 9, 16] - List can be reused
Avoiding Unnecessary List Creation #
# Bad: Creating intermediate lists
# (process is a placeholder for your own function)
numbers = range(1000000)
result = list(map(str, numbers))  # Creates large list
for num_str in result:
    process(num_str)
# Good: Use generator or iterator directly
numbers = range(1000000)
for num_str in map(str, numbers):
    process(num_str)
# Bad: Using list() when not needed
total = sum(list(x ** 2 for x in range(1000)))
# Good: sum() accepts generators
total = sum(x ** 2 for x in range(1000))
Dictionary Iteration Best Practices #
# Iterating over keys (default behavior)
scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78}
for name in scores:
    print(name)  # Just the keys
# Iterating over values
for score in scores.values():
    print(score)
# Iterating over key-value pairs
for name, score in scores.items():
    print(f"{name}: {score}")
# Getting default values during iteration
for name in ['Alice', 'David']:
    score = scores.get(name, 0)  # 0 if name not found
    print(f"{name}: {score}")
Practical Examples and Exercises #
Let’s apply what we’ve learned to real-world scenarios.
Example 1: Custom Date Range Iterator #
from datetime import date, timedelta
class DateRange:
    """Iterator for date ranges."""
    def __init__(self, start_date, end_date):
        self.current = start_date
        self.end_date = end_date
    def __iter__(self):
        return self
    def __next__(self):
        if self.current > self.end_date:
            raise StopIteration
        result = self.current
        self.current += timedelta(days=1)
        return result