Functions and Modules¶
Version: 0.1 Year: 2026
Copyright Notice¶
Copyright (c) 2025-2026 Ryan Thomas Robson / Robworks Software LLC. Licensed under CC BY-NC-ND 4.0. You may share this material for non-commercial purposes with attribution, but you may not distribute modified versions.
You already know how to define functions with def and install packages with pip. This guide goes deeper on both fronts. Python treats functions as ordinary objects - you can pass them around, nest them, and transform them with decorators. On the module side, the import system that runs every time you write import os is sophisticated machinery of finders, loaders, and caches. Understanding both halves - functions as building blocks and modules as the organizational layer - is what separates scripts from maintainable projects.
flowchart LR
A["First-Class Functions"] --> B["Closures"]
B --> C["Decorators"]
C --> D["Generators"]
D --> E["Import System"]
E --> F["Packages"]
F --> G["Environments"]
The guide follows the path from left to right: advanced function features first, then the module and package system, and finally the tooling that ties it all together.
First-Class Functions¶
In Python, functions are objects. A def statement creates a function object and binds it to a name, but that name is just a variable like any other. You can assign it, store it in a data structure, or pass it as an argument.
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b
# Functions stored in a dictionary
operations = {
    "add": add,
    "subtract": subtract,
}
result = operations["add"](10, 3) # 13
This pattern - a dispatch table - replaces long if/elif chains with a clean lookup. It appears everywhere in Python: web framework route registries, CLI argument handlers, and plugin systems.
Functions can also accept other functions as arguments. You have already used this with built-in functions like sorted():
servers = [
    {"name": "web-1", "load": 0.82},
    {"name": "db-1", "load": 0.45},
    {"name": "cache-1", "load": 0.91},
]
# Pass a function (lambda) as the sort key
by_load = sorted(servers, key=lambda s: s["load"])
And functions can return other functions. This is the foundation for closures and decorators, which you will see next.
Closures and Scope¶
When a function is defined inside another function, the inner function can reference variables from the enclosing scope. This creates a closure - the inner function "closes over" the enclosing variables, keeping them alive even after the outer function returns.
Python resolves variable names using the LEGB rule, checking four scopes in order:
- Local - names assigned inside the current function
- Enclosing - names in any enclosing function (for nested functions)
- Global - names at the module level
- Built-in - names in the builtins module (len, print, range, etc.)
limit = 100 # Global scope
def make_checker(threshold):  # threshold is in the enclosing scope
    def check(value):         # value is in the local scope
        return value > threshold
    return check
is_high = make_checker(80)
is_high(95) # True - threshold=80 is captured in the closure
is_high(50) # False
The function make_checker is a factory function - it creates and returns a new function each time it is called. Each returned function carries its own copy of threshold.
The nonlocal keyword
If an inner function needs to modify an enclosing variable (not just read it), use nonlocal. Without it, an assignment creates a new local variable instead of updating the enclosing one.
def make_counter(start=0):
    count = start
    def increment():
        nonlocal count
        count += 1
        return count
    return increment
counter = make_counter()
counter() # 1
counter() # 2
counter() # 3
A practical use of closures is building configurable retry logic:
import time
def make_retrier(max_attempts=3, delay=1.0):
    def retry(func, *args, **kwargs):
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                last_error = e
                if attempt < max_attempts:
                    time.sleep(delay)
        raise last_error
    return retry
cautious_retry = make_retrier(max_attempts=5, delay=2.0)
result = cautious_retry(fetch_data, "https://api.example.com/health")
Each call to make_retrier captures max_attempts and delay in a closure, producing a self-contained retry function with its own configuration.
Decorators¶
A decorator is a function that takes a function as input and returns a modified version of it. You have already seen the building blocks - first-class functions and closures. A decorator combines them into a pattern for wrapping behavior around existing functions.
Start with the manual approach:
import time
def timing(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

def process_data(records):
    # ... expensive computation ...
    return sorted(records, key=lambda r: r["score"])
process_data = timing(process_data) # Manually wrap
The last line is what the @ syntax replaces. These two forms are equivalent:
# With @ syntax
@timing
def process_data(records):
    return sorted(records, key=lambda r: r["score"])

# Without @ syntax
def process_data(records):
    return sorted(records, key=lambda r: r["score"])

process_data = timing(process_data)
There is one problem with the simple wrapper above: the wrapped function loses its original name and docstring. Calling process_data.__name__ returns "wrapper" instead of "process_data". The functools.wraps decorator fixes this by copying metadata from the original function to the wrapper:
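A sketch of the corrected wrapper - the same timing decorator as above, with functools.wraps added:

```python
import functools
import time

def timing(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. from func to wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timing
def process_data(records):
    """Sort records by score."""
    return sorted(records, key=lambda r: r["score"])

print(process_data.__name__)  # process_data (not wrapper)
print(process_data.__doc__)   # Sort records by score.
```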
Decorators with Arguments¶
Sometimes you want a decorator that takes configuration. A decorator factory is a function that returns a decorator:
import functools
import time
def timing(threshold=0.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > threshold:
                print(f"SLOW: {func.__name__} took {elapsed:.4f}s")
            return result
        return wrapper
    return decorator
@timing(threshold=0.5)
def process_batch(items):
    # Only logs if execution exceeds 0.5 seconds
    ...
There are three layers: timing() returns decorator, which takes func and returns wrapper. The @timing(threshold=0.5) call executes the outer function first, producing the actual decorator.
Stacking Decorators¶
Multiple decorators execute from bottom to top:
@require_auth # 3rd: checks authentication
@validate_input # 2nd: validates the arguments
@timing() # 1st: wraps the innermost function
def create_user(username, email):
    ...
The function is first wrapped by @timing(), then that result is wrapped by @validate_input, and finally by @require_auth. When create_user() is called, require_auth runs first (outermost), then validate_input, then timing, then the original function.
Decorator order matters
If @timing() is above @require_auth, the timer includes the authentication check. If it is below, it only measures the core function. Place decorators deliberately based on what you want each one to see.
Generators and Iterators¶
A generator is a function that produces a sequence of values one at a time, pausing between each. Instead of building an entire list in memory and returning it, a generator yields values on demand.
def count_up(limit):
    n = 1
    while n <= limit:
        yield n
        n += 1

for number in count_up(5):
    print(number)  # 1, 2, 3, 4, 5
When Python encounters yield, the function's execution is suspended - local variables, instruction pointer, and all - until the next value is requested. This makes generators ideal for processing large datasets or infinite sequences without exhausting memory.
Generator State¶
A generator object has four possible states:
- Created - the generator function was called, but next() has not been called yet
- Suspended - paused at a yield expression, waiting for the next next() call
- Running - currently executing (between a next() call and the next yield)
- Closed - the function has returned or close() was called
gen = count_up(3) # Created
next(gen) # Running -> yields 1 -> Suspended
next(gen) # Running -> yields 2 -> Suspended
next(gen) # Running -> yields 3 -> Suspended
next(gen) # Running -> returns -> Closed, raises StopIteration
Generator Expressions¶
Just as list comprehensions provide a compact syntax for building lists, generator expressions produce values lazily:
# List comprehension - builds entire list in memory
squares_list = [x ** 2 for x in range(1_000_000)]
# Generator expression - produces values on demand
squares_gen = (x ** 2 for x in range(1_000_000))
The generator expression uses parentheses instead of brackets and consumes almost no memory regardless of the range size. You can iterate over it once, but you cannot index into it or get its length without consuming it.
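The gap is easy to measure with sys.getsizeof, which reports the object's own memory footprint (the figures in the comments are rough and assume CPython):

```python
import sys

squares_list = [x ** 2 for x in range(1_000_000)]
squares_gen = (x ** 2 for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes - every element is stored
print(sys.getsizeof(squares_gen))   # a couple hundred bytes, regardless of range

# The generator produces values only when asked
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
```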
yield from¶
When a generator needs to delegate to another generator or iterable, yield from passes values through directly:
def read_files(paths):
    for path in paths:
        yield from read_lines(path)

def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")
Without yield from, you would need an inner for loop that yields each value individually. yield from also propagates send() and throw() calls to the inner generator, which matters for advanced coroutine patterns.
When to use generators
Use a generator when you are processing data that could be large, you only need to iterate through it once, and you do not need random access. Common examples: reading files line by line, streaming API responses, transforming database result sets, and pipeline-style data processing.
The Import System¶
Every import statement triggers a multi-step process. Understanding it helps you debug import errors, avoid circular imports, and structure projects effectively.
How Import Works¶
When Python encounters import mymodule, it follows these steps:
1. Check the cache - look in sys.modules for an already-loaded module. If found, return it immediately.
2. Find the module - search sys.path using a series of finders (importlib meta path finders). The default finders check built-in modules, frozen modules, and the filesystem.
3. Load the module - once found, a loader reads the source, compiles it to bytecode (cached in __pycache__/), and executes the module's top-level code.
4. Cache the module - store the module object in sys.modules so future imports skip steps 2-3.
import sys
# After importing os, it is cached
import os
print("os" in sys.modules) # True
# Importing again returns the cached object - the module code does not re-execute
import os # No-op, returns cached module
Import Styles¶
import os # Access as os.path.join(...)
from os.path import join # Access as join(...)
from os.path import join as pjoin # Access as pjoin(...)
import os.path # Access as os.path.join(...)
The from form binds specific names into the current namespace. The plain import form binds the top-level module. Neither form is inherently better - from is convenient for frequently used names, while plain import makes the source module explicit at every call site.
Relative Imports¶
Inside a package, you can import from sibling or parent modules using dots:
# Inside mypackage/utils/helpers.py
from . import formatting # Same directory (mypackage/utils/)
from .formatting import bold # Specific name from sibling
from .. import config # Parent directory (mypackage/)
from ..core import engine # Sibling of parent (mypackage/core/)
A single dot (.) means the current package directory. Two dots (..) mean the parent. Relative imports only work inside packages - they fail in standalone scripts run directly with python script.py.
sys.path¶
sys.path is a list of directory paths that Python's filesystem finder searches, in order:
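You can print the search path directly; the output in the comment is a hypothetical Linux example, not what your machine will show:

```python
import sys

for entry in sys.path:
    print(entry)

# Hypothetical output:
#   /home/user/project
#   /usr/lib/python312.zip
#   /usr/lib/python3.12
#   /usr/lib/python3.12/lib-dynload
#   /home/user/project/.venv/lib/python3.12/site-packages
```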
The first entry is typically the directory containing the script being run (or an empty string "" for the interactive interpreter). The rest come from the PYTHONPATH environment variable, the site-packages directory (where pip installs packages), and the standard library path.
Modifying sys.path
You can append directories to sys.path at runtime, but doing so makes your code dependent on filesystem layout. Prefer installing packages properly (with pip install -e .) over sys.path manipulation.
Circular Imports¶
A circular import happens when module A imports module B, and module B imports module A. Python does not raise an error immediately - it returns the partially initialized module from sys.modules - but you may get ImportError or AttributeError if you try to use a name that has not been defined yet.
# models.py
from validators import validate_user # validators.py imports models.py too
class User:
    def save(self):
        validate_user(self)
# validators.py
from models import User # Circular: models.py imports validators.py
def validate_user(user):
    if not isinstance(user, User):
        raise TypeError("Expected a User instance")
Three strategies to break the cycle:
- Move the import inside the function that needs it (lazy import)
- Reorganize so shared types live in a third module both can import
- Use TYPE_CHECKING for type-hint-only imports that do not need runtime access
# Option 1: Lazy import
def validate_user(user):
    from models import User  # Imported only when this function runs
    if not isinstance(user, User):
        raise TypeError("Expected a User instance")
The __name__ Guard¶
When Python loads a module, it sets the __name__ attribute. If the module is the entry point (run directly), __name__ is "__main__". If it is imported, __name__ is the module's qualified name.
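A minimal sketch of the guard (main is a placeholder name - any entry-point function works):

```python
# mymodule.py
def main():
    print("Running as a script")

if __name__ == "__main__":
    main()  # Executes for `python mymodule.py`, but not for `import mymodule`
```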
This guard lets a file serve as both an importable module and a standalone script. Without it, the main() call would execute every time another module imports mymodule.
Creating Packages¶
A package is a directory that Python recognizes as a collection of modules. The simplest way to make one is to add an __init__.py file.
Package Structure¶
mypackage/
├── __init__.py      # Makes this directory a package
├── __main__.py      # Entry point for python -m mypackage
├── core.py
├── formatting.py
└── utils/
    ├── __init__.py
    └── helpers.py
The __init__.py file runs when the package is first imported. It can be empty, or it can define the package's public API:
# mypackage/__init__.py
from .core import Engine
from .formatting import bold, table
__all__ = ["Engine", "bold", "table"]
__all__ controls what from mypackage import * exports. Without it, a wildcard import brings in every public name defined in __init__.py.
Namespace Packages¶
Since Python 3.3, a directory without __init__.py can still act as a namespace package. This allows a single logical package to be split across multiple directories (useful for large organizations with distributed codebases). For most projects, use regular packages with __init__.py - namespace packages are a specialized tool.
Project Layouts¶
Two common layouts for distributable packages:
# Flat layout
myproject/
├── pyproject.toml
├── mypackage/
│   ├── __init__.py
│   └── core.py
└── tests/
    └── test_core.py

# src layout
myproject/
├── pyproject.toml
├── src/
│   └── mypackage/
│       ├── __init__.py
│       └── core.py
└── tests/
    └── test_core.py
The src layout prevents accidentally importing the local source directory instead of the installed package during testing. The flat layout is simpler and works well for smaller projects. The Python Packaging User Guide has a detailed comparison.
The __main__.py Entry Point¶
Adding __main__.py to a package lets you run it with python -m mypackage:
# mypackage/__main__.py
from .core import Engine
def main():
    engine = Engine()
    engine.run()

if __name__ == "__main__":
    main()
This is the standard way to make a package executable. The -m flag tells Python to find the package in sys.path and run its __main__.py.
Virtual Environments Deep Dive¶
The Introduction guide covered creating and activating virtual environments. Here you will look at what venv actually builds and how activation works under the hood.
What venv Creates¶
Running python3 -m venv myenv creates a directory structure like this:
myenv/
├── bin/                      # Scripts and symlinks (Linux/macOS)
│   ├── python -> python3.12  # Symlink to the base Python
│   ├── python3 -> python3.12
│   ├── python3.12
│   ├── pip
│   ├── pip3
│   └── activate              # Shell script that modifies PATH
├── include/                  # C headers (for compiling extensions)
├── lib/
│   └── python3.12/
│       └── site-packages/    # Where pip installs packages
├── lib64 -> lib              # Symlink (Linux only)
└── pyvenv.cfg                # Configuration file
The pyvenv.cfg file tells Python this is a virtual environment:
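Its contents are just a few key-value pairs - roughly like this (paths and version numbers will differ on your machine):

```
home = /usr/bin
include-system-site-packages = false
version = 3.12.3
```

The home key points at the base interpreter's directory; Python reads it at startup to locate the standard library.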
How Activation Works¶
The activate script does two things:
- Prepends myenv/bin/ to PATH - so running python or pip resolves to the virtual environment's copies first
- Sets VIRTUAL_ENV to the environment's root path
That is it. There is no system-level configuration change, no daemon, no container. Activation is just a PATH modification in your current shell session.
You can also use a virtual environment without activating it by calling its Python directly:
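For example (the paths and script name are illustrative):

```shell
python3 -m venv myenv             # create the environment

myenv/bin/python script.py        # run a script with the venv's interpreter
myenv/bin/pip install requests    # install into the venv, no activation needed
myenv/bin/python -m pytest        # -m works the same way
```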
This is common in CI pipelines and automation scripts where sourcing an activate script adds unnecessary complexity.
Multiple Python Versions¶
If you have multiple Python versions installed, create separate environments for each:
python3.11 -m venv env311
python3.12 -m venv env312
env311/bin/python --version # Python 3.11.x
env312/bin/python --version # Python 3.12.x
Each environment is independent. Packages installed in env311 are not visible to env312.
System site-packages
The --system-site-packages flag lets a virtual environment fall back to globally installed packages. This is useful when you want access to system-wide libraries (like those installed by the OS package manager) but still want to isolate project-specific dependencies: python3 -m venv --system-site-packages myenv
Advanced pip and poetry¶
The Testing and Tooling guide covered basic pip install and pyproject.toml. This section goes further into reproducible dependency management.
pip: Beyond Basic Install¶
Constraints files restrict package versions without installing them. They are useful when you want to pin transitive dependencies (dependencies of your dependencies) without listing them in your requirements:
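A sketch of the workflow - the file names are the conventional ones, and the pinned versions in the comments are illustrative:

```shell
# constraints.txt - pins transitive dependencies, installs nothing by itself:
#   urllib3==2.2.1
#   certifi==2024.2.2
#
# requirements.txt - lists only your direct dependencies:
#   requests>=2.31

pip install -r requirements.txt -c constraints.txt
```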
Hash checking verifies that downloaded packages match expected checksums, preventing supply-chain attacks:
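The flag is --require-hashes; pip also switches into hash-checking mode automatically as soon as any entry in the requirements file carries a hash:

```shell
pip install --require-hashes -r requirements.txt
```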
This requires every entry in requirements.txt to include a hash:
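An entry looks roughly like this - the digests shown are placeholders, not real checksums (pip hash or pip-compile --generate-hashes can produce real ones). A package typically lists one hash per distribution file:

```
requests==2.31.0 \
    --hash=sha256:<64-hex-digest-of-the-wheel> \
    --hash=sha256:<64-hex-digest-of-the-sdist>
```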
Editable installs link your local source directory into site-packages so changes take effect immediately without reinstalling:
pip install -e . # Install current project in editable mode
pip install -e ./mylib # Install a local dependency in editable mode
pip-tools: Compiled Requirements¶
pip-tools adds a two-file workflow that separates what you want from what you get:
1. Write your direct dependencies in requirements.in.
2. Compile a fully pinned requirements.txt with pip-compile.
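A sketch of the round trip, assuming a requirements.in with a couple of direct dependencies:

```shell
# requirements.in (illustrative contents):
#   requests>=2.31
#   flask

pip install pip-tools
pip-compile requirements.in                    # writes a pinned requirements.txt
pip-compile --generate-hashes requirements.in  # same, with --hash entries included
pip-sync requirements.txt                      # make the current env match exactly
```

pip-sync also uninstalls anything not listed, so the environment matches the compiled file exactly.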
This produces a requirements.txt with exact versions for every package, including transitive dependencies (add --generate-hashes to pin hashes as well). When you want to update, run pip-compile --upgrade.
poetry: Advanced Workflows¶
Poetry manages dependencies, virtual environments, and packaging in one tool.
Dependency groups separate production, dev, and optional dependencies:
# pyproject.toml
[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.31"
[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
ruff = "^0.3"
[tool.poetry.group.docs.dependencies]
mkdocs = "^1.5"
poetry install # Install all groups
poetry install --without docs # Skip docs group
poetry install --only dev # Only dev dependencies
Lock files (poetry.lock) pin every dependency to an exact version and hash. The lock file is checked into version control so every developer and CI runner uses identical packages. Run poetry lock to regenerate it after changing pyproject.toml.
PEP 723: Inline Script Metadata¶
PEP 723 lets you declare dependencies inside a script using a special comment block. Tools like pipx and uv read this metadata and install dependencies automatically:
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "requests>=2.31",
# "rich>=13.0",
# ]
# ///
import requests
from rich import print
response = requests.get("https://api.example.com/status")
print(response.json())
This eliminates the need for a separate requirements.txt or virtual environment setup for one-off scripts.
Bringing It Together¶
The concepts from this guide - decorators, dynamic imports, and package structure - combine naturally in a plugin architecture. The following exercise asks you to build one from scratch.
Further Reading¶
- Python Data Model - Callable Objects - how Python determines whether an object can be called as a function
- functools Module - wraps, lru_cache, partial, and other function utilities
- The Import System - the official reference for finders, loaders, and module specs
- Python Packaging User Guide - the definitive guide to creating distributable packages
- PEP 723 - Inline Script Metadata - embedding dependency declarations in standalone scripts
- Real Python: Primer on Decorators - practical decorator tutorial with additional examples