Working with Files and APIs (Python)

Version: 0.2 Year: 2026


Copyright (c) 2025-2026 Ryan Thomas Robson / Robworks Software LLC. Licensed under CC BY-NC-ND 4.0. You may share this material for non-commercial purposes with attribution, but you may not distribute modified versions.


Sysadmin automation usually boils down to two things: reading data from somewhere and acting on it. The "somewhere" is either the local filesystem (config files, logs, CSVs) or a remote API (monitoring services, cloud providers, notification systems). Python handles both with clean, consistent patterns.


Local File Operations

Python uses the context manager pattern (with statement) to handle files safely. The file is guaranteed to close when the block exits, even if an error occurs mid-read.

Reading and Writing

open() is the built-in function for file access.

# Read an entire file into a string
with open("/etc/hostname", "r") as f:
    hostname = f.read().strip()

# Read line by line (memory-efficient for large files)
with open("/var/log/auth.log", "r") as f:
    for line in f:
        if "Failed password" in line:
            print(line.strip())

# Write to a file (overwrites existing content)
with open("inventory.txt", "w") as f:
    f.write("web01\nweb02\ndb01\n")

# Append to an existing file
with open("audit.log", "a") as f:
    f.write("2026-03-25: Updated server inventory.\n")

File Modes

Mode   Description        Creates File?                            Truncates?
"r"    Read (text)        No - raises FileNotFoundError            No
"w"    Write (text)       Yes                                      Yes - empties the file
"a"    Append (text)      Yes                                      No - writes at the end
"x"    Exclusive create   Yes - raises FileExistsError if exists   N/A - file is created empty
"rb"   Read (binary)      No                                       No
"wb"   Write (binary)     Yes                                      Yes

Binary modes ("rb", "wb") are needed for non-text files: images, compressed archives, protocol buffers, database dumps.
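A minimal sketch of binary I/O, copying a small file byte-for-byte (the temporary paths and sample bytes are invented for the example):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dump.bin")
dst = os.path.join(workdir, "dump-copy.bin")

# Write raw bytes - note the b"" literal and the "wb" mode
with open(src, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n")     # the PNG magic number, as sample binary data

# Copy in chunks so large files never sit fully in memory
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while chunk := fin.read(4096):
        fout.write(chunk)
```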

Modern Path Handling with pathlib

The pathlib module (Python 3.4+) provides an object-oriented interface for filesystem paths. It's cleaner and more portable than string manipulation with os.path.

from pathlib import Path

# Build paths without worrying about separators
log_dir = Path("/var/log")
auth_log = log_dir / "auth.log"      # PosixPath('/var/log/auth.log')

# Check existence and type
auth_log.exists()                     # True
auth_log.is_file()                    # True
log_dir.is_dir()                      # True

# Read and write in one step
content = auth_log.read_text()
Path("output.txt").write_text("hello\n")

# List directory contents
for p in log_dir.iterdir():
    if p.suffix == ".log":
        print(f"{p.name}: {p.stat().st_size} bytes")

# Glob for pattern matching
for p in log_dir.glob("*.log"):
    print(p)

# Recursive glob
for p in Path("/etc").rglob("*.conf"):
    print(p)

Prefer pathlib over os.path

os.path.join("/var", "log", "auth.log") works, but Path("/var") / "log" / "auth.log" is more readable and gives you methods like .read_text(), .exists(), and .glob() for free. Most modern Python code and libraries accept Path objects wherever a string path works.
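The pathlib methods above can be exercised end-to-end in a scratch directory (the directory layout and file names below are invented for the example):

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Create a nested directory and a few files using only Path methods
(workdir / "logs").mkdir()
(workdir / "logs" / "auth.log").write_text("Failed password for root\n")
(workdir / "logs" / "syslog.log").write_text("system boot\n")
(workdir / "notes.txt").write_text("not a log\n")

# rglob finds .log files at any depth; sorted() makes the order predictable
log_files = sorted(p.name for p in workdir.rglob("*.log"))
print(log_files)                      # ['auth.log', 'syslog.log']
```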


Working with JSON

JSON is the standard format for configuration files and API responses. Python's json module converts JSON strings to Python dictionaries and lists, and vice versa.

import json

# Parse a JSON file
with open("config.json") as f:
    config = json.load(f)          # File -> dict/list

# Parse a JSON string
raw = '{"status": "ok", "count": 42}'
data = json.loads(raw)             # String -> dict/list

# Write Python data as JSON
new_config = {"debug": True, "port": 8080, "hosts": ["web01", "web02"]}
with open("settings.json", "w") as f:
    json.dump(new_config, f, indent=2)  # indent for human-readable output

# Convert Python data to a JSON string
json_str = json.dumps(new_config, indent=2)

json.load() reads from a file object. json.loads() parses a string. The "s" stands for "string." Confusing the two is the most common mistake with this module.
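A quick round trip shows the split: dumps()/loads() work on strings, dump()/load() work on file objects (the config dict and file path here are invented for the example):

```python
import json
import os
import tempfile

config = {"debug": True, "port": 8080, "hosts": ["web01", "web02"]}

# String round trip: dumps -> loads
text = json.dumps(config)
assert json.loads(text) == config

# File round trip: dump -> load
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=2)
with open(path) as f:
    restored = json.load(f)
print(restored == config)             # True
```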


Working with CSV

Many sysadmin data sources (inventory exports, billing reports, monitoring data) come as CSV files.

import csv

# Read a CSV file
with open("servers.csv") as f:
    reader = csv.DictReader(f)       # Each row becomes a dict
    for row in reader:
        print(f"{row['hostname']}: {row['ip_address']}")

# Write a CSV file
servers = [
    {"hostname": "web01", "ip": "10.0.0.1", "role": "frontend"},
    {"hostname": "db01", "ip": "10.0.1.1", "role": "database"},
]

with open("inventory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["hostname", "ip", "role"])
    writer.writeheader()
    writer.writerows(servers)

csv.DictReader is almost always what you want - it maps each row to a dictionary using the header row as keys, so you access fields by name instead of index.
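The DictReader/DictWriter pair round-trips cleanly. A sketch using an in-memory buffer in place of a real file (the server records are invented for the example):

```python
import csv
import io

servers = [
    {"hostname": "web01", "ip": "10.0.0.1", "role": "frontend"},
    {"hostname": "db01", "ip": "10.0.1.1", "role": "database"},
]

# Write to an in-memory buffer exactly as you would to a file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["hostname", "ip", "role"])
writer.writeheader()
writer.writerows(servers)

# Read it back - each row comes out as a dict keyed by the header row
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0]["hostname"], rows[0]["role"])   # web01 frontend
```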


Working with YAML

YAML is common in configuration management (Ansible, Kubernetes, Docker Compose). It's not in the standard library, so you need the PyYAML package.

pip install pyyaml

import yaml

# Read a YAML file
with open("playbook.yml") as f:
    config = yaml.safe_load(f)        # Always use safe_load, never load()

# Write YAML
data = {"services": {"web": {"image": "nginx", "ports": ["80:80"]}}}
with open("compose.yml", "w") as f:
    yaml.dump(data, f, default_flow_style=False)

Always use yaml.safe_load()

yaml.load() (without safe_) can execute arbitrary Python code embedded in the YAML file. This is a remote code execution vulnerability if you're loading untrusted input. Always use yaml.safe_load() unless you have a specific, verified reason not to.


Interacting with APIs

While Python's standard library includes urllib, the requests library is the industry standard for HTTP calls. It handles encoding, sessions, headers, and error reporting with a clean interface.

pip install requests

GET Requests

import requests

response = requests.get("https://api.github.com/repos/python/cpython")

if response.status_code == 200:
    repo = response.json()          # Parse JSON response body
    print(f"Stars: {repo['stargazers_count']}")
    print(f"Language: {repo['language']}")
else:
    print(f"Error: HTTP {response.status_code}")

POST Requests

import requests

alert = {
    "severity": "critical",
    "message": "CPU usage exceeded 95% on app01",
    "timestamp": "2026-03-25T14:30:00Z"
}

response = requests.post(
    "https://hooks.slack.com/services/T00000/B00000/XXXXX",
    json=alert                      # Automatically serializes and sets Content-Type
)

if response.ok:                     # True for any 2xx status
    print("Alert sent successfully.")

Authentication and Headers

import requests

# API key in headers
headers = {
    "Authorization": "Bearer your-api-token-here",
    "Accept": "application/json"
}

response = requests.get(
    "https://api.cloudprovider.com/v1/instances",
    headers=headers
)

# Basic auth
response = requests.get(
    "https://monitoring.internal/api/status",
    auth=("username", "password")
)

Sessions and Connection Reuse

When making multiple requests to the same host, use a Session to reuse TCP connections and persist headers:

import requests

session = requests.Session()
session.headers.update({
    "Authorization": "Bearer your-token",
    "Accept": "application/json"
})

# All requests through this session include the headers above
instances = session.get("https://api.cloud.com/v1/instances").json()
volumes = session.get("https://api.cloud.com/v1/volumes").json()

Handling Timeouts and Errors

Network calls fail. Always set timeouts and handle errors:

import requests

try:
    response = requests.get(
        "https://api.example.com/status",
        timeout=10                   # 10s each for connect and for read
    )
    response.raise_for_status()      # Raises HTTPError for 4xx/5xx
    data = response.json()
except requests.Timeout:             # catch first: ConnectTimeout is also a ConnectionError
    print("Request timed out after 10 seconds.")
except requests.ConnectionError:
    print("Could not connect to the API.")
except requests.HTTPError as e:
    print(f"API returned error: {e}")

Pagination

Many APIs return results in pages. You need to loop until there are no more pages:

import requests

def get_all_items(base_url, headers):
    items = []
    url = base_url

    while url:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        data = response.json()

        items.extend(data["results"])
        url = data.get("next")      # None when there are no more pages

    return items
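The same loop can be exercised without a network by swapping the HTTP call for any callable that returns page dicts. A sketch with a fake three-page API (URLs and payloads invented for the example):

```python
# Fake paged responses keyed by URL, mimicking a {"results": [...], "next": ...} API
pages = {
    "/items?page=1": {"results": [1, 2], "next": "/items?page=2"},
    "/items?page=2": {"results": [3, 4], "next": "/items?page=3"},
    "/items?page=3": {"results": [5], "next": None},
}

def get_all_items(fetch, start_url):
    """Follow 'next' links until the API reports no more pages."""
    items = []
    url = start_url
    while url:
        data = fetch(url)             # stands in for requests.get(...).json()
        items.extend(data["results"])
        url = data.get("next")
    return items

all_items = get_all_items(pages.__getitem__, "/items?page=1")
print(all_items)                      # [1, 2, 3, 4, 5]
```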

