Working with Files and APIs (Python)

Version: 0.2 Year: 2026


Copyright (c) 2025-2026 Ryan Thomas Robson / Robworks Software LLC. Licensed under CC BY-NC-ND 4.0. You may share this material for non-commercial purposes with attribution, but you may not distribute modified versions.


Sysadmin automation usually boils down to two things: reading data from somewhere and acting on it. The "somewhere" is either the local filesystem (config files, logs, CSVs) or a remote API (monitoring services, cloud providers, notification systems). Python handles both with clean, consistent patterns.


Local File Operations

Python uses the context manager pattern (with statement) to handle files safely. The file is guaranteed to close when the block exits, even if an error occurs mid-read.

Reading and Writing

open() is the built-in function for file access.

# Read an entire file into a string
with open("/etc/hostname", "r") as f:
    hostname = f.read().strip()

# Read line by line (memory-efficient for large files)
with open("/var/log/auth.log", "r") as f:
    for line in f:
        if "Failed password" in line:
            print(line.strip())

# Write to a file (overwrites existing content)
with open("inventory.txt", "w") as f:
    f.write("web01\nweb02\ndb01\n")

# Append to an existing file
with open("audit.log", "a") as f:
    f.write("2026-03-25: Updated server inventory.\n")

File Modes

Mode   Description        Creates File?                            Truncates?
"r"    Read (text)        No - raises FileNotFoundError            No
"w"    Write (text)       Yes                                      Yes - empties the file
"a"    Append (text)      Yes                                      No - writes at the end
"x"    Exclusive create   Yes - raises FileExistsError if exists   N/A - file is created empty
"rb"   Read (binary)      No                                       No
"wb"   Write (binary)     Yes                                      Yes

Binary modes ("rb", "wb") are needed for non-text files: images, compressed archives, protocol buffers, database dumps.
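A minimal sketch of binary I/O, copying a small file byte-for-byte (the temporary paths and sample bytes are invented for the example):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dump.bin")
dst = os.path.join(workdir, "dump-copy.bin")

# Write raw bytes - note the b"" literal and the "wb" mode
with open(src, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n")     # the PNG magic number, as sample binary data

# Copy in chunks so large files never sit fully in memory
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while chunk := fin.read(4096):
        fout.write(chunk)
```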

Modern Path Handling with pathlib

The pathlib module (Python 3.4+) provides an object-oriented interface for filesystem paths. It's cleaner and more portable than string manipulation with os.path.

from pathlib import Path

# Build paths without worrying about separators
log_dir = Path("/var/log")
auth_log = log_dir / "auth.log"      # PosixPath('/var/log/auth.log')

# Check existence and type
auth_log.exists()                     # True
auth_log.is_file()                    # True
log_dir.is_dir()                      # True

# Read and write in one step
content = auth_log.read_text()
Path("output.txt").write_text("hello\n")

# List directory contents
for p in log_dir.iterdir():
    if p.suffix == ".log":
        print(f"{p.name}: {p.stat().st_size} bytes")

# Glob for pattern matching
for p in log_dir.glob("*.log"):
    print(p)

# Recursive glob
for p in Path("/etc").rglob("*.conf"):
    print(p)

Prefer pathlib over os.path

os.path.join("/var", "log", "auth.log") works, but Path("/var") / "log" / "auth.log" is more readable and gives you methods like .read_text(), .exists(), and .glob() for free. Most modern Python code and libraries accept Path objects wherever a string path works.
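The pathlib methods above can be exercised end-to-end in a scratch directory (the directory layout and file names below are invented for the example):

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Create a nested directory and a few files using only Path methods
(workdir / "logs").mkdir()
(workdir / "logs" / "auth.log").write_text("Failed password for root\n")
(workdir / "logs" / "syslog.log").write_text("system boot\n")
(workdir / "notes.txt").write_text("not a log\n")

# rglob finds .log files at any depth; sorted() makes the order predictable
log_files = sorted(p.name for p in workdir.rglob("*.log"))
print(log_files)                      # ['auth.log', 'syslog.log']
```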


Working with JSON

JSON is the standard format for configuration files and API responses. Python's json module converts JSON strings to Python dictionaries and lists, and vice versa.

import json

# Parse a JSON file
with open("config.json") as f:
    config = json.load(f)          # File -> dict/list

# Parse a JSON string
raw = '{"status": "ok", "count": 42}'
data = json.loads(raw)             # String -> dict/list

# Write Python data as JSON
new_config = {"debug": True, "port": 8080, "hosts": ["web01", "web02"]}
with open("settings.json", "w") as f:
    json.dump(new_config, f, indent=2)  # indent for human-readable output

# Convert Python data to a JSON string
json_str = json.dumps(new_config, indent=2)

json.load() reads from a file object. json.loads() parses a string. The "s" stands for "string." Confusing the two is the most common mistake with this module.
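A quick round trip shows the split: dumps()/loads() work on strings, dump()/load() work on file objects (the config dict and file path here are invented for the example):

```python
import json
import os
import tempfile

config = {"debug": True, "port": 8080, "hosts": ["web01", "web02"]}

# String round trip: dumps -> loads
text = json.dumps(config)
assert json.loads(text) == config

# File round trip: dump -> load
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=2)
with open(path) as f:
    restored = json.load(f)
print(restored == config)             # True
```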


Working with CSV

Many sysadmin data sources (inventory exports, billing reports, monitoring data) come as CSV files.

import csv

# Read a CSV file
with open("servers.csv") as f:
    reader = csv.DictReader(f)       # Each row becomes a dict
    for row in reader:
        print(f"{row['hostname']}: {row['ip_address']}")

# Write a CSV file
servers = [
    {"hostname": "web01", "ip": "10.0.0.1", "role": "frontend"},
    {"hostname": "db01", "ip": "10.0.1.1", "role": "database"},
]

with open("inventory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["hostname", "ip", "role"])
    writer.writeheader()
    writer.writerows(servers)

csv.DictReader is almost always what you want - it maps each row to a dictionary using the header row as keys, so you access fields by name instead of index.
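The DictReader/DictWriter pair round-trips cleanly. A sketch using an in-memory buffer in place of a real file (the server records are invented for the example):

```python
import csv
import io

servers = [
    {"hostname": "web01", "ip": "10.0.0.1", "role": "frontend"},
    {"hostname": "db01", "ip": "10.0.1.1", "role": "database"},
]

# Write to an in-memory buffer exactly as you would to a file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["hostname", "ip", "role"])
writer.writeheader()
writer.writerows(servers)

# Read it back - each row comes out as a dict keyed by the header row
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0]["hostname"], rows[0]["role"])   # web01 frontend
```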


Working with YAML

YAML is common in configuration management (Ansible, Kubernetes, Docker Compose). It's not in the standard library, so you need the PyYAML package.

pip install pyyaml

import yaml

# Read a YAML file
with open("playbook.yml") as f:
    config = yaml.safe_load(f)        # Always use safe_load, never load()

# Write YAML
data = {"services": {"web": {"image": "nginx", "ports": ["80:80"]}}}
with open("compose.yml", "w") as f:
    yaml.dump(data, f, default_flow_style=False)

Always use yaml.safe_load()

yaml.load() (without safe_) can execute arbitrary Python code embedded in the YAML file. This is a remote code execution vulnerability if you're loading untrusted input. Always use yaml.safe_load() unless you have a specific, verified reason not to.


Interacting with APIs

While Python's standard library includes urllib, the requests library is the industry standard for HTTP calls. It handles encoding, sessions, headers, and error reporting with a clean interface.

pip install requests

GET Requests

import requests

response = requests.get("https://api.github.com/repos/python/cpython")

if response.status_code == 200:
    repo = response.json()          # Parse JSON response body
    print(f"Stars: {repo['stargazers_count']}")
    print(f"Language: {repo['language']}")
else:
    print(f"Error: HTTP {response.status_code}")

POST Requests

import requests

alert = {
    "severity": "critical",
    "message": "CPU usage exceeded 95% on app01",
    "timestamp": "2026-03-25T14:30:00Z"
}

response = requests.post(
    "https://hooks.slack.com/services/T00000/B00000/XXXXX",
    json=alert                      # Automatically serializes and sets Content-Type
)

if response.ok:                     # True for any 2xx status
    print("Alert sent successfully.")

Authentication and Headers

import requests

# API key in headers
headers = {
    "Authorization": "Bearer your-api-token-here",
    "Accept": "application/json"
}

response = requests.get(
    "https://api.cloudprovider.com/v1/instances",
    headers=headers
)

# Basic auth
response = requests.get(
    "https://monitoring.internal/api/status",
    auth=("username", "password")
)

Sessions and Connection Reuse

When making multiple requests to the same host, use a Session to reuse TCP connections and persist headers:

import requests

session = requests.Session()
session.headers.update({
    "Authorization": "Bearer your-token",
    "Accept": "application/json"
})

# All requests through this session include the headers above
instances = session.get("https://api.cloud.com/v1/instances").json()
volumes = session.get("https://api.cloud.com/v1/volumes").json()

Handling Timeouts and Errors

Network calls fail. Always set timeouts and handle errors:

import requests

try:
    response = requests.get(
        "https://api.example.com/status",
        timeout=10                   # 10s each for connect and for read
    )
    response.raise_for_status()      # Raises HTTPError for 4xx/5xx
    data = response.json()
except requests.Timeout:             # catch first: ConnectTimeout is also a ConnectionError
    print("Request timed out after 10 seconds.")
except requests.ConnectionError:
    print("Could not connect to the API.")
except requests.HTTPError as e:
    print(f"API returned error: {e}")

Pagination

Many APIs return results in pages. You need to loop until there are no more pages:

import requests

def get_all_items(base_url, headers):
    items = []
    url = base_url

    while url:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        data = response.json()

        items.extend(data["results"])
        url = data.get("next")      # None when there are no more pages

    return items
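The same loop can be exercised without a network by swapping the HTTP call for any callable that returns page dicts. A sketch with a fake three-page API (URLs and payloads invented for the example):

```python
# Fake paged responses keyed by URL, mimicking a {"results": [...], "next": ...} API
pages = {
    "/items?page=1": {"results": [1, 2], "next": "/items?page=2"},
    "/items?page=2": {"results": [3, 4], "next": "/items?page=3"},
    "/items?page=3": {"results": [5], "next": None},
}

def get_all_items(fetch, start_url):
    """Follow 'next' links until the API reports no more pages."""
    items = []
    url = start_url
    while url:
        data = fetch(url)             # stands in for requests.get(...).json()
        items.extend(data["results"])
        url = data.get("next")
    return items

all_items = get_all_items(pages.__getitem__, "/items?page=1")
print(all_items)                      # [1, 2, 3, 4, 5]
```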

