Building a Content Pipeline with Python Scripts

Publishing 365 articles by hand through an admin UI would take weeks of tedious clicking. Instead, I built a content pipeline using Python scripts that batch-create articles via API. Here is how to build your own content automation.

Why Scripts Beat Manual Entry

When you need to create more than a handful of content items, manual entry fails:

It is slow (copy-paste, fill forms, click publish, repeat)
It is error-prone (typos in slugs, wrong categories, missing fields)
It is not reproducible (if the database resets, you do it all again)
It is not versionable (no git history of what was published)

Scripts solve all four problems.

The Pattern

Every content creation script follows the same structure:

python

#!/usr/bin/env python3
import json
import urllib.request
 
API_URL = 'http://localhost:8321/posts'
 
ARTICLES = [
    {
        'title': 'Article Title',
        'slug': 'article-title',
        'body': 'Markdown content here...',
        'excerpt': 'Brief summary',
        'categories': ['Technical', 'Python'],
        'status': 'published',
        'publishedAt': '2025-05-01T10:00:00Z',
    },
]
 
def post_article(payload):
    data = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(
        API_URL, data=data,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

Why urllib Instead of requests

I use urllib.request from the standard library instead of the popular requests package. The reason is simple: zero dependencies. The script runs on any machine with Python 3 installed. No virtual environment, no pip install, no dependency conflicts.

For scripts that run once and create data, simplicity beats convenience.

Error Handling for Idempotent Scripts

Good content scripts are idempotent: you can run them multiple times without creating duplicates. Handle the 409 Conflict response that your API returns for duplicate slugs:

python

try:
    with urllib.request.urlopen(req) as resp:
        print(f'CREATED: {payload["slug"]}')
except urllib.error.HTTPError as e:
    if e.code == 409:
        print(f'EXISTS: {payload["slug"]}')
    else:
        print(f'ERROR {e.code}: {e.read().decode()}')

Version Control Your Content

Save your scripts in a .planning/artifacts/ directory and commit them to git. This gives you a reproducible content pipeline. If the database resets, run the scripts again. If you need to review what was published, read the scripts.

This is especially valuable during development with in-memory databases like DynamoDB Local. Every container restart wipes the data. The scripts recreate it in seconds.

Scaling the Pipeline

For 365 articles, I split scripts by month (25-31 articles each). Each script is independent and can run in any order. This keeps individual files readable and makes it easy to debug issues with specific date ranges.

Build your content as code. Version it. Automate it. Never manually enter 365 articles.

See the Python urllib documentation for the standard library HTTP client reference.