Anaphora Changelog to Blog Converter

This tool scrapes the Anaphora changelog and converts entries to blog posts.

Features

Automatically scrapes the Anaphora Featurebase changelog
Extracts titles, dates, content, and images
Downloads all images locally
Creates properly formatted blog posts with frontmatter
Converts HTML content to Markdown
Supports dry run mode for testing
Can limit the number of entries to process

Setup

Install requirements:

pip install -r requirements.txt

Run the script:

./run_scraper.sh

Usage

./run_scraper.sh [OPTIONS]

Options

--help: Display help message
--dry-run: Run without creating any blog posts
--limit NUMBER: Limit the number of entries to process
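
The options above could be parsed along these lines. This is a minimal sketch using Python's standard argparse module; the function names build_parser and select_entries are hypothetical, and the actual script may handle its arguments differently.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="run_scraper.sh",
        description="Convert changelog entries to blog posts.",
    )
    # argparse registers --help on its own, so only two options are declared.
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Run without creating any blog posts",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        metavar="NUMBER",
        help="Limit the number of entries to process",
    )
    return parser


def select_entries(entries: list, limit: Optional[int]) -> list:
    """Return the first `limit` entries, or all of them when no limit is set."""
    return entries if limit is None else entries[:limit]


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--dry-run", "--limit", "2"])
print(args.dry_run, args.limit)  # prints: True 2
```

In this sketch, leaving --limit unset means every entry is processed.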

Examples

Test without creating blog posts:

./run_scraper.sh --dry-run

Process only the first 3 entries:

./run_scraper.sh --limit 3

Combine options:

./run_scraper.sh --dry-run --limit 2

Blog Post Format

The converter creates blog posts with the following structure:

Directory name: YYYY-MM-slug-of-title
Images stored in an assets directory inside the post directory
index.md file with frontmatter and markdown content
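
As an illustration of the YYYY-MM-slug-of-title naming scheme, the directory name could be derived roughly as follows. The helpers slugify and post_dir_name are hypothetical, the exact slug rules used by the script are an assumption, and the title and date are made-up sample values.

```python
import re
import unicodedata
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and join its ASCII alphanumeric runs with hyphens."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return "-".join(re.findall(r"[a-z0-9]+", ascii_title.lower()))


def post_dir_name(entry_date: date, title: str) -> str:
    """Build a directory name of the form YYYY-MM-slug-of-title."""
    return f"{entry_date:%Y-%m}-{slugify(title)}"


print(post_dir_name(date(2024, 3, 15), "New Dashboard & Faster Search!"))
# prints: 2024-03-new-dashboard-faster-search
```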

Frontmatter

---
layout: post
title: Entry Title
description: Changelog entry from Month Day, Year
date: YYYY-MM-DD
author: Anaphora Team
thumbnail: ./assets/image.png
tags:
  - changelog
  - update
---
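
A frontmatter block like the one above could be generated with plain string formatting. This is only a sketch: render_frontmatter is a hypothetical helper, and quoting the title is an assumption made here so that titles containing a colon remain valid YAML.

```python
from datetime import date


def render_frontmatter(title: str, entry_date: date, thumbnail: str) -> str:
    """Render the frontmatter for one changelog entry (hypothetical helper)."""
    # Build "Month Day, Year" by hand so the day is not zero-padded.
    long_date = f"{entry_date:%B} {entry_date.day}, {entry_date.year}"
    # Escape backslashes and double quotes, then wrap the title in quotes.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "---",
        "layout: post",
        f'title: "{safe_title}"',
        f"description: Changelog entry from {long_date}",
        f"date: {entry_date.isoformat()}",
        "author: Anaphora Team",
        f"thumbnail: {thumbnail}",
        "tags:",
        "  - changelog",
        "  - update",
        "---",
    ]
    return "\n".join(lines) + "\n"


# Sample values for illustration only.
print(render_frontmatter("Entry Title", date(2024, 3, 5), "./assets/image.png"))
```

The month name comes from strftime's %B, which is English unless the process changes its locale.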

Authentication

If the changelog requires authentication, set your credentials in changelog_scraper.py:

EMAIL = "your-email@example.com"
PASSWORD = "your-password"

Troubleshooting

If the script doesn't find any changelog entries:

Check the debug files in the blog directory:
- changelog_debug_selenium.html: The raw HTML fetched from the page
- changelog_pretty.html: A formatted version of the HTML
- changelog_screenshot.png: A screenshot of the page
If the entries appear in these files but the script still misses them, update the CSS selectors in the script to match the actual structure of your changelog page.
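
When adjusting selectors, it can help to see which class names actually occur in the saved HTML. The following standard-library sketch counts them; ClassCounter and count_classes are hypothetical helpers that are not part of the script, and the class names in the sample are invented.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each CSS class appears on the tags of a document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated class names.
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(html: str) -> Counter:
    """Parse the HTML text and return a Counter of class names."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented sample; for a real run, read changelog_debug_selenium.html instead.
sample = (
    '<div class="entry card"><h2 class="title">A</h2></div>'
    '<div class="entry"></div>'
)
print(count_classes(sample).most_common(1))  # prints: [('entry', 2)]
```

A class that repeats about once per changelog entry is a plausible candidate for the entry selector.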

Notes

The script uses Selenium for JavaScript rendering, which requires Chrome/Chromium
Images are downloaded and stored locally
HTML content is converted to Markdown format