This tool scrapes the Anaphora changelog and converts entries to blog posts.
pip install -r requirements.txt
./run_scraper.sh
./run_scraper.sh [OPTIONS]
--help: Display help message
--dry-run: Run without creating any blog posts
--limit NUMBER: Limit the number of entries to process

Test without creating blog posts:
./run_scraper.sh --dry-run
Process only the first 3 entries:
./run_scraper.sh --limit 3
Combine options:
./run_scraper.sh --dry-run --limit 2
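If you want to see how these options might be handled on the Python side, here is a minimal sketch using `argparse`. It is not taken from the scraper itself: `build_parser` and `select_entries` are hypothetical names, and the real script may wire the flags differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical wiring)."""
    parser = argparse.ArgumentParser(
        prog="run_scraper.sh",
        description="Scrape the changelog and convert entries to blog posts.",
    )
    # --help is added automatically by argparse.
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Run without creating any blog posts",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        metavar="NUMBER",
        help="Limit the number of entries to process",
    )
    return parser


def select_entries(entries: List[str], limit: Optional[int]) -> List[str]:
    """Return the first `limit` entries, or all of them when no limit is set."""
    return entries if limit is None else entries[:limit]
```

With this sketch, `build_parser().parse_args(["--dry-run", "--limit", "2"])` yields `dry_run=True` and `limit=2`, matching the combined example above.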
The converter creates blog posts with the following structure:
YYYY-MM-slug-of-title
  assets directory
  index.md file with frontmatter and markdown content

---
layout: post
title: Entry Title
description: Changelog entry from Month Day, Year
date: YYYY-MM-DD
author: Anaphora Team
thumbnail: ./assets/image.png
tags:
- changelog
- update
---
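As a rough illustration of how the directory name and frontmatter shown above could be produced from a single entry, here is a small sketch. The function names (`slugify`, `post_dir_name`, `render_frontmatter`) and the slug rules are assumptions for illustration, not the converter's actual code.

```python
import re
from datetime import date

# Spelled out explicitly so the output does not depend on the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def post_dir_name(title: str, published: date) -> str:
    """Build a directory name of the form YYYY-MM-slug-of-title."""
    return f"{published:%Y-%m}-{slugify(title)}"


def render_frontmatter(
    title: str,
    published: date,
    thumbnail: str = "./assets/image.png",
) -> str:
    """Render the frontmatter block for index.md (assumed field order)."""
    month = MONTHS[published.month - 1]
    description = f"Changelog entry from {month} {published.day}, {published.year}"
    lines = [
        "---",
        "layout: post",
        f"title: {title}",
        f"description: {description}",
        f"date: {published.isoformat()}",
        "author: Anaphora Team",
        f"thumbnail: {thumbnail}",
        "tags:",
        "- changelog",
        "- update",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

For example, `post_dir_name("New Export API!", date(2024, 3, 5))` gives `2024-03-new-export-api`. Note that this sketch writes the title unquoted, so a title containing `: ` would need quoting to remain valid YAML.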
If the changelog requires authentication, add your credentials to the changelog_scraper.py file:
EMAIL = "your-email@example.com"
PASSWORD = "your-password"
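If you would rather not keep credentials in the source file, one possible alternative is to read them from environment variables. This is only a sketch: the variable names `CHANGELOG_EMAIL` and `CHANGELOG_PASSWORD` and the `load_credentials` helper are hypothetical, and the script as described reads the `EMAIL` and `PASSWORD` constants directly.

```python
import os
from typing import Mapping, Optional, Tuple


def load_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (email, password) from the environment, or None if either is unset.

    The variable names below are assumptions, not part of the scraper.
    """
    env = os.environ if env is None else env
    email = env.get("CHANGELOG_EMAIL", "")
    password = env.get("CHANGELOG_PASSWORD", "")
    if not email or not password:
        return None
    return email, password
```

You could then assign the result to `EMAIL` and `PASSWORD` in changelog_scraper.py, falling back to unauthenticated access when `load_credentials()` returns `None`.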
If the script doesn't find any changelog entries, check the debug files in the blog directory:
changelog_debug_selenium.html: The raw HTML fetched from the page
changelog_pretty.html: A formatted version of the HTML
changelog_screenshot.png: A screenshot of the page

You might need to modify the CSS selectors in the script to match the actual structure of your changelog page.
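Before editing the selectors, it can help to check which class names actually occur in the saved HTML. The sketch below uses only the standard library to count them; `ClassCounter` and `count_classes` are hypothetical helpers, not part of the scraper, and they inspect class attributes only rather than evaluating full CSS selectors.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally every class name that appears on any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(html: str) -> Counter:
    """Return a Counter mapping class name to number of occurrences."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

For instance, reading changelog_debug_selenium.html into a string and calling `count_classes(html).most_common(20)` lists the most frequent class names. If the class used in your selector does not appear there, the selector probably needs updating.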