Adding New Websites

This guide explains how to contribute support for new websites to Kiosque.

Quick Start

Adding a new website typically involves:

Creating a new Python file in kiosque/website/
Implementing a class that extends Website
Defining the article extraction logic
(Optional) Adding authentication if the site is paywalled
Testing your implementation
Submitting a pull request

Step-by-Step Guide

For a complete tutorial on adding new website support, see the Contributing Guide.

The Contributing Guide covers:

Prerequisites and development setup
Website scraper implementation patterns
Authentication methods (simple, CSRF tokens, cookies)
Testing procedures
Code style guidelines
Pull request submission process

Quick Example

Here's a minimal example for a public website:

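from typing import ClassVar

from ..core.website import Website


class ExampleNews(Website):
    # Root URL of the site this scraper handles
    base_url = "https://example.com/"

    # Tag names to strip from the extracted article
    clean_nodes: ClassVar = ["figure", "aside"]

    def article(self, url):
        # Parse the page, then return the element that wraps the article body
        soup = self.bs4(url)
        return soup.find("article", class_="main-content")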
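Before wiring a selector into `article()`, it can help to try it on a saved copy of a page. The sketch below uses BeautifulSoup directly, outside Kiosque. The HTML snippet and the `extract_article` helper are made up for illustration, and it assumes that `self.bs4(url)` hands back a `BeautifulSoup` object, as the example above suggests.

```python
from bs4 import BeautifulSoup

# Made-up page; in practice this would be a saved copy of a real article.
SAMPLE_HTML = """
<html><body>
  <nav>Home | World | Sport</nav>
  <article class="main-content">
    <h1>Headline</h1>
    <p>Body text.</p>
  </article>
</body></html>
"""


def extract_article(html, tag, css_class):
    """Hypothetical helper: return the first matching element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag, class_=css_class)


found = extract_article(SAMPLE_HTML, "article", "main-content")
missing = extract_article(SAMPLE_HTML, "div", "article-body")

print(found.h1.get_text())  # Headline
print(missing)              # None
```

`find` returns `None` when nothing matches, so a selector that works on one article layout but not another is easy to spot this way.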

Common Patterns

With Authentication

from ..core.website import Website


class PaywalledNews(Website):
    base_url = "https://paywalled.com/"
    # Page that receives the login form
    login_url = "https://paywalled.com/login"

    @property
    def login_dict(self):
        # Form fields for the login request; match the keys to the site's form
        credentials = self.credentials
        assert credentials is not None
        return {
            "email": credentials["username"],
            "password": credentials["password"],
        }

    def article(self, url):
        soup = self.bs4(url)
        return soup.find("div", class_="article-body")
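To see what `login_dict` produces, the property can be exercised outside Kiosque. The sketch below uses a stand-in class rather than the real `Website` base class, with placeholder credentials. How Kiosque actually submits the form to `login_url` is not covered in this guide, so the URL-encoding step is only an assumption about a typical form login.

```python
from urllib.parse import urlencode


class FakePaywalledNews:
    """Stand-in for a Website subclass; NOT the real Kiosque base class."""

    login_url = "https://paywalled.com/login"

    def __init__(self, credentials):
        # Passed in directly for illustration; the real class obtains
        # its credentials elsewhere.
        self.credentials = credentials

    @property
    def login_dict(self):
        credentials = self.credentials
        if credentials is None:
            raise ValueError("no credentials configured for this site")
        # Presumably the keys mirror the field names of the site's login form.
        return {
            "email": credentials["username"],
            "password": credentials["password"],
        }


site = FakePaywalledNews({"username": "reader@example.com", "password": "s3cret"})
payload = site.login_dict

# Assumed: a classic HTML form login sends its fields URL-encoded.
print(urlencode(payload))  # email=reader%40example.com&password=s3cret
```

Comparing the printed payload with the form fields seen in the browser's network tab is a quick way to catch a wrong key name, such as `username` where the site expects `email`.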

With Custom Cleanup

# Method of your Website subclass (class body omitted for brevity)
def clean(self, article):
    # Run the default cleanup first
    article = super().clean(article)

    # Remove empty paragraphs
    for p in article.find_all("p"):
        if not p.get_text(strip=True):
            p.decompose()

    return article
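The empty-paragraph loop can be checked in isolation before it goes into a `clean()` override. This is a minimal sketch using BeautifulSoup on a made-up snippet; `remove_empty_paragraphs` is a hypothetical standalone function, not a Kiosque API.

```python
from bs4 import BeautifulSoup

# Made-up article with two empty paragraphs between real ones.
SAMPLE_HTML = """
<article>
  <p>First paragraph.</p>
  <p>   </p>
  <p></p>
  <p>Second paragraph.</p>
</article>
"""


def remove_empty_paragraphs(article):
    """Hypothetical helper: drop <p> elements that contain no text."""
    # find_all returns a list built up front, so removing nodes while
    # looping over it is safe.
    for p in article.find_all("p"):
        if not p.get_text(strip=True):
            p.decompose()
    return article


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = remove_empty_paragraphs(soup.find("article"))
remaining = [p.get_text(strip=True) for p in article.find_all("p")]

print(remaining)  # ['First paragraph.', 'Second paragraph.']
```

Note that a paragraph holding only an image has no text either, so this check removes it as well. If a site wraps images in `<p>` tags, the condition may need to be narrowed.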

File Naming

Use lowercase, no spaces: lemonde.py, nytimes.py
Name after the publication, not the domain
Keep it simple and recognizable
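The naming rules can be expressed as a small helper. This is only an illustration: `module_filename` is a hypothetical function, not part of Kiosque, and stripping accents is an extra assumption beyond the rules listed above.

```python
import re
import unicodedata


def module_filename(publication):
    """Hypothetical helper: derive a module file name from a publication name."""
    # Drop accents so the name stays plain ASCII (an assumption, see above).
    ascii_name = (
        unicodedata.normalize("NFKD", publication)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then remove spaces, punctuation, and other non-alphanumerics.
    slug = re.sub(r"[^a-z0-9]", "", ascii_name.lower())
    if not slug:
        raise ValueError(f"cannot derive a file name from {publication!r}")
    return f"{slug}.py"


print(module_filename("Le Monde"))    # lemonde.py
print(module_filename("Libération"))  # liberation.py
```

A helper like this only covers the mechanical part. Recognizable short forms such as `nytimes.py` remain a judgment call.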

Testing Checklist

Before submitting:

Article extraction works on multiple articles
Authentication works (if applicable)
Code passes ruff check and ruff format
Tests pass: uv run pytest -m "not login"
Website added to websites.md
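The first checklist item can be partly automated. The sketch below is a generic smoke check and not part of Kiosque's test suite: `check_extraction`, the `min_chars` threshold, and the fake extractor are all hypothetical. The extractor is assumed to be any callable that takes a URL and returns the article text or `None`.

```python
def check_extraction(extract, urls, min_chars=200):
    """Hypothetical smoke check: map each failing URL to the reason it failed."""
    failures = {}
    for url in urls:
        try:
            text = extract(url)
        except Exception as exc:  # record the error instead of stopping
            failures[url] = f"raised {type(exc).__name__}: {exc}"
            continue
        if text is None:
            failures[url] = "no article element found"
        elif len(text) < min_chars:
            failures[url] = f"only {len(text)} characters extracted"
    return failures


# Fake extractor standing in for a real scraper, for illustration only.
FAKE_PAGES = {
    "https://example.com/a": "x" * 500,
    "https://example.com/b": "teaser",
    "https://example.com/c": None,
}

failures = check_extraction(FAKE_PAGES.get, list(FAKE_PAGES))
for url, reason in failures.items():
    print(url, "->", reason)
```

A very short result on a site with a paywall can indicate that only the teaser was extracted, which suggests checking the authentication step.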

Resources

Full Contributing Guide - Complete implementation tutorial
Architecture Guide - System design and patterns
Existing Implementations - Check kiosque/website/ for 30+ examples

Getting Help

Open a GitHub Issue with the website-support label
Check existing website implementations in kiosque/website/ for patterns
See Supported Sites for the list of current implementations

Thank You!

Your contributions help make journalism more accessible. Thank you for supporting Kiosque!