Adding New Websites¶
This guide explains how to contribute support for new websites to Kiosque.
Quick Start¶
Adding a new website typically involves:
- Creating a new Python file in kiosque/website/
- Implementing a class that extends Website
- Defining the article extraction logic
- (Optional) Adding authentication if the site is paywalled
- Testing your implementation
- Submitting a pull request
Step-by-Step Guide¶
For a complete tutorial on adding new website support, see the Contributing Guide.
The contributing guide includes:
- Prerequisites and development setup
- Website scraper implementation patterns
- Authentication methods (simple, CSRF tokens, cookies)
- Testing procedures
- Code style guidelines
- Pull request submission process
Quick Example¶
Here's a minimal example for a public website:
    from typing import ClassVar

    from ..core.website import Website


    class ExampleNews(Website):
        base_url = "https://example.com/"
        # Elements to drop when cleaning the article
        clean_nodes: ClassVar = ["figure", "aside"]

        def article(self, url):
            # Parse the page and return the element that holds the article
            soup = self.bs4(url)
            return soup.find("article", class_="main-content")
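The selector in the example above only matches pages that actually contain that element. The following is a minimal standalone sketch of that behaviour, assuming Beautiful Soup is installed; it parses two made-up HTML strings directly instead of going through Kiosque's `self.bs4`, and the `extract` helper is hypothetical.

```python
from bs4 import BeautifulSoup

# Two made-up pages: one with the expected container, one without it.
PAGE_WITH_ARTICLE = (
    "<html><body><nav>Menu</nav>"
    '<article class="main-content"><h1>Title</h1><p>Body text.</p></article>'
    "</body></html>"
)
PAGE_WITHOUT_ARTICLE = (
    '<html><body><div class="teaser">Subscribe to read.</div></body></html>'
)


def extract(html):
    """Hypothetical helper mirroring the lookup done in article()."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("article", class_="main-content")


found = extract(PAGE_WITH_ARTICLE)
missing = extract(PAGE_WITHOUT_ARTICLE)

print(found.h1.get_text())
print(missing)
```

`find` returns `None` when nothing matches, so trying the selector against several real articles is a quick way to catch layout variants before opening a pull request.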
Common Patterns¶
With Authentication¶
    from ..core.website import Website


    class PaywalledNews(Website):
        base_url = "https://paywalled.com/"
        login_url = "https://paywalled.com/login"

        @property
        def login_dict(self):
            # Login form fields, filled from the stored credentials
            credentials = self.credentials
            assert credentials is not None
            return {
                "email": credentials["username"],
                "password": credentials["password"],
            }

        def article(self, url):
            soup = self.bs4(url)
            return soup.find("div", class_="article-body")
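Login forms differ in the field names they expect, and Python strips `assert` statements when run with `-O`. The sketch below shows one way to build the same dictionary with an explicit check. `build_login_payload` and its parameters are hypothetical and not part of Kiosque; the credential keys `username` and `password` are taken from the example above.

```python
def build_login_payload(credentials, user_field="email", password_field="password"):
    """Map stored credentials onto the field names a login form expects."""
    if credentials is None:
        # An explicit error survives `python -O`, unlike an assert.
        raise ValueError("no credentials configured for this website")
    return {
        user_field: credentials["username"],
        password_field: credentials["password"],
    }


stored = {"username": "reader@example.com", "password": "s3cret"}

# Default field names, as in the PaywalledNews example.
payload = build_login_payload(stored)

# A site whose form uses different field names (made-up names).
custom = build_login_payload(stored, user_field="login", password_field="pwd")
```

Inside a `Website` subclass, the body of `login_dict` could apply the same check to `self.credentials`.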
With Custom Cleanup¶
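    class ExampleNews(Website):
        # base_url and article() as in the quick example

        def clean(self, article):
            # Run the default cleanup first
            article = super().clean(article)
            # Remove empty paragraphs
            for p in article.find_all("p"):
                if not p.get_text(strip=True):
                    p.decompose()
            return article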
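To see what the empty-paragraph loop does without running Kiosque, here is a standalone sketch, assuming Beautiful Soup is installed. The HTML is made up and `remove_empty_paragraphs` is a hypothetical free function that repeats the loop from `clean`.

```python
from bs4 import BeautifulSoup

# Made-up article with two empty paragraphs between real ones.
HTML = """
<article>
  <p>First paragraph.</p>
  <p>   </p>
  <p></p>
  <p>Second paragraph.</p>
</article>
"""


def remove_empty_paragraphs(article):
    """Drop every <p> whose text is empty or whitespace only."""
    for p in article.find_all("p"):
        if not p.get_text(strip=True):
            p.decompose()
    return article


soup = BeautifulSoup(HTML, "html.parser")
article = remove_empty_paragraphs(soup.find("article"))
remaining = [p.get_text() for p in article.find_all("p")]
print(remaining)
```

A paragraph that contains only an image also has no text, so this check removes it too. If the site wraps images in `<p>` tags, the condition may need to test for child elements as well.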
File Naming¶
- Use lowercase, no spaces: lemonde.py, nytimes.py
- Name after the publication, not the domain
- Keep it simple and recognizable
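The first rule can be checked mechanically. The sketch below is a hypothetical helper, not part of Kiosque. It assumes a valid module name starts with a lowercase letter and contains only lowercase letters, digits, and underscores; the underscore and digit allowances are assumptions beyond the rules listed above.

```python
import re

# Lowercase letter first, then lowercase letters, digits, or underscores,
# ending in ".py". Extra dots (as in a domain name) do not match.
MODULE_NAME = re.compile(r"[a-z][a-z0-9_]*\.py")


def is_valid_module_name(filename):
    """Return True if filename follows the lowercase, no-spaces convention."""
    return MODULE_NAME.fullmatch(filename) is not None


for name in ["lemonde.py", "nytimes.py", "NYTimes.py", "le monde.py", "www.lemonde.fr.py"]:
    print(name, is_valid_module_name(name))
```

Whether a name is "after the publication" and "recognizable" remains a judgment call that a pattern cannot make.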
Testing Checklist¶
Before submitting:
- Article extraction works on multiple articles
- Authentication works (if applicable)
- Code passes ruff check and ruff format
- Tests pass: uv run pytest -m "not login"
- Website added to websites.md
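The command-line items in the checklist can be chained in a small script. This is a sketch rather than a project tool: `run_checks` is a hypothetical helper, and it uses `ruff format --check` (report only, no rewriting) on the assumption that you want to verify formatting without modifying files.

```python
import subprocess

# Commands from the checklist, in the order they are listed.
CHECKS = [
    ["ruff", "check"],
    ["ruff", "format", "--check"],  # assumption: verify only, do not rewrite
    ["uv", "run", "pytest", "-m", "not login"],
]


def run_checks(run=subprocess.run):
    """Run each check in order; return the first failing command, or None."""
    for command in CHECKS:
        result = run(command)
        if result.returncode != 0:
            return command
    return None
```

Calling `run_checks()` from the repository root runs the three commands and stops at the first failure. The manual items (trying several articles, verifying login, updating websites.md) still need to be done by hand.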
Resources¶
- Full Contributing Guide - Complete implementation tutorial
- Architecture Guide - System design and patterns
- Existing Implementations - Check kiosque/website/ for 30+ examples
Getting Help¶
- Open a GitHub Issue with the website-support label
- Check existing website implementations in kiosque/website/ for patterns
- See Supported Sites for the list of current implementations
Thank You!¶
Your contributions help make journalism more accessible. Thank you for supporting Kiosque!