Advanced and scientific Python

Materials to brush up your Python skills

View the Project on GitHub xoolive/pyclass

Asynchronous programming

Asynchronous programming is becoming more and more popular in Python since the introduction in Python 3.5 (with PEP 492) of two keywords: await and async.

It is important to be aware of a few facts about Python performance:

Asynchronous programming, often referred to by the name of the Python module asyncio, provides an efficient single-threaded implementation of programs that spend most of their time in blocking calls.

If the following SpongeBob picture is rather an illustration of multiprocessing:

asynchronous programming is more about vacuuming while the dishwasher runs, instead of waiting for it to finish before moving on to your next chore.

The tl;dr version of asyncio goes as follows:

So let’s start with a function which does nothing more than sleep:

import asyncio


async def count():
    print("one")
    await asyncio.sleep(1)
    print("two")

If you run it once, it will take… one second:

>>> import asyncio
>>>
>>> loop = asyncio.get_event_loop()  # the loop in charge of sequencing async calls
>>> loop.run_until_complete(count())
one
two

But if you run several calls together, it will also take about one second. Check the printing order: the loop schedules the next call of count() as soon as the current one reaches an await instruction:

>>> loop.run_until_complete(asyncio.gather(count(), count(), count()))
one
one
one
two
two
two
Warning: Jupyter notebooks run in an asynchronous environment where an event loop is already running in the background. It is therefore not possible to run the code above as is; you would get the following exception:
RuntimeError: This event loop is already running
It is however possible to run a cell with a top-level `await`. The following code is valid in a Jupyter cell but not in a regular Python script:
await asyncio.gather(count(), count(), count())
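
The interleaving described above can also be checked without reading the printed output. The following is a minimal sketch, not part of the original example: this variant of count() takes a hypothetical `name` argument and appends to a shared `events` list, and the whole run is timed with `time.perf_counter()`.

```python
import asyncio
import time

events = []  # filled in the order in which the event loop runs each step


async def count(name):
    # `name` is a hypothetical argument added to tell the three calls apart
    events.append(f"{name}: one")
    await asyncio.sleep(1)  # hands control back to the event loop
    events.append(f"{name}: two")


async def main():
    t0 = time.perf_counter()
    await asyncio.gather(count("a"), count("b"), count("c"))
    return time.perf_counter() - t0


elapsed = asyncio.run(main())
print(events)
print(f"three calls in {elapsed:.2f}s")
```

All three "one" events are recorded before any "two" event, and the total stays close to one second rather than three. In a Jupyter cell, `asyncio.run(main())` would have to be replaced by `await main()`.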

In practice, many libraries made of blocking calls provide an asynchronous version of their code, which becomes relevant if you need to make many small blocking calls, e.g. many small downloads, or many calls to a database.

Comparison between blocking and non-blocking downloads

requests is the most common library for synchronous HTTP requests. For this example, let’s download all the flags of the world from https://flagcdn.com/.

The full list of country codes is available as a JSON file:

import requests

c = requests.get("https://flagcdn.com/fr/codes.json")
c.raise_for_status()
codes = c.json()
# >>> codes {'ad': 'Andorre', 'ae': 'Émirats arabes unis', 'af': 'Afghanistan',
# 'ag': 'Antigua-et-Barbuda', 'ai': 'Anguilla', 'al': 'Albanie', 'am':
# 'Arménie', 'ao': 'Angola', 'aq': 'Antarctique', 'ar': 'Argentine', ...

Now we can time the synchronous download of all flags:

from tqdm import tqdm

for c in tqdm(codes.keys()):
    r = requests.get(f'https://flagcdn.com/256x192/{c}.png')
    r.raise_for_status()
    # ignoring content for this example
100%|█████████████████████████████████████████████████████████████| 306/306 [01:15<00:00,  3.77it/s]

One of the most widespread libraries for asynchronous web requests is aiohttp, whose syntax is somewhat similar. The equivalent code would be:

import asyncio
import time

import aiohttp

async def fetch(code, session):
    async with session.get(f"https://flagcdn.com/256x192/{code}.png") as resp:
        return await resp.read()


async def main():
    t0 = time.time()
    async with aiohttp.ClientSession() as session:
        futures = [fetch(code, session) for code in codes]
        for response in await asyncio.gather(*futures):
            data = response
    print(f"done in {time.time() - t0:.5f}s")


asyncio.run(main())
done in 0.52194s
Note: This approach leads to a speedup of nearly 150 (from about 75 s down to about 0.5 s). Such a speedup makes particular sense here, with a lot of small blocking requests.
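
Where such a speedup comes from can be reproduced without any network access. In the sketch below, `fake_download` is a hypothetical stand-in for an HTTP request, with `asyncio.sleep` simulating an assumed latency of 50 ms per request; the figures are illustrative only and unrelated to the measurements above.

```python
import asyncio
import time

LATENCY = 0.05  # assumed latency per request, in seconds
N_REQUESTS = 20  # assumed number of small downloads


async def fake_download(code):
    # hypothetical stand-in for session.get(...): only waits, no network
    await asyncio.sleep(LATENCY)
    return code


async def sequential():
    # each await completes before the next request starts
    return [await fake_download(i) for i in range(N_REQUESTS)]


async def concurrent():
    # all requests are waiting at the same time
    return await asyncio.gather(*(fake_download(i) for i in range(N_REQUESTS)))


def timed(coroutine_function):
    # run a coroutine function in a fresh event loop and measure the wall time
    t0 = time.perf_counter()
    result = asyncio.run(coroutine_function())
    return result, time.perf_counter() - t0


seq_result, seq_time = timed(sequential)
con_result, con_time = timed(concurrent)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential version pays the latency once per request, whereas the concurrent version pays it roughly once in total, which is why the gain grows with the number of small requests.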
Warning: If you run this code behind a proxy, you may need to adjust the code:
# with requests
requests.get(url, proxies={"http": proxy, "https": proxy})
# with aiohttp
async with session.get(url, proxy=proxy) as resp:
    ...

Exercise

In this example, we will implement a particular case of web crawling, with a breadth-first exploration of a graph.

wikidata

You will find a suggested solution in the asyncio.ipynb notebook.
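
As a warm-up before opening the notebook, a breadth-first exploration can be sketched on a toy in-memory graph. Everything here is hypothetical: `GRAPH` and `fetch_neighbours` stand in for real web requests, and this is not the solution from the notebook. The idea shown is that all nodes of the current level are fetched concurrently with `asyncio.gather` before moving on to the next level.

```python
import asyncio

# hypothetical graph: each node maps to the nodes it links to
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["a"],
    "e": [],
}


async def fetch_neighbours(node):
    # stand-in for a web request returning the links found on a page
    await asyncio.sleep(0.01)
    return GRAPH.get(node, [])


async def crawl(start, max_depth):
    depth = {start: 0}  # node -> distance from the start node
    frontier = [start]
    for level in range(max_depth):
        if not frontier:
            break
        # fetch the whole current level concurrently
        results = await asyncio.gather(*(fetch_neighbours(n) for n in frontier))
        next_frontier = []
        for neighbours in results:
            for n in neighbours:
                if n not in depth:  # skip nodes that were already visited
                    depth[n] = level + 1
                    next_frontier.append(n)
        frontier = next_frontier
    return depth


depths = asyncio.run(crawl("a", max_depth=3))
print(depths)
```

Keeping track of visited nodes in `depth` avoids fetching the same page twice, and `max_depth` bounds the exploration, which matters on a real graph where the frontier can grow very quickly.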
