Memory leak for simple get call #4618
Comments
Using the session like this helps:

```python
async def fetch(session, url):
    async with session.get(...) as x:
        return await x.text()
```
@KKomarov but why are they not closed explicitly? According to the docs, as I understood, calling
@KKomarov, dummy cookies have not helped.
@socketpair, here is the result. Also, it does not increase.
@dima-kov closing responses helped for me; memory is about 100 MB. Maybe it's closed after calling text(), but not immediately.
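(For reference, a minimal sketch of closing the response explicitly instead of relying on the `async with` block; this is an illustration, not code from the thread.)

```python
import aiohttp


async def fetch_and_close(session: aiohttp.ClientSession, url: str) -> str:
    resp = await session.get(url)
    try:
        return await resp.text()
    finally:
        # After text() has fully read the body, the connection is normally
        # released back to the pool already; close() just makes the cleanup
        # explicit and unconditional.
        resp.close()
```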
@KKomarov, could you share your code, please? I tried with this and failed:

```python
import asyncio

import aiohttp

url = 'https://docs.mapbox.com/mapbox-gl-js/assets/earthquakes.geojson'
load_times = 4000


def get_to_load():
    global load_times
    if load_times > 0:
        load_times -= 1
        return url
    return None


async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()


async def load(session, worker_id):
    to_load = get_to_load()
    print(f'start {worker_id}')
    while to_load is not None:
        await fetch(session, to_load)
        print('Done', worker_id)
        to_load = get_to_load()


async def main(workers_num=90):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[load(session, i) for i in range(workers_num)])


asyncio.run(asyncio.sleep(3))
asyncio.run(main())
asyncio.run(asyncio.sleep(5))
```

but I suggest rewriting it to something like this:

```python
import asyncio

import aiohttp
from typing import Coroutine

URL = 'https://docs.mapbox.com/mapbox-gl-js/assets/earthquakes.geojson'
MAX_LOAD_TIMES = 500


async def fetch(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as r:
        return await r.text()


async def limited(semaphore: asyncio.Semaphore, coro: Coroutine) -> None:
    async with semaphore:
        await coro


async def main(workers_num=90):
    await asyncio.sleep(3)
    semaphore = asyncio.Semaphore(workers_num)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[limited(semaphore, fetch(session, URL)) for _ in range(MAX_LOAD_TIMES)])
    await asyncio.sleep(5)


asyncio.run(main())
```
A week ago I faced almost the same problem. It seems Python causes huge memory fragmentation. So, in order to account for memory, you should consider two methods:

Probably, due to enormous memory fragmentation, point 1 will grow and point 2 will not. I'm not sure, but possibly you need to use
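(The two ways of accounting for memory that this comment contrasts are typically the OS-level resident set size and Python's own allocator statistics. A minimal sketch of checking both, assuming psutil is installed; this is an illustration, not code from the thread.)

```python
# Compare what the OS thinks the process holds (RSS) with what Python's object
# allocator currently holds. With heavy fragmentation, RSS can keep growing
# while the Python-level count returns to its baseline.
import os
import sys

import psutil

rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024**2
blocks = sys.getallocatedblocks()
print(f"RSS: {rss_mb:.1f} MiB, allocated Python blocks: {blocks}")
```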
This bug has not been fixed?
No, not fixed. Do you have a sequence of steps to reproduce?
Relates to #4833
This feels like a CPython bug. Even when I add to the end of the script:

The memory usage still doesn't go down. This is after asyncio.run() has completed, so there should be no references to anything still around.
Can you reproduce it if you fetch a plaintext, non-SSL URL?
Hmm, it might be the Python parser... I'm running the first example at #4618 (comment). If I just run it with aiohttp 3.10.6 installed, memory usage sits at 1.6% the entire time. It doesn't increase over time or anything.
Yes, if I run it the same way with
Commenting out the other imports, it's definitely caused by HttpResponseParser. There's no leak if I just use that one class from the C module.
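(A quick way to check which parser implementation is in use; the attribute names below come from aiohttp's internals and may differ between versions, so treat this as a sketch.)

```python
# The C extension class lives in aiohttp._http_parser, the pure-Python fallback
# in aiohttp.http_parser. Setting AIOHTTP_NO_EXTENSIONS=1 in the environment
# before importing aiohttp forces the pure-Python implementation.
import aiohttp.http_parser as http_parser

print(http_parser.HttpResponseParser.__module__)
```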
tracemalloc seems useless. It seems to suggest there is more memory allocated when using the C extension than when there's an obvious memory leak...
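(For reference, a typical tracemalloc snapshot comparison looks roughly like this sketch. It only sees allocations made through Python's allocator, which is one reason it can look useless when the growth is in native memory.)

```python
# Take a snapshot before and after the workload and diff them; allocations made
# directly by C code bypass Python's allocator and will not show up here.
import tracemalloc

tracemalloc.start(25)  # keep 25 frames of traceback per allocation

before = tracemalloc.take_snapshot()

# ... run the reproduction workload here ...

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```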
Same results when testing with aiohttp 3.6.2 and 3.10.6, and on Python 3.8 and 3.10. Not sure what's happening, but it's trivial to reproduce by looking at
Through trial and error, I think I've narrowed it down to StreamReader or HttpPayloadParser at aiohttp/aiohttp/http_parser.py, lines 374 to 390 (commit 0b8be7f). But I'm struggling to figure out any more than that.
At the end, after a gc:

```python
import gc
import asyncio
import aiohttp
import objgraph
import pprint

url = 'http://docs.mapbox.com/mapbox-gl-js/assets/earthquakes.geojson'
load_times = 4000


def get_to_load():
    print(objgraph.show_growth())
    global load_times
    if load_times > 0:
        load_times -= 1
        return url
    return None


async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()


async def load(session, worker_id):
    to_load = get_to_load()
    print(f'start {worker_id}')
    while to_load is not None:
        await fetch(session, to_load)
        print('Done', worker_id)
        to_load = get_to_load()


async def main(workers_num=90):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[load(session, i) for i in range(workers_num)])


asyncio.run(asyncio.sleep(3))
asyncio.run(main())
print("common")
print(objgraph.show_most_common_types())
asyncio.run(asyncio.sleep(5))
gc.collect()
print("common after")
print(objgraph.show_most_common_types())


def _safe_repr(obj) -> str:
    """Get the repr of an object but keep going if there is an exception.

    We wrap repr to ensure if one object cannot be serialized, we can
    still get the rest.
    """
    try:
        return repr(obj)
    except Exception:  # noqa: BLE001
        return f"Failed to serialize {type(obj)}"


for obj in objgraph.by_type('list'):
    print(_safe_repr(obj))
```
I'm not so sure of my previous analysis anymore. I tested it for longer, and with the Python parser the memory usage never exceeds 4%. It might just be that it's less memory efficient than the C parser, and so much slower that it takes about a minute to reach peak memory usage instead of only a few seconds with the C parser.
I did notice we create a lot of small objects, and gc overhead starts to affect performance when request overhead is high. We can probably save a bit of RAM for these with `__slots__`.
Some of those objects are now dataclasses in 4.x, which don't support slots on all Python versions we support.
Python 3.10 is needed for dataclass slots. I think we solved a similar problem with a different solution for ESPHome.
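(A minimal sketch of the two options being weighed; the class and field names are illustrative, not aiohttp's actual helpers. `dataclass(slots=True)` needs Python 3.10+, while hand-written `__slots__` works on older versions.)

```python
from dataclasses import dataclass


# Python 3.10+: the generated class gets __slots__, so instances carry no
# per-instance __dict__ and use less memory.
@dataclass(slots=True)
class TimerState:
    deadline: float
    cancelled: bool = False


# Portable alternative for older Pythons: declare __slots__ by hand.
class TimerStateCompat:
    __slots__ = ("deadline", "cancelled")

    def __init__(self, deadline: float, cancelled: bool = False) -> None:
        self.deadline = deadline
        self.cancelled = cancelled
```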
Objgraph can better tell us which ones churn, and then we can add slots to any that can use it. That should help a bit with the gc overhead. I haven't been able to get it to leak, though.
Your memory usage is the same at the start of the script, and when it's sleeping at the end?
Here are the objects that we churn:
CIMultiDict 363 +3 -- nothing we can do here
Memory usage goes up, but I didn't find any Python object leaks.
Every connection will end with the connection-closed exception. Only create it once, since it's always the same. Related issue: #4618
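(Sketch of the idea behind that change; the names here are made up for illustration and are not aiohttp's actual internals.)

```python
from typing import Optional

# Build the exception once at import time and hand the same instance to every
# waiter, instead of allocating a fresh exception for each closed connection.
_CONNECTION_LOST = ConnectionResetError("Connection lost")


class ResponseHandler:
    def __init__(self) -> None:
        self._exception: Optional[BaseException] = None

    def connection_lost(self, exc: Optional[BaseException]) -> None:
        # Reuse the shared instance when the transport gives no concrete error.
        self._exception = exc if exc is not None else _CONNECTION_LOST
```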
We use `__slots__` almost everywhere else in the codebase; however, `__slots__` was missing for these helpers. Related issue: #4618
We use `__slots__` almost everywhere else in the codebase; however, `__slots__` was not implemented in streams. Related issue: #4618
Measuring RSS:

```python
import gc
import asyncio
import aiohttp
import psutil
import os

url = "http://docs.mapbox.com/mapbox-gl-js/assets/earthquakes.geojson"
load_times = 4000


def get_to_load():
    global load_times
    if load_times > 0:
        load_times -= 1
        return url
    return None


async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()


async def load(session, worker_id):
    to_load = get_to_load()
    while to_load is not None:
        await fetch(session, to_load)
        to_load = get_to_load()


async def main(workers_num=90):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[load(session, i) for i in range(workers_num)])


async def single_request():
    async with aiohttp.ClientSession() as session:
        await fetch(session, url)


mem_before = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory before--")
print(mem_before)
asyncio.run(asyncio.sleep(3))
asyncio.run(single_request())
asyncio.run(asyncio.sleep(2))
print("--memory after single request--")
mem_after_single = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print(mem_after_single)
asyncio.run(main())
asyncio.run(asyncio.sleep(5))
mem_after = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory after, before gc--")
print(mem_after)
gc.collect()
mem_after_gc = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory after, after gc--")
print(mem_after_gc)
```
With the linked PRs:

Before the linked PRs:

So it helps a little, but not that much.
More fixes in the linked PR.

So now a nice reduction, but the memory still isn't freed.
Oh wait, we need to trim.
With trim:

```python
import gc
import asyncio
import aiohttp
import psutil
import os
import ctypes

url = "http://docs.mapbox.com/mapbox-gl-js/assets/earthquakes.geojson"
load_times = 4000


def trim_memory() -> int:
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)


def get_to_load():
    global load_times
    if load_times > 0:
        load_times -= 1
        return url
    return None


async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()


async def load(session, worker_id):
    to_load = get_to_load()
    while to_load is not None:
        await fetch(session, to_load)
        to_load = get_to_load()


async def main(workers_num=90):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[load(session, i) for i in range(workers_num)])


async def single_request():
    async with aiohttp.ClientSession() as session:
        await fetch(session, url)


mem_before = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory before--")
print(mem_before)
asyncio.run(asyncio.sleep(3))
asyncio.run(single_request())
asyncio.run(asyncio.sleep(2))
print("--memory after single request--")
mem_after_single = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print(mem_after_single)
asyncio.run(main())
asyncio.run(asyncio.sleep(5))
mem_after = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory after, before gc--")
print(mem_after)
gc.collect()
mem_after_gc = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory after, after gc--")
print(mem_after_gc)
trim_memory()
mem_after_trim = psutil.Process(os.getpid()).memory_info().rss / 1024**2
print("--memory after, after trim--")
print(mem_after_trim)
```
The trim test will only work on glibc.
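(Since malloc_trim is glibc-specific, a guarded variant, my sketch rather than code from the thread, keeps the script runnable on macOS, Windows, or musl-based Linux, where it simply becomes a no-op.)

```python
import ctypes


def trim_memory() -> int:
    """Ask glibc to return freed heap pages to the OS; do nothing elsewhere."""
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        # No glibc (or no malloc_trim symbol): there is nothing to trim.
        return 0
```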
Doesn't seem to leak after trim.
So I'd say there is no leak, and the test case was missing a call to malloc_trim. At least we discovered a few places where memory use could be reduced.
Ah, OK. I'd have assumed that CPython would do any relevant trimming when the memory usage drops by a significant amount (though maybe that amount is just higher than my expectations). Anyway, I think we can call this resolved.
🐞 Describe the bug
I have faced a memory leak in my program: after loading 100k URLs, the process eats an additional 300 MB of RAM. I extracted the piece of code that reproduces the memory issue (below) and ran it with mprof for 5 minutes, loading 4k URLs.
💡 To Reproduce
Use env:
Tested with different versions of aiohttp (>3.5)
Code:
💡 Expected behavior
Keep memory usage roughly flat during the whole program run.
📋 Your version of the Python
Python 3.7.5 (also tested on 3.8)
📋 Your version of the aiohttp/yarl/multidict distributions
📋 Additional context
I have read all issues related to memory leaks, but have not found any appropriate solution. Here is what I tried:
- one aiohttp.ClientSession instead of a new one on every request (actually, this gave a bit of an increase)
📋 Environment