⚡️ Speed up method BaseArangoService.get_app_by_name by 203%
#652
📄 203% (2.03x) speedup for BaseArangoService.get_app_by_name in backend/python/app/connectors/services/base_arango_service.py
⏱️ Runtime: 4.58 milliseconds → 1.51 milliseconds (best of 177 runs)
📝 Explanation and details
The optimized code achieves a 203% speedup (roughly 3x faster runtime) and a 5.4% throughput improvement through two key optimizations:
Primary Optimization: Pre-compute String Normalization
The most impactful change moves string normalization from the database to Python:
- Original: applied LOWER(SUBSTITUTE(app.name, ' ', '')) for every document during the database scan
- Optimized: computes lookup_name = name.replace(' ', '').lower() in Python once, then uses AQL's TOLOWER(REGEX_REPLACE()) only on the database side
This reduces computational overhead significantly: instead of normalizing the input string repeatedly for each document comparison, it is normalized once upfront.
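As a rough illustration of the idea, the sketch below normalizes the lookup name once in Python and passes the result as a bind variable. The helper names (`normalize_name`, `build_lookup`), the collection name `apps`, and the query text are assumptions made for illustration, not the PR's actual code; the query uses only the `LOWER` and `SUBSTITUTE` functions quoted above.

```python
from typing import Dict, Tuple

# Hypothetical query text: the document side is still normalized per row,
# but the bind parameter arrives already normalized.
LOOKUP_QUERY = """
FOR app IN @@collection
    FILTER LOWER(SUBSTITUTE(app.name, ' ', '')) == @lookup_name
    LIMIT 1
    RETURN app
"""


def normalize_name(name: str) -> str:
    """Strip spaces and lowercase, mirroring name.replace(' ', '').lower()."""
    return name.replace(" ", "").lower()


def build_lookup(name: str, collection: str = "apps") -> Tuple[str, Dict[str, str]]:
    """Return the query text and bind variables for a single lookup."""
    bind_vars = {"@collection": collection, "lookup_name": normalize_name(name)}
    return LOOKUP_QUERY, bind_vars


query, bind_vars = build_lookup(" My Cool App ")
print(bind_vars["lookup_name"])  # prints: mycoolapp
```

The normalization cost on the Python side is paid once per call, regardless of how many documents the query scans.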
Secondary Optimization: More Efficient AQL Operations
The query structure was refined:
- Original: LOWER(SUBSTITUTE(@name, ' ', '')) required processing the parameter for every document
- Optimized: uses TOLOWER(REGEX_REPLACE()), which is more efficient for whitespace removal, and compares against the pre-computed @lookup_name
Performance Impact
The line profiler shows the database query execution time (db.aql.execute) dropped from 19.57ms to 3.20ms, an 84% reduction in database query time. This explains the dramatic overall speedup, as database operations typically dominate execution time in data access methods.
Test Case Performance
The optimization performs consistently well across all test scenarios.
This optimization is particularly valuable for applications that frequently query app names, as it reduces per-query latency and improves overall throughput for concurrent operations.
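The reported percentages can be checked from the raw timings. The sketch below assumes the speedup is measured as (old - new) / new and the reduction as (old - new) / old; the function names are illustrative only.

```python
def speedup_percent(old: float, new: float) -> float:
    """Speedup relative to the new runtime: (old - new) / new * 100."""
    return (old - new) / new * 100


def reduction_percent(old: float, new: float) -> float:
    """Share of the old time that was eliminated: (old - new) / old * 100."""
    return (old - new) / old * 100


runtime_speedup = speedup_percent(4.58, 1.51)     # overall runtime, ms
query_reduction = reduction_percent(19.57, 3.20)  # db.aql.execute, ms

print(f"runtime speedup: {runtime_speedup:.0f}%")       # prints: runtime speedup: 203%
print(f"query time reduction: {query_reduction:.0f}%")  # prints: query time reduction: 84%
```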
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import asyncio  # used to run async functions
from typing import Dict, List, Optional

import pytest  # used for our unit tests
from app.connectors.services.base_arango_service import BaseArangoService

# --- Begin: Mocks and helpers for test environment ---

# Minimal stub for CollectionNames, since only .APPS.value is used
class CollectionNames:
    APPS = type("APPS", (), {"value": "apps"})()

# Minimal stub for logger
class DummyLogger:
    def __init__(self):
        self.infos = []
        self.warnings = []
        self.errors = []

    def info(self, msg, *args):
        self.infos.append((msg, args))

    def warning(self, msg, *args):
        self.warnings.append((msg, args))

    def error(self, msg, *args):
        self.errors.append((msg, args))

# Minimal stub for db cursor (iterator)
class DummyCursor:
    def __init__(self, results: List[Dict]):
        self._results = results
        self._iter = iter(self._results)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter)

# Minimal stub for db.aql.execute
class DummyAQL:
    def __init__(self, apps_data: List[Dict], should_raise: bool = False):
        self.apps_data = apps_data
        self.should_raise = should_raise
        self.last_query = None
        self.last_bind_vars = None

    def execute(self, query, bind_vars):
        self.last_query = query
        self.last_bind_vars = bind_vars
        if self.should_raise:
            raise Exception("AQL execution error")
        # Simulate the filter logic as in the query
        name = bind_vars["name"]

        def normalize(n):
            return n.replace(" ", "").lower()

        norm_name = normalize(name)
        for app in self.apps_data:
            if "name" in app and normalize(app["name"]) == norm_name:
                return DummyCursor([app])
        return DummyCursor([])

# Minimal stub for db object
class DummyDB:
    def __init__(self, apps_data: List[Dict], should_raise: bool = False):
        self.aql = DummyAQL(apps_data, should_raise=should_raise)

# Minimal stub for TransactionDatabase (identical interface to DummyDB)
class DummyTransaction(DummyDB):
    pass

from app.connectors.services.base_arango_service import BaseArangoService

# --- Begin: Unit tests ---
# 1. BASIC TEST CASES

@pytest.mark.asyncio
async def test_get_app_by_name_exact_match():
    """Test basic retrieval with exact name match."""
    logger = DummyLogger()
    app_data = [{"_key": "1", "name": "MyApp"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("MyApp")

@pytest.mark.asyncio
async def test_get_app_by_name_case_insensitive():
    """Test retrieval is case-insensitive."""
    logger = DummyLogger()
    app_data = [{"_key": "2", "name": "SuperApp"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("superapp")

@pytest.mark.asyncio
async def test_get_app_by_name_ignores_spaces():
    """Test retrieval ignores spaces in name."""
    logger = DummyLogger()
    app_data = [{"_key": "3", "name": "My Cool App"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    # Input name with different spacing
    result = await svc.get_app_by_name("MyCoolApp")
    result2 = await svc.get_app_by_name("my cool app")

@pytest.mark.asyncio
async def test_get_app_by_name_returns_none_if_not_found():
    """Test None is returned if no app matches."""
    logger = DummyLogger()
    app_data = [{"_key": "4", "name": "Alpha"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("Beta")

@pytest.mark.asyncio
async def test_get_app_by_name_empty_db():
    """Test None is returned if apps collection is empty."""
    logger = DummyLogger()
    db = DummyDB([])
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("AnyApp")
# 2. EDGE TEST CASES

@pytest.mark.asyncio
async def test_get_app_by_name_with_transaction():
    """Test using a transaction database."""
    logger = DummyLogger()
    app_data = [{"_key": "5", "name": "TransactionalApp"}]
    svc = BaseArangoService(logger, None, None)
    # Main db is empty, but transaction contains the app
    svc.db = DummyDB([])
    transaction = DummyTransaction(app_data)
    result = await svc.get_app_by_name("TransactionalApp", transaction=transaction)

@pytest.mark.asyncio
async def test_get_app_by_name_multiple_apps_same_normalized_name():
    """Test if multiple apps have the same normalized name, first is returned."""
    logger = DummyLogger()
    app_data = [
        {"_key": "6", "name": "Test App"},
        {"_key": "7", "name": "TestApp"},  # Both normalize to 'testapp'
    ]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("TestApp")

@pytest.mark.asyncio
async def test_get_app_by_name_exception_handling():
    """Test that exceptions in db.aql.execute are handled and None is returned."""
    logger = DummyLogger()
    db = DummyDB([], should_raise=True)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("AnyApp")

@pytest.mark.asyncio
async def test_get_app_by_name_concurrent_calls():
    """Test concurrent execution of get_app_by_name."""
    logger = DummyLogger()
    app_data = [
        {"_key": "8", "name": "ConcurrentApp"},
        {"_key": "9", "name": "AnotherApp"},
    ]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db

@pytest.mark.asyncio
async def test_get_app_by_name_special_characters():
    """Test names with special characters and spaces."""
    logger = DummyLogger()
    app_data = [{"_key": "10", "name": "App!@# 2024"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    # Different spacing and case
    result = await svc.get_app_by_name("app!@#2024")
# 3. LARGE SCALE TEST CASES

@pytest.mark.asyncio
async def test_get_app_by_name_large_number_of_apps():
    """Test performance and correctness with a large number of apps."""
    logger = DummyLogger()
    # 500 apps, only one matches after normalization
    app_data = [{"_key": str(i), "name": f"App {i}"} for i in range(500)]
    app_data.append({"_key": "target", "name": "Special Large App"})
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
    result = await svc.get_app_by_name("speciallargeapp")

@pytest.mark.asyncio
async def test_get_app_by_name_concurrent_large_scale():
    """Test concurrent calls on a large app collection."""
    logger = DummyLogger()
    app_data = [{"_key": str(i), "name": f"App {i}"} for i in range(100)]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db

# 4. THROUGHPUT TEST CASES

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_small_load():
    """Throughput: small number of concurrent requests."""
    logger = DummyLogger()
    app_data = [{"_key": str(i), "name": f"App{i}"} for i in range(5)]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_medium_load():
    """Throughput: medium number (50) of concurrent requests."""
    logger = DummyLogger()
    app_data = [{"_key": str(i), "name": f"App{i}"} for i in range(50)]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_high_volume():
    """Throughput: high volume (200) of concurrent requests, including misses."""
    logger = DummyLogger()
    app_data = [{"_key": str(i), "name": f"App{i}"} for i in range(100)]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_sustained_pattern():
    """Throughput: sustained pattern of repeated requests for the same name."""
    logger = DummyLogger()
    app_data = [{"_key": "repeat", "name": "RepeatApp"}]
    db = DummyDB(app_data)
    svc = BaseArangoService(logger, None, None)
    svc.db = db
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
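The concurrent and throughput tests above are shown without the lines that dispatch the requests. As a hedged sketch of what such a body could look like, the example below fans out lookups with asyncio.gather against a hypothetical stand-in coroutine, `lookup_stub`, which is not the real get_app_by_name and only mimics the space- and case-insensitive matching described in this PR.

```python
import asyncio
from typing import Dict, List, Optional


async def lookup_stub(apps: List[Dict], name: str) -> Optional[Dict]:
    """Hypothetical stand-in: first app whose normalized name matches."""
    wanted = name.replace(" ", "").lower()
    await asyncio.sleep(0)  # yield control so the calls can interleave
    for app in apps:
        if app.get("name", "").replace(" ", "").lower() == wanted:
            return app
    return None


async def run_concurrent(apps: List[Dict], names: List[str]) -> List[Optional[Dict]]:
    """Dispatch all lookups at once; gather returns results in input order."""
    return await asyncio.gather(*(lookup_stub(apps, n) for n in names))


apps = [{"_key": "8", "name": "ConcurrentApp"}, {"_key": "9", "name": "AnotherApp"}]
results = asyncio.run(run_concurrent(apps, ["concurrentapp", "Another App", "Missing"]))
print([r["_key"] if r else None for r in results])  # prints: ['8', '9', None]
```

Because asyncio.gather preserves input order, each result can be compared against the expected app at the same index.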
#------------------------------------------------
import asyncio  # used to run async functions
from typing import Dict, Optional

import pytest  # used for our unit tests
from app.connectors.services.base_arango_service import BaseArangoService

# --- Begin: Minimal stubs and mocks for dependencies ---

class DummyLogger:
    """A dummy logger that records logs for test inspection."""

    def __init__(self):
        self.infos = []
        self.warnings = []
        self.errors = []

class DummyCursor:
    """A dummy cursor that mimics ArangoDB AQL cursor behavior."""

    def __init__(self, results):
        self.results = results
        self.iter = iter(self.results)

class DummyDB:
    """A dummy DB object that mimics ArangoDB transaction or database."""

    def __init__(self, apps):
        self.apps = apps  # List of app dicts

from app.connectors.services.base_arango_service import BaseArangoService
# --- End: Function to test ---

# --- Begin: Test fixtures ---

@pytest.fixture
def dummy_logger():
    return DummyLogger()

@pytest.fixture
def dummy_db():
    # Default: empty apps list
    db = DummyDB([])
    db.set_apps([])
    return db

@pytest.fixture
def service(dummy_logger, dummy_db):
    svc = BaseArangoService(
        logger=dummy_logger,
        arango_client=None,
        config_service=None,
        kafka_service=None
    )
    svc.db = dummy_db
    return svc

# --- End: Test fixtures ---
# --- Begin: Basic Test Cases ---

@pytest.mark.asyncio
async def test_get_app_by_name_exact_match(service, dummy_db):
    """Test that the function returns the correct app for an exact name match."""
    app = {'name': 'TestApp', 'id': 1}
    dummy_db.set_apps([app])
    result = await service.get_app_by_name('TestApp')

@pytest.mark.asyncio
async def test_get_app_by_name_case_insensitive(service, dummy_db):
    """Test that the function matches app names case-insensitively."""
    app = {'name': 'MyApp', 'id': 2}
    dummy_db.set_apps([app])
    result = await service.get_app_by_name('myapp')
    result2 = await service.get_app_by_name('MYAPP')

@pytest.mark.asyncio
async def test_get_app_by_name_ignore_spaces(service, dummy_db):
    """Test that the function ignores spaces in the app name."""
    app = {'name': 'Space App', 'id': 3}
    dummy_db.set_apps([app])
    result = await service.get_app_by_name('SpaceApp')
    result2 = await service.get_app_by_name('space app')
    result3 = await service.get_app_by_name(' S p a c e A p p ')

@pytest.mark.asyncio
async def test_get_app_by_name_not_found(service, dummy_db):
    """Test that the function returns None if no app is found."""
    dummy_db.set_apps([{'name': 'OtherApp', 'id': 4}])
    result = await service.get_app_by_name('UnknownApp')

@pytest.mark.asyncio
async def test_get_app_by_name_empty_db(service, dummy_db):
    """Test that the function returns None if the database is empty."""
    dummy_db.set_apps([])
    result = await service.get_app_by_name('AnyApp')

# --- End: Basic Test Cases ---
# --- Begin: Edge Test Cases ---

@pytest.mark.asyncio
async def test_get_app_by_name_multiple_apps_same_name(service, dummy_db):
    """Test that the function returns the first app when multiple apps have the same normalized name."""
    app1 = {'name': 'MultiApp', 'id': 5}
    app2 = {'name': 'Multi App', 'id': 6}
    dummy_db.set_apps([app1, app2])
    result = await service.get_app_by_name('multiapp')

@pytest.mark.asyncio
async def test_get_app_by_name_concurrent_requests(service, dummy_db):
    """Test concurrent execution of get_app_by_name with different names."""
    apps = [
        {'name': 'Alpha', 'id': 7},
        {'name': 'Bravo', 'id': 8},
        {'name': 'Charlie', 'id': 9},
    ]
    dummy_db.set_apps(apps)
    names = ['Alpha', 'Bravo', 'Charlie', 'Unknown']
    # Run concurrent queries
    results = await asyncio.gather(
        *(service.get_app_by_name(n) for n in names)
    )

@pytest.mark.asyncio
async def test_get_app_by_name_exception_handling(service, dummy_db):
    """Test that the function handles exceptions gracefully and returns None."""

    class FailingDB:
        class aql:
            @staticmethod
            def execute(query, bind_vars):
                raise RuntimeError("Database error!")

    service.db = FailingDB()
    result = await service.get_app_by_name('AnyApp')

@pytest.mark.asyncio
async def test_get_app_by_name_with_transaction(service, dummy_db):
    """Test that the function uses the transaction db if provided."""
    app = {'name': 'TxApp', 'id': 10}
    tx_db = DummyDB([app])
    tx_db.set_apps([app])
    result = await service.get_app_by_name('TxApp', transaction=tx_db)

# --- End: Edge Test Cases ---
# --- Begin: Large Scale Test Cases ---

@pytest.mark.asyncio
async def test_get_app_by_name_large_scale(service, dummy_db):
    """Test function performance and correctness with a large number of apps."""
    # Create 500 apps with unique names
    apps = [{'name': f'App{i}', 'id': i} for i in range(500)]
    dummy_db.set_apps(apps)
    # Pick some random names to query
    names_to_query = ['App0', 'App199', 'App499', 'NonExistent']
    results = await asyncio.gather(
        *(service.get_app_by_name(n) for n in names_to_query)
    )

@pytest.mark.asyncio
async def test_get_app_by_name_large_scale_spaces_and_case(service, dummy_db):
    """Test large scale with names having spaces and mixed case."""
    apps = [{'name': f'Big App {i}', 'id': i} for i in range(300)]
    dummy_db.set_apps(apps)
    # Query with different spacings and cases
    queries = [f'bigapp{i}' for i in range(0, 300, 50)] + [f'BIG APP {i}' for i in range(0, 300, 50)]
    expected = [apps[i] for i in range(0, 300, 50)] * 2
    results = await asyncio.gather(
        *(service.get_app_by_name(q) for q in queries)
    )
    # Each should match the corresponding app
    for r, e in zip(results, expected):
        pass

# --- End: Large Scale Test Cases ---
# --- Begin: Throughput Test Cases ---

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_small_load(service, dummy_db):
    """Test throughput under small load (10 concurrent requests)."""
    apps = [{'name': f'SmallApp{i}', 'id': i} for i in range(10)]
    dummy_db.set_apps(apps)
    results = await asyncio.gather(
        *(service.get_app_by_name(f'SmallApp{i}') for i in range(10))
    )
    for i, result in enumerate(results):
        pass

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_medium_load(service, dummy_db):
    """Test throughput under medium load (100 concurrent requests)."""
    apps = [{'name': f'MediumApp{i}', 'id': i} for i in range(100)]
    dummy_db.set_apps(apps)
    results = await asyncio.gather(
        *(service.get_app_by_name(f'MediumApp{i}') for i in range(100))
    )
    for i, result in enumerate(results):
        pass

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_mixed_load(service, dummy_db):
    """Test throughput with a mix of found and not found apps."""
    apps = [{'name': f'MixedApp{i}', 'id': i} for i in range(50)]
    dummy_db.set_apps(apps)
    queries = [f'MixedApp{i}' for i in range(50)] + [f'UnknownApp{i}' for i in range(10)]
    results = await asyncio.gather(
        *(service.get_app_by_name(q) for q in queries)
    )
    for i in range(50):
        pass
    for i in range(10):
        pass

@pytest.mark.asyncio
async def test_get_app_by_name_throughput_high_load(service, dummy_db):
    """Test throughput under high load (500 concurrent requests)."""
    apps = [{'name': f'HighApp{i}', 'id': i} for i in range(500)]
    dummy_db.set_apps(apps)
    results = await asyncio.gather(
        *(service.get_app_by_name(f'HighApp{i}') for i in range(500))
    )
    for i, result in enumerate(results):
        pass
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
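The fixtures in this second listing call dummy_db.set_apps(...), and the service is expected to call db.aql.execute(...), but the stub classes are shown without those methods. Below is a minimal sketch of a stub that would satisfy both. The class names (`SketchCursor`, `SketchAQL`, `SketchDB`) are hypothetical, and the bind-variable key is an assumption: the stub accepts either "lookup_name" or "name" because the listing does not show which one the service sends.

```python
from typing import Dict, Iterator, List


class SketchCursor:
    """Iterator over a list of result documents."""

    def __init__(self, results: List[Dict]):
        self._iter: Iterator[Dict] = iter(results)

    def __iter__(self):
        return self

    def __next__(self) -> Dict:
        return next(self._iter)


class SketchAQL:
    """Hypothetical stand-in for db.aql; filters in Python instead of running AQL."""

    def __init__(self, owner: "SketchDB"):
        self._owner = owner

    def execute(self, query: str, bind_vars: Dict) -> SketchCursor:
        # Assumption: the lookup value arrives as "lookup_name" or "name".
        raw = bind_vars.get("lookup_name", bind_vars.get("name", ""))
        wanted = raw.replace(" ", "").lower()
        matches = [
            app for app in self._owner.apps
            if app.get("name", "").replace(" ", "").lower() == wanted
        ]
        return SketchCursor(matches[:1])  # first match only


class SketchDB:
    """Hypothetical database stub exposing set_apps and aql.execute."""

    def __init__(self, apps: List[Dict]):
        self.apps = list(apps)
        self.aql = SketchAQL(self)

    def set_apps(self, apps: List[Dict]) -> None:
        self.apps = list(apps)


db = SketchDB([])
db.set_apps([{"name": "Space App", "id": 3}])
cursor = db.aql.execute("<query text ignored by the stub>", {"lookup_name": "spaceapp"})
print(next(cursor, None))  # prints: {'name': 'Space App', 'id': 3}
```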
To edit these changes, git checkout codeflash/optimize-BaseArangoService.get_app_by_name-mhxsypnk and push.