High memory consumption while optimizing using InfluxDB data feed #10

@vladisld

I'm using InfluxDB to store historical data. Naturally, the InfluxDB data feed is used for backtesting and optimizing the strategy.

Trying to optimize the strategy on a ~10-year, 5-minute data set with a few parameter ranges (resulting in 90 iterations) ran my dev machine (12 cores, 12 GB of memory) out of memory.

Here are the cerebro flags I was using:

    cerebro = bt.Cerebro(maxcpus=args.maxcpus,
                         live=False,
                         runonce=True,
                         exactbars=False,
                         optdatas=True,
                         optreturn=True,
                         stdstats=False,
                         quicknotify=True)

Analysis:

After a bit of debugging, the problem appears to be in the InfluxDB data feed implementation, which lacks proper support for the preload function.

In the current InfluxDB implementation, the data from the Influx database is loaded during the InfluxDB.start method, and the result set is kept in memory for the lifetime of the InfluxDB instance. Even if cerebro preloads all the data, the result set (which is no longer needed in that case) remains in memory.
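
To illustrate the idea of releasing the result set once preloading is done, here is a minimal, self-contained sketch. `ToyFeed`, its `query` argument and its attribute names are hypothetical and only mimic the start/preload flow described above; this is not the actual backtrader or InfluxDB feed code.

```python
from array import array


class ToyFeed:
    """Hypothetical stand-in for a data feed (not backtrader's API)."""

    def __init__(self, query):
        self._query = query        # callable returning rows; stands in for the DB query
        self._result_set = None    # raw query result, filled by start()
        self._iter = None
        self.closes = array("d")   # compact buffer filled by preload()

    def start(self):
        # Load the whole query result up front, as described above.
        self._result_set = list(self._query())
        self._iter = iter(self._result_set)

    def _load(self):
        # Copy one bar from the result set into the preloaded buffer.
        try:
            bar = next(self._iter)
        except StopIteration:
            return False
        self.closes.append(bar["close"])
        return True

    def preload(self):
        while self._load():
            pass
        # All bars now live in self.closes, so the raw result set
        # (and the iterator that references it) can be released.
        self._result_set = None
        self._iter = None
```

After `start()` followed by `preload()`, only the compact buffer remains referenced by the feed object, so the raw rows become eligible for garbage collection.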

This is problematic when running an optimization, where multiprocessing.Pool and Pool.imap are used to run the strategy with all its parameter permutations concurrently.

With the fork start method (the default on Linux, at least), multiprocessing.Pool simply forks the main process for each worker process, so every worker inherits the main process's memory, including the memory allocated for the InfluxDB data feed's result set. In addition, for each run of the strategy, the cerebro instance is serialized (pickled) and passed to the worker process. This again includes the memory of the InfluxDB data feed, since the feed is directly referenced by the cerebro instance. Both effects unnecessarily increase memory pressure during the optimization process.
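
The serialization cost can be shown with a small stand-alone sketch. The dictionary below is a hypothetical stand-in for a feed's attributes after preloading (the names `closes` and `result_set` are assumptions, not backtrader's); it compares the pickled size with and without the leftover result set.

```python
import pickle


def make_feed_state(n_bars):
    """Toy stand-in for a feed's attributes after start() and preload()."""
    return {
        # Preloaded values that the strategy actually needs.
        "closes": [float(i) for i in range(n_bars)],
        # Raw query result that is no longer needed after preloading.
        "result_set": [{"time": i, "close": float(i)} for i in range(n_bars)],
    }


def state_for_worker(state):
    """Return a shallow copy of the state without the raw result set."""
    slim = dict(state)
    slim["result_set"] = None
    return slim


def pickled_size(obj):
    """Number of bytes produced when pickling obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    state = make_feed_state(100_000)
    print("with result set:   ", pickled_size(state))
    print("without result set:", pickled_size(state_for_worker(state)))
```

In a real class, the same trimming could be done in a `__getstate__` method, which pickle calls to obtain the state to serialize. Whether that is the right place for it in the InfluxDB feed is a design decision this sketch does not settle.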

Labels: bug, enhancement
