High memory consumption while optimizing using InfluxDB data feed

I'm using InfluxDB to store historical data, so the InfluxDB data feed is used for backtesting and optimizing the strategy.

Optimizing the strategy on a ~10-year, 5-minute data set with a few parameter ranges (90 iterations in total) ran my dev machine (12 cores, 12 GB of memory) out of memory.

Here are the cerebro flags I was using:

```python
import backtrader as bt

# args comes from the script's command-line parsing (not shown here)
cerebro = bt.Cerebro(maxcpus=args.maxcpus,
                     live=False,
                     runonce=True,
                     exactbars=False,
                     optdatas=True,
                     optreturn=True,
                     stdstats=False,
                     quicknotify=True)
```

Analysis:

After a little debugging, the problem appears to be in the InfluxDB data feed implementation, which lacks proper support for the preload function.

In the current implementation, the data is loaded from the Influx database in the InfluxDB.start method, and the result-set is kept in memory for the lifetime of the InfluxDB instance. Even if cerebro preloads all the data, the result-set (which is no longer needed in that case) stays in memory.
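To illustrate the idea, here is a minimal sketch of a feed that drops its raw result-set once the bars have been preloaded. It is not backtrader's actual feed API: `SketchFeed`, `fetch`, `result` and `bars` are hypothetical names, and the query is replaced by a plain callable.

```python
class SketchFeed:
    """Hypothetical stand-in for a data feed (not the backtrader API)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable returning the raw rows (assumed)
        self.result = None    # raw result-set from the database
        self.bars = []        # preloaded bars that the strategy reads

    def start(self):
        # Mirrors the behaviour described above: query once, keep the rows.
        self.result = list(self._fetch())

    def preload(self):
        # Copy the rows into the feed's own storage ...
        self.bars = [(row["time"], row["close"]) for row in self.result]
        # ... then release the result-set, which is no longer needed.
        self.result = None


feed = SketchFeed(lambda: ({"time": i, "close": 100.0 + i} for i in range(5)))
feed.start()
feed.preload()
```

After `preload()`, only `bars` holds the data, so a forked or pickled copy of the feed no longer carries a second copy of it.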

This is problematic when running an optimization, where multiprocessing.Pool and Pool.imap are used to run the strategy with all its parameter permutations concurrently.

With multiprocessing.Pool (at least with the default start method on Linux), the main process is simply forked for each worker process, and each worker inherits the main process's memory, including the memory allocated for the result-set of the InfluxDB data feed. In addition, for each run of the strategy, the cerebro instance is serialized (pickled) and passed to the worker process. This again includes the memory of the InfluxDB data feed, since the cerebro instance references it directly. All of this unnecessarily increases memory pressure during optimization.
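The pickling cost can be sketched with plain Python objects. The classes below are hypothetical stand-ins (not backtrader or InfluxDB code); they only show that an object graph which still references the raw result-set serializes to a larger payload than one that leaves it out.

```python
import pickle


class FatFeed:
    """Hypothetical feed that keeps the raw result-set next to the bars."""

    def __init__(self, n):
        self.bars = [(i, 1.0) for i in range(n)]
        self.result = [{"time": i, "close": 1.0} for i in range(n)]


class LeanFeed(FatFeed):
    """Same data, but the result-set is excluded from the pickled state."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state["result"] = None  # do not ship the raw rows to the workers
        return state


fat_size = len(pickle.dumps(FatFeed(10_000)))
lean_size = len(pickle.dumps(LeanFeed(10_000)))
restored = pickle.loads(pickle.dumps(LeanFeed(10)))
print(fat_size, lean_size)
```

The same payload is sent once per strategy run, so with 90 iterations any extra bytes in the serialized feed are paid 90 times over.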
