
Performance idea: generator for object-style representation #432

@nsheff


In the past we've raised some issues about peppy performance (see #388, #387). Peppy is fine for small projects (hundreds or even thousands of sample rows), but it gets slow when we are dealing with huge projects, like tens to hundreds of thousands of samples.

It would be nice if peppy could handle these very large projects.

One of the problems is that peppy stores sample information in two forms: as a table (a pandas DataFrame) and as a list of Sample objects. This duplicates the information in memory.

One idea for improving performance would be to switch to a single in-memory model. But we really want to be able to access the metadata in both ways for different use cases... so what about using the pandas DataFrame as the main data structure, and then providing some kind of generator that goes through it and creates objects on the fly, in case someone wants the list-based approach?

This could be one way to increase performance.
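
To make the idea concrete, here is a rough sketch of what such a generator might look like. This is not peppy's actual API: `iter_samples` and the column names are hypothetical, and `SimpleNamespace` is only a stand-in for the real Sample class, which does more than hold attributes.

```python
from types import SimpleNamespace
from typing import Iterator

import pandas as pd


def iter_samples(sample_table: pd.DataFrame) -> Iterator[SimpleNamespace]:
    """Lazily yield one attribute-style object per row of the sample table.

    Hypothetical helper: SimpleNamespace stands in for peppy's Sample class.
    """
    columns = list(sample_table.columns)
    # itertuples(name=None) yields plain tuples, so no per-row Series is built.
    for row in sample_table.itertuples(index=False, name=None):
        yield SimpleNamespace(**dict(zip(columns, row)))


# Toy table; the column names are made up for illustration.
sample_table = pd.DataFrame(
    {
        "sample_name": ["s1", "s2", "s3"],
        "protocol": ["ATAC", "RNA", "ATAC"],
    }
)

# Object-style access without ever holding a full list of objects in memory.
atac_names = [s.sample_name for s in iter_samples(sample_table) if s.protocol == "ATAC"]
```

With something like this, the DataFrame stays the only stored copy, and only one sample object exists at a time during iteration. Anyone who really needs the list could still call `list(iter_samples(sample_table))` and pay the memory cost explicitly.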
