better read/write interface? #22

@stanleyjs

Hello,

We are running Sanity in a pipeline that is primarily implemented in Python. Our datasets can be quite large, and our performance is being crippled by Sanity's I/O interface. As I understand it, Sanity expects a Matrix Market format file as input and writes its output as a CSV.

Our data is already stored in memory in a Python parent process, and we launch Sanity as a subprocess.
Is there a faster way to send data to Sanity and receive results from it? Right now we are stuck waiting for Sanity to write an enormous CSV file, and then we have to read that file back into memory in the parent process.

The most obvious solution to me is to write a Python-to-C interface for Sanity using ctypes (CDLL). Do you have any plans for this, or any tips for speeding up the interface with Sanity without hitting the disk so much?
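
For context, here is a minimal sketch of the kind of stopgap I have in mind short of a ctypes binding: keep the subprocess, but stage the files in a RAM-backed directory so nothing touches the physical disk. Everything here is an assumption for illustration: the helper names (`ram_scratch_dir`, `write_matrix_market`, `run_via_scratch`) are hypothetical, `build_cmd` is a caller-supplied function that returns the actual command line (I am deliberately not guessing at Sanity's flags), and the output file name and delimiter have to be passed in. It also assumes a Linux host where `/dev/shm` is a tmpfs mount, and falls back to the default temp directory otherwise.

```python
import csv
import os
import subprocess
import tempfile


def ram_scratch_dir():
    """Return a RAM-backed directory if available (Linux tmpfs), else None."""
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return None  # tempfile then falls back to its default location


def write_matrix_market(path, rows, n_cols):
    """Write a dense list of rows as a Matrix Market coordinate file."""
    # Matrix Market indices are 1-based; only non-zero entries are stored.
    entries = [
        (i + 1, j + 1, v)
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
        if v != 0
    ]
    with open(path, "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate real general\n")
        fh.write(f"{len(rows)} {n_cols} {len(entries)}\n")
        for i, j, v in entries:
            fh.write(f"{i} {j} {v}\n")


def run_via_scratch(rows, n_cols, build_cmd, output_name, delimiter=","):
    """Stage input in RAM, run an external command, parse its numeric output.

    build_cmd(input_path, output_dir) must return the argv list to execute.
    output_name is the file the command is expected to create in output_dir.
    """
    with tempfile.TemporaryDirectory(dir=ram_scratch_dir()) as tmp:
        in_path = os.path.join(tmp, "input.mtx")
        write_matrix_market(in_path, rows, n_cols)
        subprocess.run(build_cmd(in_path, tmp), check=True)
        out_path = os.path.join(tmp, output_name)
        with open(out_path, newline="") as fh:
            reader = csv.reader(fh, delimiter=delimiter)
            return [[float(x) for x in record] for record in reader]
```

This still pays the cost of formatting and parsing text on both sides, so it only removes the disk from the picture; a real in-process interface would avoid the serialization as well.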
