@solidiquis solidiquis commented Oct 6, 2025

This PR introduces the initial bits of a CLI to improve programmatic workflows. Currently it supports two primary sub-commands:

  • Credentials management
  • CSV and Parquet imports

Follow-up work after this PR is planned to add support for:

  • Parquet export
  • CSV export
  • MCAP import
  • etc.

The initial v0.1.0 release is to be discussed.

CLI to streamline programmatic workflows with Sift's API

Usage: sift_cli [OPTIONS] <COMMAND>

Commands:
  config  Manage Sift CLI configuration
  import  Import time series files into Sift
  help    Print this message or the help of the given subcommand(s)

Options:
      --profile <PROFILE>  The profile to use
      --disable-tls        Disable TLS for non-cloud Sift environments
  -h, --help               Print help
  -V, --version            Print version

Config Management

Manage Sift CLI configuration

Usage: sift_cli config [OPTIONS] <COMMAND>

Commands:
  show    Display the contents of the current config file
  where   Show the path to the current config file
  create  Create a new config file (fails if one already exists)
  update  Update fields in the existing config file
  help    Print this message or the help of the given subcommand(s)

Options:
      --profile <PROFILE>  The profile to use
      --disable-tls        Disable TLS for non-cloud Sift environments
  -h, --help               Print help

CSV Import

Import a CSV file into Sift. Unless manually specified all columns are inferred to type string or double

Usage: sift_cli import csv [OPTIONS] --asset <ASSET> <PATH>

Arguments:
  <PATH>  Path to the CSV file to import

Options:
  -a, --asset <ASSET>                        Name of the asset this data belongs to
  -r, --run <RUN>                            Optional run name to associate with this import
      --header-row <HEADER_ROW>              Row number containing column headers (1-based) [default: 1]
      --profile <PROFILE>                    The profile to use
      --disable-tls                          Disable TLS for non-cloud Sift environments
      --first-data-row <FIRST_DATA_ROW>      Row number where data starts (1-based) [default: 2]
  -c, --channel-column <CHANNEL_COLUMN>      1-based column indices to override; can appear multiple times
  -d, --data-type <DATA_TYPE>                Data type for each channel in `--channel-column`. Use `"infer"` to have the program infer the data type which
                                             is useful when wanting to just specify `--unit` and/or `--description` [possible values: infer, double,
                                             string, enum, bit-field, bool, float, int32, uint32, int64, uint64, bytes]
  -u, --unit <UNIT>                          Unit for each channel in `--channel-column` (can be empty)
  -n, --description <DESCRIPTION>            Description for each channel in `--channel-column` (can be empty)
  -e, --enum-config <ENUM_CONFIG>            Enum configuration pairs `<key,name>` (e.g. `"0,start|1,stop"`) for enum-type channels
  -b, --bit-field-config <BIT_FIELD_CONFIG>  Bit-field configuration triplets `<name,index,length>` (e.g. `"12v,0,4|led,4,4"`)
  -t, --time-column <TIME_COLUMN>            1-based index of the time column [default: 1]
  -f, --time-format <TIME_FORMAT>            Time format used in the file [default: absolute-rfc3339] [possible values: absolute-rfc3339,
                                             absolute-datetime, absolute-unix-seconds, absolute-unix-milliseconds, absolute-unix-microseconds,
                                             absolute-unix-nanoseconds, relative-nanoseconds, relative-microseconds, relative-milliseconds,
                                             relative-seconds, relative-minutes, relative-hours]
  -s <RELATIVE_START_TIME>                   Start time (RFC3339) to use if time format is relative
  -w, --wait                                 Wait until the import finishes processing
  -p, --preview                              Preview the parsed schema without uploading
  -h, --help                                 Print help (see more with '--help')
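
The help text above says that `--data-type`, `--unit`, and `--description` each apply "for each channel in `--channel-column`", which suggests the repeated flags are matched by position. A minimal sketch of that pairing follows. `ColumnOverride` and `build_overrides` are hypothetical names, and the fallback to `infer` and to empty strings for missing trailing values is an assumption, not confirmed behavior of `sift_cli`.

```rust
/// Hypothetical per-column override assembled from the repeated flags.
#[derive(Debug, PartialEq)]
struct ColumnOverride {
    /// 0-based index into a parsed CSV row (the CLI flag itself is 1-based).
    column_index: usize,
    data_type: String,
    unit: String,
    description: String,
}

/// Pair the n-th `--channel-column` with the n-th `--data-type`, `--unit`,
/// and `--description`. Assumption: missing trailing values fall back to
/// "infer" or an empty string rather than being rejected.
fn build_overrides(
    channel_columns: &[usize],
    data_types: &[String],
    units: &[String],
    descriptions: &[String],
) -> Result<Vec<ColumnOverride>, String> {
    // A per-channel flag may not appear more often than `--channel-column`.
    for (flag, len) in [
        ("--data-type", data_types.len()),
        ("--unit", units.len()),
        ("--description", descriptions.len()),
    ] {
        if len > channel_columns.len() {
            return Err(format!(
                "{} given {} times but --channel-column only {} times",
                flag,
                len,
                channel_columns.len()
            ));
        }
    }

    let mut overrides = Vec::with_capacity(channel_columns.len());
    for (i, &col) in channel_columns.iter().enumerate() {
        if col == 0 {
            return Err("column indices are 1-based; got 0".to_string());
        }
        overrides.push(ColumnOverride {
            column_index: col - 1,
            data_type: data_types
                .get(i)
                .cloned()
                .unwrap_or_else(|| "infer".to_string()),
            unit: units.get(i).cloned().unwrap_or_default(),
            description: descriptions.get(i).cloned().unwrap_or_default(),
        });
    }
    Ok(overrides)
}
```

Under these assumptions, `-c 2 -c 3 -d double -u V` would yield an override for column 2 typed `double` with unit `V`, and an override for column 3 left at `infer` with no unit.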

Parquet Imports

A parquet file where every column is exclusive to a single channel except for the time column

Usage: sift_cli import parquet flat-dataset [OPTIONS] --asset <ASSET> <PATH>

Arguments:
  <PATH>  Path to the Parquet file to import

Options:
  -a, --asset <ASSET>                            Name of the asset this data belongs to
  -r, --run <RUN>                                Optional run name to associate with this import
  -c, --channel-path <CHANNEL_PATH>              Paths of data columns to import; can be specified multiple times
      --profile <PROFILE>                        The profile to use
  -d, --data-type <DATA_TYPE>                    Data type for each channel in `--channel-path`. Use `"infer"` to have the program infer the data type
                                                 which is useful when wanting to just specify `--unit` and/or `--description` [possible values: infer,
                                                 double, string, enum, bit-field, bool, float, int32, uint32, int64, uint64, bytes]
      --disable-tls                              Disable TLS for non-cloud Sift environments
  -u, --unit <UNIT>                              Unit for each channel in `--channel-path` (can be empty)
  -n, --description <DESCRIPTION>                Description for each channel in `--channel-path` (can be empty)
  -e, --enum-config <ENUM_CONFIG>                Enum configuration pairs `<key,name>` for enum-type channels
  -b, --bit-field-config <BIT_FIELD_CONFIG>      Bit-field configuration triplets `<index,name,bit_count>` for bit-field channels
  -t, --time-path <TIME_PATH>                    Path to the time column [default: timestamp]
  -f, --time-format <TIME_FORMAT>                Time format used in the file [default: absolute-rfc3339] [possible values: absolute-rfc3339,
                                                 absolute-datetime, absolute-unix-seconds, absolute-unix-milliseconds, absolute-unix-microseconds,
                                                 absolute-unix-nanoseconds, relative-nanoseconds, relative-microseconds, relative-milliseconds,
                                                 relative-seconds, relative-minutes, relative-hours]
  -s <RELATIVE_START_TIME>                       Start time (RFC3339) to use if time format is relative
  -m, --complex-types-mode <COMPLEX_TYPES_MODE>  Strategy for handling complex types (maps, lists, structs) [default: ignore] [possible values: ignore,
                                                 both, string, bytes]
  -w, --wait                                     Wait until the import finishes processing
  -p, --preview                                  Preview the parsed schema without uploading
  -h, --help                                     Print help (see more with '--help')
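
The `-s <RELATIVE_START_TIME>` option implies that, for the `relative-*` time formats, each value in the time column is an offset from a start instant. The arithmetic can be sketched as below. `RelativeUnit` and `to_absolute_nanos` are hypothetical names, and the sketch assumes the RFC 3339 start time has already been parsed into nanoseconds since the Unix epoch. None of this is taken from the PR's implementation.

```rust
/// Hypothetical mirror of the `relative-*` values accepted by `--time-format`.
#[derive(Debug, Clone, Copy)]
enum RelativeUnit {
    Nanoseconds,
    Microseconds,
    Milliseconds,
    Seconds,
    Minutes,
    Hours,
}

impl RelativeUnit {
    /// Number of nanoseconds in one unit.
    fn nanos_per_unit(self) -> f64 {
        match self {
            RelativeUnit::Nanoseconds => 1.0,
            RelativeUnit::Microseconds => 1.0e3,
            RelativeUnit::Milliseconds => 1.0e6,
            RelativeUnit::Seconds => 1.0e9,
            RelativeUnit::Minutes => 6.0e10,
            RelativeUnit::Hours => 3.6e12,
        }
    }
}

/// Convert a relative offset into an absolute timestamp (Unix nanoseconds).
/// `start_unix_nanos` is assumed to come from the already-parsed start time.
fn to_absolute_nanos(start_unix_nanos: i64, offset: f64, unit: RelativeUnit) -> i64 {
    start_unix_nanos + (offset * unit.nanos_per_unit()).round() as i64
}
```

Rounding to the nearest nanosecond is a design choice of this sketch; a real importer might instead keep integer offsets to avoid floating-point error on very long recordings.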

indicatif = "0.18.0"
pbjson-types = { workspace = true }
reqwest = "0.12.23"
sift_rs = { version = "0.6.0", path = "../sift_rs" }
Contributor

Should the path here be removed before merging?

Collaborator Author

Good catch, thank you. This now references the workspace dependency.

#[arg(short = 'n', long)]
pub description: Vec<String>,

/// <name,key> repeated pairs e.g. "0,start,1,stop". Corresponds to the order in
Contributor

Should another delimiter be used to separate the key from the value?

Collaborator Author

Yeah, I agree this was awkward. Updated to the following:

  -e, --enum-config <ENUM_CONFIG>            Enum configuration pairs `<key,name>` (e.g. `"0,start|1,stop"`) for enum-type channels
  -b, --bit-field-config <BIT_FIELD_CONFIG>  Bit-field configuration triplets `<name,index,length>` (e.g. `"12v,0,4|led,4,4"`)
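
To illustrate the updated delimiters (`|` between entries, `,` within an entry), here is a minimal parsing sketch for the two example strings above. The function names, the `BitFieldElement` struct, and the choice of `u32` for keys, indices, and lengths are assumptions made for illustration, not the PR's actual code.

```rust
/// Parse enum pairs such as "0,start|1,stop" into (key, name) tuples.
fn parse_enum_config(s: &str) -> Result<Vec<(u32, String)>, String> {
    let mut pairs = Vec::new();
    for pair in s.split('|') {
        // Split on the first comma only: the left side is the key, the rest is the name.
        let (key, name) = pair
            .split_once(',')
            .ok_or_else(|| format!("expected `key,name`, got `{}`", pair))?;
        let key = key
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("invalid enum key `{}`: {}", key, e))?;
        pairs.push((key, name.trim().to_string()));
    }
    Ok(pairs)
}

/// Hypothetical representation of one bit-field element.
#[derive(Debug, PartialEq)]
struct BitFieldElement {
    name: String,
    index: u32,
    length: u32,
}

/// Parse bit-field triplets such as "12v,0,4|led,4,4" in `<name,index,length>` order.
fn parse_bit_field_config(s: &str) -> Result<Vec<BitFieldElement>, String> {
    let mut elements = Vec::new();
    for triplet in s.split('|') {
        let parts: Vec<&str> = triplet.split(',').map(str::trim).collect();
        match parts.as_slice() {
            [name, index, length] => elements.push(BitFieldElement {
                name: name.to_string(),
                index: index
                    .parse()
                    .map_err(|e| format!("invalid index `{}`: {}", index, e))?,
                length: length
                    .parse()
                    .map_err(|e| format!("invalid length `{}`: {}", length, e))?,
            }),
            _ => return Err(format!("expected `name,index,length`, got `{}`", triplet)),
        }
    }
    Ok(elements)
}
```

Using a separate delimiter between entries means a malformed entry such as `led,4` is rejected on its own, instead of silently shifting every value after it as could happen with a flat comma-separated list.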


/// Column-type corresponding to ordered positioning of --channel-column
#[arg(short, long)]
pub data_type: Vec<DataType>,
Contributor

Instead of having to redefine the enum and then convert to ChannelDataType, I think there is a way to use ChannelDataType directly as an arg here even though it is defined in another crate. Similar comment for TimeFormat.

Collaborator Author

I have not been able to figure this out in a way that is less work than what I'm doing here. We can come back to this unless you know how to do it.
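
One possible direction for the question above, sketched under assumptions: clap's derive API accepts a plain function through `value_parser`, so a type from another crate can be parsed without a mirrored enum or a local trait implementation. The `ChannelDataType` below is a local stand-in with a hypothetical subset of variants, not the real type, and the clap attribute is shown only as a comment so the sketch stays dependency-free.

```rust
/// Local stand-in for the enum defined in another crate (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChannelDataType {
    Double,
    String,
    Bool,
    Int32,
}

/// Map a CLI string onto the foreign enum. Because this is a free function,
/// it does not run into the orphan rule the way implementing a foreign
/// trait for a foreign type would.
fn parse_channel_data_type(s: &str) -> Result<ChannelDataType, String> {
    match s {
        "double" => Ok(ChannelDataType::Double),
        "string" => Ok(ChannelDataType::String),
        "bool" => Ok(ChannelDataType::Bool),
        "int32" => Ok(ChannelDataType::Int32),
        other => Err(format!("unsupported data type `{}`", other)),
    }
}

// Assumed wiring with clap's derive API (not compiled as part of this sketch):
//
//     #[arg(short, long, value_parser = parse_channel_data_type)]
//     pub data_type: Vec<ChannelDataType>,
```

A likely trade-off is that a function-based parser does not generate the `[possible values: ...]` list in `--help` the way a `ValueEnum` derive does, so the mirrored enum in the PR may still be the simpler option.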

@solidiquis solidiquis marked this pull request as draft October 10, 2025 10:34