Skip to content

Show config options #346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from
Closed

Show config options #346

wants to merge 16 commits into from

Conversation

Dodecahedr0x
Copy link
Contributor

@Dodecahedr0x Dodecahedr0x commented Apr 15, 2025

Adds a CLI to start the validator. It has a -h command to display configuration and default values. The original interface is preserved to avoid breaking integrations.

CLOSES: #318

Greptile Summary

Major refactoring to add CLI configuration support using clap, maintaining compatibility with existing config.toml and environment variables. However, maintainers recommend pivoting to a simpler approach.

  • Added new CLI modules in test-bins/src/cli/ with structured argument parsing for accounts, metrics, RPC, and validator settings
  • Fixed typo in environment variable name from VALIDATOR_FDQN to VALIDATOR_FQDN in magicblock-config/src/lib.rs
  • Reduced CHANNEL_CAPACITY_DEFAULT from 250,000 to 1,024 in magicblock-geyser-plugin/src/config.rs
  • ⚠️ Current implementation introduces maintenance burden by duplicating configuration across CLI args, env vars, and TOML
  • ⚠️ Maintainers suggest generating args/env vars from a single source instead of the current approach

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

#[arg(
long,
value_name = "COMPONENTS",
default_value = "(accounts,transactions)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: default_value is set but the type is Option<String> - this will always contain a value and never be None

Comment on lines 106 to 117
// Set environment variables from CLI arguments
if let Some(keypair) = cli.keypair {
std::env::set_var("VALIDATOR_KEYPAIR", keypair);
}

if let Some(disable_geyser) = cli.disable_geyser {
std::env::set_var("GEYSER_DISABLE", disable_geyser);
}

if let Some(disable_cache) = cli.disable_geyser_cache {
std::env::set_var("GEYSER_CACHE_DISABLE", disable_cache);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: setting env vars after parsing them from CLI args could cause issues with override_from_envs() later picking up these modified values unexpectedly

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

2 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

Copy link
Collaborator

@thlorenz thlorenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the PR.
I'm not sure that in the long run this makes sense since we're now starting to duplicate some configs that we can either provide via the config file or env vars.

I thought we agreed at some point to find a way to not have to repeat ourselves, but to generate args/env vars + config from one common source so we can maintain that in one place.
I realize that the issue doesn't explain that part at all, but I don't think it's a good idea to add an args implementation that we'll replace most likely fairly soon.

If you want a stop gap for now I think it'd be more useful to create a heavily documented config toml which also mentions the related env vars.
You could then also print that when the user passes -h.

And for that one arg (-h) we don't need to pull in clap, but can manually parse the args for that.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines 26 to 27
/// Path to the configuration file
config: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: config parameter needs a long flag (--config) and better help text for discoverability

Comment on lines 168 to 169
Keypair::from_base58_string(&keypair)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: from_base58_string() can panic - needs error handling for invalid base58 strings

Suggested change
Keypair::from_base58_string(&keypair)
} else {
Keypair::from_base58_string(&keypair).unwrap_or_else(|_| {
panic!("Invalid base58 string provided for validator keypair");
})
} else {

#[command(about = "Runs a MagicBlock validator node")]
struct Cli {
/// Path to the configuration file
config: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clap shall support PathBuf. Also I would propose to consider config_path.

Copy link
Contributor

@bmuddha bmuddha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good 👍🏽

@Dodecahedr0x Dodecahedr0x force-pushed the feat/show-config-options branch 2 times, most recently from 346c8b1 to adec389 Compare June 2, 2025 14:42
@Dodecahedr0x
Copy link
Contributor Author

I kept the argument parsing separate from the actual config to 1) prevent including clap/structop in the config crate 2) reduce changes to the config that could have downstream impacts 3) keep the logic for deserializing the config file separate from the overriding logic.

@Dodecahedr0x Dodecahedr0x changed the title feat(test-bins): show config options Show config options Jun 3, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 file(s) reviewed, 17 comment(s)
Edit PR Review Bot Settings | Greptile

@@ -9,14 +9,21 @@ edition.workspace = true
default-run = "rpc"

[dependencies]
clap = { version = "4.5.36", features = ["derive", "env"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using clap's workspace dependency pattern to match other dependencies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please do. No dep in any crate should define its own version for compatibility.
There are more cases in this PR which greptile pointed out and the same applies there.

Please follow the patterns we established in the codebase in all cases.

url = { workspace = true }

[dev-dependencies]
serial_test = { version = "3.2.0", features = ["logging"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using workspace dependency pattern for serial_test to maintain consistency

config.accounts.allowed_programs.extend(
self.allowed_programs.clone().into_iter().map(|id| {
AllowedProgram {
id: Pubkey::from_str_const(&id),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Using from_str_const() for program IDs is unsafe - should use regular from_str() and handle potential errors

Suggested change
id: Pubkey::from_str_const(&id),
id: Pubkey::from_str(&id).map_err(|e| format!("Invalid program ID {}: {}", id, e))?,

/// The payer init balance in lamports.
#[arg(
long = "accounts-payer-init-lamports",
env = "INIT_LAMPORTS",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Environment variable INIT_LAMPORTS should be prefixed with ACCOUNTS_ for consistency with other env vars

Suggested change
env = "INIT_LAMPORTS",
env = "ACCOUNTS_INIT_LAMPORTS",

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we need to be consistent, otherwise config in different ways will be very confusing.
Again if this was derived, the exact matching of option naming would come for free.

Comment on lines 82 to 85
config.accounts.lifecycle = self
.lifecycle
.unwrap_or(config.accounts.lifecycle.clone().into())
.into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Unnecessary clone() call on lifecycle - LifecycleMode likely implements Copy

Comment on lines +94 to +97
fn keypair_parser(s: &str) -> Result<KeypairWrapper, String> {
let keypair = Keypair::from_base58_string(s);
Ok(KeypairWrapper(keypair))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: No error handling for invalid base58 strings. Should return Err instead of panicking.

Suggested change
fn keypair_parser(s: &str) -> Result<KeypairWrapper, String> {
let keypair = Keypair::from_base58_string(s);
Ok(KeypairWrapper(keypair))
}
fn keypair_parser(s: &str) -> Result<KeypairWrapper, String> {
let keypair = Keypair::from_base58_string(s).map_err(|_| "Invalid base58 string for keypair".to_string())?;
Ok(KeypairWrapper(keypair))
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used for the config which is processed during startup then panicking maybe ok.

Comment on lines +50 to +51
help = "Specifies geyser components to disable. [default: (accounts,transactions)]"
)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Help text mentions default value but doesn't specify it in the #[arg] attribute using default_value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should be part of the official config. I remember adding those as dev tools, thus that we could turn on/off parts while debugging memory leaks and such.
They were supposed to be only used via env vars and never in prod, as otherwise the validator doesn't fully work.

Comment on lines 84 to 85
let mut api =
MagicValidator::try_from_config(config, validator_keypair).unwrap();
MagicValidator::try_from_config(config, cli.keypair.0).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: unwrap() call could panic - should handle errors gracefully here

Suggested change
let mut api =
MagicValidator::try_from_config(config, validator_keypair).unwrap();
MagicValidator::try_from_config(config, cli.keypair.0).unwrap();
let mut api = MagicValidator::try_from_config(config, cli.keypair.0)
.map_err(|e| {
error!("Failed to create validator: {}", e);
std::process::exit(1);
})?;

pub sigverify: Option<bool>,

/// Fully Qualified Domain Name. If specified it will also register ER on chain
#[arg(long = "validator-fdqn", env = "VALIDATOR_FDQN")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: FQDN is misspelled as 'fdqn' in both the field name and CLI argument. Should be 'fqdn'

Suggested change
#[arg(long = "validator-fdqn", env = "VALIDATOR_FDQN")]
#[arg(long = "validator-fqdn", env = "VALIDATOR_FQDN")]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already misspelled, but could be a good time to fix it

.millis_per_slot
.unwrap_or(config.validator.millis_per_slot),
sigverify: self.sigverify.unwrap_or(config.validator.sigverify),
fdqn: self.fdqn.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using clone() on an Option<String> can be replaced with .cloned()

Copy link
Contributor

@GabrielePicco GabrielePicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general! Eventually, I believe delegating the parsing and overrides to Clap would simplify the codebase, but this is already a solid improvement.

@Dodecahedr0x Dodecahedr0x force-pushed the feat/show-config-options branch from 09a36f5 to 50cec8d Compare June 5, 2025 08:06
@Dodecahedr0x
Copy link
Contributor Author

Rebased the pubsub redesign, fixed the typo, and fixed a few greptile comments

@thlorenz
Copy link
Collaborator

thlorenz commented Jun 7, 2025

There are still a lot of very valid greptile comments that are unaddressed.
I am currently reviewing this, but will not repeat greptiles suggestions, thus you could fix those in the meantime please.

Copy link
Collaborator

@thlorenz thlorenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I believe in order to get args as a feature, we are making the maintenance worse.
We currently have a maintenance problem as we are maintaining a config file + env vars.
However this PR doesn't improve on that issue and instead makes the maintenance problem greater.

I already pointed that out a few months back.

I believe we should take the time to properly derive args and env vars from a toml config file specification or similar. Or we could derive a config file from some clap definitions, whichever is easier.
It shouldn't be too hard to generate code either via macros or at build time from a toml file that specifies all our config options.

Unless this feature is super urgent I'd prefer to get it right the first time even if it takes more time todo instead of creating a lot of future maintenance/debugging etc. for us in the future.

Also there are tons of unaddressed greptile warnings which point out valid issues and I upvoted them in those cases.

I also discovered some naming issues (hard to catch and another indicator that repeating ourselves like this is not the right way to go).

@@ -9,14 +9,21 @@ edition.workspace = true
default-run = "rpc"

[dependencies]
clap = { version = "4.5.36", features = ["derive", "env"] }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please do. No dep in any crate should define its own version for compatibility.
There are more cases in this PR which greptile pointed out and the same applies there.

Please follow the patterns we established in the codebase in all cases.

/// Use a custom remote URL with WebSocket URL
#[arg(
long = "accounts-remote-custom-with-ws",
env = "ACCOUNTS_REMOTE_WS",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicting args name with env name (not sure what the config name is).
Env name doesn't have custom-with, args name does.

conflicts_with = "remote",
requires = "remote_custom"
)]
pub remote_custom_with_ws: Option<String>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add empty lines between each arg, this is very hard to read.

pub remote_custom_with_ws: Option<String>,
#[arg(long = "accounts-lifecycle", env = "ACCOUNTS_LIFECYCLE")]
pub lifecycle: Option<LifecycleModeArg>,
#[command(flatten)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally keep the commands separate from plain args for readability

) -> Result<(), String> {
if let Some(remote) = self.remote {
config.accounts.remote = match remote {
RemoteConfigArg::Devnet => RemoteConfig::Devnet,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we see the duplication of every single option which we will have to maintain in the future.
It'd be a lot better if we could derive one from the other.

}

#[test]
#[serial]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this has to run in series?
The test is isolated and doesn't affect the machine environment, nor does it startup a validator.
I believe we can run those in parallel which is faster and thus can also remove the dependency on the crate that runs tests in series.

At this point all tests that start up a validator are part of the test-integration and are instrumented to run in series already, thus if you have a test that requires starting up a validator then add it there.

Again just to test configuration that should not be needed.

}

#[test]
#[serial]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto can run in parallel

Comment on lines +19 to +22
config.rpc = RpcConfig {
addr: self.addr.unwrap_or(config.rpc.addr),
port: self.port.unwrap_or(config.rpc.port),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually a bad idea by greptile as after that change it would not fail to compile if we add future fields, but that's actually what we want to help with the maintenance overhead that we are creating with this PR.

pub millis_per_slot: Option<u64>,

/// Whether to verify transaction signatures.
#[arg(long = "validator-sigverify", env = "VALIDATOR_SIG_VERIFY")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg and env var naming doesn't match sigverify vs SIG_VERIFY (should be sig-verify).

Again those little mistakes will bite us in the future. So the PR itself is already showing what kind of maintenance nightmare we're creating for ourselves here.

9, 208, 183, 189, 108, 200, 89, 77, 168, 76, 233, 197, 132, 22, 21, 186,
202, 240, 105, 168, 157, 64, 233, 249, 100, 104, 210, 41, 83, 87,
];
mod cli;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly discourage supporting the validator keypair as a config entry (worst case) or as an arg.

In both cases it will be visible either in a file or in machine logs (args are logged) which is a huge attack vector as it encourages operators to just add them in plain text either to a file or in shell scripts as an arg. Secrets should always only be env vars.

I had good reason to only allow passing this as an env var which are not logged, nor are added to a file.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

49 file(s) reviewed, 3 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +22 to +24
reset: self.reset.unwrap_or(config.ledger.reset),
path: self.path.clone().or(config.ledger.path.clone()),
size: self.size.unwrap_or(config.ledger.size),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Clone is called on Option<String> before or() - more efficient to move and use or_else()

assert_eq!(config.ledger.path, Some(ledger_path.to_string()));
assert_eq!(config.ledger.size, ledger_size.parse::<u64>().unwrap());
assert_eq!(config.programs, vec![]);
assert_eq!(config.validator.fdqn, Some(validator_fqdn.to_string()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: There's a typo in the field name 'fdqn' - should be 'fqdn' (Fully Qualified Domain Name)

.millis_per_slot
.unwrap_or(config.validator.millis_per_slot),
sigverify: self.sigverify.unwrap_or(config.validator.sigverify),
fdqn: self.fdqn.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Unlike other fields, fdqn directly uses clone() without unwrap_or(), breaking the pattern of preserving existing config values when CLI arg is None

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +216 to +217
#[arg(long = "accounts-db-size")]
pub db_size: Option<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Missing environment variable support for accounts-db-size. Should add env = "ACCOUNTS_DB_SIZE" for consistency with other params.

Suggested change
#[arg(long = "accounts-db-size")]
pub db_size: Option<usize>,
#[arg(long = "accounts-db-size", env = "ACCOUNTS_DB_SIZE")]
pub db_size: Option<usize>,

@Dodecahedr0x
Copy link
Contributor Author

Superseded by #447

@bmuddha bmuddha deleted the feat/show-config-options branch July 12, 2025 09:32
Dodecahedr0x added a commit that referenced this pull request Jul 15, 2025
This PR is another attempt to allow layered configuration (default ->
file -> args -> env) while providing users with information on what
parameters they can adjust.

It overrides #346 with the following advantages:
- Automatically derives CLI and env var names to avoid repetitions and
reduce risk of typos
- Single definition of the configuration

The weakness of this approach is that it utilizes small procedural
macros to avoid repetition, which can compromise readability and make
debugging more challenging.

It introduces a breaking change in the format of the remote because Clap
does not accept an enum variant with complex values.

<!-- greptile_comment -->

## Greptile Summary

Major refactoring of configuration management using Clap for CLI
support, introducing layered configuration (default -> file -> args ->
env) with automatic CLI and environment variable derivation.

- Added `magicblock-config-macro` crate with procedural macros
`clap_prefix` and `clap_from_serde` for automatic CLI argument
generation
- Changed remote configuration format from `remote = "devnet"` to
`remote.cluster = "devnet"` across all TOML files (breaking change)
- Consolidated configuration types from individual crates into
centralized `magicblock-config` crate
- Fixed FQDN typo in validator config (renamed from `fdqn` to `fqdn`)
- Added merge functionality to all config structs for layered
configuration support



<!-- /greptile_comment -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feat: support -h to show configurations options
5 participants