
2025 Lord Kelvin Programming Competition – The Sleep Architect

(Codex-friendly problem spec)

0. What the code generator must do

  • Goal: Implement a single-file program (e.g. main.c) that:

    • Reads one test case from stdin in the format below.
    • Outputs one line: a sequence of actions (a_1,\dots,a_T) that maximizes the expected total reward minus switching penalty.
    • Works for all valid inputs within the constraints, not just the sample.
  • Language: C (C11 / GCC 10).

  • I/O rules (very important):

    • Input: read from stdin only.
    • Output: exactly one line: an opening [, the T integers separated by ", ", and a closing ], e.g. [1, 1, 1, 0, 0]
    • No extra debug prints.

1. Informal problem overview

You control a smart sleep-aid pillow. Each minute (t = 1,2,\dots,T), the brain is in one of (N) discrete states (e.g. Awake, Light Sleep, Deep Sleep, REM, Woken Up).

You can choose one of (M) soundscapes (actions) to play: [ a_t \in {0,1,\dots,M-1} ]

  • The brain’s state transitions stochastically according to transition matrices provided in the input.
  • Each state has a reward (R(s, t)), possibly time-varying (periodic cosine wandering).
  • Changing soundscape from minute (t-1) to (t) incurs a switching penalty depending on the state transition.

Your job is to output a sequence of actions (A = [a_1,\dots,a_T]) that maximizes the expected total sleep score over the session.


2. Mathematical model (symbols & meaning)

Let:

  • (T): total duration (minutes).
  • (N): number of sleep states.
  • (M): number of actions (soundscapes).
  • (S_t \in {S_0,\dots,S_{N-1}}): state at minute (t).
  • (a_t \in {0,\dots,M-1}): chosen action at minute (t).
  • (L): cycle length of reward wandering (if (L = 0), rewards are static).
  • (\mu): average reward centroid.
  • (R_{\text{initial}}(s)): base reward of state (s) given in input.
  • (R(s,t)): reward of state (s) at time (t).
  • (P^{(a)}_{i,j}): probability of transitioning from state (i) to state (j) under action (a).
  • (C_{i,j}): switching penalty when the brain transitions from state (i) to state (j) and we changed action at this step.
  • (I(\cdot)): indicator function (1 if condition holds, else 0).

2.1 Reward wandering

  • If (L = 0): [ R(s,t) = R_{\text{initial}}(s) ]
  • If (L > 0): the reward wanders periodically around the centroid:
    [ \mu = \frac{1}{N} \sum_{i=0}^{N-1} R_{\text{initial}}(S_i) ]
    [ R(s,t) = \mu + (R_{\text{initial}}(s) - \mu)\cdot \cos\left(\frac{2\pi t}{L}\right) ]
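
For concreteness, here is a minimal C sketch of this reward computation. The helper name reward and array name baseReward are illustrative, not prescribed by the spec, and a real solution would precompute (\mu) once rather than per call:

    #include <math.h>

    #define TWO_PI 6.28318530717958647692

    /* R(s,t) from Section 2.1: static if L == 0, cosine wandering otherwise. */
    double reward(int s, int t, int L, int N, const int *baseReward) {
        if (L == 0)
            return baseReward[s];
        double mu = 0.0;                   /* average reward centroid */
        for (int i = 0; i < N; ++i)
            mu += baseReward[i];
        mu /= N;
        return mu + (baseReward[s] - mu) * cos(TWO_PI * t / L);
    }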

2.2 Switching penalty

At time (t \ge 2), if the action changes ((a_t \neq a_{t-1})), a penalty is applied that depends on the state transition ((S_{t-1} \to S_t)):

[ \text{Penalty}_t = I(a_t \neq a_{t-1}) \cdot C_{S_{t-1},S_t} ]

At (t=1), treat (a_0 = 0) (the “initial” action) for the indicator.

Uniform penalty mode: In the actual input format used here, Type=0 means “no switching cost”: all (C_{i,j}=0). Type=1 means a general penalty matrix with values from the input.

2.3 Objective function (what is maximized)

Given an action sequence (A=[a_1,\dots,a_T]), the scoring system computes the expected total sleep score:

[ J(A) = \mathbb{E}\left[\sum_{t=1}^{T} \left( R(S_t,t) - \text{Penalty}_t \right)\right] ]

The expectation is over the random Markov transitions defined by the matrices (P^{(a)}).

Your algorithm does not need to simulate randomness for scoring. You just need to output (A); the judge will compute (J(A)) using exact matrix operations.
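
For intuition, here is one way such an exact computation could look in C: propagate the state distribution through (P^{(a_t)}) minute by minute, accumulating expected reward minus the expected switching penalty of Section 2.2. This is a sketch, not the judge's actual code; in particular, the spec does not state the initial state distribution, so the sketch assumes the session starts deterministically in state 0, and it reuses the reward() helper sketched in Section 2.1.

    /* Sketch: exact J(A) by forward propagation of the state distribution.
       ASSUMPTION: the chain starts in state 0 (not specified in the spec).
       P and C are flattened row-major: P[(act*N + i)*N + j], C[i*N + j]. */
    double evaluate(const int *a, int T, int N, int L,
                    const double *P, const int *C, const int *baseReward) {
        double dist[N], next[N];             /* VLAs: N is small */
        for (int j = 0; j < N; ++j) dist[j] = 0.0;
        dist[0] = 1.0;                       /* assumed initial state */
        int prev = 0;                        /* a_0 = 0, per Section 2.2 */
        double J = 0.0;
        for (int t = 1; t <= T; ++t) {
            int act = a[t - 1];
            const double *Pa = P + act * N * N;
            for (int j = 0; j < N; ++j) next[j] = 0.0;
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j) {
                    double w = dist[i] * Pa[i * N + j];
                    next[j] += w;
                    if (act != prev)         /* expected switching penalty */
                        J -= w * C[i * N + j];
                }
            for (int j = 0; j < N; ++j) {
                J += next[j] * reward(j, t, L, N, baseReward);
                dist[j] = next[j];
            }
            prev = act;
        }
        return J;
    }

Note that with L = 0 and Type = 0 this collapses to accumulating dist · R_initial each minute.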


3. Input format (one test case via stdin)

All data is read from standard input.

High-level structure:

T N M L Type
R0 R1 ... R(N-1)
[Switching Penalty Matrix]          // N x N integers
[Transition Matrix for Action 0]    // N x N doubles
[Transition Matrix for Action 1]    // N x N doubles
...
[Transition Matrix for Action M-1]  // N x N doubles

3.1 Line 1: header

T N M L Type
  • T – int

    • Total duration (minutes).
    • Constraints: (1 \le T \le 10000).
  • N – int

    • Number of states.
    • Constraints: (1 \le N \le 10).
  • M – int

    • Number of actions.
    • Constraints: (1 \le M \le 500).
  • L – int

    • Reward wandering cycle length.
    • If L = 0, rewards are static: (R(s,t) = R_{\text{initial}}(s)).
  • Type – int (mode identifier)

    • Type = 0: uniform penalty mode (effectively no switching penalty, all (C_{i,j}=0)).
    • Type = 1: general matrix penalty mode (use full (C_{i,j}) from input).

3.2 Line 2: base rewards

R0 R1 ... R(N-1)
  • N integers.
  • Rk is the base reward (R_{\text{initial}}(S_k)) for state (S_k).

3.3 Switching penalty matrix (always present)

Next is an (N \times N) integer matrix:

// N lines follow:
c00 c01 ... c0(N-1)
c10 c11 ... c1(N-1)
...
c(N-1)0 ...         c(N-1)(N-1)
  • If Type = 1:

    • C[i][j] = cij is the cost of switching soundscape when the brain transitions from state i to state j.
  • If Type = 0:

    • All entries are zero.
    • Matrix is still provided for format consistency, but your algorithm can treat all penalties as zero.

3.4 Transition matrices (probabilities)

Then we have M blocks. Each block is an (N \times N) matrix of doubles, giving the transition probabilities for one action:

For each action a = 0, 1, ..., M-1:

// block for action a, consists of N lines:
p00 p01 ... p0(N-1)   // from state 0
p10 p11 ... p1(N-1)   // from state 1
...
p(N-1)0 ...           p(N-1)(N-1)   // from state N-1
  • pij is (P^{(a)}_{i,j} = P(S_t = j | S_{t-1} = i, a_t = a)).
  • All pij are floating-point numbers in [0.0, 1.0].
  • For each row (fixed i), the sum over j is guaranteed to be 1.0.
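
Putting Section 3 together, a parsing skeleton might look like this (names are illustrative and error handling is mostly omitted; a real entry should check every scanf return value):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int T, N, M, L, Type;
        if (scanf("%d %d %d %d %d", &T, &N, &M, &L, &Type) != 5) return 1;

        int *baseReward = malloc(N * sizeof *baseReward);        /* line 2 */
        for (int i = 0; i < N; ++i) scanf("%d", &baseReward[i]);

        int *C = malloc(N * N * sizeof *C);   /* penalty matrix, always present */
        for (int i = 0; i < N * N; ++i) scanf("%d", &C[i]);

        /* M transition matrices, flattened: P[(a*N + i)*N + j] = P^(a)_{i,j} */
        double *P = malloc((size_t)M * N * N * sizeof *P);
        for (int k = 0; k < M * N * N; ++k) scanf("%lf", &P[k]);

        /* ... plan the action sequence and print it (Sections 4 and 7) ... */

        free(P); free(C); free(baseReward);
        return 0;
    }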

4. Output format

You must print a single line:

[a1, a2, ..., aT]
  • The line must:

    • Start with [ and end with ].
    • Contain exactly T integers.
    • Each integer is an action ID in the range 0 to M-1.
    • Between two integers, use exactly ", " (comma + space).
  • Example (for T = 20):

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0]

Any illegal output (e.g. action ID < 0 or ≥ M, wrong format) will cause that test case’s score to be 0.


5. Constraints & test case categories

The online judge (e.g. CodeGrade) will run your program on 10 different test inputs. You only see the sample; the rest are hidden.

Three groups:

  1. Basic cases (30%)

    • Type = 0 (no penalty).
    • L = 0 (static rewards).
    • Rough size: T ≤ 50, M = 10, N = 5.
  2. Performance cases (40%)

    • Type = 1 (matrix penalty).
    • L = 0 (static rewards).
    • Rough size: 1000 ≤ T ≤ 3000, M = 80, N = 40.
  3. Ultimate cases (30%)

    • Type = 1 (matrix penalty).
    • L = T (rewards fully time-variant over the whole horizon).
    • Rough size: 1000 ≤ T ≤ 3000, M = 80, N = 40.

Your single program must handle all of these ranges robustly (time and memory).


6. Scoring and runtime

For each of the 10 hidden test cases (i=1,...,10):

  • The judge computes:

    • ( \text{ScoreRaw}_i = J(A_i) ) (expected total reward of your sequence (A_i)).

    • There is a standard baseline score (\text{ScoreStandard}_i) for each test.

    • Your base score on that test: [ \text{ScoreBase}_i = \min(\text{ScoreRaw}_i, \text{ScoreStandard}_i) ]

    • If you beat the baseline ((\text{ScoreRaw}_i > \text{ScoreStandard}_i)), you also get a bonus that depends on:

      • How much you exceed the standard.
      • Your program’s running time (T^{\text{use}}_i) on that test.
  • Your overall competition score is the sum over all 10 test cases of base + bonus.

Key implications for the code generator:

  • Correctness first: invalid output or runtime error ⇒ 0 on that test.
  • Quality of policy matters: try to genuinely maximize expected reward.
  • Speed matters: asymptotically faster algorithms can earn more bonus on large tests.

7. Implementation checklist (for Codex)

When generating the C solution, obey the following:

  1. Input parsing

    • Use scanf / fscanf(stdin, ...) to read:

      • T, N, M, L, Type.
      • N integers for base rewards.
      • N x N integers for penalty matrix (store or ignore if Type == 0).
      • M blocks of N x N doubles for transition matrices.
    • Allocate arrays dynamically based on N and M (e.g. malloc).

  2. Data structures

    • Suggested:

      • double P[M][N][N] or flattened double *P if stack is insufficient.
      • int C[N][N] for penalties.
      • int baseReward[N].
    • For planning, you may maintain state distributions double dist[N] and/or DP arrays.

  3. Algorithm (high-level only)

    • This is a finite-horizon stochastic control / MDP problem.

    • Naive brute-force over all action sequences is infeasible: there are (M^T) candidate sequences.

    • The algorithm should:

      • Use the transition matrices and rewards to evaluate expected value of candidate policies.
      • Consider switching penalty when changing actions.
      • Exploit the relatively small N (≤ 10) to do dynamic programming or clever approximate planning (a minimal greedy baseline is sketched after this checklist).
  4. Output

    • After computing your action sequence a[0..T-1], print it once as:

      printf("[");
      for (int t = 0; t < T; ++t) {
          if (t > 0) printf(", ");
          printf("%d", a[t]);
      }
      printf("]");
    • Optionally append a newline.

  5. No extra output

    • Do not print debugging lines, labels, or spaces outside the required format.
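
As promised in item 3, here is one minimal baseline planner: a greedy one-step lookahead over the exact state distribution. It illustrates the mechanics only and is not claimed to be competitive; a stronger entry would look further ahead (e.g. beam search over distributions, or a DP over (t, last action) with value estimates). It reuses reward() from the Section 2.1 sketch, the flattened arrays from the parsing sketch, and the same assumed start in state 0.

    #include <stdlib.h>
    #include <string.h>

    /* Greedy baseline: each minute, pick the action maximizing immediate
       expected reward minus expected switching penalty, given the current
       distribution.  ASSUMPTION: start in state 0; a_0 = 0 (Section 2.2). */
    void plan_greedy(int *a, int T, int N, int M, int L, int Type,
                     const double *P, const int *C, const int *baseReward) {
        double *dist = calloc(N, sizeof *dist);
        double *next = malloc(N * sizeof *next);
        double *best = malloc(N * sizeof *best);
        dist[0] = 1.0;
        int prev = 0;
        for (int t = 1; t <= T; ++t) {
            double bestVal = -1e300;
            int bestAct = 0;
            for (int act = 0; act < M; ++act) {
                const double *Pa = P + act * N * N;
                double val = 0.0;
                for (int j = 0; j < N; ++j) next[j] = 0.0;
                for (int i = 0; i < N; ++i)
                    for (int j = 0; j < N; ++j) {
                        double w = dist[i] * Pa[i * N + j];
                        next[j] += w;
                        if (Type == 1 && act != prev)
                            val -= w * C[i * N + j];   /* expected penalty */
                    }
                for (int j = 0; j < N; ++j)
                    val += next[j] * reward(j, t, L, N, baseReward);
                if (val > bestVal) {
                    bestVal = val;
                    bestAct = act;
                    memcpy(best, next, N * sizeof *best);
                }
            }
            a[t - 1] = bestAct;
            memcpy(dist, best, N * sizeof *dist);
            prev = bestAct;
        }
        free(dist); free(next); free(best);
    }

One design note: because the output is an open-loop sequence rather than a policy, the pair (current distribution, previous action) is all the state this planner carries between minutes, which is why a single forward pass suffices.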

8. Sample (from Appendix)

8.1 Sample input (Raw_Input.txt)

20 5 3 20 1
0 20 50 100 0
0 10 20 50 0
10 0 10 20 0
20 10 0 10 0
50 20 10 0 0
0 0 0 0 0
1.0 0.0 0.0 0.0 0.0
0.9 0.1 0.0 0.0 0.0
0.5 0.5 0.0 0.0 0.0
0.1 0.1 0.8 0.0 0.0
0.0 0.0 0.1 0.9 0.0
0.5 0.5 0.0 0.0 0.0
0.1 0.8 0.1 0.0 0.0
0.0 0.2 0.7 0.1 0.0
0.0 0.0 0.2 0.8 0.0
1.0 0.0 0.0 0.0 0.0
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
1.0 0.0 0.0 0.0 0.0
  • T=20, N=5, M=3, L=20, Type=1.
  • Line 2: base rewards; lines 3-7: the 5x5 penalty matrix (see the full PDF if needed).
  • Next 3 blocks of 5x5 doubles: transition matrices for actions 0, 1, 2.

8.2 Example output

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0]

This is just an example action sequence. The judge will compute its expected score using the given model.
