(Codex-friendly problem spec)
- Goal: Implement a single-file program (e.g. `main.c`) that:
  - Reads one test case from stdin in the format below.
  - Outputs one line: a sequence of actions (a_1,\dots,a_T) that maximizes the expected total reward minus switching penalty.
  - Works for all valid inputs within the constraints, not just the sample.
- Language: C (C11 / GCC 10).
- I/O rules (very important):
  - Input: read from `stdin` only.
  - Output: exactly one line in the format `[` + integers separated by `", "` + `]`, e.g. `[1, 1, 1, 0, 0]`.
  - No extra debug prints.
You control a smart sleep-aid pillow. Each minute (t = 1,2,\dots,T), the brain is in one of (N) discrete states (e.g. Awake, Light Sleep, Deep Sleep, REM, Waked Up).
You can choose one of (M) soundscapes (actions) to play: [ a_t \in {0,1,\dots,M-1} ]
- The brain’s state transitions stochastically according to transition matrices provided in the input.
- Each state has a reward (R(s, t)), possibly time-varying (periodic cosine wandering).
- Changing soundscape from minute (t-1) to (t) incurs a switching penalty depending on the state transition.
Your job is to output a sequence of actions (A = [a_1,\dots,a_T]) that maximizes the expected total sleep score over the session.
Let:
- (T): total duration (minutes).
- (N): number of sleep states.
- (M): number of actions (soundscapes).
- (S_t \in {S_0,\dots,S_{N-1}}): state at minute (t).
- (a_t \in {0,\dots,M-1}): chosen action at minute (t).
- (L): cycle length of reward wandering (if (L = 0), rewards are static).
- (\mu): average reward centroid.
- (R_{\text{initial}}(s)): base reward of state (s) given in input.
- (R(s,t)): reward of state (s) at time (t).
- (P^{(a)}_{i,j}): probability of transitioning from state (i) to state (j) under action (a).
- (C_{i,j}): switching penalty when the brain transitions from state (i) to state (j) and we changed action at this step.
- (I(\cdot)): indicator function (1 if condition holds, else 0).
- If (L = 0): [ R(s,t) = R_{\text{initial}}(s) ]
- If (L > 0): reward wanders periodically: [ \mu = \frac{1}{N} \sum_{i=0}^{N-1} R_{\text{initial}}(S_i) ] [ R(s,t) = \mu + (R_{\text{initial}}(s) - \mu)\cdot \cos\left(\frac{2\pi t}{L}\right) ]
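For illustration, here is a minimal C helper that evaluates (R(s,t)) exactly as defined above. The function name `rewardAt` is illustrative, not part of the spec; it computes pi via `acos(-1.0)` to stay within strict C11.

```c
#include <math.h>

/* R(s, t) as defined above: static if L == 0, otherwise a cosine that
 * wanders around the mean reward mu with period L. */
double rewardAt(int s, int t, int L, int N, const int *baseReward) {
    if (L == 0) return (double)baseReward[s];
    double mu = 0.0;
    for (int i = 0; i < N; ++i) mu += baseReward[i];
    mu /= N;
    const double PI = acos(-1.0);
    return mu + (baseReward[s] - mu) * cos(2.0 * PI * t / L);
}
```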
At time (t \ge 2), if the action changes ((a_t \neq a_{t-1})), a penalty is applied that depends on the state transition ((S_{t-1} \to S_t)):
[ \text{Penalty}_t = I(a_t \neq a_{t-1}) \cdot C_{S_{t-1},S_t} ]
At (t=1), treat (a_0 = 0) (the “initial” action) for the indicator.
Uniform penalty mode: In the actual input format used here, Type=0 means “no switching cost”: all (C_{i,j}=0). Type=1 means a general penalty matrix with values from the input.
Given an action sequence (A=[a_1,\dots,a_T]), the scoring system computes the expected total sleep score:
[ J(A) = \mathbb{E}\left[\sum_{t=1}^{T} \left( R(S_t,t) - \text{Penalty}_t \right)\right] ]
The expectation is over the random Markov transitions defined by the matrices (P^{(a)}).
Your algorithm does not need to simulate randomness for scoring. You just need to output (A); the judge will compute (J(A)) using exact matrix operations.
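Still, an exact local evaluator is handy for testing a planner. As a sketch of what such an evaluation looks like: propagate the state distribution through the chosen transition matrix each minute and accumulate the expected reward and expected penalty. The initial state distribution is not stated in this spec, so the code below assumes a point mass on state 0; `P` and `C` are flattened row-major arrays and `rewardAt` is the helper sketched earlier — all of these names are illustrative.

```c
/* Hedged local evaluator for J(A). Assumptions not fixed by the spec:
 * the session starts deterministically in state 0; a_0 = 0 is the initial
 * action for the switching indicator (as stated above).
 * P is M*N*N row-major (P[a*N*N + i*N + j]), C is N*N, a[0] holds a_1. */
double evaluate(const int *a, int T, int N, int L,
                const double *P, const int *C, const int *baseReward) {
    double dist[64] = {0.0}, next[64];   /* assumes N <= 64 */
    dist[0] = 1.0;                       /* assumed initial state */
    int prevAction = 0;
    double J = 0.0;
    for (int t = 1; t <= T; ++t) {
        int act = a[t - 1];
        const double *Pa = P + (size_t)act * N * N;
        for (int j = 0; j < N; ++j) next[j] = 0.0;
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                double p = dist[i] * Pa[i * N + j];
                next[j] += p;                      /* distribution update */
                if (act != prevAction)
                    J -= p * C[i * N + j];         /* expected switching penalty */
            }
        }
        for (int j = 0; j < N; ++j) {
            J += next[j] * rewardAt(j, t, L, N, baseReward);  /* expected reward */
            dist[j] = next[j];
        }
        prevAction = act;
    }
    return J;
}
```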
All data is read from standard input.
High-level structure:
T N M L Type
R0 R1 ... R(N-1)
[Switching Penalty Matrix] // N x N integers
[Transition Matrix for Action 0] // N x N doubles
[Transition Matrix for Action 1] // N x N doubles
...
[Transition Matrix for Action M-1] // N x N doubles
T N M L Type
- `T` – int – Total duration (minutes).
  - Constraints: (1 \le T \le 10000).
- `N` – int – Number of states.
  - Constraints: (1 \le N \le 10).
- `M` – int – Number of actions.
  - Constraints: (1 \le M \le 500).
- `L` – int – Reward wandering cycle length.
  - If `L = 0`, rewards are static: (R(s,t) = R_{\text{initial}}(s)).
- `Type` – int (mode identifier).
  - `Type = 0`: uniform penalty mode (effectively no switching penalty, all (C_{i,j}=0)).
  - `Type = 1`: general matrix penalty mode (use full (C_{i,j}) from input).
R0 R1 ... R(N-1)
`N` integers. `Rk` is the base reward (R_{\text{initial}}(S_k)) for state (S_k).
Next is an (N \times N) integer matrix:
// N lines follow:
c00 c01 ... c0(N-1)
c10 c11 ... c1(N-1)
...
c(N-1)0 ... c(N-1)(N-1)
- If `Type = 1`: `C[i][j] = cij` is the cost of switching soundscape when the brain transitions from state `i` to state `j`.
- If `Type = 0`:
  - All entries are zero.
  - The matrix is still provided for format consistency, but your algorithm can treat all penalties as zero.
Then we have M blocks. Each block is an (N \times N) matrix of doubles, giving the transition probabilities for one action:
For each action a = 0, 1, ..., M-1:
// block for action a, consists of N lines:
p00 p01 ... p0(N-1) // from state 0
p10 p11 ... p1(N-1) // from state 1
...
p(N-1)0 ... p(N-1)(N-1) // from state N-1
- `pij` is (P^{(a)}_{i,j} = P(S_t = j | S_{t-1} = i, a_t = a)).
- All `pij` are floating-point numbers in `[0.0, 1.0]`.
- For each row (fixed `i`), the sum over `j` is guaranteed to be 1.0.
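A minimal parsing sketch in C matching the layout above, using flattened allocations (identifiers such as `P`, `C`, and `baseReward` are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int T, N, M, L, Type;
    if (scanf("%d %d %d %d %d", &T, &N, &M, &L, &Type) != 5) return 1;

    int    *baseReward = malloc((size_t)N * sizeof *baseReward);
    int    *C = malloc((size_t)N * N * sizeof *C);        /* C[i*N + j]         */
    double *P = malloc((size_t)M * N * N * sizeof *P);    /* P[a*N*N + i*N + j] */
    if (!baseReward || !C || !P) return 1;

    for (int i = 0; i < N; ++i) scanf("%d", &baseReward[i]);
    for (int i = 0; i < N * N; ++i) scanf("%d", &C[i]);   /* all zeros when Type == 0 */
    for (long k = 0; k < (long)M * N * N; ++k) scanf("%lf", &P[k]);

    /* ... plan the action sequence and print it here ... */

    free(P); free(C); free(baseReward);
    return 0;
}
```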
You must print a single line:
[a1, a2, ..., aT]
- The line must:
  - Start with `[` and end with `]`.
  - Contain exactly `T` integers.
  - Each integer is an action ID in the range `0` to `M-1`.
  - Between two integers, use exactly `", "` (comma + space).
- Example (for `T = 20`):
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0]
Any illegal output (e.g. action ID < 0 or ≥ M, wrong format) will cause that test case’s score to be 0.
The online judge (e.g. CodeGrade) will run your program on 10 different test inputs. You only see the sample; the rest are hidden.
Three groups:
- Basic cases (30%)
  - `Type = 0` (no penalty). `L = 0` (static rewards).
  - Rough size: `T ≤ 50`, `M = 10`, `N = 5`.
- Performance cases (40%)
  - `Type = 1` (matrix penalty). `L = 0` (static rewards).
  - Rough size: `1000 ≤ T ≤ 3000`, `M = 80`, `N = 40`.
- Ultimate cases (30%)
  - `Type = 1` (matrix penalty). `L = T` (rewards fully time-variant over the whole horizon).
  - Rough size: `1000 ≤ T ≤ 3000`, `M = 80`, `N = 40`.
Your single program must handle all of these ranges robustly (time and memory).
For each of the 10 hidden test cases (i=1,...,10):
- The judge computes:
  - ( \text{ScoreRaw}_i = J(A_i) ) (expected total reward of your sequence (A_i)).
  - There is a standard baseline score (\text{ScoreStandard}_i) for each test.
  - Your base score on that test: [ \text{ScoreBase}_i = \min(\text{ScoreRaw}_i, \text{ScoreStandard}_i) ]
  - If you beat the baseline ((\text{ScoreRaw}_i > \text{ScoreStandard}_i)), you also get a bonus that depends on:
    - How much you exceed the standard.
    - Your program's running time (T^{\text{use}}_i) on that test.
Your overall competition score is the sum over all 10 test cases of base + bonus.
Key implications for the code generator:
- Correctness first: invalid output or runtime error ⇒ 0 on that test.
- Quality of policy matters: try to genuinely maximize expected reward.
- Speed matters: asymptotically faster algorithms can earn more bonus on large tests.
When generating the C solution, obey the following:
- Input parsing
  - Use `scanf`/`fscanf(stdin, ...)` to read: `T`, `N`, `M`, `L`, `Type`; `N` integers for base rewards; `N x N` integers for the penalty matrix (store or ignore if `Type == 0`); `M` blocks of `N x N` doubles for transition matrices.
  - Allocate arrays dynamically based on `N` and `M` (e.g. `malloc`).
- Data structures
  - Suggested: `double P[M][N][N]`, or a flattened `double *P` if the stack is insufficient; `int C[N][N]` for penalties; `int baseReward[N]`.
  - For planning, you may maintain state distributions `double dist[N]` and/or DP arrays.
- Algorithm (high-level only)
  - This is a finite-horizon stochastic control / MDP problem.
  - Naive brute-force over all action sequences is impossible ((M^T) sequences).
  - The algorithm should:
    - Use the transition matrices and rewards to evaluate the expected value of candidate policies.
    - Consider the switching penalty when changing actions.
    - Exploit the relatively small `N` (≤ 10) to do dynamic programming or clever approximate planning; a greedy baseline sketch is given after this list.
- Output
  - After computing your action sequence `a[0..T-1]`, print it once as:
      printf("[");
      for (int t = 0; t < T; ++t) { if (t > 0) printf(", "); printf("%d", a[t]); }
      printf("]");
  - Optionally append a newline.
- No extra output
  - Do not print debugging lines, labels, or spaces outside the required format.
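One simple baseline consistent with the guidance above is a greedy one-step lookahead: track the state distribution minute by minute and, at each minute, pick the action whose immediate expected reward minus expected switching penalty is largest. This is only a sketch of an approximate planner (it ignores future consequences of the current choice, and it assumes a start in state 0, which the spec does not fix); `rewardAt` and the flattened `P`/`C` layout are as in the earlier sketches.

```c
/* Greedy one-step-lookahead sketch (approximate planning, not guaranteed optimal).
 * Assumptions: dist[] starts as a point mass on state 0 (initial state not given
 * by the spec); P is M*N*N row-major; C is N*N; rewardAt() as sketched earlier. */
void greedyPlan(int *a, int T, int N, int M, int L,
                const double *P, const int *C, const int *baseReward) {
    double dist[64] = {0.0}, next[64], best[64];   /* assumes N <= 64 */
    dist[0] = 1.0;
    int prevAction = 0;                            /* a_0 = 0 per the spec */
    for (int t = 1; t <= T; ++t) {
        double bestVal = -1e300;
        int bestAct = 0;
        for (int act = 0; act < M; ++act) {
            const double *Pa = P + (size_t)act * N * N;
            double val = 0.0;
            for (int j = 0; j < N; ++j) next[j] = 0.0;
            for (int i = 0; i < N; ++i) {
                for (int j = 0; j < N; ++j) {
                    double p = dist[i] * Pa[i * N + j];
                    next[j] += p;                              /* distribution after action */
                    if (act != prevAction) val -= p * C[i * N + j];  /* expected penalty */
                }
            }
            for (int j = 0; j < N; ++j)
                val += next[j] * rewardAt(j, t, L, N, baseReward); /* expected reward */
            if (val > bestVal) {
                bestVal = val;
                bestAct = act;
                for (int j = 0; j < N; ++j) best[j] = next[j];
            }
        }
        a[t - 1] = bestAct;
        for (int j = 0; j < N; ++j) dist[j] = best[j];
        prevAction = bestAct;
    }
}
```

Beam search or a deeper lookahead over the same quantities can improve on this greedy baseline at the cost of more computation. The sample input reproduced below can be used to exercise such a program end to end.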
20 5 3 20 1
0 20 50 100 0
0 10 20 50 0
10 0 10 20 0
20 10 0 10 0
50 20 10 0 0
0 0 0 0 0
1.0 0.0 0.0 0.0 0.0
0.9 0.1 0.0 0.0 0.0
0.5 0.5 0.0 0.0 0.0
0.1 0.1 0.8 0.0 0.0
0.0 0.0 0.1 0.9 0.0
0.5 0.5 0.0 0.0 0.0
0.1 0.8 0.1 0.0 0.0
0.0 0.2 0.7 0.1 0.0
0.0 0.0 0.2 0.8 0.0
1.0 0.0 0.0 0.0 0.0
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
1.0 0.0 0.0 0.0 0.0
- `T=20`, `N=5`, `M=3`, `L=20`, `Type=1`.
- Next line: the 5 base rewards; the following 5 lines are the `5x5` penalty matrix (see the full PDF if needed).
- Next: 3 blocks of `5x5` doubles: transition matrices for actions 0, 1, 2.
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0]
This is just an example action sequence. The judge will compute its expected score using the given model.