IOAI Readiness

IOAI Competition Surface

The academy's phases train the judgment IOAI tests. This page trains the surface — how the problem is delivered, what the sandbox actually looks like, what counts as a submission, and how competitors lose when they have the skill but not the format.

What IOAI Delivers

Problem + Data + Sandbox

You get a problem statement, a dataset, a restricted compute environment, and a submission format. Everything else — model choice, evaluation, stop rule — is yours to build in the window.

Where Competitors Lose

Format, Not Skill

Students who have finished every phase still lose points to silly things: misread submission format, wasted GPU hours, no baseline, chasing the public leaderboard, running out of time with no submission ready.

What This Page Does

Train The Muscle

Problem-reading drills, sandbox hygiene, compute discipline, and a mock task you can run cold. Pair with Phase 6 of the Study Plan.

What An IOAI Task Looks Like

A typical IOAI scientific task hands you:

  • a short problem statement — usually a paragraph describing the domain, the labels, and the evaluation metric
  • a training set — often a few thousand to a few tens of thousands of examples, sometimes less
  • a public test set — where you can submit predictions and see a score, usually limited submissions per day
  • a hidden private test set — scored only at the end, which determines final rank
  • a sandbox — a browser-based notebook or a provisioned VM with restricted internet, a fixed time window, and a single modest GPU at most
  • a submission format — usually a CSV or JSON in a strict shape; wrong shape means zero

Tasks vary across modalities (vision, text, tabular, audio, multimodal) and across years. The format above is stable enough that you can practice it.
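
That last bullet is where most zeros come from. Here is a minimal sketch of a pre-submission shape check, assuming a CSV submission and a sample_submission.csv provided by the organisers; the file names and columns are placeholders for whatever the task actually specifies:

```python
import pandas as pd

# Placeholder file names; use the paths the task statement gives you.
sample = pd.read_csv("sample_submission.csv")
mine = pd.read_csv("my_submission.csv")

# Same columns, in the same order.
assert list(mine.columns) == list(sample.columns), (list(mine.columns), list(sample.columns))
# Same row count, and every required id present exactly once.
id_col = sample.columns[0]
assert len(mine) == len(sample), (len(mine), len(sample))
assert set(mine[id_col]) == set(sample[id_col]) and mine[id_col].is_unique
# No missing predictions.
assert not mine.isna().any().any()
print("submission shape looks sane")
```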

Problem-Reading Drill

Before you touch the data, spend five minutes reading the problem statement. Force yourself to answer these seven questions in writing:

  1. What is the label? (classification target, regression target, bounding boxes, tokens, something else?)
  2. What is the metric? (accuracy, F1, AUC, IoU, BLEU, something else?) Does it reward calibration or only ranking?
  3. What is the class balance likely to be? (will accuracy be misleading?)
  4. What is the most obvious source of leakage on this dataset?
  5. What is the cheapest baseline that would give you a number in 15 minutes?
  6. What is the one change you would make after the baseline?
  7. What submission shape is required, exactly?

If you cannot answer these before touching the data, the problem is still unread. A surprising fraction of competitors start coding with gaps in questions 2, 5, and 7 — and it costs them the whole session.
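
One way to force the "in writing" part, as a sketch: a cell at the top of the notebook that refuses to let you continue until every answer is filled in. The keys mirror the seven questions; the values are placeholders.

```python
# Fill these in before any other code runs; an unanswered question stops the notebook.
drill = {
    "label": None,             # 1. what is being predicted
    "metric": None,            # 2. how it is scored; calibration or only ranking?
    "class_balance": None,     # 3. expected balance; will accuracy mislead?
    "leakage_risk": None,      # 4. most obvious leakage source
    "cheap_baseline": None,    # 5. a number in 15 minutes
    "first_iteration": None,   # 6. the one change after the baseline
    "submission_shape": None,  # 7. the exact file format required
}
assert all(v is not None for v in drill.values()), "finish the problem-reading drill first"
```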

Sandbox Hygiene

IOAI sandboxes are not your laptop. Treat them as hostile by default.

  • Check the compute once at the start. How much RAM? What GPU, if any? How much scratch disk? Write these down (a one-cell check like the sketch after this list).
  • Check the time once at the start. When does the window close? Set a visible timer for the last 30 minutes.
  • Check the submission mechanism once at the start. Can you submit predictions from inside the notebook? Is there a submit button? Is the submission file path fixed?
  • Do not install heavy packages you do not need. Every minute spent waiting for pip install is a minute not spent inspecting data.
  • Save a submission early. A baseline submission in the first 30 minutes guarantees a non-zero score. Far too many competitors wait until the end and then hit a format error they cannot debug.
  • Reset the kernel when something feels off. A corrupted notebook state costs you far more than the 15 seconds of a restart.
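
A sketch of that one-time check as a single cell, assuming a Linux sandbox; nvidia-smi is only present when a GPU was provisioned, so the call is wrapped:

```python
import os, shutil, subprocess, datetime

# One-time sandbox check: run this first and write the numbers down.
print("CPU cores :", os.cpu_count())

with open("/proc/meminfo") as f:             # Linux sandboxes; adjust elsewhere
    print("RAM       :", f.readline().split(":")[1].strip())

total, _, free = shutil.disk_usage(".")
print(f"Disk free : {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")

try:                                         # GPU, if one was provisioned
    subprocess.run(["nvidia-smi", "--query-gpu=name,memory.total",
                    "--format=csv,noheader"], check=True)
except (FileNotFoundError, subprocess.CalledProcessError):
    print("GPU       : none visible")

print("Clock     :", datetime.datetime.now().strftime("%H:%M"),
      "- note the window close time and the 30-minute mark")
```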

Compute Discipline

You almost never need the full GPU for the full window. Treat compute as a budget, not a default.

  • The first baseline should run on CPU alone if it can. A dummy classifier, a small sklearn model, a frozen embedding — these give you a comparison point and a working submission pipeline for nearly no compute (see the sketch after this list).
  • Reserve GPU time for one or two deliberate training runs, not endless fiddling. Each time you hit "run all" you are spending a chunk of your total budget.
  • Track wall time per experiment. If one experiment ate 40 minutes and moved the score by 0.003, you cannot afford five more like it.
  • When a training run diverges, kill it. Do not hope it recovers. Return to Phase 4 mentally: what are the first three things to check? Then try again.
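
A sketch of that cheap first pass, assuming a tabular classification task already loaded into pandas objects X_train, y_train, X_test (the names and the id/label columns are placeholders): it buys a score floor, a real baseline, and a proven submission path for a few seconds of CPU, and it builds the wall-time habit you will need for every GPU run later.

```python
import time
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def timed(name, fn):
    """Run fn, print its wall time, and return its result; keep this habit for GPU runs too."""
    start = time.time()
    out = fn()
    print(f"{name}: {time.time() - start:.1f}s")
    return out

# Floor: what does "no skill" score? Anything below this later is a bug.
floor = timed("dummy", lambda: cross_val_score(
    DummyClassifier(strategy="most_frequent"), X_train, y_train, cv=3).mean())

# Cheap real baseline on CPU (assumes numeric features; encode categoricals first if needed).
base = timed("logreg", lambda: cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=3).mean())
print(f"dummy={floor:.3f}  baseline={base:.3f}")

# Prove the submission pipeline end to end now, not at the deadline.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pd.DataFrame({"id": X_test.index, "label": model.predict(X_test)}).to_csv(
    "my_submission.csv", index=False)
```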

Stop-Rule Discipline Under The Clock

This is where Phase 6 pays off. A timed run should produce three things in order:

  1. a working submission from a cheap baseline
  2. one deliberate iteration that you predicted would help and can measure
  3. a stop decision — keep this submission, submit a later one, or try a different approach

If the clock is 30 minutes from the close and you do not have (1), stop everything and make (1). A wrong-but-submitted baseline beats a brilliant-but-unsubmitted model.

If you have (1) and (2) but no clear answer on (3), submit the better of the two and write a one-line note to yourself about what you would have tried next. That note is how you train faster next time.
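
One way to keep that decision mechanical, as a sketch: log each run's local-validation score as you go and let the log pick the submission. The entries, scores, and file names here are illustrative.

```python
# Append one entry per experiment as you go: (what changed, local validation score, submission file).
log = [
    ("baseline: logistic regression",  0.71, "submission_baseline.csv"),
    ("iteration: add TF-IDF features", 0.74, "submission_tfidf.csv"),
]

# Stop decision: submit whichever scored best on *local* validation, not the public board.
name, score, path = max(log, key=lambda r: r[1])
print(f"submit {path} ({name}, local score {score:.3f})")

# The one-line note for next time.
with open("next_time.txt", "w") as f:
    f.write("Would have tried: class weights on the minority label.\n")
```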

Mock Task — Run This Cold

This is a self-imposed IOAI-style run you can do any time. Pair it with Mock Tasks and Timed Workflows.

Setup. Pick a dataset you have not used before — a Kaggle "Getting Started" task, an OpenML classification task, a HuggingFace dataset, anything with a held-out test set and a metric. Set a 3-hour timer. No looking up tutorials. No reusing a previous notebook.
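
If you want the setup itself to take zero decisions, here is a sketch using scikit-learn's OpenML loader; the dataset name is only an example, and the point is a held-out split you do not touch until the final scoring:

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Any OpenML dataset you have not used before; "adult" is only an example.
data = fetch_openml("adult", version=2, as_frame=True)
X, y = data.data, data.target

# Carve off a private "test set" now and do not look at it until the final scoring.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
print(len(X_train), "train rows,", len(X_test), "held-out rows - start the 3-hour timer")
```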

Rules:

  • first 15 minutes: only the problem-reading drill, no code
  • first 45 minutes: baseline submitted, even if it is trivial
  • hour 2: one deliberate iteration with a before/after comparison
  • hour 3: stop decision, final submission, and a written one-paragraph post-mortem

Debrief. After the timer, compare what you did to the "where competitors lose" list above. What did you skip? What cost you more time than it should have? Write those failure modes down and reread them before your next mock run.

Do this once a week during serious prep. Rotate datasets. The point is not to win — it is to make your format discipline boring.

Clinics That Match The Competition

Three clinics are especially useful right before a real run — treat them as warm-ups:

Where To Go From Here

Common Competition Failure Modes

If any of these sound familiar, the matching fix is above:

  • No submission at deadline. What went wrong: waited too long for the "good" model. Fix: a baseline submission in the first 45 minutes, always.
  • Submission rejected as malformed. What went wrong: ignored the format spec. Fix: question 7 of the problem-reading drill.
  • Model scored high on public, low on private. What went wrong: tuned on the public leaderboard. Fix: the Public/Private Restraint clinic.
  • GPU time ran out mid-training. What went wrong: no compute budget. Fix: the compute discipline above; measure wall time per run.
  • Kept training after it diverged. What went wrong: hoped it would recover. Fix: kill it; Phase 4 debugging checklist.
  • Could not choose between two models. What went wrong: no stop rule. Fix: Phase 6 exit gate; pick the better local-validation score and move on.