You are given a labeled image dataset that is known to contain two kinds of corruption: pixel noise (degraded image content) and label noise (incorrect class labels).
Your task is to design and train an image classifier that reaches the highest possible accuracy on a held-out, manually verified clean test set. You may use any modeling, preprocessing, or training tricks you wish, but you must work with the raw noisy files as training data; you are not allowed to ask for human relabeling.
During the onsite interview you will:
a) Write and justify code that detects which images are corrupted and decides whether to denoise them or keep them as-is.
b) Build a preprocessing and training pipeline that is robust to both kinds of noise, and explain why it should work.
c) Report the final test accuracy and analyze how much of the improvement came from handling pixel noise versus label noise.
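As a starting point for parts a) and b), the two detection steps could be sketched roughly as follows. This is an illustrative sketch and not a prescribed solution. The task does not say what form the pixel noise takes, so the first function assumes additive, roughly Gaussian noise on a grayscale image. The second function is a simplified heuristic loosely inspired by confident learning, and it assumes you already have out-of-fold predicted probabilities from some classifier. The function names estimate_noise_sigma and flag_suspect_labels are hypothetical.

```python
import numpy as np


def estimate_noise_sigma(img):
    """Rough noise-level estimate for a 2-D grayscale image (float array).

    Assumption: the noise is additive, roughly Gaussian, and independent per
    pixel, and the underlying image is locally smooth. Heavy texture will
    inflate the estimate.
    """
    img = np.asarray(img, dtype=np.float64)
    # Differences of horizontal neighbours cancel most smooth image content.
    d = img[:, 1:] - img[:, :-1]
    # Median absolute deviation, scaled to match a Gaussian standard deviation.
    mad = np.median(np.abs(d - np.median(d)))
    # A difference of two independent noise samples has variance 2 * sigma^2.
    return 1.4826 * mad / np.sqrt(2.0)


def flag_suspect_labels(probs, labels):
    """Flag samples whose given label disagrees with a confident prediction.

    probs:  (n_samples, n_classes) out-of-fold predicted probabilities.
    labels: (n_samples,) given, possibly noisy, integer labels.
    Returns a boolean array; True marks a label worth down-weighting or dropping.
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    n_classes = probs.shape[1]
    # Per-class threshold: mean probability of class k over samples labeled k.
    # Classes with no samples keep a threshold of 1.0 (effectively never flagged).
    thresholds = np.ones(n_classes)
    for k in range(n_classes):
        mask = labels == k
        if mask.any():
            thresholds[k] = probs[mask, k].mean()
    pred = probs.argmax(axis=1)
    pred_conf = probs[np.arange(len(labels)), pred]
    # Suspect: the model prefers another class and is confident about it.
    return (pred != labels) & (pred_conf >= thresholds[pred])
```

In use, an image would be sent to a denoiser only when estimate_noise_sigma exceeds a cutoff tuned on a validation split, and flagged labels would be handled without human relabeling, for example by removing or down-weighting those samples. Keeping the two steps as separate functions also makes the ablation in part c) straightforward, since each can be switched on or off independently.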