Practice/xAI/Group Test GPU Nodes
CodingMust
You have n GPU nodes in a cluster, some of which are faulty. You need to identify all faulty nodes using a special group testing function.
You can call a test(S) function that tests a set of nodes:
test(S) returns True if all nodes in S are goodtest(S) returns False if S contains at least one faulty nodeConstraints:
|S| >= 2 — You cannot test a single node by itselfGoal: Design an algorithm to identify all faulty nodes as efficiently as possible.
If there's only 1 good node G among n nodes:
test({G, X}) = Falsetest({X, Y}) = FalseThis is a fundamental limitation of the group testing model with the |S| >= 2 constraint.
If we can find one known-good node G, we can test any other node X by calling test({G, X}):
test({G, X}) = True → X is goodtest({G, X}) = False → X is faultyThis reduces the problem to: How do we efficiently find one good node?
The first challenge is finding at least one good node. Use binary search to find a subset containing only good nodes, then pick any node from that subset as the reference.
` nodes = [0, 1, 2, 3, 4, 5, 6, 7] # Hidden: nodes 2 and 5 are faulty
test({0, 1, 2, 3, 4, 5, 6, 7}) # False (contains faulty nodes)
test({0, 1, 2, 3}) # False (contains node 2)
test({4, 5, 6, 7}) # False (contains node 5)
test({0, 1}) # True! Both are good, pick node 0 as reference `
Once you have a known-good node, test each remaining node against it to classify as good or faulty.
`
good_ref = 0 nodes = [1, 2, 3, 4, 5, 6, 7]
test({0, 1}) # True → node 1 is good test({0, 2}) # False → node 2 is faulty test({0, 3}) # True → node 3 is good test({0, 4}) # True → node 4 is good test({0, 5}) # False → node 5 is faulty test({0, 6}) # True → node 6 is good test({0, 7}) # True → node 7 is good
`