This question tests your intuition for distributed systems and your grasp of NumPy fundamentals. You are given starter code with a `Communicator` class that simulates inter-device communication in a distributed system. Your task is to implement distributed matrix multiplication using both the data parallel and the fully sharded data parallel (FSDP) approaches.
- Split `A` and `B` into `num_devices` chunks along the first dimension.
- Shard `A` and `B` across `num_devices` devices.
- `A` and `B` are square matrices of size `n x n`.
- `num_devices` is the number of devices available for parallel computation.

```python
import numpy as np

# Data parallel
A = np.random.rand(100, 100)
B = np.random.rand(100, 100)
C = data_parallel_matrix_multiplication(A, B, num_devices=4)
print(C)

# FSDP
A = np.random.rand(100, 100)
B = np.random.rand(100, 100)
C = fsdp_matrix_multiplication(A, B, num_devices=4)
print(C)
```
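The row-wise split in the requirements rests on a simple identity: row block `i` of `A @ B` equals the `i`-th row block of `A` multiplied by the whole of `B`. The sketch below checks that identity on a small example. The helper name `row_block_matmul` and the 8 x 8 sizes are illustrative assumptions, not part of the starter code.

```python
import numpy as np


def row_block_matmul(A, B, num_devices):
    """Hypothetical helper: compute A @ B one row block at a time."""
    # Each chunk holds n / num_devices consecutive rows of A.
    A_chunks = np.split(A, num_devices, axis=0)
    # Every row block needs the full B, not just a slice of it.
    row_blocks = [A_i @ B for A_i in A_chunks]
    # Stacking the row blocks in order rebuilds the full product.
    return np.concatenate(row_blocks, axis=0)


rng = np.random.default_rng(0)
A = rng.random((8, 8))
B = rng.random((8, 8))
C = row_block_matmul(A, B, num_devices=4)

# Compare against the direct product.
print(np.allclose(C, A @ B))
```

Note that multiplying a row chunk of `A` by a row chunk of `B` does not work: a `(n / num_devices, n)` block cannot be multiplied by another `(n / num_devices, n)` block unless `num_devices` is 1.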
- Use the `split` function to divide the matrices into chunks.
- Use the `concatenate` function to combine the results from all devices.

```python
import numpy as np


class Communicator:
    def __init__(self, num_devices):
        self.num_devices = num_devices

    def send(self, data, device_id):
        # Simulate sending data to a device
        return data

    def recv(self, device_id):
        # Simulate receiving data from a device
        pass


def data_parallel_matrix_multiplication(A, B, num_devices):
    comm = Communicator(num_devices)
    # np.split requires n to be divisible by num_devices.
    A_chunks = np.split(A, num_devices, axis=0)
    B_chunks = np.split(B, num_devices, axis=0)

    results = []
    for i in range(num_devices):
        A_i = A_chunks[i]
        # Row block i of the product needs all of B, so device i
        # receives every chunk of B and reassembles the full matrix.
        B_full = np.concatenate([comm.send(B_j, i) for B_j in B_chunks], axis=0)
        # (n / num_devices, n) @ (n, n) -> (n / num_devices, n)
        results.append(np.matmul(A_i, B_full))

    # Stack the row blocks to recover the n x n product.
    return np.concatenate(results, axis=0)


def fsdp_matrix_multiplication(A, B, num_devices):
    comm = Communicator(num_devices)
    # Each device stores one row shard of A and one of B;
    # np.array_split also handles n not divisible by num_devices.
    A_shards = np.array_split(A, num_devices, axis=0)
    B_shards = np.array_split(B, num_devices, axis=0)

    results = []
    for i in range(num_devices):
        A_i = A_shards[i]
        # All-gather: collect every shard of B on device i just for this product.
        B_full = np.concatenate([comm.send(B_j, i) for B_j in B_shards], axis=0)
        results.append(np.matmul(A_i, B_full))

    return np.concatenate(results, axis=0)
```
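The two functions use different NumPy splitting routines, and the difference matters when `n` is not a multiple of `num_devices`. A minimal sketch of the two behaviours, using a 10-element array and 4 shards as an arbitrary illustration:

```python
import numpy as np

rows = np.arange(10)

# np.array_split tolerates uneven division: the leading shards get one extra element.
shards = np.array_split(rows, 4)
shard_sizes = [len(s) for s in shards]
print(shard_sizes)  # [3, 3, 2, 2]

# np.split insists on equal-sized chunks and raises ValueError otherwise.
try:
    np.split(rows, 4)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True
```

Either routine works for the 100 x 100 example with 4 devices, since 100 divides evenly by 4.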
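This solution demonstrates a basic implementation of data parallel and FSDP matrix multiplication using NumPy. Because the loop visits the simulated devices one after another in a single process, it shows how the work is partitioned rather than delivering any actual speedup. You can further optimize the code by using more efficient communication primitives and parallelization techniques.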
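As one possible reading of "parallelization techniques", the per-device products could be dispatched to worker threads instead of a sequential loop. The sketch below is an assumption-laden illustration, not part of the original solution: `threaded_row_block_matmul` is a hypothetical name, each thread merely stands in for a device, and any speedup depends on the NumPy build and the matrix size.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_row_block_matmul(A, B, num_devices):
    """Hypothetical helper: one worker thread stands in for each device."""
    A_chunks = np.array_split(A, num_devices, axis=0)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # pool.map preserves input order, so row blocks come back in order.
        row_blocks = list(pool.map(lambda A_i: A_i @ B, A_chunks))
    return np.concatenate(row_blocks, axis=0)


rng = np.random.default_rng(0)
A = rng.random((100, 100))
B = rng.random((100, 100))
C_threaded = threaded_row_block_matmul(A, B, num_devices=4)

# Compare against the direct product.
print(np.allclose(C_threaded, A @ B))
```

Threads share memory, so `B` is not copied here. A real multi-device setup would have to move data explicitly, which is where the choice of communication primitive starts to matter.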