Implement a complete PyTorch training loop for a given neural network model and dataset. Your function should take a model, a PyTorch DataLoader for the training data, a loss function, an optimizer, and the number of epochs to train. Inside the loop you must: iterate over every batch, perform a forward pass to obtain predictions, compute the loss, back-propagate gradients, update the model parameters, and zero the gradients before the next batch. After each epoch, compute the average training loss; return these per-epoch averages. Make sure the model is set to training mode at the start. Also handle GPU placement when CUDA is available (move both the model and each batch of data to the current device). No validation loop or checkpointing is required for this exercise; focus only on the core training loop implementation.
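One possible solution is sketched below. The function name `train` is my choice, and I assume each batch from the DataLoader yields an `(inputs, targets)` pair; adapt the unpacking if your dataset is structured differently.

```python
import torch

def train(model, train_loader, loss_fn, optimizer, num_epochs):
    # Move the model to the GPU if CUDA is available, otherwise stay on CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.train()  # set training mode (affects dropout, batch norm, etc.)

    epoch_losses = []
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, targets in train_loader:
            # Move each batch to the same device as the model.
            inputs, targets = inputs.to(device), targets.to(device)

            optimizer.zero_grad()             # clear gradients from the previous step
            outputs = model(inputs)           # forward pass
            loss = loss_fn(outputs, targets)  # compute the batch loss
            loss.backward()                   # back-propagate gradients
            optimizer.step()                  # update model parameters

            running_loss += loss.item()
        # Average loss over the number of batches in this epoch.
        epoch_losses.append(running_loss / len(train_loader))
    return epoch_losses
```

Note that `zero_grad()` is called at the top of the batch loop rather than at the bottom; the two placements are equivalent here, since either way the gradients are cleared before the next `backward()` accumulates into them.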