You are given a pre-trained Transformer-based language model with frozen weights. Implement a parameter-efficient fine-tuning module using LoRA (Low-Rank Adaptation) for any linear projection inside the model. Specifically, for an original weight matrix W ∈ ℝ^{d×k}, do not update W directly; instead, inject a trainable low-rank update ΔW = B A, where B ∈ ℝ^{d×r}, A ∈ ℝ^{r×k}, and r ≪ min(d, k). During fine-tuning, only A and B receive gradients; W remains frozen.

After fine-tuning, support an optional merge step that permanently adds the scaled low-rank update to W, so that inference latency is identical to that of the original model.

Your implementation should expose:
(1) a forward pass that returns the adapted output;
(2) a merge() method that folds the LoRA weights into W;
(3) a state_dict() method that exports only the LoRA parameters (A, B, and the scaling factor α), so that different task adapters can be swapped in and out at runtime.

Support configurable rank r and scaling α. Demonstrate correctness by showing that, for the same input, (a) the module produces identical outputs (up to floating-point tolerance) before and after merge, since merge folds in exactly the scaled update; (b) the adapted output matches the frozen base model when α = 0; and (c) the outputs differ measurably after training on a toy task.
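The module described above can be sketched as follows. This is a minimal illustration, assuming PyTorch; the class name `LoRALinear` and the method name `lora_state_dict` (used instead of overriding `nn.Module.state_dict`, to avoid clashing with PyTorch's own serialization) are illustrative choices, and the scaling convention α/r follows the original LoRA paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update ΔW = B A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze W (and bias)
        d, k = base.out_features, base.in_features
        # A gets a small random init, B starts at zero, so ΔW = BA = 0
        # and the wrapped module initially matches the base model.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.r, self.alpha = r, alpha
        self.merged = False

    @property
    def scaling(self) -> float:
        return self.alpha / self.r           # conventional LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if not self.merged:
            # x @ (BA)^T, computed without materializing the d×k matrix
            out = out + (x @ self.A.t() @ self.B.t()) * self.scaling
        return out

    @torch.no_grad()
    def merge(self) -> None:
        """Fold the scaled update into W so inference uses one matmul."""
        if not self.merged:
            self.base.weight += (self.B @ self.A) * self.scaling
            self.merged = True

    def lora_state_dict(self) -> dict:
        # Export only the adapter parameters for runtime swapping.
        return {"A": self.A.detach().clone(),
                "B": self.B.detach().clone(),
                "alpha": self.alpha}
```

A quick check of the two invariants: a freshly constructed wrapper (B = 0) reproduces the base model exactly, and once the adapter holds nonzero values, calling `merge()` leaves the outputs unchanged up to floating-point tolerance.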