You are a senior ML engineer at NVIDIA. A product team wants to deploy a transformer-based language model for a customer-support chatbot that must run on a single A100 GPU with 40 GB of GPU memory and answer in under 200 ms. You have two candidate checkpoints:

(1) a 70B-parameter general-purpose model that achieves 82% F1 on the customer-support test set; and
(2) an 8B-parameter model that, after domain-specific fine-tuning, reaches 84% F1.

Your task is to design and code a complete hyperparameter-tuning and model-selection pipeline that decides which model to ship, and with which hyperparameters, so that the final system meets the latency, memory, and quality targets. The pipeline must:

- automatically explore learning rate, batch size, weight decay, warmup steps, and quantization choices;
- respect a fixed GPU-hour budget of 100 A100-hours; and
- output the Pareto-optimal configuration(s) together with predicted latency, peak memory, and F1 score.

Implement the pipeline in PyTorch + HuggingFace + Optuna. Provide a CLI that accepts the two checkpoints, the customer-support training/validation JSONL files, the GPU-hour budget, and the latency/memory thresholds, and prints the final recommendation.