Question-3: You have been assigned a new NLP (Natural Language Processing) task that does not have enough data for training a model. You are asked to work with the Microsoft Research Paraphrase Corpus dataset and to use the BERT (Bidirectional Encoder Representations from Transformers) uncased base model. Which of the following approaches would you use?
- Since we don’t have enough data to train the model, we can directly deploy the BERT model in AWS SageMaker.
- Since we don’t have enough data to train a model from scratch, we would use a transfer learning approach and fine-tune the pre-trained BERT model for this specific task.
- We can re-use the code for the BERT model and modify it to support the smaller training dataset.
- We need to build a new model from scratch, use the available training data to fine-tune it, deploy it to SageMaker, and re-train and optimize it further as new data arrives.
- You would ask the AWS SageMaker support team to build a new model for your training dataset.
Exp: New models are launched regularly in the field of Natural Language Processing, and more and more of them use advanced deep learning to increase performance. If we want to apply a newly developed NLP model, the best approach is to take a pre-trained language model and fine-tune it on a new dataset for a specific NLP task. This approach is known as transfer learning. It can significantly reduce the resources required for model training compared to training a model from scratch, and it can produce decent results even with small amounts of training data.
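The core idea of transfer learning can be shown with a minimal NumPy sketch: keep a "pre-trained" feature extractor frozen and train only a small task-specific head on the limited labeled data. This is an illustrative toy, not actual BERT; the frozen random "encoder", the synthetic dataset, and all names here are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor, standing in for BERT's encoder.
# In practice these weights come from large-scale pre-training; here they
# are fixed random weights purely to illustrate the mechanics.
W_pretrained = rng.normal(size=(10, 4))  # maps 10-dim input -> 4-dim features

def extract_features(X):
    """Frozen encoder: W_pretrained is never updated during fine-tuning."""
    return np.tanh(X @ W_pretrained)

# Small task-specific dataset (think: paraphrase / not-paraphrase pairs).
X_small = rng.normal(size=(40, 10))
y_small = (X_small[:, 0] + X_small[:, 1] > 0).astype(float)

# Fine-tune only a lightweight classification head (logistic regression).
w_head = np.zeros(4)
b_head = 0.0
lr = 0.5
feats = extract_features(X_small)
for _ in range(500):
    logits = feats @ w_head + b_head
    preds = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    grad = preds - y_small                  # gradient of the log loss
    w_head -= lr * feats.T @ grad / len(y_small)
    b_head -= lr * grad.mean()

accuracy = ((feats @ w_head + b_head > 0) == (y_small == 1)).mean()
print(f"training accuracy with only {len(y_small)} examples: {accuracy:.2f}")
```

Only the 5 head parameters are learned, which is why this works even with a tiny dataset; fine-tuning BERT follows the same principle at a much larger scale (and usually also updates the encoder weights with a small learning rate).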
With the rapid growth of NLP techniques, many NLP frameworks with packaged pre-trained language models have been developed to give users easy access to transfer learning. For example, ULMFiT by fast.ai and PyTorch-Transformers by Hugging Face are two popular NLP frameworks with pre-trained language models. You can fine-tune NLP models using PyTorch-Transformers in Amazon SageMaker and apply its built-in automatic model tuning capability. Generally, you would follow the approach below:
- Run the PyTorch-Transformers Git code, which can also run on your local machine, in Amazon SageMaker using the SageMaker PyTorch framework.
- Optimize hyperparameters using automatic model tuning in Amazon SageMaker.
- View the sensitivity of NLP models to hyperparameter values.
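The steps above can be sketched with the SageMaker Python SDK. This is a hedged outline, not a runnable script: it assumes an AWS account and execution role, and the entry point name `train.py`, the role ARN, the metric regex, and the S3 paths are placeholders, not values from the question.

```python
# Sketch only: requires an AWS account, a SageMaker execution role, and the
# sagemaker Python SDK; all angle-bracketed values are placeholders.
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

role = "arn:aws:iam::<account-id>:role/<sagemaker-role>"  # placeholder

# Step 1: wrap the PyTorch-Transformers fine-tuning script in a
# SageMaker PyTorch estimator.
estimator = PyTorch(
    entry_point="train.py",       # your fine-tuning script (assumption)
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    hyperparameters={"epochs": 3, "model_name": "bert-base-uncased"},
)

# Step 2: let automatic model tuning search the learning rate.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-5, 5e-5)},
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "eval_accuracy = ([0-9\\.]+)"}],
    max_jobs=8,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://<bucket>/mrpc/train"})  # placeholder S3 path
```

Step 3 (viewing hyperparameter sensitivity) is then a matter of comparing the objective metric across the tuning jobs that the tuner launched, for example in the SageMaker console or via the tuner's analytics.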