This post will be put aside for now. It is quite easy to use Hugging Face with SageMaker now that Amazon and Hugging Face have made a partnership. You should only use Hugging Face with PyTorch Lightning after you've exhausted the simpler approaches that only involve Hugging Face (or if you really need a custom model, like a multi-modal model).

Understanding Model Deployment on SageMaker

Before I dive into the nitty-gritty details of how to use Hugging Face and PyTorch Lightning in SageMaker, I'm going to give a general overview of SageMaker. If you only want to know how to use Hugging Face and PyTorch Lightning, feel free to skip this section.

Quick Overview of SageMaker

SageMaker allows AWS users to create a notebook instance in the cloud. SageMaker tries to make it easy for data scientists and machine learning engineers to train and deploy their machine learning models in production. Therefore, if you are comfortable using Jupyter Notebooks for data science, SageMaker will be great for you. However, there are some pros and cons.


Pros:

  • SageMaker has a great surrounding ecosystem that allows you to train, debug, and deploy your models, as well as many other useful tools for machine learning in production.
  • It makes it easier to do some tests and sanity-checks when you are doing model development in the cloud.
  • The SageMaker Python SDK makes it easy to do things like A/B tests. In other words, when you create or update an endpoint, you can easily split the traffic going to one endpoint across several models (and by different percentages/weights). You can easily change the endpoint configuration to switch to the best model over time.
  • The SDK also makes it easy to load and unload models dynamically as needed (multi-model endpoints). If you use this in place of creating an endpoint for every model you put in production, you can save a heck of a lot of money. For example, if you have 1000 clients who all have a different model fine-tuned on their specific data, you can go from 171k to 1k US dollars per month!
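The traffic-splitting idea above can be sketched with boto3. This is a minimal sketch, not a complete deployment: the endpoint, config, and model names are placeholders, and the actual AWS calls (which require credentials and already-registered models) are shown commented out.

```python
# Two models behind one endpoint with a 90/10 traffic split.
# Model/endpoint names are placeholders for illustration.
variants = [
    {
        "VariantName": "current-model",
        "ModelName": "my-model-v1",       # placeholder: an existing SageMaker model
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.9,      # ~90% of traffic
    },
    {
        "VariantName": "candidate-model",
        "ModelName": "my-model-v2",       # placeholder
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.1,      # ~10% of traffic
    },
]

# Each variant's traffic share is its weight divided by the sum of all weights.
total = sum(v["InitialVariantWeight"] for v in variants)
shares = {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
print(shares)

# The real calls would look like this (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(EndpointConfigName="ab-test-config",
#                           ProductionVariants=variants)
# sm.create_endpoint(EndpointName="ab-test-endpoint",
#                    EndpointConfigName="ab-test-config")
# Later, shift all traffic to the winner without recreating the endpoint:
# sm.update_endpoint_weights_and_capacities(
#     EndpointName="ab-test-endpoint",
#     DesiredWeightsAndCapacities=[
#         {"VariantName": "candidate-model", "DesiredWeight": 1.0},
#         {"VariantName": "current-model", "DesiredWeight": 0.0},
#     ],
# )
```

Because the weights are relative rather than required to sum to one, you can ramp a new model up gradually just by nudging its weight over time.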
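The cost claim above is easy to sanity-check with back-of-the-envelope arithmetic. The instance type and hourly price below are assumptions chosen for illustration (they roughly reproduce the ~171k figure), not current AWS pricing; a real multi-model endpoint would likely need more than one instance to serve 1000 models under load, which is where the ~1k figure comes from.

```python
# Rough cost comparison: 1000 dedicated single-model endpoints vs. one
# multi-model endpoint hosting all 1000 models on shared instances.
HOURLY_PRICE = 0.238      # assumed $/hour per hosting instance (not real pricing)
HOURS_PER_MONTH = 720     # 30 days, always-on

n_models = 1000

# One endpoint per model: every model pays for its own always-on instance.
dedicated_cost = n_models * HOURLY_PRICE * HOURS_PER_MONTH

# One multi-model endpoint: all models share a single instance,
# with model artifacts loaded from S3 on demand.
multi_model_cost = 1 * HOURLY_PRICE * HOURS_PER_MONTH

print(f"dedicated:   ${dedicated_cost:,.0f}/month")
print(f"multi-model: ${multi_model_cost:,.0f}/month")
```

The trade-off is latency: a model that isn't already loaded on the instance must be fetched from S3 before it can serve its first request.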


Cons:

  • AWS charges a premium for the usefulness of the notebooks and the surrounding infrastructure.
  • Since you need to use the SageMaker Python SDK to run your code, your local notebooks need to be updated to run in SageMaker. That means, depending on the task, it may be preferable to simply put all your code in scripts, spin up an EC2 instance, and deploy with Docker.

All in all, I would say that the best time to use SageMaker is when you are doing a lot of experimentation, you think notebooks will help you better explore the data, and you need additional cloud compute resources.

AWS also has a lot of sample notebooks for deployment, so it may be worth checking them out (particularly if you are planning to deploy similar models). Building on top of the work of others could also save you some time.