
AWS ML Specialty Notes (Part 4.4)

This post gives a quick review of recommending and implementing the appropriate machine learning services and features for a given problem.

Deploy and operationalize machine learning solutions

  • use Inference Recommender to deploy your model to a real-time inference endpoint that delivers the best performance at the lowest cost
  • deploy your model to SageMaker hosting services and get an endpoint
    • can be used for inference
    • fully managed and support autoscaling
    • scoped to an individual AWS account, not public
  • SageMaker determines the account ID from the authentication token that is supplied by the caller
  • to set up and deploy a web service that you can call from a client application outside the scope of your account, you can call a SageMaker model endpoint through Amazon API Gateway and AWS Lambda (sketched after this list)
  • S3 bucket where the model artifacts are stored must be in the same region as the model
    • models are stored as model.tar.gz in the S3 bucket specified in OutputDataConfig S3OutputPath parameter of the create_training_job call
    • When model.tar.gz is untarred, it contains model_algo-1, which is a serialized Apache MXNet object
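
A minimal sketch of the Lambda side of the API Gateway + Lambda pattern mentioned above, assuming a proxy integration and an already-deployed endpoint (the endpoint name and CSV payload format are hypothetical):

```python
import json
import os

import boto3

# SageMaker runtime client used to invoke the hosted endpoint
runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name, passed in via a Lambda environment variable
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]


def lambda_handler(event, context):
    # With an API Gateway proxy integration, the request body arrives as a string
    payload = event["body"]

    # Forward the payload to the SageMaker endpoint (CSV is just an example format)
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )

    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```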

Exposing endpoints and interacting with them

Steps (a minimal boto3 sketch follows the list):

  1. Create a SageMaker model in SageMaker
  2. Create an endpoint configuration for an HTTPS endpoint
    • defines names of models in production (variants) and the ML compute instances that you want SageMaker to launch to host each production variant
    • you can configure the endpoint to elastically scale the deployed ML compute instances
    • when you specify two or more instances, SageMaker launches them in multiple AZs
  3. Create an HTTPS endpoint
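
A rough outline of those three calls with boto3; the container image, model artifact path, and role ARN below are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Create a SageMaker model (inference container image + model artifacts)
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/my-image:latest",
        "ModelDataUrl": "s3://my-bucket/output/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/MySageMakerRole",
)

# 2. Create an endpoint configuration listing the production variant(s)
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 2,  # two or more instances are launched across AZs
        }
    ],
)

# 3. Create the HTTPS endpoint from the endpoint configuration
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)
```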

You can:

  1. Host a single model
  2. Host multiple models/variants in one container behind one endpoint
  3. Host multiple models/variants which use different containers behind one endpoint
  4. Host models along with pre-processing logic as a serial inference pipeline behind one endpoint
    • an inference pipeline is an Amazon SageMaker model composed of a linear sequence of two to fifteen containers that process inference requests
    • can combine preprocessing, predictions, and post-processing data science tasks
    • you can use Spark and scikit-learn preprocessors to transform your data
    • fully managed

Inference pipeline

  • SageMaker model that is composed of a linear sequence of 2 to 15 containers that process requests for inferences on data
  • define and deploy any combination of pretrained SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers
  • combine preprocessing, predictions, and post-processing data science tasks
  • fully managed
  • within an inference pipeline model, Amazon SageMaker handles invocations as a sequence of HTTP requests
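
A minimal sketch of deploying such a pipeline with the SageMaker Python SDK, assuming a pre-trained preprocessing container and an XGBoost-style prediction container; the image URIs, artifact paths, and role ARN are placeholders:

```python
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

role = "arn:aws:iam::<account>:role/MySageMakerRole"  # placeholder

# Container 1: preprocessing model (image URI and artifact path are placeholders)
preprocessor = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/sklearn-preprocessor:latest",
    model_data="s3://my-bucket/preprocessor/model.tar.gz",
    role=role,
)

# Container 2: prediction model, e.g. XGBoost (paths are placeholders)
predictor = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/xgboost-inference:latest",
    model_data="s3://my-bucket/xgboost/model.tar.gz",
    role=role,
)

# Chain the two containers into one SageMaker model behind a single endpoint
pipeline = PipelineModel(
    name="preprocess-then-predict",
    role=role,
    models=[preprocessor, predictor],
)

pipeline.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="inference-pipeline-endpoint",
)
```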

SageMaker Neo

  • capability of Amazon SageMaker that enables machine learning models to train once and run anywhere in the cloud and at the edge
  • currently supports image classification models exported as frozen graphs from TensorFlow, MXNet, or PyTorch, and XGBoost models
  • three advantages of using Neo with Amazon SageMaker models are:
    1. run ML models with up to 2x better performance
    2. reduce framework size by 10x
    3. run the same ML model on multiple hardware platforms
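
A rough sketch of starting a Neo compilation job with boto3, assuming an MXNet image-classification artifact in S3; the job name, paths, input shape, and target device are illustrative:

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_compilation_job(
    CompilationJobName="image-classifier-neo",            # placeholder name
    RoleArn="arn:aws:iam::<account>:role/MySageMakerRole",
    InputConfig={
        "S3Uri": "s3://my-bucket/training/model.tar.gz",   # hypothetical artifact
        "DataInputConfig": '{"data": [1, 3, 224, 224]}',   # input tensor name and shape
        "Framework": "MXNET",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled/",
        "TargetDevice": "ml_c5",                           # target hardware platform
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```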

IoT Greengrass

  • open-source edge runtime and cloud service for building, deploying, and managing device software
  • provides pre-built components so you can easily extend edge device functionality without writing code

ML model versioning

A/B testing

  • SageMaker enables you to test multiple models or model versions behind the same endpoint using production variants
  • you can test ML models that have been trained using different datasets, trained using different algorithms and ML frameworks, or deployed to different instance types, or any combination of these
  • you can distribute endpoint invocation requests across multiple production variants by either:
    1. providing the traffic distribution (weight) for each variant, or
    2. invoking a specific variant directly for each request
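
A minimal sketch of both options with boto3, assuming two models have already been created; the names, weights, and payload are illustrative:

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# Option 1: split traffic 70/30 between two production variants behind one endpoint
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "VariantA",
            "ModelName": "model-a",            # existing model (placeholder)
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.7,
        },
        {
            "VariantName": "VariantB",
            "ModelName": "model-b",            # existing model (placeholder)
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.3,
        },
    ],
)
sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")

# Option 2: route a single request to a specific variant, bypassing the traffic weights
response = runtime.invoke_endpoint(
    EndpointName="ab-test-endpoint",
    TargetVariant="VariantB",
    ContentType="text/csv",
    Body="1.0,2.0,3.0",                        # hypothetical payload
)
```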

To configure automatic scaling for a production variant, the recommended target metric value is:

SageMakerVariantInvocationsPerInstance = (MAX_RPS * SAFETY_FACTOR) * 60

AWS recommends SAFETY_FACTOR = 0.5
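For example, if load testing shows that a single instance of a variant can sustain MAX_RPS = 20 requests per second, the target value would be (20 * 0.5) * 60 = 600 invocations per minute per instance.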

https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-best-practices.html

Retrain pipelines

ML debugging/troubleshooting

Detect and mitigate drop in performance

Monitor performance of the model

This post is licensed under CC BY 4.0 by the author.