Building an MLOps strategy from the ground up

Crunch 2022

Isabel Zimmerman, RStudio, PBC

September 20, 2022

from the ground up

Tip

go to isabel.quarto.pub/crunch2022/ to follow along!

  • what is MLOps anyway (and how can I start)? 🐶
  • what do I need to keep in mind for tooling? 🔨

if you develop models…

you can operationalize them

if you develop models…

you should operationalize them

well, some of them

information -> 🐶 -> actions

information -> model -> actions

What is MLOps?

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

import pandas as pd
raw = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')
df = raw[["like_count", "funny", "show_product_quickly", "patriotic", \
    "celebrity", "danger", "animals"]].dropna()
print(df)
     like_count  funny  show_product_quickly  patriotic  celebrity  danger  \
0        1233.0  False                 False      False      False   False   
1         485.0   True                  True      False       True    True   
2         129.0   True                 False      False      False    True   
3           2.0  False                  True      False      False   False   
4          20.0   True                  True      False      False    True   
..          ...    ...                   ...        ...        ...     ...   
241        10.0   True                 False       True       True   False   
243       572.0  False                  True       True      False   False   
244        14.0   True                 False      False       True    True   
245        12.0   True                 False      False      False    True   
246       334.0  False                 False      False       True   False   

     animals  
0      False  
1      False  
2       True  
3      False  
4       True  
..       ...  
241     True  
243     True  
244    False  
245    False  
246    False  

[225 rows x 7 columns]

import pandas as pd
from sklearn import model_selection, preprocessing, pipeline, ensemble

# load the TidyTuesday Super Bowl ads data; keep the target plus the boolean features
raw = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')
df = raw[["like_count", "funny", "show_product_quickly", "patriotic",
    "celebrity", "danger", "animals"]].dropna()

# hold out 20% of rows for testing; like_count is the target
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns = ['like_count']),
    df['like_count'],
    test_size=0.2
)

# encode the boolean features as numbers, then fit a random forest on them
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)

# bundle the fitted steps so one object can be versioned and deployed
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])
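Since every feature here is a boolean, the `OrdinalEncoder` step essentially turns `False`/`True` into numbers the random forest can split on. Below is a minimal stdlib sketch of that idea. It assumes scikit-learn's default behavior: during fitting, learn each column's sorted unique values, then replace each value with its index in that list. `fit_ordinal` and `transform_ordinal` are hypothetical helpers, not scikit-learn functions.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fit_ordinal(rows: Sequence[Row]) -> Dict[str, List[object]]:
    """Learn the sorted unique categories of every column."""
    columns = list(rows[0].keys())
    return {col: sorted({row[col] for row in rows}) for col in columns}


def transform_ordinal(rows: Sequence[Row], categories: Dict[str, List[object]]) -> List[List[float]]:
    """Replace each value with its position in the learned category list."""
    return [
        [float(categories[col].index(row[col])) for col in categories]
        for row in rows
    ]


train = [
    {"funny": False, "animals": True},
    {"funny": True, "animals": False},
    {"funny": True, "animals": True},
]
categories = fit_ordinal(train)
print(categories)                            # {'funny': [False, True], 'animals': [False, True]}
print(transform_ordinal(train, categories))  # [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
```

As with the default encoder, a value not seen during fitting raises an error, in this sketch a `ValueError` from `list.index`.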

MLOps is… versioning

model

model_final

model_final_v2

model_final_v2_ACTUALLY

managing change in models

library(vetiver)
library(pins)

model_board <- board_temp()

# `model` is a fitted R model, e.g. a tidymodels workflow
v <- vetiver_model(model, "ads")

model_board |>
  vetiver_pin_write(v)

import pins

model_board = pins.board_temp(
    allow_pickle_read = True)

from vetiver import VetiverModel, vetiver_pin_write

v = VetiverModel(rf_pipe, "ads", ptype_data = X_train)

vetiver_pin_write(model_board, v)

model_board.pin_meta("ads")
Meta(title='ads: a pinned Pipeline object', description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model", created='20221003T221304Z', pin_hash='612d4b523ca8c0ef', file='ads.joblib', file_size=432866, type='joblib', api_version=1, version=Version(created=datetime.datetime(2022, 10, 3, 22, 13, 4), hash='612d4'), name='ads', user={'ptype': '{"funny": true, "show_product_quickly": true, "patriotic": false, "celebrity": false, "danger": false, "animals": false}', 'required_pkgs': ['vetiver', 'scikit-learn']})
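The metadata above records each write as a `Version` with a creation time and a content hash. This is what replaces the `model_final_v2_ACTUALLY` naming scheme. Below is a hypothetical, stdlib-only sketch of that idea, not how pins actually stores pins. SHA-256 stands in for whatever hash pins uses, and `ModelVersion`, `make_version`, and `VersionedStore` are invented names.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ModelVersion:
    created: datetime
    content_hash: str

    @property
    def name(self) -> str:
        # timestamp first, so version names sort chronologically as plain strings
        return f"{self.created:%Y%m%dT%H%M%SZ}-{self.content_hash[:5]}"


def make_version(model_bytes: bytes, created: datetime) -> ModelVersion:
    # SHA-256 is a stand-in hash; identical bytes always give the same digest
    digest = hashlib.sha256(model_bytes).hexdigest()[:16]
    return ModelVersion(created=created, content_hash=digest)


class VersionedStore:
    """Hypothetical in-memory store that keeps every version of each named model."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[ModelVersion, bytes]]] = {}

    def write(self, name: str, model_bytes: bytes,
              created: Optional[datetime] = None) -> ModelVersion:
        if created is None:
            created = datetime.now(timezone.utc)
        version = make_version(model_bytes, created)
        self._versions.setdefault(name, []).append((version, model_bytes))
        return version

    def versions(self, name: str) -> List[str]:
        return [v.name for v, _ in self._versions.get(name, [])]

    def read(self, name: str, version: Optional[str] = None) -> bytes:
        entries = self._versions[name]
        if version is None:
            return entries[-1][1]  # default to the most recent write
        for v, data in entries:
            if v.name == version:
                return data
        raise KeyError(f"{name} has no version {version}")


store = VersionedStore()
v1 = store.write("ads", b"model-v1", datetime(2022, 10, 3, 22, 13, 4, tzinfo=timezone.utc))
v2 = store.write("ads", b"model-v2", datetime(2022, 10, 4, 9, 0, 0, tzinfo=timezone.utc))
print(store.versions("ads"))
```

Old versions are never overwritten. Rolling back means reading an earlier version name rather than guessing which file is "final".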

where are these boards hosted?

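import pins

# every board shares the same pin read/write interface; pick one backend
model_board = pins.board_temp(allow_pickle_read = True)     # temporary folder, removed when the session ends
# model_board = pins.board_local(allow_pickle_read = True)  # persistent folder on this machine

# cloud boards take a bucket/container path; these paths are placeholders
# model_board = pins.board_s3("my-bucket/models", allow_pickle_read = True)
# model_board = pins.board_gcs("my-bucket/models", allow_pickle_read = True)
# model_board = pins.board_azure("my-container/models", allow_pickle_read = True)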

MLOps is… deploying

putting a model in production somewhere that is not on your local laptop

✅ using REST APIs

from vetiver import VetiverModel, VetiverAPI, vetiver_pin_write
v = VetiverModel(rf_pipe, "ads", ptype_data = X_train) 

VetiverAPI(v).run()
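`VetiverAPI(v).run()` starts a local web server that answers prediction requests for the model. The stdlib-only sketch below shows what such an endpoint does conceptually. It checks incoming JSON records against a prototype of the training columns, similar in spirit to the `ptype` stored in the pin metadata above, and returns predictions. The `/predict` route, `predict_handler`, and `toy_model` are hypothetical, and this is not vetiver's implementation. The server is only constructed on demand and never started, so the block runs without blocking.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# prototype of one input row: column name -> expected Python type
PTYPE = {"funny": bool, "show_product_quickly": bool, "patriotic": bool,
         "celebrity": bool, "danger": bool, "animals": bool}


def toy_model(row):
    """Stand-in for the real model: a made-up score, NOT a trained prediction."""
    return 100.0 + 50.0 * sum(row[col] for col in PTYPE)


def predict_handler(body: bytes):
    """Validate a JSON list of records and return (status_code, response_dict)."""
    try:
        records = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(records, list):
        return 400, {"error": "expected a list of records"}
    for i, row in enumerate(records):
        if not isinstance(row, dict) or set(row) != set(PTYPE):
            return 422, {"error": f"record {i} must have columns {sorted(PTYPE)}"}
        for col, typ in PTYPE.items():
            if not isinstance(row[col], typ):
                return 422, {"error": f"record {i}: {col} must be {typ.__name__}"}
    return 200, {"predict": [toy_model(row) for row in records]}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = predict_handler(self.rfile.read(length))
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=8080):
    # call make_server().serve_forever() to actually listen for requests
    return HTTPServer(("127.0.0.1", port), PredictHandler)


example = [{col: col == "funny" for col in PTYPE}]
print(predict_handler(json.dumps(example).encode()))  # (200, {'predict': [150.0]})
```

The validation logic is kept separate from the HTTP plumbing, so it can be tested without opening a port. Rejecting malformed input early is also the job the saved prototype data does for a real deployed model.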

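import vetiver
from rsconnect.api import RSConnectServer
# placeholder URL and key for your RStudio Connect server; model_board must be a board it can read
connect_server = RSConnectServer(url = "https://connect.example.com", api_key = "YOUR_API_KEY")
vetiver.deploy_rsconnect(
    connect_server = connect_server,
    board = model_board,
    pin_name = "ads",
    version = "59869")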

# write an app.py that serves the pinned model, then a Dockerfile that runs it
vetiver.write_app(board=model_board, pin_name="ads")
vetiver.write_docker(app_file="app.py")

MLOps is… monitoring

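import vetiver
from sklearn import metrics
from datetime import timedelta

# new_data: recent observations with a "date" column, the true "like_count",
# and the model's predictions in "preds"
metric_set = [metrics.mean_absolute_error, metrics.mean_squared_error]

# named td_metrics so it does not shadow the sklearn `metrics` module
td_metrics = vetiver.compute_metrics(
    new_data, 
    "date", 
    timedelta(weeks = 1), 
    metric_set, 
    "like_count", 
    "preds"
    )

m = vetiver.plot_metrics(td_metrics)
m.update_yaxes(matches=None)  # give each metric panel its own y-axis scale
m.show()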
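Conceptually, computing metrics over time splits the scored data into windows of length `period` and evaluates every metric inside each window. Here is a stdlib sketch of that bucketing. It assumes rows are `(date, truth, estimate)` tuples and windows are counted from a chosen start date. `vetiver.compute_metrics` may align windows differently, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Callable, Iterable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_absolute_error(truth: Sequence[float], estimate: Sequence[float]) -> float:
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def mean_squared_error(truth: Sequence[float], estimate: Sequence[float]) -> float:
    return sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)


def metrics_by_period(rows: Iterable[Tuple[date, float, float]], start: date,
                      period: timedelta, metric_fns: List[Metric]):
    """Return (period_start, metric_name, n, value) tuples, sorted by period."""
    buckets = defaultdict(lambda: ([], []))
    for day, truth, estimate in rows:
        index = (day - start) // period  # which window this day falls in
        truths, estimates = buckets[index]
        truths.append(truth)
        estimates.append(estimate)
    results = []
    for index in sorted(buckets):
        truths, estimates = buckets[index]
        period_start = start + index * period
        for fn in metric_fns:
            results.append((period_start, fn.__name__, len(truths), fn(truths, estimates)))
    return results


rows = [
    (date(2022, 9, 1), 100.0, 90.0),
    (date(2022, 9, 3), 200.0, 230.0),
    (date(2022, 9, 9), 50.0, 50.0),
]
for record in metrics_by_period(rows, date(2022, 9, 1), timedelta(weeks=1),
                                [mean_absolute_error, mean_squared_error]):
    print(record)
```

The first week holds two rows (MAE 20.0, MSE 500.0), and the second holds one perfect prediction. Keeping `n` per window matters, because a metric computed from very few rows is noisy.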

when things go wrong:

  • retrain, retrain, retrain
  • try a new model type
  • remember to version!
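Acting on that list requires deciding what "going wrong" means. The sketch below encodes one hypothetical rule that does not come from the talk. It flags retraining when each of the last few weekly MAEs is worse than a baseline MAE by more than a tolerance. `should_retrain`, the 25% tolerance, and the two-week window are illustrative choices to tune per model.

```python
from typing import Sequence


def should_retrain(weekly_mae: Sequence[float], baseline_mae: float,
                   tolerance: float = 0.25, window: int = 2) -> bool:
    """Return True when each of the last `window` weekly MAEs is more than
    `tolerance` (a fraction, 0.25 = 25%) worse than the baseline MAE."""
    if len(weekly_mae) < window:
        return False  # not enough monitoring history to judge yet
    threshold = baseline_mae * (1 + tolerance)
    return all(mae > threshold for mae in weekly_mae[-window:])


print(should_retrain([18.0, 22.0, 26.0, 27.0], baseline_mae=20.0))  # True
print(should_retrain([18.0, 26.0, 24.0], baseline_mae=20.0))        # False
```

Requiring several consecutive bad weeks avoids retraining on a single noisy week. When the rule does fire, the retrained model is written as a new version of the same pin rather than overwriting the old one.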

MLOps is… thinking about making good models

vetiver.vetiver_pin_write(model_board, v)
Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()

vetiver.model_card()

From Mitchell et al. (2019):

Therefore the usefulness and accuracy of a model card relies on the integrity of the creator(s) of the card itself.

from the ground up

  • what is MLOps anyway (and how can I start)? 🐶 ✅
  • what do I need to keep in mind for tooling? 🔨

Tooling tips…

you (and your team!) are unique!

  • composable
    • in different environments
    • with other tools
  • reproducible
    • open source 💖
  • ergonomic

(and able to do the MLOps tasks we want)

from the ground up

  • what is MLOps anyway (and how can I start)? 🐶 ✅
  • what do I need to keep in mind for tooling? 🔨 ✅

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

  • versioning

  • deploying

  • monitoring

vetiver can help with this for your R and Python models!

Learn more