Building an MLOps strategy from the ground up

Crunch 2022

Isabel Zimmerman, RStudio, PBC

September 20, 2022

from the ground up

Tip

go to isabel.quarto.pub/crunch2022/ to follow along!

  • what is MLOps anyway (and how can I start)? 🐶
  • what do I need to keep in mind for tooling? 🔨

if you develop models…

you can operationalize them

if you develop models…

you should operationalize them

well, some of them

information -> 🐶 -> actions

information -> model -> actions

What is MLOps?

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

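import pandas as pd

# Super Bowl ads data from TidyTuesday (2021-03-02)
raw = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv')
# keep the target (like_count) and the boolean ad features; drop rows with missing values
df = raw[["like_count", "funny", "show_product_quickly", "patriotic",
    "celebrity", "danger", "animals"]].dropna()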

print(df)
     like_count  funny  show_product_quickly  patriotic  celebrity  danger  \
0        1233.0  False                 False      False      False   False   
1         485.0   True                  True      False       True    True   
2         129.0   True                 False      False      False    True   
3           2.0  False                  True      False      False   False   
4          20.0   True                  True      False      False    True   
..          ...    ...                   ...        ...        ...     ...   
241        10.0   True                 False       True       True   False   
243       572.0  False                  True       True      False   False   
244        14.0   True                 False      False       True    True   
245        12.0   True                 False      False      False    True   
246       334.0  False                 False      False       True   False   

     animals  
0      False  
1      False  
2       True  
3      False  
4       True  
..       ...  
241     True  
243     True  
244    False  
245    False  
246    False  

[225 rows x 7 columns]

from sklearn import model_selection, preprocessing, pipeline, ensemble
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns = ['like_count']),
    df['like_count'],
    test_size=0.2
)

oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)

rf_pipe = pipeline.Pipeline([('ordinal_encoder',oe), ('random_forest', rf)])

MLOps is… versioning

model

model_final

model_final_v2

model_final_v2_ACTUALLY

managing change in models

import pins

model_board = pins.board_temp(
    allow_pickle_read = True)

from vetiver import VetiverModel, vetiver_pin_write

v = VetiverModel(rf_pipe, "ads", ptype_data = X_train)

vetiver_pin_write(model_board, v)

# the same steps in R
library(vetiver)
library(pins)

model_board <- board_temp()

v <- vetiver_model(model, "ads")

model_board %>% 
  vetiver_pin_write(v)

model_board.pin_meta("ads")
Meta(title='ads: a pinned Pipeline object', description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model", created='20221003T221304Z', pin_hash='612d4b523ca8c0ef', file='ads.joblib', file_size=432866, type='joblib', api_version=1, version=Version(created=datetime.datetime(2022, 10, 3, 22, 13, 4), hash='612d4'), name='ads', user={'ptype': '{"funny": true, "show_product_quickly": true, "patriotic": false, "celebrity": false, "danger": false, "animals": false}', 'required_pkgs': ['vetiver', 'scikit-learn']})
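
The metadata above records a creation timestamp and a content hash for each version. As a rough illustration of that idea (not how pins is implemented), the sketch below stores every write under a name built from a timestamp and a hash of the bytes, using only the standard library. `write_version`, `list_versions`, and `read_version` are hypothetical helper names, and the stored dictionaries stand in for real models.

```python
import hashlib
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def write_version(board_dir: Path, name: str, obj, created: Optional[datetime] = None) -> str:
    """Store obj as a new version of `name` and return the version id."""
    payload = pickle.dumps(obj)
    digest = hashlib.sha256(payload).hexdigest()[:5]
    created = created or datetime.now(timezone.utc)
    version = f"{created.strftime('%Y%m%dT%H%M%SZ')}-{digest}"
    target = board_dir / name / version
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.pkl").write_bytes(payload)
    return version


def list_versions(board_dir: Path, name: str) -> List[str]:
    """Return version ids oldest first (the timestamp prefix sorts lexically)."""
    return sorted(p.name for p in (board_dir / name).iterdir() if p.is_dir())


def read_version(board_dir: Path, name: str, version: Optional[str] = None):
    """Load a specific version, or the newest one when version is None."""
    if version is None:
        version = list_versions(board_dir, name)[-1]
    return pickle.loads((board_dir / name / version / "model.pkl").read_bytes())


board_dir = Path(tempfile.mkdtemp())

# Explicit timestamps keep the example deterministic.
v1 = write_version(board_dir, "ads", {"n_estimators": 100},
                   created=datetime(2022, 10, 3, 22, 13, 4, tzinfo=timezone.utc))
v2 = write_version(board_dir, "ads", {"n_estimators": 500},
                   created=datetime(2022, 10, 4, 9, 0, 0, tzinfo=timezone.utc))

latest = read_version(board_dir, "ads")      # newest version by default
first = read_version(board_dir, "ads", v1)   # or pin to an exact version
```

Nothing is ever overwritten, so an older model can always be read back by its version id. That is the property that replaces file names like `model_final_v2_ACTUALLY`.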

where are these boards hosted?

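import pins

# pick one board; the rest of the code stays the same

# a temporary directory, for demos and testing
model_board = pins.board_temp(
    allow_pickle_read = True)

# a directory on this machine
model_board = pins.board_local(
    allow_pickle_read = True)

# Amazon S3 ("my-bucket/models" is a placeholder path)
model_board = pins.board_s3(
    "my-bucket/models",
    allow_pickle_read = True)

# Google Cloud Storage (placeholder path)
model_board = pins.board_gcs(
    "my-bucket/models",
    allow_pickle_read = True)

# Azure storage (placeholder path)
model_board = pins.board_azure(
    "my-container/models",
    allow_pickle_read = True)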

MLOps is… deploying

putting a model in production

putting a model in production somewhere that is not on your local laptop

✅ using REST APIs

from vetiver import VetiverModel, VetiverAPI, vetiver_pin_write
v = VetiverModel(rf_pipe, "ads", ptype_data = X_train) 

VetiverAPI(v).run()
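
`VetiverAPI(v).run()` starts a web server that answers prediction requests over HTTP. To show the general shape of such a service without a web framework, here is a minimal standard-library sketch. `ToyModel`, the `PTYPE` dictionary, and the handler are hypothetical stand-ins, not vetiver's implementation; the `/predict` path and the response format are assumptions made for the example. The input check plays a role similar to the `ptype` prototype saved with the model.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical input prototype: expected column names and types.
PTYPE = {"funny": bool, "celebrity": bool, "animals": bool}


class ToyModel:
    """Stand-in for a trained model: scores rows of boolean features."""

    def predict(self, rows):
        return [100.0 + 50.0 * sum(row.values()) for row in rows]


MODEL = ToyModel()


def validate(rows):
    """Reject rows whose keys or value types differ from the prototype."""
    for row in rows:
        if set(row) != set(PTYPE):
            raise ValueError(f"expected columns {sorted(PTYPE)}")
        for key, expected in PTYPE.items():
            if not isinstance(row[key], expected):
                raise ValueError(f"{key} must be {expected.__name__}")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = json.loads(self.rfile.read(length))
            validate(rows)
            status, body = 200, {"predict": MODEL.predict(rows)}
        except (ValueError, TypeError) as err:  # bad JSON or bad input
            status, body = 422, {"error": str(err)}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the example quiet
        pass


def request_predictions(url, rows):
    """POST rows as JSON and return the decoded response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(rows).encode(),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler makes the request go straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return json.loads(resp.read())


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/predict"

response = request_predictions(
    url, [{"funny": True, "celebrity": False, "animals": True}]
)
server.shutdown()
server.server_close()
```

Because the client only needs a URL and JSON, the model can move from a laptop to any server without the caller changing anything but the address.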

vetiver.deploy_rsconnect(
    connect_server = connect_server, 
    board = model_board, 
    pin_name = "ads", 
    version = "59869")

vetiver.write_app(board=model_board, pin_name="ads")
vetiver.write_docker(app_file="app.py")

MLOps is… monitoring

import vetiver
from sklearn import metrics
from datetime import timedelta

metric_set = [metrics.mean_absolute_error, metrics.mean_squared_error]

# new_data needs a "date" column, the observed "like_count",
# and the model's predictions in "preds"
metrics_over_time = vetiver.compute_metrics(
    new_data,
    "date",
    timedelta(weeks = 1),
    metric_set,
    "like_count",
    "preds"
    )

# one panel per metric, each with its own y-axis
m = vetiver.plot_metrics(metrics_over_time)
m.update_yaxes(matches=None)
m.show()
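
As a rough picture of what a windowed metric computation does, the sketch below groups predictions by ISO calendar week and computes the mean absolute error for each week with the standard library only. It is an illustration under simplifying assumptions, not vetiver's implementation; `weekly_mae` and the sample records are made up for the example.

```python
from collections import defaultdict
from datetime import date


def weekly_mae(records):
    """Return {(iso_year, iso_week): (n, mean absolute error)}.

    Each record is a (date, truth, prediction) tuple.
    """
    errors = defaultdict(list)
    for day, truth, pred in records:
        iso_year, iso_week, _ = day.isocalendar()
        errors[(iso_year, iso_week)].append(abs(truth - pred))
    return {
        week: (len(errs), sum(errs) / len(errs))
        for week, errs in sorted(errors.items())
    }


# Made-up observations: (date, actual like_count, predicted like_count)
records = [
    (date(2022, 9, 5), 120, 100),
    (date(2022, 9, 7), 80, 90),
    (date(2022, 9, 12), 300, 150),
    (date(2022, 9, 14), 50, 200),
]

for week, (n, mae) in weekly_mae(records).items():
    print(week, n, mae)
```

Keeping the count `n` next to each metric matters: a bad week based on two observations deserves less alarm than one based on two hundred.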

when things go wrong:

  • retrain, retrain, retrain
  • try a new model type
  • remember to version!
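
One way to decide when to retrain is to compare recent performance against the error measured at training time. The sketch below is a hypothetical rule, not part of vetiver: the function name, the tolerance of 1.5, and the sample numbers are all assumptions chosen for illustration.

```python
def needs_retraining(recent_errors, baseline_error, tolerance=1.5, min_periods=2):
    """Flag the model when the last `min_periods` errors all exceed
    `tolerance` times the baseline error measured at training time."""
    if len(recent_errors) < min_periods:
        return False  # not enough evidence yet
    threshold = tolerance * baseline_error
    return all(err > threshold for err in recent_errors[-min_periods:])


baseline_mae = 40.0                       # assumed error on the held-out test set
weekly_errors = [35.0, 42.0, 70.0, 95.0]  # assumed monitoring results, oldest first

if needs_retraining(weekly_errors, baseline_mae):
    print("error has drifted: retrain, then pin the result as a new version")
```

Requiring several bad periods in a row avoids retraining on a single noisy week. The right tolerance depends on how costly wrong predictions are for the team using the model.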

MLOps is… thinking about making good models

vetiver.vetiver_pin_write(model_board, v)
Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()

vetiver.model_card()

From Mitchell et al. (2019):

Therefore the usefulness and accuracy of a model card relies on the integrity of the creator(s) of the card itself.
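
A model card is ultimately a structured document. As a minimal sketch of the idea (not the vetiver Quarto template), the snippet below fills a plain-text card from a dictionary and refuses to render one with empty sections. The section names follow headings used in Mitchell et al. (2019); `render_card` is a hypothetical helper, and every value in `card` is a placeholder for the model's authors to replace.

```python
from string import Template

CARD_TEMPLATE = Template("""\
Model card: $name

Model details
  $details

Intended use
  $intended_use

Metrics
  $metrics

Ethical considerations
  $ethics

Caveats and recommendations
  $caveats
""")

REQUIRED = ["name", "details", "intended_use", "metrics", "ethics", "caveats"]


def render_card(card):
    """Render the card, refusing to publish one with empty sections."""
    missing = [key for key in REQUIRED if not card.get(key, "").strip()]
    if missing:
        raise ValueError(f"model card is missing: {', '.join(missing)}")
    return CARD_TEMPLATE.substitute(card)


# Placeholder content; a real card is written by the people who built the model.
card = {
    "name": "ads",
    "details": "Random forest regressor predicting like_count.",
    "intended_use": "Teaching example; not for production decisions.",
    "metrics": "Mean absolute error and mean squared error on held-out data.",
    "ethics": "TODO: describe who could be affected by wrong predictions.",
    "caveats": "Built from a 225-row dataset; expect high variance.",
}

print(render_card(card))
```

The check for empty sections is only a nudge. As the quote above says, the card is only as useful as its authors are candid.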

from the ground up

  • what is MLOps anyway (and how can I start)? 🐶 ✅
  • what do I need to keep in mind for tooling? 🔨

Tooling tips…

you (and your team!) are unique!

Tooling tips…

  • composable
    • in different environments
    • with other tools

Tooling tips…

  • composable
  • reproducible
    • open source šŸ’–

Tooling tips…

  • composable
  • reproducible
  • ergonomic

Tooling tips…

  • composable
  • reproducible
  • ergonomic

Tooling tips…

  • composable
  • reproducible
  • ergonomic

(and able to do the MLOps tasks we want)

from the ground up

  • what is MLOps anyway (and how can I start)? 🐶 ✅
  • what do I need to keep in mind for tooling? 🔨 ✅

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

  • versioning

  • deploying

  • monitoring

vetiver can help with this for your R and Python models!

Learn more