Friday, March 14, 2025

Journey to Full-Stack Data Scientist: Model Deployment | by Alex Davis | Jan, 2025


First, for our example, we need to develop a model. Since this article focuses on model deployment, we will not worry about the performance of the model. Instead, we’ll build a simple model with limited features so we can focus on learning model deployment.

In this example, we’ll predict a data professional’s salary based on a few features, such as experience, job title, company size, etc.

See the data here: https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries (CC0: Public Domain). I slightly modified the data to reduce the number of options for certain features.

#import packages for data manipulation
import pandas as pd
import numpy as np

#import packages for machine learning
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.metrics import mean_squared_error, r2_score

#import packages for data management
import joblib

First, let’s take a look at the data.

Image by Author

Since all of our features are categorical, we’ll use encoding to transform our data into numerical values. Below, we use ordinal encoders to encode experience level and company size. These are ordinal because they represent some kind of progression (entry level, then mid-level, and so on). Note that OrdinalEncoder assigns codes starting at 0, so entry level becomes 0 and mid-level becomes 1.

For job title and employment type, we’ll create dummy variables for each option (note that we drop the first to avoid multicollinearity).

#use ordinal encoder to encode experience level
encoder = OrdinalEncoder(categories=[['EN', 'MI', 'SE', 'EX']])
salary_data['experience_level_encoded'] = encoder.fit_transform(salary_data[['experience_level']])

#use ordinal encoder to encode company size
encoder = OrdinalEncoder(categories=[['S', 'M', 'L']])
salary_data['company_size_encoded'] = encoder.fit_transform(salary_data[['company_size']])

#encode employment type and job title using dummy columns
salary_data = pd.get_dummies(salary_data, columns = ['employment_type', 'job_title'], drop_first = True, dtype = int)

#drop original columns
salary_data = salary_data.drop(columns = ['experience_level', 'company_size'])
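
To make the effect of these two encodings concrete, here is a minimal sketch on a small made-up frame. The `toy` rows are invented for illustration and are not taken from the Kaggle data; only the column names and category codes follow the code above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows, invented for illustration only
toy = pd.DataFrame({
    "experience_level": ["EN", "SE", "MI", "EX"],
    "employment_type": ["FT", "PT", "FT", "CT"],
})

# Ordinal encoding: the position in `categories` becomes the code, starting at 0
encoder = OrdinalEncoder(categories=[["EN", "MI", "SE", "EX"]])
toy["experience_level_encoded"] = encoder.fit_transform(toy[["experience_level"]]).ravel()

# Dummy encoding: one 0/1 column per option, minus the first (alphabetical) option
toy = pd.get_dummies(toy, columns=["employment_type"], drop_first=True, dtype=int)

print(toy)
```

In this sketch `EN` maps to 0 rather than 1, and the dropped option (`CT`) is represented by zeros in both remaining dummy columns.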

Now that we have transformed our model inputs, we can create our training and test sets. We’ll input these features into a simple linear regression model to predict the employee’s salary.

#define independent and dependent features
X = salary_data.drop(columns = 'salary_in_usd')
y = salary_data['salary_in_usd']

#split between training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state = 104, test_size = 0.2, shuffle = True)

#fit linear regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

#make predictions
y_pred = regr.predict(X_test)

#print the coefficients
print("Coefficients: \n", regr.coef_)

#print the MSE
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))

#print the R2 value (r2_score returns plain R2, not adjusted R2)
print("R2: %.2f" % r2_score(y_test, y_pred))
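
Note that `r2_score` returns the plain coefficient of determination. If an adjusted value is wanted, one common formula is 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of features. The helper below is a hypothetical sketch, not part of the original code, and the numbers in the usage line are made up.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjust R2 for the number of predictors (hypothetical helper)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features plus one")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up numbers for illustration: R2 of 0.5 on 101 rows with 10 features
print(round(adjusted_r2(0.5, 101, 10), 4))
```

With the variables from the code above, this could be called as `adjusted_r2(r2_score(y_test, y_pred), len(y_test), X_test.shape[1])`.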

Let’s see how our model did.

Image by Author

Looks like our R-squared is 0.27, yikes. A lot more work would need to be done with this model. We’d likely need more data and more information on the observations. But for the sake of this article, we’ll move forward and save our model.

#save model using joblib
joblib.dump(regr, 'lin_regress.sav')
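
For deployment, the saved file is loaded back wherever predictions are served. The following is a minimal round-trip sketch under stated assumptions: the tiny training arrays are synthetic, and the temporary file path stands in for `lin_regress.sav`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import linear_model

# Synthetic data for illustration only: y = 2*x0 + 3*x1 + 1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Save to a temporary location (stand-in for 'lin_regress.sav'), then load it back
model_path = os.path.join(tempfile.mkdtemp(), "lin_regress.sav")
joblib.dump(regr, model_path)
loaded = joblib.load(model_path)

# The loaded model should reproduce the original model's predictions
new_rows = np.array([[3.0, 2.0]])
print(loaded.predict(new_rows))
```

In the article's setting, the rows passed to `predict` would need to have the same encoded columns, in the same order, as `X_train`.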
