Our review

This skill enables designing, training, and deploying machine learning models to solve predictive problems.

Strengths

Covers the entire ML pipeline: data prep, modeling, evaluation, and deployment.
Recommends appropriate algorithms based on data characteristics.
Emphasizes robust practices like cross-validation and hyperparameter tuning.

Limitations

Does not cover advanced deep learning architectures.
Assumes tabular data in CSV format.
Lacks guidance on data collection or advanced feature engineering.

When to use it

Use this skill when you need to build a classification or regression model from a structured tabular dataset.

When not to use it

Do not use it for NLP or computer vision tasks that require complex deep neural networks.

Examples

Build a churn prediction model

I have a CSV file with customer data including age, salary, gender, city, and a churn column. Build a machine learning model to predict churn. Include data preprocessing (handle missing values, encode categoricals, scale numeric), train a random forest classifier, and evaluate with precision/recall.

Create a data preprocessing pipeline

Create a reusable data preprocessing pipeline for a dataset with numeric features (age, salary) and categorical features (gender, city). Use sklearn's ColumnTransformer, impute missing values, and scale numeric features. Then split the data into train/test.

name: ml-engineer
description: Use this for building machine learning models, feature engineering, training pipelines, and integrating predictions into applications.

Machine Learning Engineer

You design, train, and deploy machine learning models to solve predictive problems.

When to use

"Build a model to predict..."
"Preprocess this data for ML."
"Train a classification/regression model."
"Evaluate model performance."

Instructions

Data Prep:
- Handle categorical variables (One-Hot Encoding, Label Encoding).
- Normalize/scale numerical features (StandardScaler, MinMaxScaler).
- Split data into Training, Validation, and Test sets.
Model Selection:
- Choose appropriate algorithms (e.g., Random Forest, XGBoost, Neural Networks) based on data size and problem type.
- Start simple before moving to complex models.
Training & Tuning:
- Use cross-validation to ensure robustness.
- Tune hyperparameters (grid search, randomized search) to optimize the metric chosen for the problem.
Evaluation:
- Use metrics that match the task: Accuracy, Precision/Recall, F1-Score, and ROC-AUC for classification; RMSE for regression.
- Analyze confusion matrices to understand error types.
Deployment:
- Export models to standard formats (ONNX, Pickle, SavedModel).
- Provide code snippets for loading and running inference.
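
The deployment step above can be sketched with Pickle, one of the formats listed. This is a minimal illustration, not a production recipe: the synthetic dataset, the logistic regression model, and the `model.pkl` file name are all assumptions made for the example.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: replace with your own features and labels)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Keep preprocessing and model in one pipeline so they are exported together
model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name

    # Export: serialize the fitted pipeline to disk
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load: restore the pipeline, as a serving process would
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# Inference: the restored pipeline scales and predicts in one call
predictions = restored.predict(X[:5])
print(predictions)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it.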

Examples

1. Data Preprocessing Pipeline

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Load data
df = pd.read_csv('data.csv')
X = df.drop('target', axis=1)
y = df['target']

# Define preprocessors
numeric_features = ['age', 'salary']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_features = ['gender', 'city']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
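
The instructions call for Training, Validation, and Test sets, while the split above produces only two. One possible way to get three sets is to call `train_test_split` twice. The sketch below uses synthetic data and a 60/20/20 ratio, both of which are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: replace with your own X and y)
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Step 1: hold out 20% of all rows as the final test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: take 25% of the remaining 80% as validation (20% of the total)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Passing `stratify` keeps the class balance similar across the three sets, which matters for imbalanced targets such as churn.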

2. Training and Evaluation

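# Reuses preprocessor, X_train, X_test, y_train, and y_test from Example 1
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline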

# Create pipeline
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))])

# Train
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Report
print(classification_report(y_test, y_pred))
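
The Training & Tuning step (cross-validation plus hyperparameter search) is not shown in the examples above. A minimal sketch with scikit-learn's `GridSearchCV` might look like the following. The synthetic data, the parameter grid, and the choice of F1 as the scoring metric are assumptions for illustration, not recommendations from the skill.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: replace with your own X and y)
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "classifier__n_estimators": [50, 100],
    "classifier__max_depth": [None, 5],
}

# 5-fold cross-validation on the training set for every grid combination
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)

# The best pipeline is refit on all training data; score it once on the test set
test_score = search.score(X_test, y_test)
print(test_score)
```

Because scaling sits inside the pipeline, it is refit on each training fold, so no statistics from the validation folds leak into preprocessing.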
