Off To See the Wizard:

Using Yellowbrick for Machine Learning Model Visualization

Patrick Anastasio
Jan 21, 2022 · 5 min read

Yellowbrick is a diagnostic visualization platform for machine learning that allows data scientists to steer the model selection process.

Model Selection Through Visualization

Model selection is the single most important decision in the machine learning process. Choosing the right algorithm depends on the specific business use case and the questions that need to be answered, and even then, many algorithms may fit the bill. Model selection is an iterative cycle of feature engineering, tuning hyperparameters, and deciding which performance metrics will be used to compare algorithms. This used to be an extremely time-consuming, resource-intensive process; with modern computing power and the vast universe of available packages and libraries, however, efficient model deployment has become almost trivial. Yellowbrick aims to simplify things even further by turning model selection into a visual process.

“Yellowbrick is a response to the call for open source visual steering tools. For data scientists, Yellowbrick helps evaluate the stability and predictive value of machine learning models and improves the speed of the experimental workflow. For data engineers, Yellowbrick provides visual tools for monitoring model performance in real world applications. For users of models, Yellowbrick provides visual interpretation of the behavior of the model in high dimensional feature space. Finally, for students, Yellowbrick is a framework for understanding a large variety of algorithms and methods.” — [1] Bengfort et al.

Under the Hood

Yellowbrick has two primary dependencies: scikit-learn and matplotlib.

The primary interface is the Visualizer object, and the workflow is very similar to using a scikit-learn estimator. A visualizer is an object that learns from data to produce a visualization. Visualizers can also wrap scikit-learn models for evaluation, hyperparameter tuning, and algorithm selection.

# import your dependencies
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ClassificationReport
# instantiate your model
model = LogisticRegression()
# wrap the visualizer
visualizer = ClassificationReport(model)
# fit, score, and visualize (assumes an existing train/test split)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()

Yellowbrick creates effective visualizations by wrapping the matplotlib API. Familiarity with matplotlib is not required to use Yellowbrick, but it is strongly recommended, and honestly expected, if you want to take full advantage of everything Yellowbrick has to offer. “[To] customize figures or roll your own visualizers, a strong background in using matplotlib is required.” [2] Yellowbrick even provides a matplotlib tutorial on its website.

You can see how coding a customized matplotlib visualization can quickly get out of hand, and even a lengthy snippet covers only a fraction of the available customization options.
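To give a sense of what that hands-on styling looks like, here is a minimal, purely hypothetical matplotlib sketch (the data, labels, and titles are made up) where the title, axis labels, ticks, spines, and grid are all set by hand. These are exactly the chores a Yellowbrick visualizer takes off your plate:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headlessly
import matplotlib.pyplot as plt
import numpy as np

# hypothetical data standing in for per-fold model scores
x = np.arange(5)
scores = np.array([0.71, 0.78, 0.74, 0.81, 0.85])

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(x, scores, color="tab:blue", edgecolor="black", linewidth=0.8)

# the kind of manual styling a Visualizer handles for you
ax.set_title("Model scores by fold", fontsize=12, fontweight="bold")
ax.set_xlabel("CV fold")
ax.set_ylabel("F1 score")
ax.set_xticks(x)
ax.set_xticklabels([f"fold {i}" for i in x], rotation=45, ha="right")
ax.set_ylim(0.0, 1.0)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(axis="y", linestyle="--", alpha=0.4)
fig.tight_layout()
```

And this still says nothing about legends, annotations, colormaps, or subplot layout.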

Admittedly, matplotlib can get extremely hairy when you customize your visualizations, and I tend to take a “keep it simple” approach in my own work. However, when presenting results to stakeholders or publishing to the world, a high degree of customization is required for aesthetic appeal and interpretability. While that level of customization is available for Yellowbrick visualizations, the library also offers a simplified path through what it calls “oneliners.” These quick methods are “visualizers in a single line of code… which return a fully fitted, finalized visualizer object in only a single line.” Fitted too? Count me in! No axis labeling, no tick setting… it’s all done for you.

There are oneliners for most common machine learning processes and metrics. There are oneliners for feature analysis, classification, regression, clustering, and target evaluation, among others. “Nearly every Yellowbrick visualizer has an associated [oneliner]!”

For example, let’s look at a simple oneliner that shows the Pearson correlation coefficients for a multivariable dataset. Pearson scores measure the pairwise linear correlation between variables and help flag collinearity. The Yellowbrick oneliner for this is rank2d.

# import the relevant packages/libraries
import pandas as pd
from yellowbrick.features import rank2d
# create your dataframe
df = pd.read_csv('some_dataset.csv')
# execute the oneliner
visualizer = rank2d(df)

The resulting visualization is a triangular heatmap with one cell per pair of variables, each cell colored by its Pearson score.

The full gamut of matplotlib customizations is at your disposal and can easily be worked into the oneliner. For instance, if you did not like the color scheme you could change the “colormap” parameter, or if you’d like the visualization to score the multicollinearity between variables you could set the “algorithm” parameter to “covariance”:

visualizer = rank2d(df, colormap='tab20c', algorithm='covariance')
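For intuition, the numbers rank2d renders can be computed directly with pandas. This is a sketch on a tiny made-up dataset (standing in for some_dataset.csv), not Yellowbrick’s internals:

```python
import pandas as pd

# small hypothetical dataset (stand-in for some_dataset.csv)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],   # strongly correlated with a
    "c": [5.0, 3.0, 4.0, 2.0, 1.0],   # negatively correlated with a
})

# pairwise Pearson coefficients: the matrix the default heatmap encodes
pearson = df.corr(method="pearson")

# pairwise covariances: the matrix behind algorithm='covariance'
cov = df.cov()

print(pearson.round(2))
```

Yellowbrick’s value is that you never have to read this matrix cell by cell; the heatmap surfaces the strong pairs at a glance.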

My Favorite Oneliners (there are too many)

I’ve always been a proponent of working smarter, not harder. Shorter code and less data wrangling mean more efficiency… and more time. That’s what I love about Yellowbrick: fewer keystrokes keep me more engaged in the work.

Classification Oneliners:

classification_report visualizes precision, recall, and F1 score.

from yellowbrick.classifier import classification_report

visualizer = classification_report(<model>, X, y)
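Under the hood those three numbers are ordinary scikit-learn metrics. A quick sketch with made-up binary labels shows what each cell of the report encodes:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# hypothetical true vs. predicted labels for a binary classifier
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# the per-class numbers classification_report visualizes as a heatmap
precision = precision_score(y_true, y_pred)  # of predicted positives, how many were right
recall = recall_score(y_true, y_pred)        # of actual positives, how many were found
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
```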

confusion_matrix is a visual description of per-class decision making.

from yellowbrick.classifier import confusion_matrix

visualizer = confusion_matrix(<model>, X, y)
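The grid that this visualizer colors in is scikit-learn’s confusion matrix. A sketch with made-up labels, to show how the cells map to decision outcomes:

```python
from sklearn.metrics import confusion_matrix

# hypothetical true vs. predicted labels for a binary classifier
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# rows are the actual class, columns the predicted class:
# cm[0, 0] true negatives, cm[0, 1] false positives,
# cm[1, 0] false negatives, cm[1, 1] true positives
cm = confusion_matrix(y_true, y_pred)
```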

roc_auc shows the receiver operating characteristic (ROC) curve and the area under the curve (AUC).

from yellowbrick.classifier import roc_auc

visualizer = roc_auc(<model>, X, y)
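The AUC number printed in that plot’s legend can be computed directly with scikit-learn. The labels and predicted probabilities below are made up; an AUC of 1.0 would mean the model ranks every positive above every negative:

```python
from sklearn.metrics import roc_auc_score

# hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.35]

auc = roc_auc_score(y_true, y_prob)
```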

discrimination_threshold can help find the threshold that best separates binary classes.

from yellowbrick.classifier import discrimination_threshold

visualizer = discrimination_threshold(<model>, X, y)
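The idea behind that plot is a threshold sweep: score the classifier at many cutoffs and see where a metric peaks. A rough sketch of such a sweep with made-up labels and probabilities (this is the concept, not Yellowbrick’s internals):

```python
import numpy as np
from sklearn.metrics import f1_score

# hypothetical true labels and positive-class probabilities
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.35])

# score each candidate threshold; the visualizer draws curves like this over [0, 1]
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_true, (y_prob >= t).astype(int)) for t in thresholds]

# the threshold where F1 peaks: the line the plot highlights
best = thresholds[int(np.argmax(f1s))]
```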

Regression Oneliners:

residuals_plot shows the difference in residuals between the training and test splits, as well as their corresponding R² scores and distributions.

from yellowbrick.regressor import residuals_plot

visualizer = residuals_plot(<model>, X_train, y_train, X_test, y_test, train_color="black", test_color="gold")
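The quantities behind that plot are easy to compute by hand: residuals are observed minus predicted values, and the legend shows an R² score per split. A sketch on a tiny made-up dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# hypothetical train/test split on a near-linear trend
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
X_test = np.array([[6.0], [7.0]])
y_test = np.array([6.2, 6.9])

model = LinearRegression().fit(X_train, y_train)

# residuals: observed minus predicted, the points residuals_plot scatters
train_resid = y_train - model.predict(X_train)
test_resid = y_test - model.predict(X_test)

# the R^2 scores shown in the plot legend
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
```

A healthy model shows residuals scattered randomly around zero for both splits; structure in that scatter is what the plot helps you spot.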

So Many More

If you are interested in incorporating Yellowbrick into your work, I implore you to visit the Visualizers and API documentation page and explore everything this amazing resource has to offer for your projects.

Open Source Contributions

Yellowbrick is an open source project that gladly welcomes contributions. If you are up for the task, head to the contributing section of the Yellowbrick documentation.

The goal of Yellowbrick development is the addition and creation of visualizers that integrate with scikit-learn estimators, transformers, and pipelines. However, there are several other ways you can contribute to the project.
