Machine Learning with Python Workshop



When we talk about Data Science and the Data Science Pipeline, we are typically talking about the management of data flows for a specific purpose: modeling some hypothesis. The models that we construct can then be used in data products as an engine to create more data and actionable results.

Machine learning is the art of training a model: existing data and a statistical method are combined to produce a parametric representation that fits the data. That’s kind of a mouthful, but it essentially means that a machine learning algorithm uses statistical processes to learn from examples, then applies what it has learned to future inputs to predict an outcome.
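
As a minimal sketch of "learn from examples, then predict", the snippet below fits a linear regression with Scikit-Learn. The data are invented for illustration (points generated exactly from y = 3x + 2), so the learned parameters should come out very close to a slope of 3 and an intercept of 2. It is an illustration only, not material taken from the workshop.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training examples (an assumption for illustration): y = 3x + 2, no noise.
X = np.arange(10, dtype=float).reshape(-1, 1)  # one feature, ten examples
y = 3.0 * X.ravel() + 2.0

# "Training" estimates the model's parameters from the examples.
model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_

# The fitted parameters are then applied to an input the model has never seen.
prediction = model.predict([[12.0]])[0]  # close to 3 * 12 + 2 = 38
print(slope, intercept, prediction)
```

The two numbers stored in `coef_` and `intercept_` are the "parametric representation" in this case: once they are learned, the original examples are no longer needed to make predictions.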

Machine learning can classically be summarized with two methodologies: supervised and unsupervised learning.

  • In supervised learning, the “correct answers” are annotated ahead of time and the algorithm tries to fit a decision space based on those answers.
  • In unsupervised learning, algorithms try to group like examples together, inferring similarities usually via distance metrics.

These learning types allow us to explore data and categorize them in a meaningful way, predicting where new data will fit into our models.
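
The difference between the two methodologies can be sketched on a single toy data set. The example below is an assumption-laden illustration: the three synthetic "blobs", the choice of a k-nearest-neighbors classifier, and the choice of k-means clustering are ours, picked because both rely on distances between points. The classifier is given the annotated answers; the clustering algorithm sees only the points.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (illustrative assumption): three well-separated 2-D groups.
# y holds the "correct answer" (0, 1, or 2) for each point, in center order.
centers = [[-5, -5], [0, 5], [5, -5]]
X, y = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# Supervised: the labels y are supplied, and the model learns to reproduce them.
classifier = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Unsupervised: only X is supplied; similar points are grouped by distance.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Both fitted models can place new data.
new_points = [[-5, -5], [5, -5]]
predicted_labels = classifier.predict(new_points)    # the annotated classes
predicted_clusters = clusterer.predict(new_points)   # arbitrary cluster ids
print(predicted_labels, predicted_clusters)
```

Note that the cluster ids returned by k-means are arbitrary: they identify which points belong together, but they need not match the numbering used in the annotated labels.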

What You Will Learn

Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing powerhouses NumPy, SciPy, and matplotlib, allowing extremely fast analysis of small to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason, Scikit-Learn is often the first tool in a Data Scientist’s toolkit for machine learning of incoming data sets.

The purpose of this one-day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product (an actionable model that can be used in larger programs or algorithms) rather than treating them simply as a research or investigation methodology. For more on Scikit-Learn see: Six Reasons why I recommend Scikit-Learn (O’Reilly Radar).

Course Outline

The workshop will cover the following topics:

  • An introduction to machine learning
  • Loading datasets into Scikit-Learn
  • Building models and model persistence
  • Feature extraction from data sets
  • Regressions
  • Classifiers
  • Clustering
  • Model selection and evaluation
  • Building a data pipeline
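
To give a flavor of how several of these topics fit together, here is a hedged sketch that loads a data set bundled with Scikit-Learn, builds a small pipeline, evaluates it, and persists the fitted model. The choice of the iris data, the scaling step, logistic regression, and `pickle` for persistence are assumptions made for illustration; the workshop itself may use different data and estimators.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Loading a dataset: iris ships with Scikit-Learn, so nothing is downloaded.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Building a data pipeline: a preprocessing step followed by a classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Model selection and evaluation: 5-fold cross-validation on the training
# split, then a final accuracy score on the held-out test split.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)

# Model persistence: serialize the fitted pipeline and restore it.
restored = pickle.loads(pickle.dumps(pipeline))
same_predictions = bool((restored.predict(X_test) == pipeline.predict(X_test)).all())
print(cv_scores.mean(), test_accuracy, same_predictions)
```

Wrapping preprocessing and the estimator in one `Pipeline` object means the whole chain is fitted, evaluated, and saved as a single unit, which is what makes it convenient to embed in a larger program.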


After this course you should understand the basics of machine learning and how to implement machine learning algorithms on your data sets using Python and Scikit-Learn. In particular, you should understand basic regressions, classifiers, and clustering algorithms, as well as how to fit a model and use it to predict future outcomes.


You must be familiar with Python before participating in this course, and have familiarity with the command line. You must also have all software installed and ready for your particular operating system. Ensure that you perform the following tasks and are familiar with the concepts at the following links.

Instructor: Benjamin Bengfort


Benjamin Bengfort is a Data Scientist who lives inside the beltway but ignores politics (the normal business of DC), favoring technology instead. He is currently working to finish his PhD at the University of Maryland, where he studies machine learning and distributed computing. His focus is on highly consistent local distributed storage and visual diagnostics for data modeling. The lab next door does have robots and, much to his chagrin, they seem to constantly arm said robots with knives and tools, presumably to pursue culinary accolades. Having seen a robot attempt to slice a tomato, Benjamin prefers his own adventures in the kitchen, where he specializes in fusion French and Guyanese cuisine as well as BBQ of all types. A professional programmer by trade and a Data Scientist by vocation, Benjamin writes on a diverse range of subjects, from Natural Language Processing to Data Science with Python to analytics with Hadoop and Spark.

Saturday, Feb 20th 2016 9am-5pm 

4601 Fairfax Drive
Arlington, VA 22203

(EXPIRES 2/6/2016)

Buy a course bundle and save!

Two Workshop Bundle - Save 25%


Bundle Price: $450
($225 per workshop)


Attend any two workshops and save 25% off the regular price!
Perfect for those looking to skill up in a couple of data science topics.


To purchase this bundle, go to our course bundle registration page.

Three Workshop Bundle - Save 33%


Bundle Price: $600
($200 per workshop)


Attend any three workshops and save 33% off the regular price!
Perfect for those who need a little more exposure to data science.


To purchase this bundle, go to our course bundle registration page.

Four Workshop Bundle - Save 42%


Bundle Price: $700
($175 per workshop)


Attend any four workshops and save 42% off the regular price!
Perfect for those looking to gain exposure to several topics.


To purchase this bundle, go to our course bundle registration page.