2019/03/23
The Probabilistic Programming Workflow
Last week, I gave a presentation to a general audience on the concept of, and intuition behind, probabilistic programming and model-based machine learning. You can read my extended notes here. Drawing on ideas from Winn and Bishop’s “Model-Based Machine Learning” and van de Meent et al.’s “An Introduction to Probabilistic Programming”, I try to show why combining a data-generating process with abstracted inference is a powerful concept by walking through the example of a simple survival model.
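To make the idea of separating the data-generating process from inference concrete, here is a minimal sketch in plain Python. It is not the model from the talk: the exponential lifetimes, the right-censoring time, the Gamma prior, and every function name below are assumptions chosen for illustration, and the grid approximation stands in for the far more general inference engines that probabilistic programming languages provide.

```python
import math
import random


def simulate_survival(rate, n, censor_at, rng):
    """Data-generating process (assumed): exponential lifetimes, right-censored."""
    data = []
    for _ in range(n):
        t = rng.expovariate(rate)
        # Store (observed time, was the event observed before censoring?).
        data.append((min(t, censor_at), t <= censor_at))
    return data


def log_joint(rate, data, prior_shape=1.0, prior_rate=1.0):
    """Unnormalised log posterior: Gamma prior plus censored exponential likelihood."""
    lp = (prior_shape - 1.0) * math.log(rate) - prior_rate * rate
    for t, observed in data:
        # Observed event contributes the density rate * exp(-rate * t);
        # a censored one contributes only the survival probability exp(-rate * t).
        lp += (math.log(rate) if observed else 0.0) - rate * t
    return lp


def grid_posterior_mean(log_density, grid):
    """Generic inference: normalise any unnormalised log density on a grid."""
    logs = [log_density(x) for x in grid]
    largest = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - largest) for value in logs]
    total = sum(weights)
    return sum(x * w for x, w in zip(grid, weights)) / total


rng = random.Random(1)
data = simulate_survival(rate=0.5, n=200, censor_at=3.0, rng=rng)
grid = [i / 1000 for i in range(1, 3001)]  # candidate hazard rates in (0, 3]
posterior_mean = grid_posterior_mean(lambda r: log_joint(r, data), grid)
print(round(posterior_mean, 3))
```

The inference function knows nothing about survival analysis; it only needs a log density. Swapping in a different data-generating process means rewriting `log_joint` while leaving `grid_posterior_mean` untouched, which is the separation the talk argues for.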
2019/02/24
Problem Representations and Model-Based Machine Learning
Back in 2003, Paul Graham, of Viaweb and Y Combinator fame, published an article entitled “Better Bayesian Filtering”. I was scrolling chronologically through his essays archive the other day when this article stood out to me (well, the “Bayesian” keyword did). After reading the first few paragraphs, I was a little disappointed to realize the topic was Naive Bayes rather than Bayesian methods. But it turned out to be a tale of implementing a machine learning solution for a real-world application before anyone dared to mention AI in the same sentence.
2018/07/25
SVD for a Low-Dimensional Embedding of Instacart Products
Building on the Instacart product recommendations based on Pointwise Mutual Information (PMI) in the previous article, we use Singular Value Decomposition to factorize the PMI matrix into a matrix of lower dimension (“embedding”). This allows us to identify groups of related products easily. We finished the previous article with a long table where every row measured how surprisingly often two products were bought together according to the Instacart Online Grocery Shopping dataset.
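As a rough illustration of the factorization step, the sketch below runs a truncated SVD on a tiny PMI matrix with NumPy. The matrix values, the product names, the choice of `k`, and the square-root scaling of the singular values are all assumptions made for this example, not details taken from the article.

```python
import numpy as np


def svd_embedding(pmi_matrix, k):
    """Return an (n_products, k) embedding from a truncated SVD."""
    u, s, _ = np.linalg.svd(pmi_matrix, full_matrices=False)
    # Singular values come back sorted in descending order, so the first k
    # columns carry the most structure. Scaling by sqrt(s) is one common
    # convention, assumed here.
    return u[:, :k] * np.sqrt(s[:k])


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical products and made-up PMI-like scores: the first two products
# co-occur often, as do the last two, and the two groups rarely mix.
products = ["pasta", "tomato sauce", "shampoo", "conditioner"]
pmi_matrix = np.array([
    [2.0, 2.0, -1.0, -1.0],
    [2.0, 2.0, -1.0, -1.0],
    [-1.0, -1.0, 2.0, 2.0],
    [-1.0, -1.0, 2.0, 2.0],
])

embedding = svd_embedding(pmi_matrix, k=2)
same_group = cosine(embedding[0], embedding[1])   # pasta vs. tomato sauce
other_group = cosine(embedding[0], embedding[2])  # pasta vs. shampoo
print(same_group > other_group)
```

Products from the same group end up pointing in nearly the same direction in the embedding, so a nearest-neighbour lookup or a clustering step on the rows is enough to surface groups of related products.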
2018/06/17
Pointwise Mutual Information for Instacart Product Recommendations
Using pointwise mutual information, we create highly efficient “customers who bought this item also bought” style product recommendations for more than 8,000 Instacart products. The method can be implemented in a few lines of SQL yet produces high-quality product suggestions. Check them out in this Shiny app. Back in school, I was a big fan of the Detective Conan anime. For whatever reason, one of the episodes stuck with me.
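The article implements the method in SQL; the following is only a small Python sketch of the same idea, using the standard definition PMI(a, b) = log( p(a, b) / (p(a) p(b)) ) with probabilities estimated as shares of baskets. The toy baskets and the function names `pmi_pairs` and `recommend` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(baskets):
    """Return {(a, b): PMI} for every product pair bought together at least once."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


def recommend(product, pmi, top_n=3):
    """Rank the products that co-occur with `product` by descending PMI."""
    scores = []
    for (a, b), value in pmi.items():
        if a == product:
            scores.append((b, value))
        elif b == product:
            scores.append((a, value))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy baskets, not taken from the Instacart dataset.
baskets = [
    ["pasta", "tomato sauce", "milk"],
    ["pasta", "tomato sauce"],
    ["milk", "cereal"],
    ["milk", "cereal", "pasta"],
]

pmi = pmi_pairs(baskets)
print(recommend("pasta", pmi))
```

A positive score means two products appear together more often than their individual popularity would predict, which is why a staple such as milk does not dominate every recommendation list. On real data, pairs with very few co-occurrences produce noisy scores, so a minimum-count filter is a sensible addition.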
2017/07/01
Pokémon Recommendation Engine
Using t-SNE, I wrote a Shiny app that recommends similar Pokémon. Try it out here. Needless to say, I was and still am a big fan of the Pokémon games. So I was very excited to see that a lot of the metadata used in Pokémon games is available on GitHub thanks to the Pokémon API project. Data on Pokémon names, types, moves, special abilities, strengths and weaknesses are all cleanly organized in a few dozen CSV files.