Home on Minimize Regret
/
Recent content in Home on Minimize Regret (last updated Sun, 14 Jun 2020)

Embedding Many Time Series via Recurrence Plots
/post/2020/06/14/embedding-many-time-series-via-recurrence-plots/
Sun, 14 Jun 2020

We demonstrate how recurrence plots can be used to embed a large set of time series via UMAP and HDBSCAN to quickly identify groups of series with unique characteristics such as seasonality or outliers. Exploratory analysis of time series via visualization scales poorly to large sets of related series; the proposed embedding makes such analysis feasible again. We show how it works using a Walmart dataset of sales and a Citi Bike dataset of bike rides.

Rediscovering Bayesian Structural Time Series
/post/2020/06/07/rediscovering-bayesian-structural-time-series/
Sun, 07 Jun 2020

This article derives the Local-Linear Trend specification of the Bayesian Structural Time Series model family from scratch, implements it in Stan, and visualizes its components via tidybayes. To provide context, links to GAMs and the prophet package are highlighted. The code is available here.
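As a sketch of what the Local-Linear Trend specification describes, its generative process can be simulated in a few lines. The noise scales below are illustrative only; the post itself implements the model and its inference in Stan.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_local_linear_trend(n, sigma_level=0.5, sigma_slope=0.05, sigma_obs=1.0):
    """Simulate y_t = level_t + noise, where level and slope follow random walks."""
    level, slope = 0.0, 0.1
    ys = []
    for _ in range(n):
        level += slope + rng.normal(0, sigma_level)  # level drifts by the slope
        slope += rng.normal(0, sigma_slope)          # slope itself is a random walk
        ys.append(level + rng.normal(0, sigma_obs))  # observation noise on top
    return np.array(ys)

y = simulate_local_linear_trend(100)
print(y.shape)  # (100,)
```

Inference then runs this process in reverse: given the observed series, recover the latent level and slope paths.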
I tried to come up with a simple way to detect “outliers” in time series. Nothing special, no anomaly detection via variational auto-encoders, just finding values of low probability in a univariate time series.

Are You Sure This Embedding Is Good Enough?
/post/2020/04/26/are-you-sure-this-embedding-is-good-enough/
Sun, 26 Apr 2020

Suppose you are given a data set of five images to train on, and then have to classify new images with your trained model. Five training samples are in general not sufficient to train a state-of-the-art image classification model, thus this problem is hard and has earned its own name: few-shot image classification. A lot has been written on few-shot image classification, and complex approaches have been suggested. Tian et al. (2020), however, suggest it suffices to take an embedding from a different image classification task, extract features from the five images via the embedding, and to train the few-shot model on the embedded version of the five images.

The Causal Effect of New Year's Resolutions
/post/2020/01/18/the-causal-effect-of-new-years-resolutions/
Sat, 18 Jan 2020

We treat the turn of the year as an intervention to infer the causal effect of New Year’s resolutions on McFit’s Google Trends index. By comparing the observed values from the treatment period against predicted values from a counterfactual model, we are able to derive the overall lift induced by the intervention.
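The lift computation described above can be sketched as follows; the observed and counterfactual numbers here are made up for illustration, not taken from the post.

```python
import numpy as np

# Hypothetical observed index values during the treatment period, and
# counterfactual predictions from a model fit on pre-intervention data.
observed = np.array([62.0, 71.0, 68.0, 60.0])
counterfactual = np.array([48.0, 50.0, 49.0, 47.0])

pointwise_effect = observed - counterfactual          # effect per period
cumulative_lift = pointwise_effect.sum()              # total lift
relative_lift = cumulative_lift / counterfactual.sum()

print(round(cumulative_lift, 1))  # 67.0
print(round(relative_lift, 3))    # 0.345
```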
Throughout the year, people’s interest in a McFit gym membership appears quite stable. The following graph shows the Google Trends index for the search term “McFit” in Germany from April 2017 until the week of December 17, 2017.

satRday Berlin Presentation
/post/2019/06/16/satrday-berlin-presentation/
Sun, 16 Jun 2019

My satRday Berlin slides on “Modeling Short Time Series” are available here.
This Saturday, June 15, Berlin had its first satRday conference. I eagerly followed the hashtags of satRday Amsterdam last year and satRday Cape Town the year before that on Twitter. Thanks to Noa Tamir, Jakob Graff, Steve Cunningham, and many others, we got a conference in Berlin as well.
When I saw the call for papers, I jumped at the opportunity to present, to find out what it feels like to be on the other side of the microphone: being in the hashtag instead of following it.

Modeling Short Time Series with Prior Knowledge
/post/2019/04/16/modeling-short-time-series-with-prior-knowledge/
Tue, 16 Apr 2019

I just published a longer case study, Modeling Short Time Series with Prior Knowledge: What ‘Including Prior Information’ really looks like.
It is generally difficult to model time series when there is insufficient data to model a (suspected) long seasonality. We show how this difficulty can be overcome by learning the seasonality on a different, long, related time series and transferring the posterior as a prior distribution to the model of the short time series.

The Probabilistic Programming Workflow
/post/2019/03/23/the-probabilistic-programming-workflow/
Sat, 23 Mar 2019

Last week, I gave a presentation about the concept of and intuition behind probabilistic programming and model-based machine learning in front of a general audience. You can read my extended notes here.
Drawing on ideas from Winn and Bishop’s “Model-Based Machine Learning” and van de Meent et al.’s “An Introduction to Probabilistic Programming”, I try to show why the combination of a data-generating process with an abstracted inference is a powerful concept by walking through the example of a simple survival model.

Problem Representations and Model-Based Machine Learning
/post/2019/02/24/problem-representations-and-model-based-machine-learning/
Sun, 24 Feb 2019

Back in 2003, Paul Graham, of Viaweb and Y Combinator fame, published an article entitled “Better Bayesian Filtering”. I was scrolling chronologically through his essays archive the other day when this article stuck out to me (well, the “Bayesian” keyword did). After reading the first few paragraphs, I was a little disappointed to realize the topic was Naive Bayes rather than Bayesian methods. But it turned out to be a tale of implementing a machine learning solution for a real-world application before anyone dared to mention AI in the same sentence.

Videos from PROBPROG 2018 Conference
/note/2018/11/11/probprog-videos/
Sun, 11 Nov 2018

Videos of the talks given at the International Conference on Probabilistic Programming (PROBPROG 2018) back in October were published a few days ago and are now available on YouTube. I have not watched all presentations yet, but a lot of big names attended the conference, so there should be something for everyone. In particular, the talks by Brooks Paige (“Semi-Interpretable Probabilistic Models”) and Michael Tingley (“Probabilistic Programming at Facebook”) made me curious to explore their topics more.

Videos from Exploration in RL Workshop at ICML
/note/2018/09/30/videos-from-exploration-in-rl-icml-workshop/
Sun, 30 Sep 2018

One of the many fantastic workshops at ICML this year was the Exploration in Reinforcement Learning workshop. All talks were recorded and are now available on YouTube. Highlights include presentations by Ian Osband, Emma Brunskill, and Csaba Szepesvari, among others. You can find the workshop’s homepage here with more information and the accepted papers.

SVD for a Low-Dimensional Embedding of Instacart Products
/post/2018/07/25/svd-instacart-product-embedding/
Wed, 25 Jul 2018

Building on the Instacart product recommendations based on Pointwise Mutual Information (PMI) in the previous article, we use Singular Value Decomposition to factorize the PMI matrix into a matrix of lower dimension (“embedding”). This allows us to identify groups of related products easily.
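A minimal sketch of the factorization step, with a random symmetric matrix standing in for the PMI matrix (the post builds the real one from Instacart co-purchase data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a (products x products) PMI matrix.
pmi = rng.normal(size=(8, 8))
pmi = (pmi + pmi.T) / 2  # co-occurrence-based PMI matrices are symmetric

# Factorize and keep only the top-k singular vectors as the embedding.
U, s, Vt = np.linalg.svd(pmi)
k = 2
embedding = U[:, :k] * np.sqrt(s[:k])  # one k-dimensional vector per product

print(embedding.shape)  # (8, 2)
```

Products whose embedding vectors lie close together can then be read off as related groups.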
We finished the previous article with a long table where every row measured how surprisingly often two products were bought together according to the Instacart Online Grocery Shopping dataset.

Pointwise Mutual Information for Instacart Product Recommendations
/post/2018/06/17/instacart-products-bought-together/
Sun, 17 Jun 2018

Using pointwise mutual information, we create highly efficient “customers who bought this item also bought” style product recommendations for more than 8000 Instacart products. The method can be implemented in a few lines of SQL yet produces high-quality product suggestions. Check them out in this Shiny app.
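The quantity behind these recommendations can be sketched in Python; the baskets below are invented, while the post computes the same statistic in SQL over Instacart orders.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bananas", "oat milk"},
    {"bananas", "oat milk", "granola"},
    {"bananas", "granola"},
    {"limes", "tonic"},
]

n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

def pmi(a, b):
    """Log of how much more often a and b co-occur than expected under independence."""
    p_ab = pair_counts[frozenset((a, b))] / n
    return math.log(p_ab / ((item_counts[a] / n) * (item_counts[b] / n)))

print(round(pmi("bananas", "oat milk"), 3))  # 0.288
```

For each product, recommending the partners with the highest PMI yields the “also bought” list.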
Back in school, I was a big fan of the Detective Conan anime. For whatever reason, one of the episodes stuck with me.

Understanding the Negative Binomial Distribution
/post/2018/01/04/understanding-the-negative-binomial-distribution/
Thu, 04 Jan 2018

If you’ve ever encountered count data, chances are you’re familiar with the Poisson distribution. The Poisson distribution models the probability with which an event occurs a certain number of times within a fixed time period. For example, count how often a book is sold on Amazon on a given day. Then the Poisson can describe the probability with which the book is sold at least two times. Furthermore, the book might sell 5 times on some days; but it is never sold -3 times or 0.

Detection of Abnormal Zero-Sequences in Count Time Series
/post/2017/10/22/detection-abnormal-zero-sequences-count-time-series/
Sun, 22 Oct 2017

This post introduces a simple method to detect out-of-stock periods in sales time series by computing the probability of such sequences in Poisson random samples.
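A minimal sketch of the probability computation, assuming independent days with Poisson-distributed sales; the rate and run length below are illustrative.

```python
import math

# Under a Poisson(lam) sales model, a single day has zero sales with
# probability exp(-lam), so k independent days in a row give exp(-lam * k).
def prob_zero_run(lam, k):
    """Probability of observing k consecutive zero-sale days under Poisson(lam)."""
    return math.exp(-lam * k)

# A product that usually sells about 2 units per day going 7 days without a sale:
p = prob_zero_run(lam=2.0, k=7)
print(f"{p:.2e}")  # 8.32e-07 -- so unlikely that an out-of-stock period is plausible
```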
I recently forecasted sales of hundreds of different products. In contrast to other kinds of time series, sales might move close to zero for any given product if the product isn’t purchased daily. As a result, products have non-negative time series that regularly feature observations with zero sales.

Pokémon Recommendation Engine
/post/2017/07/01/pok%C3%A9mon-recommendation-engine/
Sat, 01 Jul 2017

Using t-SNE, I wrote a Shiny app that recommends similar Pokémon. Try it out here.
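A sketch of the neighbor-lookup idea behind such recommendations: invented feature vectors and cosine similarity stand in here for distances in the t-SNE embedding the app actually uses.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a Pokémon feature matrix (base stats, one-hot types, ...).
names = [f"pokemon_{i}" for i in range(50)]
features = rng.normal(size=(50, 12))

# Normalize rows so the dot product equals cosine similarity.
normed = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = normed @ normed.T

def recommend(idx, k=3):
    """Return the k most similar Pokémon to names[idx], excluding itself."""
    order = np.argsort(-similarity[idx])
    return [names[j] for j in order if j != idx][:k]

print(len(recommend(0)))  # 3
```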
Needless to say, I was and still am a big fan of the Pokémon games. So I was very excited to see that a lot of the metadata used in the Pokémon games is available on GitHub thanks to the Pokémon API project. Data on Pokémon’s names, types, moves, special abilities, strengths, and weaknesses is all cleanly organized in a few dozen CSV files.

Cost Sensitive Learning with XGBoost
/post/2017/04/14/cost-sensitive-xgboost/
Fri, 14 Apr 2017

In a course at university, the professor proposed a challenge: Given customer data from an ecommerce company, we were tasked to predict which customers would return for another purchase on their own (and should not be incentivized additionally through a coupon). This binary prediction problem was posed with an asymmetric cost matrix, the argument being that false negatives are more costly than false positives.
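One standard way to act on such an asymmetric cost matrix is to shift the decision threshold away from 0.5; a sketch with illustrative costs (not the ones from the course):

```python
# Positive class: customer returns on their own (no coupon needed).
# A false negative wastes a coupon on a returning customer, which is assumed
# here to cost ten times as much as a false positive.
c_fn, c_fp = 10.0, 1.0

# Predict positive when p * c_fn > (1 - p) * c_fp, i.e. when
# p > c_fp / (c_fp + c_fn); expensive false negatives push the threshold down.
threshold = c_fp / (c_fp + c_fn)

def decide(p_return):
    """Classify 'returns on their own' using the cost-adjusted threshold."""
    return p_return > threshold

print(round(threshold, 3))  # 0.091
print(decide(0.2))          # True
```

The same asymmetry can alternatively be fed into the model itself, e.g. via observation weights during training.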
Here, a false negative implies that the company sends a coupon to someone who would have returned anyway.

Look At All These Links
/post/2017/01/25/look-at-all-these-links/
Wed, 25 Jan 2017

By now, some time has passed since NIPS 2016. Consequently, several recaps can be found on blogs. One of them is this one by Eric Jang.
If you want to make your first steps in putting some of the theory presented at NIPS into practice, why not take a look at this slide deck about reinforcement learning in R?
The RStudio Conference also took place, and apparently it was a blast.

Multi-Armed Bandits at Tinder
/note/2016/12/14/multi-armed-bandits-at-tinder/
Wed, 14 Dec 2016

In a post on Tinder’s tech blog, Mike Hall presents a new application for multi-armed bandits. At Tinder, they started to use multi-armed bandits to optimize which of a user’s photos is shown first: While a user can have multiple photos in their profile, only one of them is shown first when another user swipes through the deck of user profiles. By employing an adapted epsilon-greedy algorithm, Tinder optimizes this photo for the “Swipe-Right-Rate”.

Look At All These Links
/post/2016/11/06/look-at-all-these-links/
Sun, 06 Nov 2016

At Airbnb, the data science team has written their own R packages to scale with the company’s growth. The most basic achievement of the packages is the standardization of work (ggplot and RMarkdown templates) and the reduction of duplicated effort (importing data). New employees are introduced to the infrastructure with extensive workshops.
This reminded me of a presentation by Hilary Parker in April at the New York R Conference on Scaling Analysis Responsibly.

Setting Hyperparameters of a Beta Distribution
/post/2016/10/01/setting-hyperparameters-of-a-beta-distribution/
Sat, 01 Oct 2016

Suppose you’re implementing Bayesian A/B testing in your company. To encourage your colleagues to use sensible prior distributions in testing, you would like to make it easy for them to choose the parameters of the Beta prior. Sure, they could play around with the parameters until the curve looks fine; but there is a better way.
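The “better way” can be sketched as root-finding on the interval coverage: pick the interval the prior should cover and solve for the concentration that achieves it. The target interval below is illustrative, and SciPy stands in for the post’s R code.

```python
from scipy.stats import beta
from scipy.optimize import brentq

# Goal: a Beta prior whose central mass covers a target interval,
# e.g. 95% probability between conversion rates of 0.02 and 0.08.
low, high, mass = 0.02, 0.08, 0.95
mean = (low + high) / 2  # fix the prior mean at the interval midpoint

def coverage_gap(concentration):
    a, b = mean * concentration, (1 - mean) * concentration
    return (beta.cdf(high, a, b) - beta.cdf(low, a, b)) - mass

# Larger concentration -> tighter prior -> more mass inside the interval,
# so a sign change is bracketed between a very diffuse and a very tight prior.
n = brentq(coverage_gap, 2.0, 100000.0)
a, b = mean * n, (1 - mean) * n
print(round(beta.cdf(high, a, b) - beta.cdf(low, a, b), 3))  # 0.95
```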
When setting the prior, one would like the prior to cover a certain interval with some probability (\(0.

Three Types of Cluster Reproducibility
/post/2016/06/14/three-types-of-cluster-reproducibility/
Tue, 14 Jun 2016

Christian Hennig provides a function called clusterboot() in his R package fpc, which I mentioned before when talking about assessing the quality of a clustering. The function runs the same cluster algorithm on several bootstrapped samples of the data to make sure that clusters are reproduced in different samples; it validates the cluster stability.
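The clusterboot() scheme can be sketched in Python, with a toy clustering function standing in for the real algorithm; for each original cluster we record the best Jaccard overlap achieved in each bootstrap resample.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two well-separated groups of 1-d points stand in for real data.
data = np.concatenate([rng.normal(0.0, 0.5, 30), rng.normal(5.0, 0.5, 30)])

def cluster(values, indices):
    """Toy clustering: split points at the midpoint of the observed range."""
    cut = (values.min() + values.max()) / 2
    return [set(indices[values <= cut]), set(indices[values > cut])]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

original = cluster(data, np.arange(data.size))

# Recluster bootstrap samples; compare memberships on the sampled points only.
scores = []
for _ in range(100):
    idx = rng.choice(data.size, size=data.size, replace=True)
    boot = cluster(data[idx], idx)
    sampled = set(idx)
    scores.append(np.mean([max(jaccard(o & sampled, b) for b in boot)
                           for o in original]))

stability = float(np.mean(scores))
print(round(stability, 2))  # near 1.0 for clusters this stable
```

Low average Jaccard values would flag clusters that dissolve under resampling.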
In a similar vein, the reproducibility of clusterings with subsequent use for market segmentation is discussed in this paper by Dolnicar and Leisch.

Assessing the Quality of a Clustering Solution
/post/2016/05/30/assessing-the-quality-of-a-clustering-solution/
Mon, 30 May 2016

During one of the talks at PyData Berlin, a presenter quickly mentioned a k-means clustering used to group similar clothing brands. She commented that it wasn’t perfect, but good enough, and the result you would expect from a k-means clustering.
The question remains, however, of how one can assess whether a clustering is “good enough”. In the above case, the number of brands is rather small, and simply by looking at the groups one is able to assess whether the combination of Tommy Hilfiger and Marc O’Polo is sensible.

Taxi Pulse of New York City
/post/2015/09/21/taxi-pulse-of-new-york-city/
Mon, 21 Sep 2015

I don’t know about you, but I think taxi data is fascinating. There is a lot you can do with these data sets, as they usually contain observations on geolocation as well as timestamps besides other information, which makes them unique. Geolocation and timestamps alone, combined with the large number of observations in cities like New York, enable you to create stunning visualizations that aren’t possible with any other set of data.

Analyzing Taxi Data to Create a Map of New York City
/post/2015/08/26/analyzing-taxi-data-to-create-a-map-of-new-york-city/
Wed, 26 Aug 2015

Yet another day was spent working on the taxi data provided by the NYC Taxi and Limousine Commission (TLC).
My goal in working with the data was to create a plot that maps the streets of New York using the geolocation data that is provided for the taxis’ pickup and dropoff locations as longitude and latitude values. So far, I had only used the dataset for January 2015 to plot the locations; also, I hadn’t used the more than 12 million observations from January alone, but a smaller sample (100,000 to 500,000 observations).
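The sample-then-plot approach can be sketched with NumPy; the coordinates below are uniform noise standing in for real TLC pickup locations, and in the real data the binned counts trace the street network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for taxi pickup coordinates around New York City.
n = 200_000
lon = rng.uniform(-74.05, -73.75, n)
lat = rng.uniform(40.60, 40.90, n)

# Draw a smaller sample, then bin the points into a 2-d grid; plotting the
# counts (e.g. on a log color scale) reveals where pickups concentrate.
sample = rng.choice(n, size=100_000, replace=False)
grid, _, _ = np.histogram2d(lon[sample], lat[sample], bins=500)

print(grid.shape)  # (500, 500)
```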
/probprog/
The Probabilistic Programming Workflow
Intuition and Essentials of Model-Based Machine Learning
Tim Radtke, 2019-03-21

Last week, I gave a presentation about the concept of and intuition behind probabilistic programming and model-based machine learning in front of a general audience. The following are my extended notes.
/short-time-series-prior-knowledge/
Modeling Short Time Series with Prior Knowledge
What ‘Including Prior Information’ really looks like.
Tim Radtke, 2019-04-16

It is generally difficult to model time series when there is insufficient data to model a (suspected) long seasonality. Here, we show how this difficulty can be overcome by learning the seasonality on a different, long, related time series and transferring the posterior as a prior distribution to the model of the short time series.

About Minimize Regret
/about/
Hi, my name is Tim. You’re reading Minimize Regret, the site where I write about things I’ve recently learned.
I live in Berlin, where I work as a data scientist. I’m interested in quantifying uncertainty and in optimal decision making under uncertainty. Fittingly, topics such as probabilistic programming, reinforcement learning, and stochastic optimal control, as well as time series theory and applications, are near and dear to my heart.
I recently finished my master’s in statistics and graduated with honors.