At Airbnb, the data science team has written its own R packages to scale with the company’s growth. The packages’ most basic achievements are standardizing the work (ggplot and RMarkdown templates) and reducing duplicate effort (importing data). New employees are introduced to the infrastructure through extensive workshops.
This reminded me of a presentation by Hilary Parker in April at the New York R Conference on Scaling Analysis Responsibly.
Speaking of education within a company, here is Patrick Riley’s most popular advice within Google on the analysis of large, complex data sets:
To answer those questions, I put together a document shared Google-wide which I optimistically and simply titled “Good Data Analysis.” To my surprise, this document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.
Ten ways your data project is going to fail.
Given the Twitter thread by John Myles White that I linked to earlier this week, in which he recommends using “aggressive confidence intervals with nominal coverage that’s >= 99.9%”, this short post by Andrew Gelman on 50% uncertainty intervals reads even better than it would on its own. Assuming that many people think of a coin toss whenever they talk about uncertainty/chance, this might be the way to go.
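To make the contrast concrete, here is a minimal sketch (assuming a normal sampling distribution; the `interval` helper is mine, not from either post) of how much narrower a 50% interval is than a 99.9% one:

```python
from statistics import NormalDist

def interval(mean, sd, coverage):
    """Central normal interval with the given coverage probability."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (mean - z * sd, mean + z * sd)

# A 50% interval reaches only ~0.67 standard deviations on each side,
# so the true value lands inside it about as often as a coin toss
# comes up heads; a 99.9% interval stretches to ~3.29 sd.
lo50, hi50 = interval(0, 1, 0.50)      # roughly (-0.674, 0.674)
lo999, hi999 = interval(0, 1, 0.999)   # roughly (-3.291, 3.291)
```

The appeal Gelman notes is exactly this intuition: a well-calibrated 50% interval should contain the truth about half the time, which is easy to check against experience, whereas a 99.9% interval almost never gets tested.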