Confidence Intervals for Medians and Percentiles

Medians are often better than means for interpretation: they are far less sensitive to skew and outliers than means are, so they give a better sense of the “typical” data point. When the mean and median differ, I prefer to use the median.

One problem with using medians is that you can’t calculate a confidence interval for them the same way you calculate one for a mean. There’s no simple “standard error of the median” to plug into the usual formula. However, it turns out there is a way to calculate confidence intervals for them.
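As a rough illustration, here is one standard distribution-free approach, which is not necessarily the method the full post describes. It reads the interval straight off the sorted data: the number of observations falling below the true median follows a Binomial(n, 0.5) distribution, so a pair of symmetric order statistics brackets the median with a known probability. The names `binom_cdf` and `median_ci` are mine, and the sketch assumes independent draws from a continuous distribution.

```python
import math

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def median_ci(data, conf=0.95):
    """Order-statistic confidence interval for the median.

    Returns (lower, upper, coverage), where coverage is the actual
    confidence level achieved, which is at least `conf`.
    """
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - conf
    # Find the largest rank j (1-indexed) with P(B <= j - 1) <= alpha / 2.
    j = 0
    while j < n // 2 and binom_cdf(j, n) <= alpha / 2:
        j += 1
    if j == 0:
        raise ValueError("too few observations for the requested confidence level")
    # The interval [x_(j), x_(n-j+1)] misses the median with probability
    # 2 * P(B <= j - 1).
    coverage = 1 - 2 * binom_cdf(j - 1, n)
    return xs[j - 1], xs[n - j], coverage

lower, upper, coverage = median_ci([7, 3, 9, 1, 5, 10, 2, 8, 4, 6])
print(lower, upper, round(coverage, 4))
```

Because the ranks are discrete, the achieved coverage usually overshoots the requested level a little. The same idea should extend to other percentiles by swapping 0.5 for the target quantile and choosing the lower and upper ranks separately.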

Defining Cause

It rained today and I didn’t have an umbrella, so I got wet. Why did I get wet? What caused me to get wet?

Suppose I was in LA during a drought and it was a weird, one-off shower. You’d say that I got wet because it rained.

Suppose I was in Seattle during an especially rainy season and, uncharacteristically, I forgot my umbrella. You’d say that I got wet because I didn’t have an umbrella.

Rating podcast listening experiences using Time Scaled Values

Suppose we want to recommend podcast episodes to users. Instead of having users rate each episode, we want to infer from their listening/skipping behavior how much they liked each episode we offer to them.  What we want is some kind of rating value we can infer from their behavior that captures how positive an experience they had with our app.  In turn this helps us offer more of the kinds of things they enjoy.

How much is it worth for a user to listen to an episode? Clearly listening to more of an episode is better than listening to less of an episode (Assumption 1). Almost as clear is the idea that listening to all of a longer episode shows more engagement than listening to all of a shorter episode (Assumption 2).
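To make the two assumptions concrete, here is a minimal sketch of a rating that satisfies both. It is only an illustration, not the Time Scaled Value defined in the full post: the function name, the `reference_seconds` parameter, and the choice to score raw listening time are all my own assumptions.

```python
def time_scaled_value(listened_seconds, episode_seconds, reference_seconds=3600.0):
    """Hypothetical rating: time listened, in units of a reference episode length.

    Assumption 1: for a fixed episode, listening longer scores higher.
    Assumption 2: finishing a longer episode scores higher than finishing
    a shorter one.
    """
    if episode_seconds <= 0:
        raise ValueError("episode_seconds must be positive")
    # Clamp to the episode length so replays or logging glitches cannot
    # push the value past "listened to everything".
    listened = min(max(listened_seconds, 0.0), episode_seconds)
    return listened / reference_seconds

partial = time_scaled_value(600, 1800)      # 10 minutes of a 30-minute episode
full_short = time_scaled_value(1800, 1800)  # all of a 30-minute episode
full_long = time_scaled_value(3600, 3600)   # all of a 60-minute episode
print(partial < full_short < full_long)
```

A linear scale is the simplest choice that respects both assumptions; whether an hour of listening is really worth twice as much as half an hour is a separate modeling question.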

How often can Thomas Bayes check the results of his A/B test?

Stopping your A/B test once you reach significance is a great way to find bogus results…if you’re a frequentist.  Checking before you have the statistical power to detect the phenomenon will often lead to false positives if you rely on classical/frequentist methods.  A Bayesian with an informative null-result prior can avoid these problems.  Let’s think about why.
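The frequentist half of that claim is easy to check by simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares stopping at the first significant peek against a single look at the end. The function names, sample sizes, number of looks, and conversion rate are all illustrative assumptions of mine, and the test used is a plain two-proportion z-test.

```python
import math
import random

def z_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-proportion z-test with n observations per arm; True if |z| > z_crit."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return False
    return abs(conv_a / n - conv_b / n) / se > z_crit

def false_positive_rates(n_sims=400, n_per_arm=2000, n_looks=20, rate=0.1, seed=1):
    """Simulate A/A tests and return (rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = n_per_arm // n_looks
    peek_hits = final_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        declared_winner = False
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peek every `step` observations; a single significant peek is
            # enough to (wrongly) declare a winner.
            if i % step == 0 and not declared_winner:
                declared_winner = z_significant(conv_a, conv_b, i)
        peek_hits += declared_winner
        final_hits += z_significant(conv_a, conv_b, n_per_arm)
    return peek_hits / n_sims, final_hits / n_sims

peeking_rate, single_look_rate = false_positive_rates()
print(f"peeking: {peeking_rate:.3f}, single look: {single_look_rate:.3f}")
```

The single-look rate should land near the nominal 5%, while the peeking rate comes out several times higher, because each extra look is another chance for noise to cross the threshold.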

When Enough is Enough with your A/B Test

A good A/B test tool should be able to reach the following conclusions:

  1. A beat B or B beat A, so you can stop.
  2. Neither A nor B beat the other, so you can stop.
  3. We can’t conclude #1 or #2 but you’ll need about m more data points to conclude one of them.

The tools I’ve found for analyzing A/B tests can all answer #1.  Some of the better ones can answer #3.  None of the tools I’ve seen will answer #2 and tell you that A and B are not meaningfully different and that you have enough data to be pretty sure about that.
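Here is a minimal sketch of what a tool answering all three questions might look like. It is one possible approach, not the method from the full post: it builds a normal-approximation confidence interval for the difference in conversion rates and compares it against a minimum meaningful difference that you choose. The function name `ab_decision` and the `min_effect` parameter are hypothetical.

```python
import math

def ab_decision(conv_a, n_a, conv_b, n_b, min_effect=0.01, z=1.96):
    """Return (conclusion, extra), where extra is a rough count of additional
    data points (both arms combined) needed when the test is inconclusive."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    half = z * se
    lo, hi = diff - half, diff + half
    if lo > 0:
        return "B beat A", 0                  # conclusion 1
    if hi < 0:
        return "A beat B", 0                  # conclusion 1
    if -min_effect < lo and hi < min_effect:
        return "no meaningful difference", 0  # conclusion 2
    # Conclusion 3. Once the half-width drops below min_effect / 2, the
    # interval must either exclude zero or fit inside (-min_effect, min_effect).
    # The half-width shrinks like 1 / sqrt(n), assuming the observed rates hold.
    factor = (half / (min_effect / 2)) ** 2
    extra = math.ceil((n_a + n_b) * (factor - 1))
    return "inconclusive", extra

print(ab_decision(100, 1000, 150, 1000))
print(ab_decision(10000, 100000, 10010, 100000))
print(ab_decision(100, 1000, 105, 1000))
```

The estimate for the third case is deliberately conservative: it is the amount of data that guarantees one of the first two conclusions, so a real test will often resolve sooner. Checking this rule repeatedly as data arrives also runs into the usual problems with peeking.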

Make GitHub R Code Available within R

After gradually migrating most of my workflow from Subversion to GitHub I discovered an itty, bitty, tiny, huge freakin’ problem. Part of my old workflow involved me storing code I would use again and again in a public repository then source-ing the code directly into R as needed. It also made this code easy for me to share with others, especially students and collaborators. No problem.

GitHub is superior to Subversion in notable ways, but that’s not our topic here. GitHub does make it easy to read source code directly from the site as plain text. Here’s an example of an address for a bit of code I use almost daily to give me a clean R session.

Is Algebra Necessary? Yes and No.

Political scientist Andrew Hacker recently asked “Is Algebra Necessary?” and the response has, unfortunately, been predictable.

Those in society’s minority who did well in math courses are “shocked” at the suggestion that we change the typical math curriculum.  The teaching may be “dismal” but algebra is a “foundation stone” in developing critical thinking skills.  “It teaches one how to think.”  It’s a little amusing but mostly disheartening to see folks who claim to support more challenging math standards fall back on strawman arguments, condescension, sarcasm and, my favorite, math errors in their arguments.

Those in society’s majority who did poorly in math tended to respond with relief at the suggestion of dropping algebra, although there are a few PMSD (post-mathematics stress disorder) victims whose career paths were altered by failing math and who still carry the associated baggage and resentment.

Let’s set aside the hysterics (“We are breeding a nation of morons”) and give both sides of this debate a fair shake, shall we?

Review of “Building Data Science Teams” by DJ Patil

Having recently committed myself to earning my living as a Data Scientist, I’ve been reading anything I can find to guide my self-education. So I just spent the last hour reading and mulling over DJ Patil’s article/report Building Data Science Teams (BDST henceforth), which is available free from various outlets; I read the Kindle version.  (Disclaimer: DJ is a friend and occasional drinking buddy.)

Planned Serendipity

Yesterday I got back from a great APSA in Seattle.  My undergraduate students were despondent at my having to cancel class Thursday so I could attend.  A few were curious about what happens at a scientific conference and asked about the structure.  I explained that there would be several thousand political scientists at this conference and that most of the planned interaction would take place in panels.

Bayes fixes small n, doesn’t it?

What is a methods-careful practitioner to do when the number of observations (n) is small?  I don’t know how many times I’ve been told by a well-meaning Bayesian some variation of

Bayesian estimation addresses the “small n problem”

This is right and wrong.