I also tweet as @tslumley

*“The ethanol in alcohol is a group one carcinogen, like asbestos.”* Prof. Doug Sellman, Otago University (July 2013)

Professor Sellman is correct, of course. What’s more, alcohol really is an important cause of cancer. From the viewpoint of rhetoric and risk communication it’s still interesting to see how the effect of the sentence changes when other familiar IARC Group 1 carcinogens are substituted for ‘asbestos’:

*alcohol is a group one carcinogen, like sunlight*
*alcohol is a group one carcinogen, like birth-control pills*
*alcohol is a group one carcinogen, like plutonium*
*alcohol is a group one carcinogen, like tobacco*
*alcohol is a group one carcinogen, like arsenic*
*alcohol is a group one carcinogen, like wood dust*

None of these really has quite the same rhetorical impact; the only one that comes close is ‘tobacco’.

Most of them aren’t scary enough: “Oh Noes! Beer is dangerous like sunshine!” 'Plutonium' and 'arsenic' are too scary: the (invalid) risk implication doesn't sound plausible. That's even though 'arsenic' in the IARC sense means low doses in drinking water, not the murder-mystery poison we tend to think of.

In New Zealand, ‘radiata’ and ‘macrocarpa’ are accepted common names for two widely planted non-native conifers: *Pinus radiata* and *Cupressus macrocarpa*, known in their native US as ‘Monterey pine’ and ‘Monterey cypress’ respectively.

It’s unusual for the specific epithet of a plant to become the common name. There are plenty of examples of the generic name becoming the common name, from ‘bougainvillea’ to ‘wisteria’. There are even plenty of examples where a former generic name has stuck as the common name after the botanists have renamed the plant to, eg, *Pelargonium*, *Hippeastrum*, or *Corymbia*.

The only other example I can think of is ‘japonica’, mentioned in Henry Reed’s famous poem *Naming of Parts*; it turns out to be the flowering quince *Chaenomeles japonica* and, by extension, other species and hybrids of *Chaenomeles*.

Are there others?

The R help page for *sin*, *cos*, and *tan* mentions the functions *sinpi*, *cospi*, and *tanpi*, “accurate for x which are multiples of a half.” This struck someone I know as strange. I’ve been thinking about this sort of thing recently while teaching Stat Computing, so here’s some background.

If you’re a mathematician, $\sin x$ is given by a power series

$$\sin x = x - \frac{x^3}{3!}+\frac{x^5}{5!} -\frac{x^7}{7!} +-\cdots$$

This series converges for all $x$, and so converges uniformly on any finite interval. In fact, it eventually converges faster than exponentially, since $n!\approx (n/e)^n$. At, say, $x=1$, 10 terms gives you 14 digits of accuracy. Even better, it’s an alternating series, so once the terms start to decrease, the error in truncating the series is smaller than the first omitted term.
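The claim at $x=1$ is easy to check numerically; here’s a quick sketch in R (the variable names are mine, just for illustration):

```r
## First ten terms of the alternating series at x = 1
n = 1:10
terms = (-1)^(n + 1) / factorial(2 * n - 1)  # x^(2n-1) = 1 when x = 1
abs(sum(terms) - sin(1))  # agrees with sin(1) essentially to machine precision
```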

The largest number of terms we could use without getting clever or working with logarithms is 85: at 86 terms, the factorial overflows double-precision storage. According to the real-number maths, that’s still enough to get the error down to the 15th decimal place for $x=52,$ and logarithms are perfectly feasible. It turns out that something else goes wrong first.
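The overflow is easy to verify: an 86th term would need $(2\times 86 - 1)! = 171!$, which is bigger than the largest representable double (about $1.8\times 10^{308}$):

```r
factorial(169)   # largest factorial needed for 85 terms: still finite
factorial(171)   # needed for an 86th term: overflows to Inf
```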

Consider $x=20$. We know $\sin x$ is between -1 and 1, but the first few terms of the series are huge: $20^{17}/17!$ is more than 35 million. For the series to converge, there must be almost perfect cancellation between large positive and negative terms. That’s a recipe for massive inflation of rounding error when you’re doing computations to finite precision.

The following R function compares the power series to the built-in $\sin$ function (which uses the one in the C standard library)

```r
culpa = function(x, N = 85) {
    n = 1:N
    terms = (-1)^(n + 1) * x^(2 * n - 1) / factorial(2 * n - 1)
    sum(terms) - sin(x)
}
```

For $x=1$, or 2 or 3, the error is tiny, but it’s creeping up. By the time we get to $x=38$, the error is larger than 2, which counts as completely inaccurate for a function bounded by $\pm 1.$ At $x=38$, the last term used is about $2\times 10^{-18}$, and so by standard results on alternating series, the error should be smaller than that. The real-numbers error bound is wrong by **more than 18 orders of magnitude** when applied to computer numbers — and taking more terms will only make this worse.
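Using *culpa* as defined above, you can watch the cancellation take over (the exact values will vary slightly by platform):

```r
culpa(1)    # negligible: on the order of 1e-16
culpa(38)   # bigger than 2, for a function bounded by +/- 1
```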

So, how does *sin(x)* do it? The C standard, as is its habit, doesn’t specify, but the basic idea is to reduce $x$ modulo $2\pi$ to get the argument into $[-\pi,\pi]$, and then use either the Taylor series or an approximating polynomial or ratio of polynomials. This works well for moderate $x$, but you still find

```r
> sin((10^10)*pi)
[1] -2.239363e-06
```

In a sense, that’s unavoidable. We’ve only got just under 16 digits of precision to work with, so $10^{10}\pi$ is accurate only to six digits after the decimal point. You can’t do better.

Except, sometimes you can. If the formula you are trying to implement involves $\sin \pi x$ rather than $\sin x$, you don’t need to waste time and accuracy multiplying by $\pi$ and then reducing modulo $2\pi$. You can reduce modulo 1 instead, which is faster, easier, and more accurate. The obvious use case is when $x$ is measured in degrees or cycles. If $x$ is in degrees, you need to evaluate $\sin (2\pi x/360)$. It’s more accurate to use *sinpi(x/180)* than to use *sin(pi*x/180)*.
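For instance, a degree-based sine can be written with *sinpi* (in base R since version 3.1.0; *sind* is just a name I made up for the sketch):

```r
sind = function(x) sinpi(x / 180)   # reduce modulo 360 degrees exactly
sind(180)             # exactly 0
sin(180 * pi / 180)   # about 1.2e-16, because pi isn't exactly representable
```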

That’s why ISO/IEC TS 18661 proposed *sinpi* and its siblings for a new C standard, and why R supplies an interface and an implementation.

In fixed-effects meta-analysis of a set of trials the goal is to find a weighted average of the true treatment effects in those trials (whatever they might be). The results are summarised by the weighted average and a confidence interval reflecting its sampling uncertainty.

In random-effects meta-analysis the trials are modelled as an exchangeable sample, implying that they can be treated as coming independently from some latent distribution of true treatment effects. That’s attractive in some situations. What doesn’t make sense to me is summarising the results just by the mean of this latent distribution and a confidence interval for that mean.

That is, the model for individual study estimates $\Delta_i$ is

$$\Delta_i\sim N(\mu_i,\sigma^2_i)$$

$$\mu_i\sim N(\mu_0, \tau^2)$$

and we usually report a confidence interval for $\mu_0.$

If you take seriously the idea of modelling heterogeneity in the true treatment effect, a confidence interval for the mean isn’t enough. In order to make decisions you need a prediction interval for the true treatment effect in a new population that might include you.

The difference between these intervals can be pretty large. Today, I saw a paper (open-access) in the new Nature *Scientific Reports* journal, a meta-analysis of observational studies of vitamin C and lung cancer. Their Table 3 presents study-specific estimates and a random-effects meta-analysis for the risk ratio per extra 100mg/day vitamin C.

The point estimate is 0.93 and the confidence interval is 0.88-0.98, but the $I^2$ heterogeneity statistic is 75%. That is, the heterogeneity in the estimates is about three times the sampling uncertainty. Putting the data into my rmeta package in R I can reproduce their output (apart from their summary $p$-value, which I think must be a typo), and I get an estimate of $\tau=0.23$.

Combining that with the mean, the simple heterogeneity model says that the true effect on the relative risk scale of an extra 100mg/day vitamin C varies enormously depending on context, with 95% limits from 0.58 to 1.47. The true effect is beneficial in 62% of trials and harmful in 38%. This is without adding in the sampling uncertainty, which would expand the bounds slightly for a true prediction interval.
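The arithmetic is just Normal quantiles on the log risk-ratio scale; a sketch using the estimates above:

```r
mu = log(0.93)   # summary log risk ratio
tau = 0.23       # estimated between-study standard deviation
exp(mu + c(-1, 1) * qnorm(0.975) * tau)   # 95% limits, close to 0.58 and 1.47
pnorm(0, mean = mu, sd = tau)             # about 0.62: P(true RR < 1)
```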

If we take the heterogeneity model seriously, this meta-analysis is telling us we have almost no clue about the effect of vitamin C on lung cancer in a new population that wasn’t one of the studies that went into the analysis. Averaging over all populations, vitamin C is estimated to be slightly beneficial, but in **your** population we can’t tell. Since the data are all observational and are visibly inconsistent, that’s not terribly surprising, and is most likely due to different confounding patterns.

I think reporting suitable upper and lower quantiles of the latent treatment effect distribution in addition to a confidence interval for the mean would be an improvement for random-effects meta-analysis. In particular, it would help with the ‘how much is too much’ question about $I^2$, since a highly-heterogeneous set of studies would always end up with a wide treatment effect distribution.

It would be even better to report confidence intervals for the upper and lower quantiles, but that would take a little theoretical work, and the simple solution is probably good enough.

By and large, it isn’t cost-effective to bribe academics to say things they don’t believe.

Academics who are sufficiently well-known to be worth bribing tend to be (*ipso facto*) both moderately affluent, and willing to have given up chances for greater wealth in exchange for freedom to do basically whatever they are interested in. The sensible strategy is to find academics who already believe what you want said, and to buy them bigger megaphones. Over time, cognitive dissonance may warp their views to be even better aligned with yours, but that’s just a bonus.

Given that, it may be sensible to downweight the level of support for a position when it comes from sponsored research, but it is irrational to say an argument should be *dismissed* just because it comes from someone you view as a shill.

- *"Statistics in the media: &lt;sigh&gt;"*, Café Scientifique, Horse and Trap pub, Mount Eden, Auckland. August 27, evening.
- Short course: *"Elements of R"* (aka the Ken and Thomas show), University of Lausanne, September 3-5.
- Short course: *"Complex sampling and epidemiologic designs"*, University of Milan, September 9-10.
- *"Two million t-tests: lessons from GWAS"*, Australasian Epidemiology Association, Auckland, October 9.
- *"Measure Everything on Everyone? Substudies and Sampling in CVD Cohorts"*, The Kronmal Symposium: Lessons Learned from Cardiovascular Epidemiology, University of Washington, Seattle. November 24.

There’s a new version, 3.30-3, of the ‘survey’ package for R. It’s got quite a lot of new stuff:

- AIC and BIC for generalised linear models
- Rank tests for more than two groups
- Logrank and generalised logrank tests

Since I’m known for a lack of enthusiasm about any of these techniques, why are they in the package? Am I just enabling?

Well, AIC and BIC are interesting, and I’ll say more below. For all these techniques, though, the lack of versions that account for the sampling design hasn’t stopped people using them; they’re probably worse off using versions that assume independent sampling than they would be with versions that handle weights and clustering. If we’re going to give the Kruskal-Wallis test the unmarked grave it so richly deserves, we clearly aren’t going to do it by restricting implementations. Instead, we need to teach people what it really does, and show them that it isn’t what they want.

AIC turns out to be relatively straightforward: Jon Rao and Alastair Scott worked out the sampling distribution of the likelihood ratio back when I was learning how to solve simultaneous equations. The expected value of the log likelihood ratio when the smaller model fits as well as the larger model is the trace of a matrix. That matrix would be the identity under iid sampling and correctly-specified models, giving the $p$ penalty in AIC; under complex sampling the matrix isn’t the identity, but is easily estimated.
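In symbols (a sketch, with $\Delta$ as my notation for the relevant matrix, not necessarily Rao and Scott’s): for nested models differing by $p$ parameters,

$$2\left\{\ell(\hat\theta_{1})-\ell(\hat\theta_{0})\right\} \xrightarrow{d} \sum_{i=1}^{p}\lambda_i Z_i^2,\qquad Z_i \sim N(0,1),$$

so the expected log likelihood ratio under the smaller model is $\sum_i\lambda_i=\mathrm{tr}(\Delta)$. Under iid sampling with a correctly specified model, $\Delta=I_p$ and the expectation is $p$, recovering the usual AIC penalty; under complex sampling you estimate $\mathrm{tr}(\Delta)$ and use that instead.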

BIC is harder, because the likelihood can’t just be a convenience. However, if all the models you consider can be nested in some Big Ugly Model and obtained by setting specific parameters to specific values in that Big Ugly Model, we can use the (approximate) Gaussian likelihood of the parameter estimates in the Big Ugly Model. Each submodel also has a Gaussian likelihood, and we can work out the honest, traditional, entirely Bayesian BIC for those Gaussian models. The result is a penalised Wald statistic, with a penalty that is the number of parameters times a sort of effective log sample size.

I should also note that the generalised logrank tests are the work of Kevin Rader and Stuart Lipsitz, who presented them at the Joint Statistical Meetings in San Diego two years ago. They were on my to-do list, but the Harvard folks got to them first, and I just implemented what they presented.

As a simple programming/probability exercise, I fitted a second-order Markov chain to *"Structure and Interpretation of Computer Programs"* and the complete works of H.P. Lovecraft. Adding the King James Bible tends to improve the output, but even in NZ that seems inappropriate for a class example.
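A minimal version of the fitting-and-sampling step might look like this in R (an illustrative sketch, not the actual class code: *corpus* is a toy stand-in for the real texts, and the full-stop start/stop rule is omitted):

```r
## Build a second-order transition table: each consecutive word pair
## indexes the words that can follow it.
corpus = strsplit("the old house stood on the old hill . the old house fell .", " ")[[1]]
n = length(corpus)
key = paste(corpus[1:(n - 2)], corpus[2:(n - 1)])
nxt = split(corpus[3:n], key)

## Sample a chain: start from a random pair, then repeatedly draw a
## word conditional on the two most recent words.
generate = function(len = 10) {
    state = strsplit(sample(names(nxt), 1), " ")[[1]]
    out = state
    for (i in seq_len(len)) {
        choices = nxt[[paste(state[1], state[2])]]
        if (is.null(choices)) break   # pair never seen mid-text: stop
        w = sample(choices, 1)
        out = c(out, w)
        state = c(state[2], w)
    }
    paste(out, collapse = " ")
}
```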

Here’s some selected but unedited output: it always finishes at a full stop and starts on the word after one.

*Turing (1912-1954), whose 1936 paper laid the foundations of computing, but rather of a machine for each pair (i, j) that passes through the weed-choked fissure to that moment, I crouched on the floor in front of my strange new impulses seemed subtly connected with if and have to write programs as large-scale data-base-management systems.*

*This has important implications, both theoretical and practical, because we must distinguish different types as being defined in terms of opprobrium.*

*I reached for the special symbol that can be implemented as an element of dy, which will be the result of its midst, and before he could - he was unknown in Cairo, to know terror lives.*

*I turned again to the horror of its still greater was he thinking? To what extent could the expert test of the or expression, and we are given points a and b are equal? if they had a curiously fragmentary, intangible, and heterogeneous army of invasion.*

*Then, bracing himself, he added, had important business elsewhere, and which involved other French settlers of two cubes in two registers with those who inferred from the data base satisfying a specified location.*

*Unify-match is basically the right branch, a set of needed and modified by the generic arithmetic operations, we use assoc to see the rescue made by a sleeper is indeed the best course would involve many additional hazards.*

*Information Retrieval Logic programming excels in incidents and sensations of tense and heated discussion; and before the recursive definition which translates readily into the flat head and end with only general sense, memory, and the mockery of the defenders, and afterward the physician, unable to sing.*

*I could see that it signals an error if the evaluation rule for adding “greats” to a complex number, such as “If x is 1 and the nameless scent was now shaking with their wives testify the bewilderment which their far too common Knowledge to be delayed, we must ensure that our programs at a distance from the all-pervading and almost dreaded the exertion, for his family and Dr.*

*De Marigny recalled the old room of my youth about the matter of duty, though in reverse order.*

*Binger saw me and bade me mark that all the other hand, even toy parsers can be extended to include some germ-tainted crocodile meat as food for the body of a century.*

There’s been a bit of…discussion…about Richard Feynman recently. In one Twitter discussion, Richard Easther said he had been thinking of using Feynman’s commencement address "Cargo Cult Science" with a first-year physics class, and had decided against.

I was a bit surprised. It’s been a long time since I read that piece, but I couldn’t remember anything objectionable in it. So I re-read it. It’s still really good in a lot of ways. But. Yeah. That.

The problems are more obvious in Feynman’s book of autobiographical anecdotes *"Surely You’re Joking, Mr Feynman"*. I read that when it came out, and loved it. I was in high school at the time. It wasn’t that I loved the casual sexism; I just didn’t really notice. I was interested in the science bits of the book, and I didn’t care what he did in bars in Buffalo. It’s harder to re-read it now without wincing.

If you find yourself feeling defensive about Feynman, you might like to read Jo Walton on the Suck Fairy in literature. The Suck Fairy goes through books you used to love, and edits them to make them suck when you read them again. The whole point of the name, of course, is that this isn’t true; the book always did suck and you just didn’t notice when you were younger. The book you remember is still good, it just isn’t real.

The *"Surely You’re Joking.."* I remember, and its author and hero, were great. There’s no need to deny that. The problem is, they weren’t the actual book and the actual physicist. The actual person was a genius, but not really a role model.

The imaginary Feynman, the one who wrote the imaginary book I remember reading, would have wanted us to be honest about the faults of the actual Feynman.

The East-West Center at the University of Hawai’i has one of the most bizarre Internet Acceptable Use Policies I have ever seen. Among other things, it “strictly prohibits”

> The distribution, dissemination, storage, copying and/or sale of materials protected by copyright, trademark, trade secret or other intellectual-property right laws.

That bans making this post, downloading the slides for the Summer Institute in Statistical Genetics that I want to revise, installing new R packages, and probably even reading email (unless it was sent by someone who is a US government employee as part of their work). You might think there’s an implied “except with permission of copyright owner”, except that in lots of other places in the policy they make that sort of exception explicit. Also, the policy goes on to say that this includes, but is not limited to, illegal copying and distribution.

More *practically* annoying is the fact that they block ssh and scp, so I can’t upload the files for my course tomorrow. Maybe I’ll go for a walk and see if I can get an eduroam connection.