Illegible Science

Posted by Owen Lynch

Epistemic status: question, not answer

Giving Up on Understanding the Natural World

Ever since Archimedes, science has been about increasing human understanding of the natural world. For this reason, scientific models have by necessity been simple enough for humans to understand. However, there is no rule of nature that says that good scientific models must be simple. The traditional response to this in the sciences has been either to ignore the complexity and focus on simple laboratory models that we hope point in the right direction, or to just give up and leave the problem to less rigorous disciplines.

However, what if we just gave up on understanding models? Scientists look for relationships between variables, and typically do this by measuring correlation, or descending into fancier statistics. The traditional scientific process is to form a testable hypothesis, design an experiment, collect data, and see whether the hypothesis is rejected or not. The point of generating a hypothesis beforehand is so that you don’t just make something up that fits the data: it’s very easy to rationalize something that you’ve already seen, and very difficult to predict something in advance. In short, we require the hypothesis to be generated by a human in advance as a check on the “contrivedness” of the hypothesis.

The problem with this is the first step. If the data can only be explained by a very complex model, it becomes near-impossible to come up with a correct hypothesis in advance. What we need is a way of measuring “contrivedness” that can be applied to models that humans don’t understand. Luckily, people have been working on this for a long time in the field of machine learning. There are all sorts of measures of “model complexity” in machine learning, and generally machine learning works by trying to find models for data that fit well but have minimal complexity.
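To make “fit well but have minimal complexity” a bit more concrete, here is a minimal sketch using one standard complexity-penalized score, the Bayesian information criterion (BIC). Everything specific in it is an assumption chosen for illustration: the synthetic quadratic data, the deterministic pseudo-noise, the polynomial model family, and the helper names. It is one possible measure among many, not the measure.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    k = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [[sum(x ** (i + j) for x in xs) for j in range(k)]
            + [sum(y * x ** i for x, y in zip(xs, ys))]
            for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, k + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (rows[i][k] - tail) / rows[i][i]
    return coeffs

def residual_sum_of_squares(xs, ys, coeffs):
    """Sum of squared errors of the fitted polynomial on the data."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def bic(n, rss, num_params):
    """BIC for Gaussian errors: the fit term rewards accuracy,
    the second term penalizes every extra parameter."""
    return n * math.log(rss / n) + num_params * math.log(n)

# Hypothetical data: a quadratic law plus small deterministic pseudo-noise.
n = 40
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [1 + 2 * x - 1.5 * x ** 2 + 0.1 * math.sin(37 * i)
      for i, x in enumerate(xs)]

# Score polynomial models of increasing complexity.
scores = {}
for degree in range(6):
    coeffs = fit_polynomial(xs, ys, degree)
    rss = residual_sum_of_squares(xs, ys, coeffs)
    scores[degree] = bic(n, rss, degree + 1)

best_degree = min(scores, key=scores.get)
print("BIC by degree:", {d: round(s, 1) for d, s in scores.items()})
print("selected degree:", best_degree)
```

The point of the sketch is that nobody had to guess the form of the law in advance: the score itself plays the role of the check on contrivedness, rejecting models that are too simple to fit and penalizing ones that add parameters without improving the fit enough.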

So what if we were to start writing scientific papers that, instead of finding correlations in data, found black-box models that fit the data and could be downloaded and run? We would no longer have a direct way of knowing whether these models were “reasonable” or not; we would have to blindly trust measures of model complexity to know when we had found a “general principle”.
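As a rough sketch of what running such a downloaded model might look like, the snippet below treats a predictor as an opaque function and judges it only by how well it predicts data it was never fitted on, compared with a trivial baseline. The 3-nearest-neighbour “black box”, the synthetic sine data, and all the names are assumptions made up for this illustration; a held-out check like this complements complexity measures rather than replacing them.

```python
import math

def make_black_box(train):
    """Return an opaque predictor. Internally it is a 3-nearest-neighbour
    average (an assumption for this sketch); callers never look inside."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:3]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def mean_squared_error(predict, data):
    """Average squared prediction error over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical observations of some process we do not understand.
data = [(i / 10, math.sin(i / 10)) for i in range(100)]
train = data[::2]      # used to build the published model
held_out = data[1::2]  # never seen by the model

oracle = make_black_box(train)

# Trivial baseline: always predict the mean of the training outputs.
baseline_mean = sum(y for _, y in train) / len(train)

def baseline(x):
    return baseline_mean

oracle_error = mean_squared_error(oracle, held_out)
baseline_error = mean_squared_error(baseline, held_out)

print("black-box held-out error:", oracle_error)
print("baseline held-out error: ", baseline_error)
```

A reader of the paper could repeat this comparison without understanding anything about how the model works, which is roughly the kind of trust the proposal asks for.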

This would essentially be a redefinition of what we count as human knowledge. Just as nowadays we use Google and Wikipedia as extensions of our brain, we would also have a host of specialized oracles that we could ask about complex programs. Instead of an encyclopedia, a biologist might use a database of trained neural nets, and if she wants to know how a plant will grow in response to specific conditions, she might look up the neural net that someone else has carefully trained on lots of data and ask it.

What I’m proposing here is not really a shift in methodology, though, because use of computers to process data is nothing new. Really this is a shift in philosophy of science. I assert that the core of science is uncovering the hidden order in the world. I don’t think that the hidden order in the world is necessarily human-readable, and we may have to accept that in order to progress, we have to deal with things that our puny monkey brains can’t come close to comprehending.
