Sage Commons Congress 2012

I was at the Sage Commons Congress the last few days. Meetings should be full of challenging new ideas and full of spontaneous discussion. I’ve been to a lot of scientific meetings where both those things are absent (shockingly, actually) or where the meeting brings little beyond what could be learned from the literature. This congress was very interesting, driven by the passion of those people taking part to do science in new ways.

Lobby of the Hyatt Regency from the 15th floor

Sage has a mission to understand disease. I had thought that the plan was to assemble and share very large amounts of scientific data. That’s essentially true, but I realise I missed two things. (Many of the talks can be found here – I won’t link to individual talks because the non-skippable sponsor message is too annoying each time.)

1) The word “disease” is becoming a little redundant. The philosophy of the group at this congress (though I’m not sure how far it extends beyond) is that the idea of a single “disease” as a static thing that affects everyone equally is of limited usefulness. People respond to therapy differently. If you accept this then what’s left is not a disease but a “disease-patient-medicine interaction” – a dataset that includes the patient’s biota (DNA, age, habits etc) and also more nebulous factors (such as how they are feeling when they’re being given therapy).

Given that patients are volunteering to contribute their data in ever larger numbers, it is important to use such data to understand whether a therapy works and whether a therapy is likely to work for a new patient.

If more patients are sharing their data, then it’s important we are on a firm legal grounding. Synapse, the Sage system demoed at the congress for collaborative data sharing, requires patients thinking of becoming involved to give clear consent for the use of their data, and John Wilbanks described a system he’d been working on called “Portable Legal Consent”.

A related idea is that patients should be involved in research – that researchers should be listening to what patients have to say about what is needed for their disease. From my perspective, as someone who works with early-stage drug discovery, this seemed rather an alien idea. It’s also interesting that the diseases discussed in this way were almost all developed-nation diseases. In practical terms, for malaria and NTDs, patient consultation comes second to a broad morbidity-reducing treatment.


2) A persistent theme at the congress was that we know little about disease, and what we do know is rather too academic. Rick Klausner gave a real demolition of academic understanding of medicine, more or less saying, it seemed, that we ought to start from scratch. And there were many voices raised in concern about reproducibility. I hadn’t realised this was so serious, but a Nature editorial raised the issue a few weeks ago and the conclusions are not good no matter how you look at them. One really dreadful example came from Jamie Heywood, below.

Synapse is a platform allowing sharing of data, and collaborative tweaking/reuse. Platforms/Apps are important, but there is I think sometimes too much emphasis placed on the importance of collaborative workspaces (several new platforms were announced during the congress). My view is that we allow people to develop whatever tools they need, provided the code is open source, and provided there is interoperability. Standards in data sharing are what’s important. Let the community define the software it needs for a given task, and insist on interoperability (Blue Obelisk is trying to do this for cheminformatics). I think that Synapse is due to be open source, so that people can install their own local versions, which would be a smart way of increasing its prevalence.

Bay Bridge

Three talks were highlights for me (the only times I stopped Tweeting because I was so interested):

1) Adrien Treuille’s description of Foldit and EteRNA, and the emphasis he placed on the impact participation had on the participants. The other wonderful element of those projects is the reciprocity – that the techniques developed by the humans for solving these problems were incorporated into next-gen algorithms for the computers, but that then the humans, seeing what the computers weren’t good at, started to set problems for the computers which were difficult because they knew (and felt sorry for) the machines’ weaknesses… He paid nice testament to the power of human learning.

2) Larry Lessig’s wonderful rant. It was a call to arms against the corrupting influence of publishers – corrupting not in an evil way, he was clear to say, but because profit-making is in a publisher’s nature. A publisher can no more be blamed for that than a tiger can be blamed for being fierce. But the message, delivered in the final two minutes in response to a question, was that even if there appears to be no solution to a problem, we still work to do something out of love. I suspect this is the motivation behind many in the open access movement: that even if solutions to scalable, sustainable, economically viable open access literature seem too difficult, we still pursue them for love of the corpus of human knowledge.

3) Jamie Heywood. He spoke movingly about the reasons he set up the ALSTDI, and I was fascinated by the supreme success of such a non-profit drug discovery organisation vs. its commercial competitors, and by how much effort it must have taken to set that up and keep it running. The awful part of this talk, however, was the reproducibility study he carried out, burning millions of dollars to show that the then-preferred/recommended lab-based therapies had no clinical efficacy. Tragic, awful and jaw-dropping. One can’t help but extrapolate from that study and one is then awed by the dreadfulness of the message.

Cinderella were there (though I could not sense how open the work was that they were doing) as was something called Discovery Network, but I could not understand what that organisation was. Nobody else appeared to be doing open drug discovery of the kind that we are doing, but the focus of the meeting was much more on the translational/late stage side.

A competition was announced called DREAM which will be on finding new algorithms for the analysis of breast cancer data, but it’s so new that there are no links yet. It reminded me of the MATLAB competitions, and I hope that they run with a similar idea, that the intermediate solutions submitted are rewarded as well as open, so that others may build on earlier solutions.

It was a pleasure to meet others, such as Kelly Edwards, Kaitlin Thaney (finally), Elizabeth Iorns, Bas Bloem, Lance Stewart from the Allen Institute for Brain Science, and Sarah Greene from Cancer Commons. Significant kudos to Stephen Friend for organizing (and to Jon Izant who was dealing with things behind the scenes and who was kind enough to invite me to come along to begin with).

The “Hows” rather than the “Whats” of operating distributed communities of science were requested, and Stephen requested we call out people on things we thought were right or wrong. To my mind the idea of openness sometimes wasn’t emphasized as much as it could have been – the Federation of Phase I of Sage, for example, is essentially a large, closed collaboration between existing groups which does not make use of the power of interested strangers. When openness was mentioned it was often only within the context of data, not of workflow or process. Open data for me is step 1, but then working with people openly is the extra value of open science – describing what one is doing, what one needs, where a project is going, what our failings are as human scientists. This interests people, and informs people of what is needed next. It acts as an education of what science is. A description of the process behind the data acts as the ultimate metadata, in other words.

I am also reminded of how extraordinarily behind the times chemistry is with regard to open data compared to the biologists, who just seem to get it. That’s for another post. But if understanding of a disease is being raised as a real problem in medicine, we ought to start talking about the lack of predictability in chemical synthesis. Or to what extent proprietary chemical data will be first against the wall when the revolution comes.

Cameron Neylon’s post on the conference is already up. Thanks to him and Mr Gunn for a hangover that almost got in the way on Day 2. Thanks to Adrien Treuille for making me forget it.