  • mattoddchem 10:47 pm on November 25, 2011

    Europe Trip/Conference Report 

    I had an interesting trip a month ago, funded in part by the Australian Academy of Science. They require a report, and I was wondering if a blog post could double as a publicly-readable and transparent report. Given that the entire trip was for the purpose of open science, this seems appropriate. In the end, they have a proforma, so this is supporting information.

    Sydney-Abu Dhabi-Geneva was followed (no pit stop) by a flight to Madrid to visit GlaxoSmithKline at Tres Cantos.

    There is a group of chemists and biologists there who understand the idea that the free sharing of data brings about more collaborations and faster progress in science. This group really is extraordinary. Their deposition of antimalarial hits in the public domain in mid-2010 is worth thinking about for a moment.

    Here is the sharing of bioactivity data and structures of thousands of compounds. These are not just commercial compounds bought in by GSK, but also synthetic compounds, made by GSK as part of various medicinal chemistry campaigns. Javier Gamo, the lead author of the GSK Nature paper, told me the story of how this deposition came to be, and described the extraordinary leap that occurred in releasing the data. We are all used to seeing talks by big pharma in which the structures of active compounds are not included or “R grouped out”. Indeed, it’s usual for the release of the structure of an active to be associated with a frustrating amount of paperwork. That’s even the case in academia, where I would be seriously advised against sharing data on unpatented bioactives. But here there are thousands of actives. Not only are some of the compounds highly potent, but they are whole-cell potent – an important new movement in the discovery of antimalarial compounds.

    Now I remember when the release of data occurred. It was described as GSK going “open source” in many newspapers. That’s inaccurate. Data are just data, unless people work on them, in which case they become a project. Open source describes a process, an activity. To capitalize on the data released by GSK requires people to work on the data. This is such a rich field of hits that GSK don’t have the time to work through all the series alone. They are seeking partners – indeed that was the rationale for releasing the data in the first place. We’re now working with them, in an extreme form since everything we’re doing is freely available, rather than being part of a more traditional bilateral collaboration.

    Our immediate work is to validate the hits. We’ve just finished their resynthesis, plus the synthesis of a few other compounds. These have now arrived at Tres Cantos, where the team will measure IC50s and progress the best compounds to rate-of-killing assays. Stuart Ralph and Vicky Avery are also looking at these compounds – we need to be sure of their worth if we’re going to look at them further. These are sensational in-kind contributions to the project.

    The team at Tres Cantos have evaluated their “TCAMS” set and grouped the best compounds into sets. The arylpyrrole set that is the starting point for the open project is one of those identified. There are lots of others. When I was there the team (including Felix Calderon) told me they’d just published another study where a few of the sets had been evaluated. Interestingly the medicinal chemistry campaign was short and sweet – a small number of variations in different parts of the structure had led to shallow SAR (i.e. small changes in bioactivity from changes in various parts of the structure) and this is a negative for the start of a campaign. This series was abandoned because of this, coupled with there being plenty of other hits to go after. (We need to be following this model in what we’re doing in Sydney). What I found particularly interesting was one of the assays in place to determine the likelihood of resistance occurring to a given compound. This is possibly commonplace in medchem/parasitology, but it was new to me. Emergence of resistance to a known drug is apparently quite repeatable. One can effectively run an “evolution emulator” and re-develop resistance to known compounds. It is then possible to run a parallel experiment with a novel compound, and time the development of resistance/tolerance. A rapid onset of resistance essentially kills that series. This is what had happened with GSK’s evaluation of one of the latest TCAMS sets. Javier’s opinion is to include this assay earlier in the evaluation of hit series. It’s something we’re going to want to do with the arylpyrroles when we have a few more analogs ready to go.

    From Madrid I flew back to Geneva. There is a strangely high concentration of important public health people around the airport. The following morning I had breakfast with Piero Olliaro from WHO/TDR to plan what to do next with our schistosomiasis project following the publication of our papers. We’re looking at some new ways of making the molecule. We’re also excited to be able to provide enantiopure PZQ to any groups needing some. Please contact us if you’re in need, though the procedure’s pretty easy and needs no fancy equipment. Over a coffee and a spectacular pastry we also discussed whether there was mileage in a consortium working on open source drug discovery for schisto. I think there is, provided the consortium is big, and is open. The “neglected” in neglected tropical diseases comes partly from there being a lack of interest in the development of new drugs. Interestingly one of the big drivers for new “schisto” drugs is the veterinary sector. Vet drug providers would be less keen on the openness, naturally. However, their customers might take a different view. Imagine a rotational dewormer identified by a robust, open research process, available at cost, funded in part by the increased productivity of the livestock/fisheries sectors.

    Later that morning I went to the Medicines for Malaria Venture (MMV), another visionary organisation, strongly encouraging the sharing of data and a more collaborative approach to drug discovery. MMV are supporting our current project financially, and last week we heard that we have secured 3 years’ funding for the project from the Australian Government. I gave a talk at MMV and spoke with our project champions Paul Willis, Jeremy Burrows and Tim Wells. They agreed that the key aims of our current project are 1) to carry out a core kernel of activity on the project and 2) to leverage help from others, with particular value arising from practical input, e.g. the synthesis of compounds. It really is my vision for a project of this kind to involve other labs around the world in a coordinated effort, where all data are open and publication of milestones is rapid. If you want to come on board, you can.

    That evening I flew to London for the weekend to see friends and family. It was bizarrely hot in London that weekend. Beautiful to see the new Shard going up rapidly, and I rediscovered the near-perfect Royal Oak pub in Borough round the corner from where we used to live.

    On Sunday I flew to Barcelona, to the EU Congress on Tropical Medicine. I gave a talk on open source drug discovery which generated a lot of interest. It was great to meet up again with Jose Gomez-Marquez from MIT who’s taking a very interesting approach to DIY diagnostics, and whom I first met at SciFoo.

    There were interesting talks at this meeting, but it had quite a broad focus, and there were a large number of policy sessions. I worry a little about the proliferation of groups, i.e. consortia, with names and acronyms, which meet and discuss and emit reports. The crucial feature of many of these organisations is that they are not open, i.e. they operate behind closed doors, and then broadcast. Frequently a funding agency and several universities/NGOs will get together to look into something, say the provision of new drugs for TB. But from my perspective there is little to be gained from my knowing that this organisation exists, because it’s unlikely I will ever be part of it, or influence its direction. Hence I don’t get very excited or involved, and I rather tune out, unfortunately. It did feel as though this meeting was particularly heavy on the “new consortia” announcements, none of which seem to need anything from anyone. I think if we’ve learned anything in the past few years it’s that it’s important to allow people to input into processes and projects at any stage, and to be detailed about the research being conducted, rather than just summarising it at the end. If I were to advise these groups, it would be to release early and release often, and not polish the outputs too much.

    There was a very interesting talk from Robert Jacobs from Scynexis about the development of a boron-containing drug for human African trypanosomiasis (sleeping sickness). I hadn’t heard of a drug containing boron before, and tweeting on this subject led me, via John Overington, to a post on this very subject by Derek Lowe. I would love to know the point, in a med chem campaign, when someone says “Hey, let’s try boron now”. It makes me think about the hits we’re looking at. Boron, anyone? Phosphorus? I experimented with Storify to put together the correspondence.

    On the Wednesday I flew to Milan, and then took a train to Modena. This beautiful town was host to the EU COST Action meeting on New Drugs for Neglected Diseases. I had time to check into the hotel and walk to the conference venue before the first session began, and I was due to speak, again on open source drug discovery.

    This meeting was more focussed than the Barcelona meeting, but again I had a lot of very supportive comments about the open nature of the research we were doing, and how openness removes many of the thorny problems of traditional research, such as duplication of effort, or roadblocks to progress because a team is not in touch with the external people needed at a given point. The question I received most, both in Modena and in Barcelona, was “What about publications? How do you publish something that’s already out there?” I was able to point to the two papers we’d just published the week before to say that this was not a problem, and that publication of open projects is extremely important for bringing people up to speed with where the project is at, as well as marking milestones.

    Dihydrofolate reductase and pteridine reductase PTR1 were mentioned on multiple occasions as targets of interest for rational drug design, with the latter being particularly cool for doing two reactions in one active site – wow. And it also looked interesting from the point of view of maybe being able to catalyze the asymmetric reduction of dihydroisoquinolines, which is a separate project we’re doing in my lab.

    There were a lot of nice talks at this meeting. My view of medicinal chemistry is now a little skewed, and I can’t listen to a talk from a closed group of students/academics about making a molecule against a certain target without thinking “Why aren’t you just sharing all your data and working with all the other groups looking at this area in real time, rather than slowly publishing and telling us about the choicest results at meetings?”

    The COST meeting was extremely pleasant socially, too. You can tell you’re at a European conference when the entertainment is an opera recital in an old church.

    And you know you’re in Modena when your pizza is drizzled in sticky vinegar that looks like crude oil. This one had lard as a topping.

    On Friday I returned to Milan on the train, then flew back to Sydney. Always terrible losing a whole day on the return journey. The whole trip was tremendous, for many reasons. I met a lot of new people who were interested in helping out with our open projects. I had time to think about what we’re doing, and to receive advice and well-informed questions about the approach. Mind duly broadened. Thank you Aus Academy of Science and Leopold Flohe, organiser of the COST meeting.

    There are so many things to do now on the open projects. Besides making new antimalarials, evaluating new catalysts and working to improve the electronic lab books we’re using, we need to recruit new labs to be part of the experimental effort and speed things up as much as we can. That’s our main emphasis in the coming months. We need students, undergrad lab directors, anyone interested in making compounds to join us and become part of a larger team.

    In February next year I’ll be talking at the AAAS meeting in Vancouver on what we’ve been doing. We’re then going to be hosting a one-day open source drug discovery meeting on malaria on February 24th, and it’s great that Saman Habib from the malaria OSDD project in India will be attending. The whole thing will be streamed, for those who can’t make it. I need to get on with organising this, and if anyone has any experience in streaming and archiving a conference in which there is meant to be worldwide participation, I’d be delighted for some pointers because at the moment it looks … challenging.

  • mattoddchem 9:07 pm on November 13, 2011

    Open Science Funding – Government Grants and Cash Incentives 

    We recently started an open source drug discovery project for malaria. Starting with sensational hit compounds from the GSK Tres Cantos dataset, we are trying to convert these hits into good leads by sharing all data and ideas. Every experiment is online. Anyone can take part. In fact one of the main things that’s needed on this project is for people to make compounds. If you’d like to make some, maybe as part of a summer project, or as part of an undergrad thesis (just like Laura, our undergrad, who did just that), or if you are a hotshot synthetic chemist with some time at the weekend, come on board. There are some important compounds that need to be made, and we can get them biologically evaluated and publish the results. We are at the moment directly supported by the Medicines for Malaria Venture, who are providing money and a very high level of intellectual and logistical leadership behind the scenes.

    There are two extremely cool things I want to share.

    One is that last week we found out we were funded on a larger scale by the Aussie government and MMV. This “Linkage” scheme of the Australian Research Council was the way we funded our first open science project with WHO/TDR back in 2008 (this project is still very much active, more of which another time). For the present grant, MMV chipped in cash, and the ARC amplified that up to a full 3-year grant that fully supports a postdoc in the lab to make compounds full time. Depending on resources from other places we may be able to increase that further. We need to – there’s so much to do. Regardless, we will be able to make compounds for 3 years to lead this open source malaria project. I’m blown away by how exciting this is. Open science, funded.

    So my two open science guys and I have been heading over to the pub to talk about the projects (we need to do this more, guys). This is an excellent excuse to drink beer, but we also need to address the central question – how to get people involved. How to leverage and encourage interest from others.

    There are two immediate things. The first is that we’re going to be running an Open Source Drug Discovery for Malaria meeting at Sydney Uni on February 24th. The head of the nascent OSDD Malaria project in India, Saman Habib, is coming. I’ll shortly be advertising this meeting more generally. We’re going to stream and archive it online for those who can’t make it. The aim is to work out how best to do open source drug discovery, plain and simple.

    The second thing is this: when I was writing the malaria grant I was contacted by an organisation I can’t name (they requested to remain anonymous at this early stage) who said they were interested in sponsoring a prize for open source drug discovery. The conversation went something like this:

    Me: “A prize is an interesting idea. That might help create incentives for participation. The reason I’d rejected it is the Gift Relationship – that if you start paying people for things, the quality might go down, because the incentive changes to one of more direct self-interest.”

    Organisation: “Maybe, but maybe not – we could give it a shot.”

    Me: “Sure – maybe we should trial it. We’re scientists, let’s experiment. There’s one problem though – no teams.”

    Organisation: “What?”

    Me: “There can be no teams. If you have teams then people will keep secrets, negating the whole point of open science. Innocentive has teams, meaning people don’t share. It’s just competition between closed groups, which is not open science. It’s an incentive, but it doesn’t change the way things are done. It’s Open Innovation, which is different from open science.”

    Organisation: “How about a prize for the community which is unlocked upon a milestone being reached?”

    Me: [Shocked at the quality of the idea] “I’m shocked at the quality of that idea. In drug discovery there ARE milestones – specific things that can be achieved and quantified. I’d have to ask the malaria and medchem community for what might be appropriate.”

    Organisation: “Sure – consult with them. If you get the grant we’ll try to commit some money. How much? $30K? A million? If the milestone is reached by a certain date, the money will be unlocked. Half could go to the community who played the most active role in the solution. Half could go to a charity treating malaria.”

    Me: “That’s also a good idea. Apportioning the prize money could be decided by the community themselves. We’d have to disqualify anyone who did not play by open science rules. Interesting. Let’s see if we get the grant.”

    Well, we got the grant. So I can now call upon this organisation to pledge the prize and see if they sign off. Accordingly, I need to work out:

    1) Whether a prize (a team-less prize) is a good idea, or whether we should avoid cash incentives altogether. I’m torn. Need advice from open science/crowdsourcing advocates.

    2) If we did have a prize, what kind of milestone should we set? We are starting with nanomolar compounds in a whole-cell assay. These are astonishing hits. What criteria for lead progression should we include in a milestone? Something that is achievable in 18 months. This is the technical medchem/malaria question for which we need advice.
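
    The team-less prize floated in the conversation above can be written down as a small rule, which may make it easier to argue about. This is only a sketch in Python: it assumes a 50/50 split between a charity and the community, and integer weights that the community itself has agreed on. The function name, the dates, the figures and the contributor names are all made up for illustration, and nothing here has been agreed with the sponsor.

```python
from datetime import date
from fractions import Fraction


def split_prize(total, reached_on, deadline, shares, disqualified=()):
    """Return payouts for a milestone prize, or None if it stays locked.

    total        -- prize money in whole dollars
    reached_on   -- date the milestone was met, or None if it was not met
    deadline     -- last date on which the milestone still counts
    shares       -- community-agreed integer weights per contributor
    disqualified -- contributors who did not play by open science rules
    """
    if reached_on is None or reached_on > deadline:
        return None  # milestone missed: nothing is unlocked

    charity = Fraction(total, 2)   # assumed: half goes to a malaria charity
    pot = total - charity          # the other half goes to the community

    eligible = {k: v for k, v in shares.items() if k not in disqualified}
    weight = sum(eligible.values())

    payouts = {"charity": charity}
    for name, w in eligible.items():
        payouts[name] = pot * Fraction(w, weight)
    return payouts


# Hypothetical example: a $30K prize, three labs, one disqualified.
result = split_prize(
    30000,
    reached_on=date(2013, 3, 1),
    deadline=date(2013, 5, 31),
    shares={"lab_a": 3, "lab_b": 1, "lab_c": 1},
    disqualified={"lab_c"},
)
print({name: float(amount) for name, amount in result.items()})
```

    The point of the sketch is that no payout depends on which closed group finished first: the only inputs are whether the milestone was reached in time and how the community chose to weight contributions.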

    • Cameron Neylon 9:38 pm on November 13, 2011

      This is brilliant! At some point I’d go even further and suggest open competitive milestone payments. Sure you can go with a closed team but if the open community is moving faster wouldn’t you want to be in on that? Anyway in the shorter term I agree that finding the right balance of target and amount is crucial for making this work effectively. But even just the idea is making my brain work around how I might be able to contribute…

    • Rajarshi Guha 7:56 am on November 14, 2011

      What about defining some sort of property profile (permeability, solubility, some form of toxicity etc)? This would depend on having access to these types of assays. Also, what strain of malaria are you looking at currently? Might it be useful to have a milestone that considers (some degree of) cross-strain activity? Have you run cytotox assays on the compounds that you guys have been synthesizing? (I think the original GSK cmpds had cytotox info on them (?))

      The idea of a prize is nice and not wanting individual teams makes sense, if the goal is open science. But having said that, it’s not clear that the prize itself would encourage participation in open science, given that individuals or groups won’t get the prize. It seems that the primary driver is scientific interest, credit (and the feeling that you’re helping others).

      • Cameron Neylon 7:09 pm on November 14, 2011

        @Rajarshi I don’t know. My response was one of, well if there is a prospect of money to keep the project going then that really got me interested. Maybe I might get some of it but it seems psychologically powerful in building a community…we’re all pushing together towards some key milestone and if we get there then the community can continue. I think it avoids some of the extrinsic/intrinsic motivation and gift relationship issues because the unit being funded is the community not a single person. At the same time the community would need to have a shared notion of how the prize would be used and I think there is probably a requirement for a benevolent and trusted dictator to run the process.

    • mattoddchem 1:44 pm on December 10, 2011

  • mattoddchem 10:14 pm on October 24, 2011

    Open Access Week 2011 

    Alex Holcombe has led the writing of a letter to an organization responsible for running a “Responsible Conduct of Research” course at some US institutions. A few of us found one part of this course odd, in that it appeared to suggest it was irresponsible to suggest blogs could play a role in science. For the full story, see Alex’s original post. The text of the letter is posted below.

    Alex also co-wrote and created the very amusing open access video imagining scientist-meets-publisher. “Your royalty share will be zero percent”.

    I’m giving a talk at the University of Sydney’s open access event, along with Alex, on Friday. This morning, Monday, at 4 a.m., I gave a talk on open source drug discovery at the (tremendous) Open Science Summit 2011 that took place in Mountain View, California – I was sitting in Sydney. Daniel Mietchen pointed out that this timezone feature probably makes me the first person to give a talk for Open Access Week 2011. Like the first fireworks of the New Year, only less impressive.

    Dear Professor Braunschweiger (CITI co-founder) and Professor Ed Prentice (CITI Executive Advisory Committee chair):

    We write to challenge the answer to one of the questions in the “Responsible Conduct of Research” online course. The question reads “A good alternative to the current peer review process would be web logs (BLOGS) where papers would be posted and reviewed by those who have an interest in the work”. The answer deemed correct by your system is “False” and the explanation provided includes the assertion that “It is likely that the peer review process will evolve to minimize bias and conflicts of interest”.

    We question these claims for two reasons. First, we see real examples of rigorous science happening outside of the traditional system of journal-based peer review. Second, we believe that the future path of scholarly communication is uncertain, and indicating to young researchers that such an important issue is closed is both inaccurate and unhelpful to informed debate.

    As an example of science that does not fit the mold suggested by the phrase “the current peer review process”, consider the use of the arXiv preprint server in certain areas of astronomy and physics. In these areas, researchers usually begin by posting their manuscripts to the arXiv server. They then receive comments from those who have an interest in the work. Some of those manuscripts subsequently are submitted to journals and undergo traditional peer review, but many working scientists stay abreast of their field chiefly by reading manuscripts in the arXiv before they are accepted by journals.

    Even in areas that are more tightly bound to traditional journals, there are recent examples where both effective peer review of science [1] and science itself [2] have occurred primarily via blogs and other online platforms. In these cases, the online activity appears to have resulted in more rapid progress than would have been possible through the traditional system. A growing body of research suggests that scholars use social media in ways that reflect and produce serious scholarship [3][4][5].

    As for the future path of the current mainstream peer review model, we believe it is speculation to say that “It is likely that the peer review process will evolve to minimize bias and conflicts of interest”. The current peer review process may be under considerable strain [6] and unfortunately there is little evidence that it significantly improves the quality of manuscripts [7]. This raises the possibility that big changes are required, not just modifications to reduce bias and conflicts of interest. Furthermore, the question presupposes that the future entity into which peer review will evolve does not involve blogging. No one can see the future clearly enough to make that assumption.

    We encourage discussion of this important topic, and would be interested in the inclusion in your program of material that sparks such discussion. However, we believe a true/false question on this topic to be inappropriate, as it limits rather than promotes discussion. All of us wish to see the development and optimization of rigorous systems, both new and traditional, for scientific scholarship. Requiring young researchers to adopt a particular position on this controversial, multifaceted issue may hinder open discussion and future progress.


    Bradley Voytek, PhD, University of California, San Francisco Department of Neurology
    Jason Snyder, PhD, National Institutes of Health, USA
    Alex O. Holcombe, PhD, School of Psychology, University of Sydney, Australia
    William G. Gunn, PhD, Mendeley, USA/UK
    Matthew Todd, PhD, School of Chemistry, University of Sydney, Australia
    Daniel Mietchen, PhD, Open Knowledge Foundation Germany
    Jason Priem, School of Library and Information Science, University of North Carolina at Chapel Hill
    Heather Piwowar, PhD, DataONE/NESCent, Canada
    Todd Vision, PhD, Department of Biology, University of North Carolina at Chapel Hill
    Cameron Neylon, PhD, Science and Technology Facilities Council, UK, Editor in Chief, Open Research Computation

    [1] Online experimental peer review of the “Arsenic Life” paper that recently appeared in Science:
    [2] Open Science is a Research Accelerator, M. Woelfle, P. Olliaro and M. H. Todd, Nature Chemistry 2011, 3, 745-748.
    [3] Groth, P., & Gurney, T. (2010). Studying Scientific Discourse on the Web using Bibliometrics: A Chemistry Blogging Case Study. Presented at the WebSci10: Extending the Frontiers of Society On-Line, Raleigh, NC: US. Retrieved from
    [4] Priem, J., & Costello, K. L. (2010). How and why scholars cite on Twitter. Proceedings of the 73rd ASIS&T Annual Meeting. Presented at the American Society for Information Science & Technology Annual Meeting, Pittsburgh PA, USA. doi:10.1002/meet.14504701201
    [5] Weller, K., Dröge, E., & Puschmann, C. (2011). Citation Analysis in Twitter. Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences. Proceedings of Making Sense of Microposts Workshop (#MSM2011). Co-located with Extended Semantic Web Conference, Crete, Greece.
    [6] Smith R. Classical peer review: an empty gun. Breast Cancer Research 2010, 12(Suppl 4):S13
    [7] Jefferson T, Rudin M, Brodney Folse S, Davidoff F. Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database of Systematic Reviews 2007, 2:MR000016.

  • mattoddchem 9:14 pm on September 18, 2011

    The Broader Chemical Community’s View of Uploading Data 

    Opening up your research to the world means you a) benefit from the opinions and knowledge of The Many as you’re doing the research (rather than months afterwards), and b) have to get your research into shape because The Many can cast a critical eye on what you’re doing in a never-ending process of peer review. Science benefits from these things.

    Sharing data is the central part of open science. A necessary, not a sufficient, condition, but central nonetheless. One cannot be selective about which data to share, because that would mean making a value judgement about what’s important. And what’s unimportant today may be important tomorrow. So let’s just share data.

    Outside of open science (and our community of zealots) we should also be encouraging people to share data as part of traditional research publications. Many of us do, as PDFs of NMR spectra, for example. This common practice is very useful for the refereeing process, to determine whether the science is valid. Sharing PDFs is less useful for science because the data in a PDF are dead. Live data can be played with, PDFs can’t. Puppy vs. roadkill. Cow vs. hamburger. We should be submitting raw data to journals along with our traditional reviewer-friendly supporting information. And we should be asking journals to keep the data outside the paywall.

    I recently asked a question about how we should share chemical data – i.e. what data formats would be best. There is an IUPAC standard which we’ve not particularly enjoyed, and we’ve been thinking about just sharing data in as raw a state as possible. Other people picked up on this and provided very useful comments and suggestions here, here and here, as well as in comments to the original post. Thanks guys.

    There’s no consensus, though the IUPAC standard does have its fans, and (I didn’t realise) is a data format that can be used for other spectroscopic techniques rather than simply NMR. I won’t pretend to understand how that’s possible, but it’s interesting.
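
    One way to see how a single format can cover several techniques, assuming the IUPAC standard in question is JCAMP-DX (the discussion above does not name it): a JCAMP-DX file is essentially a list of labelled records of the form `##LABEL= value`, and the technique is just one of those records (`##DATA TYPE=`). The Python sketch below reads only those header records. The two sample files are invented for illustration, the function name `read_jcamp_header` is hypothetical, and a real reader would also have to decode the data table, which this does not attempt.

```python
def read_jcamp_header(text):
    """Collect the labelled records (##LABEL= value) of a JCAMP-DX style file."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and blank lines are ignored in this sketch
        label, value = line[2:].split("=", 1)
        # Compare labels loosely: drop case, spaces and punctuation,
        # so "DATA TYPE" and "Data-Type" give the same key.
        key = "".join(ch for ch in label.upper() if ch.isalnum())
        if key == "END":
            break
        header[key] = value.strip()
    return header


# Invented sample headers, for illustration only.
SAMPLE_NMR = """##TITLE= example compound, proton spectrum
##JCAMP-DX= 5.01
##DATA TYPE= NMR SPECTRUM
##XUNITS= HZ
##YUNITS= ARBITRARY UNITS
##END=
"""

SAMPLE_IR = """##TITLE= example compound, thin film
##JCAMP-DX= 4.24
##DATA TYPE= INFRARED SPECTRUM
##XUNITS= 1/CM
##YUNITS= TRANSMITTANCE
##END=
"""

for sample in (SAMPLE_NMR, SAMPLE_IR):
    info = read_jcamp_header(sample)
    print(info["DATATYPE"], "| x axis in", info["XUNITS"])
```

    The same reader handles both samples because the technique and the axis units are declared in the file rather than being built into the format.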

    We’ll keep thinking about this. For our current ELNs we’ll continue to post data and see how we go over time.

    However, I think we need to address a background question: how will any solution scale to the broader chemical community? I’m not talking about the technical issue of file format, or what to share. I’m talking about psychology.

    My theory is: any solution to data sharing that relies on chemists uploading their data to a central point, or in a prescribed way, will not scale.

    Solution: we need to be building tools that can find chemical data on the web, extract the data and index them, i.e. a solution that involves as little electronic work as possible for the experimentalist.

    I think that this is probably very hard, but can’t judge. I don’t know how you get a bot to understand that there is chemical content on a web page, and extract it automatically. I don’t know how you can trust the results. I don’t know what happens when the source web page dies. But I know science needs tools of this kind, and that this is what we’ll be doing in 20 years’ time.
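
    To make the “find and index” half of this idea concrete, here is a minimal sketch in Python using only the standard library. It assumes that data files can be recognised by their file extension, which is a big assumption. The extension list, the helper names and the example page are all hypothetical, and the sketch says nothing about trusting the results or about what happens when a page dies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical list of file extensions that hint at raw chemical data.
DATA_EXTENSIONS = (".jdx", ".dx", ".cif", ".mol", ".sdf")


class DataLinkFinder(HTMLParser):
    """Collect links on a page whose target looks like a chemical data file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(url).path.lower().endswith(DATA_EXTENSIONS):
            self.found.append(url)


def find_data_links(html, base_url):
    finder = DataLinkFinder(base_url)
    finder.feed(html)
    return finder.found


def build_index(pages):
    """Map each data-file URL to the list of pages that link to it."""
    index = {}
    for page_url, html in pages.items():
        for data_url in find_data_links(html, page_url):
            index.setdefault(data_url, []).append(page_url)
    return index


# Invented page standing in for a lab-book entry; a real crawler would fetch it.
EXAMPLE_PAGES = {
    "http://example.org/labbook/entry1": """
        <p>Crude 1H NMR: <a href="data/entry1_1H.jdx">raw data</a>,
        <a href="entry1.pdf">PDF</a></p>
    """,
}

print(build_index(EXAMPLE_PAGES))
```

    The experimentalist’s only job here is to post a link to the raw file somewhere public. Everything else, including skipping the PDF, is done by the indexer.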

    Analogy: Google. Imagine if Google had said “Once you’ve created a web page, just send us the details and we’ll put it in our index.”

    If we, for a moment, look outside our Band of Open Source Brothers, we see a vast community of talented researchers in chemistry who spend their time making molecules in the lab. To date this community either does not see or does not agree with the advantages of doing science openly, or has no need/wish to engage with the issues, or does not see the advantage of sharing data in traditional publications in a way other than PDFs. I see those advantages, and many people I talk to see the advantages, but the vast majority of chemists do not, yet, for whatever reason. Why, then, would a chemist, who is already busy with work, life, family, thesis writing and everything else, sit down and start uploading data to the web? Remember that our chemist, representing 95% of chemists out there, does not agree that doing so is worthwhile (or is not allowed to do so). There is no incentive. For the incentive to take hold requires the world to change, and that’s going to take some time. It’s also the case that the community is not used to it. We’re used to publishing papers, then having the data appear, as if by magic, in SciFinder or Beilstein or whatever. So we have no problem providing the paper and the data, but we expect others to make it searchable.

    Now, I’m a serious fan of Chemspider, and I’ve just come across Figshare. Excellent services. They’re pioneering. They must succeed, and I think to succeed there needs to be a shift from “hoping for user upload” to “bloodthirsty, active data extraction from disparate sites”, however difficult that might be. Anthony, Mark – I’d like to know your thoughts and what I can do to help. I’m whining because I want your work to flourish.

    People I speak to then say sentences that begin “But all you have to do is…” and “But it’s easy – you just…” – no. It’s no good. Expecting chemists to upload their data to a specific place will not scale. If there’s an activation energy barrier for me, there’s an orbital-forbidden transition state for most people.

    Rather, data need to be posted openly somewhere online:

    a) To a lab book if you’re an open scientist

    b) To an institutional repository if you’ve just finished a thesis, or generally want to share

    c) To supporting information files, if you’re the author of a paper in a journal

    whatever is easiest and convenient locally. i.e. there can be a bunch of different solutions.

    We can rely on this happening, because this is easy, and related to what chemists are doing right now. We can say to chemists: “Hey, do the research, post data. Wherever you want – either on your own webpage, or provide the data when you submit publications and ensure that the data are not behind a paywall. Here are some guidelines on file formats, but really just post the data. We’ll find the data. We’ll tag them so that other people can find them, and then you’ll see how great it is that you shared the data.”

    If there is a way of doing this, or finding data people post wherever, and automatically making sense of it, we’ll start seeing some big changes to how things are done. People will start to see the benefits of openness in itself, and we’ll start to move towards an astonishing change – chemists collaborating in real time by finding other people who are working on their molecule/reaction right now.
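
    One very small piece of the "find data people post wherever" idea can be sketched in code. The example below is only an illustration under stated assumptions: it assumes a page has already been fetched as text, and that the author included InChI identifiers in that text. The regular expression, the function name `find_inchis` and the sample page are all hypothetical, and a real crawler would need much more (validation of the identifier, handling of trailing punctuation, linking each identifier to its data files).

```python
import re

# Assumption: identifiers appear in the page as plain text beginning "InChI=1/" or "InChI=1S/".
# The match stops at whitespace, quotes or angle brackets, so surrounding HTML is not captured.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s\"'<>]+")


def find_inchis(page_text):
    """Return the unique InChI strings found in page_text, in order of appearance."""
    found = []
    for match in INCHI_PATTERN.findall(page_text):
        if match not in found:
            found.append(match)
    return found


# Hypothetical lab-notebook page containing two identifiers (ethanol and methanol).
sample_page = """
<p>Product characterised by 1H NMR. InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3</p>
<p>Solvent: InChI=1S/CH4O/c1-2/h2H,1H3 (methanol)</p>
"""

for inchi in find_inchis(sample_page):
    print(inchi)
```

    Spotting an identifier is the easy part; the questions raised earlier in this post about trusting the results, and about what happens when the source page dies, are untouched by a sketch like this.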

    • Mark Hahnel 7:11 pm on September 19, 2011 Permalink | Reply

      Thanks for asking Matt. I agree to an extent, but our opinions do differ on some things. I agree that the easiest way to make data re-use immediate is “bloodthirsty, active data extraction from disparate sites”. I also believe there is a role, which will grow, for crowdsourcing researcher data. The key here is the carrot/stick analogy. Wheels are in motion and there is more discussion happening in select fields of research with funders with regards to the stick. Researchers have a moral and ethical obligation to make all of their research data available if funded by public money. This obviously isn’t enough right now; maybe mandates from funders will provoke some form of response.

      As a former researcher, my personal viewpoint is that researchers need to see the obvious benefits to their career, or the process needs to be so stupidly simple that it trumps their current data management plan. Here at FigShare, we are trying to do the bits that we can in a multi-pronged attack. We are developing away with the aim of making research data sharing fast and simple. If researchers need to be trained how to use your software, the uptake is likely to be low. We are attempting to have conversations with the funders and institutions about how they can do their bit. Any funders or institutions, please get in touch. Finally, we are doing as you suggest and pulling the research objects out of Open Access publications, making the figures, datasets and videos available as individual citable, sharable and easily searchable research objects. By doing this and linking back to the original papers, we are making researchers’ previously published research more discoverable. By adding value to the data in this manner, we can provide a service for non-OA publishers too. We are yet to start these conversations, but it would be a good way to start linking the data and to show the direct benefits of researchers uploading their data directly. Any other suggestions and feedback are always welcome. For me the most interesting part is the carrot. What incentives do researchers need before they decide to do this themselves?

    • mattoddchem 8:47 pm on September 19, 2011 Permalink | Reply

      “Stupidly simple” is right. As I was writing this post it occurred to me that we need a button on a browser that says “Share these data” in the way that I can add a paper to Mendeley and it knows what I’m trying to do (most of the time). So I write up an NMR spectrum, and I post the raw data to that web page, as well as an InChI. I then say “OK, share this” and the data are extracted. Sounds simple. Probably horrendously difficult.

      As to showing people what the benefits are: agreed. Let’s lead by example.

  • mattoddchem 10:08 pm on August 7, 2011 Permalink | Reply  

    Raw Data in Organic Chemistry Papers/Open Science 

    Open science is a way of conducting science where anyone can participate and all ideas and data are freely available. It’s a sensational idea for speeding up research. We’re starting to see big projects in several fields around the world, showing the value of opening up the scientific process. We’re doing it, and are on the verge of starting up something in open source drug discovery. The process brings up an important question.

    I’m an organic chemist. If I want people to get involved and share data in my field I have to think about how to best share those data. I’m on the board of more than one chemistry journal that is thinking about this right now, in terms of whether to allow/encourage authors to deposit data with their papers. Rather than my formulating recommendations for how we should share chemical data, I wanted to throw the issue open, since there are some excellent chemistry bloggers out there in my field who may already have well-founded opinions in this area. Yes, I’m talking about you.

    The standard practice in many good organic chemistry journals is not to share raw data, but typically to ask for PDF versions of important spectra, usually for novel compounds. These naturally serve as a useful tool for the peer-review process, in that a reviewer can easily see whether a compound has been made, and say something of its purity. Such reproductions are not ironclad guarantees that a compound has actually been synthesised, nor that it was the reported process that actually gave rise to that sample. Nonetheless, it’s useful to the reviewer.

    Are PDF reproductions useful to science? Well, not really. Peter Murray-Rust talks about PDFs as being “hamburgers”. I think I understand what he means: PDF data are dead – actually very dead, and the cow would be more interesting. You can’t DO anything with a pdf. You can’t take the data and do anything with them. Nobody can re-analyse the spectrum, or zoom in. The spectrum can’t be understood by a machine with any accuracy. Data are lost in conversion.

    With raw data, you allow other people to check the data. You also allow them to re-analyze. You allow computers to take the data and do interesting things. If all data were raw, you could ask the interweb, for example, “Find me examples of compounds containing an AB quartet with a coupling constant above 18 Hz. And the molecule needs to contain nitrogen. And synthesized since 1987. And have a melting point.” Maybe that question’s important, maybe not. But with raw data you can at least ask questions of the data.
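
    To make that example question concrete, here is a minimal sketch of how it might look as a filter, assuming the raw data had already been turned into structured records. Everything here is hypothetical: the `SpectrumRecord` fields, the `matches` function and the toy entries are invented for illustration and do not describe any existing database or real compounds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SpectrumRecord:
    """Hypothetical index entry derived from deposited raw data."""
    label: str
    elements: Set[str]
    year_synthesized: int
    melting_point_c: Optional[float] = None
    ab_quartet_j_hz: List[float] = field(default_factory=list)


def matches(record: SpectrumRecord) -> bool:
    """AB quartet with J above 18 Hz, contains nitrogen, made since 1987, has a melting point."""
    return (
        any(j > 18.0 for j in record.ab_quartet_j_hz)
        and "N" in record.elements
        and record.year_synthesized >= 1987
        and record.melting_point_c is not None
    )


# Invented toy entries, for illustration only (not real compounds or measurements).
records = [
    SpectrumRecord("entry-1", {"C", "H", "N", "O"}, 1995, 112.0, [18.4]),
    SpectrumRecord("entry-2", {"C", "H", "O"}, 2001, 85.0, [19.1]),   # no nitrogen
    SpectrumRecord("entry-3", {"C", "H", "N"}, 1979, 140.0, [18.9]),  # made before 1987
    SpectrumRecord("entry-4", {"C", "H", "N"}, 2005, None, [18.6]),   # no melting point
    SpectrumRecord("entry-5", {"C", "H", "N"}, 2010, 97.5, [12.3]),   # coupling too small
]

hits = [r.label for r in records if matches(r)]
print(hits)
```

    The filter itself is trivial. The hard part, which this sketch assumes away, is getting from raw spectra to fields like `ab_quartet_j_hz` in the first place, and that is only possible if the raw data are available.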

    What are the downsides of posting raw data in organic chemistry, either in papers or in lab book posts?

    1) You have to save the data and then upload them. Well, this was a problem in 1995, but not now.

    2) The data files are large. Not really. A 1H NMR spectrum is ca. 200KB.

    3) It’s a pain. Yes, a little. But we must suffer for things we love.

    4) People might find mistakes in my spectra/assignments. Yes. You’re a scientist. This is a Good Thing.

    An important fact: For many papers, supporting information is actually public domain, not behind a paywall along with the rest of the paper. The ACS, for example, would, by posting raw data as SI, allow the free exchange of raw spectroscopic data. That would be neat.

    I wouldn’t advocate stopping PDF reproductions, necessarily, since these are still useful for review, and for the casual reader. We’re likely to keep using PDF for our electronic lab notebooks, but the data need to be there too. Like ortep and cif – picture and data.

    If we can establish that we should be posting raw data, then what kinds of data should we share, and how? This post is meant to outline an answer, and ask for feedback from anyone who’s already thought about this.

    1) X-ray crystallography. This is the exception. Data are routinely deposited raw, and may be downloaded. Not always the case, but XRD blazes a trail here.

    2) NMR spectroscopy. The big one. IUPAC recommends the JCAMP-DX file format. Jean-Claude Bradley has been a proponent of this format, and has demonstrated how it can be used in all kinds of applications. We’ve played with it, and in one of our recent papers we deposited all the NMR data in this format in the SI. We’ve been posting JCAMP-DX files in our online electronic lab notebooks, e.g. here. My experience of this file format (both generating it, and reading it) has not been great. There are two formats, I understand, and we found that if we saved the data in the wrong format, we couldn’t read the data with certain programs, but could with others, i.e. we had to get the generation of the file just right. That kind of trickiness, though small, just inevitably means people won’t bother to generate or use the files on a mass scale (unless the journals decide to back it). PDF’s popularity is based on the ubiquity of the reader. JCAMP-DX works well with JSpecView, a free, open source NMR data reader. We’ve not enjoyed our experiences with this, either, though it’s a wonderful endeavour. This led us to look at whether there was a need for saving the data in a particular format, or whether we could just save the raw data, and process those data with a free piece of software. After looking at this with our resident NMR guru, Ian Luck, we found that saving raw data is easy (it’s just a copy and paste of what’s produced by the machine) and that the raw data can be read by free software such as Spinworks or ACDLabs, obviously in addition to our in-house software. This seems ideal. Does anyone know why IUPAC prefers a derived data format over the raw data, other than that JCAMP-DX is a single file? Aren’t raw data likely to be the most generically useful long-term?

    I don’t know if people have experience of this. I was in touch with one of the ACS journals recently, who indicated that their view was that the journal is not a data repository, and that posting of raw data (which was in their view to some extent desirable) should be posted elsewhere, e.g. to an institutional repository. This is an option. I think it’s less convenient. PLoS seem happy to host the data.

    3) IR data. Don’t know if there is a standard. If the file is small, saving raw data could be encouraged. Would allow easy comparisons of fingerprint regions.

    4) Mass spectrometry. It’s not clear to me there is a huge advantage here to sharing raw data, for a typical low res experiment?

    5) HPLC data. Again, the outputs are fairly simple, and I’m not clear about the advantage of raw data (which I’m assuming would be absorbance vs. time table). Would (perhaps) permit verification that traces have not been cropped to remove pesky impurities.

    6) Anything else?
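
    As a footnote to point 2, here is a rough sketch of what reading a JCAMP-DX file involves, to show why the exact form of the file matters. It is a minimal illustration and not a full reader: it assumes the data block is `##XYDATA=(X++(Y..Y))` with plain whitespace-separated numbers, and it does not handle the compressed encodings that the standard also allows. The function name `parse_jcamp` and the toy file contents are invented for this example.

```python
def parse_jcamp(text):
    """Parse labelled data records and uncompressed XYDATA from JCAMP-DX text.

    Returns (header, xs, ys). Assumes whitespace-separated numbers in the data block.
    """
    header = {}
    raw_ys = []
    in_xydata = False
    for raw_line in text.splitlines():
        line = raw_line.split("$$")[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            header[label] = value.strip()
            in_xydata = (label == "XYDATA")
            continue
        if in_xydata:
            numbers = [float(token) for token in line.split()]
            raw_ys.extend(numbers[1:])  # first number on each line is an X check value

    y_factor = float(header.get("YFACTOR", "1"))
    ys = [y * y_factor for y in raw_ys]

    # X values are evenly spaced between FIRSTX and LASTX.
    first_x = float(header.get("FIRSTX", "0"))
    last_x = float(header.get("LASTX", "0"))
    n = len(ys)
    step = (last_x - first_x) / (n - 1) if n > 1 else 0.0
    xs = [first_x + i * step for i in range(n)]
    return header, xs, ys


# Invented toy file, for illustration only.
toy_file = """##TITLE=toy spectrum
##JCAMP-DX=4.24
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##FIRSTX=0
##LASTX=5
##YFACTOR=0.5
##NPOINTS=6
##XYDATA=(X++(Y..Y))
0 2 4 6
3 8 10 12
##END=
"""

header, xs, ys = parse_jcamp(toy_file)
print(header["TITLE"], xs, ys)
```

    A file written with one of the compressed encodings would make the `float(token)` step above fail, which is the kind of format mismatch that can make a file readable in one program and not in another.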

    • Jean-Claude Bradley 11:52 pm on August 7, 2011 Permalink | Reply

      Mat – you can share JCAMP-DX spectra without asking people to download software. Just upload the file to any open server and append the url from service #4 here:

      It uses the non-Java ChemDoodle components so should work on Mac, many smartphones, etc. In your case I believe the issue was spaces in the filename – if you remove those it should work fine – let me know. Click on this link to see what it should look like:

      As for other forms of spectral data you can do pretty much all of them using JCAMP-DX, as shown in our SpectralGame options (C NMR, IR, UV)

      MS can be done too.
      Another advantage of having the NMR in JCAMP-DX is that you can call web services to automatically integrate within a Google Spreadsheet, for calculating solubility for example: See link #3

    • Peter Murray-Rust 12:19 am on August 8, 2011 Permalink | Reply

      Mat, great post – answering various points:

      >>>Open science is a way of conducting science where anyone can participate and all ideas and data are freely available. It’s a sensational idea for speeding up research. We’re starting to see big projects in several fields around the world, showing the value of opening up the scientific process. We’re doing it, and are on the verge of starting up something in open source drug discovery. The process brings up an important question.

      I am excited about the OSDD effort(s) and think there is a lot of Open technology they can use.

      >>>I’m an organic chemist. If I want people to get involved and share data in my field I have to think about how to best share those data. I’m on the board of more than one chemistry journal that is thinking about this right now, in terms of whether to allow/encourage authors to deposit data with their papers.

      Many already do “require” PDFs. There is no agreed way of doing it, but if what you mean is depositing JCAMPs then YES. The OS community can hack any variants

      >>>1) You have to save the data and then upload them. Well, this was a problem in 1995, but not now.

      agreed – trivial in time and size of files

      >>>2) The data files are large. Not really. A 1H NMR spectrum is ca. 200KB.

      >>> 3) It’s a pain. Yes, a little. But we must suffer for things we love.

      see below

      >>>4) People might find mistakes in my spectra/assignments. Yes. You’re a scientist. This is a Good Thing.

      Yes – and some bad chemistry has been detected and corrected

      >>>An important fact: For many papers, supporting information is actually public domain, not behind a paywall along with the rest of the paper. The ACS, for example, would, by posting raw data as SI, allow the free exchange of raw spectroscopic data. That would be neat.

      The ACS requires CIFs and I congratulate them. If they could just extend that to JCAMPs and computational logfiles that would almost solve everything

      >>>1) X-ray crystallography. This is the exception. Data are routinely deposited raw, and may be downloaded. Not always the case, but XRD blazes a trail here.

      True for all OA journals (but not much crystallography here except IUCr ActaE), RSC, IUCr, ACS require CIFs (Applause). Wiley, Springer, Elsevier do not publish this supplemental data. Only available from CCDC and then not in bulk without subscription.

      >>>2) NMR spectroscopy. The big one. IUPAC recommends the JCAMP-DX file format. Jean-Claude Bradley has been a proponent of this format, and has demonstrated how it can be used in all kinds of applications. We’ve played with it, and in one of our recent papers we deposited all the NMR data in this format in the SI. We’ve been posting JCAMP-DX files in our online electronic lab notebooks, e.g. here. My opinion of this file format (both generating it, and reading it) has not been great. There are two formats, I understand, and we found that if we saved the data in the wrong format, we couldn’t read the data with certain programs, but could with others. i.e. we had to get the generation of the file just right.

      Don’t fully understand this. There are actually several formats but the OpenSource software reads all of them. CML-Spect supports these and is readable by JSpecview. This need not be a problem if people have the will to solve it.

      >>>I don’t know if people have experience of this. I was in touch with one of the ACS journals recently, who indicated that their view was that the journal is not a data repository, and that posting of raw data (which was in their view to some extent desirable) should be posted elsewhere, e.g. to an institutional repository. This is an option. I think it’s less convenient. PLoS seem happy to host the data.

      I have an idea, which I think will fly.

      >>>3) IR data. Don’t know if there is a standard. If the file is small, saving raw data could be encouraged. Would allow easy comparisons of fingerprint regions.

      JCAMP will hack this

      >>>4) Mass spectrometry. It’s not clear to me there is a huge advantage here to sharing raw data, for a typical low res experiment?

      JCAMP will do this for “1-D” spectra (e.g. not involving GC or multiple steps)

      >>>5) HPLC data. Again, the outputs are fairly simple, and I’m not clear about the advantage of raw data (which I’m assuming would be absorbance vs. time table). Would (perhaps) permit verification that traces have not been cropped to remove pesky impurities.

      Again it wouldn’t take much to solve this

      >>>6) Anything else?

      I think we should use FigShare and I’ll explain why in my blog in a day or so

    • Rifleman_82 2:27 am on August 8, 2011 Permalink | Reply

      I’ve recently encountered the problem you mentioned with .jdx files when I tried to upload some spectra to ChemSpider. It’s a shame that the journals are not interested in becoming data repositories of experimental data. Perhaps not “Open Notebook”, but uploading spectra of known compounds to ChemSpider is helpful for other workers. A way to check if whatever you made is authentic, for example. I’m not sure how hard Tony Williams looks at the data. For what it’s worth, he’s an NMR specialist. It’ll be nice if they can have a front end which allows it to act like an open source SciFinder/Reaxys/SDBS.

    • Alex 6:47 pm on August 8, 2011 Permalink | Reply

      >It’s a pain. Yes, a little. But we must suffer for things we love.
      Now I know what I will say to my girlfriend/workers/friends.

    • Richard Kidd 9:19 pm on August 9, 2011 Permalink | Reply

      Hi Matt

      The RSC are more than happy to get the raw data alongside papers and host with the (Open) ESI, with a couple of provisos –

      1. We’d start having difficulties if the files got too big – which I think is where DataCite comes in – but no problem for jcamps, excel files etc
      2. For peer review purposes we do need pdf versions of the table/spectra – not necessarily ideal, and building in the viewers for the data file isn’t impossible – but ease of review is important

      And also – following Rifleman_82’s post – anyone can load up their jcamp spectra against a compound (or add a new compound then attach the spectra) on the RSC’s ChemSpider, and mark it as Open Data.

      Am happy to follow up with

    • Antony Williams, ChemConnector 10:24 pm on August 15, 2011 Permalink | Reply

      Mat, Great post…similar questions are being asked by many people already. I have responded to your comments here. I think overall, for the problem you are out to solve, RSC ChemSpider is already most of the way there, certainly in terms of the majority of the data you are discussing. We support spectral data and CIFs already. We could manage the raw data files directly (meaning binary file vendor formats as acquired…FIDs for example) but I don’t think most people would care. They would want the processed NMR spectra. But, of course, spectra are better than PDF files. I’d love to get your data collection from the PLoS article to host on ChemSpider. At present I have to download them one at a time, draw the structure and upload one at a time but we can do it in batch if you want to provide the batch of files to us. We’ve done it for hundreds of pairs of spectra and structures before now. Thanks

  • mattoddchem 11:30 pm on June 27, 2011 Permalink | Reply  

    I’m a Scientist Get Me Out of Here! 

    A week or so ago I was a contestant on the inaugural Australian version of I’m a Scientist Get Me Out of Here! Scientists were gathered in an online area – 5 in each zone – and peppered with science questions by school kids. The questions could be on anything, and came in directly via the website, or during frantic real-time chat sessions where we were really interacting with the kids. The event was recently piloted in the UK, but this was the first time it was run elsewhere.

    Naturally kids have access to people with science backgrounds – their teachers, first and foremost. They can also read stuff and watch stuff on TV and read things online. But this competition gives them a chance to interact directly with practicing scientists, and that doesn’t normally happen.

    After a week or so of asking questions the students start voting for which scientist they’d like to stay, and the one with the fewest votes is evicted. One eviction per day until the winner is declared and awarded $1000 to help fund a science outreach activity. The evictions were pretty brutal. It was interesting that many of us spent a lot of the week describing good evolutionary arguments for various things, but when you’re actually part of a survival of the fittest exercise, suddenly it’s not so great. I lasted till the final two in the Hydrogen Zone, and was pipped to the post by Aimee Parker, a Monash Honours student and budding science communicator with a flair for explaining science of all kinds. Congratulations Aimee, well deserved.

    So how was it? It was a total blast. Any scientists reading this – sign up to get involved next time round.

    The questions would sometimes come in late at night – one batch was released around 11 pm, and I found myself typing away for a couple of hours like some sci-junkie. The addictiveness comes partly from the fact that it’s a competition, sure, and you want to get your answer in first. But it comes much more from the kinds of questions you get (which cover a broad range of things) and from the feeling that the kids actually want to know the answers. You can also wax lyrical about what science is and what you do in your work. Then you can switch to talking about relativity and GM foods.

    Some highlights included an excellent question on what are the 5 commonest molecules (see how the ambiguity necessitates a long answer), a question on predicting when we’ll be eating synthetic meat, various questions about lightning, a truly awesome question about what happens if you’re in a car going at the speed of light and you switch on the headlights (needed several goes at that one) and another priceless one that generated a lot of analysis about what it’s like at the centre of the Earth if you dug a hole there.

    There were also lots of solid, sensible questions that it felt good to answer. A lot of questions were Googlable/Wikipediable, but were still asked, which may say something about children’s healthy skepticism of answers on the web, or their over-faith in the authority of scientists to talk on any subject. Interestingly there were quite a few questions (both on the website and in the furious chat sessions) about a) whether the world would end in 2012, and b) evolution/big bang/origin of life. On the first of those, it was interesting that the kids were all asking about the supposed 2012 apocalypse, but that hardly any of them believed it. So the idiotic meme was successful but did not stand up to much thought. On the other hand the questions about evolution and related things indicated that a lot of kids were wrestling with the contrast between religion and science and it wasn’t clear in many cases which way they were going. The questions were often phrased with a hint of disbelief that things could just have “arisen” or “happened” which perhaps suggested that the kids weren’t so happy with all the uncertainties of the current state of scientific origin theories. I was at pains to point out that uncertainty is good, because it makes us ask questions, and that science is about probabilities rather than absolutes. But it’s still a challenge, and it was great to be able to lay those challenges out.

    Thanks to the wonderful team behind the event (Kristin, James, Sarah) for making everything work so well and adeptly fielding the hilarious curveballs that would crop up in the chats, and thank you to all the school children who asked stuff (particularly the ones who voted for me – I love you guys…) The kids made it such a cool event by virtue of their most awesome weapon – curiosity. All power to them.

  • mattoddchem 10:55 pm on March 24, 2011 Permalink | Reply  

    Open Science Student Projects 

    We’re launching a new kind of student project in synthetic organic chemistry. The idea is this: any student anywhere in the world can join in, provided all data are posted openly online. We aim to publish the research with all participants.

    What we’re looking for are students who can actually carry out practical experiments in a lab and upload data. This project is ideally suited to being run as part of a formal undergraduate laboratory course, but any student can join in. We’ve just started this at The University of Sydney this year – one undergraduate, Clara, is working on the project, and she will be joined by others later in semester. Our first partner to sign up is Stanford University, where lab director Charlie Cox has run the project as an option in third year undergrad lab. A few people have contacted me informally about running the project at their own universities, and we’re going to try to secure some money to help run the project in some universities in Africa.

    This post opens up the project to the rest of the world. If you’re reading this, and would like to join in, then yes, you can.

    The project concerns the optimization of our resolution of praziquantel. Though the route is easy to perform, and efficient, we’re looking for ways to improve the route still further to bring the cost down.

    There are a number of things we can look at. We will need to set up some online forums where the project can be discussed. If you have any questions now, you can post them below, or on the Friendfeed room, or get an account on Labtrove (our open source ELN) and post things here, or tweet me, or comment on The Synaptic Leap. There’s also email, which I’m trying to discourage, but please use this if you don’t want to discuss possible involvement in the project in the open.

    The relevant online lab notebooks to which you’d contribute if you took part are here.

    Any student contributing data can then take part in writing the resulting research paper, which is here. Once you’ve contributed an experiment, add your name to the paper, and start making changes to the manuscript. For undergraduates this is exciting because they can take part in real research, generating new data, rather than repeating experiments with known outcomes. This way we also aim to generate a real research publication – very useful for students interested in a career in research.

    Some very important points:

    1. All data generated by students are to be deposited openly on the web. Please don’t take part and not share all data – no point in doing that. Use the ELN like a real lab book – don’t leave things out.

    2. We’ll publish when we’ve reached a significant milestone. What that is depends on what people do, so we can decide this later.

    3. Students who contribute experimental data can be authors and can edit the paper.

    4. All reagents ought to be inexpensive and generally available – this is kind of the point. The starting material itself, praziquantel, is ironically not that cheap from most commercial suppliers. At the outset of the project, we can provide PZQ to labs wanting to take part – we’ll just mail you some. We’re looking for a longer-term solution to this once things get going.

    5. I/my group are starting this up and, for convenience, hosting it, but we don’t own it. If other people work on this project so much they start taking it over and leading the science, that’s perfect. Leadership in open projects is fluid. Thus anyone who takes part works for the project, certainly not “for” me or my group. There is no other incentive to taking part than getting the job done and finding a route to this enantiopure drug that’s viable for scale-up.

    If you’re a student who wants to take part, go hassle your lab director/PI. If you’re a lab director reading this, please consider having a cohort of students try this lab. This is a real optimization of a real process involving a real drug that affects millions of people.

    Background to the science involved can be found here. There’s a pdf there that describes some of the chemistry. Essentially, though: the resolution is several steps, and each needs improvement. There’s an initial hydrolysis of the drug, synthesis of resolving agents, the resolution itself, and then the re-isolation and purification of enantiopure drug. Each step works, but needs to be better. There are lots of very nice crystalline solids throughout. We can’t use chromatography. We need inexpensive reagents, and environmentally benign solvents. We need high yields, and effective recycling strategies. And so on.

    There are other examples of distributed student involvement in science. William Scott and Martin O’Donnell began a related project in 2009 called D3, and there were some papers describing this excellent work. The difference here is that our project is open, in the sense that anyone can participate and all data are freely available as they are acquired. That may make it more chaotic. It may also make it more effective. Part of the innovation here for people taking part is working that out.

    It’s also fitting that this project is being launched during the International Year of Chemistry. We’re trying to use the web not just to share data, but actually to collaborate on a real research question in experimental lab science. If you’ve an interest in trying to solve this problem, you’re free to join a worldwide effort. There’s an interesting “crowdsourcing” experiment being run by the RSC that concerns measuring the pH of water worldwide. In our project we’re not asking for a measurement, we’re actually asking students to perform synthesis, but then also to think about what experiments to try next, and to help write the paper – the full gamut of aspects of a full research project. This is pretty demanding. It’s more reminiscent of the wonderful Biobricks competition, with the difference that our project here is open and web-based, rather than a competition in a specific location.

    What’s the hope here? I hope that students can get excited about working together on a real research problem, and can get a taste for what a mind-bending exercise real research is – research where you’re not even sure what question to ask at the outset, let alone how to answer it. What I’m hoping for is that students can help solve an important problem as a group. I recently went to a meeting organised by a student cohort committed to lobbying universities to take part in research in tropical diseases without necessarily seeking patents and profits, UAEM. The guy then in charge of the group, Ethan Guillen, said at the start: “Students are great allies to have if you’re a professor”. Amen to that.

  • mattoddchem 10:13 pm on October 31, 2010 Permalink | Reply  

    Sabbatical Part 1 – UCSF 

    I was on sabbatical from January 20th till July 12th. Half at Stanford and half at UCSF. All California.

    When I was organizing the trip I asked the (now) Dean of Science at Sydney, Trevor Hambley, what a sabbatical was for. My assumption was that it was about going to a new lab, learning a new skill and building new collaborations. Interestingly, he said “the idea is to re-charge.”

    Having been, and come back, I understand what he meant, and he was right. Life should be like a sabbatical. A new technique, or a new collaboration, may be useful, but what’s really important is to reconnect with why you’re doing science, in case years of local administrative duties have clouded your youthful, pristine vision.

    Why was I on sabbatical? I wanted to visit two labs, one in asymmetric catalyst discovery and one in drug discovery. These labs needed to be the best in their fields. I wanted to see how catalysts and drugs are discovered. My aim was to see how much of the work is screening and how much is design. My assumption before the sabbatical was: it’s mainly screening, since we can’t design drugs or catalysts yet from first principles. If that was the case, I wanted to try to find examples of projects that, if successful, would allow more design and less screening. In the course of this search I also wanted to think about our open science work, and find people who wanted to work this way.

    I’ll write about Stanford later. My hosts at UCSF were James McKerrow and Conor Caffrey. They work at the Mission Bay Campus of UCSF, which is not in the middle of the city with the rest of the campus, but is out on the east side where the old port buildings are. It’s an odd place – a beautiful set of buildings surrounded by a view of the city (north), the water (east), wasteland-then-cool Potrero Hill (south) and outskirts of the city (west). Huge tracts of land are waiting for new buildings. You can smell the sea air. It’s a great place for an apartment, if you don’t need to buy anything. I’d like to live there.

    View of San Francisco from Byers Hall

    Genentech Hall, next door to Byers

    I was a guest in the QB3, specifically at the Sandler Center, which has a focus on finding new medicines for neglected tropical diseases, and which supports open science. I was there due to my interest in schistosomiasis. I had also heard that UCSF Mission Bay was an unusually fertile place, scientifically.

    And so it was. I had a desk and an internet connection in Byers Hall with a nice view that allowed me to contemplate the stately motion of sea freight. I had to do a number of things like writing grants and papers, but in between times I was able to sit in on the McKerrow group meetings and talk to faculty and students in the building. This was pretty much perfect – it was great to talk with Andrej Sali, Brian Shoichet, Jo Derisi, Adam Renslo and others at various points, and to sit next to Joseph Mulvaney (from the SMDC) the whole time, whom I must thank for being so quiet (and for answering a few dumb questions I had about cheminformatics).

    A few things are really unusual about Mission Bay. The groups have fluid barriers between them. It’s never clear who is working for whom, and people often seem to go to various group meetings because there are so many collaborative projects. The faculty are working on big, interesting problems together in a highly interdisciplinary way. In my experience “interdisciplinary” is a word that people try to tape over an existing department or insert into a planning document. At UCSF it really seemed to be the way people worked.

    The thing that really stood out for me was the intellectual independence of the students in the McKerrow group. At group meetings they had a very good level of understanding not only of their work but also of the context of their work in the field. They really were driving their research. The students also were excellent at questioning each other in group meetings, and not leaving this up to those in charge. Perhaps this is a feature of the US, or that it’s an elite organization, or that the research is in biology, or something. Whatever, Jim seems to be fostering a group of people who are turning into proper scientists, with a high level of control over their intellectual futures. A great place to go to graduate school.

    The building itself was very attractive – Byers Hall is linked to Genentech Hall by a vertiginous atrium, and there was occasionally music there, or some other event, to mix stuff up.

    Looking down the atrium in Byers Hall, with music

    Over the way was a Peasant’s Pies that lots of people went to, maybe for the pies, maybe for the high quality heavy wooden furniture, maybe for the free wifi. Who knows. People were clearly writing their whole theses in there.

    A frequent view towards the UCSF pie shop

    We also witnessed the Byers Bash, where groups of students/faculty got together in a musical competition. People made great music, with food and beer. Jim himself unleashed his inner rock star on guitar and vocals.

    Jim McKerrow takes the room at the Byers Bash, UCSF

    I’d never seen anything quite like this place before. Nor could I imagine seeing a communal arcade machine in an academic building in Sydney. Fishtanks flank it.

    Byers Hall Entertainment

    If anyone in Sydney has an old Pacman machine going spare, we can find a good home for it in my lab.

    So screening vs. design? A big question with a long answer. I’ll come back to it.

  • mattoddchem 10:14 pm on October 25, 2010 Permalink | Reply  

    SciFoo (and how about some more Science Unconferences) 

    I went to SciFoo this year. This is an invitation-only, 200-person event at Google HQ in Mountain View, organized by O’Reilly and Nature. I’d been invited before, only for it to clash badly with the start of the Sydney semester and the start of my teaching, cascading over me like a waterfall of incomplete handouts and practical demonstrations. Previously I’d caved. This time I didn’t and I flew to San Francisco just for the weekend. Oh man – am I glad I went.

    Gathering at the Googleplex


    48 hours at the Googleplex. An intellectual lock-in. Nobody goes home. Nobody else comes through. It’s just them – and me. We all assembled, described ourselves to the crowd in three phrases, then wrote names of sessions we wanted to lead on post-it notes and stuck them on a grid of places and times. You grab a drink, and start talking. You stop talking when you can’t perceive your own soul anymore and need to sleep. I was jetlagged with a terrible cold that made me sound like my tonsils were made of sandpaper. But no matter.

    SciFoo Schedule Board


    So this is an “unconference” – something perhaps not familiar to many mainstream scientists. There is no pre-arranged agenda. You just get people together and let them talk about whatever they want. The schedule is made on the fly and can change. People with similar interests aggregate naturally. Through random chance or curiosity people who know nothing about the content of a session will show up.

    This was the most inspiring meeting I’ve ever been to. The chemical content was next to zero. The science content was, well, it was just Proper Science, as it should be. Childish, naive, reaching for the stars. A succession of things that make you go “oooh that’s nice.” It was eyewateringly exciting. The naivety could have been annoying were it not for the fact that people invited are doing work so thoroughly marinaded in cool sauce and topped with awesomes and thousands.

    The key to this success is simplicity – you just get good people. Organisers Tim O’Reilly, Timo Hannay and Chris DiBona know this very well. People the world over are doing the coolness – you just have to get them together. They then mutually remind each other why they got into science in the first place – it’s like motivational autocatalysis.

    The tone was set on the first evening when Larry Page, at the end of his welcoming remarks said “You know, if what you’re doing isn’t going to change the world, then maybe you should do something else.” People kept referring to this over the next 48 hours, and it’s lodged in my brain ever since, gradually working its way to the centre.

    Emily Brodsky talked for ten minutes in a Lightning Talks session about why one earthquake can trigger others. David Eagleman described how he’d dropped people off a crane to determine whether bullet-time was real. Noah Hutton gave a late-night session about a film he’s making on the Blue Brain Project, an attempt by Henry Markram to model a human brain in a computer in the next five years. Peter Singer caused a lot of discussion with his website that attempts to motivate people to donate money to charity. And so on and so on. The frustrating thing is not being able to go to all the parallel sessions. I still regret missing Yves Rossy talk about jet-propelling himself over the Channel.

    The challenge when you’re there is to be able to explain how what you’re doing is going to change the world. It makes you think about your own work, and the work of the people around you. You’ve been invited because someone important (it’s never clear who) thinks that what you’re doing might well change the world. This forces you to forget about the detail we so often talk about, take a huge step back and confront the big picture lurking somewhere around you. Often at specialist science meetings, when challenged to talk about your work you might say “Well, I’m working in a calixarene-based sensor for [insert molecule]” or “I’m trying to make grantotoxin faster than this other guy” or “I do sesquiterpenes.” Technical, small answers to a technical crowd. That won’t do at a meeting like this.

    On the first evening I was having a drink with a guy and Sergey Brin sidles up and asks me what I work on. My answer piqued his interest because I said “We’re working on making a drug needed in Africa by doing the science in the open, on the web, so that everyone can help us out and make the science go faster”. I got a lot of practice at permuting this kind of answer over the course of the next 50 times I was asked it. Everyone I spoke to liked the answer. Some people are doing similar things, like the truly wonderful Galaxyzoo project – their lead tech guy Arfon Smith was present. It was great to meet Michael Nielsen who made me aware that someone had already framed the concept of cognitive surplus – I’d been thinking about this ever since watching people play Tetris on their phones on Sydney buses and wishing that level of sustained concentration could be directed to a more meaningful goal.

    I’ll never forget the first session. A few of us gathered in a room. I was sitting next to Will Noel, a guy putting ancient manuscripts online for the Walters Art Museum in Baltimore. The session was on “The Future of Space Travel”. An unassuming guy led it (can’t find who), and nobody was quite sure what to expect. He began by saying “So when I was flying the space shuttle…” and the room kind of changed – we all became little kids wanting to know what that was like.

    It was a real pleasure to meet Derek Lowe at this meeting. Derek is a bellwether of pharma, and organic chemistry quite generally. A wise, considerate and articulate guy with a huge range of interests. We attended a session together where Lee Smolin explained what quantum gravity was. Waves (or particles) of Physics Envy crashed into me (again). Such big problems, concerning the nature of reality. Derek and I co-hosted a session on the possibility of Open Source Drug Discovery. A fascinating hour. I was able to brief everyone on what we are doing at the Synaptic Leap, where we’re trying to show we can open source process chemistry. The discussion turned to the rest of the drug discovery process, enormously facilitated by Derek’s wide-ranging expertise. The people in the room, from Creative Commons, from the White House, from industry were quick to clarify the thorny issues – but they all seemed to want the idea of OSDD to work – they all acknowledged something radical had to change in the coming years. I’ll have to return to this in a future post but the session was so inspiring. (I seem to share a lot of interests with Esther Dyson, who was in a lot of the sessions I went to. She was able to ask the best disruptive questions whilst spending most of the time apparently approving/rejecting friends on Facebook. Whatever works…)

    Derek Lowe (centre) discusses quantum gravity with Lee Smolin (right)


    SciFoo made me think about chemistry conferences a lot. I’ve been to a large number of chemistry conferences. I went to the American Chemical Society (ACS) meeting in San Francisco in March this year. I was at the conference venue the whole time the conference was on. I didn’t speak since I missed the abstract submission deadline that was sometime back in 1998 I think. I sat in sessions the whole time, I mingled and met all the people I wanted to, as well as a few people I hadn’t expected to. I had beer at the poster session and beers with people in the evenings. Was this conference a good use of my time? Apart from some excellent beer conversations, not really.

    There’s a separate post that’s needed here about where organic chemistry is, and where it’s going – a few people have been posting on this recently. But just in terms of the ACS meeting itself: with a few very notable exceptions the talks I saw were a) presented in a dull Powerpoint-heavy series of slides with verbal commentary about what was on the slides where even the presenter was visibly bored with what they were saying and b) on published material that was c) way too predictable and incremental. So both the presentational style and the content were disappointing. So many talks at the ACS would have been more interesting if the speaker had simply given out paper copies of their latest paper and given us 10 minutes to read it in silence then 10 minutes to talk about it. Now of course specialism necessitates incrementalism in content, but it’s no good if the meeting becomes a chore to sit and listen to. Nor is it good if the talks come out of the Powerpoint Machine (the genius of the “Chicken Talk” is that you can kind of follow the talk structure without listening to the content – it sounds exactly like most academic talks right up to the last supplementary slide in response to the second question at the end). In maybe 80% of the talks I attended nobody asked questions, or nobody was allowed to, or people asked “pity questions” just to break the awkward silence, but which were in no way interesting in themselves. So, constructive solutions:

    1) people should be excited about what they’re presenting (there were a few excellent talks at the ACS I should add, by both faculty and students). If they’re not excited, they should sit back down.

    2) conferences with little slots for questions, or where there are no questions, are of no real interest at all (particularly now that you can just listen to the talks online – a great move by the ACS)

    3) how about we just scrap the schedule and allow people to talk on whatever they want on the day. Sessions have a title, which is a question or a hypothesis, and people come to discuss that without any pre-made slides. This removes the inanity of talks entitled “Recent Developments in X”

    The ACS can deal with 2) and 3), not 1). While considering this, you may want to examine this picture of the control panel on the Google toilet cubicle wall.

    Google toilet control panel


    and maybe this shot of the Google campus



    Unconferences are the way forward. I hear that the Burning Man can be like this too (though from a look at the WP page it’s now huge). As can Maker Faire. So who’s on for a chemistry unconference, or maybe a chemistry/biology or chemistry/physics or chemistry/software unconference? (Gregynog and Gordon Conferences are close, but not quite there). Get good people together for a weekend. If you don’t want to actively participate, go somewhere else. Screw the usual formalities and just allow the day to pan out. There are no conference proceedings, and since you don’t actually present a series of slides, there’s nothing to put on your CV. Let the people who want to talk, talk, and see how the sessions define themselves, while insisting that sessions are framed around hypotheses. I wonder what would happen. Let’s try it. Sydney or London or New York or someplace nice.

    • Antony Williams 10:42 pm on October 25, 2010 Permalink | Reply

      Mat….I’m all for it! Yes, yes, yes. I’ve done Scifoo twice and truly enjoyed it and came away asking for more chemistry. Fortunately both times I was there I ended up hanging around with collaborators as well as meeting great new people. Many of the questions I carried with me to the conference remain unanswered though and I think a collective audience of chemists could really help. What can I do to help? When do we start ? :-)

    • Jamie 11:03 pm on October 25, 2010 Permalink | Reply

      There are science unconferences… SciBarCamp. I’ve helped organize these events in Toronto and Palo Alto. I’ve also heard that a Cambridge SciBarCamp is in planning mode.

  • mattoddchem 11:05 pm on October 13, 2010 Permalink | Reply  

    The Ostrom Rules and Online Projects 

    Sydney Uni has an enlightened organisation within it called CHAST that organises science talks of wide general public interest. We recently hosted David Sloan Wilson who spoke on “Evolving the city: using evolution to understand and improve the human condition”. The talk touched on a number of cool ideas. At the end he spoke of self-organising social systems – that often social groups can run very efficiently without the need for excessive top-down regulation. In order for this to work there needed to be certain rules to prevent a system from using up its natural resources and withering. This is a biological argument that he was applying to a social network, such as a city neighbourhood. He referred to Elinor Ostrom’s 8 principles for resource management. There need to be:

    1. Rules

    2. Reward systems

    3. Collective choice arrangements

    4. Ways to monitor the system (by people who are involved or who have a stake)

    5. Graduated sanctions for bad behaviour

    6. Mechanisms for conflict resolution

    7. Rights to self-organise recognised by a higher authority (not God, people)

    8. Scalabilities – the rules above need to apply also to the relationship between groups

    As I was listening I thought – “These are excellent principles for the operation of any open source project”. I’ve been thinking how to carry out research projects in the open (such as these ones), and how to write papers (such as this one that’s in progress). The Polymath project sought rules for good behaviour which seemed to work. The Ostrom guidelines are a nice take, from a different field, but they articulate something important about productive online communities. It’s interesting to think about whether these rules apply to recent online successes such as Foldit and GalaxyZoo.
