Updates from mattoddchem

  • mattoddchem 10:54 am on April 19, 2012

    Reinventing Discovery on the way to the Sage Congress 

    I’m in San Francisco for the Sage Commons Congress, and am excited to be able to contribute. I took a United flight, and there were no TVs. I was also surrounded by loud people, so couldn’t sleep. To salvage the 13 hours I read Michael Nielsen’s Reinventing Discovery, finally. It’s testament to the quality of the writing that I glided through it in one sitting (I did have a window seat). The book makes some wonderful points, and I’d recommend it to anyone. I wonder whether Michael’s call to arms about the absurdity of the current model of scientific publishing was more scathing in earlier drafts, and whether the original version might be accessible by giving Michael a beer. A few random things I enjoyed:

    1) The optimism of our knowing about collective knowledge from Ostrom and others, and how that might allow us to progress to a change in the way science is done (from the top down, guided by informed decision making). I had previously written about the relevance of Ostrom to open science communities, in a brief post.

    2) The wonderful public hissy fits one can read in open source software project sites, and how that adds humanity and interest to the process of work. It sounds like a similar thing happened in the Kasparov vs. the World chess game. There’s precious little of this in science.

    3) The irony of our current publishing system, that actually inhibits distribution of information, whereas it was set up to do the opposite.

    4) The wonderful section on what computers are good at vs. what humans are good at, and how this leads to a reinterpretation of what “understanding” is or how a “model” is perhaps more than that. I enjoyed the section on the Google Translate project (wasn’t that the DARPA project?), which succeeded through a statistical modelling of language uninformed by grammar.

    5) I liked the comment that open drug discovery might be a step too far, given the IP ramifications. Well, we’re trying that, so let’s see!

    6) Michael’s comments on the possible limitations of citizen science projects like Galaxyzoo and Foldit were also very interesting. These are spectacular projects, and the comments were addressing whether there were many other projects where the public could reasonably be expected to input productively. My view is: Yes, and I hope so… I think this is a transformative challenge, in that it would be so sensational if the public could contribute to hardcore science through good project design. I, for example, want to see the public able to contribute to projects in the design of new catalysts. The challenge is to design a system where that becomes possible. That’s extraordinarily tough, and hence worth doing.

    I hope Michael will soon be able to write a sequel to the book, Discovery Reinvented.

     
  • mattoddchem 9:26 am on January 30, 2012

    Wikipedia vs. Academic Papers – a Middle Ground 

    We’re trialling an experiment until the end of February. Can we assemble a review of an area of science on a wiki, allowing anyone to contribute, and then publish that in a peer-reviewed academic journal? (early description of this on G+)

    Wikipedia has a great deal of useful information, but its coverage of some academic areas can be patchy – it naturally depends who has contributed. With many articles the reader will be unable to judge whether the article is complete and accurate according to the current knowledge of the field. By contrast, many academic articles in journals are not open access, reducing the readership. Sometimes articles are written with a significant bias towards the authors’ work, and indeed sometimes that is the explicit purpose of the article. It’s very rare for leaders in a field to get together to co-author a review.

    Wikipedia also has a suspect reputation in academia – we complain about its shortcomings, do little about it, and then all go and use it when nobody is looking.

    What if we could assemble a paper openly, using a consortium of interested people, and then at a point where everyone is happy that the article is complete, we submit it for peer-review and publication in an open-access journal? That would have the disadvantage of killing off future edits (because the paper was reviewed as a static object) but the great advantage of producing a citeable article that would be curated by the journal. One could donate the final version to a relevant Wikipedia page, but the article is guaranteed as peer-reviewed, and has a DOI, etc.

    We attempted this with our resolution paper last year. The research article we wrote was based on public-domain knowledge arising from our open online electronic lab notebooks. It was written on a wiki where anyone could contribute. It was submitted to a journal, reviewed anonymously and then published. In that case nobody outside the core team (the listed authors) contributed to the writing. They could have, but in the end did not.

    The chemistry in question was, to a first approximation, solved. What we’d like to do is find a second solution to the problem that uses more sophisticated science (which is more expensive, perhaps prohibitively, but that’s OK at the moment). The area of science that is relevant is a certain chemical reaction – the catalytic, asymmetric Pictet-Spengler reaction. In the course of doing this research we’ve been looking back over what’s known of this reaction and when one does this one can’t help thinking about writing a review. We thought it would be interesting to generate such a review in the open, allowing anyone else with an interest to participate.

    The current draft is here. The talk page contains what needs to be done and allows a trail of author interactions. Two of my students and I have to date done most of the work. Three other people have expressed firm intentions to edit the document and add sections. We have applied a time limit to the writing process of the end of February because that’s when semester starts in Sydney and it would be nice to be finished with it by then. A Dropbox folder contains all the relevant things – pdfs of relevant papers and the raw files used to generate the diagrams. People working on the paper have access to that.

    This experiment in writing chimes with what a few others have been saying about the review process (my summary with links is here). Tim Gowers, for example, has been talking of a possible website where papers could be deposited as drafts (arXiv style) and then people be allowed to review them for improvement before submission. There is an interesting discussion to be had there about a “currency” of peer-review – that if someone undertook some editing of an article, could their contribution be reciprocated, maybe quantitatively? This goes to forming an incentive to participate in peer review, and also goes to how we could build reputation in academic circles – an attractive idea for younger scientists who would like to engage in writing papers and building careers, but for whom there is no clear way to do so because they lack the usual academic monikers. Currently professors do peer review (or are meant to) yet have no great incentive to do so beyond the better parts of their nature, and graduate students may be involved but receive no recognition outside their groups. Can lessons learned in open source software construction be applied to writing papers? How would the infinite gradations of expertise work? Would there need to be restrictions on permissions to authorship based on one’s current standing in such a system? These are really interesting questions that go to the future of how we write papers, so this experiment is an attempt to look at those things.

    Fundamentally what we’re proposing is simple – people get together to write. Everything can be checked and edited. We have instituted a “quality control” section in the talk page to ensure that everything written is checked by at least one person, who identifies themselves. I have assigned myself as corresponding author to ensure the buck stops with someone. I am keen, for example, for the review to be well-written besides just being correct. I have read a lot of perfectly correct reviews which often require a lot of coffee.

    The challenge of this model is this: How much work constitutes authorship? I don’t know. That’s the purpose of the experiment. At this stage, with a small number of participants, I can do this manually. If I think someone has contributed significantly, I will allow that person to be named. If the contributions are minor, that person will be “acknowledged”. It is possible for a student to be named, but not their professor (very much not what happens in my discipline) but I am making it clear to anyone volunteering that I have to check that with whoever is paying their salary, because I’m polite like that. It is certainly possible that if someone does a very large amount of work on the review, taking it in new directions or rewriting large chunks for the better, that person should become (a) corresponding author, because being corresponding author means taking ownership of the end result.

    It’s certainly been an interesting process thus far (20 days). Writing in this way feels “live” – edits can come at any time, from anywhere. There is a history of the page so things can be tracked and undone. The writing has gone quickly, apart from in the last five days where my students and I each independently became involved in other things that took us away from writing. But the fundamental aim is: to produce a complete, high-quality review of an area. I am very interested to see if that is what happens.

    Where to submit? I’m considering the small number of options in chemistry. PLoS ONE does not publish reviews.

    I’d also be grateful for any related examples of distributed paper writing where the draft assembled in the open then went to peer review. I am assuming that the papers arising from the Polymath project (rather than just the project itself) were constructed by multiple people, but if anyone knows of other cases, please say.

     
  • mattoddchem 11:02 pm on January 26, 2012

    Goodbye Elsevier, Goodbye Tet Lett etc 

    I’ve decided to stop refereeing for, and publishing in, Elsevier journals. I was just asked to review for Tet Lett again, and sent notice that I’m out:

    “Apologies, but I have decided to stop refereeing for (and publishing in) Elsevier journals because of 1) the lack of a positive policy towards open access (to all content, not just individual articles) and 2) Elsevier’s aggressive commercialism, in particular its sponsorship of the Research Works Act in the United States which would unquestionably harm science. Please remove me from your list of referees.

    If Elsevier were, in the future, to decide to support full open access to the academic literature I’d be delighted to resume refereeing duties.”

    Over the last few years my interest in open science has grown, and inevitably I’ve had to confront the power of open access literature, which is a necessary condition for open science if we are to avoid the absurdity of research conducted in the open disappearing behind a subscription once it’s done. My doubts about contributing to a system of closed access journals, which totally dominate organic chemistry, were becoming overwhelming when Tim Gowers’ post came along about the need to declare publicly that we would no longer support the system.

    I’m starting with Elsevier. The tipping point was the ridiculousness of the Research Works Act – a squalid little affair that had very little to do with the greater good or the benefit of science. I have been irritated by all the pompous talk of the “value” Elsevier adds to the process of peer review. Over the last ten years or so I have had experience of the peer review system operated by 3-4 organic chem Elsevier journals. I’d like someone to point out something about this “value” that is innovative or surprising and which might need some hefty R&D budget. Is it perhaps the case that simply publishing an article written and reviewed by scientists has become fairly straightforward in this modern age? I have been an editor at PLoS One for a while now – ironically a journal that some people still think has no peer review system. The peer review I have managed for papers there (managed by scientists, backed up by editorial staff) has been rock solid, lengthy and rigorous. I have zero data to back this up, but it feels as though more people reviewing for PLoS One care about what they’re doing than do those reviewing for some of the Elsevier org chem journals. PLoS One is also trying hard to innovate in the area of article-level metrics.

    As a chemist, parting company with Tet Lett in particular causes mixed emotions. The journal has a weak reputation amongst my co-workers and colleagues these days, but of course there are classic, beautiful papers in there, like Corey’s PCC paper, or seminal reports of Sonogashira couplings and Weinreb amides. My last paper there from 2009 has been cited 20 times already. My first paper was published there. I feel like holding a wake. But good science is not the product of a journal, it’s the product of hard work by people. The last thing we should be doing is paying anyone over the odds to access it back or giving anyone copyright over it. A sad day, but times change which is why times are interesting.

    If you want to join the boycott, you can declare yourself here. You’d be in very good company, in case you think this is just a list of naïfs.

    Eventually I will have to take the same stance on other publishers, with the American Chemical Society looming large. I need to consider the welfare of the students in my group, and their CVs. It’s really very tough in chemistry – people expect papers in certain places. The ACS is technically a learned society, and has a healthy contribution to the blogosphere etc, but something about its control of the literature just doesn’t feel right. If the data in SciFinder were donated to the public domain, chemistry would have its Human Genome Moment.

    My last two papers were in ACS journals because these were the most appropriate places for the students’ work, and because the prestige of the journals helps my students. They were both thoroughly reviewed and published quickly. But this just can’t go on, and I suppose I must soon stop interacting with the ACS too. And, I guess the RSC. One step at a time. With the bigger journals that deal with significant papers and publish items beyond research articles the sense of “value added” is perhaps clearer, too, and the discussion becomes economically more complex. Yes, I’m talking about you, Stuart – if Nature Chem went author-pays, it’d be ($ a lot) per article, I seem to remember.

    I’d be interested to hear from other chemists. It feels our discipline is the most traditional, and almost completely beholden to closed access publishers. It feels we care less about open access than scientists from other disciplines, and that we’re too comfortable with our lot. Comfort is the death knell of academia. We perceive the transformative benefits of open access to data too little, in particular the re-use and mining of large open data sets: the immense power of tinkering, re-mixing, playing. The lack of unrestricted play with the accumulated knowledge of chemical reaction outcomes is one of the key weaknesses of the way we are doing organic chemistry today. For that we need open data. That means open access to the literature.

     
    • antonywilliams 12:46 am on January 27, 2012

      Mat…do you know the RSC’s position on Open Science? Your comments welcome: http://www.rsc.org/Publishing/Journals/OpenScience/FAQ.asp

      • Peter Murray-Rust 3:05 am on January 27, 2012

        Tony, in answer to your question…

        As far as I can see authors can pay 1600 GBP for “RSC Open Science” http://www.rsc.org/images/GeneralLicence-OpenScience_tcm18-64482.pdf. This licence is nowhere near BOAI compliant and requires depositions in a specific repository for non-commercial purposes. It would, for example, prohibit Chemspider from re-using any part of it (before Chemspider became part of RSC). It contains phrases such as:

        > the Owner grants to the RSC the exclusive right and licence throughout the world to edit, adapt, translate, reproduce and publish the Paper

        > The Author(s) may make available the accepted version of the submitted Paper via the personal website(s) of the Author(s) or via the Intranet(s) of the organisation(s) where the Author(s) work(s)

        > Persons who receive or access the PDF mentioned above must be notified that this may not be further made available or distributed.

        > No deposited Paper, whether in an Institutional Repository, Funding Body Repository or personal website, may be forwarded to any website collection of research articles without prior agreement from the RSC.

        In other words the RSC has complete control over the re-use and distribution of the paper.

        It effectively denies *anyone except the RSC* the right to:

        re-use diagrams from the paper
        translate spectra into something useful
        translate parts of the paper into a foreign language
        text-mine the paper

        and even download it from the author’s repository and print it (that is unauthorised distribution)

        The licence is RSC-specific (not Creative Commons) so there is an additional overhead in reading it.

        This is not Open Access according to BOAI.

        Please correct me if the facts are wrong. I am not offering my opinions :-)

    • Peter Murray-Rust 1:28 am on January 27, 2012

      Matt,
      It’s not just pricing, grotty journals, arrogance, etc. It’s about our ability to do what we like with our research. My group has built software which can extract a large percentage of reactions out of Tet. Lett and Tetrahedron. If I tried to do this I’d be sued within minutes. We could build a system that would make CAS and Reaxys look 10 years out of date (which they are). But the University of Cambridge would be cut off by Elsevier lawyers. Elsevier have treated me with total arrogance, disdain and I … words fail me.

      There is a better future and it doesn’t include current publishers. We shall have to build it ourselves. It’s hard but possible.

    • Rich Apodaca 3:40 am on January 27, 2012

      Mat – you’re an inspiration. Bravo!

      I’m curious – given the needs of your students to find work after graduation, which chemistry journals do you currently see as being: (a) most prestigious; and (b) most consistent with free scientific discourse?

      Also, don’t forget that prestige is fickle. Fannie Mae used to be one of the world’s most respected companies. Apple used to be a joke.

      ‘Author-choice’ journals don’t solve the problem, IMO. ACS and RSC – this means you.

      The future belongs to publishers who can innovate (yes, actually do something different and risk failing) to reduce costs, trim bloated, top-heavy organizations, effectively use information technology, and deliver value to the scientific community – while paying the bills.

      Despite plenty of hot air from publishers, I’ve not found a single one that in any way explained to the scientific community how they’ve done all of the above and still can’t:

      1) let authors keep copyright to their works; and
      2) let readers freely distribute and repurpose all journal articles

      Strangely, I see journals like Beilstein Journal of Organic Chemistry (http://www.beilstein-journals.org/bjoc/home/home.htm), which does both of the above and has been running for many years now.

      • mattoddchem 2:39 pm on January 28, 2012

        Rich – In my field it’s all closed – JACS, JOC, Org Lett, Inorg Chem (for some of my stuff), as well as Angewandte, Chem Eur J, Chem Commun, the new Chemical Science as well as Nature Chem, of course (which is publishing a much smaller number of consistently cool papers). PNAS is very good, and as far as I know more committed to open access (after a delay?). The Beilstein journal is excellent – open access and no fee to submit – amazing. I published there (http://dx.doi.org/10.3762/bjoc.5.67). The current open access journals in my field are finding it difficult to get traction, but are trying hard – Chem Central J, for example. PLoS ONE will take chemistry, but the uptake by the community is slow. I’m on the board of the very new ChemistryOpen from Wiley, and we will see how that goes. For me the ability to pay something to make one article open just won’t do it – the rest of the journal is closed – I’m really interested in the bigger point about the ability to mine and re-use data freely as a spur to innovation. The journal becomes a formal repository – very useful and important as a portal to organised data.

    • Rebecca Guenard 6:16 am on January 27, 2012

      Wow, good for you! That’s huge!

    • Heather Morrison 10:02 am on January 27, 2012

      Bravo! As for past papers, a reminder that Elsevier is green for self-archiving – please liberate your works, get them to your local IR for open access!

    • Egon Willighagen 4:18 pm on January 27, 2012

      Good luck! (Organic) Chemistry is not particularly known for caring about any of this (nor semantics, nor computing)… but I second Rich’s pointer to the Beilstein Journal of Organic Chemistry. And there is Chemistry Central too. What other gold OA options are there for chemists? There is Molecules as ChemComm replacement… others?

    • Tyrosine 4:21 pm on January 28, 2012

      It’s a pretty easy decision for an org chemist to dump Elsevier. Tet Lett and Tetrahedron are not the journals they once were, and there’s plenty of quality competition. I’m not publishing there ever again. Opting out of ACS and RSC is a much bigger call. Who knows maybe BJOC will be the new JOC one day – and getting in early is a good thing.

    • Jan Jensen 8:04 pm on January 28, 2012

      “My last two papers were in ACS journals because these were the most appropriate places for the students’ work”

      I have been thinking a bit about this lately. With journals like PLoS ONE and the fact that most people find articles through search engines, what do we as chemists mean by “appropriate journals”? Why not send everything to PLoS ONE? I don’t do this myself, but why not?

      The only fundamental issue that I see is the prestige factor. But once the paper is not in JACS or Nature Chemistry or 1-2 more, I don’t really see any real gain in prestige. I am not an organic chemist, but are Inorg Chem and JOC really considered more prestigious than PLoS ONE? Or is it that most chemists think the latter is that new “like” button on Google? ;)

      • mattoddchem 8:21 am on January 29, 2012

        Yes, it’s a nebulous idea “prestige”. I guess it comes down to where people i) trust, ii) browse and iii) are likely to publish, or have published, related work. For those last two papers from my group the journal had recently published related work, which we cited, meaning that those authors are likely to read our paper too. It sounds ludicrous when I say it, but it’s like a common area for a reading group. Before impact factors I think this was even more important – people would submit to the journal that had published the most relevant work recently. There is no need for this today. And yes, why not just submit everything to PLoS ONE? I don’t know.

        One of the things that bothers me about journal hierarchy is that it’s essentially a time-saver for the reader/assessor. If a paper is published in a certain journal we, as a reader, can form an opinion about the likely importance of the paper without having read it. Usually a paper in a top-tier journal has been refereed by 2-3 people. For something like Angewandte, I know from experience that a good report and a mediocre report are enough for a rejection. So let’s say two people will be responsible for a paper getting into a top-tier journal. Is this the kind of reassurance we need about a paper’s quality? Have we not all read papers in big journals dealing with a very specialised area and thought “How did THAT get in?” because we are outside that field, perhaps? So we are relying on two people we don’t know, to judge something we are not familiar with and who may have very human political reasons for their decision, and yet we still somehow form a judgement of quality. It’s odd. Nevertheless the brand of certain journals provides us with an opinion of an individual paper. I suspect that will continue, because we as a species like labour-saving devices. So yes, some “top-tier” places will remain and thrive while other journals expand their efforts at article-level metrics and other devices for assessing an individual paper’s importance, rather than having to inductively judge that from an impact factor.

        • Jan Jensen 5:54 pm on January 29, 2012

          I agree with you. But that’s top-tier journals like JACS and Angewandte. And I would submit that a CV filled with only such publications would be impressive.

          But what about the journals where *most* chemists publish *most* of their articles? JOC, J Phys Chem, etc? Do they really add more prestige to a CV than PLoS ONE? That’s one question I am struggling with. (The other is whether it is disingenuous to semi-boycott non-open access journals). Any thoughts?

          Imagining my publication list where all non-top tier publications have been published in PLoS ONE, I am left with another gut reaction: narrow research scope. Generally, publication in a diverse set of journals signals multidisciplinarity. However, this is a non-issue for PLoS ONE.

          • mattoddchem 9:23 pm on January 29, 2012

            Sure, that’s what I was struggling to say. Devil take the hindmost – the top tier journals have the least to fear from open access, but time passes and times change. PLoS Biology/Medicine etc have huge impact factors. PLoS One’s impact factor is, I believe, higher than JOC’s at the moment. It’s a difficult transitional period for open access. Some new journals are coming along that might spur things along a bit. But it is still the case that panels looking at applicants scan paper lists for journals in order to save time – quite a reasonable thing to do if there are 100 applicants for a job. I have no doubt this occurs when assessing people submitting grants, too – lower-ranked journals give one a quick excuse to rate the grant badly. Now imagine we just had one journal. The articles would need to be assessed for importance, not the journal.

  • mattoddchem 10:47 pm on November 25, 2011

    Europe Trip/Conference Report 

    I had an interesting trip a month ago, funded in part by the Australian Academy of Science. They require a report, and I was wondering if a blog post could double as a publicly-readable and transparent report. Given that the entire trip was for the purpose of open science, this seems appropriate. In the end, they have a proforma, so this is supporting information.

    Sydney-Abu Dhabi-Geneva was followed (no pit stop) by a flight to Madrid to visit GlaxoSmithKline at Tres Cantos.

    There is a group of chemists and biologists there who understand the idea that the free sharing of data brings about more collaborations and faster progress in science. This group really is extraordinary. Their deposition of antimalarial hits in the public domain in mid-2010 is worth thinking about for a moment.

    Here is the sharing of bioactivity data and structures of thousands of compounds. These are not just commercial compounds bought in by GSK, but also synthetic compounds, made by GSK as part of various medicinal chemistry campaigns. Javier Gamo, the lead author of the GSK Nature paper, told me the story of how this deposition came to be, and described the extraordinary leap that occurred in releasing the data. We are all used to seeing talks by big pharma in which the structures of active compounds are not included or “R grouped out”. Indeed, it’s usual for the release of the structure of an active to be associated with a frustrating amount of paperwork. That’s even the case in academia, where I would be seriously advised against sharing data on unpatented bioactives. But here there are thousands of actives. Not only are some of the compounds highly potent, but they are whole-cell potent – an important new movement in the discovery of antimalarial compounds.

    Now I remember when the release of data occurred. It was described as GSK going “open source” in many newspapers. That’s inaccurate. Data are just data, unless people work on them, in which case they become a project. Open source describes a process, an activity. To capitalize on the data released by GSK requires people to work on the data. This is such a rich field of hits that GSK don’t have the time to work through all the series alone. They are seeking partners – indeed that was the rationale for releasing the data in the first place. We’re now working with them, in an extreme form since everything we’re doing is freely available, rather than being part of a more traditional bilateral collaboration.

    Our immediate work is to validate the hits. We’ve just finished their resynthesis, plus the synthesis of a few other compounds. These have now arrived at Tres Cantos who will do IC50s and progress the best compounds to rate-of-killing assays. Stuart Ralph and Vicky Avery are also looking at these compounds – we need to be sure of their worth if we’re going to look at them further. These are sensational in-kind contributions to the project.

    The team at Tres Cantos have evaluated their “TCAMS” set and grouped the best compounds into sets. The arylpyrrole set that is the starting point for the open project is one of those identified. There are lots of others. When I was there the team (including Felix Calderon) told me they’d just published another study where a few of the sets had been evaluated. Interestingly the medicinal chemistry campaign was short and sweet – a small number of variations in different parts of the structure had led to shallow SAR (i.e. small changes in bioactivity from changes in various parts of the structure) and this is a negative for the start of a campaign. This series was abandoned because of this, coupled with there being plenty of other hits to go after. (We need to be following this model in what we’re doing in Sydney). What I found particularly interesting was one of the assays in place to determine the likelihood of resistance occurring to a given compound. This is possibly commonplace in medchem/parasitology, but it was new to me. Emergence of resistance to a known drug is apparently quite repeatable. One can effectively run an “evolution emulator” and re-develop resistance to known compounds. It is then possible to run a parallel experiment with a novel compound, and time the development of resistance/tolerance. A rapid onset of resistance essentially kills that series. This is what had happened with GSK’s evaluation of one of the latest TCAMS sets. Javier’s opinion is to include this assay earlier in the evaluation of hit series. It’s something we’re going to want to do with the arylpyrroles when we have a few more analogs ready to go.
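
    As a rough illustration of what "shallow SAR" means in practice, here is a minimal Python sketch. It assumes potency is reported as IC50 in nanomolar, converts to pIC50, and calls a series shallow when potency varies by less than one log unit across the analogs. The function names, the one-log-unit cut-off and the numbers are all hypothetical, invented for illustration; none of this is GSK data or GSK's actual criterion.

```python
import math


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (-log10 of the molar IC50)."""
    return -math.log10(ic50_nm * 1e-9)


def sar_spread(ic50s_nm):
    """Range of pIC50 values (max minus min) across a set of analogs."""
    values = [pic50(x) for x in ic50s_nm]
    return max(values) - min(values)


def is_shallow(ic50s_nm, threshold=1.0):
    """Flag a series as shallow if potency varies by less than `threshold`
    log units. The default cut-off is an assumption, not an accepted standard."""
    return sar_spread(ic50s_nm) < threshold


# Hypothetical IC50 values (nM) for two invented series of analogs.
flat_series = [120.0, 150.0, 200.0, 180.0]    # potency barely moves
steep_series = [5.0, 150.0, 2000.0, 40.0]     # potency responds to changes

if __name__ == "__main__":
    for name, series in [("flat", flat_series), ("steep", steep_series)]:
        print(name, round(sar_spread(series), 2), is_shallow(series))
```

    A real triage would of course weigh much more than the potency range (which positions were varied, assay noise, physicochemical properties), so this is only meant to make the term concrete.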

    From Madrid I flew back to Geneva. There is a strangely high concentration of important public health people around the airport. The following morning I had breakfast with Piero Olliaro from WHO/TDR to plan what to do next with our schistosomiasis project following the publication of our papers. We’re looking at some new ways of making the molecule. We’re also excited to be able to provide enantiopure PZQ to any groups needing some. Please contact us if you’re in need, though the procedure’s pretty easy and needs no fancy equipment. Over a coffee and a spectacular pastry we also discussed whether there was mileage in a consortium working on open source drug discovery for schisto. I think there is, provided the consortium is big, and is open. The “neglected” in neglected tropical diseases comes partly from there being a lack of interest in the development of new drugs. Interestingly one of the big drivers for new “schisto” drugs is the veterinary sector. Vet drug providers would be less keen on the openness, naturally. However, their customers might take a different view. Imagine a rotational dewormer identified by a robust, open research process, available at cost, funded in part by the increased productivity of the livestock/fisheries sectors.

    Later that morning I went to the Medicines for Malaria Venture (MMV), another visionary organisation, strongly encouraging the sharing of data and a more collaborative approach to drug discovery. MMV are supporting our current project financially, and last week we heard that we have secured 3 years’ funding for the project from the Australian Government. I gave a talk at MMV and spoke with our project champions Paul Willis, Jeremy Burrows and Tim Wells. They agreed that the key aims of our current project are 1) to carry out a kernel of activity for the project and 2) to leverage help from others, with particular value arising from practical input, e.g. the synthesis of compounds. It really is my vision for a project of this kind to involve other labs around the world in a coordinated effort, where all data are open and publication of milestones is rapid. If you want to come on board, you can.

    That evening I flew to London for the weekend to see friends and family. It was bizarrely hot in London that weekend. Beautiful to see the new Shard going up rapidly, and I rediscovered the near-perfect Royal Oak pub in Borough round the corner from where we used to live.

    On Sunday I flew to Barcelona, to the EU Congress on Tropical Medicine. I gave a talk on open source drug discovery which generated a lot of interest. It was great to meet up again with Jose Gomez-Marquez from MIT who’s taking a very interesting approach to DIY diagnostics, and whom I first met at SciFoo.

    There were interesting talks at this meeting, but it had quite a broad focus, and there were a large number of policy sessions. I worry a little about the proliferation of groups, i.e. consortia, with names and acronyms, which meet and discuss and emit reports. The crucial feature of many of these organisations is that they are not open, i.e. they operate behind closed doors, and then broadcast. Frequently a funding agency and several universities/NGOs will get together to look into something, say the provision of new drugs for TB. But from my perspective there is little to be gained from my knowing that this organisation exists, because it’s unlikely I will ever be part of it, or influence its direction. Hence I don’t get very excited or involved, and I rather tune out, unfortunately. It did feel as though this meeting was particularly heavy on the “new consortia” announcements, none of which seems to need anything from anyone. I think if we’ve learned anything in the past few years it’s that it’s important to allow people to input into processes and projects at any stage, and to be detailed about the research being conducted, rather than just summarising it at the end. If I were to advise these groups, it would be to release early and release often, and not polish the outputs too much.

    There was a very interesting talk from Robert Jacobs from Scynexis about the development of a boron-containing drug for Human African Tryps (sleeping sickness). I hadn’t heard of a drug containing boron before, and tweeting on this subject led me, via John Overington, to a post on this very subject by Derek Lowe. I would love to know the point, in a med chem. campaign, when someone says “Hey, let’s try boron now”. It makes me think about the hits we’re looking at. Boron, anyone? Phosphorus? I experimented with Storify to put together the correspondence.

    View the story “Boron-containing Drugs” on Storify

    On the Wednesday I flew to Milan, and then took a train to Modena. This beautiful town was host to the EU COST Action meeting on New Drugs for Neglected Diseases. I had time to check into the hotel and walk to the conference venue before the first session began, and I was due to speak, again on open source drug discovery.

    This meeting was more focussed than the Barcelona meeting, but again I had a lot of very supportive comments about the open nature of the research we were doing, and how openness removes many of the thorny problems of traditional research, such as duplication of effort, or roadblocks to progress because a team is not in touch with the external people needed at a given point. The question I received most, both in Modena and in Barcelona, was “What about publications? How do you publish something that’s already out there?” I was able to point to the two papers we’d just published the week before to say that this was not a problem, and that publication of open projects is extremely important for bringing people up to speed with where the project is at, as well as marking milestones.

    Dihydrofolate reductase and pteridine reductase PTR1 were mentioned on multiple occasions as targets of interest for rational drug design, with the latter being particularly cool for doing two reactions in one active site – wow. And it also looked interesting from the point of view of maybe being able to catalyze the asymmetric reduction of dihydroisoquinolines, which is a separate project we’re doing in my lab.

    There were a lot of nice talks at this meeting. My view of medicinal chemistry is now a little skewed, and I can’t listen to a talk from a closed group of students/academics about making a molecule against a certain target without thinking “Why aren’t you just sharing all your data and working with all the other groups looking at this area in real time, rather than slowly publishing and telling us about the choicest results at meetings?”

    The COST meeting was extremely pleasant socially, too. You can tell you’re at a European conference when the entertainment is an opera recital in an old church.

    And you know you’re in Modena when your pizza is drizzled in sticky vinegar that looks like crude oil. This one had lard as a topping.

    On Friday I returned to Milan on the train, then flew back to Sydney. Always terrible losing a whole day on the return journey. The whole trip was tremendous, for many reasons. I met a lot of new people who were interested in helping out with our open projects. I had time to think about what we’re doing, and to receive advice and well-informed questions about the approach. Mind duly broadened. Thank you Aus Academy of Science and Leopold Flohe, organiser of the COST meeting.

    There are so many things to do now on the open projects. Besides making new antimalarials, evaluating new catalysts and working to improve the electronic lab books we’re using, we need to recruit new labs to be part of the experimental effort and speed things up as much as we can. That’s our main emphasis in the coming months. We need students, undergrad lab directors, anyone interested in making compounds to join us and become part of a larger team.

    In February next year I’ll be talking at the AAAS meeting in Vancouver on what we’ve been doing. We’re then going to be hosting a one-day open source drug discovery meeting on malaria on February 24th, and it’s great that Saman Habib from the malaria OSDD project in India will be attending. The whole thing will be streamed, for those who can’t make it. I need to get on with organising this, and if anyone has any experience in streaming and archiving a conference in which there is meant to be worldwide participation, I’d be delighted for some pointers because at the moment it looks … challenging.

     
  • mattoddchem 9:07 pm on November 13, 2011 Permalink | Reply  

    Open Science Funding – Government Grants and Cash Incentives 

    We recently started an open source drug discovery project for malaria. Starting with sensational hit compounds from the GSK Tres Cantos dataset, we are trying to convert these hits into good leads by sharing all data and ideas. Every experiment is online. Anyone can take part. In fact one of the main things that’s needed on this project is for people to make compounds. If you’d like to make some, maybe as part of a summer project, or as part of an undergrad thesis (just like Laura, our undergrad, who did just that), or if you are a hotshot synthetic chemist with some time at the weekend, come on board. There are some important compounds that need to be made, and we can get them biologically evaluated and publish the results. We are at the moment directly supported by the Medicines for Malaria Venture, who are providing money and a very high level of intellectual and logistical leadership behind the scenes.

    There are two extremely cool things I want to share.

    One is that last week we found out we were funded on a larger scale by the Aussie government and MMV. This “Linkage” scheme of the Australian Research Council was the way we funded our first open science project with WHO/TDR back in 2008 (this project is still very much active, more of which another time). For the present grant, MMV chipped in cash, and the ARC amplified that up to a full 3-year grant that fully supports a postdoc in the lab to make compounds full time. Depending on resources from other places we may be able to increase that further. We need to – there’s so much to do. Regardless, we will be able to make compounds for 3 years to lead this open source malaria project. I’m blown away by how exciting this is. Open science, funded.

    So my two open science guys and I have been heading over to the pub to talk about the projects (we need to do this more, guys). This is an excellent excuse to drink beer, but we also need to address the central question – how to get people involved. How to leverage and encourage interest from others.

    There are two immediate things. The first is that we’re going to be running an Open Source Drug Discovery for Malaria meeting at Sydney Uni on February 24th. The head of the nascent OSDD Malaria project in India, Saman Habib, is coming. I’ll shortly be advertising this meeting more generally. We’re going to stream and archive it online for those who can’t make it. The aim is to work out how best to do open source drug discovery, plain and simple.

    The second thing is this: when I was writing the malaria grant I was contacted by an organisation I can’t name (they requested to remain anonymous at this early stage) who said they were interested in sponsoring a prize for open source drug discovery. The conversation went something like this:

    Me: “A prize is an interesting idea. That might help create incentives for participation. The reason I’d rejected it is the Gift Relationship – that if you start paying people for things, the quality might go down, because the incentive changes to one of more direct self-interest.”

    Organisation: “Maybe, but maybe not – we could give it a shot.”

    Me: “Sure – maybe we should trial it. We’re scientists, let’s experiment. There’s one problem though – no teams.”

    Organisation: “What?”

    Me: “There can be no teams. If you have teams then people will keep secrets, negating the whole point of open science. Innocentive has teams, meaning people don’t share. It’s just competition between closed groups, which is not open science. It’s an incentive, but it doesn’t change the way things are done. It’s Open Innovation, which is different from open science.”

    Organisation: “How about a prize for the community which is unlocked upon a milestone being reached?”

    Me: [Shocked at the quality of the idea] “I’m shocked at the quality of that idea. In drug discovery there ARE milestones – specific things that can be achieved and quantified. I’d have to ask the malaria and medchem community for what might be appropriate.”

    Organisation: “Sure – consult with them. If you get the grant we’ll try to commit some money. How much? $30K? A million? If the milestone is reached by a certain date, the money will be unlocked. Half could go to the community who played the most active role in the solution. Half could go to a charity treating malaria.”

    Me: “That’s also a good idea. Apportioning the prize money could be decided by the community themselves. We’d have to disqualify anyone who did not play by open science rules. Interesting. Let’s see if we get the grant.”

    Well, we got the grant. So I can now call upon this organisation to pledge the prize and see if they sign off. Accordingly, I need to work out:

    1) Whether a prize (a team-less prize) is a good idea, or whether we should avoid cash incentives altogether. I’m torn. Need advice from open science/crowdsourcing advocates.

    2) If we did have a prize, what kind of milestone should we set? We are starting with nanomolar compounds in a whole-cell assay. These are astonishing hits. What criteria for lead progression should we include in a milestone? Something that is achievable in 18 months. This is the technical medchem/malaria question for which we need advice.

     
    • Cameron Neylon 9:38 pm on November 13, 2011 Permalink | Reply

      This is brilliant! At some point I’d go even further and suggest open competitive milestone payments. Sure you can go with a closed team but if the open community is moving faster wouldn’t you want to be in on that? Anyway in the shorter term I agree that finding the right balance of target and amount is crucial for making this work effectively. But even just the idea is making my brain work around how I might be able to contribute…

    • Rajarshi Guha 7:56 am on November 14, 2011 Permalink | Reply

      What about defining some sort of property profile (permeability, solubility, some form of toxicity etc)? This would depend on having access to these types of assays. Also, what strain of malaria are you looking at currently? Might it be useful to have a milestone that considers (some degree of) cross-strain activity? Have you run cytotox assays on the compounds that you guys have been synthesizing? (I think the original GSK cmpds had cytotox info on them (?))

      The idea of a prize is nice and not wanting individual teams makes sense, if the goal is open science. But having said that, it’s not clear that the prize itself would encourage participation in open science, given that individuals or groups won’t get the prize. It seems that the primary driver is scientific interest, credit (and the feeling that you’re helping others).

      • Cameron Neylon 7:09 pm on November 14, 2011 Permalink | Reply

        @Rajarshi I don’t know. My response was one of, well if there is a prospect of money to keep the project going then that really got me interested. Maybe I might get some of it but it seems psychologically powerful in building a community…we’re all pushing together towards some key milestone and if we get there then the community can continue. I think it avoids some of the extrinsic/intrinsic motivation and gift relationship issues because the unit being funded is the community not a single person. At the same time the community would need to have a shared notion of how the prize would be used and I think there is probably a requirement for a benevolent and trusted dictator to run the process.

    • mattoddchem 1:44 pm on December 10, 2011 Permalink | Reply

  • mattoddchem 10:14 pm on October 24, 2011 Permalink | Reply  

    Open Access Week 2011 

    Alex Holcombe has led a letter to an organization responsible for running a “Responsible Conduct of Research” course at some US institutions. A few of us found one part of this course odd, in that it appeared to suggest it was irresponsible to suggest blogs could play a role in science. For the full story, see Alex’s original post. The text of the letter is posted below.

    Alex also co-wrote and created the very amusing open access video imagining scientist-meets-publisher. “Your royalty share will be zero percent”.

    I’m giving a talk at the University of Sydney’s open access event, along with Alex, on Friday. This morning, Monday at 4 a.m. I gave a talk on open source drug discovery at the (tremendous) Open Science Summit 2011 that took place in Mountain View, California – I was sitting in Sydney. Daniel Mietchen pointed out that this timezone feature makes me likely the first person to give a talk for Open Access Week 2011. Like the first fireworks of the New Year, only less impressive.

    ***
    Dear Professor Braunschweiger (CITI co-founder) and Professor Ed Prentice (CITI Executive Advisory Committee chair):

    We write to challenge the answer to one of the questions in the “Responsible Conduct of Research” online course. The question reads “A good alternative to the current peer review process would be web logs (BLOGS) where papers would be posted and reviewed by those who have an interest in the work”. The answer deemed correct by your system is “False” and the explanation provided includes the assertion that “It is likely that the peer review process will evolve to minimize bias and conflicts of interest”.

    We question these claims for two reasons. First, we see real examples of rigorous science happening outside of the traditional system of journal-based peer review. Second, we believe that the future path of scholarly communication is uncertain, and indicating to young researchers that such an important issue is closed is both inaccurate and unhelpful to informed debate.

    As an example of science that does not fit the mold suggested by the phrase “the current peer review process”, consider the use of the arXiv preprint server in certain areas of astronomy and physics. In these areas, researchers usually begin by posting their manuscripts to the arXiv server. They then receive comments by those who have an interest in the work. Some of those manuscripts subsequently are submitted to journals and undergo traditional peer review, but many working scientists stay abreast of their field chiefly by reading manuscripts in the arXiv before they are accepted by journals.

    Even in areas that are more tightly bound to traditional journals, there are recent examples where both effective peer review of science [1] and science itself [2] have occurred primarily via blogs and other online platforms. In these cases, the online activity appears to have resulted in more rapid progress than would have been possible through the traditional system. A growing body of research suggests that scholars use social media in ways that reflect and produce serious scholarship [3][4][5].

    As for the future path of the current mainstream peer review model, we believe it is speculation to say that “It is likely that the peer review process will evolve to minimize bias and conflicts of interest”. The current peer review process may be under considerable strain [6] and unfortunately there is little evidence that it significantly improves the quality of manuscripts [7]. This raises the possibility that big changes are required, not just modifications to reduce bias and conflicts of interest. Furthermore, the question presupposes that the future entity into which peer review will evolve does not involve blogging. No one can see the future clearly enough to make that assumption.

    We encourage discussion of this important topic, and would be interested in the inclusion in your program of material that sparks such discussion. However, we believe a true/false question on this topic to be inappropriate, as it limits rather than promotes discussion. All of us wish to see the development and optimization of rigorous systems, both new and traditional, for scientific scholarship. Requiring young researchers to adopt a particular position on this controversial, multifaceted issue may hinder open discussion and future progress.

    Sincerely,

    Bradley Voytek, PhD, University of California, San Francisco Department of Neurology
    Jason Snyder, PhD, National Institutes of Health, USA
    Alex O. Holcombe, PhD, School of Psychology, University of Sydney, Australia
    William G. Gunn, PhD, Mendeley, USA/UK
    Matthew Todd, PhD, School of Chemistry, University of Sydney, Australia
    Daniel Mietchen, PhD, Open Knowledge Foundation Germany
    Jason Priem, School of Library and Information Science, University of North Carolina at Chapel Hill
    Heather Piwowar, PhD, DataONE/NESCent, Canada
    Todd Vision, PhD, Department of Biology, University of North Carolina at Chapel Hill
    Cameron Neylon, PhD, Science and Technology Facilities Council, UK, Editor in Chief, Open Research Computation

    [1] Online experimental peer review of the “Arsenic Life” paper that recently appeared in Science: http://rrresearch.fieldofscience.com/2010/12/arsenic-associated-bacteria-nasas.html
    [2] Open Science is a Research Accelerator, M. Woelfle, P. Olliaro and M. H. Todd, Nature Chemistry 2011, 3, 745-748. http://www.nature.com/nchem/journal/v3/n10/full/nchem.1149.html
    [3] Groth, P., & Gurney, T. (2010). Studying Scientific Discourse on the Web using Bibliometrics: A Chemistry Blogging Case Study. Presented at the WebSci10: Extending the Frontiers of Society On-Line, Raleigh, NC: US. Retrieved from http://journal.webscience.org/308/
    [4] Priem, J., & Costello, K. L. (2010). How and why scholars cite on Twitter. Proceedings of the 73rd ASIS&T Annual Meeting. Presented at the American Society for Information Science & Technology Annual Meeting, Pittsburgh PA, USA. doi:10.1002/meet.14504701201
    [5] Weller, K., Dröge, E., & Puschmann, C. (2011). Citation Analysis in Twitter. Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences. Proceedings of Making Sense of Microposts Workshop (# MSM2011). Co-located with Extended Semantic Web Conference, Crete, Greece.
    [6] Smith R. Classical peer review: an empty gun. Breast Cancer Research 2010, 12(Suppl 4):S13 http://dx.doi.org/10.1186/bcr2742
    [7] Jefferson T, Rudin M, Brodney Folse S, Davidoff F. Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database of Systematic Reviews 2007, 2:MR000016. http://dx.doi.org/10.1002/14651858.MR000016.pub3

     
  • mattoddchem 9:14 pm on September 18, 2011 Permalink | Reply  

    The Broader Chemical Community’s View of Uploading Data 

    Opening up your research to the world means you a) benefit from the opinions and knowledge of The Many as you’re doing the research (rather than months afterwards), and b) have to get your research into shape because The Many can cast a critical eye on what you’re doing in a never-ending process of peer review. Science benefits from these things.

    Sharing data is the central part of open science. A necessary, not a sufficient, condition, but central none the less. One cannot be selective about which data to share, because that would mean making a value judgement about what’s important. And what’s unimportant today may be important tomorrow. So let’s just share data.

    Outside of open science (and our community of zealots) we should also be encouraging people to share data as part of traditional research publications. Many of us do, as PDFs of NMR spectra, for example. This common practice is very useful for the refereeing process, to determine whether the science is valid. Sharing PDFs is less useful for science because the data in a PDF are dead. Live data can be played with, PDFs can’t. Puppy vs. roadkill. Cow vs. hamburger. We should be submitting raw data to journals along with our traditional reviewer-friendly supporting information. And we should be asking journals to keep the data outside the paywall.

    I recently asked a question about how we should share chemical data – i.e. what data formats would be best. There is an IUPAC standard which we’ve not particularly enjoyed, and we’ve been thinking about just sharing data in as raw a state as possible. Other people picked up on this and provided very useful comments and suggestions here, here and here, as well as in comments to the original post. Thanks guys.

    There’s no consensus, though the IUPAC standard does have its fans, and (I didn’t realise) is a data format that can be used for other spectroscopic techniques rather than simply NMR. I won’t pretend to understand how that’s possible, but it’s interesting.
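
    A rough sketch of how one format can serve several techniques, assuming the IUPAC standard in question is JCAMP-DX (an assumption; the format isn’t named above). Such files are plain text made of `##LABEL=value` records, and one of those records states the data type, so the same reader can open an IR or an NMR file and simply look at that label. The two sample files, and the function name `read_header`, are made up for illustration and are not real spectra.

```python
import re

# A labelled record looks like "##DATA TYPE=INFRARED SPECTRUM".
LABEL_LINE = re.compile(r"^##([^=]+)=(.*)$")


def read_header(text):
    """Collect ##LABEL=value records that appear before the data table."""
    header = {}
    for line in text.splitlines():
        match = LABEL_LINE.match(line.strip())
        if not match:
            continue
        # Labels are compared ignoring case, spaces, dashes, slashes, underscores.
        label = re.sub(r"[\s\-/_]", "", match.group(1)).upper()
        # Anything after "$$" on a line is a comment.
        value = match.group(2).split("$$")[0].strip()
        if label in ("XYDATA", "PEAKTABLE", "END"):
            break  # the header is finished once the data table starts
        header[label] = value
    return header


# Made-up, illustrative files: same layout, different technique.
SAMPLE_IR = """##TITLE=illustrative IR file
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
4000 0.98 0.97
##END=
"""

SAMPLE_NMR = """##TITLE=illustrative NMR file
##JCAMP-DX=4.24
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##XYDATA=(X++(Y..Y))
5000 12 15
##END=
"""

if __name__ == "__main__":
    for sample in (SAMPLE_IR, SAMPLE_NMR):
        header = read_header(sample)
        print(header["TITLE"], "->", header["DATATYPE"], header["XUNITS"])
```

    The point is only that the technique is declared inside the file, not baked into the format; a real reader would also need to decode the data table itself.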

    We’ll keep thinking about this. For our current ELNs we’ll continue to post data and see how we go over time.

    However, I think we need to address a background question: how will any solution scale to the broader chemical community? I’m not talking about the technical issue of file format, or what to share. I’m talking about psychology.

    My theory is: any solution to data sharing that relies on chemists uploading their data to a central point, or in a prescribed way, will not scale.

    Solution: we need to be building solutions that can find chemical data on the web, extract the data and index them. i.e. a solution that involves as little electronic work as possible for the experimentalist.

    I think that this is probably very hard, but can’t judge. I don’t know how you get a bot to understand that there is chemical content on a web page, and extract it automatically. I don’t know how you can trust the results. I don’t know what happens when the source web page dies. But I know science needs tools of this kind, and that this is what we’ll be doing in 20 years’ time.
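
    To make the idea concrete, here is a toy sketch of the “find, extract, index” loop, under generous assumptions: that pages carry InChI strings in their visible text, and that the pages have already been fetched. The URLs, the function names (`find_inchis`, `build_index`) and the regular expression are all illustrative, and nothing here addresses the hard parts (trust, dead links, data that are not tagged with an identifier at all).

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Rough pattern for an InChI identifier in running text (an approximation,
# not a validator): "InChI=1/" or "InChI=1S/" followed by non-space characters.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s\"'<>]+")


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_inchis(html):
    """Return the distinct InChI strings found in a page's text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Drop sentence punctuation that may trail an identifier in prose.
    return sorted({hit.rstrip(".,;") for hit in INCHI_PATTERN.findall(text)})


def build_index(pages):
    """Map each InChI to the set of URLs that mention it.

    `pages` is a dict of {url: html}; fetching is deliberately left out.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for inchi in find_inchis(html):
            index[inchi].add(url)
    return index


if __name__ == "__main__":
    # Hypothetical pages: a lab-book entry and a supporting-information table.
    pages = {
        "http://example.org/labbook/42": "<p>Product: InChI=1S/CH4/h1H4.</p>",
        "http://example.org/si/table1": (
            "<td>InChI=1S/CH4/h1H4</td><td>InChI=1S/H2O/h1H2</td>"
        ),
    }
    for inchi, urls in sorted(build_index(pages).items()):
        print(inchi, sorted(urls))
```

    The experimentalist does nothing beyond posting the page; the index is built entirely on the finder’s side, which is the property argued for above.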

    Analogy: Google. Imagine if Google had said “Once you’ve created a web page, just send us the details and we’ll put it in our index.”

    If we, for a moment, look outside our Band of Open Source Brothers, we see a vast community of talented researchers in chemistry who spend their time making molecules in the lab. To date this community either does not see or does not agree with the advantages of doing science openly, or has no need/wish to engage with the issues, or does not see the advantage of sharing data in traditional publications in a way other than PDFs. I see those advantages, and many people I talk to see the advantages, but the vast majority of chemists do not, yet, for whatever reason. Why, then, would a chemist, who is already busy with work, life, family, thesis writing and everything else, sit down and start uploading data to the web? Remember that our chemist, representing 95% of chemists out there, does not agree that doing so is worthwhile (or is not allowed to do so). There is no incentive. For the incentive to take hold, the world has to change, and that’s going to take some time. It’s also the case that the community is not used to it. We’re used to publishing papers, then having the data appear, as if by magic, in SciFinder or Beilstein or whatever. So we have no problem providing the paper and the data, but we expect others to make it searchable.

    Now, I’m a serious fan of Chemspider, and I’ve just come across Figshare. Excellent services. They’re pioneering. They must succeed, and I think to succeed there needs to be a shift from “hoping for user upload” to “bloodthirsty, active data extraction from disparate sites”, however difficult that might be. Anthony, Mark – I’d like to know your thoughts and what I can do to help. I’m whining because I want your work to flourish.

    People I speak to then say sentences that begin “But all you have to do is…” and “But it’s easy – you just…” – no. It’s no good. Expecting chemists to upload their data to a specific place will not scale. If there’s an activation energy barrier for me, there’s an orbital-forbidden transition state for most people.

    Rather, data need to be posted openly somewhere online:

    a) To a lab book if you’re an open scientist

    b) To an institutional repository if you’ve just finished a thesis, or generally want to share

    c) To supporting information files, if you’re the author of a paper in a journal

    whatever is easiest and convenient locally. i.e. there can be a bunch of different solutions.

    We can rely on this happening, because this is easy, and related to what chemists are doing right now. We can say to chemists: “Hey, do the research, post data. Wherever you want – either on your own webpage, or provide the data when you submit publications and ensure that the data are not behind a paywall. Here are some guidelines on file formats, but really just post the data. We’ll find the data. We’ll tag them so that other people can find them, and then you’ll see how great it is that you shared the data.”

    If there is a way of doing this, or finding data people post wherever, and automatically making sense of it, we’ll start seeing some big changes to how things are done. People will start to see the benefits of openness in itself, and we’ll start to move towards an astonishing change – chemists collaborating in real time by finding other people who are working on their molecule/reaction right now.

     
    • Mark Hahnel 7:11 pm on September 19, 2011 Permalink | Reply

      Thanks for asking Matt. I agree to an extent, but our opinions do differ on some things. I agree that the easiest way to make data re-use immediate is “bloodthirsty, active data extraction from disparate sites”. I also believe there is a role, which will grow, for crowdsourcing researcher data. The key here is the carrot/stick analogy. Wheels are in motion and there is more discussion happening in select fields of research with funders with regards to the stick. Researchers have a moral and ethical obligation to make all of their research data available if funded by public money. This obviously isn’t enough right now; maybe mandates from funders will provoke some form of response.

      As a former researcher, my personal viewpoint is that researchers need to see the obvious benefits to their career, or the process needs to be so stupidly simple that it trumps their current data management plan. Here at FigShare, we are trying to do the bits that we can in a multi-pronged attack. We are developing away with the aim of making research data sharing fast and simple. If researchers need to be trained how to use your software, the uptake is likely to be low. We are attempting to have conversations with the funders and institutions about how they can do their bit. Any funders or institutions, please get in touch (mark@figshare.com). Finally, we are doing as you suggest and pulling the research objects out of Open Access publications, making the figures, datasets and videos available as individual citable, sharable and easily searchable research objects. By doing this and linking back to the original papers, we are making researchers’ previously published research more discoverable. By adding value to the data in this manner, we can provide a service for non OA publishers too. We are yet to start these conversations, but it would be a good way to start linking the data and to show the direct benefits of researchers uploading their data directly. Any other suggestions and feedback are always welcome. For me the most interesting part is the carrot. What incentives do researchers need before they decide to do this themselves?

    • mattoddchem 8:47 pm on September 19, 2011 Permalink | Reply

      “Stupidly simple” is right. As I was writing this post it occurred to me that we need a button on a browser that says “Share these data” in the way that I can add a paper to Mendeley and it knows what I’m trying to do (most of the time). So I write up an NMR spectrum, and I post the raw data to that web page, as well as an InChI. I then say “OK, share this” and the data are extracted. Sounds simple. Probably horrendously difficult.

      As to showing people what the benefits are: agreed. Let’s lead by example.

  • mattoddchem 10:08 pm on August 7, 2011 Permalink | Reply  

    Raw Data in Organic Chemistry Papers/Open Science 

    Open science is a way of conducting science where anyone can participate and all ideas and data are freely available. It’s a sensational idea for speeding up research. We’re starting to see big projects in several fields around the world, showing the value of opening up the scientific process. We’re doing it, and are on the verge of starting up something in open source drug discovery. The process brings up an important question.

    I’m an organic chemist. If I want people to get involved and share data in my field I have to think about how to best share those data. I’m on the board of more than one chemistry journal that is thinking about this right now, in terms of whether to allow/encourage authors to deposit data with their papers. Rather than my formulating recommendations for how we should share chemical data, I wanted to throw the issue open, since there are some excellent chemistry bloggers out there in my field who may already have well-founded opinions in this area. Yes, I’m talking about you.

    The standard practice in many good organic chemistry journals is not to share raw data, but typically to ask for PDF versions of important spectra, usually for novel compounds. These naturally serve as a useful tool for the peer-review process, in that a reviewer can easily see whether a compound has been made, and say something of its purity. Such reproductions are not ironclad guarantees that a compound has actually been synthesised, nor that it was the reported process that actually gave rise to that sample. Nonetheless, it’s useful to the reviewer.

    Are PDF reproductions useful to science? Well, not really. Peter Murray-Rust talks about PDFs as being “hamburgers”. I think I understand what he means: PDF data are dead – actually very dead, and the cow would be more interesting. You can’t DO anything with a pdf. You can’t take the data and do anything with them. Nobody can re-analyse the spectrum, or zoom in. The spectrum can’t be understood by a machine with any accuracy. Data are lost in conversion.

    With raw data, you allow other people to check the data. You also allow them to re-analyze. You allow computers to take the data and do interesting things. If all data were raw, you could ask the interweb, for example, “Find me examples of compounds containing an AB quartet with a coupling constant above 18 Hz. And the molecule needs to contain nitrogen. And synthesized since 1987. And have a melting point.” Maybe that question’s important, maybe not. But with raw data you can at least ask questions of the data.
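
    To make that kind of question concrete, here is a minimal sketch of how it might look once spectra and properties are available as structured records. Everything in it is an assumption for illustration: the record layout, the field names and the five example entries are made up, and no existing database or service is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Multiplet:
    pattern: str        # e.g. "AB quartet" (hypothetical label)
    coupling_hz: float  # coupling constant J in Hz


@dataclass
class CompoundRecord:
    name: str
    elements: frozenset             # element symbols present in the molecule
    year_synthesized: int
    melting_point_c: Optional[float]  # None if no melting point was reported
    multiplets: List[Multiplet] = field(default_factory=list)


def matches_query(rec: CompoundRecord) -> bool:
    """AB quartet with J > 18 Hz, contains nitrogen, made since 1987, has a melting point."""
    has_ab_quartet = any(
        m.pattern == "AB quartet" and m.coupling_hz > 18.0 for m in rec.multiplets
    )
    return (
        has_ab_quartet
        and "N" in rec.elements
        and rec.year_synthesized >= 1987
        and rec.melting_point_c is not None
    )


def find_matches(records: List[CompoundRecord]) -> List[str]:
    """Return the names of all records that satisfy the query."""
    return [rec.name for rec in records if matches_query(rec)]


# Invented example records, one failing each condition in turn.
RECORDS = [
    CompoundRecord("example-A", frozenset({"C", "H", "N"}), 1995, 101.0,
                   [Multiplet("AB quartet", 16.5)]),   # coupling too small
    CompoundRecord("example-B", frozenset({"C", "H", "N", "O"}), 1999, 142.0,
                   [Multiplet("AB quartet", 18.6)]),   # satisfies everything
    CompoundRecord("example-C", frozenset({"C", "H", "O"}), 2003, 88.0,
                   [Multiplet("AB quartet", 19.2)]),   # no nitrogen
    CompoundRecord("example-D", frozenset({"C", "H", "N"}), 1979, 120.0,
                   [Multiplet("AB quartet", 18.9)]),   # made before 1987
    CompoundRecord("example-E", frozenset({"C", "H", "N"}), 2005, None,
                   [Multiplet("AB quartet", 19.0)]),   # no melting point
]

if __name__ == "__main__":
    print(find_matches(RECORDS))
```

    The hard part is not the filter, which is a few lines, but getting the multiplets and coupling constants out of the spectra in the first place. That step is only possible if the raw data are there to be analysed.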

    What are the downsides of posting raw data in organic chemistry, either in papers or to lab book posts?

    1) You have to save the data and then upload them. Well, this was a problem in 1995, but not now.

    2) The data files are large. Not really. A 1H NMR spectrum is ca. 200KB.

    3) It’s a pain. Yes, a little. But we must suffer for things we love.

    4) People might find mistakes in my spectra/assignments. Yes. You’re a scientist. This is a Good Thing.

    An important fact: For many papers, supporting information is actually public domain, not behind a paywall along with the rest of the paper. The ACS, for example, would, by posting raw data as SI, allow the free exchange of raw spectroscopic data. That would be neat.

    I wouldn’t advocate stopping PDF reproductions, necessarily, since these are still useful for review, and for the casual reader. We’re likely to keep using PDF for our electronic lab notebooks, but the data need to be there too. Like ortep and cif – picture and data.

    If we can establish that we should be posting raw data, then what kinds of data should we share, and how? This post is meant to outline an answer, and ask for feedback from anyone who’s already thought about this.

    1) X-ray crystallography. This is the exception. Data are routinely deposited raw, and may be downloaded. Not always the case, but XRD blazes a trail here.

    2) NMR spectroscopy. The big one. IUPAC recommends the JCAMP-DX file format. Jean-Claude Bradley has been a proponent of this format, and has demonstrated how it can be used in all kinds of applications. We’ve played with it, and in one of our recent papers we deposited all the NMR data in this format in the SI. We’ve been posting JCAMP-DX files in our online electronic lab notebooks, e.g. here. My opinion of this file format (both generating it, and reading it) has not been great. There are two formats, I understand, and we found that if we saved the data in the wrong format, we couldn’t read the data with certain programs, but could with others. i.e. we had to get the generation of the file just right. That kind of trickiness, though small, just inevitably means people won’t bother to generate or use the files on a mass scale (unless the journals decide to back it). PDF’s popularity is based on the ubiquity of the reader. JCAMP-DX works well with Jspecview, a free, open source NMR data reader. We’ve not enjoyed our experiences with this, either, though it’s a wonderful endeavour. This led us to look at whether there was a need for saving the data in a particular format, or whether we could just save the raw data, and process those data with a free piece of software. After looking at this with our resident NMR guru, Ian Luck, we found that saving raw data is easy (it’s just a copy and paste of what’s produced by the machine) and that the raw data can be read by free software such as Spinworks or ACDLabs, obviously in addition to our in-house software. This seems ideal? Does anyone have the reason IUPAC prefers a derived data format over the raw data, other than JCAMP-DX is a single file? Aren’t raw data likely to be the most generically useful long-term?
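
    For anyone who has not opened one, a JCAMP-DX file is plain text built from labelled records of the form ##LABEL=value, which is part of why different programs can disagree about variants of it. The sketch below reads only those header records and ignores the data table. It is a minimal illustration, not a full reader: the sample text is invented, the function names are hypothetical, and the label normalisation (ignoring case, spaces, hyphens, slashes and underscores) follows my understanding of the specification and should be checked against it.

```python
def normalize_label(label: str) -> str:
    """Upper-case a label and drop spaces, hyphens, slashes and underscores."""
    return "".join(ch for ch in label.upper() if ch not in " -/_")


def parse_jcamp_headers(text: str) -> dict:
    """Collect ##LABEL=value records from JCAMP-DX text, stopping at ##END=."""
    headers = {}
    for raw_line in text.splitlines():
        # "$$" starts a comment in JCAMP-DX; discard it and anything after it.
        line = raw_line.split("$$", 1)[0].strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and blank lines are skipped in this sketch
        label, value = line[2:].split("=", 1)
        key = normalize_label(label)
        if key == "END":
            break
        headers[key] = value.strip()
    return headers


# Invented sample, shaped like a very small JCAMP-DX file.
SAMPLE = """##TITLE=example spectrum
##JCAMP-DX=5.01
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##NPOINTS=4  $$ illustrative only
##XYDATA=(X++(Y..Y))
100.0 1 2 3 4
##END=
"""

if __name__ == "__main__":
    for key, value in parse_jcamp_headers(SAMPLE).items():
        print(f"{key}: {value}")
```

    A reader that handled the compressed data encodings as well would be considerably longer, which may go some way to explaining the “saved in the wrong format” problem described above.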

    I don’t know if people have experience of this. I was in touch with one of the ACS journals recently, who indicated that their view was that the journal is not a data repository, and that posting of raw data (which was in their view to some extent desirable) should be posted elsewhere, e.g. to an institutional repository. This is an option. I think it’s less convenient. PLoS seem happy to host the data.

    3) IR data. Don’t know if there is a standard. If the file is small, saving raw data could be encouraged. Would allow easy comparisons of fingerprint regions.

    4) Mass spectrometry. It’s not clear to me there is a huge advantage here to sharing raw data, for a typical low res experiment?

    5) HPLC data. Again, the outputs are fairly simple, and I’m not clear about the advantage of raw data (which I’m assuming would be absorbance vs. time table). Would (perhaps) permit verification that traces have not been cropped to remove pesky impurities.

    6) Anything else?

     
    • Jean-Claude Bradley 11:52 pm on August 7, 2011 Permalink | Reply

      Mat – you can share JCAMP-DX spectra without asking people to download software. Just upload the file to any open server and append the url from service #4 here:

      http://onswebservices.wikispaces.com/NMR

      It uses the non-Java ChemDoodle components so should work on Mac, many smartphones, etc. In your case I believe the issue was spaces in the filename – if you remove those it should work fine – let me know. Click on this link to see what it should look like:

      http://tinyurl.com/432tdbn

      As for other forms of spectral data you can do pretty much all of them using JCAMP-DX, as shown in our SpectralGame options (C NMR, IR, UV)

      http://spectralgame.com/

      MS can be done too.
      Another advantage of having the NMR in JCAMP-DX is that you can call web services to automatically integrate within a Google Spreadsheet, for calculating solubility for example: See link #3

      http://onswebservices.wikispaces.com/NMR

    • Peter Murray-Rust 12:19 am on August 8, 2011 Permalink | Reply

      Mat, great post – answering various points:

      >>>Open science is a way of conducting science where anyone can participate and all ideas and data are freely available. It’s a sensational idea for speeding up research. We’re starting to see big projects in several fields around the world, showing the value of opening up the scientific process. We’re doing it, and are on the verge of starting up something in open source drug discovery. The process brings up an important question.

      I am excited about the OSDD effort(s) and think there is a lot of Open technology they can use.

      >>>I’m an organic chemist. If I want people to get involved and share data in my field I have to think about how to best share those data. I’m on the board of more than one chemistry journal that is thinking about this right now, in terms of whether to allow/encourage authors to deposit data with their papers.

      Many already do “require” PDFs. There is no agreed way of doing it, but if what you mean is depositing JCAMPs then YES. The OS community can hack any variants

      >>>1) You have to save the data and then upload them. Well, this was a problem in 1995, but not now.

      agreed – trivial in time and size of files

      2) The data files are large. Not really. A 1H NMR spectrum is ca. 200KB.

      >>> 3) It’s a pain. Yes, a little. But we must suffer for things we love.

      see below

      >>>4) People might find mistakes in my spectra/assignments. Yes. You’re a scientist. This is a Good Thing.

      Yes – and some bad chemistry has been detected and corrected

      >>>An important fact: For many papers, supporting information is actually public domain, not behind a paywall along with the rest of the paper. The ACS, for example, would, by posting raw data as SI, allow the free exchange of raw spectroscopic data. That would be neat.

      The ACS requires CIFs and I congratulate them. If they could just extend that to JCAMPs and computational logfiles that would almost solve everything

      >>>1) X-ray crystallography. This is the exception. Data are routinely deposited raw, and may be downloaded. Not always the case, but XRD blazes a trail here.

      True for all OA journals (but not much crystallography here except IUCr ActaE), RSC, IUCr, ACS require CIFs (Applause). Wiley, Springer, Elsevier do not publish this supplemental data. Only available from CCDC and then not in bulk without subscription.

      >>>2) NMR spectroscopy. The big one. IUPAC recommends the JCAMP-DX file format. Jean-Claude Bradley has been a proponent of this format, and has demonstrated how it can be used in all kinds of applications. We’ve played with it, and in one of our recent papers we deposited all the NMR data in this format in the SI. We’ve been posting JCAMP-DX files in our online electronic lab notebooks, e.g. here. My opinion of this file format (both generating it, and reading it) has not been great. There are two formats, I understand, and we found that if we saved the data in the wrong format, we couldn’t read the data with certain programs, but could with others. i.e. we had to get the generation of the file just right.

      Don’t fully understand this. There are actually several formats but the OpenSource software reads all of them. CML-Spect supports these and is readable by JSpecview. This need not be a problem if people have the will to solve it.

      >>>I don’t know if people have experience of this. I was in touch with one of the ACS journals recently, who indicated that their view was that the journal is not a data repository, and that posting of raw data (which was in their view to some extent desirable) should be posted elsewhere, e.g. to an institutional repository. This is an option. I think it’s less convenient. PLoS seem happy to host the data.

      I have an idea, which I think will fly.

      >>>3) IR data. Don’t know if there is a standard. If the file is small, saving raw data could be encouraged. Would allow easy comparisons of fingerprint regions.

      JCAMP will hack this

      >>>4) Mass spectrometry. It’s not clear to me there is a huge advantage here to sharing raw data, for a typical low res experiment?

      JCAMP will do this for “1-D” spectra (e.g. not involving GC or multiple steps)

      >>>5) HPLC data. Again, the outputs are fairly simple, and I’m not clear about the advantage of raw data (which I’m assuming would be absorbance vs. time table). Would (perhaps) permit verification that traces have not been cropped to remove pesky impurities.

      Again it wouldn’t take much to solve this

      >>>6) Anything else?

      I think we should use FigShare (see http://blogs.ch.cam.ac.uk/pmr/2011/08/03/figshare-how-to-publish-your-data-to-write-your-thesis-quicker-and-better/ ) and I’ll explain why in my blog in a day or so

    • Rifleman_82 2:27 am on August 8, 2011 Permalink | Reply

      I’ve recently encountered the problem you mentioned with .jdx files when i tried to upload some spectra to ChemSpider. It’s a shame that the journals are not interested in becoming data repositories of experimental data. Perhaps not “Open Notebook”, but uploading spectra of known compounds to ChemSpider is helpful for other workers. A way to check if whatever you made is authentic, for example. I’m not sure how hard Tony Williams looks at the data. For what it’s worth, he’s an NMR specialist. It’ll be nice if they can have a front end which allows it to act like an open source SciFinder/Reaxys/SDBS.

    • Alex 6:47 pm on August 8, 2011 Permalink | Reply

      >It’s a pain. Yes, a little. But we must suffer for things we love.
      Now I know what I will say to my girlfriend/workers/friends.

    • Richard Kidd 9:19 pm on August 9, 2011 Permalink | Reply

      Hi Matt

      The RSC are more than happy to get the raw data alongside papers and host with the (Open) ESI, with a couple of provisos –

      1. We’d start having difficulties if the files got too big – which I think is where DataCite comes in – but no problem for jcamps, excel files etc
      2. For peer review purposes we do need pdf versions of the table/spectra – not necessarily ideal, and building in the viewers for the data file isn’t impossible – but ease of review is important

      And also – following Rifleman_82’s post – anyone can load up their jcamp spectra against a compound (or add a new compound then attach the spectra) on the RSC’s ChemSpider, and mark it as Open Data.

      Am happy to follow up with

    • Antony Williams, ChemConnector 10:24 pm on August 15, 2011 Permalink | Reply

      Mat, Great post…similar questions are being asked by many people already. I have responded to your comments here http://tinyurl.com/3vngnwd. I think overall for the problem you are out to solve that RSC ChemSpider is already most of the way there, certainly in terms of the majority of the data you are discussing. We support spectral data and CIFs already. We could manage the raw data files directly (meaning binary file vendor formats as acquired…FIDs for example) but I don’t think most people would care. They would want the processed NMR spectra. But, of course, spectra are better than PDF files. I’d love to get your data collection from the PLoS article to host on ChemSpider. At present I have to download them one at a time, draw the structure and upload one at a time but we can do it in batch if you want to provide the batch of files to us. We’ve done it for hundreds of pairs of spectra and structures before now. Thanks

  • mattoddchem 11:30 pm on June 27, 2011 Permalink | Reply  

    I’m a Scientist Get Me Out of Here! 

    A week or so ago I was a contestant on the inaugural Australian version of I’m a Scientist Get me Out of Here! Scientists were gathered in an online area – 5 in each zone – and peppered with science questions by school kids. The questions could be on anything, and came in directly via the website, or during frantic real-time chat sessions where we’re really interacting with the kids. The event was recently piloted in the UK, but this was the first time it was run elsewhere.

    Naturally kids have access to people with science backgrounds – their teachers, first and foremost. They can also read stuff and watch stuff on TV and read things online. But this competition gives them a chance to interact directly with practicing scientists, and that doesn’t normally happen.

    After a week or so of asking questions the students start voting for which scientist they’d like to stay, and the one with the fewest votes is evicted. One eviction per day until the winner is declared and awarded $1000 to help fund a science outreach activity. The evictions were pretty brutal. It was interesting that many of us spent a lot of the week describing good evolutionary arguments for various things, but when you’re actually part of a survival of the fittest exercise, suddenly it’s not so great. I lasted till the final two in the Hydrogen Zone, and was pipped at the post by Aimee Parker, a Monash Honours student and budding science communicator with a flair for explaining science of all kinds. Congratulations Aimee, well deserved.

    So how was it? It was a total blast. Any scientists reading this – sign up to get involved next time round.

    The questions would sometimes come in late at night – one batch were released around 11 pm, and I found myself typing away for a couple of hours like some sci-junkie. The addictiveness comes from the fact that it’s a competition, sure, and you want to get your answer in first. But it comes much more from the kinds of questions you get (which are on a broad range of things), and partly from the feeling that the kids actually want to know the answers. You can also wax lyrical about what science is and what you do in your work. Then you can switch to talking about relativity and GM foods.

    Some highlights included an excellent question on what are the 5 commonest molecules (see how the ambiguity necessitates a long answer), a question on predicting when we’ll be eating synthetic meat, various questions about lightning, a truly awesome question about what happens if you’re in a car going at the speed of light and you switch on the headlights (needed several goes at that one) and another priceless one that generated a lot of analysis about what it’s like at the centre of the Earth if you dug a hole there.

    There were also lots of solid, sensible questions that it felt good to answer. A lot of questions were Googlable/Wikipediable, but were still asked, which may say something about children’s healthy skepticism of answers on the web, or their over-faith in the authority of scientists to talk on any subject. Interestingly there were quite a few questions (both on the website and in the furious chat sessions) about a) whether the world would end in 2012, and b) evolution/big bang/origin of life. On the first of those, it was interesting that the kids were all asking about the supposed 2012 apocalypse, but that hardly any of them believed it. So the idiotic meme was successful but did not stand up to much thought. On the other hand the questions about evolution and related things indicated that a lot of kids were wrestling with the contrast between religion and science and it wasn’t clear in many cases which way they were going. The questions were often phrased with a hint of disbelief that things could just have “arisen” or “happened” which perhaps suggested that the kids weren’t so happy with all the uncertainties of the current state of scientific origin theories. I was at pains to point out that uncertainty is good, because it makes us ask questions, and that science is about probabilities rather than absolutes. But it’s still a challenge, and it was great to be able to lay those challenges out.

    Thanks to the wonderful team behind the event (Kristin, James, Sarah) for making everything work so well and adeptly fielding the hilarious curveballs that would crop up in the chats, and thank you to all the school children who asked stuff (particularly the ones who voted for me – I love you guys…) The kids made it such a cool event by virtue of their most awesome weapon – curiosity. All power to them.

     
  • mattoddchem 10:55 pm on March 24, 2011 Permalink | Reply  

    Open Science Student Projects 

    We’re launching a new kind of student project in synthetic organic chemistry. The idea is this: any student anywhere in the world can join in, provided all data are posted openly online. We aim to publish the research with all participants.

    What we’re looking for are students who can actually carry out practical experiments in a lab and upload data. This project is ideally suited to being run as part of a formal undergraduate laboratory course, but any student can join in. We’ve just started this at The University of Sydney this year – one undergraduate, Clara, is working on the project, and she will be joined by others later in semester. Our first partner to sign up is Stanford University, where lab director Charlie Cox has run the project as an option in third year undergrad lab. A few people have contacted me informally about running the project at their own universities, and we’re going to try to secure some money to help run the project in some universities in Africa.

    This post opens up the project to the rest of the world. If you’re reading this, and would like to join in, then yes, you can.

    The project concerns the optimization of our resolution of praziquantel. Though the route is easy to perform, and efficient, we’re looking for ways to improve the route still further to bring the cost down.

    There are a number of things we can look at. We will need to set up some online forums where the project can be discussed. If you have any questions now, you can post them below, or on the Friendfeed room, or get an account on Labtrove (our open source ELN) and post things here, or tweet me, or comment on The Synaptic Leap. There’s also email, which I’m trying to discourage, but please use this if you don’t want to discuss possible involvement in the project in the open.

    The relevant online lab notebooks to which you’d contribute if you took part are here.

    Any student contributing data can then take part in writing the resulting research paper, which is here. Once you’ve contributed an experiment, add your name to the paper, and start making changes to the manuscript. For undergraduates this is exciting because they can take part in real research, generating new data, rather than repeating experiments with known outcomes. This way we also aim to generate a real research publication – very useful for students interested in a career in research.

    Some very important points:

    1. All data generated by students are to be deposited openly on the web. Please don’t take part and not share all data – no point in doing that. Use the ELN like a real lab book – don’t leave things out.

    2. We’ll publish when we’ve reached a significant milestone. What that is depends on what people do, so we can decide this later.

    3. Students who contribute experimental data can be authors and can edit the paper.

    4. All reagents ought to be inexpensive and generally available – this is kind of the point. The starting material itself, praziquantel, is ironically not that cheap from most commercial suppliers. At the outset of the project, we can provide PZQ to labs wanting to take part – we’ll just mail you some. We’re looking for a longer-term solution to this once things get going.

    5. I/my group are starting this up and, for convenience, hosting it, but we don’t own it. If other people work on this project so much they start taking it over and leading the science, that’s perfect. Leadership in open projects is fluid. Thus anyone who takes part works for the project, certainly not “for” me or my group. There is no other incentive to taking part than getting the job done and finding a route to this enantiopure drug that’s viable for scale-up.

    If you’re a student who wants to take part, go hassle your lab director/PI. If you’re a lab director reading this, please consider having a cohort of students try this lab. This is a real optimization of a real process involving a real drug that affects millions of people.

    Background to the science involved can be found here. There’s a pdf there that describes some of the chemistry. Essentially, though: the resolution is several steps, and each needs improvement. There’s an initial hydrolysis of the drug, synthesis of resolving agents, the resolution itself, and then the re-isolation and purification of enantiopure drug. Each step works, but needs to be better. There are lots of very nice crystalline solids throughout. We can’t use chromatography. We need inexpensive reagents, and environmentally benign solvents. We need high yields, and effective recycling strategies. And so on.

    There are other examples of distributed student involvement in science. William Scott and Martin O’Donnell began a related project in 2009 called D3, and there were some papers describing this excellent work. The difference here is that our project is open, in the sense that anyone can participate and all data are freely available as they are acquired. That may make it more chaotic. It may also make it more effective. Part of the innovation here for people taking part is working that out.

    It’s also fitting that this project is being launched during the International Year of Chemistry. We’re trying to use the web not just to share data, but actually to collaborate on a real research question in experimental lab science. If you’ve an interest in trying to solve this problem, you’re free to join a worldwide effort. There’s an interesting “crowdsourcing” experiment being run by the RSC that concerns measuring the pH of water worldwide. In our project we’re not asking for a measurement, we’re actually asking students to perform synthesis, but then also to think about what experiments to try next, and to help write the paper – the full gamut of aspects of a full research project. This is pretty demanding. It’s more reminiscent of the wonderful Biobricks competition, with the difference that our project here is open and web-based, rather than a competition in a specific location.

    What’s the hope here? I hope that students can get excited about working together on a real research problem, and can get a taste for what a mind-bending exercise real research is – research where you’re not even sure what question to ask at the outset, let alone how to answer it. What I’m hoping for is that students can help solve an important problem as a group. I recently went to a meeting organised by a student cohort committed to lobbying universities to take part in research in tropical diseases without necessarily seeking patents and profits, UAEM. The guy then in charge of the group, Ethan Guillen, said at the start: “Students are great allies to have if you’re a professor”. Amen to that.

     