The Broader Chemical Community’s View of Uploading Data

mattoddchem 9:14 pm on September 18, 2011
Opening up your research to the world means you a) benefit from the opinions and knowledge of The Many as you’re doing the research (rather than months afterwards), and b) have to get your research into shape because The Many can cast a critical eye on what you’re doing in a never-ending process of peer review. Science benefits from these things.

Sharing data is the central part of open science. A necessary, not a sufficient, condition, but central nonetheless. One cannot be selective about which data to share, because that would mean making a value judgement about what’s important. And what’s unimportant today may be important tomorrow. So let’s just share data.

Outside of open science (and our community of zealots) we should also be encouraging people to share data as part of traditional research publications. Many of us do, as PDFs of NMR spectra, for example. This common practice is very useful for the refereeing process, to determine whether the science is valid. Sharing PDFs is less useful for science because the data in a PDF are dead. Live data can be played with, PDFs can’t. Puppy vs. roadkill. Cow vs. hamburger. We should be submitting raw data to journals along with our traditional reviewer-friendly supporting information. And we should be asking journals to keep the data outside the paywall.

I recently asked a question about how we should share chemical data – i.e. what data formats would be best. There is an IUPAC standard which we’ve not particularly enjoyed, and we’ve been thinking about just sharing data in as raw a state as possible. Other people picked up on this and provided very useful comments and suggestions here, here and here, as well as in comments to the original post. Thanks guys.

There’s no consensus, though the IUPAC standard does have its fans, and (I didn’t realise) is a data format that can be used for other spectroscopic techniques rather than simply NMR. I won’t pretend to understand how that’s possible, but it’s interesting.

We’ll keep thinking about this. For our current ELNs we’ll continue to post data and see how we go over time.

However, I think we need to address a background question: how will any solution scale to the broader chemical community? I’m not talking about the technical issue of file format, or what to share. I’m talking about psychology.

My theory is: any solution to data sharing that relies on chemists uploading their data to a central point, or in a prescribed way, will not scale.

Solution: we need to be building solutions that can find chemical data on the web, extract the data and index them. i.e. a solution that involves as little electronic work as possible for the experimentalist.

I think that this is probably very hard, but can’t judge. I don’t know how you get a bot to understand that there is chemical content on a web page, and extract it automatically. I don’t know how you can trust the results. I don’t know what happens when the source web page dies. But I know science needs tools of this kind, and that this is what we’ll be doing in 20 years’ time.
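To make the "find, extract, index" idea concrete, here is a minimal sketch of the extraction and indexing steps only. It assumes that pages identify compounds with InChI strings written out in plain text, and that the page text has already been fetched by some crawler. The function names, the regular expression and the in-memory index are illustrative assumptions, not an existing tool, and the sketch does nothing about trust or about pages that later disappear.

```python
import re
from collections import defaultdict

# Heuristic pattern (an assumption, not an official grammar): an InChI starts
# with "InChI=1S/" (standard) or "InChI=1/" and runs until whitespace, a quote
# or an angle bracket.
INCHI_PATTERN = re.compile(r"InChI=1S?/[^\s\"'<>]+")


def extract_inchis(page_text):
    """Return the distinct InChI strings in page_text, in order of first appearance."""
    found = []
    for match in INCHI_PATTERN.findall(page_text):
        # Drop sentence punctuation that may trail the identifier in prose.
        cleaned = match.rstrip(".,")
        if cleaned not in found:
            found.append(cleaned)
    return found


def build_index(pages):
    """Map each InChI to the list of URLs whose text mentions it.

    `pages` is a dict of {url: page_text}; fetching the pages is out of scope.
    """
    index = defaultdict(list)
    for url, text in pages.items():
        for inchi in extract_inchis(text):
            index[inchi].append(url)
    return dict(index)
```

Given a dictionary of page text keyed by URL, `build_index` returns a lookup from compound identifier to every page that mentions it, which is the smallest useful version of "we'll find the data and tag them". Recognising spectra or raw instrument files on those pages is a much harder problem that this sketch does not attempt.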

Analogy: Google. Imagine if Google had said “Once you’ve created a web page, just send us the details and we’ll put it in our index.”

If we, for a moment, look outside our Band of Open Source Brothers, we see a vast community of talented researchers in chemistry who spend their time making molecules in the lab. To date this community either does not see or does not agree with the advantages of doing science openly, or has no need/wish to engage with the issues, or does not see the advantage of sharing data in traditional publications in a way other than PDFs. I see those advantages, and many people I talk to see the advantages, but the vast majority of chemists do not, yet, for whatever reason. Why, then, would a chemist, who is already busy with work, life, family, thesis writing and everything else, sit down and start uploading data to the web? Remember that our chemist, representing 95% of chemists out there, does not agree that doing so is worthwhile (or is not allowed to do so). There is no incentive. For the incentive to take hold requires the world to change, and that’s going to take some time. It’s also the case that the community is not used to it. We’re used to publishing papers, then having the data appear, as if by magic, in SciFinder or Beilstein or whatever. So we have no problem providing the paper and the data, but we expect others to make it searchable.

Now, I’m a serious fan of Chemspider, and I’ve just come across Figshare. Excellent services. They’re pioneering. They must succeed, and I think to succeed there needs to be a shift from “hoping for user upload” to “bloodthirsty, active data extraction from disparate sites”, however difficult that might be. Anthony, Mark – I’d like to know your thoughts and what I can do to help. I’m whining because I want your work to flourish.

People I speak to then say sentences that begin “But all you have to do is…” and “But it’s easy – you just…” – no. It’s no good. Expecting chemists to upload their data to a specific place will not scale. If there’s an activation energy barrier for me, there’s an orbital-forbidden transition state for most people.

Rather, data need to be posted openly somewhere online:

a) To a lab book if you’re an open scientist

b) To an institutional repository if you’ve just finished a thesis, or generally want to share

c) To supporting information files, if you’re the author of a paper in a journal

…whatever is easiest and convenient locally. i.e. there can be a bunch of different solutions.

We can rely on this happening, because this is easy, and related to what chemists are doing right now. We can say to chemists: “Hey, do the research, post data. Wherever you want – either on your own webpage, or provide the data when you submit publications and ensure that the data are not behind a paywall. Here are some guidelines on file formats, but really just post the data. We’ll find the data. We’ll tag them so that other people can find them, and then you’ll see how great it is that you shared the data.”

If there is a way of doing this, of finding data wherever people post them and automatically making sense of them, we’ll start seeing some big changes to how things are done. People will start to see the benefits of openness in itself, and we’ll start to move towards an astonishing change – chemists collaborating in real time by finding other people who are working on their molecule/reaction right now.

- Mark Hahnel 7:11 pm on September 19, 2011 Permalink | Reply
  
  Thanks for asking, Matt. I agree to an extent, but our opinions do differ on some things. I agree that the easiest way to make data re-use immediate is “bloodthirsty, active data extraction from disparate sites”. I also believe there is a role, which will grow, for crowdsourcing researcher data. The key here is the carrot/stick analogy. Wheels are in motion and there is more discussion happening in select fields of research with funders with regard to the stick. Researchers have a moral and ethical obligation to make all of their research data available if funded by public money. This obviously isn’t enough right now; maybe mandates from funders will provoke some form of response.
  
  As a former researcher, my personal viewpoint is that researchers need to see the obvious benefits to their career, or the process needs to be so stupidly simple that it trumps their current data management plan. Here at FigShare, we are trying to do the bits that we can in a multi-pronged attack. We are developing away with the aim of making research data sharing fast and simple. If researchers need to be trained how to use your software, the uptake is likely to be low. We are attempting to have conversations with the funders and institutions about how they can do their bit. Any funders or institutions, please get in touch (mark@figshare.com). Finally, we are doing as you suggest and pulling the research objects out of Open Access publications, making the figures, datasets and videos available as individual citable, sharable and easily searchable research objects. By doing this and linking back to the original papers, we are making researchers’ previously published research more discoverable. By adding value to the data in this manner, we can provide a service for non-OA publishers too. We are yet to start these conversations, but it would be a good way to start linking the data and to show the direct benefits of researchers uploading their data directly. Any other suggestions and feedback are always welcome. For me the most interesting part is the carrot. What incentives do researchers need before they decide to do this themselves?
- mattoddchem 8:47 pm on September 19, 2011 Permalink | Reply
  
  “Stupidly simple” is right. As I was writing this post it occurred to me that we need a button on a browser that says “Share these data” in the way that I can add a paper to Mendeley and it knows what I’m trying to do (most of the time). So I write up an NMR spectrum, and I post the raw data to that web page, as well as an InChI. I then say “OK, share this” and the data are extracted. Sounds simple. Probably horrendously difficult.
  
  As to showing people what the benefits are: agreed. Let’s lead by example.