Interview: Sam Molyneux




ScienceScape.org is organizing science in a way that has never been done before. By drawing from millions of peer-reviewed papers and associating them with scientific lab teams and institutes, the ScienceScape team is creating an interactive map of the human experience of science throughout history. I talked to Sam Molyneux, one of the co-founders of ScienceScape, about their beta testing, how they’ve organized an amazing amount of information, and their mapping mission!

This might sound like a funny thing to say, but the website is almost in a pre-beta style right now?

Yes, that’s exactly right. We’ve been running a private beta since January 15th. Essentially, it was designed to test whether we could operate the site on a tiny subset of the biomedical literature, so we’ve been running it on 30,000 papers. What we wanted to see was whether or not some of our basic assumptions with the stats and math behind it worked, and what we found out was that they do. What we’ve been working on is scaling the product up to all 22 million papers. When we launch we’ll cover the entire history of science literature.

What is ScienceScape.org doing to organize all of this information?

There’s this pervasive problem in science now where there are thousands of papers coming out every day. Something that’s fun to do to experience the scale of it, subsetting again to life sciences since that’s my background, is to go to PubMed and set up a query so you can see just the things that are published by day. It’s staggering. There are between 2,000 and 4,000 papers coming out every day.

And that’s just the one area, life science, so there are thousands of papers per day per area of science?

That’s right. There’s an estimated one and a half million peer-reviewed papers every year. Somebody has written all of those and they go into a database and we search them, but they are essentially unorganized in the database. So the goal of the project is to do two things: organize all of the literature that came out from today back in history, and to do that in a way that makes sense. On the other side of the coin is to look at the publication front, the stuff that’s coming out instantaneously, organize it as it comes out and push it to all the researchers that need to see it. Right now there isn’t a good tool that organizes the literature and pushes new papers to you.

That seems like it’s a lot of information! How did you guys get your hands on and organize all that information from most recent papers all the way back?

What makes sense to me, being a researcher: I work in cancer genomics, so I’m interested in particular fields and topics. I’m interested in particular institutes and labs that are putting out key work, and I want to know when it comes out. Even places: for example, I know there’s a great density of cancer genomicists in Boston, and I want to pay attention. So I want to organize the entire literature by the human landscape of the literature: the fields and topics the work is related to, the people, the millions of historical and active authors, and the research teams. One thing we know now is that science is not done by individuals in general. The myth of the lone genius is a thing of the past. We work in teams and we’re highly collaborative.

We’re organizing in contexts and categories with thousands of entities within them. We’re building a network of pages with just the right interfaces to let you dig into the archive of literature that relates to that context. For example, for Fields and Topics, we’ve created these stock-market-like graphs; no one’s ever done this before for scientific literature. You can look for peaks on the graph, and the peaks are important papers. For example, if you look for a particular gene, you can find the paper that was the cloning of the gene, the initial characterization of the gene, and later in the field’s history you find this gene associated with this disease. There are all these patterns in the literature if you tie in the papers in the right context with the right impact data, like citation counts. That’s what we wanted to validate with our private beta. Can we pump the database with 30,000 citations, pick a particular gene and, without knowing anything about the history of the field, find all of the important papers? We can.

We’re also geotagging millions of papers based on the location of the institute that published the paper, so we’ve been able to tie in these timelines with a map of the world. You can drag the timeline back and forth and basically watch a replay of the publishing history of the field pop up on the map of the world.

Is this something you are creating for other scientists and researchers or something the general public will be able to use as well?

Everyone can use it; the site is totally free for the public and academics. We’re hoping the community in general will adopt it. There are a lot of people who rely on the literature who don’t contribute to the literature, and you have all the people who are publishing and hoping to publish, all the allied medical professionals and doctors relying on the literature for keeping up standards and practices, as well as whole industries built on the literature, for example, the biotech and pharmaceutical industries. Also, if you’re the public and you really want to understand what’s happening in a field of science, I guess you can do it, but if you’re not in the field... I mean, I can’t interpret some of the stuff that’s not in my field.

The language is complicated if you’re not in that field of science?

It’s totally obscure! But it’s meaningful to the people that work on it. We’re trying to organize the information for the people who need to see it.

So you are organizing peer-reviewed, scientific team papers; not articles about subjects, but the actual papers themselves?

Right. The goal of the site is to take a database for each domain of science, starting with life science, and say: can we organize this all the way back in history, and can we organize the papers as they come out and get them to people who need to see them?

That sounds like a pretty daunting task. How many people are on the ScienceScape team?

It’s incredibly daunting! We’ve bitten off a lot. I have a background in bioinformatics and dealing with large data sets, and I think over time we can make a map of science that really does reflect the human landscape of research. I founded the company with my sister Amy, who is a talented programmer. We’ve been working with about 5-7 people on the team for about two years. We’ve built a platform that can handle this task. We’ve been working with some of the people who built the software framework we’ve created the site in; to their knowledge, this is the largest website ever created in that framework.

5-7 people is a lot less than I thought you were going to say!

We need more! We’re stretched incredibly thin. All the stories you hear about startups running out of money and trying to take on far too much are all true about us. We’ve been running on vapors for about two years; for example, Amy had to move to Russia for various parts of the year. We’ve risked it all, but we’re taking on a really important problem and I’m honored to take on this task.

Sam and Amy, founders of ScienceScape

When do you expect to launch?

Amy just got back from Russia a couple of days ago, and we’re going to review how the site is running, do a little work on the interface and some final testing. It’s updating to the current year; we have everything up to 2011 and we’re getting 2012. The update is expected to last about 30 days, maybe a little bit longer. We want to see that the site will run for a couple thousand people; eventually we will try to generate a lot of attention and buzz.

How do you envision this website being used, both in terms of scientists using it and the general public?

What you’ll encounter when you first get to the site will direct you to one of two paths. ScienceScape is totally open; you can go in as an anonymous user, exploring and looking at patterns and citations over history, or you can sign in and go through a quick wizard that allows you to choose a major discipline and a couple of fields, setting you up with feeds that will come to you. Say you choose immunology or declare yourself a cancer biologist: you’ll be following cancer papers that are coming out in any of the top 15 journals and some of the key institutes in cancer biology, and after that you’ll get a home feed that’s just like a Twitter feed. You have papers coming in, and you can save them or mark them for later. If you are associated with a team on the site you can broadcast them, and your lab mates will get a broadcast about the paper. Ultimately you can share it on Google+, Facebook and Twitter. If you want to find more content or context that relates to your work, you just walk around the map and search for new fields or institutes and click subscribe. If you subscribe, papers associated with that institute will be pushed to your account, to your home feed and to email alerts. Eventually we’ll have an app so you can interact with the data.

The framework is able to hold thousands of institutes, but we’re not launching with thousands of institutes; we’re launching with a couple hundred. It will be up to users to add an institute if we’re missing it or add a field if they want to see it.

So there is the potential for user interaction on the site in terms of creating fields or drawing your attention to something you don’t have, but on the other side you’re not going in the direction where people are editing the information like Wikipedia; all of the content is peer reviewed science papers.

That’s right; you can’t edit anything to do with the citation itself. You might be able to flag a paper and describe it in plain English, and you can participate in the mapping. It’s up to users to correct information, like associating your lab with papers or geotagging them; you can augment our efforts.

To our knowledge no one has built an interactive map of science. We’ve been trying to figure out what it should feel like, what it should look like. 

Am I able to get in touch with research teams that have signed up on your site if I’m interested in a paper they’ve published?

We’re trying not to be the Facebook of science, but we have built a messaging module for the site. Like Twitter, there will be a mechanism to directly message someone. If you see a paper pop up in your feed you’ll be able to click through to that paper and you’ll see if it’s been mapped to any particular lab. If you click through to that lab you’ll be able to see if they are associated with the site and see their profile and who is involved.

Find out more information about ScienceScape at ScienceScape.org where you can sign up to become one of the first users of this innovative new way to organize science literature:

http://www.sciencescape.org/
https://twitter.com/#!/sciencescape
http://www.facebook.com/Sciencescape

To find out more about Sam Molyneux:

http://ca.linkedin.com/in/sammolyneux

Interview: Dr. James Kakalios


Dr. James Kakalios is a physics professor and author of The Physics of Superheroes. I met him at the USA Science and Engineering Festival, where he took some time out of his busy schedule to talk to me about his favorite heroes, the newest Iron Man vs Magneto fight in Avengers vs X-Men, and his consulting work on the Amazing Spider-Man movie coming out in July, and generally nerded out with me about comic books. Enjoy!