Is the “Data Glut” Blurring the Cutting Edge of Scientific Development?


Is there now so much data on the internet that it is actually becoming harder to find the information you seek? As scientific research continues, the quantity of information on the internet expands, producing a “data glut.” Quality of information is another issue. Will it become increasingly hard to identify and reach the cutting edge in a given field? If the pace of scientific innovation and the accumulation and integration of knowledge continue to accelerate, as Ray Kurzweil suggests, will we reach a point where groups developing different or similar technologies become incapable of keeping up with each other’s innovations? Will the research efforts of human society become less efficient, with more duplication of effort, as we go forward? Is this already occurring?

Back in the days of paper-only publication, the information flow was held to a manageable pace, and there was less information to manage. As books and journals proliferated over the past few centuries, their relatively slow delivery compared to the internet meant that the difficulty of keeping up increased only gradually. Since the advent of electronic communications, and especially the internet, not only has the accessibility of information improved, but the total quantity has increased exponentially. Add to that the increased specialization in the sciences and an increase in the number of research programs, and the total information available has exploded. Are widely separated but similar lines of research developing so rapidly as to create a chaotic state in which researchers in a given field fail to connect with a significant number of their peers, resulting in redundant developments? Or has specialization allowed researchers to focus more tightly on the work that parallels their own, so that a relatively coherent “cutting edge” persists in the majority of fields?

The internet has created a tradeoff between increased information availability and data overload. When I worked on the development of high-speed image processing computers in the early 1980s, at a time when the internet was nowhere near as useful as it is today, it seems we were no better off, but perhaps no worse off, than researchers are today. Then, as now, we were unable to see many of the results of research funded by governments and corporations, and, in addition, we had to put up with publication delays that don’t exist on the internet. At the same time, however, we weren’t overwhelmed by the sheer quantity of information as we are today. Our director, who came from an academic environment, was responsible for assessing competing organizations doing similar research. Some of the team read related books and papers. As a young prototype builder I was focused on my work, and was not invited to participate in interactions with other organizations, but I did maintain the technical library for the engineers (99% paper, of course).

Today’s internet searches produce overwhelming quantities of results. If you search on any popular scientific topic, like “nanotechnology” (18.2 million results on Google on August 27th, 2008) or “supercomputer development in 2008” (2.26 million results on Google on September 2nd, 2008), you get far more citations than anyone can read. No person or organization can review all of those results quickly enough to keep up with daily additions and developments, and, if Ray Kurzweil is correct, the rate of creation of new information will continue to grow. Add in language barriers, some governments’ restrictions on internet access, and the secrecy of much government-funded and corporate-sponsored research, to name a few variables, and it seems likely that there is already considerable redundancy in scientific work worldwide. After all, good ideas often occur to multiple people around the same time in history. Perhaps redundancy doesn’t signify a particularly lower level of overall efficiency when there are so many people and organizations doing the work. Hopefully different groups are discovering different things, so that redundant efforts are compensated for by sheer numbers.

Commercialization of search engines affects search results. Search engines have to make money to support the continually expanding demand for their services, enhance their tools, and provide new and higher-capacity hardware. One way at least some of them have found to do this is to sell preferred ranking, which skews their results towards commercial ventures and other interests willing to pay for enhanced visibility. Search engines are the best way to find out via the internet what new developments are occurring, but paid ranking may make them less than perfect for the job. I wonder how often the information of highest value in an area being researched actually shows up in the first thousand citations.

Are personal contacts and conferences the most effective means of getting to the cutting edge? Personal networking, association with appropriately selected universities and research labs, and attendance at appropriate conferences still work today, and may still be the best way of reaching the cutting edge in any field of study. Is it an illusion, then, that the internet makes reaching the forefront of a topic easier? I contend that, via the internet, one can only approach currency in a topic, and not get all the way to the cutting edge.

Is the internet data glut severe enough to slow progress or decrease our scientific efficiency as a society? As the internet achieves huge increases in content (as evidenced by search engine results), it only becomes more difficult to find the knowledge we want. As a result, it is possible that research teams in any given field will have increasing trouble connecting with each other. Will the risk that organizations will be unaware of progress by others increase? Will increasing amounts of duplication occur? I wonder if we could see a slowing of scientific progress for the amount of resources being committed, increased blurring of the cutting edge, and a decrease in “scientific efficiency” across human society as a whole.

As always, I welcome your comments. – Tim

Interesting reading:

The Structure of Scientific Revolutions, Thomas Kuhn, 1964

One response to “Is the “Data Glut” Blurring the Cutting Edge of Scientific Development?”

  1. 2 Points:

    1. Relevance:
    Even though there is a glut of information available, I think we are already beginning to see a reversing trend in locating information. Even as more information is pouring in, which it will continue to do, the lines are becoming clearer as to what is relevant and what is not. This is due to 2 factors… 1) The learning curve. As more people utilize the internet to search for information, they begin to learn how to sort through the information. As they do, the truly valid sources of info begin to gain in popularity and float to the top. 2) The search engines are learning too. As more user data is collected, search engines are able to produce a better product that lets users locate that info more easily and efficiently.

    The risk: will users take advantage of the relevant info or the most popular info?

    2. Education:
    Not formal education, but personal “enlightenment”. “I can learn anything!” Even with access to brick and mortar libraries, resources are still limited in volume and availability. With online resources, an individual can learn any topic, in any media, from any angle, at any time (with a connection of course). Take the nanotechnology example. If I pick one of those 18.1 mil results and it’s a technical journal, and I read the journal and only understand every other word, I know that I can find a different resource online that explains it in a way I can understand, and work my way up the complexity chain. This shows that any person can learn, to any level, a topic that they may not have exposure to any other way.

    Conclusion: Yes, I agree that there is currently a glut of information, but this is due to the metamorphosis of a tool, a valuable tool, growing out of its conceptual shell. The internet was born as a tool of the scientific community, but has grown to encompass the world community. The scientific community still has its portion of the web, where research and information are shared, but to that has been added a public nature, where any who want to can see. This has removed borders and classes to let anyone contribute, but more importantly, learn.
