With thousands of journals and over 15 million abstracts in Medline (and over 2,000 more added daily), it is practically impossible for any scientist to keep current with the published research or to explore every area that might yield further insight into their work. Most scientists rely on simple information-retrieval techniques to obtain articles on a topic of interest.
Such searches are typically performed with software that scans for terms identical to the query term. Although this approach is fast and returns many articles, the result set is often too large for a person to review, and the useful, insightful abstracts are difficult to pick out.
More sophisticated programs have been developed that try to understand how the words in a scientific abstract are used and how they relate to the query term supplied by the user. Unfortunately, in biomedical and biological abstracts it is very common for different words to represent the same biological entity (e.g., LASS1 and LAG1), for the same term to refer to different biological entities (e.g., PAP is an alias for PAP, MRPS30, and PAPOLA), or for a single term to carry different meanings depending on the discipline (e.g., SCT stands for either secretin or stem cell transplant).
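To make the synonym problem concrete, the following is a minimal illustrative sketch, not a description of any particular search tool. It contrasts exact-term matching with a naive alias expansion. The two toy abstracts and the `ALIASES` table are hypothetical; only the LASS1/LAG1 pairing comes from the text above, and a real system would draw aliases from a curated gene-nomenclature resource.

```python
import re

# Hypothetical toy abstracts, for illustration only.
ABSTRACTS = {
    "doc1": "LASS1 expression was elevated in tumor samples.",
    "doc2": "Knockdown of LAG1 reduced ceramide synthesis.",
}

# Hypothetical alias table (keys upper-cased); LASS1 and LAG1 name the same gene.
ALIASES = {
    "LASS1": {"LASS1", "LAG1"},
}


def tokens(text: str) -> set[str]:
    """Split text into upper-cased alphanumeric word tokens."""
    return set(re.findall(r"[A-Za-z0-9]+", text.upper()))


def exact_search(query: str) -> list[str]:
    """Return IDs of abstracts that contain the query term itself as a token."""
    q = query.upper()
    return sorted(doc for doc, text in ABSTRACTS.items() if q in tokens(text))


def alias_search(query: str) -> list[str]:
    """Return IDs of abstracts that contain the query term or any known alias."""
    terms = {t.upper() for t in ALIASES.get(query.upper(), {query})}
    return sorted(doc for doc, text in ABSTRACTS.items() if terms & tokens(text))


if __name__ == "__main__":
    print(exact_search("LASS1"))  # ['doc1']: the LAG1 abstract is missed
    print(alias_search("LASS1"))  # ['doc1', 'doc2']
```

Alias expansion of this kind addresses only the synonym case. It does nothing for ambiguous terms such as PAP or SCT, where the surrounding context must be examined to decide which meaning is intended.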
In the drug discovery arena, many scientists engaged in disease-related research do not want to become experts in text-mining techniques; they simply want to find critical published information about a disease, drug, or gene.
What is needed, therefore, is a well-designed software program that can resolve common language ambiguities, use that capability to extract key scientific knowledge from biomedical abstracts, and present the text-mining results in a form that scientific researchers, rather than computer scientists, can easily navigate and visualize.