Important tips for bringing value to data management in Life Sciences.

This blog debates the points we find crucial for the future of data management in Life Sciences.


Computer-aided discovery: the invisible danger

The digital explosion, especially in biomedical research, raises specific issues in knowledge production attributable to computer-aided discovery. To grasp the problem, a parallel can be drawn between computers, on the one hand, and the measurement and observational devices used for scientific investigation, on the other. For example, microscopes are used to circumvent the physiological limits of human vision, and yet precise guidelines govern their proper setting, use and the subsequent interpretation of data and images. In knowledge production, computers are intended to perform a variety of tasks more rapidly and efficiently than human brains could achieve on their own, thereby extending human brainpower.
From their origins to the end of the 20th century, the Life Sciences were purely experimental, hypothesis-driven sciences; with the development of high-throughput screening and high-performance computing, several important domains, including the Health Sciences, moved toward discovery sciences that essentially rely on building large databases of information about all the components of living systems and searching them for new motifs with mining approaches. Taking this to extremes, C. Anderson predicted that the data deluge would make the scientific method obsolete ("The End of Theory"). Without going that far, it is reasonable to consider computational methods and tools like any other artefact with constraints and limitations, except that they are used in cognitive processes rather than in sensory processes, as other scientific artefacts are.

This brings to mind the wide debate on Artificial Intelligence (AI) and the reasons for AI's failures, due to misunderstandings fed in part by dashed hopes. HAL 9000, the archetype of an autonomous and conscious artificial intelligence in the movie "2001: A Space Odyssey", will probably never exist, and IBM's famous Watson supercomputer results more from a brute-force expansion of computer hardware than from genuine computer autonomy.
All of which is to say that even when the most sophisticated methods are used, results derived from computers remain human, hypothesis-driven outcomes; this is why all assumptions (domain- and/or tool-driven) must be made explicit. The stakes are all the higher when human beings are the subject of study, as is the case in the Health Sciences.

In short, the constraints and abstractions (not to mention simplifications) used to turn data and tools into computer-understandable information introduce distortions that may have serious impacts. With this in mind, every step of every process, from data production and access through pre-processing, analysis and interpretation, has to be made fully explicit to obtain credible and valuable outputs; and these specifications must also be made widely available, so as to allow external control and to stimulate comments and debate.
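To make this concrete, the sketch below shows one way such specifications could be kept machine-readable and published alongside the results. It is only an illustration of the principle, not a prescribed implementation; the dataset and tool names are hypothetical.

```python
# Minimal sketch: record the parameters and assumptions of each processing
# step explicitly, so the full chain from raw data to interpretation can be
# inspected and challenged by external reviewers.
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class Step:
    name: str          # e.g. "normalization", "motif search"
    tool: str          # software and version actually used
    parameters: dict   # explicit settings, no hidden defaults
    assumptions: list  # domain- and tool-driven assumptions, in plain words

@dataclass
class AnalysisRecord:
    dataset: str
    steps: list = field(default_factory=list)
    created: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def add_step(self, step: Step) -> None:
        self.steps.append(step)

    def to_json(self) -> str:
        # Publishing this record together with the results exposes every choice to review.
        return json.dumps(asdict(self), indent=2)

# Hypothetical example of use
record = AnalysisRecord(dataset="cohort_expression_v1")
record.add_step(Step(
    name="normalization",
    tool="quantile normalization (in-house script v0.3)",
    parameters={"reference": "median sample"},
    assumptions=["expression distributions are comparable across samples"],
))
record.add_step(Step(
    name="motif search",
    tool="generic-miner 1.2",
    parameters={"min_support": 0.05},
    assumptions=["motifs below 5% support are treated as noise"],
))
print(record.to_json())
```

Whatever the format chosen, the point is that the record travels with the results rather than staying in the analyst's head.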


Scientific vs. Business workflows: similarities and differences