Dr. Samantha Kleinberg Awarded NIH Grant for New Methods of Studying Large-Scale Data
July 24, 2013
Research promises to make healthcare monitoring more effective for clinicians
The global capacity to store data has grown astoundingly in what is aptly referred to as the Information Age. A research article in Science estimates that, between 1986 and 2007, general-purpose computing capacity grew at an annual rate of 58%, and the amount of stored information grew at an annual rate of 23%. This explosion means that researchers today can record information in fine detail: intensive care unit (ICU) patients are monitored every 5 seconds, Twitter streams are updated constantly, and climate data are recorded every 15 minutes for years. These massive datasets give researchers the opportunity to learn how important aspects of human experience change over time, often in situations where experiments are infeasible or unethical. However, these complex, multifaceted datasets do not easily translate into actionable knowledge. Under traditional modes of evaluation, hidden variables obscure understanding, and significant but rare events render probabilistic methods ineffective.
Dr. Samantha Kleinberg and Dr. Adriana Compagnoni of the Department of Computer Science at Stevens Institute of Technology, in collaboration with Columbia University Professor Jan Claassen, have received a grant from the National Institutes of Health (NIH) to help researchers understand why these systems behave as they do and why they change over time. The work has a wide range of applications, with the most direct implications for neurological intensive care unit (NICU) data streams where large volumes of data are generated from continuous recording of patients’ brain activity and physiological signs. The amount of data often overwhelms clinicians’ ability to find complex patterns that can inform treatment in real time. The collaborators are developing systems and algorithms to give clinicians actionable information soon enough to have a substantial impact on patient outcomes. To enable rigorous validation of the algorithms, the researchers are developing a new computational platform for generating simulated NICU time series data.
“These new methods will allow the processes of data analysis to catch up with the ever-increasing capacity to store data,” says Dr. Michael Bruno, Dean of the Charles V. Schaefer, Jr. School of Engineering and Science. “They have the potential to empower decision-making from intensive care units to centers of policymaking and establish a foundation for discovery in areas such as computational social science.”
Among patients who are recovering from a stroke in an NICU setting, a seizure is a rare event that can have a significant impact on outcomes. According to Dr. Kleinberg, “Doctors need to know not just that an ICU patient being treated for a stroke is having a seizure but whether it is causing further injury before they can determine how to treat it.” Existing data mining techniques can detect such rare but significant events, yet they fail to provide the larger context needed to devise effective treatment. Probabilistic methods can reveal causal relationships in data, but the relevant probabilities cannot be accurately estimated for rare events. Furthermore, unmeasured variables can obscure the true cause of an event, and current inference methods that take these latent variables into account cannot handle large datasets with complex connections across time. “We want to develop a way to give more robust and meaningful alerts so that doctors can quickly formulate a course of action,” says Dr. Kleinberg.
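To illustrate why probabilities are hard to estimate for rare events, the short Python sketch below computes the relative standard error of an estimated event probability. It is a textbook binomial calculation offered only as an illustration, not the researchers' method; the function name, the number of observation windows, and the example probabilities are all hypothetical.

```python
import math


def relative_standard_error(p: float, n: int) -> float:
    """Relative standard error of a sample proportion.

    For an event with true probability p observed over n independent
    windows, the estimate has standard error sqrt(p * (1 - p) / n).
    Dividing by p gives the error relative to the quantity being estimated.
    """
    return math.sqrt((1.0 - p) / (p * n))


# Hypothetical number of independent observation windows.
n_windows = 10_000

# Hypothetical event probabilities, from common to very rare.
for p in (0.5, 0.01, 0.0001):
    rse = relative_standard_error(p, n_windows)
    expected = p * n_windows
    print(f"p={p:<7} expected events={expected:>7.1f} relative SE={rse:.2f}")
```

With the same amount of data, the estimate for the common event is accurate to about one percent, while the estimate for the rarest event is uncertain by roughly its own size, because only about one occurrence is expected.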
Dr. Kleinberg and her collaborators are leveraging the volume of data and the connection between type causality (comparing general events, e.g., “Do seizures reduce a patient’s chance of survival?”) and token causality (singular events, e.g., “Did this patient’s seizure play a role in his death?”) to infer a model of how a patient’s body normally functions. They then determine whether a rare event explains a deviation from usual behavior. They compare models and observed instances to form a basis for finding hidden variables, seeking to find out how many occurrences of a certain variable are due to influences outside the dataset and, further, to find shared causes for sets of variables.
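The idea of comparing observed readings against a model of usual behavior can be pictured with a deliberately simple Python sketch. It is a plain z-score style check, not the team's algorithm, and it makes no attempt at causal inference; the function name, the readings, and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def deviation_score(baseline, observed):
    """Distance of the observed mean from the baseline mean,
    measured in baseline sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return (mean(observed) - mu) / sigma


# Hypothetical readings of one physiological signal (arbitrary units).
# "usual" stands in for a model of how the patient normally functions.
usual = [10.0, 12.0, 11.0, 9.0, 10.0, 8.0, 10.0]
# Hypothetical readings recorded shortly after a rare event.
after_event = [16.0, 17.0, 15.0]

score = deviation_score(usual, after_event)
# The threshold of 3 standard deviations is an arbitrary assumption.
flagged = abs(score) > 3.0
print(f"deviation score = {score:.2f}, flagged = {flagged}")
```

A check like this only says that behavior changed after the event. Deciding whether the event actually explains the change, or whether an unmeasured variable does, is the harder question the research addresses.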
“The methods developed by Dr. Kleinberg, Dr. Compagnoni, and Dr. Claassen will make it possible to go from complex datasets to knowledge to policy more quickly and aptly than ever before possible by identifying actionable information on causes,” says Dr. Dan Duchamp, Department Director for Computer Science at Stevens. Graduate students working with each of the professors will enter the workforce with crucial knowledge in computational methods and their application to real-world datasets. The team will also make realistic simulations freely available to the research community in order to enable algorithmic development by computational researchers outside of medical centers, thus creating a framework for the validation and comparison of algorithms.
Dr. Kleinberg is an expert on causality, inference from complex data and biomedical informatics. Her work combines these areas, uniting temporal logic and tools from computer science with philosophical theories of causality to solve biomedical problems. She has previously applied these methods to stock return time series as well as political speeches and popularity ratings. Before her appointment at Stevens, she served as a postdoctoral Computing Innovation Fellow at Columbia University, in the Department of Biomedical Informatics. Her book, Causality, Probability, and Time, is now available in print and electronically.