Data Science Researchers

Gregory Banyay

Assistant Research Professor
University Park

Data science can significantly benefit multiple domains of engineering mechanics, particularly with respect to modeling and simulation. My focus lies primarily in the development, deployment, verification and validation, and uncertainty quantification of digital twins, in support of both industrial and academic endeavors.

Methodologies: Bayesian Methods, Experimental Design, Decision Science, Dynamical Models, Machine Learning, Optimization, Predictive Modeling, Statistical Inference, Statistical Modeling, Time Series Analysis
Applications: Environmental Sciences, Industrial Engineering, Music

Website: https://www.arl.psu.edu/content/fluid-dynamics-acoustics
Email: gab5631@psu.edu

Guido Cervone

Professor of Geography, Meteorology and Atmospheric Science
University Park

My formal background is in Computational Science and Remote Sensing, and my research focuses on the development and application of computational algorithms for the analysis of spatio-temporal remote sensing; numerical modeling; and social media “Big Data” related to environmental hazards and renewable energy. I focus on problems related to the fusion of heterogeneous data at different temporal and spatial scales.

Methodologies: Artificial Intelligence, Computational Tools for Data Science, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Spatio-Temporal Data Analysis
Applications: Environmental Sciences

Website:
Email: cervone@psu.edu

Enrique del Castillo

Distinguished Professor of Industrial Engineering and Professor of Statistics
University Park

My broad interests are in Statistics and Machine Learning methods and their application to all of Engineering and to some areas in Science. The “big data” revolution has resulted not only in larger datasets but in data that have a more complex structure. The revolution has been driven by better and faster non-contact sensors in industry, by micro-arrays, better optics, and increasingly more powerful mass spectrometers in science, and by better remote sensing and optical equipment in geophysics and astronomy. In industry, while the traditional paradigm in statistics developed by Fisher, “Student” and Neyman, characterized by small samples obtained in expensive experiments, is very powerful and still of great application today, there is a considerable number of fields in both engineering and science where a response of interest is made of thousands of inexpensive observations, given the wide availability of different types of sensors and scanners.

My research over the years has focused on how to control or optimize an industrial process where large heterogeneous datasets are available. I am interested in building data-based statistical models for the control and optimization of engineering systems or that provide helpful information for scientists. This includes diverse problems in process control (Statistical and Time Series Control), Experimental Design, and Response Surface Optimization methods. In recent years I have worked in these areas dealing with complex, large geometrical (or geometrical-spatial) datasets, specifically, functional, shape and surface data (i.e., data that occurs in 1D or 2D-manifolds), image data (2 and 3D) and general high dimensional data that may be concentrated in lower dimensional manifolds.

Methodologies: Bayesian Methods, Causal Inference, Experimental Design, Dynamical Models, Machine Learning, Spatio-Temporal Data Analysis, Time Series Analysis
Applications: Astronomy and Cosmology, Biological Sciences, Industrial Engineering, Manufacturing, Production

Website: https://sites.google.com/view/ecastillo/home#h.n74gqnov8h53
Email: exd13@psu.edu

Eric B. Ford

Professor of Astronomy & Astrophysics
University Park

Ford’s research integrates planet formation theory and astronomical observations to improve our understanding of planet formation & evolution, both in our Solar System and in general. They develop, adapt and apply Bayesian methods to: (1) improve the detection and characterization of exoplanets, (2) characterize exoplanet populations, and (3) improve the design and efficiency of exoplanet surveys. For example, the Ford group is characterizing the population of planetary architectures based on data from NASA’s Kepler mission by combining Hierarchical Bayesian Modeling, Approximate Bayesian Computing, and Gaussian Process emulators. As another example, the Ford group is researching how radial velocity surveys can distinguish planets from intrinsic stellar variability by applying machine learning to time series of high-resolution stellar spectra.
Ford created a graduate class on High-Performance Scientific Computing for Astrophysics (Astro 528), contributes to advanced summer schools run by the Penn State Center for Astrostatistics, and maintains a mailing list for Julia Language Users at Penn State. Ford is an Institute for CyberScience co-hire, a co-PI for the CyberLAMP cluster, and has served on Penn State’s Data Sciences Major Management Committee.

Methodologies: Bayesian Methods, Computational Tools for Data Science, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Statistical Modeling, Time Series Analysis
Applications: Astronomy and Cosmology

Website: http://personal.psu.edu/~ebf11/
Email: ebf11@psu.edu

Lee Giles

David Reese Professor of Information Sciences and Technology
University Park

My research involves the creation and development of various novel search engines and digital libraries that utilize machine learning and information retrieval techniques.

Methodologies: Deep Learning, Information Retrieval, Machine Learning, Natural Language Processing
Applications: Computer Vision, Education

Website: https://clgiles.ist.psu.edu/
Email: giles@ist.psu.edu

Terry P. Harrison

Professor of Supply Chain and Information Systems
University Park

I use optimization to look at large scale production-distribution systems. I also have a focus on the use of optimization to explore the tradeoffs between additive and subtractive manufacturing. Lastly, I am examining the use of blockchain as a method to create more robust and efficient supply chains.

Methodologies: Algorithms, Decision Science, Network Analysis, Optimization
Applications: Business Analytics, Environmental Sciences, Supply Chain Management

Website:
Email: hbx@psu.edu

Pete Hatemi

Distinguished Professor
University Park

Pete Hatemi is Distinguished Professor of Political Science, co-funded in Microbiology and Biochemistry, at Penn State University. He conducts research in the fields of individual differences in preferences, decision-making, and social behaviors on a wide range of topics, including political behaviors and attitudes, addiction, political violence and terrorism, public health, gender identification, religion, mate selection, and the nature of interpersonal relationships. In so doing he advocates theoretical and methodological pluralism, including but not limited to behavioral experiments, endocrinology, genetics, physiology, neuroscience, and social learning approaches. He works on policy, health care and national defense in the government, private and public sectors.

Methodologies: Experimental Design, Sparse Data Analysis, Biostatistics, Quantitative Genetics
Applications: Behavioral Science, Biological Sciences, Health Sciences, Political Science, Psychology, Social Sciences

Website: https://scholar.google.com/citations?user=Ci8Ix08AAAAJ&hl=en
Email: pkh11@psu.edu

Louisa Holmes

Assistant Professor of Geography and Demography
University Park

I am a health geographer and demographer with additional training in public health and public policy. My research focuses on three areas: (1) health disparities and the socio-spatial determinants of health; (2) tobacco control and substance use; and (3) quantitative and geospatial research methods, particularly representative survey research and area-level observational studies. In my interdisciplinary work, I seek to understand contexts of health and place as foundational to perpetuating health disparities, as well as opportune for promoting health, through social engagement, built and natural environments, and multi-level policy infrastructures. In recent years, I have increasingly approached my research through the lens of sustainability; sustainable communities are those with equitable access to environments optimal for promoting health and preventing disease.

I have designed and implemented numerous probabilistic household surveys and environmental data collection projects, and I have used the resulting data to publish on topics such as tobacco control, cannabis use, migrant health and biological risk profiles in the context of urban neighborhoods. Presently, I am completing the second wave of a panel study of young adult substance use in the San Francisco Bay Area, which also includes tobacco, vape and cannabis retail data collection, and neighborhood assessments.

At Penn State, I also teach Intro to Spatial Methods and Advanced Spatial Methods, along with special topics courses in Health Geography.

Methodologies: Data Visualization, Spatio-Temporal Data Analysis, Statistical Modeling
Applications: Geographic Information Systems, Health Sciences, Social Sciences

Website: https://www.geog.psu.edu/directory/louisa-m-holmes
Email: lmholmes@psu.edu

Vasant Honavar

Professor and Edward Frymoyer Chair of Information Sciences and Technology, Director, Artificial Intelligence Research Laboratory
University Park

My most recent work in Data Sciences has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including more recently, linked open data; (ii) Algorithms for constructing predictive models from sequence, image, text, multi-relational, and graph-structured data; (iii) New approaches to selective sharing of knowledge across autonomous knowledge bases (including knowledge base federation, secrecy-preserving query answering); (iv) Theoretically sound yet practically useful approaches to functional and non-functional specification driven composition of complex services from components; (v) Expressive languages for representing, and model checking approaches to reasoning with, qualitative preferences; (vi) Algorithms for eliciting causal effects from disparate sources of observational and experimental data; (vii) Scalable algorithms and software for comparative analyses of large bio-molecular networks; and (viii) Machine learning approaches to analysis and prediction of macromolecular interactions and interfaces (including in particular, the first algorithm for partner-specific prediction of protein-protein interface sites and state-of-the-art sequence based protein-RNA interface predictors) that have resulted in several widely used web servers for analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, B-cell and T-cell epitopes.

My current research focuses on (1) Computational abstractions of scientific artifacts (e.g., data, knowledge, hypotheses), universes of scientific discourse (e.g., biology), and scientific processes (e.g., hypothesis generation, predictive modeling, experimentation, simulation, and hypothesis testing), cognitive tools that augment and extend human intellect, and human-machine infrastructure (including data and computational infrastructure and organizational structures and processes) to accelerate science; (2) Design and analysis of algorithms for predictive modeling from very large, high dimensional, richly structured, multi-modal, longitudinal data; (3) Elucidation of causal relationships from disparate experimental and observational studies; (4) Elucidation of causal relationships from relational, temporal, and temporal-relational data; (5) Design and analyses of accountable, explainable, and fair AI systems; (6) Analysis and prediction of macromolecular interactions, elucidation of complex biological pathways, e.g., those involved in immune response, development, and disease; (7) Predictive and causal modeling of individual and population health outcomes from behavioral, biomedical, clinical, environmental, socio-demographic data; (8) Predictive and causal modeling of behavioral and cognitive systems in naturalistic settings; (9) Accelerating materials discovery using machine learning; and (10) Modeling the structure, activity, and function of brain networks from fMRI and other types of data.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Machine Learning, Network Analysis, Spatio-Temporal Data Analysis
Applications: Bioinformatics, Computer Science, Cyber Security, Health Sciences, Industrial Engineering, Materials Science, Networks, Neuroscience

Website: http://ailab.ist.psu.edu
Email: vuh14@psu.edu

Sharon Xiaolei Huang

Associate Professor, College of Information Sciences and Technology
University Park

Dr. Sharon Huang is a data scientist who works with multimedia data, especially image and video data in the biomedical domain. She focuses on image analysis, machine learning and visual analytics methods for object recognition, image and video segmentation, image and video synthesis, computer-assisted diagnosis and intervention, image registration/matching, and motion tracking. Her broader interests include data science for healthcare, artificial intelligence for medicine, biomedical informatics, computer vision, computer graphics, and human-computer interaction.

Methodologies: Algorithms, Artificial Intelligence, Bayesian Methods, Computational Tools for Data Science, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Machine Learning, Predictive Modeling, Spatio-Temporal Data Analysis
Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Health Sciences, Materials Science

Website: sharon-huang.ist.psu.edu
Email: suh972@psu.edu

David Hunter

Professor of Statistics
University Park

My work in statistical optimization algorithms includes coining and helping to popularize the term “MM algorithms,” which denotes a class of algorithms that contains the well-known EM algorithms. I also work on statistical models for networks and am a co-creator of the “statnet” suite of packages for network analysis in R. Finally, I work on the theory and computational practice of unsupervised clustering using nonparametric finite mixture models.

Methodologies: Algorithms, Network Analysis
Applications: Networks

Website: http://personal.psu.edu/drh20/
Email: dhunter@stat.psu.edu

Jia Li

Professor of Statistics
University Park

Jia Li’s research interests include statistical/machine learning, probabilistic graph models, and image analysis, with applications in a variety of disciplines. She has developed fundamental methods and algorithms for machine learning as well as real-time AI systems for image annotation, classification, and composition analysis.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Data Mining, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Information Retrieval, Machine Learning, Real-time Data Processing, Spatio-Temporal Data Analysis, Statistical Modeling, Time Series Analysis
Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Digital Humanities, Electrical Engineering, Materials Science, Psychology

Website: stat.psu.edu/~jiali
Email: jiali@psu.edu

Shaun Mahony

Assistant Professor of Biochemistry & Molecular Biology
University Park

My lab develops machine learning applications for understanding gene regulation. We are particularly interested in regulatory proteins called transcription factors, which recognize particular DNA binding sites in the genome and thereby regulate the cell-specific activities of genes. We develop machine learning approaches to understand the DNA sequence and chromatin patterns that determine transcription factor regulatory events within a given cell type.

Methodologies: Deep Learning, Machine Learning
Applications: Bioinformatics, Biological Sciences

Website: http://mahonylab.org/
Email: sam77@psu.edu

Paul Medvedev

Associate Professor
University Park

Paul Medvedev’s research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Machine Learning
Applications: Bioinformatics, Biological Sciences, Computer Science

Website: http://medvedevgroup.com
Email: pzm11@psu.edu

Kevin Munger

Assistant Professor of Political Science and Social Data Analytics
University Park

I do large-scale quantitative analysis of social media trace data, with a focus on video-focused platforms like YouTube and TikTok. I also conduct randomized “field” experiments on social media using Twitter bots. My primary methods are webscraping and quantitative text analysis.

Methodologies: Causal Inference, Computational Tools for Data Science, Experimental Design, Machine Learning
Applications: Political Science

Website: https://polisci.la.psu.edu/people/kmm7999
Email: kmm7999@psu.edu

Rebecca Napolitano

Assistant Professor of Architectural Engineering
University Park

My research group focuses on hybrid analytics, which lies at the intersection of architectural engineering, data science, and historic preservation. Hybrid analytics, a nascent field, is the combination of physics-based modeling and data-driven modeling for the end goal of making real-time predictions and monitoring in the context of Digital Twin a reality. This new field combines the decipherability and clear-box nature of physics-based modeling with the accuracy and pattern recognition techniques of data-driven machine learning algorithms. More specifically, our research at the intersection with data science focuses on the following aspects for preservation and adaptive reuse of existing and historic structures as a sustainable infrastructure solution: 1) eye tracking and knowledge graphs to analyze bias during a visual inspection, 2) pattern recognition for damage detection and model generation, 3) sensor modality and location optimization, 4) feature learning from monitoring data, 5) predictive modeling of infrastructure using physics-based models, and 6) adaptive design of experiments for new construction/repair materials.

Methodologies: Artificial Intelligence, Bayesian Methods, Experimental Design, Data Mining, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Real-time Data Processing
Applications: Civic Infrastructure, Materials Science

Website: https://sites.psu.edu/thebeamlab/research/
Email: nap@psu.edu

Becky Passonneau

Professor of Computer Science and Engineering
University Park

My area of research is natural language processing (NLP), with a focus on semantics and pragmatics. I investigate how the same combinations of words have different meanings in different contexts, in spoken or written language. Recently I have been working on NLP applied to educational technology to support reading and writing skills, and on novel adaptive dialogue policies for agents that learn from people through text-based multi-modal dialogue. In the past I have worked on a wide range of topics including summarization of textual and quantitative data, exploration of knowledge graphs, causal models of failures on the electrical grid based on mining structured and unstructured (textual) data, and text forecasting from financial news.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Natural Language Processing
Applications: Computational Linguistics

Website: https://www.nlplab.psu.edu/
Email: rjp49@psu.edu

Wesley Reinhart

Assistant Professor of Materials Science & Engineering
University Park

My research platform takes advantage of lessons learned from the traditional materials design approach to develop efficient and robust inverse design workflows based on both physics-based modeling and data-driven paradigms, including GPU accelerated computing, hybrid simulation methods accelerated by machine learning, and generative models which require no simulation at all. We seek to capitalize on advances in both data science and machine learning, including the increasingly popular deep learning but also methods based on Gaussian Process, Optimal Transport, and other related methods, as well as high-performance physics simulation to predict the thermodynamic, electromagnetic, and mechanical responses of materials. The increasingly close coupling of these topics with materials synthesis and characterization will undoubtedly unlock new and improved functionalities in a wide variety of materials applications.

Methodologies: Artificial Intelligence, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Optimization, Predictive Modeling
Applications: Chemistry and Chemical Engineering, Materials Science, Nanotechnology

Website: https://sites.psu.edu/reinhartgroup/
Email: reinhart@psu.edu

Shomir Wilson

Assistant Professor of Information Sciences and Technology
University Park

My research brings together natural language processing (NLP), privacy, and artificial intelligence. I direct the Human Language Technologies Lab at Penn State.

I am interested in solving problems to enable computers to do meaningful work with large volumes of natural language text. My lab develops new methods for NLP and applies them to a variety of domains, including privacy, online social networks, web science, and digital libraries. I am particularly interested in breaking down technology’s “walls of text”, i.e., situations where a human user or decision-maker is expected to consume a large quantity of text to take action while lacking sufficient resources (time, expertise) to properly understand what they have been given. I have applied this paradigm to privacy policies, scholarly manuscripts, documents from the world wide web, and historical texts, and I am always interested in new domains to work with.

Methodologies: Artificial Intelligence, Data Security and Privacy, Decision Science, Deep Learning, Machine Learning, Natural Language Processing
Applications: Behavioral Science, Business Analytics, Computational Linguistics, Computer Science, Cyber Security, Digital Humanities

Website: https://shomir.net
Email: shomir@psu.edu

Lingzhou Xue

Associate Professor of Statistics
University Park

My research focuses on the development and application of advanced statistical methods, theory, and computational algorithms for analyzing complex, high-dimensional data, with a special emphasis on variable selection, network analysis, high-dimensional hypothesis testing, and nonconvex statistical learning.

Methodologies: Deep Learning, High-Dimensional Data Analysis, Machine Learning, Network Analysis, Optimization, Statistical Inference, Statistical Modeling
Applications: Bioinformatics, Biological Sciences, Business Analytics, Environmental Sciences, Finance Research, Networks

Website: https://stat.psu.edu/people/lingzhou-xue
Email: lzxue@psu.edu

Christopher Zorn

Liberal Arts Professor of Political Science
University Park

Christopher Zorn is the Liberal Arts Professor of Political Science, Professor of Sociology and Criminology (by courtesy), and Affiliate Professor of Law at Pennsylvania State University. He holds a Ph.D. in political science from Ohio State University (1997) and a B.A. in political science and philosophy from Truman State University (1991). Prior to coming to Penn State, he was Professor of Political Science at the University of South Carolina (2005-2007), a Visiting Scientist and Program Director for the Law and Social Science Program at the National Science Foundation (2003-2005), and Winship Distinguished Research Professor of Political Science at Emory University, where he taught from 1996 to 2003. His research focuses on judicial politics and on statistics for the social and behavioral sciences. Professor Zorn is the recipient of eight grants from the NSF, as well as numerous other fellowships and awards. His current research interests include unsupervised learning methods for text, measurement models and data reduction, and data visualization for group decision making.

Methodologies: Data Mining, Data Visualization, Decision Science, Natural Language Processing, Spatio-Temporal Data Analysis, Statistical Inference, Statistical Modeling
Applications: Behavioral Science, Business Analytics, Law, Political Science, Social Sciences

Website: http://goo.gl/20mBf/
Email: cuz10@psu.edu