Data Science Researchers
Gregory Banyay
Assistant Research Professor
University Park
Data science can significantly benefit multiple domains of engineering mechanics, particularly modeling and simulation. My focus lies primarily in the development, deployment, verification & validation, and uncertainty quantification of digital twins, in support of both industrial and academic endeavors.
Methodologies: Bayesian Methods, Experimental Design, Decision Science, Dynamical Models, Machine Learning, Optimization, Predictive Modeling, Statistical Inference, Statistical Modeling, Time Series Analysis
Applications: Environmental Sciences, Industrial Engineering, Music
Website: https://www.arl.psu.edu/content/fluid-dynamics-acoustics
Email: gab5631@psu.edu
Guido Cervone
Professor of Geography, Meteorology and Atmospheric Science
University Park
My formal background is in Computational Science and Remote Sensing, and my research focuses on the development and application of computational algorithms for the analysis of spatio-temporal remote sensing data, numerical modeling, and social media “Big Data” related to environmental hazards and renewable energy. I focus on problems related to the fusion of heterogeneous data at different temporal and spatial scales.
Methodologies: Artificial Intelligence, Computational Tools for Data Science, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Spatio-Temporal Data Analysis
Applications: Environmental Sciences
Website:
Email: cervone@psu.edu
Enrique del Castillo
Distinguished Professor of Industrial Engineering and Professor of Statistics
University Park
My broad interests are in Statistics and Machine Learning methods and their application to all of Engineering and to some areas of Science. The “big data” revolution has produced not only larger datasets but also data with more complex structure. The revolution has been driven by better and faster non-contact sensors in industry; by micro-arrays, better optics, and increasingly powerful mass spectrometers in science; and by better remote sensing and optical equipment in geophysics and astronomy. While the traditional statistical paradigm developed by Fisher, “Student,” and Neyman, characterized by small samples obtained in expensive experiments, remains very powerful and widely applied in industry today, there is a considerable number of fields in both engineering and science where a response of interest consists of thousands of inexpensive observations, given the wide availability of different types of sensors and scanners.
My research over the years has focused on how to control or optimize an industrial process when large heterogeneous datasets are available. I am interested in building data-based statistical models for the control and optimization of engineering systems, or models that provide helpful information for scientists. This includes diverse problems in process control (Statistical and Time Series Control), Experimental Design, and Response Surface Optimization methods. In recent years I have worked in these areas with complex, large geometrical (or geometrical-spatial) datasets, specifically functional, shape, and surface data (i.e., data that occur on 1D or 2D manifolds), image data (2D and 3D), and general high-dimensional data that may be concentrated on lower-dimensional manifolds.
Methodologies: Bayesian Methods, Causal Inference, Experimental Design, Dynamical Models, Machine Learning, Spatio-Temporal Data Analysis, Time Series Analysis
Applications: Astronomy and Cosmology, Biological Sciences, Industrial Engineering, Manufacturing, Production
Website: https://sites.google.com/view/ecastillo/home#h.n74gqnov8h53
Email: exd13@psu.edu
Eric B. Ford
Professor of Astronomy & Astrophysics
University Park
Ford’s research integrates planet formation theory and astronomical observations to improve our understanding of planet formation & evolution, both in our Solar System and in general. They develop, adapt and apply Bayesian methods to: (1) improve the detection and characterization of exoplanets, (2) characterize exoplanet populations, and (3) improve the design and efficiency of exoplanet surveys. For example, the Ford group is characterizing the population of planetary architectures based on data from NASA’s Kepler mission by combining Hierarchical Bayesian Modeling, Approximate Bayesian Computation, and Gaussian Process emulators. As another example, the Ford group is researching how radial velocity surveys can distinguish planets from intrinsic stellar variability by applying machine learning to time series of high-resolution stellar spectra.
Ford created a graduate class on High-Performance Scientific Computing for Astrophysics (Astro 528), contributes to advanced summer schools run by the Penn State Center for Astrostatistics, and maintains a mailing list for Julia Language Users at Penn State. Ford is an Institute for CyberScience co-hire, a co-PI for the CyberLAMP cluster, and has served on Penn State’s Data Sciences Major Management Committee.
Methodologies: Bayesian Methods, Computational Tools for Data Science, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Statistical Modeling, Time Series Analysis
Applications: Astronomy and Cosmology
Website: http://personal.psu.edu/~ebf11/
Email: ebf11@psu.edu
Lee Giles
David Reese Professor of Information Sciences and Technology
University Park
My research involves the creation and development of various novel search engines and digital libraries that utilize machine learning and information retrieval techniques.
Methodologies: Deep Learning, Information Retrieval, Machine Learning, Natural Language Processing
Applications: Computer Vision, Education
Website: https://clgiles.ist.psu.edu/
Email: giles@ist.psu.edu
Terry P. Harrison
Professor of Supply Chain and Information Systems
University Park
I use optimization to look at large scale production-distribution systems. I also have a focus on the use of optimization to explore the tradeoffs between additive and subtractive manufacturing. Lastly, I am examining the use of blockchain as a method to create more robust and efficient supply chains.
Methodologies: Algorithms, Decision Science, Network Analysis, Optimization
Applications: Business Analytics, Environmental Sciences, Supply Chain Management
Website:
Email: hbx@psu.edu
Pete Hatemi
Distinguished Professor
University Park
Pete Hatemi is Distinguished Professor of Political Science, Co-fund Microbiology and Biochemistry at Penn State University. He conducts research in the fields of individual differences in preferences, decision-making, and social behaviors on a wide range of topics, including: political behaviors and attitudes, addiction, political violence and terrorism, public health, gender identification, religion, mate selection, and the nature of interpersonal relationships. In so doing he advocates theoretical and methodological pluralism, including but not limited to behavioral experiments, endocrinology, genetics, physiology, neuroscience, and social learning approaches. He works on policy, health care and national defense in the government, private and public sectors.
Methodologies: Experimental Design, Sparse Data Analysis, Biostatistics, Quantitative Genetics
Applications: Behavioral Science, Biological Sciences, Health Sciences, Political Science, Psychology, Social Sciences
Website: https://scholar.google.com/citations?user=Ci8Ix08AAAAJ&hl=en
Email: pkh11@psu.edu
Louisa Holmes
Assistant Professor of Geography and Demography
University Park
I am a health geographer and demographer with additional training in public health and public policy. My research focuses on three areas: (1) health disparities and the socio-spatial determinants of health; (2) tobacco control and substance use; and (3) quantitative and geospatial research methods, particularly representative survey research and area-level observational studies. In my interdisciplinary work, I seek to understand contexts of health and place both as foundational to perpetuating health disparities and as opportunities for promoting health through social engagement, built and natural environments, and multi-level policy infrastructures. In recent years, I have increasingly approached my research through the lens of sustainability; sustainable communities are those with equitable access to environments optimal for promoting health and preventing disease.
I have designed and implemented numerous probabilistic household surveys and environmental data collection projects, and I have used the resulting data to publish on topics such as tobacco control, cannabis use, migrant health, and biological risk profiles in the context of urban neighborhoods. Presently, I am completing the second wave of a panel study of young adult substance use in the San Francisco Bay Area, which also includes tobacco, vape, and cannabis retail data collection and neighborhood assessments.
At Penn State, I also teach Intro to Spatial Methods and Advanced Spatial Methods, along with special topics courses in Health Geography.
Methodologies: Data Visualization, Spatio-Temporal Data Analysis, Statistical Modeling
Applications: Geographic Information Systems, Health Sciences, Social Sciences
Website: https://www.geog.psu.edu/directory/louisa-m-holmes
Email: lmholmes@psu.edu
Vasant Honavar
Professor and Edward Frymoyer Chair of Information Sciences and Technology, Director, Artificial Intelligence Research Laboratory
University Park
My most recent work in Data Sciences has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including, more recently, linked open data; (ii) Algorithms for constructing predictive models from sequence, image, text, multi-relational, and graph-structured data; (iii) New approaches to selective sharing of knowledge across autonomous knowledge bases (including knowledge base federation and secrecy-preserving query answering); (iv) Theoretically sound yet practically useful approaches to functional and non-functional specification-driven composition of complex services from components; (v) Expressive languages for representing, and model checking approaches to reasoning with, qualitative preferences; (vi) Algorithms for eliciting causal effects from disparate sources of observational and experimental data; (vii) Scalable algorithms and software for comparative analyses of large bio-molecular networks; and (viii) Machine learning approaches to analysis and prediction of macromolecular interactions and interfaces (including, in particular, the first algorithm for partner-specific prediction of protein-protein interface sites and state-of-the-art sequence-based protein-RNA interface predictors), which have resulted in several widely used web servers for analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, and of B-cell and T-cell epitopes.
My current research focuses on (1) Computational abstractions of scientific artifacts (e.g., data, knowledge, hypotheses), universes of scientific discourse (e.g., biology), and scientific processes (e.g., hypothesis generation, predictive modeling, experimentation, simulation, and hypothesis testing); cognitive tools that augment and extend human intellect; and human-machine infrastructure (including data and computational infrastructure and organizational structures and processes) to accelerate science; (2) Design and analysis of algorithms for predictive modeling from very large, high-dimensional, richly structured, multi-modal, longitudinal data; (3) Elucidation of causal relationships from disparate experimental and observational studies; (4) Elucidation of causal relationships from relational, temporal, and temporal-relational data; (5) Design and analysis of accountable, explainable, and fair AI systems; (6) Analysis and prediction of macromolecular interactions, and elucidation of complex biological pathways, e.g., those involved in immune response, development, and disease; (7) Predictive and causal modeling of individual and population health outcomes from behavioral, biomedical, clinical, environmental, and socio-demographic data; (8) Predictive and causal modeling of behavioral and cognitive systems in naturalistic settings; (9) Accelerating materials discovery using machine learning; and (10) Modeling the structure, activity, and function of brain networks from fMRI and other types of data.
Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Machine Learning, Network Analysis, Spatio-Temporal Data Analysis
Applications: Bioinformatics, Computer Science, Cyber Security, Health Sciences, Industrial Engineering, Materials Science, Networks, Neuroscience
Website: http://ailab.ist.psu.edu
Email: vuh14@psu.edu
Sharon Xiaolei Huang
Associate Professor, College of Information Sciences and Technology
University Park
Dr. Sharon Huang is a data scientist who works with multimedia data, especially image and video data in the biomedical domain. She focuses on image analysis, machine learning and visual analytics methods for object recognition, image and video segmentation, image and video synthesis, computer-assisted diagnosis and intervention, image registration/matching, and motion tracking. Her broader interests include data science for healthcare, artificial intelligence for medicine, biomedical informatics, computer vision, computer graphics, and human-computer interaction.
Methodologies: Algorithms, Artificial Intelligence, Bayesian Methods, Computational Tools for Data Science, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Machine Learning, Predictive Modeling, Spatio-Temporal Data Analysis
Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Health Sciences, Materials Science
Website: sharon-huang.ist.psu.edu
Email: suh972@psu.edu
David Hunter
Professor of Statistics
University Park
My work in statistical optimization algorithms includes coining and helping to popularize the term “MM algorithms,” which denotes a class of algorithms that includes the well-known EM algorithms. I also work on statistical models for networks and am a co-creator of the “statnet” suite of packages for network analysis in R. Finally, I work on the theory and computational practice of unsupervised clustering using nonparametric finite mixture models.
Methodologies: Algorithms, Network Analysis
Applications: Networks
Website: http://personal.psu.edu/drh20/
Email: dhunter@stat.psu.edu
Jia Li
Professor of Statistics
University Park
Jia Li’s research interests include statistical/machine learning, probabilistic graphical models, and image analysis, with applications in a variety of disciplines. She has developed fundamental methods and algorithms for machine learning as well as real-time AI systems for image annotation, classification, and composition analysis.
Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Data Mining, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Information Retrieval, Machine Learning, Real-time Data Processing, Spatio-Temporal Data Analysis, Statistical Modeling, Time Series Analysis
Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Digital Humanities, Electrical Engineering, Materials Science, Psychology
Website: stat.psu.edu/~jiali
Email: jiali@psu.edu
Shaun Mahony
Assistant Professor of Biochemistry & Molecular Biology
University Park
My lab develops machine learning applications for understanding gene regulation. We are particularly interested in regulatory proteins called transcription factors, which recognize particular DNA binding sites in the genome and thereby regulate the cell-specific activities of genes. We develop machine learning approaches to understand the DNA sequence and chromatin patterns that determine transcription factor regulatory events within a given cell type.
Methodologies: Deep Learning, Machine Learning
Applications: Bioinformatics, Biological Sciences
Website: http://mahonylab.org/
Email: sam77@psu.edu
Paul Medvedev
Associate Professor
University Park
Paul Medvedev’s research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods.
Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Machine Learning
Applications: Bioinformatics, Biological Sciences, Computer Science
Website: http://medvedevgroup.com
Email: pzm11@psu.edu
Kevin Munger
Assistant Professor of Political Science and Social Data Analytics
University Park
I do large-scale quantitative analysis of social media trace data, with a focus on video-focused platforms like YouTube and TikTok. I also conduct randomized “field” experiments on social media using Twitter bots. My primary methods are web scraping and quantitative text analysis.
Methodologies: Causal Inference, Computational Tools for Data Science, Experimental Design, Machine Learning
Applications: Political Science
Website: https://polisci.la.psu.edu/people/kmm7999
Email: kmm7999@psu.edu
Rebecca Napolitano
Assistant Professor of Architectural Engineering
University Park
My research group focuses on hybrid analytics, which lies at the intersection of architectural engineering, data science, and historic preservation. Hybrid analytics, a nascent field, combines physics-based modeling and data-driven modeling with the end goal of making real-time prediction and monitoring in the context of digital twins a reality. This new field pairs the decipherability and clear-box nature of physics-based modeling with the accuracy and pattern-recognition capabilities of data-driven machine learning algorithms. More specifically, our research at the intersection with data science focuses on the following aspects of the preservation and adaptive reuse of existing and historic structures as a sustainable infrastructure solution: 1) eye tracking and knowledge graphs to analyze bias during visual inspection, 2) pattern recognition for damage detection and model generation, 3) sensor modality and location optimization, 4) feature learning from monitoring data, 5) predictive modeling of infrastructure using physics-based models, and 6) adaptive design of experiments for new construction/repair materials.
Methodologies: Artificial Intelligence, Bayesian Methods, Experimental Design, Data Mining, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Real-time Data Processing
Applications: Civic Infrastructure, Materials Science
Website: https://sites.psu.edu/thebeamlab/research/
Email: nap@psu.edu
Becky Passonneau
Professor of Computer Science and Engineering
University Park
My area of research is natural language processing (NLP), with a focus on semantics and pragmatics. I investigate how the same combinations of words have different meanings in different contexts, in spoken or written language. Recently I have been working on NLP applied to educational technology to support reading and writing skills, and on novel adaptive dialogue policies for agents that learn from people through text-based multi-modal dialogue. In the past I have worked on a wide range of topics, including summarization of textual and quantitative data, exploration of knowledge graphs, causal models of failures on the electrical grid based on mining structured and unstructured (textual) data, and text forecasting from financial news.
Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Natural Language Processing
Applications: Computational Linguistics
Website: https://www.nlplab.psu.edu/
Email: rjp49@psu.edu
Wesley Reinhart
Assistant Professor of Materials Science & Engineering
University Park
My research platform takes advantage of lessons learned from the traditional materials design approach to develop efficient and robust inverse design workflows based on both physics-based modeling and data-driven paradigms, including GPU-accelerated computing, hybrid simulation methods accelerated by machine learning, and generative models that require no simulation at all. We seek to capitalize on advances in data science and machine learning, including not only the increasingly popular deep learning but also methods based on Gaussian processes, optimal transport, and related techniques, as well as high-performance physics simulation, to predict the thermodynamic, electromagnetic, and mechanical responses of materials. The increasingly close coupling of these topics with materials synthesis and characterization will undoubtedly unlock new and improved functionalities in a wide variety of materials applications.
Methodologies: Artificial Intelligence, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Optimization, Predictive Modeling
Applications: Chemistry and Chemical Engineering, Materials Science, Nanotechnology
Website: https://sites.psu.edu/reinhartgroup/
Email: reinhart@psu.edu
Shomir Wilson
Assistant Professor of Information Sciences and Technology
University Park
My research brings together natural language processing (NLP), privacy, and artificial intelligence. I direct the Human Language Technologies Lab at Penn State.
I am interested in solving problems that enable computers to do meaningful work with large volumes of natural language text. My lab develops new methods for NLP and applies them to a variety of domains, including privacy, online social networks, web science, and digital libraries. I am particularly interested in breaking down technology’s “walls of text”, i.e., situations where a human user or decision-maker is expected to consume a large quantity of text to take action while lacking sufficient resources (time, expertise) to properly understand what they have been given. I have applied this paradigm to privacy policies, scholarly manuscripts, documents from the World Wide Web, and historical texts, and I am always interested in new domains to work with.
Methodologies: Artificial Intelligence, Data Security and Privacy, Decision Science, Deep Learning, Machine Learning, Natural Language Processing
Applications: Behavioral Science, Business Analytics, Computational Linguistics, Computer Science, Cyber Security, Digital Humanities
Website: https://shomir.net
Email: shomir@psu.edu
Lingzhou Xue
Associate Professor of Statistics
University Park
My research focuses on the development and application of advanced statistical methods, theory, and computational algorithms for analyzing complex, high-dimensional data, with a special emphasis on variable selection, network analysis, high-dimensional hypothesis testing, and nonconvex statistical learning.
Methodologies: Deep Learning, High-Dimensional Data Analysis, Machine Learning, Network Analysis, Optimization, Statistical Inference, Statistical Modeling
Applications: Bioinformatics, Biological Sciences, Business Analytics, Environmental Sciences, Finance Research, Networks
Website: https://stat.psu.edu/people/lingzhou-xue
Email: lzxue@psu.edu
Christopher Zorn
Liberal Arts Professor of Political Science
University Park
Christopher Zorn is the Liberal Arts Professor of Political Science, Professor of Sociology and Criminology (by courtesy), and Affiliate Professor of Law at Pennsylvania State University. He holds a Ph.D. in political science from Ohio State University (1997) and a B.A. in political science and philosophy from Truman State University (1991). Prior to coming to Penn State, he was Professor of Political Science at the University of South Carolina (2005-2007), a Visiting Scientist and Program Director for the Law and Social Science Program at the National Science Foundation (2003-2005), and Winship Distinguished Research Professor of Political Science at Emory University, where he taught from 1996 to 2003. His research focuses on judicial politics and on statistics for the social and behavioral sciences. Professor Zorn is the recipient of eight grants from the NSF, as well as numerous other fellowships and awards. His current research interests include unsupervised learning methods for text, measurement models and data reduction, and data visualization for group decision making.
Methodologies: Data Mining, Data Visualization, Decision Science, Natural Language Processing, Spatio-Temporal Data Analysis, Statistical Inference, Statistical Modeling
Applications: Behavioral Science, Business Analytics, Law, Political Science, Social Sciences
Website: http://goo.gl/20mBf/
Email: cuz10@psu.edu