Data Science Researchers

Eric B. Ford

Professor of Astronomy & Astrophysics
University Park

Ford’s research integrates planet formation theory and astronomical observations to improve our understanding of planet formation & evolution, both in our Solar System and in general. They develop, adapt and apply Bayesian methods to: (1) improve the detection and characterization of exoplanets, (2) characterize exoplanet populations, and (3) improve the design and efficiency of exoplanet surveys. For example, the Ford group is characterizing the population of planetary architectures based on data from NASA’s Kepler mission by combining Hierarchical Bayesian Modeling, Approximate Bayesian Computing, and Gaussian Process emulators. As another example, the Ford group is researching how radial velocity surveys can distinguish planets from intrinsic stellar variability by applying machine learning to time series of high-resolution stellar spectra.
Ford created a graduate class on High-Performance Scientific Computing for Astrophysics (Astro 528), contributes to advanced summer schools run by the Penn State Center for Astrostatistics, and maintains a mailing list for Julia Language Users at Penn State. Ford is an Institute for CyberScience co-hire, a co-PI for the CyberLAMP cluster, and has served on Penn State’s Data Sciences Major Management Committee.

Methodologies: Bayesian Methods, Computational Tools for Data Science, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Statistical Modeling, Time Series Analysis
Applications: Astronomy and Cosmology

Website: http://personal.psu.edu/~ebf11/
Email: ebf11@psu.edu

Lee Giles

David Reese Professor of Information Sciences and Technology
University Park

My research involves the creation and development of various novel search engines and digital libraries that utilize machine learning and information retrieval techniques.

Methodologies: Deep Learning, Information Retrieval, Machine Learning, Natural Language Processing
Applications: Computer Vision, Education

Website: https://clgiles.ist.psu.edu/
Email: giles@ist.psu.edu

Terry P. Harrison

Professor of Supply Chain and Information Systems
University Park

I use optimization to study large-scale production-distribution systems. I also use optimization to explore the tradeoffs between additive and subtractive manufacturing. Lastly, I am examining the use of blockchain as a method to create more robust and efficient supply chains.

Methodologies: Algorithms, Decision Science, Network Analysis, Optimization
Applications: Business Analytics, Environmental Sciences, Supply Chain Management

Website:
Email: hbx@psu.edu

Pete Hatemi

Distinguished Professor
University Park

Pete Hatemi is Distinguished Professor of Political Science, co-funded in Microbiology and Biochemistry, at Penn State University. He conducts research on individual differences in preferences, decision-making, and social behaviors across a wide range of topics, including political behaviors and attitudes, addiction, political violence and terrorism, public health, gender identification, religion, mate selection, and the nature of interpersonal relationships. In so doing he advocates theoretical and methodological pluralism, including but not limited to behavioral experiments, endocrinology, genetics, physiology, neuroscience, and social learning approaches. He works on policy, health care, and national defense in the government, private, and public sectors.

Methodologies: Experimental Design, Sparse Data Analysis, Biostatistics, Quantitative Genetics
Applications: Behavioral Science, Biological Sciences, Health Sciences, Political Science, Psychology, Social Sciences

Website: https://scholar.google.com/citations?user=Ci8Ix08AAAAJ&hl=en
Email: pkh11@psu.edu

Vasant Honavar

Professor and Edward Frymoyer Chair of Information Sciences and Technology, Director, Artificial Intelligence Research Laboratory
University Park

My most recent work in Data Sciences has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including, more recently, linked open data; (ii) Algorithms for constructing predictive models from sequence, image, text, multi-relational, and graph-structured data; (iii) New approaches to selective sharing of knowledge across autonomous knowledge bases (including knowledge base federation and secrecy-preserving query answering); (iv) Theoretically sound yet practically useful approaches to functional and non-functional specification-driven composition of complex services from components; (v) Expressive languages for representing, and model checking approaches to reasoning with, qualitative preferences; (vi) Algorithms for eliciting causal effects from disparate sources of observational and experimental data; (vii) Scalable algorithms and software for comparative analyses of large bio-molecular networks; and (viii) Machine learning approaches to analysis and prediction of macromolecular interactions and interfaces (including, in particular, the first algorithm for partner-specific prediction of protein-protein interface sites and state-of-the-art sequence-based protein-RNA interface predictors) that have resulted in several widely used web servers for analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, and of B-cell and T-cell epitopes.

My current research focuses on (1) Computational abstractions of scientific artifacts (e.g., data, knowledge, hypotheses), universes of scientific discourse (e.g., biology), and scientific processes (e.g., hypothesis generation, predictive modeling, experimentation, simulation, and hypothesis testing); cognitive tools that augment and extend human intellect; and human-machine infrastructure (including data and computational infrastructure and organizational structures and processes) to accelerate science; (2) Design and analysis of algorithms for predictive modeling from very large, high-dimensional, richly structured, multi-modal, longitudinal data; (3) Elucidation of causal relationships from disparate experimental and observational studies; (4) Elucidation of causal relationships from relational, temporal, and temporal-relational data; (5) Design and analysis of accountable, explainable, and fair AI systems; (6) Analysis and prediction of macromolecular interactions, and elucidation of complex biological pathways, e.g., those involved in immune response, development, and disease; (7) Predictive and causal modeling of individual and population health outcomes from behavioral, biomedical, clinical, environmental, and socio-demographic data; (8) Predictive and causal modeling of behavioral and cognitive systems in naturalistic settings; (9) Accelerating materials discovery using machine learning; and (10) Modeling the structure, activity, and function of brain networks from fMRI and other types of data.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Machine Learning, Network Analysis, Spatio-Temporal Data Analysis
Applications: Bioinformatics, Computer Science, Cyber Security, Health Sciences, Industrial Engineering, Materials Science, Networks, Neuroscience

Website: http://ailab.ist.psu.edu
Email: vuh14@psu.edu

David Hunter

Professor of Statistics
University Park

My work in statistical optimization algorithms includes coining and helping to popularize the term “MM algorithms,” which denotes a class of algorithms that contains the well-known EM algorithms. I also work on statistical models for networks and am a co-creator of the “statnet” suite of packages for network analysis in R. Finally, I work on the theory and computational practice of unsupervised clustering using nonparametric finite mixture models.

Methodologies: Algorithms, Network Analysis
Applications: Networks

Website: http://personal.psu.edu/drh20/
Email: dhunter@stat.psu.edu

Jia Li

Professor of Statistics
University Park

Jia Li’s research interests include statistical/machine learning, probabilistic graph models, and image analysis, with applications in a variety of disciplines. She has developed fundamental methods and algorithms for machine learning as well as real-time AI systems for image annotation, classification, and composition analysis.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Data Mining, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Information Retrieval, Machine Learning, Real-time Data Processing, Spatio-Temporal Data Analysis, Statistical Modeling, Time Series Analysis
Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Digital Humanities, Electrical Engineering, Materials Science, Psychology

Website: stat.psu.edu/~jiali
Email: jiali@psu.edu

Paul Medvedev

Associate Professor
University Park

Paul Medvedev’s research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Machine Learning
Applications: Bioinformatics, Biological Sciences, Computer Science

Website: http://medvedevgroup.com
Email: pzm11@psu.edu

Rebecca Napolitano

Assistant Professor of Architectural Engineering
University Park

My research group focuses on hybrid analytics, which lies at the intersection of architectural engineering, data science, and historic preservation. Hybrid analytics, a nascent field, combines physics-based modeling and data-driven modeling with the end goal of making real-time prediction and monitoring in the context of Digital Twins a reality. This new field pairs the decipherability and clear-box nature of physics-based modeling with the accuracy and pattern recognition techniques of data-driven machine learning algorithms. More specifically, our research at the intersection with data science focuses on the following aspects of preservation and adaptive reuse of existing and historic structures as a sustainable infrastructure solution: 1) eye tracking and knowledge graphs to analyze bias during a visual inspection, 2) pattern recognition for damage detection and model generation, 3) sensor modality and location optimization, 4) feature learning from monitoring data, 5) predictive modeling of infrastructure using physics-based models, and 6) adaptive design of experiments for new construction/repair materials.

Methodologies: Artificial Intelligence, Bayesian Methods, Experimental Design, Data Mining, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Real-time Data Processing
Applications: Civic Infrastructure, Materials Science

Website: https://sites.psu.edu/thebeamlab/research/
Email: nap@psu.edu

Becky Passonneau

Professor of Computer Science and Engineering
University Park

My area of research is natural language processing (NLP), with a focus on semantics and pragmatics. I investigate how the same combinations of words have different meanings in different contexts, in spoken or written language. Recently I have been working on NLP applied to educational technology to support reading and writing skills, and on novel adaptive dialogue policies for agents that learn from people through text-based multi-modal dialogue. In the past I have worked on a wide range of topics, including summarization of textual and quantitative data, exploration of knowledge graphs, causal models of failures on the electrical grid based on mining structured and unstructured (textual) data, and text forecasting from financial news.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Natural Language Processing
Applications: Computational Linguistics

Website: https://www.nlplab.psu.edu/
Email: rjp49@psu.edu

Shomir Wilson

Assistant Professor of Information Sciences and Technology
University Park

My research brings together natural language processing (NLP), privacy, and artificial intelligence. I direct the Human Language Technologies Lab at Penn State.

I am interested in solving problems to enable computers to do meaningful work with large volumes of natural language text. My lab develops new methods for NLP and applies them to a variety of domains, including privacy, online social networks, web science, and digital libraries. I am particularly interested in breaking down technology’s “walls of text”, i.e., situations where a human user or decision-maker is expected to consume a large quantity of text to take action while lacking sufficient resources (time, expertise) to properly understand what they have been given. I have applied this paradigm to privacy policies, scholarly manuscripts, documents from the world wide web, and historical texts, and I am always interested in new domains to work with.

Methodologies: Artificial Intelligence, Data Security and Privacy, Decision Science, Deep Learning, Machine Learning, Natural Language Processing
Applications: Behavioral Science, Business Analytics, Computational Linguistics, Computer Science, Cyber Security, Digital Humanities

Website: https://shomir.net
Email: shomir@psu.edu

Lingzhou Xue

Associate Professor of Statistics
University Park

My research focuses on the development and application of advanced statistical methods, theory, and computational algorithms for analyzing complex, high-dimensional data, with a special emphasis on variable selection, network analysis, high-dimensional hypothesis testing, and nonconvex statistical learning.

Methodologies: Deep Learning, High-Dimensional Data Analysis, Machine Learning, Network Analysis, Optimization, Statistical Inference, Statistical Modeling
Applications: Bioinformatics, Biological Sciences, Business Analytics, Environmental Sciences, Finance Research, Networks

Website: https://stat.psu.edu/people/lingzhou-xue
Email: lzxue@psu.edu

Christopher Zorn

Liberal Arts Professor of Political Science
University Park

Christopher Zorn is the Liberal Arts Professor of Political Science, Professor of Sociology and Criminology (by courtesy), and Affiliate Professor of Law at Pennsylvania State University. He holds a Ph.D. in political science from Ohio State University (1997) and a B.A. in political science and philosophy from Truman State University (1991). Prior to coming to Penn State, he was Professor of Political Science at the University of South Carolina (2005-2007), a Visiting Scientist and Program Director for the Law and Social Science Program at the National Science Foundation (2003-2005), and Winship Distinguished Research Professor of Political Science at Emory University, where he taught from 1996 to 2003. His research focuses on judicial politics and on statistics for the social and behavioral sciences. Professor Zorn is the recipient of eight grants from the NSF, as well as numerous other fellowships and awards. His current research interests include unsupervised learning methods for text, measurement models and data reduction, and data visualization for group decision making.

Methodologies: Data Mining, Data Visualization, Decision Science, Natural Language Processing, Spatio-Temporal Data Analysis, Statistical Inference, Statistical Modeling
Applications: Behavioral Science, Business Analytics, Law, Political Science, Social Sciences

Website: http://goo.gl/20mBf/
Email: cuz10@psu.edu