04/22/2024
By Hsien-Yuan Hsu

Date: April 24, 2024
Time: Now 11 a.m. to noon 
Location: Coburn Hall 275

Speaker: Niloofar Ramezani (in-person presentation; Department of Biostatistics at Virginia Commonwealth University)

Title: Statistical and Machine Learning Pipelines for Public Health Studies: Challenges and Complexities of National Studies

Niloofar Ramezani’s Bio:

Ramezani is an Associate Professor in the Department of Biostatistics at Virginia Commonwealth University. She has been active in the leadership of the American Public Health Association’s Applied Public Health Statistics Section since 2017, serving in roles such as Section Chair, Program Chair, and Nomination Co-Chair. Ramezani earned her bachelor’s degree in Statistics (with a minor in Mathematics) and her Master of Science and Ph.D. degrees in Applied Statistics. Prior to joining VCU, she was an Assistant Professor of Statistics at George Mason University, where she continues to collaborate as affiliate faculty with the School of Policy and Government near Washington, D.C.

Ramezani’s collaborative research is funded mainly by the NIH, the National Foundation to End Senior Hunger, the CDC, and Arnold Ventures. As co-investigator and co-PI on these studies, she focuses on health and justice, the mental health and quality of life of vulnerable and at-risk population groups, and public health policy. She also works on various biomedical studies that involve modeling infectious diseases, cancer detection, signature development for breast cancer patients in the U.S. and for sepsis and TB patients in Africa, and modeling the progression and treatment of these diseases over time. Her methodological research focuses on longitudinal and multilevel models, missing data analysis, power analysis, time series, survey methodology, and latent variable modeling.

Talk Abstract:

This talk will illustrate typical challenges arising in the design and analysis of large-scale longitudinal public health studies, such as sampling, case matching, data collection and management, and missing data treatment, using national-level studies such as the Stepping Up project. This project evaluates the effectiveness of county-level efforts to reduce the number of individuals with mental illness in jails over multiple years.

During the sampling design process, it was necessary to identify matched control counties for each of the 475 counties participating in the Stepping Up initiative across the U.S. Therefore, screening pilot data and identifying state- and county-level variables to be used for case matching were crucial steps. From thousands of candidate variables, a parsimonious subset was selected using a blend of probability-based statistical and machine learning techniques. Counties, which are nested within states, were clustered on health and social indicators; therefore, a hierarchical case-control matching approach had to be developed to accommodate state- and county-level covariates.

After the baseline data collection phase was completed, survey responses were collected from 761 behavioral health and criminal justice practitioners from 504 counties across the U.S., with more than 2,000 recorded variables. The resulting dataset suffered from multiple issues concerning data representation and missing data, as is commonly encountered in human subjects studies. Weighting and missing data handling techniques were applied to address these issues, using a variety of adaptive imputation techniques based on statistical machine learning methods. The combination of these steps enabled us to move toward a better study design, avoid losing a large share of the data to missingness, and obtain more accurate results when answering our research questions.