Accelerating Foundation Models Research

Cognition and Societal Benefits

Academic research plays such an important role in advancing science, technology, culture, and society. This grant program helps ensure this community has access to the latest and leading AI models.

Brad Smith, Vice Chair and President

AFMR Goal: Improve human interactions via sociotechnical research

which increases trust, human ingenuity, creativity, and productivity, narrows the digital divide, and reduces the risk of developing AI that does not benefit individuals and society

The proposals focus mainly on advances in healthcare, education, and broader societal applications. They highlight the use of Large Language Models (LLMs) to improve teaching on online education platforms, generate personalized cybersecurity education, and advance health outcomes research. Other proposals examine how well LLMs extract and understand clinical data, simulate student interactions in classrooms, and support privacy-aware medical dialogue systems. Further studies investigate the utility and harms of LLMs for mental health support, their use in English as a foreign language (EFL) education, and their potential application in the legal field. In healthcare, LLMs aim not only to assist doctors with patient-trial matching and radiology report summarization but also to give patients more understandable health data. Additional efforts seek to align LLMs with the diversity of global user preferences and to establish standardized protocols for using Generative Artificial Intelligence (GAI) in behavioral research, among others.

  • George Mason University: Ziyu Yao (PI)

    The proposal is focused on using Large Language Models (LLMs) to simulate student agents discussing STEM concepts in a virtual classroom. The platform is intended to aid STEM concept learning in PreK-12 education, facilitating teacher professional development and immersive peer learning. The researchers aim to develop student agents with consistent stances in concept understanding, which would interact and debate with each other. A human teacher or student can also partake in the discussions, fostering deeper concept learning.

  • University of Illinois Urbana-Champaign: Volodymyr Kindratenko (PI)

    The proposal is about developing a versatile platform for creating course-specific chatbots for teaching and research purposes. The system uses the GPT-4 model via the OpenAI API and is capable of executing code, accessing databases, and assisting with computational research tasks.
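As a rough sketch of how a platform like this scopes a general-purpose model to one course, the function below assembles a chat-completions request body with a course-specific system prompt. The course name, prompt wording, and parameter choices are illustrative assumptions, not details from the proposal; in practice the request would go to an Azure OpenAI deployment rather than being built in isolation.

```python
# Hypothetical sketch: scoping a chat request to one course via the system
# prompt. All names and wording here are assumptions for illustration.

def build_course_payload(course: str, syllabus_summary: str, user_question: str) -> dict:
    """Assemble a chat-completions request body scoped to one course."""
    system_prompt = (
        f"You are a teaching assistant for {course}. "
        f"Answer only from the course material summarized here: {syllabus_summary} "
        "If a question falls outside the course, say so."
    )
    return {
        "model": "gpt-4",  # served via an Azure OpenAI deployment in practice
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_question},
        ],
        "temperature": 0.2,  # low temperature for factual course answers
    }

payload = build_course_payload(
    "Numerical Methods",
    "Topics: interpolation, quadrature, ODE solvers.",
    "How does Simpson's rule differ from the trapezoidal rule?",
)
```

The same payload builder can back many chatbots at once, with only the course name and syllabus summary varying per deployment.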

  • MIT: Marzyeh Ghassemi (PI)

    We investigate the bias of de-identification systems on names in clinical notes via a large-scale empirical analysis. To achieve this, we create 16 name sets that vary along four demographic dimensions: gender, race, name popularity, and decade of popularity. We insert these names into 100 manually curated clinical templates and evaluate the performance of nine public and private de-identification methods. We find statistically significant performance gaps along a majority of the demographic dimensions in most methods.

    Related paper:
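The evaluation loop described above can be sketched in a few lines: insert names from demographically defined name sets into clinical templates, run a de-identifier, and measure the recall gap between groups. The toy de-identifier and the tiny name sets below are stand-ins for illustration, not the study's actual systems or data.

```python
# Illustrative sketch of a name-insertion bias audit. The 'de-identifier'
# here is a deliberately weak dictionary lookup, used only to show the
# shape of the evaluation; the study tests nine real systems.

def toy_deidentifier(text: str, vocabulary: set) -> str:
    """Redact tokens the system recognizes as names."""
    return " ".join("[NAME]" if tok in vocabulary else tok for tok in text.split())

def recall_for_name_set(names, templates, vocabulary) -> float:
    """Fraction of inserted names the system successfully redacted."""
    hits = total = 0
    for name in names:
        for template in templates:
            redacted = toy_deidentifier(template.format(name=name), vocabulary)
            total += 1
            hits += name not in redacted
    return hits / total

templates = ["Patient {name} presented with chest pain .",
             "Spoke with {name} regarding discharge planning ."]
known_names = {"John", "Mary"}   # names the toy system happens to cover
set_a = ["John", "Mary"]         # well-covered name set
set_b = ["Adaeze", "Nguyet"]     # under-covered name set
gap = (recall_for_name_set(set_a, templates, known_names)
       - recall_for_name_set(set_b, templates, known_names))
```

A nonzero `gap` between demographically defined name sets is exactly the kind of disparity the study quantifies at scale.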

  • Morehouse School of Medicine: Muhammed Idris (PI)

    Breast and cervical cancers are among the most common cancers and leading causes of cancer-related deaths among women worldwide. Due to low screening rates among disadvantaged groups, including minorities, late-stage diagnoses and mortality rates are higher than among their counterparts. The overarching goal of our project is to evaluate the capabilities of foundation models to facilitate accessible and culturally congruent cancer-related health information, with the aim of addressing community concerns (i.e., trust, myths, access) and promoting cancer screenings among underrepresented women of color. Specifically, we will evaluate and compare the breadth, quality, and accuracy of health-related information generated by the GPT and Llama-2 model families around specific community concerns. We will also develop a prototype of an interactive tool, fine-tuned on community concerns related to breast and cervical cancer screening using GPT-3, leveraging the Azure API and DALL-E 2 to address health literacy barriers.

  • University of Illinois Chicago: Mohan Zalake (PI)

    The research aims to evaluate the acceptability of using Digital Twins of Doctors (DTDs) in patient care. DTDs are AI-generated characters that share the facial and vocal identities of real doctors and can deliver health information to patients. Past research has explored the benefits (e.g., efficient delivery of repetitive information and personalizing patient care) and concerns (e.g., ethical and social concerns) of integrating DTDs in patient care from the perspective of doctors who share their identities with DTDs. Given that there exist both potential benefits and limitations, research efforts are required to systematically study the implications of using DTDs in healthcare with all stakeholders before widely adopting them. In this proposal, I aim to evaluate the acceptability of DTDs with the next important stakeholder, patients, by understanding a) how patients perceive and respond to DTDs that share the identities of their own doctors and b) how DTDs influence patients’ trust, engagement, comprehension, and adherence to health information and advice. I will conduct a mixed-methods study involving patients interacting with DTDs, completing surveys, and participating in follow-up interviews. The proposed research will contribute to the understanding of responsible integration of generative AI solutions like DTDs into healthcare.

  • Harvard University: Pranav Rajpurkar (PI)

    This proposal aims to develop an evaluation framework for assessing the performance of Large Language Models in medical AI applications. The framework simulates real-world doctor-patient conversations and uses AI agents to imitate doctor-patient interaction and assess conversational abilities. The project will initially focus on assessing the diagnosis of skin conditions.

    Related paper:

  • University of Washington: Jennifer Mankoff (PI)

    The proposal aims to explore the capability of Generative AI (GAI) in the realm of text simplification, particularly to aid individuals with cognitive impairments. The team plans to develop a proof-of-concept text simplification system using OpenAI’s GPT-4 for both generation and validation, addressing the potential risks and consequences for people with disabilities along the way. The research seeks to bridge the digital divide for those who benefit from simplified text and, in the process, generate important new insights about the value of GAI in accessibility.

  • University of California, Berkeley: Ahmed Alaa (PI)

    The proposal seeks to develop a foundation model that can aid in analyzing real-world observational data (RWD) and generate high-quality Real-World Evidence (RWE). The researchers aim to reduce the development time and expert input required to devise Statistical Analysis Plans (SAPs). The steps proposed include studying the zero-shot performance of LLMs at generating SAPs, and then refining the process to produce a model that can automate SAP production.

  • University of Illinois Urbana-Champaign: Karrie Karahalios (PI)

    The proposal aims to study the impact of AI errors on learners’ engagement, learning outcomes, and perceived helpfulness of AI educational systems. The researchers intend to conduct an online controlled experiment involving adult learners in STEM where they are presented with various scenarios of imperfection in conversational Q&A systems. The outcomes of this study aim to provide the necessary knowledge to maximize learners’ gains and do so fairly.

  • University of British Columbia: Xiaoxiao Li (PI)

    This project aims to leverage Large Language Models (LLMs) for multimodal medical data analysis in community healthcare, focusing on wound care. It proposes to build a trustworthy conversational AI tuned to provide evidence-based responses to clinical inquiries. Moreover, it seeks to overcome challenges related to multimodal data complexity, trustworthiness, and fairness.

  • Northeastern University: Vedant Swain (PI)

    *AICE Accelerator collaboration

    To make AI agents more empathetic towards workers’ goals, the agent needs to (i) understand broader wellbeing goals beyond saving time, (ii) maintain latitudinal and longitudinal awareness of workers’ context outside their task, and (iii) provide workers with suggestions for meeting those goals by preempting opportunities in their work context. In this project, we propose to prototype and study Pro-Pilot, an enhancement over the existing Copilot that introduces a new Human-AI interaction framework that builds empathy.

  • University of Toronto: Alistair Johnson (PI)

    Create a highly accurate and efficient deidentification system that can be applied to various medical data sources, ultimately facilitating secure data sharing and collaboration in the healthcare industry.

  • The University of Texas at Arlington: Junzhou Huang (PI)

    This project proposes to address a critical challenge in personalized healthcare: accurately predicting survival outcomes using digital pathology techniques. It identifies two key challenges: the complexity of tissue microenvironments in histopathological images, and the integration of the images with corresponding biomedical text data. To tackle these, we propose two aims. The first is to develop an advanced cell segmentation foundation model that enhances feature extraction and analysis in histopathological images. The second focuses on developing a multimodal foundation model that effectively combines pathological image features with biomedical captions for improved survival predictions. These methods promise to significantly influence both machine learning and histopathological imaging by introducing novel foundation models for integrated image-caption data analytics, with the potential to impact related fields in a similar capacity.

  • Georgia Institute of Technology: May Dongmei Wang (PI)

    Expedite AI research and improve healthcare by developing a privacy-aware medical dialogue system that 1) leverages human interaction and prompting in dialogue systems for unstructured clinical data analysis, and 2) adapts large language models (LLMs) to clinical use cases.

    Related papers:

  • Waseda University: Daisuke Kawahara (PI)

    Proposal for a system to assist visually impaired individuals in outdoor navigation by using vision and language foundation models to extract visual information from images/videos captured by mobile cameras. These insights are communicated through relevant dialogues. Includes use of Azure and OpenAI technologies, and the creation of two specific datasets for model development.

  • University of Michigan, Ann Arbor: Qiaozhu Mei (PI)

    The proposal focuses on a novel approach, distributional alignment, that aligns Large Language Models (LLMs) to the broad spectrum of human preferences stemming from varied contexts. This involves the collection of diverse human contexts and preferences across expansive domains and their integration into LLM training and refinement. Unique metrics to assess the diversity of LLM outputs will be introduced. The investigators request access to GPT-3, GPT-4, and other modeling resources to aid their research.

  • Harvard University: Emily Alsentzer (PI)

    Clinicians face challenges in summarizing a patient’s medical history upon hospital admission due to information overload in electronic health records. We aim to develop LLM-based methods for generating factual summaries by leveraging retrieval-based approaches and to design evaluation approaches for assessing the quality of the generated summaries, comparing them to existing metrics and clinician evaluations.

  • University of New South Wales: Raina MacIntyre (PI)

    The project aims to further develop EPIWATCH, an epidemic detection and surveillance system, using large language models (LLMs). The LLMs will be used to automate certain functions of EPIWATCH and will support low-resource languages important to Australian communities. The models will be fine-tuned and retrained for tasks including classification of public health threats and extraction of key information from large datasets. The project aims to fill the gap in AI usage for public health, especially for underrepresented languages.

  • KAIST: Sangchul Park (PI)

    Study prompting and fine-tuning strategies for improving GPT’s capabilities for contract generation, with a set of prompts produced as one output of the research. I find GPT particularly good at generating contract terms when proper prompts are fed into it. I will also prepare an evaluation set for measuring the performance of contract generation and compare GPT with other language models. If designing an evaluation set proves difficult, I can instead conduct “snowball sampling” to solicit multiple law professors for human evaluation, or use my law school class to engage law students for evaluation.

  • University of California, San Francisco: Vivek Rudrapatna

    The project aims to evaluate the accuracy of GPT-4 in extracting patient symptoms and medications from clinical notes in the electronic health record. It intends to compare GPT-4’s performance against comparator models and explore hybrid approaches to improve the accuracy of clinical information extraction.

  • San Diego State University: Hajar Homayouni

    AI’s potential in healthcare is limited by the scarcity of representative and balanced Electronic Health Records (EHRs). This proposal aims to address issues stemming from inaccessible, incomplete, and biased EHRs crucial for critical data analysis and decision-making. The research approach involves utilizing limited available data to generate balanced, correlated EHRs for precise and equitable training and validation of data-driven models. The proposed solution is a Federated Privacy-preserving Multimodal Generative (FPMG) framework, designed to generate unbiased EHR data and facilitate secure collaborative learning. Primarily, it targets the generation of balanced and correlated multimodal EHR data types, utilizing a deep generative adversarial model. By capturing cross-modal correlations and associations, the framework aims to enhance decision-making systems. Additionally, the project seeks to explore decentralized cross-silo federated learning to safeguard patients’ data privacy and enhance the robustness and generalization of models in healthcare applications.

  • University of California, Berkeley: David Bamman (PI)

    Foundation models such as ChatGPT, GPT-4 and Llama 2 are poised to transform research at the intersection of natural language processing and computational social science/cultural analytics, not simply in providing more accurate measuring instruments for existing tasks (Ziems et al. 2023) but also in opening up the ability to ask fundamentally more difficult questions that require world knowledge, long document context, and sophisticated inference. This research project probes the ability of foundation models to accelerate responsible computational research in the social sciences and humanities; the goal is to generate new knowledge about culture and society and provide a roadmap for other researchers to do so themselves.

  • Rice University: Xia Hu (PI)

    The proposal seeks to leverage the abilities of Large Language Models (LLMs) to address the challenge of accurately and reliably matching patients with appropriate clinical trials. It aims to achieve precise and reliable patient-trial matching by resolving the incompatibility between Electronic Health Records (EHRs) and clinical trial descriptions and providing comprehensive explanations for the matches.

  • University of North Carolina at Charlotte: Razvan Bunescu (PI)

    With the aim of improving teaching and learning of coding, we propose to develop Socratic conversational agents by fine-tuning large foundation models on a dataset of dialogues where an instructor helps students debug code. The Socratic conversational agents are intended to augment human instruction, assisting novice programmers to fix their code and thus enhancing their learning outcomes.

    Related paper:

  • University of California, Berkeley: Juliana Schroeder (PI)

    This proposal tackles the urgent need for standardized protocols when integrating Generative Artificial Intelligence (GAI) into behavioral research. The research goals include understanding the current state of GAI use in behavioral science, exploring the potential benefits and risks, and developing guidelines for its effective and responsible use. The proposal includes conducting a large field study and consulting with a panel of experts.

  • Florida International University: Mohammadhadi Amini (PI)

    Natural disasters introduce major challenges for critical infrastructure and human lives. In such scenarios, effective communication among first responders, agencies, and residents is critical to ensure timely recovery and survival. However, existing notification systems do not benefit from state-of-the-art AI-based solutions for handling the real-time, evolving situations that arise during disasters. Hence, this project proposes to develop and evaluate a conversational agent, using Microsoft Azure OpenAI Service, that can facilitate the coordination of stakeholders in disaster situations using natural language. The agents will be based on large language models (LLMs) fine-tuned on historical disaster datasets. The PI and his team have prior experience using pre-trained models for computer vision, critical infrastructure resilience, and healthcare applications, including with Azure services such as Azure Machine Learning.

    The performance of the conversational agent will be validated using synthetic scenarios that simulate realistic and challenging use cases and interactions in disasters, by tracking performance metrics such as loss, accuracy, and perplexity. The project will contribute to identifying and exploring new use cases for conversational AI in critical infrastructure resilience, and the outcomes will be disseminated via research publications.

  • Northeastern University: Samuel Scarpino (PI)

    This proposal sets out the goal of developing AI tools for pandemic risk assessment. This is to be achieved through a partnership between the CAPTRS and the IEAI at Northeastern University. By using data from ProMED and the WHO Disease Outbreak Network, the team aims to fine-tune two distinct models – OpenAI’s GPT-3 text-davinci-003 and the open-source model Llama-2. The final outcome of this project would be models capable of generating early-stage outbreak alerts along with an associated risk score.

  • University of Cambridge: Mihaela van Der Schaar (PI)

    Develop and evaluate a hallucination detection system for medical text generation. By detecting and mitigating hallucinations in AI-generated text, we aim to enhance patient safety and improve the quality of medical care. Furthermore, the insights gained from this research will contribute to the broader understanding of responsible AI deployment in healthcare and help develop best practices for the ethical use of AI in medicine.
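One widely used signal for hallucination detection is self-consistency: sample the model several times on the same query and measure agreement between the samples, since low agreement suggests the model is guessing. The sketch below is a generic illustration of that idea with invented example generations, not necessarily the detection method this project will develop.

```python
# Generic self-consistency sketch: score agreement across sampled
# generations; low scores flag possible hallucination. Example
# generations are invented for illustration.

from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two generations."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def consistency_score(samples: list) -> float:
    """Mean pairwise overlap across sampled generations."""
    pairs = list(combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

consistent = ["metformin 500 mg twice daily",
              "metformin 500 mg twice daily",
              "metformin 500 mg twice daily"]
inconsistent = ["metformin 500 mg twice daily",
                "insulin glargine at bedtime",
                "lisinopril 10 mg once daily"]
```

A deployed system would combine a signal like this with thresholds tuned on clinician-labeled data before flagging any generated text.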

  • Emory University: Carl Yang (PI)

    The proposed research aims to explore the application of foundation models, specifically GPTs, to advance health outcomes research, focusing on their alignment with patient values, their handling of social determinants of health, and their capabilities in advancing health sciences and healthcare.

  • University of Berlin: Matthias Groeschel (PI)

    The research project aims to evaluate GPT-4’s ability to compare physicians’ diagnostic and therapeutic choices against national guidelines. The goals are to provide in-hospital access to local and national guidelines for two chronic pulmonary diseases and to assess and define a prompting strategy for comparing national guidelines with patient treatment.

  • Carnegie Mellon University: Tom M. Mitchell (PI)

    We propose research applying GPT models to improve the quality of teaching in online education platforms, and request Azure access to GPT to support this research. As a case study of this problem, we will partner with the widely used K-12 education platform freely available at the non-profit www.ck12.org website, which has served over 100 million unique student visitors worldwide.

    Related papers:

  • University of Illinois Urbana-Champaign: Hari Sundaram (PI)

    The proposal aims to develop a framework using LLMs to simplify online Terms of Service and privacy notices, making them accessible to people with little or no legal background. The research will focus on rendering these contracts in fifth-grade English and analyzing their alignment with user values along four privacy-related dimensions.
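A simplification pipeline with a fifth-grade target needs an automatic readability check on its output. The Flesch-Kincaid grade-level formula is one standard choice; the syllable counter below is a rough vowel-group heuristic, adequate only as a sketch, and the example sentences are invented.

```python
# Sketch of a readability gate for simplified legal text, using the
# Flesch-Kincaid grade-level formula with a crude syllable heuristic.

import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (crude but deterministic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

legalese = ("The licensee shall indemnify and hold harmless the licensor "
            "from any and all liabilities arising hereunder.")
plain = "You agree to cover our costs if your use of the app causes a problem."
```

The LLM's output would be accepted only when `fk_grade` falls at or below the fifth-grade target, and regenerated otherwise.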

  • KAIST: Alice Oh (PI)

    The integration of ChatGPT in the field of education has garnered significant interest, offering an opportunity to examine its effectiveness in English as a foreign language (EFL) education. A novel learning platform, RECIPE, collects students’ interaction data with ChatGPT by guiding students and ChatGPT prompting. The goal is to investigate students’ usage and perception of generative AI and explore how students can effectively utilize generative AI in EFL writing education.

    Related paper:

  • San Diego State University: Hossein Shirazi (PI)

    In the U.S. knowledge-based sector, 10% of new hires (around 2 million employees) face recurrent challenges, partly from a flawed mentorship system where less than 40% of employees receive mentorship. Overwhelmed mentors hinder productivity, resulting in inefficiencies and wasted resources. To address this, employees turn to Large Language Model (LLM)-based chatbots, like ChatGPT, for career advice, though their reliability and personalization remain concerns. This research explores how LLMs, accessible through Azure and OpenAI Services, can enhance employee experiences, focusing on AI-driven professional mentorship. We investigate benefits, challenges, and strategies to optimize mentorship using advanced technologies. Our methodology comprises three phases: data collection through interviews and platform analysis, LLM utilization for refining AI responses, and developing a Retrieval-Augmented Generation (RAG) chatbot to assess AI mentor effects across industries. Empirical data collection involves interviews with employees, managers, and mentors, supplemented by platform data, creating a robust dataset. We employ LLMs to answer common queries, iteratively refining responses for accuracy and emotional dynamics.

    This project’s potential impact on Employee Experience Management and Human Resource Management is substantial, promising to enrich mentorship experiences and improve HR processes through AI innovation.
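Retrieval-Augmented Generation, as proposed above, pairs a retriever with a generator: a query first selects relevant passages, which are then placed in the model's prompt. The sketch below shows only the retrieval half using bag-of-words overlap; a real mentorship chatbot would use dense embeddings, and the corpus passages here are invented examples.

```python
# Minimal retrieval sketch for a RAG chatbot: rank passages by keyword
# overlap with the query. Corpus contents are invented for illustration.

def score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

corpus = [
    "How to ask your manager for a promotion and prepare the case.",
    "Guide to configuring the build server for nightly releases.",
    "Mentorship tips for onboarding new engineers in their first month.",
]
top = retrieve("how should I prepare to ask for a promotion", corpus, k=1)
```

The retrieved passages would then be prepended to the LLM prompt so the mentor-bot's answer is grounded in vetted material rather than free generation.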

  • Georgia Institute of Technology: Munmun De Choudhury (PI)

    In collaboration with mental health clinicians from Northwell Health, we explore the question of where LLM-based chatbots may be useful for mental health contexts, and where they may be harmful. We conduct audits of how LLM-based chatbots (accessed via the Azure OpenAI Service) respond to pregenerated queries seeking support, and how responses from chatbots compare to how peer supporters might answer the query. We identify where chatbots provide credible mental health information and support, and where they may provide poor advice or propagate misinformation. This work contributes a beginning framework around the harms of generative AI for mental health, including methods for studying generative AI in mental health and ethical considerations.

  • Emory University: Craig Jabaley (PI)

    The research proposal aims to evaluate the proficiency of LLMs in extracting and understanding clinical concepts from routine clinical documentation in adult critical care. The process involves comparing LLM outputs against human-annotated clinical notes. The study seeks to understand the capabilities, strengths, and limitations of LLMs in the realm of adult critical care.

  • Emory University: Pedram Rooshenas (PI)

    This project aims to develop an AI-based teaching assistant by leveraging large language models (LLMs). Through precise fine-tuning and strategic prompting, this system will be capable of offering constructive feedback to students and responding to their course-specific queries. Moreover, by incorporating feedback from human educators, we steer the LLMs to produce responses that exemplify the thought process essential for mastering each concept in the course. We are going to pilot our system for Database Systems, a core course in the Computer Science program. Our proposed system has the potential to enhance the learning experience in public universities, particularly in light of the significant rise in enrollments for computer science and data science programs.

  • University of Texas at Austin: Desmond Ong (PI)

    *AICE Accelerator collaboration

    The Digital Empathy pilot aims to investigate emotional intelligence in Large Foundation Model (LFM)-driven systems and to develop and study a series of empathic AI agents to understand and augment human performance and wellbeing. Until now, there has been very little empirical evidence of how empathic LFM systems are or the psychological implications of these systems during human-AI interactions. The project will contribute to a comprehensive survey of the research opportunities and priorities concerning empathy in AI systems and a research platform for the systematic evaluation of empathic agents.

    Related papers:

  • Carnegie Mellon University: Zachary Lipton (PI)

    The proposal aims to study the statistical and algorithmic foundation of training and applying LLMs in a human-centered manner. It consists of two research projects:

  • University of Toronto: Anastasia Kuzminykh (PI)

    We investigate the use of randomized A/B experiments to provide a more scientific basis for prompt engineering. We experimentally compare LLM prompts and user interfaces for LLMs to emulate teaching assistant behaviours: Providing feedback vs asking questions for students to explain to themselves; guiding students to effectively learn from an LLM; motivating students to engage in reflection vs passive learning. Field data from thousands of students will be used for qualitative and quantitative analysis of impact. In addition, we use reinforcement learning for statistically informed adaptive experiments, which automatically enhance, personalize, and contextualize prompts to different users, or the same user at different points in time. The techniques for adaptive experimentation in prompt engineering will be evaluated in domains besides education, from mental well-being chat to field experiments that integrate human-computer interaction, ML/AI, statistics, and psychology.

    Related paper:
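Adaptive experimentation over prompts, as described above, can be framed as a multi-armed bandit: each arm is a prompt variant, each pull shows one student that variant, and the reward is an engagement or learning signal. The epsilon-greedy sketch below is a generic illustration of that framing, not the project's reinforcement-learning method; the reward probabilities and variant names are invented.

```python
# Generic epsilon-greedy bandit over prompt variants. True reward rates
# are invented; a real deployment observes rewards from student outcomes.

import random

def epsilon_greedy(true_rates, pulls=2000, eps=0.1, seed=0):
    """Estimate each prompt variant's reward rate while mostly exploiting."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    values = [0.0] * len(true_rates)
    for _ in range(pulls):
        if rng.random() < eps:
            arm = rng.randrange(len(true_rates))                       # explore
        else:
            arm = max(range(len(true_rates)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]            # running mean
    return values

# Three hypothetical prompt variants: ask-a-question, give-feedback, hint-only.
estimates = epsilon_greedy([0.3, 0.6, 0.45])
```

The same loop generalizes to contextual bandits, where the arm choice also conditions on features of the student, which is closer to the personalization the project describes.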

  • Stanford University: Akshay Chaudhari (PI)

    We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization and other generative text tasks in healthcare. Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. Our findings highlight the importance of domain adaptation for applying LLMs to healthcare and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.

    Related paper:
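Parameter-efficient fine-tuning methods such as LoRA, one of the techniques in the family mentioned above, freeze the pretrained weight matrix W and learn a low-rank update B @ A, so only r*(m+n) extra parameters are trained instead of m*n. The tiny pure-Python sketch below shows just the parameter arithmetic; real implementations live in libraries such as Hugging Face PEFT, and the matrices here are toy values.

```python
# Toy LoRA arithmetic: the effective weight is W + B @ A, with W frozen
# and only the low-rank factors A and B trainable. All values invented.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x):
    """Compute (W + B @ A) @ x with W frozen and only A, B trainable."""
    delta = matmul(B, A)  # low-rank update, rank = number of rows in A
    W_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
    return matmul(W_eff, [[v] for v in x])

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 pretrained weight
A = [[0.5, 0.5]]               # rank-1 factors: A is 1x2,
B = [[1.0], [0.0]]             # B is 2x1, so B @ A is 2x2
y = lora_forward(W, A, B, [2.0, 4.0])
```

The payoff is in the counting: for a 4096x4096 layer, a rank-8 update trains roughly 65K parameters instead of about 16.8M, which is what makes adapting large clinical models tractable.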

  • University of Washington: Linda Shapiro (PI)

    This project comprises three synergistic efforts aimed at advancing medical AI: Quilt-Medical, for curating a new multimodal, multi-domain dataset of medical concepts; Quilt-Instruct, which uses the curated data to develop a medical chatbot for histopathology education; and MediEval, which evaluates the performance and diagnostic capability of these AI models, highlighting unexplored gaps in medical AI. Together, these aim to advance AI research in the healthcare sector by enhancing multimodal chatbot training, data curation, and model evaluation.

    Related paper:

  • University of Southern California: Mahdi Soltanolkotabi (PI)

    The proposal aims to leverage Large Language Models (LLMs) to improve the interpretation and reporting of MRI scans. The approach involves creating a multi-modal dataset of images paired with captions and question-answer pairs, fine-tuning architectures for accuracy, and increasing their reliability using conformal prediction. The expected outcome is an improved system for providing reliable interpretations of medical images and assisting with medical report generation and decision making. This system could also provide a ‘second opinion’ for radiologists, reducing the risk of misdiagnosis and decreasing healthcare expenses.
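Conformal prediction, the reliability mechanism named above, wraps any scoring model: nonconformity scores from a held-out calibration set yield a threshold that guarantees (1 - alpha) coverage on new examples. The sketch below is the generic split-conformal recipe with invented calibration scores and labels, not the project's model.

```python
# Generic split conformal prediction: calibrate a threshold, then return
# every label whose nonconformity score falls within it. Scores invented.

import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    return sorted(cal_scores)[min(rank, n) - 1]

def prediction_set(class_scores, threshold):
    """All labels whose nonconformity is within the calibrated threshold."""
    return [label for label, s in class_scores.items() if s <= threshold]

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.15, 0.35, 0.25, 0.45, 0.05]
thr = conformal_threshold(cal, alpha=0.2)
labels = prediction_set({"pneumonia": 0.2, "effusion": 0.6, "normal": 0.3}, thr)
```

Instead of a single hard diagnosis, the radiologist sees a set of candidate findings with a coverage guarantee, which is the 'second opinion' framing the proposal describes.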

  • Florida International University: Christian Poellabauer (PI)

    Medication errors most commonly occur at the ordering or prescribing stage, potentially leading to medical complications and poor health outcomes. While it is possible to catch these errors using different techniques, the focus of our work is on textual and contextual analysis of prescription information to detect and prevent potential medication errors.

    In previous work, we demonstrated how to use contextual language models (based on BERT) to detect anomalies in written or spoken text based on a data set extracted from real-world medical data of thousands of patient records. The resulting models are able to learn patterns of text dependency and predict erroneous output based on contextual information such as patient data, with an experimental accuracy of about 96% for text input and 79% for speech input. In the proposed project, we will extend this work to replace the contextual language model with a large language model to investigate how such a change will impact accuracy, especially for speech input, which will be increasingly important with the growing use of speech input in medical software and systems.
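The core idea above is scoring how well each token of a prescription fits its context. As a stand-in for a contextual language model like BERT, the sketch below scores prescriptions with bigram counts learned from a tiny "normal" corpus: transitions never seen in training raise the anomaly score. The corpus and orders are invented, and a real system would use learned contextual probabilities rather than counts.

```python
# Toy contextual anomaly scoring for prescriptions: rare token
# transitions raise the score. Corpus and orders are invented examples.

from collections import Counter

def train_bigrams(corpus):
    """Count adjacent-token pairs across a corpus of normal orders."""
    counts = Counter()
    for line in corpus:
        tokens = line.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def anomaly_score(text, bigrams):
    """Fraction of token transitions never seen in the training corpus."""
    tokens = text.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    unseen = sum(1 for p in pairs if bigrams[p] == 0)
    return unseen / len(pairs)

corpus = ["metformin 500 mg oral twice daily",
          "metformin 850 mg oral once daily",
          "lisinopril 10 mg oral once daily"]
model = train_bigrams(corpus)
normal = anomaly_score("metformin 500 mg oral once daily", model)
odd = anomaly_score("metformin 500 g oral once daily", model)  # grams, not mg
```

The dosing-unit slip in the second order is caught purely from context, which is the same mechanism, at far larger scale, that the BERT-based and proposed LLM-based models exploit.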

  • University College London: Clara Colombatto (PI)

    *AICE Accelerator collaboration

    In this project, we propose to leverage insights from the psychology and neuroscience of metacognition and decision-making to study human-AI interactions and their potential for trustworthy collaboration. This past work has highlighted that successful collaboration hinges on sharing not just our cognitive states (e.g. what we believe), but also metacognitive estimates (e.g. our confidence in ourselves and one another). Humans routinely signal their metacognitive states explicitly (e.g., via verbal estimates) or implicitly (e.g., via speech prosody).

  • Emory University: Judy Gichoya (PI)

    Leverage access to the various versions of GPT to evaluate the accuracy and utility of LLMs for radiology report summarization. Our ultimate goal is to develop video and image integration (a capability of GPT-4) to evaluate synthetically generated video and text combinations for radiology report summarization.

  • New Jersey Institute of Technology: Salam Daher (PI)

    Before treating real patients, healthcare trainees use patient simulations to practice safely. Simulated patients range from physical (e.g., mannequins) to virtual (e.g., computer graphics). In simulation, patient responses can be controlled by a human in the loop or automated. Having a human in the loop is costly and often suffers from scheduling limitations. The automation of patient responses is the future of training: it allows healthcare students to practice skills that involve communicating with the patient anytime, anywhere. Digital assistants (e.g., Microsoft Cortana, Amazon Alexa, Apple Siri) have been used in various fields, but their use in healthcare has been limited thus far. Large language models such as ChatGPT can provide text responses to any query, but those responses are not personalized to simulate healthcare scenarios such as a conversation with a patient. We propose to develop and test Automated Digital Assistants for Patient Tele-Simulation (ADAPTS) to simulate patients anytime, anywhere. ADAPTS combines personalized responses (e.g., a specific healthcare scenario) with the versatility of digital assistants and OpenAI's large language models to simulate a patient's responses and to provide feedback for healthcare students after they interact with the simulated patient.

  • Carnegie Mellon University: Cleotilde Gonzalez (PI)

    The proposal describes a method to integrate foundation models with cognitive models to generate personalized education that can train users to identify and manage risks related to online scams or phishing activities. This will be done by running experiments with human participants using LLM-generated example emails, as well as natural language feedback of participant responses. The intention is to develop an understanding of human behavior and learning styles and to use this data to provide individualized educational feedback.

  • Harvard University: Junwei Lu (PI)

    This project adapts large language models to the medical domain by aligning electronic health records (EHR) with NLP prompts and using EHR data to fine-tune the models. The aim is to improve diagnosis reporting, address disparities and biases in the data, and explore the potential of EHR data for generating deeper medical insights.

  • KAIST: Sangchul Park (PI)

    This study focuses on applying a large language model (LLM) within the legal domain of trademarks. Assessing the likelihood of consumer confusion is a pivotal touchstone of all trademark procedures, yet the evaluation of trademark similarity, while decisive, has historically been elusive and subjective. To introduce greater structure and consistency, the proposed project will vectorize the judgments rendered by judges at the U.S. Trademark Trial and Appeal Board (TTAB) and develop a judgment prediction model. GPT will be employed both to fine-tune a model on TTAB decisions and to capture pertinent features such as the semantic similarity between pairs of marks and the acquired distinctiveness (commonly known as secondary meaning) of generic marks. Combining GPT with the analysis of judicial decisions seeks to enhance the precision and reliability of trademark evaluations in legal proceedings.
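    One family of features a judgment prediction model might use is surface similarity between a pair of marks. The sketch below computes character-bigram Jaccard similarity as a toy stand-in; the project itself relies on GPT to capture *semantic* similarity, which surface overlap alone cannot, so this is only an assumed illustration of one feature type.

```python
# Toy surface-similarity feature for a pair of trademarks:
# Jaccard overlap of character bigrams (illustrative only).

def char_ngrams(mark: str, n: int = 2) -> set:
    """Set of character n-grams of a mark, lowercased, spaces removed."""
    mark = mark.lower().replace(" ", "")
    return {mark[i:i + n] for i in range(len(mark) - n + 1)}

def mark_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of two marks' character n-gram sets, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

mark_similarity("Coca Cola", "Coca Cola")  # identical marks score 1.0
mark_similarity("Koka Kola", "Coca Cola")  # phonetically close, low surface overlap
```

The second pair shows why semantic and phonetic features matter: marks that sound confusingly alike can share few character bigrams, which is precisely the gap GPT-derived features are meant to fill.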

  • University of California, Berkeley: David Holtz (PI)

    This research study explores the micro-level interactions between human users and generative AI, aiming to provide rigorous evidence on ‘prompt engineering’. The goal is to understand the optimal use of foundation models in a variety of domains using robust quantitative research. The researchers will conduct large scale online experiments to explore, among other things, individual aptitude for prompt engineering, learning dynamics, and generative AI’s integration into information work.

  • Stanford University: Curtis Langlotz (PI)

    RadGPT is a system that uses artificial intelligence to generate simple and personalized explanations of radiology reports for patients. It extracts key concepts from the reports and uses GPT-4 to create prompts and responses that describe them in plain language. The system also allows patients to ask follow-up questions and provides hyperlinks to more information. The system will be tested by radiologists and radiology trainees, and will be integrated with Stanford Health Care’s myHealth App. The goal of RadGPT is to empower patients with a better understanding of their health data, and to improve patient engagement and health outcomes.

  • Fisk University: Sajid Hussain (PI)

    While Learning Management Systems (LMS) offer numerous benefits, such as the collection of extensive data on student performance, they also come with certain limitations. One limitation is a potential lack of personalization in delivering sufficient real-time evaluation and feedback to students. Currently, most existing LMS focus on improving functionality and technology rather than on the student's learning perspective [3]. LMS platforms often rely on standardized formats and can provide only numeric assessments, making it challenging to deliver personalized, verbal evaluation and feedback and to tailor educational content to individual student needs. Balancing the advantages of LMS with considerations for personalization is crucial for addressing these limitations effectively.

    The proposal involves leveraging advanced language models like GPT-4 or Orca 2 to create a real-time performance evaluation and feedback system for students. The comprehensive project comprises the following key components: Course Design, Data Collection, System Building, Evaluation and Feedback Generation.

    We will create a real-time evaluation and feedback system for students using Microsoft Azure Foundation Models.

  • UT Southwestern Medical Center: Andrew Jamieson (PI)

    The project aims to establish and evaluate the effectiveness of AI-generated feedback on medical students’ post-encounter notes. Leveraging a dataset of 15,000 existing examples, the project plans to refine and evaluate feedback using metrics like factuality, fidelity, helpfulness, and actionability, ultimately aiming to improve feedback quality and timeliness.

  • Stanford University: Dan Jurafsky (PI)

    The proposal aims to realign generated language from AI models with human needs, focusing on how uncertainty is expressed. It plans to conduct a three-part study: analyzing the linguistic miscalibration of language models, creating a dataset of transcribed human decision-making conversations, and training foundation models with artificially calibrated datasets.

  • University of Oxford: Scott Hale (PI)

    This proposal aims to align large language models (LLMs) with the diverse values and preferences of global users. It plans to build a large-scale, personalized, and diverse dataset of human feedback over LLM interactions with individuals from over 35 different countries, rating the responses of more than 20 commercial and open-source models. This rich dataset can be used for developing LLMs that are more socio-culturally aware. Microsoft's support will enable the evaluation and training of models at scale, allowing more experiments, iterations, and refinements.

  • MIT: Mert Demirer (PI)

    *AICE Accelerator collaboration

    As the adoption of generative AI tools becomes more widespread, it is crucial to anticipate the macroeconomic effects on labor and production. This requires both a whole-market view and a detailed accounting of the differences between jobs. We will approach this challenge by treating jobs as interconnected sequences of tasks that vary in how easily they can be automated and overseen. This results in some jobs being “more automatable” than others — even accounting for the level of skill required to complete the job manually — and suggests jobs where human-AI collaboration might be especially useful. We will use these models to study the general equilibrium impact of advances in AI automation across different job domains.

  • Stanford University: Russ Altman (PI)

    The research focuses on using social media text data to track and understand the opioid epidemic. It includes two subprojects. The first subproject is a review of a broad set of social media platforms for their utility in analyzing opiate-related discussions. Because social media text is rife with slang and intentionally misspelled words used to evade censorship ("algospeak"), knowledge of these nonstandard terms for illicit drugs is needed to quantify the amount of opiate-related discussion present on a particular social media platform. This subproject uses large language models (LLMs) and the Google API for algospeak generation. The expected outcome is a systematic assessment of the volume of opiate-related discussion across different social media platforms. The second subproject is an analysis of sentiment regarding the opioid epidemic in specific American cities. This subproject uses LLMs and Azure OpenAI embeddings for sentiment analysis, and the expected outcome is an understanding of how city-specific drug policies are reflected in citizen sentiment.
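    Embedding-based sentiment analysis of the kind described can be sketched as scoring text by its cosine similarity to positive versus negative anchor vectors. The tiny hand-written vectors below are assumptions standing in for Azure OpenAI embeddings, which the project would obtain from the embeddings API; only the scoring mechanics are shown.

```python
import math

# Toy 3-dimensional "embeddings" standing in for real API embeddings.
TOY_EMBEDDINGS = {
    "helpful": [0.9, 0.1, 0.0],
    "hopeful": [0.8, 0.2, 0.1],
    "harmful": [0.1, 0.9, 0.2],
    "fearful": [0.2, 0.8, 0.1],
}
POSITIVE_ANCHOR = [1.0, 0.0, 0.0]  # assumed anchor directions
NEGATIVE_ANCHOR = [0.0, 1.0, 0.0]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sentiment_score(word: str) -> float:
    """Positive score: closer to the positive anchor than the negative one."""
    emb = TOY_EMBEDDINGS[word]
    return cosine(emb, POSITIVE_ANCHOR) - cosine(emb, NEGATIVE_ANCHOR)
```

With real embeddings, the same anchor-difference score would be computed over whole posts rather than single words, and the anchors would themselves be embeddings of sentiment-laden reference text.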

  • University of Oxford: Andrew Soltan (PI)

    This project will apply large language models to support decision making in cancer diagnosis and treatment, using real-world data from a digitally mature NHS Trust. We aim to (i) improve upon the quality of referrals made to specialist cancer multidisciplinary team meetings (MDTs) where treatment decisions are taken, (ii) predict outcomes of the meeting in advance, allowing factors that may delay decisions to be remedied before the meeting, and (iii) use multimodal models to generate MDT referrals from unreported imaging & pathology data. Our study aims to improve the patient experience and inter-professional communication, reduce delays within the clinical pathway, and offer a decision aid for low and middle income settings.

  • Savannah State University: Kai Shen (PI)

    This project seeks to advance healthcare by utilizing Large Language Models (LLMs) for the extraction of Social Determinants of Health (SDOH) from Electronic Health Records (EHR), focusing on improving care for individuals at risk of substance use disorder (SUD). Using the comprehensive All of Us (AOU) dataset, the project will assess and refine the use of GPT models for extracting SDOH, aiming to identify potential biases in these models and understand their impact on healthcare data interpretation across diverse demographics. The project involves two primary goals: 1) evaluating GPT model biases in extracting SDOH from EHR notes, ensuring equitable performance across race, ethnicity, and sex; 2) investigating the impact of adverse SDOH on SUD outcomes, employing advanced GPT models and statistical methods. The expected outcomes include a refined process for SDOH extraction using LLMs, an assessment of biases in LLM models, and insights into the impact of SDOH on SUD. This project is poised to enhance the understanding of SUD in minority communities, improve intervention strategies, and promote more equitable healthcare outcomes.
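    Prompt-based extraction of structured SDOH labels from free-text notes can be sketched as a prompt template plus schema validation of the model's reply. The category list, prompt wording, and stand-in response below are assumptions for illustration; the project itself evaluates actual GPT models on the All of Us dataset.

```python
import json

# Hypothetical SDOH categories; the real study defines its own taxonomy.
SDOH_CATEGORIES = ["housing", "employment", "food_insecurity", "social_support"]

def build_prompt(note: str) -> str:
    """Assemble an extraction prompt for one EHR note (illustrative wording)."""
    return (
        "Extract social determinants of health from the clinical note below.\n"
        f"Return JSON with keys {SDOH_CATEGORIES}, each set to 'present', "
        "'absent', or 'not mentioned'.\n\nNote:\n" + note
    )

def parse_response(raw: str) -> dict:
    """Validate a model response against the expected schema,
    defaulting missing categories to 'not mentioned'."""
    data = json.loads(raw)
    return {k: data.get(k, "not mentioned") for k in SDOH_CATEGORIES}

# Usage with a stand-in model reply (no API call is made here):
prompt = build_prompt("Patient reports stable housing but recent job loss.")
reply = '{"housing": "present", "employment": "absent"}'
labels = parse_response(reply)
```

Forcing responses into a fixed schema like this also makes the planned bias evaluation tractable: extraction rates per category can be compared directly across demographic groups.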

