AI & Qualitative Research: How LLMs are Impacting Data Analysis & Citations

Eric Buckley LLM Citations
October 31st, 2025 13 minute read

Introduction: LLMs in Qualitative Research

Large Language Models (LLMs) are reshaping qualitative research by offering powerful tools for data analysis. Conversational AI automates tasks previously requiring extensive human effort, changing how researchers approach complex datasets. This shift brings both opportunities for efficiency and new considerations for research integrity and citation.

The integration of AI technology into qualitative methodologies promises to scale analysis, deepen insights, and democratize access to advanced research tools. Researchers now use LLMs for transcription, thematic coding, and summarization, tasks that traditionally consumed significant time. This guide explores the intersection of qualitative research and LLMs, focusing on how conversational AI transforms data analysis and the implications for citation practices.

Understanding the capabilities and limitations of LLMs is crucial for researchers. While these models offer speed and scale, they also introduce challenges related to bias, accuracy, and the need for rigorous human oversight. Navigating this evolving landscape requires a clear framework for responsible AI integration.

This article provides a comprehensive overview of how AI technology is influencing qualitative research, detailing its applications, benefits, challenges, and the necessary adjustments in academic writing and citation. It aims to equip researchers with the knowledge to effectively use LLMs while maintaining scholarly rigor.

LLM Market Growth and Accessibility

The market for Large Language Models is expanding rapidly, reflecting their growing adoption across industries, including research. This growth makes advanced AI tools accessible to a wider range of users, qualitative researchers among them.

The global LLM market was valued at $5.6 billion in 2024 and is projected to reach $7.4 billion in 2025 and $35.4 billion by 2030, a compound annual growth rate (CAGR) of 36.9%. North America, particularly the U.S., held the largest market share in 2024 at 32.1%, according to Grand View Research.
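As a quick arithmetic check on these figures, the sketch below compounds the 2025 value at the stated rate. It assumes the 36.9% CAGR applies to the five years from 2025 to 2030, which the text does not spell out; `project` is an illustrative helper, not part of any cited source.

```python
def project(base: float, cagr: float, years: int) -> float:
    """Compound `base` at annual rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years


# Assumption: growth is compounded from the 2025 figure ($7.4B) over 5 years.
value_2030 = project(7.4, 0.369, 5)
print(f"Projected 2030 market: ${value_2030:.1f}B")  # roughly $35.6B
```

The result lands close to the quoted $35.4 billion; the small gap is consistent with rounding in the published inputs.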

Access to these tools has also improved dramatically. The cost of querying an AI system with GPT-3.5-level performance dropped from about $20 per million tokens in late 2022 to just a few cents by mid-2024, as detailed by Lumivero. This reduction in cost makes sophisticated qualitative analysis tools available to researchers outside of major tech companies and well-funded institutions.

The performance of leading AI models on complex tasks also jumped substantially. Between 2023 and 2024, the share of problems they solved on major benchmarks rose from 4.4% to 71.7%, according to Lumivero. The performance gap between proprietary models like OpenAI’s GPT-4 and leading open-source models has nearly closed, with only a 1.7% difference by early 2025. This convergence gives researchers more tools to choose from.

Transforming Data Analysis with Conversational AI

Conversational AI, powered by LLMs, is fundamentally changing how qualitative data analysis is conducted. These tools automate numerous tasks, allowing researchers to process large volumes of data more efficiently and identify patterns that might be missed by human analysts alone.

LLMs are widely applied for qualitative analysis in fields such as healthcare, education, and social sciences, automating processes traditionally requiring significant human effort, as discussed in a systematic review on arXiv. These capabilities are now available to non-specialists, with both open-source and proprietary models proving viable for qualitative analysis, as Lumivero highlights.

Specific applications of LLMs in qualitative data analysis include:

Transcription: Converting audio or video recordings of interviews and focus groups into text. This saves hours of manual labor.
Translation: Facilitating research across linguistic barriers by translating qualitative data. This expands the scope of global studies.
Sentiment Analysis: Identifying emotional tones and attitudes within text data. This helps gauge participant perceptions at scale.
Summarization: Condensing lengthy qualitative documents, such as interview transcripts or open-ended survey responses, into concise summaries. This aids in quick review and theme identification.
Thematic Coding: Automatically identifying, categorizing, and organizing themes and patterns within textual data. This accelerates the initial stages of analysis.

In a thematic analysis of online nursing forums, LLM and human coders agreed on theme identification in roughly 80% of cases (247 of 310), and the LLMs provided additional subthemes and depth not initially identified by humans, according to a study in JMIR AI. In two-thirds of topics, human and LLM interpretations converged, demonstrating the models' ability to match human judgment in many instances.
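To make the agreement figure concrete, here is a minimal sketch of how coder agreement can be computed. The labels are invented for illustration, and Cohen's kappa is included as a common chance-corrected companion to raw agreement; the study as described above reports only the agreement rate.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    p_expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical theme labels for five forum posts (not data from the study).
human = ["support", "burnout", "support", "staffing", "burnout"]
llm = ["support", "burnout", "staffing", "staffing", "burnout"]

print(f"Agreement: {percent_agreement(human, llm):.0%}")
print(f"Cohen's kappa: {cohens_kappa(human, llm):.2f}")
print(f"Reported figure: {247 / 310:.1%}")  # 247 of 310 cases
```

Raw agreement is easy to read but can look high when one theme dominates, which is why a chance-corrected statistic is often reported alongside it.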

Challenges and Ethical Considerations

Despite the advancements, integrating LLMs into qualitative research presents several challenges and ethical dilemmas. Researchers must navigate these issues carefully to maintain the integrity and trustworthiness of their studies.

Persistent issues include reliance on prompt engineering, occasional inaccuracies, contextual limitations, and concerns about bias, toxicity, and the “surrogate effect.” The “surrogate effect” refers to using LLMs as proxies for human participants, raising ethical and epistemological questions, as discussed in a systematic review and ACM proceedings. Model “hallucinations,” where LLMs generate plausible but incorrect information, and the continued need for human oversight remain significant barriers, as noted by AIMultiple.

Key challenges for researchers using LLMs:

Prompt Engineering: The quality of LLM output heavily depends on the precision and clarity of the input prompts. Crafting effective prompts requires skill and iterative refinement.
Accuracy and Hallucinations: LLMs can sometimes generate information that is factually incorrect or inconsistent with the source data. Human researchers must verify all AI-generated insights.
Contextual Limitations: LLMs may struggle with nuanced interpretations, cultural contexts, or implicit meanings that human qualitative researchers readily grasp. They lack lived experience.
Bias and Toxicity: LLMs are trained on vast datasets that can contain societal biases. These biases may be reflected in the AI's analysis, potentially leading to skewed or unfair interpretations.
Data Privacy and Security: Researchers must ensure that sensitive qualitative data is handled securely when processed by LLMs, especially when using third-party services.
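One simple guard against hallucinated evidence is to check that every quote an LLM attributes to a participant actually appears in the source transcript. The sketch below is one way to do that under stated assumptions: the function names and sample data are hypothetical, and matching is exact after normalizing case and whitespace, so paraphrased quotes are flagged for human review rather than accepted.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(quotes: list[str], transcript: str) -> list[str]:
    """Return the quotes that cannot be found verbatim in the transcript."""
    source = normalize(transcript)
    return [q for q in quotes if normalize(q) not in source]


# Hypothetical transcript and LLM-supplied quotes, for illustration only.
transcript = "I felt supported by my team.\nThe night shifts were exhausting."
llm_quotes = ["felt supported by my team", "management ignored us"]

print(unverified_quotes(llm_quotes, transcript))
```

Anything this check returns still needs a human decision: the quote may be fabricated, or it may be a loose paraphrase of something the participant did say.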

Researchers from MIT, Cornell, Rutgers, and Michigan highlighted in recent interviews the urgent need for norms and tooling to guide appropriate use of LLMs in qualitative research. Such norms would help preserve research integrity and participant trust, as detailed in a discussion on arXiv. Ethical considerations extend to obtaining participant consent for AI use and monitoring potential biases introduced by LLMs, as suggested by Nature.

Implementing LLMs in Research Workflows

Integrating LLMs into qualitative research workflows requires a strategic approach that combines AI capabilities with human expertise. The most robust qualitative research often involves a hybrid approach, where LLMs assist with initial coding, summarization, and thematic analysis, but human researchers remain essential for interpretation, validation, and ensuring ethical rigor, as noted in a systematic review and JMIR AI.

How to effectively implement LLMs in qualitative research:

Initial Data Processing: Use LLMs for automated transcription of interviews or focus groups. This converts spoken data into text quickly.
Preliminary Coding and Theme Generation: Apply LLMs to generate initial codes or identify emergent themes from large datasets. This accelerates the first pass of analysis.
Summarization of Transcripts: Employ LLMs to create concise summaries of individual interviews or documents. This helps researchers quickly grasp key points.
Deductive Coding Application: LLMs can effectively reuse prior coding schemes deductively, applying existing categories to new data. This ensures consistency across datasets.
Analytic Table Generation: Use LLMs to systematically generate analytic tables and illustrative quotes. This enhances traceability and transparency of findings, as suggested by Sage Journals.
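The deductive coding step above can be sketched as a prompt template that pairs an existing codebook with a new excerpt. This is only an illustration: `build_deductive_prompt`, the codebook entries, and the instruction wording are assumptions, and the resulting string would be sent to whichever model a team has chosen (no specific API is shown).

```python
def build_deductive_prompt(codebook: dict[str, str], excerpt: str) -> str:
    """Assemble a prompt asking a model to apply an existing codebook to new text."""
    code_lines = [f"- {name}: {definition}" for name, definition in codebook.items()]
    return (
        "Apply the codebook below to the excerpt. "
        "Use only the listed codes and quote the supporting text for each.\n\n"
        "Codebook:\n" + "\n".join(code_lines) + "\n\n"
        f"Excerpt:\n{excerpt}\n"
    )


# Hypothetical codebook carried over from an earlier study.
codebook = {
    "workload": "Mentions of staffing levels, hours, or task volume",
    "peer_support": "Help or encouragement received from colleagues",
}
prompt = build_deductive_prompt(codebook, "We were short three nurses all week.")
print(prompt)
```

Keeping the template in code means the exact prompt text is versioned with the analysis, which makes it easy to include in an appendix later.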

While LLMs can accelerate coding, they code each interview in isolation, so the same concept can surface under several differently worded codes. Human intervention is needed to identify and merge these duplicates. This hybrid human-AI approach yields a deduplicated codebook and accelerates the coding process while maintaining quality standards, according to insights from the Qualitative Research Forum. Human review also remains necessary to ensure contextual accuracy.

This collaborative model ensures that the speed and scale of AI are balanced with the critical thinking, ethical judgment, and nuanced understanding that only human researchers can provide. It is a practical strategy for integrating AI into modern qualitative studies.
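A first pass at spotting duplicated codes can be automated before human review. The sketch below groups codes whose wording is nearly identical, using Python's standard `difflib`; the threshold of 0.85 and the sample codes are assumptions, and string similarity will not catch synonyms, so this narrows the reviewer's task rather than replacing it.

```python
from difflib import SequenceMatcher


def group_similar_codes(codes: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group codes whose normalized text is at least `threshold` similar.

    Each code joins the first existing group whose first member it resembles;
    otherwise it starts a new group.
    """
    groups: list[list[str]] = []
    for code in codes:
        key = code.strip().lower()
        for group in groups:
            anchor = group[0].strip().lower()
            if SequenceMatcher(None, key, anchor).ratio() >= threshold:
                group.append(code)
                break
        else:
            groups.append([code])
    return groups


# Hypothetical codes produced from separate interviews.
codes = ["Staff shortages", "staff shortage", "Emotional exhaustion", "Peer support"]
for group in group_similar_codes(codes):
    print(group)
```

Groups with more than one member are candidates for merging; a researcher still decides which label to keep and whether the codes truly mean the same thing.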

Evolving Citation Practices for AI-Assisted Research

The increasing use of LLMs in qualitative research necessitates a re-evaluation of traditional citation practices. Transparency and reproducibility become paramount when AI tools contribute to data analysis and interpretation.

As LLMs become integral to qualitative workflows, there is a growing need for clear documentation of prompt design, model configuration, and the role of AI in the analysis process to maintain research integrity and support reproducibility, as highlighted in a systematic review. This means explicitly disclosing LLM involvement and human oversight to preserve scholarly rigor, as noted by Nature.

Guidelines for citing AI contributions:

Explicit Disclosure: Clearly state which LLM was used (e.g., GPT-4, Llama 2), its version, and the specific tasks it performed (e.g., transcription, thematic coding, summarization).
Prompt Documentation: Include the exact prompts used to generate AI outputs, either in the main text, an appendix, or supplementary materials. This allows others to replicate the AI's contribution.
Human Oversight Description: Detail the extent of human review, validation, and intervention applied to AI-generated results. This clarifies the human-AI collaboration.
Ethical Statement: Address how ethical considerations, such as bias mitigation and data privacy, were managed in the context of AI use.
Data Traceability: Explain how AI-generated insights were linked back to the original qualitative data, ensuring transparency in the analytical process.
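The guidelines above can be captured in a small structured record that travels with the manuscript as supplementary material. The sketch below is one possible shape, not a standard: the class name, field names, and all sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AIUseDisclosure:
    """One record per model used; fields mirror the guidelines listed above."""
    model: str
    version: str
    tasks: list[str]
    prompts: list[str]
    human_oversight: str
    ethics_note: str
    traceability_note: str


# All values below are placeholders for illustration.
record = AIUseDisclosure(
    model="GPT-4",
    version="<exact version identifier used>",
    tasks=["thematic coding", "summarization"],
    prompts=["Apply the codebook below to the excerpt..."],
    human_oversight="Two researchers reviewed and revised every AI-generated code.",
    ethics_note="Transcripts were de-identified before processing.",
    traceability_note="Each theme links to transcript IDs and line ranges.",
)

print(json.dumps(asdict(record), indent=2))
```

Serializing to JSON keeps the disclosure machine-readable, so the same record can feed a methods section, an appendix, or a repository README.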

The trend toward integrating external, up-to-date data sources, such as Microsoft Copilot’s use of live internet data with GPT-4, enables fact-checking and citation that go beyond static, pre-trained knowledge, as discussed by AIMultiple. This practice is not yet mature, but it is a focus of current research and will further shape citation norms. Citation practices must evolve to acknowledge AI contributions in a way that reflects the role of LLMs as tools or co-analyzers rather than autonomous researchers, ensuring transparency and reproducibility, according to Nature and arXiv.

Case Studies: Real-World Applications

Examining real-world applications demonstrates the practical impact of LLMs on qualitative research across various domains. These examples illustrate how LLM-based tools are being deployed to enhance research efficiency and depth.

One notable application comes from Abertay University, where De Paoli used GPT-3.5 for thematic analysis of semi-structured interviews. This work generated coherent user personas, significantly streamlining a complex phase of User-Centered Design processes, as detailed in Sage Journals. This represents a methodological innovation where LLMs contribute creatively beyond mere coding.

Another example involves market research firms deploying LLM-powered chatbots. These firms have scaled qualitative data collection efficiently, automating participant interaction while preserving the richness of responses, as reviewed in ACM proceedings. This approach allows for broader reach and faster data acquisition without sacrificing the depth characteristic of qualitative inquiry.

The 2025 JMIR AI study provides further evidence, reporting roughly 80% thematic concordance between LLM and human interpretation of expert nurse forum texts. The LLMs identified themes reliably and also surfaced complementary subthemes that human analysts might overlook. Businesses implementing LLM-augmented analysis gain considerable time savings and scalability while maintaining or enhancing analytic quality, according to Sage Journals and ACM proceedings.

These case studies underscore the practical benefits of integrating LLMs into qualitative research, from automating routine tasks to assisting in complex analytical processes like persona generation and thematic discovery. They highlight the potential for best practices in AI use to drive innovation in research.

Future Outlook: Sustainability and Innovation

The rapid evolution of LLMs points to a future where AI plays an even more central role in qualitative research. This future also brings important considerations regarding sustainability and the ongoing need for innovation.

The volume of AI research has tripled since 2013, with a corresponding surge in computational demands for training and running LLMs. This raises sustainability concerns about energy use and environmental impact, as highlighted by Lumivero. As models grow larger and more complex, their carbon footprint becomes a significant factor researchers and developers must address.

The democratization of tools is a key trend. The proliferation of open-source and affordable AI models means that qualitative researchers outside of Big Tech can now access state-of-the-art tools. However, they must navigate the trade-offs between cost, performance, and transparency, as discussed in Lumivero's blog and arXiv. This accessibility fosters innovation but also requires careful evaluation of each tool's suitability and ethical implications.

Key areas for future innovation and development:

Improved Contextual Understanding: Future LLMs will likely offer more sophisticated contextual understanding, reducing errors and improving the nuance of qualitative analysis.
Enhanced Bias Mitigation: Ongoing research aims to develop more effective methods for identifying and mitigating biases embedded in LLM training data and outputs.
User-Friendly Interfaces: The development of intuitive interfaces will make LLMs more accessible to researchers without extensive technical expertise, broadening their adoption.
Real-time Fact-Checking: As mentioned by AIMultiple, the integration of real-time external data sources will allow LLMs to provide more accurate and up-to-date information, reducing hallucinations.
Ethical AI Frameworks: The development of robust ethical AI frameworks and guidelines will ensure responsible use of LLMs in sensitive research contexts.

The future of qualitative research with LLMs involves a continuous dialogue between technological advancement and ethical considerations. Balancing the power of AI with the core values of qualitative inquiry will define its trajectory.

Conclusion

Large Language Models are rapidly transforming qualitative research, offering new capabilities for automating and scaling data analysis tasks. From transcription and summarization to thematic coding, conversational AI tools are becoming widely used, matching or complementing human judgment in many cases, as evidenced by studies showing high agreement rates between LLMs and human coders. The market for these tools is expanding rapidly, making them more accessible and affordable for researchers globally.

This shift, while promising, brings new challenges that demand careful consideration. Ethical concerns regarding bias, data privacy, and the "surrogate effect" require robust frameworks and diligent human oversight. Researchers must also adapt their workflows to a hybrid human-AI model, where AI handles initial processing and theme generation, while human expertise remains critical for nuanced interpretation, validation, and ensuring the ethical integrity of the study. The need for precise prompt engineering and vigilance against AI hallucinations underscores the ongoing importance of human involvement.

The implications for citation practices are profound. Transparency is key, requiring explicit disclosure of the LLM used, its version, the exact prompts, and the extent of human intervention. This supports reproducibility and maintains scholarly rigor in an era where AI acts as a co-analyzer. As the technology continues to evolve, with advancements in real-time fact-checking and bias mitigation, the qualitative research community must remain proactive in developing best practices and ethical guidelines.

Ultimately, the intersection of qualitative research and LLMs represents a powerful opportunity to expand analytical capacity and deepen insights, provided researchers approach this integration with a critical, ethical, and transparent mindset. This collaborative future promises to enhance the quality and scope of qualitative scholarship, pushing the boundaries of what is possible in understanding human experience.

FAQs

How do LLMs assist in qualitative data analysis?

LLMs automate tasks like transcription, translation, sentiment analysis, summarization, and thematic coding. This helps researchers process large datasets quickly, identify patterns, and generate initial insights, significantly reducing manual effort in the early stages of analysis.

What are the main benefits of using conversational AI in qualitative research?

The main benefits include increased efficiency in data processing, scalability for large datasets, the ability to uncover subtle themes, and democratized access to advanced analytical tools. LLMs can also accelerate the generation of initial codes and summaries, freeing up human researchers for deeper interpretation.

Why should researchers be cautious when using LLMs for qualitative analysis?

Researchers should be cautious due to potential issues like model hallucinations, inherent biases in training data, contextual limitations, and the need for precise prompt engineering. Human oversight is essential to verify accuracy, ensure ethical data handling, and provide nuanced interpretation.

When should human researchers intervene in AI-assisted qualitative analysis?

Human researchers must intervene at all critical stages: defining research questions, designing prompts, validating AI-generated codes and themes, interpreting findings, and ensuring ethical compliance. They are crucial for addressing AI inaccuracies, biases, and contextual gaps.

How do LLMs impact the reproducibility of qualitative research?

LLMs can enhance reproducibility if researchers clearly document the specific model used, its version, and the exact prompts. This transparency allows other researchers to replicate the AI's contribution. Without such documentation, AI use can hinder reproducibility.

What are the ethical considerations for using LLMs with sensitive qualitative data?

Ethical considerations include obtaining informed consent from participants for AI processing, ensuring data privacy and security, mitigating algorithmic bias, and avoiding the "surrogate effect," where LLMs stand in as proxies for human participants and their experiences. Researchers must prioritize participant protection.

How should LLM contributions be cited in academic papers?

Citation practices should include explicit disclosure of the LLM model and version, detailed documentation of prompts, and a description of the human oversight applied. This ensures transparency and acknowledges the AI's role as a tool rather than an autonomous researcher.

Can LLMs generate new qualitative insights or only process existing data?

LLMs can identify subtle patterns and subthemes that human coders might miss, effectively generating new insights from existing data. For example, a JMIR AI study found LLMs provided additional depth in thematic analysis. However, human interpretation is still needed to contextualize and validate these insights.

What is the "surrogate effect" in LLM-assisted qualitative research?

The "surrogate effect" refers to the risk of using LLMs as proxies for human participants or their experiences. This raises epistemological questions about whether AI can truly represent human perspectives, potentially eroding the core qualitative value of understanding lived experience directly.

What is the market growth trajectory for LLMs in the coming years?

The global LLM market is projected to grow from $5.6 billion in 2024 to $35.4 billion by 2030, with a CAGR of 36.9%. This indicates significant expansion and increasing adoption across various sectors, including research, making LLMs a growing area of investment and development.

How has the cost of using LLMs changed, and what does this mean for researchers?

The cost of querying LLMs has dropped significantly, with GPT-3.5-level performance costing just a few cents per million tokens by mid-2024, down from about $20 in late 2022. This makes advanced qualitative analysis tools more accessible to a broader range of researchers and institutions.

What role does prompt engineering play in LLM-assisted qualitative research?

Prompt engineering is crucial because the quality and relevance of LLM outputs directly depend on the clarity and specificity of the input prompts. Researchers must develop skill in crafting effective prompts to guide the AI towards desired analytical outcomes and minimize irrelevant or inaccurate results.

Are open-source LLMs as effective as proprietary models for qualitative analysis?

The performance gap between proprietary and leading open-source models has nearly closed, with only a 1.7% difference by early 2025. This means open-source LLMs are increasingly viable for qualitative analysis, offering researchers powerful alternatives with potentially greater transparency and customization options.

How can LLMs help in generating user personas from qualitative data?

LLMs can analyze interview transcripts and other qualitative data to identify common characteristics, behaviors, and motivations, then synthesize this information into coherent user personas. This streamlines the persona generation process, as demonstrated by Abertay University's use of GPT-3.5 for User-Centered Design.

What are the sustainability concerns associated with LLM use in research?

The significant computational demands for training and running LLMs raise concerns about energy consumption and environmental impact. With the volume of AI research having tripled since 2013, the carbon footprint of these models becomes a critical sustainability issue that requires attention from developers and researchers alike.
