From Cited Evidence Tables to Forest Plots: Meta-Analysis with AI-Extracted Data
Turn citation-backed systematic review extraction data into paper-ready outputs — forest plots in R, PRISMA tables, subgroup analyses, and living-review updates via MCP integration.
TL;DR
Once extraction data is structured, every section of a systematic review or meta-analysis becomes a column query: study characteristics, subgroup analyses, sensitivity checks, PRISMA tables, and forest plots. Instill AI Collection produces a citation-backed evidence table that can be sanity-checked in chat, exported to CSV, and analyzed reproducibly in R with metafor. AI accelerates the handoff; R remains the publication-ready analysis layer.
You’ve screened your papers. You’ve extracted data from each one. Your Instill AI Collection is full of sample sizes, effect sizes, risk-of-bias ratings, and outcome scores. Now comes the part that justifies all that work: writing the actual systematic review (SR) paper and running the meta-analysis (MA).
But here’s what makes this phase frustrating: an SR paper isn’t one big narrative. It’s made up of 15–20 small sections, and each section is essentially a different cross-analysis of the same extraction data. The “Characteristics of Included Studies” table pulls from one set of columns. The forest plot pulls from another. The sensitivity analysis pulls from yet another. In a traditional Excel workflow, every section means building a new pivot table, writing new formulas, or manually re-sorting the same spreadsheet in different ways.
This article shows how Instill AI Collection makes this entire phase faster and less error-prone. If you’re not yet familiar with how Collection works for screening and data extraction, we recommend reading “Systematic Review with AI: Screen and Extract Data from Research Papers in Minutes” first, because this article builds on the structured output from that workflow.
Key Takeaways
- Every section of an SR/MA paper is a cross-column query: the extraction table already contains all the raw material. Writing the paper becomes a matter of querying the right column pairs.
- Preview forest plots in chat, publish from R: the agent generates an inline preview for quick sanity-checking, then produces a reproducible metafor script for the publication-ready version.
- Minimal R pipeline: about 10 lines from CSV to a first forest plot. The CSV export feeds directly into metafor (version 4.8+). Your final analysis may still require model choices, sensitivity checks, and reviewer judgment.
- The structured data is reusable: via Model Context Protocol (MCP) integration, other AI tools can query your extraction data without manual export. Add a new paper, and the analysis updates automatically.
- Human-verified data, AI-accelerated analysis: the extraction table includes both AI-filled columns and human-judgment columns (Reviewer Confidence, Notes), ensuring the data feeding your meta-analysis has been human-reviewed.
Every Paper Section Maps to a Column Cross
Here’s something most researchers feel intuitively but rarely see stated explicitly: every section of a systematic review paper is a cross-analysis of two or more columns from the extraction table.
Let’s use a concrete example — a systematic review titled “Effect of Exercise Interventions on Major Depressive Disorder in Adults.” The extraction collection has 23 columns. Each paper section draws from a specific combination:
| Paper Section | Columns Used | Output |
|---|---|---|
| Table 1: Study Characteristics | First Author & Year × Country × Exercise Type × Sample Size × Mean Age × Depression Measure | The standard “characteristics of included studies” table |
| Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Flow Diagram | Screening Decision × Exclusion Reason | How many papers included/excluded and why |
| Intervention Summary | Exercise Type × Exercise Protocol × Sample Size | Studies used aerobic (n=5), resistance training (n=2), yoga (n=2)… |
| Overall Effect (Forest Plot) | Post Score × Standard Deviation (SD) × Sample Size (intervention + control) | The core meta-analytic result |
| Subgroup: By Exercise Type | Exercise Type × Effect Size | Aerobic exercise showed Standardized Mean Difference (SMD) = −0.72 while resistance training showed SMD = −0.55… |
| Subgroup: By Outcome Measure | Depression Measure × Effect Size | Do studies using Hamilton Depression Rating Scale (HAM-D) show different effects than those using Beck Depression Inventory-II (BDI-II)? |
| Sensitivity: Risk of Bias | Risk of Bias Notes × Effect Size | Does removing high-risk studies change the overall conclusion? |
| Publication Bias (Funnel Plot) | Effect Size × Sample Size | Visual test for publication bias |
| Risk of Bias Summary | Risk of Bias Notes (5 domains) × First Author | The red/yellow/green traffic light table |
| Inter-rater Reliability | Reviewer A columns × Reviewer B columns | Cohen’s κ for categorical data, Intraclass Correlation Coefficient (ICC) for continuous data |
A typical SR/MA paper that follows the PRISMA 2020 27-item checklist (Page et al., 2021, doi: 10.1136/bmj.n71) contains 15–20 such sections, each one a structured query against the same extraction table.
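To make one row of the table concrete, here is a minimal sketch of the Inter-rater Reliability cross for a categorical column, written in base R. The function name `cohen_kappa` and the ratings are made up for illustration; a real analysis would read the two reviewers' columns from their respective exports.

```r
# Cohen's kappa for two reviewers' categorical ratings (base R only).
cohen_kappa <- function(rater_a, rater_b) {
  stopifnot(length(rater_a) == length(rater_b))
  # Use a shared set of levels so the contingency table is square
  lv <- sort(unique(c(as.character(rater_a), as.character(rater_b))))
  tab <- table(factor(rater_a, levels = lv), factor(rater_b, levels = lv))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                       # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement by chance
  (p_observed - p_expected) / (1 - p_expected)
}

# Made-up risk-of-bias ratings for six studies (illustration only)
reviewer_a <- c("Low", "Low", "High", "Some concerns", "High", "Low")
reviewer_b <- c("Low", "Low", "High", "High", "High", "Low")

kappa_rob <- cohen_kappa(reviewer_a, reviewer_b)
print(round(kappa_rob, 3))
```

Kappa corrects raw agreement for the agreement two reviewers would reach by chance, which is why it is preferred over a simple percentage for categorical columns.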
Why This Matters for Collection Users
In a chat-based workflow, producing each section requires re-reading papers and re-asking questions. In a Collection-based workflow, the data is already there — you just query different column combinations.
For example, three sequential questions to the agent:
Group the included studies by Exercise Type. What’s the total sample size for each group?
This crosses Exercise Type × Sample Size — the raw material for the Intervention Summary section.
For the Aerobic group, which studies have high risk of bias?
This crosses Exercise Type (filtered to Aerobic) × Risk of Bias Notes — the foundation for a sensitivity analysis.
If I exclude those high-risk studies, what do the remaining effect sizes look like?
This is the actual sensitivity analysis — and the data to answer it is already structured in the collection.
Each question takes seconds to answer because the 345 data points are already extracted, structured, and queryable. No re-reading PDFs. No re-asking the AI to parse the same results table again.
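If you prefer to see those three questions as explicit queries, the sketch below reproduces them in base R on a toy table. The study labels and numbers are invented, and the `Sample_Size` and `Effect_Size` column names are assumptions about how the export might be labeled.

```r
# Toy extraction table with made-up values (not from any real study)
studies <- data.frame(
  First_Author_Year  = c("Study A 2019", "Study B 2020", "Study C 2021",
                         "Study D 2022", "Study E 2023"),
  Exercise_Type      = c("Aerobic", "Aerobic", "Aerobic", "Resistance", "Yoga"),
  Sample_Size        = c(60, 45, 80, 50, 40),
  Risk_of_Bias_Notes = c("Low risk", "High risk: no blinding", "Low risk",
                         "Some concerns", "Low risk"),
  Effect_Size        = c(-0.70, -1.10, -0.60, -0.50, -0.40),
  stringsAsFactors   = FALSE
)

# Question 1: total sample size per exercise type
totals <- aggregate(Sample_Size ~ Exercise_Type, data = studies, FUN = sum)

# Question 2: aerobic studies flagged as high risk of bias
aerobic <- subset(studies, Exercise_Type == "Aerobic")
high_risk <- subset(aerobic,
                    grepl("high risk", Risk_of_Bias_Notes, ignore.case = TRUE))

# Question 3: effect sizes that remain once those studies are excluded
remaining <- subset(aerobic, !First_Author_Year %in% high_risk$First_Author_Year)

print(totals)
print(high_risk$First_Author_Year)
print(remaining[, c("First_Author_Year", "Effect_Size")])
```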
From Collection to Forest Plot: The R Pipeline
The forest plot is the signature output of a meta-analysis — the figure that shows each study’s effect size and the pooled overall effect. Generating one requires exactly the columns that Collection extracts.
Step 1: Export Your Collection
Export the consensus extraction collection (the reconciled dataset from both reviewers) as CSV. The file will have columns like:
    First_Author_Year, Exercise_Type, Sample_Size_Intervention, Sample_Size_Control,
    Post_Score_Intervention, Post_Score_Control, SD_Post_Intervention, SD_Post_Control, ...
Step 2: Sanity-Check Your Data in Chat
Before opening RStudio, you can validate your extraction data without leaving the conversation. Mention your collection and ask the agent to analyze it — for example, checking data completeness, computing summary statistics, or previewing effect sizes.
The agent loads the collection as a dataframe, runs Python code on the spot, and reports results inline. In the demo below, we ask the agent to inspect the six columns required for meta-analysis and count how many studies have complete numeric data:
This kind of quick validation catches extraction errors early — missing values, text flags where numbers should be, or mismatched column names — before you commit to the formal analysis.
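The same completeness check can also be run on the exported CSV. The sketch below is one way to do it in base R, not a description of what the agent runs; the inline data frame stands in for `read.csv("extraction_export.csv")` and its values are made up.

```r
# The six numeric columns that the SMD calculation needs
required <- c("Sample_Size_Intervention", "Sample_Size_Control",
              "Post_Score_Intervention", "Post_Score_Control",
              "SD_Post_Intervention", "SD_Post_Control")

# Stand-in for the real export; values are made up. Columns are kept as
# text so that flags such as "CI reported: ..." survive the import.
export <- data.frame(
  First_Author_Year        = c("Study A 2019", "Study B 2020", "Study C 2021"),
  Sample_Size_Intervention = c("30", "25", "40"),
  Sample_Size_Control      = c("30", "20", "40"),
  Post_Score_Intervention  = c("12.1", "10.4", "14.0"),
  Post_Score_Control       = c("16.3", "15.9", "17.2"),
  SD_Post_Intervention     = c("4.2", "CI reported: [12.3, 18.7]", "5.1"),
  SD_Post_Control          = c("4.8", "5.5", NA),
  stringsAsFactors         = FALSE
)

# Mismatched or missing column names
missing_cols <- setdiff(required, names(export))

# Convert to numeric; anything that is not a clean number becomes NA
as_number <- function(x) suppressWarnings(as.numeric(x))
numeric_part <- as.data.frame(lapply(export[required], as_number))

# A study is usable only if all six values are present and numeric
complete <- complete.cases(numeric_part)

cat("Missing columns:", length(missing_cols), "\n")
cat("Studies with complete numeric data:", sum(complete), "of", nrow(export), "\n")
print(export$First_Author_Year[!complete])
```

Studies that fail the check are the ones to revisit in the Collection before any effect sizes are computed.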
You can go further: ask the agent to compute SMDs, generate a preview forest
plot, or produce a reproducible R script using metafor. The same collection
data feeds every request. Think of the chat as a scratchpad for exploration —
fast, interactive, and disposable.
Step 3: Run the Definitive Meta-Analysis in R
For the publication-ready analysis — the one reviewers and editors will
scrutinize — use metafor in R. The metafor package (Viechtbauer, 2010)
supports random-effects models, multilevel meta-analyses, meta-regression, and
over 20 visualization types including forest plots, funnel plots, and bubble
plots. The same CSV export feeds both the in-chat preview and the formal
pipeline. The code is remarkably short:
    library(metafor)

    data <- read.csv("extraction_export.csv")

    # Calculate standardized mean differences (adds yi and vi columns)
    data <- escalc(measure = "SMD",
                   m1i  = Post_Score_Intervention,
                   sd1i = SD_Post_Intervention,
                   n1i  = Sample_Size_Intervention,
                   m2i  = Post_Score_Control,
                   sd2i = SD_Post_Control,
                   n2i  = Sample_Size_Control,
                   data = data)

    # Random-effects meta-analysis (REML estimator by default)
    model <- rma(yi, vi, data = data)
    summary(model)

    # Forest plot: the core figure of any meta-analysis
    forest(model, slab = data$First_Author_Year)

    # Funnel plot: visual check for publication bias
    funnel(model)
The minimal pipeline is short — about 10 lines from CSV to a first forest plot.
The reason it’s so simple is that Collection already enforces the structure that
metafor expects — each column maps directly to a function parameter. Your
final analysis may still require model choices, sensitivity checks, and reviewer
judgment.
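To see what those column-to-parameter mappings compute, here is a hand calculation of the bias-corrected SMD (Hedges' g) and its large-sample variance in base R. It is a sketch that, to our understanding, mirrors the defaults of `escalc(measure = "SMD")`; treat exact agreement with metafor as something to verify on your own data. The function name `smd_by_hand` and the input numbers are made up.

```r
# Bias-corrected standardized mean difference (Hedges' g) and its
# large-sample sampling variance, from group-level summary statistics.
# Group 1 = intervention, group 2 = control.
smd_by_hand <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  d <- (m1 - m2) / sd_pooled                  # uncorrected Cohen's d
  # Exact small-sample correction factor; lgamma() avoids overflow
  j <- exp(lgamma(df / 2) - log(sqrt(df / 2)) - lgamma((df - 1) / 2))
  g <- j * d
  v <- 1 / n1 + 1 / n2 + g^2 / (2 * (n1 + n2))
  data.frame(d = d, yi = g, vi = v)
}

# Made-up post-intervention depression scores (lower is better)
es <- smd_by_hand(m1 = 12.1, sd1 = 4.2, n1 = 30,
                  m2 = 16.3, sd2 = 4.8, n2 = 30)
print(round(es, 4))
```

Because the intervention mean is subtracted first and lower depression scores are better, a negative `yi` favors exercise, matching the sign of the SMDs quoted earlier.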
The in-chat preview is for exploration and sanity-checking. The reproducible, publication-ready analysis lives in your R environment. This is by design: AI accelerates your workflow; it doesn’t replace the peer-reviewed statistical tooling your field already trusts.
Step 4: Subgroup and Sensitivity Analyses
Want to see if aerobic exercise works better than yoga? Add Exercise Type as a moderator:

    # Moderator (subgroup) analysis by Exercise Type
    model_sub <- rma(yi, vi, mods = ~ Exercise_Type, data = data)
    summary(model_sub)
Want to test whether the overall effect holds after removing high-risk studies? Filter and re-run:
    # Sensitivity: exclude high-risk studies
    low_risk <- subset(data, !grepl("High risk", Risk_of_Bias_Notes))
    model_sens <- rma(yi, vi, data = low_risk)
    forest(model_sens, slab = low_risk$First_Author_Year)
Each of these analyses maps directly to a section of the final paper. The data pipeline is: Collection → CSV → R → paper figure/table. The in-chat preview lets you iterate quickly on which columns and filters to use; the R script is the audit-ready artifact you submit.
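For intuition about what the sensitivity comparison reports, the sketch below pools effect sizes with and without the high-risk study using a hand-written DerSimonian-Laird estimator. This is a swapped-in technique: `rma()` defaults to REML, so the numbers can differ slightly from metafor's output. The function name `pool_dl` and all data are made up.

```r
# DerSimonian-Laird random-effects pooling in base R.
# Not the same tau^2 estimator as rma()'s default (REML).
pool_dl <- function(yi, vi) {
  k <- length(yi)
  stopifnot(k >= 2, length(vi) == k)
  w <- 1 / vi                                  # fixed-effect weights
  mu_fe <- sum(w * yi) / sum(w)
  q <- sum(w * (yi - mu_fe)^2)                 # Cochran's Q
  c_term <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (k - 1)) / c_term)       # between-study variance
  w_re <- 1 / (vi + tau2)                      # random-effects weights
  est <- sum(w_re * yi) / sum(w_re)
  se <- sqrt(1 / sum(w_re))
  c(estimate = est,
    ci_lb = est - qnorm(0.975) * se,
    ci_ub = est + qnorm(0.975) * se,
    tau2 = tau2, k = k)
}

# Made-up effect sizes and variances (illustration only)
toy <- data.frame(
  yi = c(-0.70, -1.10, -0.60, -0.50, -0.40),
  vi = c(0.07, 0.10, 0.05, 0.08, 0.11),
  Risk_of_Bias_Notes = c("Low risk", "High risk: no blinding", "Low risk",
                         "Some concerns", "Low risk"),
  stringsAsFactors = FALSE
)
low_risk_toy <- subset(toy, !grepl("High risk", Risk_of_Bias_Notes))

comparison <- rbind(
  all_studies   = pool_dl(toy$yi, toy$vi),
  low_risk_only = pool_dl(low_risk_toy$yi, low_risk_toy$vi)
)
print(round(comparison, 3))
```

The row to report in the paper is the difference between the two pooled estimates and whether their confidence intervals still exclude zero.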
The Before and After
Here’s how the full systematic review workflow compares:
| Step | Before (Excel + Covidence) | After (Instill AI Collection) |
|---|---|---|
| Read PDF and copy data to spreadsheet | Manual, ~30 min per paper | AI generates cited first-pass extraction; reviewers verify |
| Trace “where did this number come from?” | Not possible in Excel | Every value links to source paragraph |
| Dual-reviewer extraction | Two separate Excel files, manual comparison | Four Collections, same schema, automated diff |
| Cross-paper analysis | Write Excel formulas or pivot tables | Ask the agent in natural language |
| Prepare data for R | Manually reformat columns and clean data | Export CSV — columns already match metafor parameters |
| Add a new paper to the review | Re-do extraction from scratch | Add one row, autofill runs automatically |
| Generate PRISMA table | Manual formatting | Ask: “Generate a characteristics-of-included-studies table” |
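The counts behind a PRISMA flow diagram come from the Screening Decision × Exclusion Reason cross listed earlier. A minimal base R sketch, assuming the screening export uses the column names `Screening_Decision` and `Exclusion_Reason` and the made-up values shown here:

```r
# Made-up screening log (illustration only)
screening <- data.frame(
  Screening_Decision = c("Include", "Exclude", "Exclude",
                         "Include", "Exclude", "Include"),
  Exclusion_Reason   = c(NA, "Not an RCT", "Wrong population",
                         NA, "Not an RCT", NA),
  stringsAsFactors   = FALSE
)

# Numbers for the included/excluded boxes of the flow diagram
decision_counts <- table(screening$Screening_Decision)

# Reasons for exclusion, most frequent first
excluded <- subset(screening, Screening_Decision == "Exclude")
reason_counts <- sort(table(excluded$Exclusion_Reason), decreasing = TRUE)

print(decision_counts)
print(reason_counts)
```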
Living Reviews with MCP Integration
Traditional systematic reviews are static — published once, outdated within months as new studies appear. A “living systematic review” aims to keep the evidence current by continuously incorporating new research.
Instill AI Collection enables this through MCP integration. Other AI tools (Claude, ChatGPT, Cursor, or custom workflows) can query your Collection data programmatically:
Available via Instill AI MCP:
- query-collection: Retrieve extracted data with filters
- summarize-column: Get statistics per column
- aggregate-by-column: Group by exercise type, compute mean effect sizes
This means your extraction data becomes a live API endpoint. A research assistant using Claude can ask “What’s the current pooled effect size for aerobic exercise interventions?” and get an answer grounded in your structured, citation-backed data — without opening Instill AI or exporting a CSV.
When a new randomized controlled trial (RCT) is published, you add it as a row in the Collection. Autofill extracts the data. You verify the AI output and fill the human-judgment columns (Reviewer Confidence, Notes). The MCP-connected tools immediately see the updated dataset. The analysis stays current without rebuilding anything from scratch.
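On the R side of a living review, it helps to confirm which studies are new and whether they have been human-verified before re-running the analysis. The sketch below compares two export snapshots in base R; the `Reviewer_Confidence` column name and its values are assumptions, and the data are made up.

```r
# Made-up snapshots of two CSV exports (illustration only)
previous_export <- data.frame(
  First_Author_Year   = c("Study A 2019", "Study B 2020", "Study C 2021"),
  Reviewer_Confidence = c("High", "High", "Medium"),
  stringsAsFactors    = FALSE
)
current_export <- data.frame(
  First_Author_Year   = c("Study A 2019", "Study B 2020", "Study C 2021",
                          "Study D 2024"),
  Reviewer_Confidence = c("High", "High", "Medium", NA),
  stringsAsFactors    = FALSE
)

# Studies added since the last analysis run
new_studies <- setdiff(current_export$First_Author_Year,
                       previous_export$First_Author_Year)

# Rows whose human-judgment column is still empty
confidence <- current_export$Reviewer_Confidence
unverified <- current_export$First_Author_Year[is.na(confidence) | confidence == ""]

# Only verified rows go forward to the meta-analysis
ready <- subset(current_export, !First_Author_Year %in% unverified)

cat("New studies:", new_studies, "\n")
cat("Awaiting review:", unverified, "\n")
cat("Rows ready for analysis:", nrow(ready), "\n")
```

Gating on the human-judgment column keeps unreviewed AI output out of the pooled estimate even when the dataset updates automatically.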
FAQ
What are the best tools for meta-analysis?
For the statistical analysis itself, metafor (R), Stata’s metan, and RevMan
are the established tools. The bottleneck is getting data into those tools:
extracting numbers from PDFs into a structured format. Instill AI Collection
handles this extraction step — producing CSV output that feeds directly into
metafor with no reformatting.
How do I create a forest plot in R?
Export your Instill AI Collection as CSV, then use the metafor package:
escalc() to compute standardized mean differences, rma() for the
random-effects model, and forest() to render the plot. The minimal pipeline is
about 10 lines of R code — see the Step 3 section above for the complete
baseline script.
How do I use metafor for forest plots?
Install with install.packages("metafor"), load your CSV with read.csv(),
compute effect sizes with escalc(measure="SMD", ...), fit the model with
rma(yi, vi, data=data), and call forest(model, slab=data$First_Author_Year).
The key requirement is that your data has separate columns for means, SDs, and
sample sizes for each group, which is exactly what Collection’s extraction
columns produce.
What is the difference between a systematic review and a meta-analysis?
A systematic review is the complete research process: defining a question, searching databases, screening papers, extracting data, and synthesizing findings. A meta-analysis is the statistical component: pooling effect sizes across studies to compute an overall estimate. Not all systematic reviews include a meta-analysis (some synthesize findings narratively), and a credible meta-analysis is normally built on a systematic review so that the pooled studies are not cherry-picked. Instill AI Collection supports both: the extraction workflow produces the data, and the CSV export feeds the statistical analysis.
Can I update a meta-analysis when new studies are published?
Yes — this is what “living systematic reviews” are for. With Instill AI Collection, you add a new paper as a row, let autofill extract the data, verify the output, and the updated dataset is immediately available via MCP to any connected tool. Re-run your R script on the new CSV export to get an updated forest plot.
Try It Yourself
We’ve built out the full data extraction collections from the exercise-and-depression example — the same collections whose CSV export feeds the R pipeline above. Instill AI is currently in closed beta, so email hello@instill-ai.com for a guided walkthrough of these collections.
Compare Reviewer A and B side by side — the column schemas are identical, but open any column’s property panel and you’ll see different extraction instructions. For example, when a paper reports only confidence intervals instead of standard deviations, Reviewer A flags “CI reported: [12.3, 18.7]” while Reviewer B computes the SD from the CI. This controlled disagreement is how Collection operationalises the Cochrane dual-reviewer requirement.
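Reviewer B's conversion can be sketched as follows, assuming the interval is a symmetric confidence interval around a single group's mean. The function name `sd_from_ci` and the group size `n = 25` are made up; the interval is the one from the example flag above.

```r
# Recover a group's SD from a confidence interval around its mean.
# Assumes a symmetric interval; for small groups a t-quantile is the
# more careful choice (use_t = TRUE).
sd_from_ci <- function(lower, upper, n, level = 0.95, use_t = FALSE) {
  stopifnot(upper > lower, n > 1)
  p <- 1 - (1 - level) / 2
  crit <- if (use_t) qt(p, df = n - 1) else qnorm(p)
  se <- (upper - lower) / (2 * crit)   # standard error of the mean
  se * sqrt(n)                         # SD = SE * sqrt(n)
}

# CI from the example flag; n = 25 is a hypothetical group size
sd_normal <- sd_from_ci(12.3, 18.7, n = 25)
sd_t      <- sd_from_ci(12.3, 18.7, n = 25, use_t = TRUE)
print(round(c(normal = sd_normal, t = sd_t), 2))
```

The two variants give different SDs for the same interval, which is exactly the kind of choice worth recording in the Notes column when the reviewers reconcile.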
You can:
- Browse the extraction schema and chat with the Collection — ask questions like “which studies have the largest effect sizes?” or “compute the pooled SMD for aerobic studies” and see structured, cited answers.
- Clone it into your workspace — customise the extraction instructions for your own meta-analysis topic, add columns for your discipline’s specific outcome measures, and upload your own papers.
- Export as CSV — download the structured data and run the R pipeline above in your own environment.
Questions?
- Try the product directly — email us with your extraction protocol and we’ll set up a closed-beta workspace with a collection schema designed for your topic on the spot.
- Need more help? moto.mo@instill-ai.com or click the chat icon in the bottom-right corner of this page.
The collections in this article use freely available open-access research papers selected to simulate a real systematic review workflow. Some columns (Reviewer Confidence, Notes) are filled by human reviewers after AI extraction — not all columns are AI-autofilled. This hybrid design reflects how Collection works in practice: AI handles extraction, humans handle judgment.
_This article builds on “Systematic Review with AI: Screen and Extract Data from Research Papers in Minutes”, which covers the screening and data extraction workflow._