From Structured Data to Forest Plot: Powering Meta-Analysis with AI-Extracted Evidence
Turn AI-extracted systematic review data into paper-ready outputs — forest plots in R, PRISMA tables, subgroup analyses, and a living review via MCP integration.
You’ve screened your papers. You’ve extracted data from each one. Your spreadsheet is full of sample sizes, effect sizes, risk-of-bias ratings, and outcome scores. Now comes the part that justifies all that work: writing the actual systematic review (SR) paper and running the meta-analysis (MA).
But here’s what makes this phase frustrating: a systematic review paper isn’t one big narrative. It’s made up of 15–20 small sections, and each section is essentially a different cross-analysis of the same extraction data. The “Characteristics of Included Studies” table pulls from one set of columns. The forest plot pulls from another. The sensitivity analysis pulls from yet another. In a traditional Excel workflow, every section means building a new pivot table, writing new formulas, or manually re-sorting the same spreadsheet in different ways.
This article shows how Instill AI Collection makes this entire phase faster and less error-prone. If you’re not yet familiar with how Collection works for screening and data extraction, we recommend reading Systematic Review with AI: Screen and Extract Data from Research Papers in Minutes first — this article builds on the structured output from that workflow.
Key Takeaways
- Every section of an SR/MA paper is a cross-column query — the extraction table already contains all the raw material. Writing the paper becomes a matter of querying the right column pairs.
- Export to R in 10 lines of code — CSV export feeds directly into the metafor package for forest plots, funnel plots, and subgroup analyses.
- The structured data is reusable — via MCP integration, other AI tools can query your extraction data without manual export. Add a new paper, and the analysis updates automatically.
Every Paper Section Maps to a Column Cross
Here’s something most researchers feel intuitively but rarely see stated explicitly: every section of a systematic review paper is a cross-analysis of two or more columns from the extraction table.
Let’s use a concrete example — a systematic review titled “Effect of Exercise Interventions on Major Depressive Disorder in Adults.” The extraction collection has 23 columns. Each paper section draws from a specific combination:
| Paper Section | Columns Used | Output |
|---|---|---|
| Table 1: Study Characteristics | First Author & Year × Country × Exercise Type × Sample Size × Mean Age × Depression Measure | The standard “characteristics of included studies” table |
| PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Flow Diagram | Screening Decision × Exclusion Reason | How many papers included/excluded and why |
| Intervention Summary | Exercise Type × Exercise Protocol × Sample Size | “Studies used aerobic (n=5), resistance training (n=2), yoga (n=2)…” |
| Overall Effect (Forest Plot) | Post Score × SD × Sample Size (intervention + control) | The core meta-analytic result |
| Subgroup: By Exercise Type | Exercise Type × Effect Size | “Aerobic exercise showed a standardized mean difference (SMD) of …, while resistance training showed an SMD of …” |
| Subgroup: By Outcome Measure | Depression Measure × Effect Size | Do studies using the Hamilton Depression Rating Scale (HAM-D) show different effects than those using the Beck Depression Inventory-II (BDI-II)? |
| Sensitivity: Risk of Bias | Risk of Bias Notes × Effect Size | Does removing high-risk studies change the overall conclusion? |
| Publication Bias (Funnel Plot) | Effect Size × Sample Size | Visual test for publication bias |
| Risk of Bias Summary | Risk of Bias Notes (5 domains) × First Author | The red/yellow/green traffic light table |
| Inter-rater Reliability | Reviewer A columns × Reviewer B columns | Cohen’s kappa (κ) for categorical data, intraclass correlation coefficient (ICC) for continuous data |
A typical SR/MA paper contains 15–20 such sections, each one a structured query against the same extraction table.
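The inter-rater reliability row above can be sketched in a few lines of R. This is a minimal illustration, assuming the two reviewers’ categorical decisions are exported side by side as columns named `Rating_A` and `Rating_B` (hypothetical names — adjust to your export) and that the `irr` package is installed:

```r
library(irr)

# Toy side-by-side export of both reviewers' categorical screening decisions
ratings <- data.frame(
  Rating_A = c("Include", "Include", "Exclude", "Include", "Exclude"),
  Rating_B = c("Include", "Exclude", "Exclude", "Include", "Exclude")
)

# Cohen's kappa: chance-corrected agreement for categorical ratings
kappa2(ratings)

# For continuous extractions (e.g. mean post scores), the ICC is the analogue:
# icc(scores, model = "twoway", type = "agreement")
```

Here `kappa2()` expects a data frame with one column per rater; the commented-out `icc()` call shows the continuous-data counterpart mentioned in the table.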
Why This Matters for Collection Users
In a chat-based workflow, producing each section requires re-reading papers and re-asking questions. In a Collection-based workflow, the data is already there — you just query different column combinations.
For example, three sequential questions to the agent:
“Group the included studies by Exercise Type. What’s the total sample size for each group?”
This crosses Exercise Type × Sample Size — the raw material for the Intervention Summary section.
“For the Aerobic group, which studies have high risk of bias?”
This crosses Exercise Type (filtered to Aerobic) × Risk of Bias Notes — the foundation for a sensitivity analysis.
“If I exclude those high-risk studies, what do the remaining effect sizes look like?”
This is the actual sensitivity analysis — and the data to answer it is already structured in the collection.
Each question takes seconds to answer because the 345 data points are already extracted, structured, and queryable. No re-reading PDFs. No re-asking the AI to parse the same results table again.
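For readers who prefer to see the queries on the exported data directly, here is a base-R sketch of the same three questions. The toy rows stand in for the real CSV export; `Effect_Size` is an assumed pre-computed column (in the pipeline below it is derived with `escalc`), and the study names are placeholders:

```r
# Toy rows standing in for read.csv("extraction_export.csv")
data <- data.frame(
  First_Author_Year        = c("Smith 2019", "Lee 2020", "Garcia 2021", "Chen 2022"),
  Exercise_Type            = c("Aerobic", "Aerobic", "Yoga", "Aerobic"),
  Sample_Size_Intervention = c(30, 25, 20, 40),
  Risk_of_Bias_Notes       = c("Low risk", "High risk: no blinding",
                               "Low risk", "Low risk"),
  Effect_Size              = c(-0.6, -0.3, -0.4, -0.7)  # hypothetical column
)

# Q1: total sample size per exercise type
aggregate(Sample_Size_Intervention ~ Exercise_Type, data = data, FUN = sum)

# Q2: high-risk studies within the Aerobic group
aerobic <- subset(data, Exercise_Type == "Aerobic")
subset(aerobic, grepl("High risk", Risk_of_Bias_Notes))$First_Author_Year

# Q3: effect sizes after excluding all high-risk studies
low_risk <- subset(data, !grepl("High risk", Risk_of_Bias_Notes))
summary(low_risk$Effect_Size)
```

Each query is one or two lines because the structure — one row per study, one column per extracted field — is already in place.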
From Collection to Forest Plot: The R Pipeline
The forest plot is the signature output of a meta-analysis — the figure that shows each study’s effect size and the pooled overall effect. Generating one requires exactly the columns that Collection extracts.
Step 1: Export Your Collection
Export the consensus extraction collection (the reconciled dataset from both reviewers) as CSV. The file will have columns like:
```
First_Author_Year, Exercise_Type, Sample_Size_Intervention, Sample_Size_Control,
Post_Score_Intervention, Post_Score_Control, SD_Post_Intervention, SD_Post_Control, ...
```
Step 2: Run the Meta-Analysis in R
The metafor package is the standard tool for meta-analysis in R. With your
Collection export, the code is remarkably short:
```r
library(metafor)

data <- read.csv("extraction_export.csv")

# Calculate standardized mean differences
data <- escalc(measure = "SMD",
               m1i = Post_Score_Intervention,
               sd1i = SD_Post_Intervention,
               n1i = Sample_Size_Intervention,
               m2i = Post_Score_Control,
               sd2i = SD_Post_Control,
               n2i = Sample_Size_Control,
               data = data)

# Random-effects meta-analysis
model <- rma(yi, vi, data = data)
summary(model)

# Forest plot — the core figure of any meta-analysis
forest(model, slab = data$First_Author_Year)

# Funnel plot — visual check for publication bias
funnel(model)
```
That’s it. Ten lines from CSV to forest plot. The reason it’s so simple is that
Collection already enforces the structure that metafor expects — each column
maps directly to a function parameter.
Step 3: Subgroup and Sensitivity Analyses
Want to see if aerobic exercise works better than yoga? Add one line:
```r
# Subgroup analysis by Exercise Type (moderator model)
model_sub <- rma(yi, vi, mods = ~ Exercise_Type, data = data)
summary(model_sub)
```
Want to test whether the overall effect holds after removing high-risk studies? Filter and re-run:
```r
# Sensitivity: exclude high-risk studies
low_risk <- subset(data, !grepl("High risk", Risk_of_Bias_Notes))
model_sens <- rma(yi, vi, data = low_risk)
forest(model_sens, slab = low_risk$First_Author_Year)
```
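metafor also ships a built-in leave-one-out variant of this check, which refits the model once per study and flags any single study that drives the pooled estimate. A self-contained sketch, using toy effect sizes (`yi`) and variances (`vi`) in place of the real export:

```r
library(metafor)

# Toy effect sizes and sampling variances standing in for the real data
toy <- data.frame(yi = c(-0.8, -0.5, -0.6, -0.2),
                  vi = c(0.04, 0.05, 0.03, 0.06))
model <- rma(yi, vi, data = toy)

# Refit the model k times, dropping one study each time;
# large swings in the estimate point to influential studies
leave1out(model)
```

In a real analysis you would run `leave1out()` on the model fitted from the Collection export, alongside the risk-of-bias filtering shown above.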
Each of these analyses maps directly to a section of the final paper. The data pipeline is: Collection → CSV → R → paper figure/table.
The Before and After
Here’s how the full systematic review workflow compares:
| Step | Before (Excel + Covidence) | After (Instill AI Collection) |
|---|---|---|
| Read PDF and copy data to spreadsheet | Manual, ~30 min per paper | AI extracts with citations, ~2 min per paper |
| Trace “where did this number come from?” | Not possible in Excel | Every value links to source paragraph |
| Dual-reviewer extraction | Two separate Excel files, manual comparison | Four Collections, same schema, automated diff |
| Cross-paper analysis | Write Excel formulas or pivot tables | Ask the agent in natural language |
| Prepare data for R | Manually reformat columns and clean data | Export CSV — columns already match metafor parameters |
| Add a new paper to the review | Re-do extraction from scratch | Add one row, autofill runs automatically |
| Generate PRISMA table | Manual formatting | Ask: “Generate a characteristics-of-included-studies table” |
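The same export can also feed the characteristics table directly. A sketch using `knitr::kable`, with toy rows standing in for the CSV and a subset of the column names shown earlier (extend the selection with `Country`, `Mean_Age`, and so on from your own schema):

```r
library(knitr)

# Toy stand-in for read.csv("extraction_export.csv")
data <- data.frame(
  First_Author_Year        = c("Smith 2019", "Lee 2020"),
  Exercise_Type            = c("Aerobic", "Yoga"),
  Sample_Size_Intervention = c(30, 20),
  Sample_Size_Control      = c(28, 22)
)

# Select the Table 1 columns
table1 <- data[, c("First_Author_Year", "Exercise_Type",
                   "Sample_Size_Intervention", "Sample_Size_Control")]

# Render a markdown table ready to paste into the manuscript
kable(table1, format = "markdown")
```

One `kable()` call replaces the manual formatting pass in the Excel workflow.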
Living Reviews with MCP Integration
Traditional systematic reviews are static — published once, outdated within months as new studies appear. A “living systematic review” aims to keep the evidence current by continuously incorporating new research.
Collection enables this through MCP (Model Context Protocol) integration. Other AI tools — Claude, ChatGPT, Cursor, or custom workflows — can query your Collection data programmatically:
Available via Instill AI MCP:
- `query-collection`: retrieve extracted data with filters
- `summarize-column`: get statistics per column
- `aggregate-by-column`: group by exercise type, compute mean effect sizes
This means your extraction data becomes a live API endpoint. A research assistant using Claude can ask “What’s the current pooled effect size for aerobic exercise interventions?” and get an answer grounded in your structured, citation-backed data — without opening Instill AI or exporting a CSV.
When a new randomized controlled trial (RCT) is published, you add it as a row in the Collection. Autofill extracts the data. The MCP-connected tools immediately see the updated dataset. The analysis stays current without rebuilding anything from scratch.
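To make this concrete, here is a sketch of what an MCP client request might look like. The JSON-RPC `tools/call` envelope follows the Model Context Protocol specification; the tool name comes from the list above, but the argument names (`group_by`, `aggregate`, `operation`) are illustrative assumptions, not the documented API:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "aggregate-by-column",
    "arguments": {
      "group_by": "Exercise_Type",
      "aggregate": "Effect_Size",
      "operation": "mean"
    }
  }
}
```

The client never sees a CSV; it queries the Collection the same way it would call any other MCP tool.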
Putting It All Together
The core insight of this two-part series is that a systematic review is fundamentally a structured data problem. The papers are the raw material. The extraction table is the intermediate representation. The paper sections, forest plots, and statistical analyses are all downstream queries on that table.
Instill AI Collection handles the hard middle step — turning unstructured PDFs into structured, traceable, analysis-ready data — while preserving the methodological rigor that systematic reviews demand. The dual-reviewer architecture, citation audit trails, and structured export aren’t just convenience features. They map directly to PRISMA 2020 and Cochrane Handbook requirements.
Whether you’re running your first systematic review or your twentieth, the workflow is the same: upload papers, define your extraction logic, let AI do the heavy lifting, and focus your expertise where it matters most — interpreting the evidence.
Early Access Offer
The first 30 readers who sign up through the link below will receive a 14-day free trial of Pro Plan — enough to run a full meta-analysis pipeline from Collection export to forest plot.
Get the Collection Example
Want to start with the exact collection schemas, column instructions, and paper list used in this series? We’re happy to share the full template for “Effect of Exercise Interventions on Major Depressive Disorder in Adults” — including the dual-reviewer extraction collections whose CSV export feeds directly into the R pipeline above.
Reach out to us:
- Email: moto.mo@instill-ai.com
- Live chat: Click the black chat icon in the bottom-right corner of this page — let us know you’re interested in the systematic review collection example and leave your contact details.
This article builds on Systematic Review with AI: Screen and Extract Data from Research Papers in Minutes, which covers the screening and data extraction workflow.
Stop re-reading. Start knowing.
Turn scattered documents into structured knowledge — fast. Results in your first session, not your first quarter.