Analyzing Risks in Voluntary Forest Carbon Offsets Using Open Data
A Systematic Evaluation Framework Integrating Retrieval-Augmented Generation in LLMs and Geospatial Analytics
Presented by Becky Xu | May 2025
Overview
The voluntary carbon market (VCM) faces challenges in accurately assessing the additionality and leakage of forestry carbon offset projects.
Varying methodologies, policy changes, and financial incentives create transparency issues and risk greenwashing.
To help investors navigate opportunities in the forestry carbon market, I built an application that uses open datasets, remote sensing, and large language models (LLMs) to support early-stage evaluation of forestry projects, focusing on additionality testing and leakage measurement.
Background: What Are the VCM and Carbon Credits?
What is the Voluntary Carbon Market (VCM)?
  • The VCM allows companies and individuals to offset their carbon emissions by purchasing carbon credits.
  • Unlike compliance markets (e.g., EU ETS), participation is optional and not legally binding.
  • One credit = 1 tonne of CO₂ reduced or removed.
  • Used to meet sustainability goals, achieve "net zero," or support climate action.
What is a Forestry Carbon Offset?
  • Forestry carbon offsets are credits generated from forest-based activities that reduce or remove CO₂.
  • Projects must demonstrate additionality, permanence, and low leakage.
Types of Forestry Carbon Offsets
  • Afforestation/Reforestation (A/R): Planting trees on land that was not previously forested (afforestation) or that has lost its forest cover (reforestation).
  • Avoided Deforestation (REDD+): Protecting forests at risk of being cleared.
  • Improved Forest Management (IFM): Changing logging practices to store more carbon.
  • Agroforestry: Integrating trees into farmland for carbon and co-benefits.
Background: Key Concepts
Baseline:
What would emissions be without the project?
Additionality:
Would the project happen without carbon finance?
Permanence:
Will the reduced emissions stay reduced, or could they be reversed?
Leakage:
Do emissions just shift elsewhere?
Background: Lifecycle of a Forestry Carbon Offset Project
1. Feasibility & Design: Identify project area, assess carbon potential, and engage local stakeholders.
2. Documentation: Draft Project Design Document (PDD) outlining methods, baseline, and risks.
PDDs are key sources of data for this project!
3. Validation & Registration: Third-party audit and listing with a carbon registry (e.g., Verra, Gold Standard).
4. Implementation: Execute forestry actions like planting or forest protection.
5. Monitoring & Verification: Collect data and undergo external audits to verify emission reductions.
6. Credit Issuance & Sale: Verified credits are issued, marketed, and sold to buyers.
7. Retirement & Reporting: Credits are retired, and ongoing reporting ensures long-term integrity.
Background: Credits Issuance by Country
According to data collected by the Berkeley Carbon Trading Project, the U.S. is the leading issuer of forestry carbon credits, followed by Indonesia.
While the U.S. has abundant public data (such as LiDAR datasets on Google Earth Engine) for independent research on projects, doing the same for projects outside the U.S. is much more challenging because public data access is very limited.
To address this data gap, I focused this project and application on forestry carbon credit projects in Indonesia.
Data Source: Berkeley Carbon Trading Project
Motivation
The VCM faces credibility challenges because it is not legally binding and has weak or no oversight.
  • As a global market, rules differ by country/state, creating inconsistencies and opportunities to exploit loopholes.
  • High-profile scandals have exposed serious flaws in project claims and methodologies.
Forestry & land-use offsets are popular for their low cost, but they are also controversial due to loopholes.
  • Verra, one of the largest VCM registries, faced a serious scandal over loopholes in the credits it certified.
  • Microsoft recently entered into a landmark agreement to purchase 8 million nature-based carbon removal credits (afforestation project), the largest transaction of its kind.
Individual buyers often find it confusing and difficult to assess credit quality.
  • High risk of greenwashing
  • No systematic evaluation method exists short of paying an independent consultancy.
  • Lack of standardization and little transparency in project data

How can investors trust the claimed carbon savings from forestry VCM credits?
Challenges: Lack of Standardization in the VCM
It is difficult to evaluate the quality of carbon credits because of the following "loopholes" in baseline, additionality, and leakage:
Baseline Complexity
Determining what would happen without the project is difficult due to changing policies and local conditions. Baselines must reflect realistic deforestation risks, which vary by area.
Scale and Standards
Hundreds of projects, each with hundreds of pages of documentation, use diverse methodologies, creating inconsistencies and making manual evaluation impractical for buyers.
Incentives to Overstate
Developers may inflate baseline deforestation to maximize credits, which undermines market credibility when oversight is weak.
Evolving Policy Standards
Although most carbon standards do not currently require it, baselines may need to be updated over time as policies and regulations evolve, so that projects continue to meet the regulatory surplus requirement.
Leakage
Activity-shifting, market, and geographic leakage can displace emissions, reducing net climate benefits.
Current Strategies for Assessing VCM Credit Quality
Quality Assurance Tools From Existing Frameworks on Baseline, Additionality, and Leakage
Baseline
  • Historical Data and Reference Plots
  • Remote Sensing & GIS Monitoring
  • Risk Identification and Buffer Pools
  • Community Engagement and Legal Protections
Additionality
  • Regulatory Surplus Test
  • Alternatives and Barrier Analysis
  • Financial Feasibility Analysis
  • Common-Practice Analysis
Leakage
  • Leakage Identification and Risk Assessment
  • Quantitative and Spatial Leakage Modeling
  • Leakage Mitigation Planning
  • Leakage Monitoring and Verification

Three Main Types of Tools:
(1) Project Design (2) Finance & FinTech (3) Data & Tech
Emerging Solutions for VCM
Technologies that Help Improve Credit Quality and Trust
Project Design
Mitigation Strategies
Buffer zones, alternative livelihoods, and jurisdictional accounting enhance leakage management and transparency.
Finance & FinTech
Blockchain traceability systems
Records project data (emission reductions, forest cover changes) and credit transactions on an immutable ledger.
Data & Tech
Monitoring Technologies
Remote sensing and AI risk modeling improve detection and prediction of leakage around project areas.
Research Question
How can we leverage open data, LLMs, and geospatial tools to provide easy access to early-stage evaluation of forestry carbon offset projects?
Focus on additionality (regulatory surplus), baseline evaluation, and leakage monitoring.
System Architecture
Data Pipelines: GIS & LLM
Part A: GIS Analysis Pipeline
Processes spatial data to create buffer zones and analyze forest loss trends for leakage and land-use monitoring.
Part B: LLM Pipeline
Automates ingestion and semantic indexing of project documents using embeddings and vector databases for AI-driven risk assessments.
Data Sources
Text: Verra Registry, VCM Tools PDFs, Government Websites, Berkeley Carbon Offset Database
GIS: Global Forest Watch, Dynamic World
Data pipelines are fully automated from ingestion to dashboard integration.
Data Pipelines: Part A - GIS
"Donut" Buffer Creation
  • Reads project boundaries from KML or shapefiles using GeoPandas.
  • Creates a 10 km wide ring around each project ("donut zone").
  • Enables comparison inside vs. outside project boundary.
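The donut zone can be illustrated with a minimal sketch that uses only the Python standard library rather than GeoPandas. It assumes coordinates are already projected to metres; the square project boundary and the helper names (`point_in_polygon`, `distance_to_segment`, `in_donut`) are hypothetical and are not taken from the application itself. With GeoPandas, a comparable ring is commonly built by buffering the boundary and subtracting the original polygon.

```python
import math

Point = tuple[float, float]


def point_in_polygon(pt: Point, poly: list[Point]) -> bool:
    """Ray-casting test: True if pt lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt: Point, a: Point, b: Point) -> float:
    """Shortest planar distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of pt onto the segment to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_donut(pt: Point, poly: list[Point], width_m: float = 10_000.0) -> bool:
    """True if pt is outside the project but within width_m of its boundary."""
    if point_in_polygon(pt, poly):
        return False
    n = len(poly)
    nearest = min(distance_to_segment(pt, poly[i], poly[(i + 1) % n]) for i in range(n))
    return nearest <= width_m


# Hypothetical 20 km x 20 km project boundary, in metres.
project = [(0.0, 0.0), (20_000.0, 0.0), (20_000.0, 20_000.0), (0.0, 20_000.0)]

print(in_donut((10_000.0, 10_000.0), project))  # False: inside the project
print(in_donut((25_000.0, 10_000.0), project))  # True: 5 km outside the boundary
print(in_donut((35_000.0, 10_000.0), project))  # False: 15 km outside the boundary
```

Pixels or sample points that pass `in_donut` form the "outside" group for the inside vs. outside comparison.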
Geospatial Analytics from GEE
  • Computes yearly forest loss in the donut zone using GFW data (2001–2023, 30 m resolution).
  • Computes monthly land use change in the donut zone using Dynamic World data (2017–2025, 10 m resolution).
  • Extracts monthly land use classification to track land cover change.
  • Supports spatial verification of additionality and leakage risks.
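One simple way to read the yearly forest loss series for a donut zone is to compare average loss before and after the project start year. The sketch below is an assumption about how such a check could look, not the metric used in the application; the function name `leakage_signal` and the hectare figures are made up for illustration.

```python
from statistics import mean


def leakage_signal(loss_ha_by_year: dict[int, float], start_year: int) -> dict[str, float]:
    """Compare mean annual forest loss in the buffer before vs. after project start."""
    before = [ha for year, ha in loss_ha_by_year.items() if year < start_year]
    after = [ha for year, ha in loss_ha_by_year.items() if year >= start_year]
    if not before or not after:
        raise ValueError("need at least one year on each side of the start year")
    pre, post = mean(before), mean(after)
    return {
        "pre_mean_ha": pre,
        "post_mean_ha": post,
        # Ratio above 1 means loss in the buffer rose after the project began.
        "ratio": post / pre if pre else float("inf"),
    }


# Made-up yearly loss (hectares) in a donut zone; project assumed to start in 2013.
donut_loss = {2010: 100.0, 2011: 120.0, 2012: 110.0, 2013: 150.0, 2014: 180.0, 2015: 165.0}
print(leakage_signal(donut_loss, start_year=2013))
```

A ratio above 1 only flags the buffer for closer review. It does not by itself demonstrate leakage, since regional deforestation drivers may have changed over the same period.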
Output & Automation
  • Forest loss and land use trends are exported as structured CSV and GeoTIFF files.
  • Processed layers are visualized in the dashboard alongside project boundaries.
  • Results are uploaded to the cloud database.
  • The data are then fed into the LLM for analysis along with the text data.
Data Pipelines: Part B - LLM
Document Ingestion & Preprocessing
  • Loads PDDs, VCM framework files, and the country's forestry policies and regulations using Docling readers in LlamaIndex.
  • Splits documents into semantic units (nodes) using SentenceSplitter and adds metadata.
Embedding & Vector Indexing
  • Each node is converted to a high-dimensional vector (via Gemini).
  • Vectors are stored in ChromaDB for similarity-based search.
Retrieval-Augmented Generation (RAG)
  • Given a user query (prompt), the most relevant nodes are retrieved from PDD, frameworks, and policy/regulation files.
  • Gemini LLM generates grounded responses using the retrieved context.
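The retrieval step above boils down to ranking stored vectors by similarity to the query vector and passing the best matches to the model as context. The sketch below shows that idea with toy three-dimensional vectors and cosine similarity in plain Python. It does not call ChromaDB or Gemini; the node IDs, vectors, texts, and function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the IDs of the top_k nodes most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), node_id) for node_id, vec in index.items()]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Toy index: real embeddings have hundreds of dimensions or more.
TOY_INDEX = {
    "pdd_baseline": [0.9, 0.1, 0.0],
    "policy_moratorium": [0.7, 0.6, 0.1],
    "framework_leakage": [0.0, 0.2, 0.9],
}
TOY_TEXTS = {
    "pdd_baseline": "PDD excerpt describing the baseline scenario.",
    "policy_moratorium": "Policy excerpt on a forest conversion moratorium.",
    "framework_leakage": "Framework excerpt on leakage accounting.",
}

hits = retrieve([1.0, 0.3, 0.0], TOY_INDEX, top_k=2)
print(hits)  # ['pdd_baseline', 'policy_moratorium']
print(build_prompt("Is the baseline consistent with policy?", [TOY_TEXTS[h] for h in hits]))
```

Retrieving from PDD, framework, and policy nodes in one pass is what lets a single answer weigh a project's claims against the rules that apply to it.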
Multi-Step Workflows & Prompt Engineering
  • Custom prompt templates ensure structured LLM outputs: risk categories, policy compliance, recommendations.
  • Performs retrieval, reformulates queries, generates prompts, summarizes results from multiple steps, and parses results (JSON).
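Parsing structured output usually means pulling the JSON out of the model's reply and validating it before it reaches the database. The sketch below assumes a hypothetical schema (a JSON array of objects with `category`, `level`, and `rationale` keys) and a hypothetical set of allowed risk levels; neither is taken from the application's actual prompt templates.

```python
import json
from dataclasses import dataclass

# Assumed set of allowed levels; the real templates may use a different scale.
RISK_LEVELS = {"low", "medium", "high"}


@dataclass
class RiskAssessment:
    category: str
    level: str
    rationale: str


def parse_risk_output(raw: str) -> list[RiskAssessment]:
    """Extract and validate a JSON array of risk assessments from raw model text."""
    # Models often wrap JSON in extra prose, so keep only the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    items = json.loads(raw[start:end + 1])
    results = []
    for item in items:
        level = str(item["level"]).lower()
        if level not in RISK_LEVELS:
            raise ValueError(f"unexpected risk level: {level!r}")
        results.append(RiskAssessment(item["category"], level, item["rationale"]))
    return results


# Made-up model reply for illustration.
reply = """Here is the assessment:
[
  {"category": "additionality", "level": "Medium", "rationale": "Overlaps with an existing moratorium."},
  {"category": "leakage", "level": "High", "rationale": "Forest loss rose in the buffer zone."}
]"""
for assessment in parse_risk_output(reply):
    print(assessment.category, assessment.level)
```

Rejecting unexpected values at this step keeps malformed model output from silently reaching the dashboard.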
Output & Automation
  • Outputs include risk metrics on Additionality, Permanence, Leakage, etc.
  • Analysis outputs are uploaded to a cloud relational database (Supabase).
Application Features
Project Dashboard
Searchable project overview with metadata and location details.
Project Design Risk Analysis
Visual gauges and tables display risk scores for additionality, permanence, leakage, and monitoring.
Policy & Regulatory Surplus Analysis
Summaries of alignment with national forestry policies and related news context.
Geospatial Visualization
Maps and charts show project boundaries, buffer zones, and forest loss and land use change trends over time.
User Journey and Experience
1. Project Selection: Users choose a project to view detailed metadata and location.
2. Risk Assessment: Review AI-generated risk scores and recommendations.
3. Policy Analysis: Explore policy alignment and related news insights.
4. Spatial Insights: Visualize forest loss and leakage zones on interactive maps.
Demo Time!
End of Demo!
Purpose of This VCM Analysis Platform
Enhanced Credibility
Open data and AI-driven analysis improve trust in forestry carbon offsets.
Informed Decision-Making
Stakeholders access clear risk metrics and spatial data for better evaluations.
Greenwashing Prevention
Early-stage scrutiny reduces exaggerated claims and promotes genuine climate benefits.
Limitations
Scope Limitations
The current scope covers only regulatory surplus analysis of policy, a generic project design overview, and deforestation trends.
Project finance is another important area to analyze.
Data Challenges
The corpus currently holds only about a dozen policy documents (though 1,000+ pages), and project data come only from PDDs and the Berkeley database.
The GIS analysis is also reliant on data products from Dynamic World and Global Forest Watch.
LLM Model & Evaluation Challenges
Embedding accuracy and semantic retrieval require ongoing validation to avoid bias. AI output may oversimplify, so the prompt engineering needs further evaluation.
Deployment of AI Agents
I began exploring AI agents for analysis refinement and saw some promising progress, but the cost and time were significant, so I decided not to move forward.
Future Directions
User Testing Feedback Integration
  • More interaction between the data visualization and map.
  • Reduce the overall amount of text in the dashboard and support report generation.
  • Complex data may overwhelm non-technical users; training and support are essential.
Evaluation Results Comparison
  • Compare all projects in the database to visualize which projects are more trustworthy.
  • For example, an overview dashboard view.
Incorporate More Data
  • Create a deforestation model specific to the project regions.
  • For projects with text data beyond the PDD, incorporate those documents into the analysis.
Continuous Updates
  • Policy and geospatial data must be updated regularly to reflect policy and environmental changes.
Appendix
Appendix: Tools for additionality & baseline measurement
Appendix: Tools for leakage prevention