The Problem

Across East Africa, NGOs generate thousands of pages of programme reports every year. Field officers write them. Programme managers read them. Donors demand summaries from them. But almost none of this data is structured: it lives in PDFs, locked away from analysis.

Our Approach

We built a three-stage pipeline using LangChain and OpenAI's API:

1. Ingestion: PDF reports are uploaded to a FastAPI endpoint, split into overlapping chunks, and embedded into a vector store.
2. Extraction: A structured extraction chain pulls key metrics (beneficiary counts, activity completion rates, geographic coverage) into a validated Pydantic schema.
3. Generation: A final chain drafts donor narrative sections based on the extracted data, with citations back to the source document.
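
To make the Extraction stage concrete, here is a minimal sketch of what a validated Pydantic schema for those metrics could look like. It is not the schema from the pilot: the class name ProgrammeMetrics, the field names, the parse_metrics helper, and the sample values are all assumptions made up for illustration. The idea is that anything failing validation is routed to a human reviewer rather than passed on to the Generation stage.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class ProgrammeMetrics(BaseModel):
    """Hypothetical schema for the metrics named in the Extraction stage."""

    # Field names and constraints are illustrative assumptions.
    beneficiary_count: int = Field(..., ge=0)
    activity_completion_rate: float = Field(..., ge=0, le=100)  # percent
    districts_covered: List[str] = Field(...)  # geographic coverage
    source_page: Optional[int] = Field(default=None, ge=1)  # for citations


def parse_metrics(raw: dict) -> Optional[ProgrammeMetrics]:
    """Return validated metrics, or None so the record can go to human review."""
    try:
        return ProgrammeMetrics(**raw)
    except ValidationError:
        return None


# Invented sample payloads, standing in for what an extraction chain returns.
good = parse_metrics({
    "beneficiary_count": 1240,
    "activity_completion_rate": 87.5,
    "districts_covered": ["Gasabo", "Kicukiro"],
    "source_page": 4,
})
bad = parse_metrics({
    "beneficiary_count": -5,  # negative counts are rejected
    "activity_completion_rate": 140,  # above 100 percent is rejected
    "districts_covered": [],
})
```

Tight constraints such as a 0 to 100 range on a completion rate catch a common failure mode, where the model returns a plausible-looking number pulled from the wrong part of the report.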

Results

In our pilot with a Kigali-based programme monitoring office, the pipeline reduced manual report processing time by 82%. Programme officers now spend time reviewing and refining drafts rather than writing from scratch.

Key Lessons

Schema design is everything. The quality of your extracted data depends on how precisely you define what you're looking for. We iterated on the schema six times before extraction was reliable.
Human review is non-negotiable. The pipeline generates drafts, not final outputs. We built the review interface before the extraction pipeline.
Chunking strategy matters more than model choice. Using 800-token chunks with 150-token overlap gave us far better extraction accuracy than the defaults.
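
The chunking lesson can be sketched as a sliding window over tokens. This is an illustrative stand-alone version, not the splitter used in the pipeline: the function name chunk_tokens is hypothetical, and whitespace splitting stands in for a real model tokenizer, so the counts only approximate true token counts. Only the 800 and 150 figures come from the text above.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 800, overlap: int = 150
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting is an assumption standing in for a real tokenizer.
report_text = " ".join(f"word{i}" for i in range(2000))
chunks = chunk_tokens(report_text.split())
```

With overlap, a metric whose label falls at the end of one window and whose value falls at the start of the next still appears whole in at least one chunk, which is one plausible reason the overlap helps extraction.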

Building AI Pipelines for NGOs: From PDF Chaos to Structured Intelligence


What are you building?
