From Weeks to Minutes: How E-Valuate AI Evaluates Handwritten Answer Sheets at Scale

We are teachers. Every examination season meant evenings, weekends, and sometimes entire holidays spent evaluating stacks of 50 to 100 handwritten answer sheets. That experience became the starting point for E-Valuate AI.
This isn't just our problem; it's a global one. Handwritten exams remain the standard in most of Asia, Africa, and large parts of Europe and Latin America. The bottleneck is universal: teachers who care about giving detailed feedback simply don't have the time to do it at scale.
That's what led us to build E-Valuate AI: an AI-powered platform that evaluates handwritten answer sheets, generates annotated feedback PDFs, and produces class-level analytics — all in a fraction of the time manual evaluation takes.
This post covers what we built, the design decisions behind it, and the Google Cloud AI stack that powers it.
The Design Principles We Started With
Before writing a single line of code, we knew three things had to be true about the product:
"The rubric had to be teacher-controlled, the feedback had to be specific, and the output had to be something a student could actually use." — Abhijit Kadam & Monika Kadam, Co-Founders, E-Valuate AI
These weren't product requirements from a spec document. They came from watching evaluation go wrong for years — marks given without context, feedback that said "improve your answer" without saying how, and grading inconsistencies across batches that students couldn't appeal because no one had documented the standard.
Every technical decision in E-Valuate AI traces back to one of these three principles.
What E-Valuate AI Does
E-Valuate AI is a web-based SaaS platform. Teachers upload scanned or photographed answer sheets as PDFs, define evaluation rubrics, and receive back structured evaluation results — without replacing their judgment in the process.
Here's the end-to-end flow:
1. Exam setup — The teacher creates an exam on the platform and uploads the question paper and model answer. The AI then drafts a rubric for each question: what a full-marks answer looks like, what earns partial credit, and which key concepts should be present. The teacher stays in control: they review the rubric and can modify it before approving it.
2. Submission — Teachers can upload students' answer sheets through the examiner dashboard (web), or students can upload their own through a companion Android app. Both channels feed into the same evaluation pipeline.
3. Evaluation — The platform processes each uploaded answer sheet through a multi-stage AI pipeline. Google Cloud Vision API first extracts handwritten text and layout information. Additional processing stages then reconstruct question boundaries, tables, diagrams, and answer structure, and Vertex AI applies the rubric-based evaluation.
4. Annotated PDF output — Each evaluated sheet is returned as an annotated PDF, with marks and written remarks placed directly on the paper — the way a teacher would mark it by hand.
5. Analytics — Across all submissions, teachers and institute administrators see question-wise performance, class averages, concept-level weak spots, and student-level patterns.
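The analytics in step 5 amount to aggregating per-question marks across a batch. Here is a minimal sketch, assuming a hypothetical result shape (student IDs mapped to per-question marks) and an arbitrary 50% threshold for flagging weak spots; neither is taken from the platform itself.

```python
from collections import defaultdict


def question_averages(results, max_marks):
    """Return {question_id: average fraction of marks earned} across students."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for student_marks in results.values():
        for question_id, marks in student_marks.items():
            totals[question_id] += marks / max_marks[question_id]
            counts[question_id] += 1
    return {q: totals[q] / counts[q] for q in totals}


def weak_spots(averages, threshold=0.5):
    """List questions whose class average falls below the threshold."""
    return sorted(q for q, average in averages.items() if average < threshold)


# Hypothetical batch: two students, two questions worth 5 marks each.
results = {"s1": {"Q1": 4, "Q2": 1}, "s2": {"Q1": 5, "Q2": 2}}
max_marks = {"Q1": 5, "Q2": 5}
averages = question_averages(results, max_marks)
flagged = weak_spots(averages)
```

Normalising by each question's maximum keeps a 2-mark question and a 10-mark question comparable when looking for weak spots.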
The Google Cloud Stack
E-Valuate AI is built entirely on Google Cloud. Here's what powers each layer of the product:
Vertex AI — the evaluation engine
The core of the platform. Vertex AI reads the extracted handwritten text, tables, diagrams, graphs, and answer structure, interprets them in the context of the question and rubric, assigns marks, and generates written feedback remarks.
What makes Vertex AI the right choice here isn't just accuracy — it's the ability to reason about partial answers. A student might write an answer that's conceptually correct but poorly structured, or partially right with a key term missing. Vertex AI evaluates these the way a trained teacher would: not as binary correct/incorrect, but with nuanced judgment that reflects the rubric. That's what produces feedback a student can actually learn from.
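One way to picture partial credit: the rubric is split into criteria, the model judges how fully each one is met, and marks are assembled from those judgments. The sketch below uses hypothetical names (Criterion, score_answer) and assumes judgments arrive as fractions between 0 and 1. The model call itself is left out, and this is not a description of how the platform's prompts or outputs are actually structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    marks: float  # marks available for fully meeting this criterion


def score_answer(criteria, judgments):
    """Combine per-criterion judgments (0.0 to 1.0) into total marks and remarks."""
    total = 0.0
    remarks = []
    for criterion in criteria:
        # Treat a missing judgment as "not met" and clamp out-of-range values.
        fraction = min(max(judgments.get(criterion.name, 0.0), 0.0), 1.0)
        total += criterion.marks * fraction
        if fraction == 0.0:
            remarks.append(f"{criterion.name}: missing")
        elif fraction < 1.0:
            remarks.append(f"{criterion.name}: partially addressed")
    return total, remarks


# Hypothetical 5-mark question with three criteria.
criteria = [Criterion("definition", 2), Criterion("key term", 1), Criterion("example", 2)]
judgments = {"definition": 1.0, "key term": 0.0, "example": 0.5}
total, remarks = score_answer(criteria, judgments)
```

Keeping the arithmetic outside the model means the marks always add up to what the rubric allows, whatever the model returns.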
Google Cloud Vision API — the OCR layer
Before evaluation can begin, the platform needs to read the handwritten text and recover the tables, diagrams, graphs, and answer structure in the uploaded PDFs. Google Cloud Vision API handles this: it extracts the text and layout from each answer sheet and maps them spatially. That mapping preserves the structure of the student's response (question-wise sections, numbering, diagram labels), so that Vertex AI evaluates the right content against the right rubric item.
Handwriting quality varies enormously in real exam conditions. Vision API handles this well across a range of handwriting styles, which was a hard requirement for an Indian classroom context.
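The idea of mapping recognised text back to questions can be sketched without calling any OCR service. This example assumes each recognised block has already been reduced to a hypothetical dict with text, page, and y (vertical position) fields, and that students mark questions as "Q1." or "2)". Real Vision API responses are richer, and real answer sheets are messier.

```python
import re

# Matches markers such as "Q1.", "q 2)", or "3." at the start of a block.
QUESTION_MARKER = re.compile(r"^\s*(?:Q\.?\s*)?(\d+)[.)]\s*", re.IGNORECASE)


def group_by_question(blocks):
    """Group OCR blocks into {question_number: answer_text} in reading order."""
    ordered = sorted(blocks, key=lambda block: (block["page"], block["y"]))
    sections = {}
    current = None
    for block in ordered:
        text = block["text"]
        match = QUESTION_MARKER.match(text)
        if match:
            current = int(match.group(1))
            text = text[match.end():]
        if current is None:
            continue  # text before the first marker, e.g. a name header
        sections.setdefault(current, []).append(text.strip())
    return {number: " ".join(parts) for number, parts in sections.items()}


# Hypothetical blocks, deliberately out of order.
blocks = [
    {"text": "the process plants use", "page": 1, "y": 120},
    {"text": "Q1. Photosynthesis is", "page": 1, "y": 100},
    {"text": "2) Osmosis moves water", "page": 1, "y": 200},
    {"text": "Name: A. Student", "page": 1, "y": 10},
]
answers = group_by_question(blocks)
```

A marker-based rule like this would misread an answer line that begins with something like "3." as a new question, which is one reason layout reconstruction needs more than a regular expression.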
Cloud Run & Cloud Functions — the evaluation pipeline
The evaluation pipeline runs as a series of serverless workers deployed on Google Cloud Run and Cloud Functions. The pipeline stages include: evaluation submission, OCR dispatch, rubric-based evaluation, PDF annotation generation, result finalisation, and analytics computation.
This architecture means the platform scales automatically. A teacher uploading 10 sheets and an institute uploading 500 sheets hit the same pipeline — the infrastructure adjusts without manual intervention. Each worker is independently deployable and observable, which has been important for debugging evaluation quality during beta.
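The stage ordering can be sketched as a small state machine. The stage names follow the list above, but the identifiers, the handler signature, and the single-process loop are assumptions for illustration; on the platform each stage is a separate serverless worker rather than a function call in one program.

```python
from enum import Enum


class Stage(Enum):
    SUBMISSION = "evaluation_submission"
    OCR_DISPATCH = "ocr_dispatch"
    RUBRIC_EVALUATION = "rubric_evaluation"
    PDF_ANNOTATION = "pdf_annotation"
    FINALISATION = "result_finalisation"
    ANALYTICS = "analytics_computation"


ORDER = list(Stage)  # Enum iteration follows definition order


def next_stage(current):
    """Return the stage after `current`, or None at the end of the pipeline."""
    position = ORDER.index(current)
    return ORDER[position + 1] if position + 1 < len(ORDER) else None


def run_sheet(sheet, handlers):
    """Pass one sheet through every stage in order."""
    stage = ORDER[0]
    while stage is not None:
        sheet = handlers[stage](sheet)
        stage = next_stage(stage)
    return sheet


# Stand-in handlers that only record which stage ran.
handlers = {stage: (lambda s, name=stage.value: s + [name]) for stage in Stage}
trace = run_sheet([], handlers)
```

Because each handler only needs its input and produces its output, a handler can be replaced or redeployed without touching the others.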
Google Cloud Storage — document and output storage
Every uploaded answer sheet, every generated annotated PDF, and every evaluation output is stored in Google Cloud Storage. GCS handles the handoff between pipeline stages: workers read from and write to buckets at each step, with Firestore tracking pipeline state.
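The handoff pattern can be sketched with plain dictionaries standing in for a bucket and a state store. The object path layout, the stage names, and the run_worker helper are all hypothetical, and no Cloud Storage or Firestore client calls are shown.

```python
def object_path(exam_id, sheet_id, stage):
    """Build a hypothetical object name for one sheet's output at one stage."""
    return f"exams/{exam_id}/sheets/{sheet_id}/{stage}.json"


def run_worker(bucket, state, exam_id, sheet_id, reads, writes, transform):
    """Read the previous stage's output, write this stage's output, record progress."""
    payload = bucket[object_path(exam_id, sheet_id, reads)]
    target = object_path(exam_id, sheet_id, writes)
    bucket[target] = transform(payload)
    state[(exam_id, sheet_id)] = writes  # stand-in for a pipeline state record
    return target


# In-memory stand-ins for a storage bucket and a state store.
bucket = {object_path("exam1", "sheet7", "ocr"): "photosynthesis uses light"}
state = {("exam1", "sheet7"): "ocr"}

written = run_worker(
    bucket, state, "exam1", "sheet7",
    reads="ocr", writes="evaluation",
    transform=lambda text: {"words": len(text.split())},
)
```

Since every stage leaves its output behind under a predictable name, a failed stage can be rerun from the previous stage's object without repeating earlier work.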
Firebase Hosting & Firestore — the application layer
The user-facing dashboards are served via Firebase Hosting. Firestore powers real-time state — teachers see evaluation progress update live as the pipeline processes their batch, without polling.
Technical Decisions Worth Noting
Rubric locking — Once an exam enters evaluation, the rubric is locked. This prevents inconsistency mid-batch and ensures every student in a batch is evaluated against the same standard. It's a product decision as much as a technical one — without it, a teacher modifying the rubric halfway through a batch would produce results that aren't comparable across students.
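Rubric locking can be sketched as a guard on edits plus a snapshot taken when evaluation starts. The Exam class and its method names are hypothetical, and how the platform stores or enforces the lock is not described here.

```python
class RubricLockedError(Exception):
    """Raised when a rubric is edited after evaluation has started."""


class Exam:
    def __init__(self, rubric):
        self._rubric = dict(rubric)  # question -> maximum marks
        self.locked = False

    def update_rubric(self, question, max_marks):
        if self.locked:
            raise RubricLockedError(f"rubric is locked; cannot change {question}")
        self._rubric[question] = max_marks

    def start_evaluation(self):
        """Lock the rubric and return the snapshot every sheet is marked against."""
        self.locked = True
        return dict(self._rubric)


exam = Exam({"Q1": 5})
exam.update_rubric("Q2", 10)  # allowed: evaluation has not started
snapshot = exam.start_evaluation()

try:
    exam.update_rubric("Q1", 4)  # rejected: the batch is in progress
    rejected = False
except RubricLockedError:
    rejected = True
```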
Annotated PDF as the primary output — We deliberately chose to return marks and remarks on the original paper rather than in a separate report. Students and teachers are used to seeing feedback directly on the answer sheet. Changing that format would have created adoption friction — the output needed to feel familiar, not like a new workflow.
Serverless pipeline over a monolith — Breaking evaluation into discrete worker stages made it easier to improve individual steps (OCR quality, rubric interpretation, annotation placement) without touching the rest of the pipeline. It also made failures easier to isolate and retry, which matters when you're processing hundreds of sheets in a single batch.
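Failure isolation within a batch can be sketched as per-sheet retries that never stop the rest of the batch. The process_batch helper, the three-attempt limit, and the simulated failures are assumptions for illustration, not the platform's retry policy.

```python
def process_batch(sheet_ids, stage, max_attempts=3):
    """Run one stage over a batch, retrying each sheet independently."""
    completed, failed = {}, {}
    for sheet_id in sheet_ids:
        last_error = None
        for _ in range(max_attempts):
            try:
                completed[sheet_id] = stage(sheet_id)
                break
            except Exception as error:
                last_error = error
        else:
            # Every attempt failed: record the error and move on.
            failed[sheet_id] = str(last_error)
    return completed, failed


attempts = {}


def flaky_ocr(sheet_id):
    """Stand-in stage: s2 fails once, s3 always fails."""
    attempts[sheet_id] = attempts.get(sheet_id, 0) + 1
    if sheet_id == "s3":
        raise ValueError("unreadable scan")
    if sheet_id == "s2" and attempts[sheet_id] == 1:
        raise TimeoutError("temporary failure")
    return f"text for {sheet_id}"


completed, failed = process_batch(["s1", "s2", "s3"], flaky_ocr)
```

One unreadable scan ends up in the failed map for a teacher to review, while the other sheets in the batch finish normally.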
Where It Stands Today
E-Valuate AI is live in beta, with early users and institutions in India already evaluating real exam papers on the platform. Tasks that previously required weeks can now be completed in a fraction of that time — with more structured, consistent feedback than a manual process typically produces.
The platform has already processed 2,000+ answer sheets from 200+ students across 100+ exams.
We're focused on India for now, where the scale is enormous: millions of handwritten exams annually, tens of thousands of teachers, and a persistent shortage of time for the analysis that actually improves learning outcomes. But the problem is the same in classrooms from Jakarta to Nairobi to São Paulo — and the solution scales.
What We'd Tell Other Builders
One of us is a teacher. Neither of us came from a software engineering background. E-Valuate AI was built using AI as a learning and building partner — iterating against real evaluation problems, testing in actual classrooms, and improving based on what teachers told us wasn't working.
The Google Cloud AI stack — particularly Vertex AI — made it possible to build something that genuinely works at the hardest part of the problem: reasoning about handwritten, partial, imperfect answers the way a skilled teacher does.
If you're sitting on a problem your community lives with every day, the tools to build a real solution are genuinely accessible now. The domain knowledge you already have is the hard part — and that's not something a CS degree gives you.
E-Valuate AI is currently in beta. Teachers and institutes in India can join the early access program at evaluate-ai.app.
— Abhijit Kadam & Monika Kadam, Co-Founders, E-Valuate AI