How to Extract Passport Data Automatically (OCR + MRZ)

A complete guide to extracting data from passports of any country with OCR: what the MRZ is, how ICAO 9303 check digits work, and how to integrate extraction into your application through an API.

If your business handles passports — for identity verification, KYC, guest check-in, hiring foreign employees, or immigration paperwork — typing the data in by hand is slow and error-prone. A single mistyped digit in a passport number can invalidate an entire file.

The good news: of all identity documents, the passport is the easiest to read automatically with verifiable accuracy. The reason is the MRZ.

What is the MRZ?

The MRZ (Machine Readable Zone) is the two 44-character lines printed at the bottom of every passport's data page:

P<MEXGOMEZ<VELAZQUEZ<<MARGARITA<<<<<<<<<<<<<
G123456786MEX8007050F3307054<<<<<<<<<<<<<<08

Its format is defined by the international ICAO 9303 standard and is identical in every country: the United States, the United Kingdom, Spain, Mexico, China… The first line carries the document type, the issuing state, and the holder's names; the second carries the passport number, nationality, date of birth, sex, expiry date, and the personal number.
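Because every field sits at a fixed position, reading the MRZ comes down to slicing two strings. The following is a minimal sketch of that idea in Python, not the API's own parser; the names `MrzFields` and `parse_mrz` are illustrative, and dates are left as the raw YYMMDD text printed in the MRZ.

```python
from dataclasses import dataclass


@dataclass
class MrzFields:
    document_type: str
    issuing_state: str
    surname: str
    given_names: str
    passport_number: str
    nationality: str
    date_of_birth: str    # YYMMDD, exactly as printed
    sex: str
    date_of_expiry: str   # YYMMDD, exactly as printed
    personal_number: str


def _clean(text: str) -> str:
    """Turn '<' fillers into spaces and trim the result."""
    return text.replace("<", " ").strip()


def parse_mrz(line1: str, line2: str) -> MrzFields:
    """Slice a two-line, 44-character passport MRZ by fixed position."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("each MRZ line must be exactly 44 characters")

    # Line 1, from position 6 on: surname, then '<<', then given names.
    surname, _, given = line1[5:].partition("<<")

    return MrzFields(
        document_type=_clean(line1[0:2]),
        issuing_state=_clean(line1[2:5]),
        surname=_clean(surname),
        given_names=_clean(given),
        passport_number=_clean(line2[0:9]),    # line2[9] is its check digit
        nationality=_clean(line2[10:13]),
        date_of_birth=line2[13:19],            # line2[19] is its check digit
        sex=line2[20],
        date_of_expiry=line2[21:27],           # line2[27] is its check digit
        personal_number=_clean(line2[28:42]),
    )
```

Applied to the sample above, this yields the surname `GOMEZ VELAZQUEZ`, the given name `MARGARITA`, the passport number `G12345678`, and an empty personal number.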

Check digits: a reading with mathematical proof

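What makes the MRZ special is that it verifies itself. The second line includes four check digits that every passport carries, each computed with a public algorithm (repeating 7-3-1 weights over each character's value, summed modulo 10):

  1. Passport-number check digit
  2. Date-of-birth check digit
  3. Expiry-date check digit
  4. A composite digit over the number, date, and personal-number fields together with their check digits

The optional personal-number field has a check-digit position of its own, which may hold a filler character when the field is unused.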

If the OCR confuses a 0 with an O or a 5 with an S in one of these fields, the recomputed check digit no longer matches the printed one and the error is caught before the data reaches you. No other field on an identity document offers that guarantee.
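The check-digit calculation is short enough to sketch in full. The Python below follows the public ICAO 9303 rule (digits keep their value, A to Z map to 10 through 35, the filler `<` counts as 0); the function names are illustrative, and it is a sketch of the algorithm rather than the API's internal code.

```python
WEIGHTS = (7, 3, 1)


def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating 7-3-1 weights, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def _matches(field: str, printed: str) -> bool:
    """True when the printed character is a digit equal to the computed one."""
    return "0" <= printed <= "9" and check_digit(field) == int(printed)


def validate_line2(line2: str) -> dict:
    """Run the four checks listed above on the second MRZ line."""
    if len(line2) != 44:
        raise ValueError("MRZ line 2 must be exactly 44 characters")
    # The composite covers positions 1-10, 14-20, and 22-43 of the line.
    composite = line2[0:10] + line2[13:20] + line2[21:43]
    return {
        "passport_number": _matches(line2[0:9], line2[9]),
        "date_of_birth": _matches(line2[13:19], line2[19]),
        "date_of_expiry": _matches(line2[21:27], line2[27]),
        "composite": _matches(composite, line2[43]),
    }
```

For the sample second line shown earlier, all four checks pass. Replacing the first 0 of the date of birth with the letter O makes both the date-of-birth check and the composite check fail, which is how a single misread character gets caught.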

Our API validates all four digits on every extraction. If the image doesn't allow a reading that passes validation, we refund the token automatically rather than deliver questionable data.

What data is extracted

From the passport data page the API returns, as JSON:

  • passportNumber — the passport number
  • surname and givenNames — surnames and given names
  • nationality and issuingCountry — nationality and issuing state
  • dateOfBirth, dateOfIssue, dateOfExpiry — dates in DD/MM/YYYY format
  • sex, placeOfBirth, issuingAuthority
  • personalNumber — when present (e.g. the CURP on Mexican passports)
  • mrzLine1 and mrzLine2 — the complete, validated MRZ
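As a rough illustration of working with these fields, the Python sketch below parses the DD/MM/YYYY dates and cross-checks them against the MRZ. The field names come from the list above, but the payload itself is made up for the example, and the flat top-level JSON layout is an assumption; check the API documentation for the exact response shape.

```python
import json
from datetime import date, datetime

# Illustrative payload only: values are invented to match the sample MRZ,
# and the flat layout is an assumption about the response shape.
SAMPLE_RESPONSE = """
{
  "passportNumber": "G12345678",
  "surname": "GOMEZ VELAZQUEZ",
  "givenNames": "MARGARITA",
  "dateOfBirth": "05/07/1980",
  "dateOfExpiry": "05/07/2033",
  "mrzLine2": "G123456786MEX8007050F3307054<<<<<<<<<<<<<<08"
}
"""


def parse_api_date(value: str) -> date:
    """Convert a DD/MM/YYYY string into a date object."""
    return datetime.strptime(value, "%d/%m/%Y").date()


def days_until_expiry(payload: dict, today: date) -> int:
    """Days left before the passport expires (negative once expired)."""
    return (parse_api_date(payload["dateOfExpiry"]) - today).days


def dates_match_mrz(payload: dict) -> bool:
    """Compare the structured dates with the YYMMDD fields in MRZ line 2."""
    line2 = payload["mrzLine2"]
    birth = parse_api_date(payload["dateOfBirth"]).strftime("%y%m%d")
    expiry = parse_api_date(payload["dateOfExpiry"]).strftime("%y%m%d")
    return line2[13:19] == birth and line2[21:27] == expiry


payload = json.loads(SAMPLE_RESPONSE)
```

Converting the dates to `date` objects early avoids any confusion between day-first and month-first formats further down your pipeline.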

How to integrate it into your application

Integration is a single HTTP request with the image of the data page:

curl -X POST https://extractpassportdata.com/api/v1/extract \
  -H "X-API-Key: pass_your_api_key" \
  -F "image_front=@./passport.jpg"

The response arrives in a second or two with every field structured. We accept JPEG, PNG, and WebP images up to 10 MB, sent as a multipart file upload, base64, a URL, or raw binary.
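The same multipart request can be sketched in Python using only the standard library. The endpoint, the `X-API-Key` header, and the `image_front` field are taken from the curl command above; the function names, the 30-second timeout, and the lack of error handling are choices made for this example, and the official documentation remains the reference for real integrations.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://extractpassportdata.com/api/v1/extract"


def build_extract_request(image_path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying the image."""
    path = Path(image_path)
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    boundary = uuid.uuid4().hex

    # One form part named "image_front", as in the curl example.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            'Content-Disposition: form-data; name="image_front"; '
            f'filename="{path.name}"\r\n'
        ).encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])

    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_passport(image_path: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    request = build_extract_request(image_path, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `extract_passport("./passport.jpg", "pass_your_api_key")` would return the decoded JSON as a dictionary. Splitting request construction from sending keeps the first half testable without network access.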

Tips for the best accuracy

  1. Photograph the whole page, including the two MRZ lines at the bottom. No MRZ, no validation.
  2. Avoid glare on the lamination — it's the #1 cause of unreadable characters. Tilt the document slightly or turn off the flash.
  3. Minimum recommended resolution: 1200 pixels wide. MRZ characters are small.
  4. Keep it straight: the MRZ reads best without heavy rotation.
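Some of these conditions can be checked locally before spending a request. The Python sketch below detects the image format from its leading bytes, enforces the size limit, and reads the pixel width of PNG files from their header. It is a partial pre-flight check under stated assumptions: "10 MB" is read as 10 × 1024 × 1024 bytes, the width test covers PNG only (JPEG and WebP dimensions would need an imaging library), and glare or rotation cannot be judged this way.

```python
import struct
from typing import List, Optional

MAX_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit read as 10 MiB
MIN_WIDTH = 1200              # recommended minimum width in pixels


def detect_format(data: bytes) -> Optional[str]:
    """Identify JPEG, PNG, or WebP from the file's leading bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def png_width(data: bytes) -> int:
    """Read the width from the IHDR chunk that follows the PNG signature."""
    # Layout: 8-byte signature, 4-byte length, 4-byte "IHDR", width, height.
    width, _height = struct.unpack(">II", data[16:24])
    return width


def preflight(data: bytes) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    fmt = detect_format(data)
    if fmt is None:
        problems.append("unsupported format (expected JPEG, PNG, or WebP)")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than the upload limit")
    if fmt == "png" and len(data) >= 24 and png_width(data) < MIN_WIDTH:
        problems.append(f"image is narrower than {MIN_WIDTH} pixels")
    return problems
```

Running `preflight(Path("passport.png").read_bytes())` on a file (with `Path` imported from `pathlib`) returns an empty list when none of these checks flags a problem.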

Start for free

When you create an account you get 20 free extractions to try the service with your own documents, no card required. The API documentation includes examples in JavaScript, Python, and PHP.
