CSV Encoding Explained: UTF-8 vs Windows-1252 vs ISO-8859-1 (2026)
Garbled characters in CSV—Ã¤ instead of ä, Ã© instead of é—usually mean an encoding mismatch. This guide explains UTF-8, Windows-1252, and ISO-8859-1, how to detect the encoding of a file, and how to fix it in Excel, Python, and JavaScript.
CSV files are just text. The bytes that represent "Müller" or "Zürich" depend on the character encoding. If you open a UTF-8 CSV in a tool that assumes Windows-1252, you get mojibake—wrong characters. Here’s how to avoid and fix it.
1. Why Encoding Matters for CSV
Every CSV is a sequence of bytes. The same byte value can mean different characters in different encodings. For example, the bytes C3 A4 in UTF-8 are the character ä; in ISO-8859-1, the single byte E4 is ä. If your app or Excel assumes the wrong encoding, you see Ã¤ instead of ä. For more on CSV structure, see What is a CSV File?.
Typical symptom: You export from a German/French/Spanish system, open in Excel or another tool, and names or places show wrong characters (e.g. Ã¶ instead of ö). That’s almost always encoding.
2. UTF-8: The Modern Standard
UTF-8 can represent every character in Unicode. It uses 1 byte for ASCII (a–z, 0–9) and 2–4 bytes for accented letters, umlauts, and symbols. UTF-8 is the default for the web, modern APIs, and most databases. For CSV, use UTF-8 whenever possible and save with a BOM (Byte Order Mark: EF BB BF) if you need Excel on Windows to open it correctly without asking.
Example (UTF-8):
name,city
Müller,Zürich
François,Paris
Niño,Madrid
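The byte-level view is easy to inspect in Python (a small illustration: the hex dump shows how ASCII letters take one byte each while 'ü' takes two, and how the 'utf-8-sig' codec prepends the BOM):

```python
# UTF-8 uses 1 byte for ASCII and 2+ bytes for accented letters.
text = "Müller"
data = text.encode("utf-8")
print(data.hex(" "))          # 4d c3 bc 6c 6c 65 72 -> 'ü' is two bytes: c3 bc

# 'utf-8-sig' prepends the BOM (EF BB BF) that Excel on Windows looks for.
with_bom = text.encode("utf-8-sig")
print(with_bom[:3].hex(" "))  # ef bb bf
```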
3. Windows-1252 (CP1252)
Windows-1252 (Code Page 1252) is the default in older Windows apps and Excel in many locales. It’s a superset of ISO-8859-1 with extra characters in the 0x80–0x9F range (e.g. smart quotes, Euro €). CSV exported from legacy Windows systems is often Windows-1252. If you open such a file as UTF-8, single bytes like E4 (ä) are not valid UTF-8, so you get decode errors or replacement characters (�); the two-character mojibake (e.g. Ã¤) appears in the opposite direction, when a UTF-8 file is read as Windows-1252.
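Both failure modes can be reproduced in a few lines of Python (a small sketch):

```python
# The classic mismatch: UTF-8 bytes decoded as Windows-1252.
data = "ä".encode("utf-8")    # two bytes: C3 A4
print(data.decode("cp1252"))  # Ã¤ (two characters instead of one)

# The reverse direction fails outright: a lone Windows-1252 byte like
# E4 is not valid UTF-8, so a strict decoder raises an error.
try:
    "ä".encode("cp1252").decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```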
4. ISO-8859-1 (Latin-1)
ISO-8859-1 covers Western European languages with one byte per character (256 code points). It’s common in older European systems. It’s very similar to Windows-1252; the main difference is the 0x80–0x9F range. Many tools treat “Latin-1” and “CP1252” interchangeably for CSV, but strictly they’re not identical.
| Encoding | Use case | BOM |
|---|---|---|
| UTF-8 | Web, APIs, new systems; all languages | Optional (helps Excel) |
| Windows-1252 | Legacy Windows/Excel exports (Western Europe) | No |
| ISO-8859-1 | Older European systems | No |
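The 0x80–0x9F difference between the two legacy encodings can be verified directly in Python (a quick sketch):

```python
# 0x80 is the Euro sign in Windows-1252 but an unassigned C1 control
# code in ISO-8859-1 -- the main range where the two encodings differ.
print(b"\x80".decode("cp1252"))         # €
print(repr(b"\x80".decode("latin-1")))  # '\x80' (an invisible control character)
```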
5. How to Detect and Fix Encoding
Detect: If you see patterns like Ã¤ (ä), Ã¶ (ö), Ã© (é), the file is likely UTF-8 being read as Windows-1252 or Latin-1. The reverse—replacement characters (�) or decode errors where you expected accented letters—means the file is Windows-1252/Latin-1 but opened as UTF-8. Use a validator or a small script to try both. neatcsv’s CSV Validator can help spot encoding issues; Clean CSV can normalize output to UTF-8.
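A minimal trial-decode needs only the standard library (a heuristic sketch, not a full detector like chardet; `guess_encoding` is an illustrative name, and note that almost any byte sequence decodes as cp1252/latin-1, so the fallback is a guess, not proof):

```python
def guess_encoding(raw: bytes) -> str:
    """Return 'utf-8' if the bytes are valid UTF-8, else assume Windows-1252."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"

print(guess_encoding("Müller".encode("utf-8")))   # utf-8
print(guess_encoding("Müller".encode("cp1252")))  # cp1252
```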
Fix in Excel: Use “Data → From Text/CSV”, choose the file, then in the import wizard select the correct “File origin” (e.g. “65001: Unicode (UTF-8)” or “1252: Western European”) and load. Save As → CSV UTF-8 (Comma delimited) to export as UTF-8 with BOM. More in How to Open CSV in Excel.
6. Code Examples: Python & JavaScript
Reading CSV with the right encoding and writing UTF-8 keeps data consistent.
Python
# Read Windows-1252 CSV, write UTF-8 (BOM included via 'utf-8-sig')
with open('input.csv', encoding='cp1252') as f:
    text = f.read()
with open('output.csv', 'w', encoding='utf-8-sig', newline='') as f:
    f.write(text)  # or use csv.reader/writer for structured data
JavaScript (browser / Node)
// File API: read as UTF-8 (default for text())
const utf8Text = await file.text();
// If you have bytes from a legacy system, use TextDecoder
const decoder = new TextDecoder('windows-1252');
const legacyText = decoder.decode(uint8Array);
For full CSV parsing (headers, commas, quotes), use a library like Papa Parse (JS) or the built-in csv module (Python) and always pass the correct encoding when reading.
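Putting both halves together, a structured re-encode with Python’s built-in csv module might look like this (a sketch: input.csv and output.csv are placeholder names, and the sample legacy file is created inline for illustration):

```python
import csv

# Create a small Windows-1252 file to convert (for illustration only).
with open("input.csv", "w", encoding="cp1252", newline="") as f:
    f.write("name,city\nMüller,Zürich\n")

# Re-encode Windows-1252 -> UTF-8 with BOM, parsing rows properly so
# quoted fields and embedded commas survive the round trip.
with open("input.csv", encoding="cp1252", newline="") as src, \
     open("output.csv", "w", encoding="utf-8-sig", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row)
```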
7. Best Practices
- Save new CSV as UTF-8, preferably with BOM if Excel on Windows will open it.
- Document the encoding in specs or filenames (e.g. export_utf8.csv) so downstream tools don’t guess.
- When importing, try UTF-8 first; if you see mojibake, try Windows-1252 or ISO-8859-1.
- Use a CSV validator or a small script to detect encoding (e.g. Python’s chardet) before processing.
For more on fixing broken CSV (columns, delimiters, dates), see 10 Common CSV Errors.