gh-62259: Add support of multi-byte encodings in the XML parser by serhiy-storchaka · Pull Request #149860 · python/cpython

serhiy-storchaka · 2026-05-15T09:32:39Z

Supported encodings: "cp932", "cp949", "cp950", "Big5", "EUC-JP", "GB2312", "GBK", "johab", and "Shift_JIS".

Partially supported encodings (only BMP characters): "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004", "Shift_JISX0213", "utf-8-sig" and non-standard aliases like "UTF8" (without hyphen).
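Since the partially supported encodings are limited to BMP characters, it may help to check input text before encoding it with one of them. The following is a minimal sketch, not part of the PR: `non_bmp_chars` is a hypothetical helper name, and the PR description does not say how the parser behaves when it meets a character outside the BMP.

```python
def non_bmp_chars(text: str) -> list[str]:
    """Return the characters of ``text`` that lie outside the BMP.

    The Basic Multilingual Plane covers code points U+0000..U+FFFF,
    so anything above 0xFFFF is a supplementary-plane character.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


# U+3042 (HIRAGANA LETTER A) is inside the BMP; U+20B9F is not.
print(non_bmp_chars("a\u3042"))
print(non_bmp_chars("a\U00020B9F"))
```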

The parser now raises ValueError for known unsupported multi-byte encodings such as "ISO-2022-JP" or "raw-unicode-escape" instead of failing later, when it encounters non-ASCII data.
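A rough way to observe this behaviour from user code is sketched below. It is an illustration, not code from the PR: `parse_text` is a hypothetical helper, and the outcome for the Shift_JIS and ISO-2022-JP documents depends on whether the running Python includes this change, so the sketch catches the errors rather than assuming a particular result.

```python
from xml.parsers import expat


def parse_text(document: bytes):
    """Return ("ok", text) or ("unsupported", message) for an XML byte string."""
    chunks = []
    parser = expat.ParserCreate()
    # Collect character data; expat may deliver it in several pieces.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(document, True)
    except (ValueError, LookupError, expat.ExpatError) as exc:
        # Which exception is raised, if any, depends on the Python version.
        return ("unsupported", str(exc))
    return ("ok", "".join(chunks))


utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><a>\u3042</a>'.encode("utf-8")
sjis_doc = '<?xml version="1.0" encoding="Shift_JIS"?><a>\u3042</a>'.encode("shift_jis")
iso2022_doc = b'<?xml version="1.0" encoding="ISO-2022-JP"?><a>abc</a>'

for doc in (utf8_doc, sjis_doc, iso2022_doc):
    print(parse_text(doc))
```

Under the behaviour described above, the Shift_JIS document would be expected to parse and the ISO-2022-JP document to be rejected with ValueError, even though its content is pure ASCII.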

Issue: Add multibyte encoding support to pyexpat #62259

read-the-docs-community · 2026-05-15T09:36:18Z

Documentation build overview

📚 cpython-previews | 🛠️ Build #32704028 | 📁 Comparing 25c8b75 against main (7e98deb)

🔍 Preview build

3 files changed

± library/pyexpat.html
± whatsnew/3.16.html
± whatsnew/changelog.html

malemburg

LGTM.

The only detail that is missing is the documentation update. This still reads "UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII" which already appears to be out of date.

serhiy-storchaka requested a review from scoder May 15, 2026 09:32

serhiy-storchaka requested a review from AA-Turner as a code owner May 15, 2026 09:32

bedevere-app Bot added the awaiting core review label May 15, 2026

bedevere-app Bot mentioned this pull request May 15, 2026

Add multibyte encoding support to pyexpat #62259

Open

serhiy-storchaka requested a review from malemburg May 15, 2026 09:54

malemburg approved these changes May 15, 2026

bedevere-app Bot added awaiting merge and removed awaiting core review labels May 15, 2026

serhiy-storchaka mentioned this pull request May 15, 2026

gh-148821: Always reject known multi-byte encodings in pyexpat #148911

Open

Update docs.

25c8b75

serhiy-storchaka wants to merge 2 commits into python:main from serhiy-storchaka:pyexpat-multibyte-encodings2
