CWE-20·Published 2026-06-19
### Summary

`ujson.dumps()`, `ujson.dump()`, and `ujson.encode()` accept a `reject_bytes=False` option. When it is set, they may accept malformed or truncated UTF-8 byte sequences and silently rewrite them into different Unicode characters instead of rejecting them. This leads to input validation bypass and data integrity issues.

### Details

For any bytes string `x`, the expected behavior is that either `x == ujson.loads(ujson.dumps(x, reject_bytes=False)).encode(errors="surrogatepass")` holds or `ujson.dumps()` raises an exception. In reality, some inputs that should have been errors are silently rewritten as other strings:

* An invalid continuation byte is replaced with a valid one: `b'\xcf\x13'` -> `b'\xcf\x93'`
* An unterminated sequence is completed: `b'\xc3'` -> `b'\xc3\x80'`
* ... or an unterminated sequence leads to reading past the end of the string: `b'\xf0\x90\x94'` -> `b"\xf0\x90\x94\x80inxcontrib'"`

### Impact

An application relying on `reject_bytes=False` for UTF-8 handling may experience:

- Data integrity issues
- Validation bypass, if said validation occurs before serialisation

### Remediation

The missing or broken UTF-8 validation checks were added or fixed in https://github.com/ultrajson/ultrajson/commit/169eaf36b1116fece5034ee79a7a0ef3f6deedcf. We recommend upgrading to [UltraJSON 5.13.0](https://github.com/ultrajson/ultrajson/releases/tag/5.13.0).

### Workarounds

Decoding bytes to strings in Python before passing them to `ujson.dumps()` avoids this issue.
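
The workaround of decoding in Python first can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the UltraJSON project: `decode_utf8_strict` and `dumps_bytes_safely` are hypothetical helper names, and the standard-library `json.dumps` is used as a default stand-in so the snippet runs without UltraJSON installed. Strict decoding also rejects encoded surrogates, so it is somewhat stricter than the `surrogatepass` round trip described under Details.

```python
import json


def decode_utf8_strict(raw: bytes) -> str:
    """Decode bytes as UTF-8, raising UnicodeDecodeError on malformed input."""
    # Python's own decoder validates continuation bytes and sequence length,
    # so malformed or truncated input fails here instead of being rewritten.
    return raw.decode("utf-8", errors="strict")


def dumps_bytes_safely(raw: bytes, dumps=json.dumps) -> str:
    """Serialise bytes as a JSON string after validating them in Python.

    `dumps` defaults to the standard-library encoder as a stand-in
    (an assumption made for this sketch); pass `ujson.dumps` to use UltraJSON.
    """
    return dumps(decode_utf8_strict(raw))


if __name__ == "__main__":
    # The three malformed inputs listed under Details.
    for bad in (b"\xcf\x13", b"\xc3", b"\xf0\x90\x94"):
        try:
            dumps_bytes_safely(bad)
        except UnicodeDecodeError as exc:
            print(f"rejected {bad!r}: {exc.reason}")
```

Because the serialiser only ever receives a `str`, `reject_bytes=False` is no longer needed on this code path, and any validation performed on the decoded text sees the same characters that end up in the JSON output.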
| CVSS version | Type | Source | Base score | Exploitability score | Impact score | Vector |
|---|---|---|---|---|---|---|
| 3.1 | Secondary | GHSA | 6.5 | — | — | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N |