How To Strip Special Characters
📖 This guide was prepared by the ToolPazar team. All our tools are free and ad-free.
Start by defining “special”
Pick the output constraint first, then derive the allow-list:
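One way to make that concrete is a table mapping each output target to its allow-list; the target names and exact character sets below are illustrative, not canonical:

```python
import re

# Illustrative allow-lists per output target -- adjust to your real constraint.
# Each regex matches what is NOT allowed, so sub("") deletes it.
ALLOWED = {
    "ascii_slug": re.compile(r"[^a-z0-9-]"),           # lowercase URL slugs
    "filename":   re.compile(r"[^A-Za-z0-9._ -]"),     # conservative filenames
    "search":     re.compile(r"[^\w\s]"),              # word chars + spaces
}

def keep_allowed(text: str, target: str) -> str:
    """Delete everything outside the target's allow-list."""
    return ALLOWED[target].sub("", text)

print(keep_allowed("Hello, World!", "search"))  # punctuation removed
```

The point is the direction of the decision: you enumerate what survives, and everything else is stripped by default.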
Allow-list beats deny-list
Deny-listing (“remove these bad characters”) leaves you vulnerable to characters you didn’t think of — especially Unicode confusables, zero-width characters, and invisible tags. Allow-listing (“keep only these characters”) is safer.
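A minimal demonstration of the difference, using a zero-width space as the character nobody thought to deny (the variable names are illustrative):

```python
import re

text = "admin\u200b"  # ends with a zero-width space (U+200B), invisible on screen

# Deny-list: remove the characters someone thought of.
# The zero-width space is not in the list, so it survives.
deny_listed = re.sub(r"[!@#$%^&*()<>]", "", text)

# Allow-list: keep only characters explicitly permitted; everything else goes.
allow_listed = re.sub(r"[^A-Za-z0-9_-]", "", text)

print(deny_listed == "admin")   # False: the invisible character is still there
print(allow_listed == "admin")  # True
```

Two strings that print identically can compare unequal, which is exactly the class of bug deny-listing invites.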
ASCII-only with transliteration
Don’t just strip non-ASCII — transliterate first so “café” becomes “cafe,” not “caf.” The trick: normalize to NFD (decomposed form), then strip combining marks, then strip anything still non-ASCII.
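In Python's standard library, the three steps look like this (the helper name is illustrative):

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Transliterate accented Latin to plain ASCII: cafe, not caf."""
    # 1. NFD splits 'é' into 'e' + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # 2. Drop the combining marks (Unicode category Mn).
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    # 3. Drop anything still outside ASCII (symbols, non-Latin scripts).
    return stripped.encode("ascii", "ignore").decode("ascii")

print(to_ascii("café"))          # cafe
print(to_ascii("naïve façade"))  # naive facade
```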
This handles accented Latin beautifully. It can’t transliterate non-Latin scripts — for Cyrillic, Greek, or CJK you need a dedicated library.
URL-safe output
URLs allow a narrow character set. The standard pattern is to lowercase, transliterate to ASCII, and keep only letters, digits, and hyphens.
Preserve spaces but strip punctuation
Common for prepping text for tokenization or search indexing: remove the punctuation but keep the whitespace, so word boundaries stay intact.
Filename sanitization
Windows is the strictest platform. A safe cross-platform allow-list keeps letters, digits, dot, underscore, hyphen, and space.
Control character stripping
Strip the ASCII control characters (U+0000–U+001F and U+007F); they rarely belong in user text and can break parsers, terminals, and logs.
Preserving quotes and apostrophes
“Smart” quotes (U+2018, U+2019, U+201C, U+201D) vs straight (U+0027, U+0022) is a frequent headache. Pick one and normalize everything to it.
Category-based stripping with Unicode
Regex Unicode categories let you strip by meaning, not by codepoint: in engines that support \p{...} properties, \p{P} matches any punctuation and \p{C} matches control and format characters.
Testing your filter
Always run it on a torture-test string, then check the output for smart quotes, combining marks, emoji, zero-width space, and control characters. If any slipped through, tighten your allow-list.
Common mistakes
The recurring ones: deny-listing instead of allow-listing, stripping non-ASCII without transliterating first, and forgetting invisible characters such as zero-width spaces.
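Putting the advice together, here is a sketch of such a torture test; the test string, helper name, and allow-list are illustrative:

```python
import re
import unicodedata

# A torture-test string: smart quotes, a combining mark, an accented letter,
# an emoji, a zero-width space, and a control character.
torture = "\u201cna\u0069\u0308ve\u201d caf\u00e9 \U0001f600 a\u200bb \x07end"

def ascii_allow_list(text: str) -> str:
    # Transliterate first so accents survive as base letters...
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(c for c in decomposed
                       if unicodedata.category(c) != "Mn")
    # ...then keep only an explicit allow-list of ASCII characters.
    return re.sub("[^A-Za-z0-9 .,;:!?'\"-]", "", no_marks)

cleaned = ascii_allow_list(torture)
# Verify nothing invisible or non-ASCII slipped through.
assert cleaned.isascii()
assert "\u200b" not in cleaned and "\x07" not in cleaned
print(cleaned)
```

If an assertion ever fires after you change the filter, the allow-list has a hole; tighten it rather than adding the offending character to a deny-list.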