
How To Strip Special Characters

📖 This guide was prepared by the ToolPazar team. All of our tools are free and ad-free.

Start by defining “special”

Pick the output constraint first, then derive the allow-list from it: a URL slug, a safe filename, plain ASCII, and search-index input each imply a different set of characters worth keeping.

Allow-list beats deny-list

Deny-listing (“remove these bad characters”) leaves you vulnerable to characters you didn’t think of — especially Unicode confusables, zero-width characters, and invisible tags. Allow-listing (“keep only these characters”) is safer.
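A minimal allow-list filter in Python might look like this; the exact set of allowed characters is an illustrative choice, not a prescription:

```python
import re

# Allow-list: keep only the characters we explicitly trust.
# Everything else, including zero-width characters and Unicode
# confusables, is dropped by default.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 .,\-]")

def keep_allowed(text: str) -> str:
    return NOT_ALLOWED.sub("", text)
```

Because the default is "drop," a character you never thought about (here, a zero-width space or a right-to-left override) can't sneak through.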

ASCII-only with transliteration

Don’t just strip non-ASCII — transliterate first so “café” becomes “cafe,” not “caf.” The trick: normalize to NFD (decomposed form), then strip combining marks, then strip anything still non-ASCII.
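A sketch of that three-step pipeline using only the standard library's unicodedata module:

```python
import unicodedata

def to_ascii(text: str) -> str:
    # 1. NFD splits "é" into "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # 2. Drop the combining marks (category Mn), keeping the base letters.
    no_marks = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # 3. Drop anything still outside ASCII.
    return no_marks.encode("ascii", "ignore").decode("ascii")
```

Note that this only helps for accented Latin: Cyrillic, Greek, or CJK characters have no ASCII decomposition, so step 3 simply deletes them.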

URL-safe output

URLs allow only a narrow character set: the unreserved characters are letters, digits, hyphen, period, underscore, and tilde. The standard slug pattern: transliterate, lowercase, collapse every run of anything else into a single hyphen, then trim hyphens from the ends.
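One way to sketch that slug pattern in Python (the function name and the choice of hyphen as separator are illustrative):

```python
import re
import unicodedata

def slugify(text: str) -> str:
    # Transliterate accented Latin to ASCII: NFD decomposition puts the
    # combining marks into separate codepoints, and the ASCII encode drops them.
    text = unicodedata.normalize("NFD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse every run outside [a-z0-9] into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```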

Preserve spaces but strip punctuation

Replace punctuation with a space rather than deleting it outright, so word boundaries stay intact. Common for prepping text for tokenization or search indexing:
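A sketch of the replace-then-collapse approach, assuming punctuation should become spaces rather than vanish:

```python
import re

def strip_punct(text: str) -> str:
    # Replace punctuation with a space (not the empty string) so that
    # "state-of-the-art" becomes separate words instead of one glued token.
    no_punct = re.sub(r"[^\w\s]", " ", text)
    # Collapse the resulting runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", no_punct).strip()
```

In Python 3, \w is Unicode-aware, so accented letters survive this filter; combine it with the transliteration step above if you need plain ASCII.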

Filename sanitization

Windows is the strictest platform: it forbids < > : " / \ | ? *, control characters, trailing dots and spaces, and reserved device names like CON and NUL. A safe cross-platform filename keeps only word characters, dots, and hyphens:
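A sketch of such a sanitizer; the underscore replacement and the exact reserved-name list are choices you may want to adapt:

```python
import re

# Windows reserves these device names regardless of extension.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL",
                    *(f"COM{i}" for i in range(1, 10)),
                    *(f"LPT{i}" for i in range(1, 10))}

def safe_filename(name: str, replacement: str = "_") -> str:
    # Allow-list: word characters, dots, and hyphens; replace the rest.
    name = re.sub(r"[^\w.\-]", replacement, name)
    # Windows also forbids trailing dots and spaces.
    name = name.rstrip(". ")
    # Prefix reserved device names so "CON.txt" stays usable.
    if name.split(".")[0].upper() in WINDOWS_RESERVED:
        name = "_" + name
    return name or "_"
```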

Control character stripping

Control characters (U+0000 through U+001F, plus U+007F) are invisible but can corrupt logs, break terminals, and confuse databases. Strip them all except the whitespace you actually want, such as tab and newline:
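A sketch that strips the C0 controls and DEL while keeping tab, newline, and carriage return:

```python
import re

# C0 control characters (U+0000-U+001F) and DEL (U+007F), with holes
# left for the whitespace we keep: tab (09), newline (0A), CR (0D).
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def strip_controls(text: str) -> str:
    return CONTROL.sub("", text)
```

Note that ANSI escape sequences lose only their ESC byte here; the printable remainder (like "[31m") survives and needs its own filter if that matters to you.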

Preserving quotes and apostrophes

“Smart” quotes (U+2018, U+2019, U+201C, U+201D) versus straight quotes (U+0027, U+0022) are a frequent headache. Pick one form and normalize to it, and if your filter strips punctuation, allow-list the apostrophe so contractions survive:
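One way to normalize toward straight quotes with a translation table; the direction is a choice, and mapping straight to curly works the same way:

```python
# Map curly quotes to their straight equivalents instead of stripping
# them, so "don't" keeps its apostrophe.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(text: str) -> str:
    return text.translate(QUOTE_MAP)
```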

Category-based stripping with Unicode

Regex Unicode categories let you strip by meaning, not by codepoint: \p{P} matches all punctuation, \p{C} matches control and format characters, \p{M} matches combining marks:
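Python's built-in re module does not support \p{...}; the third-party regex package does. A standard-library equivalent checks each character's Unicode general category with unicodedata:

```python
import unicodedata

def strip_by_category(text: str, prefixes: tuple = ("P", "C", "M")) -> str:
    # Drop characters whose general category starts with any given prefix:
    # P* = punctuation, C* = control/format, M* = combining marks.
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(prefixes)
    )
```

This is how zero-width spaces get caught: U+200B has category Cf (format), so the "C" prefix removes it even though it is not a classic control character.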

Testing your filter

Always run it on a torture-test string, one input that packs in every class of character you claim to handle:
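A self-contained sketch of such a torture test, with a hypothetical allow-list filter standing in for your own:

```python
# Hypothetical filter under test: a strict allow-list of ASCII letters,
# digits, and spaces (substitute your own implementation here).
def my_filter(text: str) -> str:
    return "".join(ch for ch in text
                   if ch.isascii() and (ch.isalnum() or ch == " "))

# One input that packs in every class of character this guide warns about.
torture = (
    "caf\u00e9 "            # accented Latin
    "\u201cquoted\u201d "   # smart quotes
    "zero\u200bwidth "      # zero-width space
    "ctrl\x07char "         # control character (BEL)
    "emoji \U0001F600"      # emoji
)

out = my_filter(torture)
# Nothing outside the allow-list should survive.
assert all(ch.isascii() for ch in out)
assert "\u200b" not in out and "\x07" not in out
```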

Common mistakes

- Deny-listing instead of allow-listing: you will always miss a character you did not think of.
- Stripping non-ASCII without transliterating first, so "café" becomes "caf" instead of "cafe."
- Forgetting invisible characters: zero-width spaces and other format characters sail straight through naive punctuation filters.
- Testing only with plain English input.

After any change to your filter, check the output for smart quotes, combining marks, emoji, zero-width space, and control characters. If any slipped through, tighten your allow-list.