5 Common URL Decoding Errors and How to Fix Them
URL decoding errors can turn a smooth user experience into a debugging nightmare. Based on years of web development experience and thousands of bug reports, here are the 5 most common URL decoding errors—and exactly how to fix them.
Error #1: Incorrect Percent-Encoding Format
The Problem
Not all strings that look URL-encoded are actually valid. Invalid percent sequences will cause decoding to fail.
Common invalid patterns:
hello%world // Missing hex digits
test%2 // Incomplete sequence (needs 2 hex digits)
data%ZZ // Invalid hex characters
url%GG%20test // Mix of invalid (%GG) and valid (%20)
What Happens
// This will throw an error!
decodeURIComponent('hello%world');
// URIError: URI malformed
decodeURIComponent('test%2');
// URIError: URI malformed
The Root Cause
- Manual URL construction without proper encoding
- Truncated URLs (copy-paste errors)
- Non-URL data mistaken for encoded strings
- Legacy systems that don't follow RFC 3986
The Solution
Fix #1: Validate before decoding
function isValidEncoded(str) {
// Check for invalid percent patterns
const invalidPattern = /%(?![0-9A-Fa-f]{2})|%[0-9A-Fa-f](?![0-9A-Fa-f])/;
if (invalidPattern.test(str)) {
return false;
}
// Try decoding - if it throws, it's invalid
try {
decodeURIComponent(str);
return true;
} catch (e) {
return false;
}
}
// Usage (assumes params.get() returns the raw, still-encoded value)
const userInput = params.get('search');
if (isValidEncoded(userInput)) {
const decoded = decodeURIComponent(userInput);
} else {
console.error('Invalid URL encoding detected');
// Handle the error appropriately
}
Fix #2: Sanitize malformed encodings
function sanitizeEncoding(str) {
// Replace incomplete or invalid percent sequences
return str.replace(/%(?![0-9A-Fa-f]{2})/g, '%25');
// Converts % to %25 when not followed by 2 hex digits
}
// Example
sanitizeEncoding('hello%world'); // → 'hello%25world'
decodeURIComponent(sanitizeEncoding('hello%world')); // → 'hello%world'
Fix #3: Pre-process with regex
function safelyDecode(str) {
try {
return decodeURIComponent(str);
} catch (e) {
// Fallback: manually replace common patterns
return str
.replace(/%20/g, ' ')
.replace(/%21/g, '!')
.replace(/%40/g, '@')
.replace(/%23/g, '#')
.replace(/%25/g, '%');
// Note: This is not comprehensive, just a fallback
}
}
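A middle ground between failing outright and replacing a fixed list of patterns is to decode each run of well-formed escapes on its own and leave everything else untouched. The sketch below is one way to do that; `decodeLeniently` is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: decode what can be decoded, keep the rest verbatim.
function decodeLeniently(str) {
  // Match maximal runs of well-formed escapes, e.g. "%E4%B8%AD"
  return str.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch (e) {
      // Well-formed hex, but not valid UTF-8 (e.g. a lone %E9): leave as is
      return run;
    }
  });
}

console.log(decodeLeniently('hello%20world')); // 'hello world'
console.log(decodeLeniently('url%GG%20test')); // 'url%GG test'
console.log(decodeLeniently('caf%E9%20ole'));  // 'caf%E9%20ole'
```

Because invalid sequences are preserved rather than guessed at, the output can still contain percent signs, so treat it as best-effort display text rather than validated input.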
Prevention
Always use proper encoding functions:
// ✅ Correct
const query = encodeURIComponent(userInput);
const url = `/search?q=${query}`;
// ❌ Wrong - manual URL building
const badUrl = `/search?q=${userInput.replace(/ /g, '%20')}`;
Quick Test
// Test cases for validation
const testCases = [
{ input: 'hello%20world', valid: true },
{ input: 'hello%world', valid: false },
{ input: 'test%2', valid: false },
{ input: '%E4%B8%AD%E6%96%87', valid: true },
{ input: 'normal-text', valid: true }, // No encoding is valid too
];
testCases.forEach(({ input, valid }) => {
const result = isValidEncoded(input);
console.assert(result === valid, `Failed for: ${input}`);
});
Error #2: Character Encoding Mismatches
The Problem
Encoding a string in one character set (e.g., ISO-8859-1) and decoding it as another (UTF-8) produces gibberish or the replacement character �.
Symptoms:
Expected: café
Got: cafÃ©
Expected: 中文
Got: ���
Expected: Ñoño
Got: �o�o
What Happens
// If the server encoded in ISO-8859-1 but you decode as UTF-8:
const encoded = '%C3%A9'; // é in UTF-8
decodeURIComponent(encoded); // → 'é' (correct in UTF-8)
// But if it was actually ISO-8859-1 encoded as %E9:
const wrongEncoding = '%E9';
decodeURIComponent(wrongEncoding); // → throws URIError: a lone 0xE9 byte is not valid UTF-8
The Root Cause
- Legacy systems using non-UTF-8 encodings
- Mixed encoding in different parts of the application
- Database configured with wrong charset
- HTTP headers specifying incorrect encoding
The Solution
Fix #1: Standardize on UTF-8 everywhere
<!-- In HTML -->
<meta charset="UTF-8">
<!-- In HTTP headers -->
Content-Type: text/html; charset=UTF-8
// In Express.js
app.use(express.urlencoded({ extended: true })); // urlencoded bodies are parsed as UTF-8 by default
-- In MySQL
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Fix #2: Detect encoding mismatches
function looksLikeMojibake(str) {
// Common patterns of UTF-8 interpreted as ISO-8859-1
const suspiciousPatterns = [
/Ã©|Ã¨|Ã\u00A0|Ã§/, // Common in French (é, è, à, ç misread as Latin-1)
/Â£|Â¥|Â©/, // Currency and symbols (£, ¥, © misread as Latin-1)
/�/, // Replacement character
];
return suspiciousPatterns.some(pattern => pattern.test(str));
}
// Usage
const decoded = decodeURIComponent(encoded);
if (looksLikeMojibake(decoded)) {
console.warn('Possible encoding mismatch detected!');
}
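Pattern matching on decoded text is a heuristic. When you have the raw bytes, a stricter check is to ask a UTF-8 decoder to reject malformed input. The sketch below uses the standard `TextDecoder` with `fatal: true`; `isValidUtf8` is a hypothetical helper name.

```javascript
// Hypothetical helper: true when the bytes form valid UTF-8
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidUtf8(Uint8Array.of(0xC3, 0xA9))); // true  (é in UTF-8)
console.log(isValidUtf8(Uint8Array.of(0xE9)));       // false (é in ISO-8859-1)
```

A `false` result only says the bytes are not UTF-8. It does not say which legacy charset produced them.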
Fix #3: Re-encode if necessary
// If UTF-8 bytes were wrongly decoded as Latin-1 (e.g. you see 'cafÃ©' instead of 'café'):
function fixLatin1ToUTF8(str) {
  // Only works when every character is in the range U+0000 to U+00FF.
  // This is a complex operation, use a library if possible.
  const bytes = Uint8Array.from(str, (ch) => ch.charCodeAt(0)); // one byte per character
  const decoder = new TextDecoder('utf-8');
  return decoder.decode(bytes); // reinterpret those bytes as UTF-8
}
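If the percent-encoded bytes themselves come from a legacy charset (for example `%E9` for é), `decodeURIComponent` cannot help because it always assumes UTF-8. One option is to turn the escapes into raw bytes and decode them with an explicit charset. This is a minimal sketch: `decodeWithCharset` is a hypothetical name, and it assumes any unescaped characters in the input are plain ASCII. Note that `TextDecoder` treats the label `iso-8859-1` as windows-1252, and that non-UTF-8 labels require a runtime with full encoding support.

```javascript
// Hypothetical helper: percent-decode to bytes, then interpret with a known charset
function decodeWithCharset(str, charset = 'utf-8') {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.slice(i + 1, i + 3);
    if (str[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // escaped byte
      i += 2;                        // skip the two hex digits
    } else {
      bytes.push(str.charCodeAt(i)); // assumption: unescaped characters are ASCII
    }
  }
  return new TextDecoder(charset).decode(new Uint8Array(bytes));
}

console.log(decodeWithCharset('caf%E9', 'iso-8859-1')); // 'café'
console.log(decodeWithCharset('caf%C3%A9'));            // 'café'
```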
Prevention
Enforce UTF-8 at every layer:
- Database: UTF-8 (or utf8mb4 for MySQL)
- HTTP headers: Content-Type: text/html; charset=UTF-8
- HTML: <meta charset="UTF-8">
- Source files: Save as UTF-8
- APIs: Accept and return UTF-8
Quick Test
// Test with international characters
const tests = [
{ text: 'café', lang: 'French' },
{ text: '中文', lang: 'Chinese' },
{ text: 'العربية', lang: 'Arabic' },
{ text: '😀', lang: 'Emoji' },
];
tests.forEach(({ text, lang }) => {
const encoded = encodeURIComponent(text);
const decoded = decodeURIComponent(encoded);
console.assert(decoded === text, `${lang} encoding failed`);
});
Error #3: Incomplete Decoding (Multi-Layer Issues)
The Problem
URLs encoded multiple times need multiple decode operations. Stopping too early leaves percent sequences in the output.
Example:
Original: Hello World
Encoded once: Hello%20World
Encoded twice: Hello%2520World
Encoded thrice: Hello%252520World
// If you only decode once:
decodeURIComponent('Hello%252520World') // → 'Hello%2520World' (still encoded!)
What Happens
const doubleEncoded = 'search%253Dhello%2520world';
// Decode once
const once = decodeURIComponent(doubleEncoded);
console.log(once); // 'search%3Dhello%20world' - still contains %3D and %20!
// Decode twice
const twice = decodeURIComponent(once);
console.log(twice); // 'search=hello world' - correct!
The Root Cause
- Multiple redirects each encoding the URL
- Middleware chains that encode repeatedly
- User copy-paste of already-encoded URLs
- Framework auto-encoding on top of manual encoding
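Not every extra layer is a bug. When one URL is carried inside another, such as a redirect target in a query parameter, the inner URL is legitimately encoded twice, and each consumer should peel off exactly one layer. The routes and the `example.com` origin below are placeholders for illustration.

```javascript
// Hypothetical routes, for illustration only
const target = '/search?q=' + encodeURIComponent('hello world');
// target → '/search?q=hello%20world'
const login = '/login?next=' + encodeURIComponent(target);
// login → '/login?next=%2Fsearch%3Fq%3Dhello%2520world'

// Each consumer decodes exactly one layer
const base = 'https://example.com'; // placeholder origin
const next = new URL(login, base).searchParams.get('next');
// next → '/search?q=hello%20world'
const q = new URL(next, base).searchParams.get('q');
// q → 'hello world'

console.log({ target, login, next, q });
```

Problems start when a layer is added without a matching consumer, or when one consumer decodes more than its own layer.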
The Solution
Fix #1: Iterative decoding until stable
function fullyDecode(str) {
let decoded = str;
let previous = '';
let iterations = 0;
const MAX_ITERATIONS = 5; // Safety limit
while (decoded !== previous && iterations < MAX_ITERATIONS) {
previous = decoded;
try {
const temp = decodeURIComponent(decoded);
// Only continue if something actually changed
if (temp !== decoded) {
decoded = temp;
} else {
break;
}
} catch (e) {
// Stop on error
console.error('Decoding stopped due to error:', e);
break;
}
iterations++;
}
console.log(`Decoded ${iterations} times`);
return decoded;
}
// Usage
fullyDecode('Hello%252520World'); // → 'Hello World' (3 iterations)
Fix #2: Count encoding layers
function countLayers(str) {
let count = 0;
let current = str;
while (/%[0-9A-Fa-f]{2}/.test(current) && count < 10) {
try {
const decoded = decodeURIComponent(current);
if (decoded === current) break; // No change
current = decoded;
count++;
} catch (e) {
break;
}
}
return count;
}
// Usage
console.log(countLayers('Hello%20World')); // 1
console.log(countLayers('Hello%2520World')); // 2
console.log(countLayers('Hello%252520World')); // 3
Fix #3: Detect and warn
function decodeWithWarning(str) {
const layers = countLayers(str);
if (layers > 1) {
console.warn(`Multi-layer encoding detected: ${layers} layers`);
}
return fullyDecode(str);
}
Prevention
Avoid double-encoding:
// ❌ Don't do this
const alreadyEncoded = encodeURIComponent(userInput);
const doubleEncoded = encodeURIComponent(alreadyEncoded); // Wrong!
// ✅ Encode only once
const encoded = encodeURIComponent(userInput);
// ✅ Or check if already encoded
function encodeOnce(str) {
// Heuristic only: if it contains a %XX sequence, assume it's already encoded
// (raw text that happens to contain %XX will be misdetected)
if (/%[0-9A-Fa-f]{2}/.test(str)) {
return str; // Already encoded
}
return encodeURIComponent(str);
}
Quick Test
const multilayerTests = [
{ input: 'Hello%20World', layers: 1 },
{ input: 'Hello%2520World', layers: 2 },
{ input: '%25252525', layers: 4 }, // '%' encoded 4 times
];
multilayerTests.forEach(({ input, layers }) => {
const detected = countLayers(input);
console.assert(detected === layers, `Failed: expected ${layers}, got ${detected}`);
});
Error #4: Reserved Character Confusion
The Problem
Not knowing which characters are reserved leads to incorrect encoding/decoding decisions.
Common mistakes:
Encoding ? in a query string // Wrong! ? is the query delimiter
Not encoding & in a value // Wrong! & separates parameters
Encoding / in a path // Usually wrong! / is the path separator
What Happens
// Wrong: encoding the query delimiter
const wrongUrl = `/search${encodeURIComponent('?q=test')}`;
// → /search%3Fq%3Dtest (the ? is encoded!)
// Wrong: not encoding & in a value
const name = 'Tom & Jerry';
const badUrl = `/search?query=${name}`;
// → /search?query=Tom & Jerry
// Browser interprets as: query=Tom and a parameter named "Jerry"
// Correct:
const goodUrl = `/search?query=${encodeURIComponent(name)}`;
// → /search?query=Tom%20%26%20Jerry
The Root Cause
- Confusion about URL structure
- Wrong encoding function (encodeURI vs encodeURIComponent)
- Manual URL building without understanding reserved characters
Reserved Characters in URLs
| Character | Meaning | Encode in values? |
|---|---|---|
| : | Protocol/port separator | Yes (in values) |
| / | Path separator | No (in paths), Yes (in values) |
| ? | Query string start | No (as delimiter), Yes (in values) |
| # | Fragment identifier | No (as delimiter), Yes (in values) |
| & | Parameter separator | No (as separator), Yes (in values) |
| = | Key-value separator | No (as separator), Yes (in values) |
| @ | User info separator | Yes (usually) |
The Solution
Fix #1: Use the right encoding function
// For encoding COMPLETE URLs
const fullUrl = 'https://example.com/path with spaces/file.html';
const encodedUrl = encodeURI(fullUrl);
// → 'https://example.com/path%20with%20spaces/file.html'
// Note: /, :, ? are NOT encoded
// For encoding URL COMPONENTS (query values, path segments)
const value = 'hello/world?test=value';
const encodedValue = encodeURIComponent(value);
// → 'hello%2Fworld%3Ftest%3Dvalue'
// Note: reserved characters such as /, ?, = and & ARE encoded
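One detail worth knowing: `encodeURIComponent` leaves a small set of characters unescaped, namely letters, digits, and `- _ . ! ~ * ' ( )`. If a downstream system expects the stricter RFC 3986 behavior, the remaining ones can be escaped by hand. The sketch below follows a widely used pattern; `encodeStrict` is a hypothetical name.

```javascript
// Hypothetical helper: also escape ! ' ( ) * which encodeURIComponent leaves as is
function encodeStrict(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (ok)!*")); // "it's%20(ok)!*"
console.log(encodeStrict("it's (ok)!*"));       // 'it%27s%20%28ok%29%21%2A'
```

Both forms decode back to the same string with `decodeURIComponent`, so this only matters when the receiver is picky about what it accepts.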
Fix #2: Build URLs properly
// ❌ Wrong way
const search = 'hello & goodbye';
const brokenUrl = '/search?q=' + search; // Breaks on &
// ✅ Right way - encode the value
const encodedUrl = '/search?q=' + encodeURIComponent(search);
// ✅ Better - use URL API
const url = new URL('/search', window.location.origin);
url.searchParams.set('q', search); // Automatic encoding
console.log(url.href);
Fix #3: Parse URLs correctly
// ❌ Wrong - manual parsing
const query = window.location.search; // ?name=Tom%20%26%20Jerry
const rawValue = query.split('=')[1]; // 'Tom%20%26%20Jerry'
// If you forget to decode, you'll show the encoded version
// ✅ Right - use URL API
const params = new URLSearchParams(window.location.search);
const value = params.get('name'); // Automatically decoded: 'Tom & Jerry'
Prevention
Use URL utilities:
// Node.js or modern browsers
const { URL, URLSearchParams } = require('url'); // Node.js
// Or just use global URL and URLSearchParams in browsers
// Build URLs safely
const url = new URL('https://example.com/search');
url.searchParams.append('query', 'hello & goodbye');
url.searchParams.append('page', '1');
console.log(url.toString());
// → https://example.com/search?query=hello+%26+goodbye&page=1
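The same idea applies to paths: encode each segment separately and join with `/`, so that a slash inside a segment is escaped while the separators stay intact. A minimal sketch, with `buildPath` and the sample segments as hypothetical names:

```javascript
// Hypothetical helper: encode each path segment, keep "/" as the separator
function buildPath(...segments) {
  return '/' + segments.map((s) => encodeURIComponent(s)).join('/');
}

const path = buildPath('files', 'reports/2024', 'Q1 summary.pdf');
console.log(path); // '/files/reports%2F2024/Q1%20summary.pdf'

// Reverse: split on the separator first, then decode each segment
const parts = path.split('/').slice(1).map((s) => decodeURIComponent(s));
console.log(parts); // [ 'files', 'reports/2024', 'Q1 summary.pdf' ]
```

Decoding the whole path before splitting would turn `%2F` back into a separator and change the number of segments.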
Quick Test
const reservedCharTests = [
{ char: '&', desc: 'Ampersand' },
{ char: '=', desc: 'Equals' },
{ char: '?', desc: 'Question mark' },
{ char: '#', desc: 'Hash' },
{ char: '/', desc: 'Slash' },
];
reservedCharTests.forEach(({ char, desc }) => {
const value = `before${char}after`;
const encoded = encodeURIComponent(value);
const decoded = decodeURIComponent(encoded);
console.log(`${desc} (${char}):`);
console.log(` Original: ${value}`);
console.log(` Encoded: ${encoded}`);
console.log(` Decoded: ${decoded}`);
console.assert(decoded === value, `${desc} failed roundtrip`);
});
Error #5: Using Wrong Decoding Functions/Methods
The Problem
Different languages and frameworks have different decoding functions. Using the wrong one produces incorrect results.
Common Mistakes
JavaScript:
// ❌ Wrong for query parameters
decodeURI('hello%20world%26test');
// → 'hello world%26test' (doesn't decode &)
// ✅ Correct
decodeURIComponent('hello%20world%26test');
// → 'hello world&test'
Python:
# ❌ Wrong - quote() instead of unquote()
from urllib.parse import quote
result = quote('hello%20world')
# → 'hello%2520world' (double encoded!)
# ✅ Correct
from urllib.parse import unquote
result = unquote('hello%20world')
# → 'hello world'
PHP:
// Plus signs (+) represent spaces in form data
$encoded = 'hello+world';
// ❌ urldecode() turns a literal + into a space
$result = urldecode($encoded);
// → 'hello world'
// ✅ Use rawurldecode() when + must stay a literal plus
$result = rawurldecode($encoded);
// → 'hello+world'
// Use urldecode() only when + is meant to be a space (form data)
The Solution
Fix #1: Know your functions
JavaScript:
- decodeURI(): for entire URLs (leaves escapes of reserved characters such as %26, %3D, and %3F encoded)
- decodeURIComponent(): for URL parts (decodes everything)
Python:
- urllib.parse.unquote(): standard decode
- urllib.parse.unquote_plus(): also decodes + as space (for form data)
PHP:
- urldecode(): decodes + as space
- rawurldecode(): leaves + as a literal plus
Fix #2: Handle plus signs correctly
// If dealing with form-encoded data where + means space:
function decodeFormData(str) {
return decodeURIComponent(str.replace(/\+/g, ' '));
}
// Usage
decodeFormData('hello+world'); // → 'hello world'
decodeURIComponent('hello+world'); // → 'hello+world' (+ not decoded)
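In browsers and Node.js, `URLSearchParams` already applies the form-data rule: it reads `+` as a space and writes a literal plus as `%2B`. The query strings below are made-up samples.

```javascript
// Parsing: + becomes a space, %2B becomes a literal plus
const parsed = new URLSearchParams('q=C%2B%2B+tutorial&op=1+1');
console.log(parsed.get('q'));  // 'C++ tutorial'
console.log(parsed.get('op')); // '1 1'

// Serializing: spaces become +, literal plus signs become %2B
const built = new URLSearchParams({ q: 'C++ tutorial' });
console.log(built.toString()); // 'q=C%2B%2B+tutorial'
```

Values returned by `get()` are already decoded, so passing them through `decodeURIComponent` again would decode a second layer that may not exist.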
Fix #3: Test your decode function
const testStrings = [
'hello%20world', // Space
'hello+world', // Plus
'hello%2Bworld', // Encoded plus
'test%26value', // Ampersand
'%E4%B8%AD%E6%96%87', // UTF-8
];
testStrings.forEach(str => {
console.log(`Input: ${str}`);
console.log(`decodeURI: ${decodeURI(str)}`);
console.log(`decodeURIComponent: ${decodeURIComponent(str)}`);
console.log('---');
});
Prevention
Create wrapper functions:
// Standardize decoding across your application
function safeDecodeParam(str) {
if (!str) return '';
try {
// Replace + with space for form data, then decode
return decodeURIComponent(str.replace(/\+/g, ' '));
} catch (e) {
console.error('Decoding error:', e);
return str; // Return original on error
}
}
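// Use consistently on raw, still-encoded values
// (URLSearchParams.get() already returns decoded text, so don't decode that again)
const userQuery = safeDecodeParam(rawQueryValue); // rawQueryValue: placeholder for the undecoded 'q' value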
Quick Test
// Test all decoding functions with same input
const testInput = 'hello%20world%26test';
console.log('Testing:', testInput);
console.log('decodeURI: ', decodeURI(testInput));
console.log('decodeURIComponent:', decodeURIComponent(testInput));
console.log('Expected: hello world&test');
Debugging Checklist
When you encounter URL decoding issues, use this checklist:
- Valid encoding? Check for malformed percent sequences (%ZZ, %2)
- Correct charset? Verify UTF-8 throughout the stack
- Single or multi-layer? Count how many times it's encoded
- Reserved characters? Ensure proper handling of &, =, ?, etc.
- Right function? Using decodeURIComponent() vs decodeURI()?
- Plus signs? Are they meant to be spaces or literal +?
- Error handling? Wrapped in try-catch?
- Sanitized? Validated and sanitized after decoding?
Tools for Debugging
- Our URL Decoder: Free online tool with multi-layer detection
- Browser DevTools: console.log(decodeURIComponent(str))
- URL Parser: Visualize URL components
- Hex viewers: See actual byte values
Summary
| Error | Quick Fix | Prevention |
|---|---|---|
| #1 Incorrect format | Validate before decode | Use proper encoding functions |
| #2 Encoding mismatch | Standardize on UTF-8 | UTF-8 everywhere |
| #3 Incomplete decode | Decode until stable | Avoid double-encoding |
| #4 Reserved chars | Use encodeURIComponent() | Use URL API |
| #5 Wrong function | Know your functions | Create wrappers |
By understanding and fixing these 5 common errors, you'll handle URL decoding like a pro. Remember: validate inputs, decode carefully, and always test with edge cases!
Avoid these errors instantly with our free URL decoder tool that handles all edge cases automatically!