5 Common URL Decoding Errors and How to Fix Them

January 20, 2026

URL decoding errors can turn a smooth user experience into a debugging nightmare. Based on years of web development experience and thousands of bug reports, here are the 5 most common URL decoding errors and exactly how to fix them.

Error #1: Incorrect Percent-Encoding Format

The Problem

Not all strings that look URL-encoded are actually valid. Invalid percent sequences will cause decoding to fail.

Common invalid patterns:

hello%world      // Missing hex digits
test%2          // Incomplete sequence (needs 2 hex digits)
data%ZZ         // Invalid hex characters
url%GG%20test   // Mix of invalid (%GG) and valid (%20)

What Happens

// This will throw an error!
decodeURIComponent('hello%world');
// URIError: URI malformed

decodeURIComponent('test%2');
// URIError: URI malformed

The Root Cause

Manual URL construction without proper encoding
Truncated URLs (copy-paste errors)
Non-URL data mistaken for encoded strings
Legacy systems that don't follow RFC 3986

The Solution

Fix #1: Validate before decoding

function isValidEncoded(str) {
  // Check for invalid percent patterns
  const invalidPattern = /%(?![0-9A-Fa-f]{2})|%[0-9A-Fa-f](?![0-9A-Fa-f])/;
  
  if (invalidPattern.test(str)) {
    return false;
  }
  
  // Try decoding - if it throws, it's invalid
  try {
    decodeURIComponent(str);
    return true;
  } catch (e) {
    return false;
  }
}

// Usage: pass the raw, still-encoded value
// (URLSearchParams.get() already decodes, so don't validate its output)
const rawValue = 'hello%20world';
if (isValidEncoded(rawValue)) {
  const decoded = decodeURIComponent(rawValue);
} else {
  console.error('Invalid URL encoding detected');
  // Handle the error appropriately
}

Fix #2: Sanitize malformed encodings

function sanitizeEncoding(str) {
  // Replace incomplete or invalid percent sequences
  return str.replace(/%(?![0-9A-Fa-f]{2})/g, '%25');
  // Converts % to %25 when not followed by 2 hex digits
}

// Example
sanitizeEncoding('hello%world');  // → 'hello%25world'
decodeURIComponent(sanitizeEncoding('hello%world'));  // → 'hello%world'

Fix #3: Fall back to manual replacement

function safelyDecode(str) {
  try {
    return decodeURIComponent(str);
  } catch (e) {
    // Fallback: manually replace common patterns
    return str
      .replace(/%20/g, ' ')
      .replace(/%21/g, '!')
      .replace(/%40/g, '@')
      .replace(/%23/g, '#')
      .replace(/%25/g, '%');
    // Note: This is not comprehensive, just a fallback
  }
}
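
If rejecting the input is too strict and a hand-written replacement table is too narrow, a middle path is to decode only the well-formed parts and leave everything else untouched. The sketch below is one way to do that. `decodeLenient` is a hypothetical helper name, not a built-in, and it assumes the input is meant to be UTF-8.

```javascript
// Hypothetical helper: decode every run of well-formed %XX escapes that
// forms valid UTF-8, and leave anything else exactly as it was.
function decodeLenient(str) {
  // Match maximal runs of consecutive %XX escapes so multi-byte
  // characters such as %E4%B8%AD are decoded as one unit.
  return str.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch (e) {
      // The run is hex-valid but not valid UTF-8 (for example a lone %E9).
      return run;
    }
  });
}

console.log(decodeLenient('hello%20world'));  // hello world
console.log(decodeLenient('100%%20sure'));    // 100% sure
console.log(decodeLenient('data%ZZ%20ok'));   // data%ZZ ok
console.log(decodeLenient('caf%E9'));         // caf%E9
```

One limitation to keep in mind: a run of consecutive escapes is decoded as a whole or not at all, so a single bad byte leaves its neighbours in that run encoded too.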

Prevention

Always use proper encoding functions:

// ✅ Correct
const query = encodeURIComponent(userInput);
const url = `/search?q=${query}`;

// ❌ Wrong - manual URL building
const url = `/search?q=${userInput.replace(/ /g, '%20')}`;

Quick Test

// Test cases for validation
const testCases = [
  { input: 'hello%20world', valid: true },
  { input: 'hello%world', valid: false },
  { input: 'test%2', valid: false },
  { input: '%E4%B8%AD%E6%96%87', valid: true },
  { input: 'normal-text', valid: true },  // No encoding is valid too
];

testCases.forEach(({ input, valid }) => {
  const result = isValidEncoded(input);
  console.assert(result === valid, `Failed for: ${input}`);
});

Error #2: Character Encoding Mismatches

The Problem

Encoding a string in one character set (e.g., ISO-8859-1) and decoding it as another (UTF-8) produces gibberish or the replacement character �.

Symptoms:

Expected: café
Got: cafÃ©

Expected: 中文
Got: ���

Expected: Ñoño
Got: Ã‘oÃ±o
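
To see where a symptom like the first one comes from, you can reproduce it on purpose. This is a small sketch, not a diagnostic tool: `misreadUtf8` and `utf8Hex` are hypothetical names, and the wrong charset is assumed to be Windows-1252 (which is what browsers use for the `iso-8859-1` label).

```javascript
// Reproduce the "café" symptom: encode as UTF-8, then read the same
// bytes back with the wrong charset.
function misreadUtf8(text, wrongCharset = 'windows-1252') {
  const utf8Bytes = new TextEncoder().encode(text);
  return new TextDecoder(wrongCharset).decode(utf8Bytes);
}

// Show the raw bytes as hex so you can see what was actually sent.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0')
  ).join(' ');
}

console.log(utf8Hex('é'));         // C3 A9
console.log(misreadUtf8('café'));  // cafÃ©
```

The two bytes C3 A9 are one character in UTF-8 but two separate characters in a single-byte charset, which is why one accented letter turns into two symbols.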

What Happens

// A UTF-8 encoder turns é into the two bytes C3 A9:
const encoded = '%C3%A9';
decodeURIComponent(encoded);  // → 'é' (correct)

// A Latin-1 (ISO-8859-1) encoder produces the single byte E9 instead.
// decodeURIComponent() only understands UTF-8, and a lone E9 is not valid UTF-8:
const wrongEncoding = '%E9';
decodeURIComponent(wrongEncoding);  // URIError: URI malformed

The Root Cause

Legacy systems using non-UTF-8 encodings
Mixed encoding in different parts of the application
Database configured with wrong charset
HTTP headers specifying incorrect encoding

The Solution

Fix #1: Standardize on UTF-8 everywhere

<!-- In HTML -->
<meta charset="UTF-8">

<!-- In HTTP headers -->
Content-Type: text/html; charset=UTF-8

// In Express.js (the urlencoded parser decodes bodies as UTF-8 by default)
app.use(express.urlencoded({ extended: true }));

-- In MySQL
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Fix #2: Detect encoding mismatches

function looksLikeMojibake(str) {
  // Common patterns of UTF-8 interpreted as ISO-8859-1
  const suspiciousPatterns = [
    /Ã©|Ã¨|Ã\u00A0|Ã§/,  // Common in French (à shows up as Ã + non-breaking space)
    /Â£|Â¥|Â©/,      // Currency and symbols
    /ï¿½/,            // Replacement character
  ];
  
  return suspiciousPatterns.some(pattern => pattern.test(str));
}

// Usage
const decoded = decodeURIComponent(encoded);
if (looksLikeMojibake(decoded)) {
  console.warn('Possible encoding mismatch detected!');
}

Fix #3: Re-encode if necessary

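// If UTF-8 bytes were misread as Latin-1 (e.g. 'cafÃ©' instead of 'café'):
function fixLatin1ToUTF8(str) {
  // Each character of the garbled string stands for one original byte,
  // so map the characters back to bytes and decode those bytes as UTF-8.
  // Assumes every code point is <= 0xFF; use a library for anything messier.
  const bytes = Uint8Array.from(str, ch => ch.charCodeAt(0));
  const decoder = new TextDecoder('utf-8');

  return decoder.decode(bytes);
}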
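
When the percent escapes themselves carry non-UTF-8 bytes (for example `%E9` for é from a Latin-1 system), `decodeURIComponent()` cannot be used at all. One possible approach is to turn the escapes into raw bytes and decode those with the charset the sender actually used. `decodePercentWithCharset` is a hypothetical helper, and the Windows-1252 default is an assumption you should replace with the real source charset.

```javascript
// Hypothetical helper: percent-decode a string whose escapes are bytes in
// a non-UTF-8 charset. The default label is an assumption; pass whatever
// charset the sender actually used.
function decodePercentWithCharset(str, charset = 'windows-1252') {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.slice(i + 1, i + 3);
    if (str[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // a well-formed escape is one raw byte
      i += 2;
    } else {
      // Literal characters are assumed to be ASCII, as they should be in a URL.
      bytes.push(str.charCodeAt(i) & 0xff);
    }
  }
  return new TextDecoder(charset).decode(Uint8Array.from(bytes));
}

console.log(decodePercentWithCharset('caf%E9'));              // café
console.log(decodePercentWithCharset('caf%C3%A9', 'utf-8'));  // café
```

Malformed escapes such as `%ZZ` pass through unchanged here, so this can be combined with validation if you need to reject them.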

Prevention

Enforce UTF-8 at every layer:

Database: UTF-8 (or utf8mb4 for MySQL)
HTTP headers: Content-Type: text/html; charset=UTF-8
HTML: <meta charset="UTF-8">
Source files: Save as UTF-8
APIs: Accept and return UTF-8

Quick Test

// Test with international characters
const tests = [
  { text: 'café', lang: 'French' },
  { text: '中文', lang: 'Chinese' },
  { text: 'العربية', lang: 'Arabic' },
  { text: '😀', lang: 'Emoji' },
];

tests.forEach(({ text, lang }) => {
  const encoded = encodeURIComponent(text);
  const decoded = decodeURIComponent(encoded);
  console.assert(decoded === text, `${lang} encoding failed`);
});

Error #3: Incomplete Decoding (Multi-Layer Issues)

The Problem

URLs encoded multiple times need multiple decode operations. Stopping too early leaves percent sequences in the output.

Example:

Original:      Hello World
Encoded once:  Hello%20World  
Encoded twice: Hello%2520World
Encoded thrice: Hello%252520World

// If you only decode once:
decodeURIComponent('Hello%252520World')  // → 'Hello%2520World' (still encoded!)

What Happens

const doubleEncoded = 'search%253Dhello%2520world';

// Decode once
const once = decodeURIComponent(doubleEncoded);
console.log(once);  // 'search%3Dhello%20world' - still contains %3D and %20!

// Decode twice
const twice = decodeURIComponent(once);
console.log(twice);  // 'search=hello world' - correct!

The Root Cause

Multiple redirects each encoding the URL
Middleware chains that encode repeatedly
User copy-paste of already-encoded URLs
Framework auto-encoding on top of manual encoding

The Solution

Fix #1: Iterative decoding until stable

function fullyDecode(str) {
  let decoded = str;
  let previous = '';
  let iterations = 0;
  const MAX_ITERATIONS = 5;  // Safety limit
  
  while (decoded !== previous && iterations < MAX_ITERATIONS) {
    previous = decoded;
    try {
      const temp = decodeURIComponent(decoded);
      // Only continue if something actually changed
      if (temp !== decoded) {
        decoded = temp;
      } else {
        break;
      }
    } catch (e) {
      // Stop on error
      console.error('Decoding stopped due to error:', e);
      break;
    }
    iterations++;
  }
  
  console.log(`Decoded ${iterations} times`);
  return decoded;
}

// Usage
fullyDecode('Hello%252520World');  // → 'Hello World' (3 iterations)
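
Decoding until the string stops changing has a cost: it cannot tell an extra encoding layer apart from data that legitimately contains a percent sequence. When you know how many layers to expect (say, a redirect that always encodes twice), decoding exactly that many times is the safer choice. The sketch below shows the difference; `decodeLayers` is a hypothetical helper.

```javascript
// Hypothetical helper: decode exactly `layers` times instead of
// decoding until nothing changes.
function decodeLayers(str, layers) {
  let result = str;
  for (let i = 0; i < layers; i++) {
    result = decodeURIComponent(result); // throws URIError on malformed input
  }
  return result;
}

// A value that legitimately contains a percent sequence as text
const original = 'use %20 for spaces';
const encodedOnce = encodeURIComponent(original);

console.log(encodedOnce);                  // use%20%2520%20for%20spaces
console.log(decodeLayers(encodedOnce, 1)); // use %20 for spaces

// One decode too many silently corrupts the text
console.log(decodeLayers(encodedOnce, 2) === original); // false
```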

Fix #2: Count encoding layers

function countLayers(str) {
  let count = 0;
  let current = str;
  
  while (/%[0-9A-Fa-f]{2}/.test(current) && count < 10) {
    try {
      const decoded = decodeURIComponent(current);
      if (decoded === current) break;  // No change
      current = decoded;
      count++;
    } catch (e) {
      break;
    }
  }
  
  return count;
}

// Usage
console.log(countLayers('Hello%20World'));       // 1
console.log(countLayers('Hello%2520World'));     // 2
console.log(countLayers('Hello%252520World'));   // 3

Fix #3: Detect and warn

function decodeWithWarning(str) {
  const layers = countLayers(str);
  
  if (layers > 1) {
    console.warn(`Multi-layer encoding detected: ${layers} layers`);
  }
  
  return fullyDecode(str);
}

Prevention

Avoid double-encoding:

// ❌ Don't do this
const alreadyEncoded = encodeURIComponent(userInput);
const doubleEncoded = encodeURIComponent(alreadyEncoded);  // Wrong!

// ✅ Encode only once
const encoded = encodeURIComponent(userInput);

// ✅ Or check if already encoded
function encodeOnce(str) {
  // Heuristic: treat any %XX sequence as a sign the string is already encoded
  // (can misfire on raw text that happens to contain one)
  if (/%[0-9A-Fa-f]{2}/.test(str)) {
    return str;  // Already encoded
  }
  return encodeURIComponent(str);
}
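
A somewhat stricter heuristic is a round-trip check: treat the string as encoded only if it decodes cleanly and re-encodes to exactly the same text. This is still a guess, not a guarantee, and `looksEncoded` and `encodeOnceStrict` are hypothetical names used for illustration.

```javascript
// Hypothetical helper: guess whether `str` is already percent-encoded by
// checking that it decodes cleanly AND re-encodes to the same string.
function looksEncoded(str) {
  if (!/%[0-9A-Fa-f]{2}/.test(str)) return false; // no escapes at all
  try {
    const decoded = decodeURIComponent(str);
    // encodeURIComponent emits uppercase hex, so normalize before comparing.
    const normalized = str.replace(/%[0-9a-f]{2}/gi, (m) => m.toUpperCase());
    return encodeURIComponent(decoded) === normalized;
  } catch (e) {
    return false; // malformed escapes: treat as raw text
  }
}

function encodeOnceStrict(str) {
  return looksEncoded(str) ? str : encodeURIComponent(str);
}

console.log(encodeOnceStrict('Tom%20%26%20Jerry')); // Tom%20%26%20Jerry
console.log(encodeOnceStrict('a%20b c'));           // a%2520b%20c
console.log(encodeOnceStrict('100% sure'));         // 100%25%20sure
```

The check catches half-encoded input like the second example, but it misjudges strings that escape characters `encodeURIComponent()` leaves alone (such as `%21` for `!`). Tracking whether a value is encoded at the point where it enters your code remains more reliable than guessing afterwards.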

Quick Test

const multilayerTests = [
  { input: 'Hello%20World', layers: 1 },
  { input: 'Hello%2520World', layers: 2 },
  { input: '%25252525', layers: 4 },  // a literal % encoded 4 times
];

multilayerTests.forEach(({ input, layers }) => {
  const detected = countLayers(input);
  console.assert(detected === layers, `Failed: expected ${layers}, got ${detected}`);
});

Error #4: Reserved Character Confusion

The Problem

Not knowing which characters are reserved leads to incorrect encoding/decoding decisions.

Common mistakes:

Encoding ? in a query string  // Wrong! ? is the query delimiter
Not encoding & in a value      // Wrong! & separates parameters
Encoding / in a path           // Usually wrong! / is the path separator

What Happens

// Wrong: encoding the query delimiter
const wrongUrl = `/search${encodeURIComponent('?q=test')}`;
// → /search%3Fq%3Dtest (the ? is encoded!)

// Wrong: not encoding & in a value
const name = 'Tom & Jerry';
const badUrl = `/search?query=${name}`;
// → /search?query=Tom & Jerry
// Browser interprets as: query=Tom and a parameter named "Jerry"

// Correct:
const goodUrl = `/search?query=${encodeURIComponent(name)}`;
// → /search?query=Tom%20%26%20Jerry

The Root Cause

Confusion about URL structure
Wrong encoding function (encodeURI vs encodeURIComponent)
Manual URL building without understanding reserved characters

Reserved Characters in URLs

Character	Meaning	Encode in values?
`:`	Protocol/port separator	Yes (in values)
`/`	Path separator	No (in paths), Yes (in values)
`?`	Query string start	No (as delimiter), Yes (in values)
`#`	Fragment identifier	No (as delimiter), Yes (in values)
`&`	Parameter separator	No (as separator), Yes (in values)
`=`	Key-value separator	No (as separator), Yes (in values)
`@`	User info separator	Yes (usually)
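
You can check how the two built-in encoders treat each character in the table with a quick comparison. This is a minimal sketch; the variable names are only for illustration.

```javascript
// Compare how the two built-in encoders treat each reserved character
// from the table above.
const reserved = [':', '/', '?', '#', '&', '=', '@'];

const comparison = reserved.map((ch) => ({
  char: ch,
  encodeURI: encodeURI(ch),
  encodeURIComponent: encodeURIComponent(ch),
}));

console.table(comparison);
// encodeURI leaves all seven unchanged;
// encodeURIComponent yields %3A %2F %3F %23 %26 %3D %40
```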

The Solution

Fix #1: Use the right encoding function

// For encoding COMPLETE URLs
const fullUrl = 'https://example.com/path with spaces/file.html';
const encodedUrl = encodeURI(fullUrl);
// → 'https://example.com/path%20with%20spaces/file.html'
// Note: /, :, ? are NOT encoded

// For encoding URL COMPONENTS (query values, path segments)
const value = 'hello/world?test=value';
const encodedValue = encodeURIComponent(value);
// → 'hello%2Fworld%3Ftest%3Dvalue'
// Note: every reserved character is encoded; only A-Z a-z 0-9 - _ . ! ~ * ' ( ) are left as-is

Fix #2: Build URLs properly

// ❌ Wrong way
const search = 'hello & goodbye';
const brokenUrl = '/search?q=' + search;  // Breaks on &

// ✅ Right way - encode the value
const encodedUrl = '/search?q=' + encodeURIComponent(search);

// ✅ Better - use URL API
const url = new URL('/search', window.location.origin);
url.searchParams.set('q', search);  // Automatic encoding
console.log(url.href);
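
`searchParams` takes care of query values, but path segments have no equivalent setter, so each segment needs its own `encodeURIComponent()` call. A minimal sketch, assuming a hypothetical `buildPath` helper and `https://example.com` as a placeholder origin:

```javascript
// Hypothetical helper: join path segments, encoding each one on its own so
// a "/" inside a segment cannot be mistaken for a path separator.
function buildPath(...segments) {
  return '/' + segments.map((s) => encodeURIComponent(s)).join('/');
}

const url = new URL(
  buildPath('files', 'reports/2026', 'Q1 summary.pdf'),
  'https://example.com'
);
url.searchParams.set('q', 'hello & goodbye');

console.log(url.pathname); // /files/reports%2F2026/Q1%20summary.pdf
console.log(url.search);   // ?q=hello+%26+goodbye
```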

Fix #3: Parse URLs correctly

// ❌ Wrong - manual parsing
const query = window.location.search; // ?name=Tom%20%26%20Jerry
const rawValue = query.split('=')[1];  // 'Tom%20%26%20Jerry'
// If you forget to decode, you'll show the encoded version

// ✅ Right - use URL API
const params = new URLSearchParams(window.location.search);
const value = params.get('name');  // Automatically decoded: 'Tom & Jerry'

Prevention

Use URL utilities:

// Node.js or modern browsers
const { URL, URLSearchParams } = require('url');  // Node.js
// Or just use global URL and URLSearchParams in browsers

// Build URLs safely
const url = new URL('https://example.com/search');
url.searchParams.append('query', 'hello & goodbye');
url.searchParams.append('page', '1');
console.log(url.toString());
// → https://example.com/search?query=hello+%26+goodbye&page=1

Quick Test

const reservedCharTests = [
  { char: '&', desc: 'Ampersand' },
  { char: '=', desc: 'Equals' },
  { char: '?', desc: 'Question mark' },
  { char: '#', desc: 'Hash' },
  { char: '/', desc: 'Slash' },
];

reservedCharTests.forEach(({ char, desc }) => {
  const value = `before${char}after`;
  const encoded = encodeURIComponent(value);
  const decoded = decodeURIComponent(encoded);
  
  console.log(`${desc} (${char}):`);
  console.log(`  Original: ${value}`);
  console.log(`  Encoded:  ${encoded}`);
  console.log(`  Decoded:  ${decoded}`);
  console.assert(decoded === value, `${desc} failed roundtrip`);
});

Error #5: Using Wrong Decoding Functions/Methods

The Problem

Different languages and frameworks have different decoding functions. Using the wrong one produces incorrect results.

Common Mistakes

JavaScript:

// ❌ Wrong for query parameters
decodeURI('hello%20world%26test');  
// → 'hello world%26test' (%26 stays encoded)

// ✅ Correct
decodeURIComponent('hello%20world%26test');  
// → 'hello world&test'

Python:

# ❌ Wrong - quote() instead of unquote()
from urllib.parse import quote
result = quote('hello%20world')  
# → 'hello%2520world' (double encoded!)

# ✅ Correct
from urllib.parse import unquote
result = unquote('hello%20world')  
# → 'hello world'

PHP:

// Plus signs (+) represent spaces in form data
$encoded = 'hello+world';

// urldecode() treats + as a space: ✅ for form data, ❌ if + is literal
$result = urldecode($encoded);  
// → 'hello world'

// rawurldecode() keeps + as a literal plus: ✅ for path segments and RFC 3986 data
$result = rawurldecode($encoded);  
// → 'hello+world'

The Solution

Fix #1: Know your functions

JavaScript:

decodeURI() - for entire URLs (leaves escapes of reserved characters such as %26, %3D, %3F encoded)
decodeURIComponent() - for URL parts (decodes every percent escape)

Python:

urllib.parse.unquote() - standard decode
urllib.parse.unquote_plus() - decode + as space (for form data)

PHP:

urldecode() - decode + as space
rawurldecode() - don't decode +

Fix #2: Handle plus signs correctly

// If dealing with form-encoded data where + means space:
function decodeFormData(str) {
  return decodeURIComponent(str.replace(/\+/g, ' '));
}

// Usage
decodeFormData('hello+world');  // → 'hello world'
decodeURIComponent('hello+world');  // → 'hello+world' (+ not decoded)
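
If the data is a query string, `URLSearchParams` already applies the form-encoding rules for you, so the manual `+` replacement is often unnecessary. A short sketch with made-up parameter names:

```javascript
// URLSearchParams follows the form-encoding rules: "+" means space,
// and a literal plus must arrive as %2B.
const parsed = new URLSearchParams('q=hello+world&expr=1%2B1');

console.log(parsed.get('q'));    // hello world
console.log(parsed.get('expr')); // 1+1

// Serializing goes the other way: spaces become "+", "+" becomes %2B.
const built = new URLSearchParams({ q: 'hello world', expr: '1+1' });
console.log(built.toString());   // q=hello+world&expr=1%2B1
```

Because `get()` returns decoded text, running `decodeURIComponent()` on its result would decode a second time.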

Fix #3: Test your decode function

const testStrings = [
  'hello%20world',      // Space
  'hello+world',        // Plus
  'hello%2Bworld',      // Encoded plus
  'test%26value',       // Ampersand
  '%E4%B8%AD%E6%96%87',    // UTF-8
];

testStrings.forEach(str => {
  console.log(`Input:  ${str}`);
  console.log(`decodeURI:          ${decodeURI(str)}`);
  console.log(`decodeURIComponent: ${decodeURIComponent(str)}`);
  console.log('---');
});

Prevention

Create wrapper functions:

// Standardize decoding across your application
function safeDecodeParam(str) {
  if (!str) return '';
  
  try {
    // Replace + with space for form data, then decode
    return decodeURIComponent(str.replace(/\+/g, ' '));
  } catch (e) {
    console.error('Decoding error:', e);
    return str;  // Return original on error
  }
}

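// Use consistently on raw, still-encoded values
// (URLSearchParams.get() already returns decoded text, so don't decode that again)
const userQuery = safeDecodeParam('hello+world%21');  // → 'hello world!'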

Quick Test

// Test all decoding functions with same input
const testInput = 'hello%20world%26test';

console.log('Testing:', testInput);
console.log('decodeURI:         ', decodeURI(testInput));
console.log('decodeURIComponent:', decodeURIComponent(testInput));
console.log('Expected:           hello world&test');

Debugging Checklist

When you encounter URL decoding issues, use this checklist:

Valid encoding? Check for malformed percent sequences (%ZZ, %2)
Correct charset? Verify UTF-8 throughout the stack
Single or multi-layer? Count how many times it's encoded
Reserved characters? Ensure proper handling of &, =, ?, etc.
Right function? Using decodeURIComponent() vs decodeURI()?
Plus signs? Are they meant to be spaces or literal +?
Error handling? Wrapped in try-catch?
Sanitized? Validated and sanitized after decoding?
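
The first few checklist questions can be answered mechanically. The function below is a rough sketch of that idea, not a complete diagnostic: `diagnoseEncoded` and the fields of its report are hypothetical, and it expects the raw, still-encoded string.

```javascript
// Hypothetical helper that answers the first few checklist questions
// for a raw (still-encoded) string.
function diagnoseEncoded(raw) {
  const report = {
    // "%" not followed by two hex digits
    malformedEscapes: (raw.match(/%(?![0-9A-Fa-f]{2})/g) || []).length,
    // "+" may mean space (form data) or a literal plus
    hasPlus: raw.includes('+'),
    // false if any decoding pass throws (bad escape or invalid UTF-8)
    decodable: true,
    // number of successful decoding passes that changed the string
    layers: 0,
  };

  let current = raw;
  for (let i = 0; i < 10; i++) { // safety limit on passes
    let next;
    try {
      next = decodeURIComponent(current);
    } catch (e) {
      report.decodable = false;
      break;
    }
    if (next === current) break; // nothing left to decode
    current = next;
    report.layers++;
  }
  return report;
}

console.log(JSON.stringify(diagnoseEncoded('Hello%2520World')));
// {"malformedEscapes":0,"hasPlus":false,"decodable":true,"layers":2}
console.log(JSON.stringify(diagnoseEncoded('a+b%ZZ')));
// {"malformedEscapes":1,"hasPlus":true,"decodable":false,"layers":0}
```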

Tools for Debugging

Our URL Decoder: Free online tool with multi-layer detection
Browser DevTools: console.log(decodeURIComponent(str))
URL Parser: Visualize URL components
Hex viewers: See actual byte values

Summary

Error	Quick Fix	Prevention
#1 Incorrect format	Validate before decode	Use proper encoding functions
#2 Encoding mismatch	Standardize on UTF-8	UTF-8 everywhere
#3 Incomplete decode	Decode until stable	Avoid double-encoding
#4 Reserved chars	Use `encodeURIComponent()`	Use URL API
#5 Wrong function	Know your functions	Create wrappers

By understanding and fixing these 5 common errors, you'll handle URL decoding like a pro. Remember: validate inputs, decode carefully, and always test with edge cases!

Avoid these errors instantly with our free URL decoder tool that handles all edge cases automatically!