URL Decoding Best Practices: Avoiding Common Pitfalls

January 20, 2026

URL decoding seems simple: convert %20 back to a space, right? But beneath this simplicity lies a minefield of edge cases, security vulnerabilities, and encoding problems that can break your application. This guide covers the practices that separate professional developers from those debugging production issues at 3 AM.

Understanding Percent-Encoding Standards

The RFC 3986 Foundation

URL encoding follows RFC 3986, the standard that defines how URLs should be structured and encoded. Understanding this specification is crucial.

Key principles:

- Unreserved characters never need encoding:
  - Letters: A-Z, a-z
  - Digits: 0-9
  - Symbols: -, _, ., ~
- Reserved characters act as delimiters and must be encoded when they appear as data: : / ? # [ ] @ ! $ & ' ( ) * + , ; =
- All other characters must be percent-encoded, including spaces and non-ASCII (international) characters.
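
The three classes above can be sketched as a small classifier. The helper name `classifyChar` is made up for illustration; the character sets are the ones RFC 3986 lists as unreserved and reserved.

```javascript
// Character classes from RFC 3986 (sections 2.2 and 2.3).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// Hypothetical helper: classify a single character.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved'; // never needs encoding
  if (RESERVED.test(ch)) return 'reserved';     // encode when used as data
  return 'other';                               // always percent-encoded
}

console.log(classifyChar('~')); // unreserved
console.log(classifyChar('&')); // reserved
console.log(classifyChar(' ')); // other
```

Note that JavaScript's `encodeURIComponent` leaves `! ' ( ) *` unescaped even though RFC 3986 lists them as reserved, so a decoder should expect to see them in either form.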

The Encoding Format

Percent-encoding follows this pattern:

%XX

Where XX is the hexadecimal representation of the byte value.

Example breakdown:

Character: @
ASCII code: 64 (decimal)
Hexadecimal: 40
Encoded: %40

For multi-byte UTF-8 characters:

Character: 中 (Chinese)
UTF-8 bytes: E4 B8 AD
Encoded: %E4%B8%AD
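
To make the byte-to-%XX mapping concrete, here is a minimal sketch that percent-encodes every UTF-8 byte of a string. `percentEncodeAllBytes` is a hypothetical helper written for this illustration; real encoders such as `encodeURIComponent` skip bytes that do not need escaping.

```javascript
// Hypothetical helper: escape every UTF-8 byte, even ones that are safe as-is.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeAllBytes('@'));  // %40
console.log(percentEncodeAllBytes('中')); // %E4%B8%AD
```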

UTF-8 Handling and International Characters

Why UTF-8 Matters

Modern web applications must handle text in any language. UTF-8 is the universal encoding that makes this possible.

Best Practice #1: Always assume UTF-8

// ✅ Correct - decoders assume UTF-8 by default
const decoded = decodeURIComponent('%E4%B8%AD%E6%96%87');
console.log(decoded);  // "中文"

// ❌ Wrong - trying to use different encodings
// JavaScript's built-in functions only handle UTF-8

Common International Character Scenarios

Chinese/Japanese/Korean Characters:

Encoded: %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95
Decoded: 中文测试
Bytes: 12 (4 characters × 3 bytes each in UTF-8)

Arabic Text (right-to-left):

Encoded: %D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9
Decoded: العربية

Emoji (4-byte UTF-8):

Encoded: %F0%9F%98%80
Decoded: 😀
Bytes: 4

Best Practice #2: Test with multi-byte characters

Always test your URL decoding with:

- Chinese, Japanese, Korean (CJK) characters
- Arabic and Hebrew (RTL text)
- Emoji and special Unicode symbols
- Accented characters (café, naïve)
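
One simple way to exercise the categories above is a round-trip check: encode each sample, decode it, and compare with the original. The sample strings and the `roundTrips` helper below are illustrative choices, not a complete test suite.

```javascript
// Illustrative samples covering CJK, RTL, emoji, and accented text.
const samples = ['中文测试', '日本語', '한국어', 'العربية', 'עברית', '😀', 'café', 'naïve'];

// Hypothetical helper: true when encode followed by decode returns the input.
function roundTrips(text) {
  return decodeURIComponent(encodeURIComponent(text)) === text;
}

const failures = samples.filter((s) => !roundTrips(s));
console.log(failures.length === 0 ? 'all samples round-trip' : failures);
```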

Handling Encoding Errors

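function safeDecodeURIComponent(str) {
  try {
    return decodeURIComponent(str);
  } catch (e) {
    // decodeURIComponent throws URIError on malformed sequences
    console.error('Invalid URI encoding:', str);

    // Option 1: Return original string
    return str;

    // Option 2 (use instead of Option 1): escape stray '%' signs, then retry.
    // The retry can still throw if the escaped bytes are not valid UTF-8.
    // return decodeURIComponent(str.replace(/%(?![0-9A-Fa-f]{2})/g, '%25'));
  }
}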

// Usage
const result = safeDecodeURIComponent('hello%world');  // Invalid!
// Returns 'hello%world' instead of throwing error
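
A middle ground between failing outright and returning the input untouched is to decode only the well-formed parts. The sketch below is one possible approach, with `decodeLenient` as a hypothetical name: it decodes each run of consecutive %XX escapes on its own and leaves a run alone if it is not valid UTF-8.

```javascript
// Hypothetical helper: decode what can be decoded, keep the rest verbatim.
function decodeLenient(str) {
  // Match runs of consecutive, syntactically valid %XX escapes.
  return str.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch (e) {
      // The run is not valid UTF-8 (for example a truncated multi-byte character).
      return run;
    }
  });
}

console.log(decodeLenient('hello%world%20ok')); // hello%world ok
console.log(decodeLenient('100%25%ZZ'));        // 100%%ZZ
```

Whether lenient decoding is appropriate depends on the context; for security-sensitive input, rejecting malformed data is usually the safer choice.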

Multi-Layer Decoding Scenarios

Understanding Double Encoding

URLs can get encoded multiple times as they pass through different systems:

Original:       Hello World
1st encoding:   Hello%20World
2nd encoding:   Hello%2520World
3rd encoding:   Hello%252520World

Notice how the % itself gets encoded as %25 with each pass.

Why This Happens

- Web frameworks: Some frameworks auto-encode query parameters
- Proxies and load balancers: May re-encode URLs
- Copy-paste errors: Users copying already-encoded URLs
- Nested redirects: OAuth flows with encoded callback URLs
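
The layering is easy to reproduce by applying `encodeURIComponent` repeatedly, which is effectively what happens when several of the systems above each encode the same value. `encodeTimes` is a hypothetical helper for this demonstration.

```javascript
// Hypothetical helper: apply encodeURIComponent a given number of times.
function encodeTimes(text, layers) {
  let out = text;
  for (let i = 0; i < layers; i++) {
    out = encodeURIComponent(out);
  }
  return out;
}

console.log(encodeTimes('Hello World', 1)); // Hello%20World
console.log(encodeTimes('Hello World', 2)); // Hello%2520World
console.log(encodeTimes('Hello World', 3)); // Hello%252520World
```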

Detecting Multi-Layer Encoding

function countEncodingLayers(str) {
  let count = 0;
  let current = str;
  let previous = '';
  
  while (current !== previous && count < 10) {  // Max 10 to prevent infinite loops
    previous = current;
    try {
      current = decodeURIComponent(current);
      if (current !== previous) {
        count++;
      }
    } catch (e) {
      break;  // Malformed encoding
    }
  }
  
  return count;
}

// Examples
countEncodingLayers('Hello%20World');       // 1
countEncodingLayers('Hello%2520World');     // 2
countEncodingLayers('Hello%252520World');   // 3

The Idempotent Decoding Pattern

Best Practice #3: Decode until stable

function fullyDecode(str) {
  let decoded = str;
  let previous = '';
  let iterations = 0;
  const MAX_ITERATIONS = 10;  // Safety limit
  
  while (decoded !== previous && iterations < MAX_ITERATIONS) {
    previous = decoded;
    try {
      decoded = decodeURIComponent(decoded);
    } catch (e) {
      break;  // Stop on malformed encoding
    }
    iterations++;
  }
  
  return decoded;
}

// Usage
fullyDecode('Hello%252520World');  // → 'Hello World' (decodes 3 times)

⚠️ Warning: This approach assumes every encoding layer was added by accident. If the original data legitimately contained a literal sequence such as %20, that sequence is decoded too, and the result no longer matches what the user sent.
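
The warning can be reproduced with text that legitimately contains a percent sequence. In this sketch the sample sentence is made up, and `fullyDecode` is repeated from above so the snippet runs on its own.

```javascript
// Same decode-until-stable loop as above, repeated so this snippet is self-contained.
function fullyDecode(str) {
  let decoded = str;
  let previous = '';
  let iterations = 0;
  while (decoded !== previous && iterations < 10) {
    previous = decoded;
    try {
      decoded = decodeURIComponent(decoded);
    } catch (e) {
      break;
    }
    iterations++;
  }
  return decoded;
}

const typed = 'Use %20 for spaces';            // the user really typed "%20"
const encodedOnce = encodeURIComponent(typed); // Use%20%2520%20for%20spaces

const once = decodeURIComponent(encodedOnce);  // matches what the user typed
const stable = fullyDecode(encodedOnce);       // the literal %20 became a space

console.log(once === typed);   // true
console.log(stable === typed); // false
```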

When NOT to Fully Decode

// Example: A URL parameter that contains another encoded URL
const url = '/redirect?target=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello';

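// Decode once to get the redirect target.
// Slice after the first '=' rather than split('='), which would truncate a value containing '='.
const target = decodeURIComponent(url.slice(url.indexOf('=') + 1));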
// → 'https://example.com/search?q=hello'

// If you fully decode, you'd decode the nested query too (usually wrong!)

Best Practice #4: Know your context

Only fully decode when you're certain the string has been accidentally multi-encoded. In most cases, one decode is correct.
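
To see why one decode per layer matters, consider a nested URL whose own query value contains `&` and `=`. The sketch below uses the standard `URL` API; the base `https://app.example` and the sample values are assumptions made for illustration.

```javascript
// Build a nested URL: the inner query value itself contains '&' and '='.
const inner = 'https://example.com/search?q=' + encodeURIComponent('a&b=c');
const outer = '/redirect?target=' + encodeURIComponent(inner);

// One decode per layer: searchParams.get() decodes exactly once.
const target = new URL(outer, 'https://app.example').searchParams.get('target');
const q = new URL(target).searchParams.get('q');

// Over-decoding the target turns data into delimiters and changes the meaning.
const overDecoded = decodeURIComponent(target);
const brokenQ = new URL(overDecoded).searchParams.get('q');

console.log(q);       // a&b=c
console.log(brokenQ); // a
```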

Security Considerations

1. Preventing Injection Attacks

Decoded URLs can contain malicious payloads:

XSS (Cross-Site Scripting):

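// Dangerous!
// Note: this assumes `params` holds raw, still-encoded values.
// URLSearchParams.get() already decodes, so never decode its result a second time.
const userInput = decodeURIComponent(params.get('message'));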
element.innerHTML = userInput;  // ❌ Can inject scripts!

// Encoded attack:
// %3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E
// Decodes to: <script>alert('XSS')</script>

Best Practice #5: Always sanitize after decoding

// Safe approach
const userInput = decodeURIComponent(params.get('message'));

// Option 1: Use textContent (not innerHTML)
element.textContent = userInput;  // ✅ Safe - treats as text

// Option 2: Use a sanitization library
import DOMPurify from 'dompurify';
element.innerHTML = DOMPurify.sanitize(userInput);  // ✅ Safe
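
When the decoded value has to be placed into an HTML string (for example in server-side rendering, where `textContent` is not available), escaping is another option. The `escapeHtml` helper below is a minimal sketch, suitable only for HTML text content and quoted attribute values, not for script, style, or URL contexts.

```javascript
// Hypothetical helper: escape the five characters that matter in HTML text
// and quoted attribute values. '&' must be replaced first.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const payload = decodeURIComponent('%3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E');
console.log(escapeHtml(payload));
// &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;
```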

2. Path Traversal Attacks

// Dangerous!
const filename = decodeURIComponent(params.get('file'));
fs.readFile(`/uploads/${filename}`, ...);  // ❌ Vulnerable!

// Attack:
// file=..%2F..%2Fetc%2Fpasswd
// Decodes to: ../../etc/passwd

Best Practice #6: Validate paths after decoding

const filename = decodeURIComponent(params.get('file'));

// Validate: only allow safe characters
if (!/^[a-zA-Z0-9_-]+\.[a-z]{2,4}$/i.test(filename)) {
  throw new Error('Invalid filename');
}

// Or use path.basename to strip directory parts
const path = require('path');
const safeFile = path.basename(filename);  // Removes ../ parts
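
A complementary check is to resolve the final path and confirm it still sits inside the intended directory. This is a sketch for Node.js; `resolveInside` and the `/uploads` base directory are hypothetical, and it does not account for symbolic links.

```javascript
const path = require('path');

// Hypothetical helper: decode once, resolve, and reject anything outside baseDir.
function resolveInside(baseDir, encodedName) {
  const name = decodeURIComponent(encodedName);
  if (name.includes('\0')) {
    throw new Error('Invalid filename'); // null bytes are never legitimate here
  }
  const base = path.resolve(baseDir);
  const full = path.resolve(base, name);
  // The resolved path must be strictly below the base directory.
  if (!full.startsWith(base + path.sep)) {
    throw new Error('Path escapes base directory');
  }
  return full;
}

console.log(resolveInside('/uploads', 'report.pdf'));
try {
  resolveInside('/uploads', '..%2F..%2Fetc%2Fpasswd');
} catch (e) {
  console.log(e.message); // Path escapes base directory
}
```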

3. SQL Injection

Even after decoding, never trust user input in SQL:

const search = decodeURIComponent(params.get('query'));

// ❌ Dangerous - SQL injection
db.query(`SELECT * FROM products WHERE name = '${search}'`);

// Attack:
// query=%27%20OR%20%271%27%3D%271
// Decodes to: ' OR '1'='1

Best Practice #7: Use parameterized queries

// ✅ Safe - parameterized query
db.query('SELECT * FROM products WHERE name = ?', [search]);

// Or with named parameters
db.query('SELECT * FROM products WHERE name = :search', { search });

4. URL Redirection Attacks

Open redirect vulnerabilities can phish users:

// Dangerous!
const redirectUrl = decodeURIComponent(params.get('next'));
window.location = redirectUrl;  // ❌ Can redirect anywhere!

// Attack:
// next=https%3A%2F%2Fevil.com%2Fphishing

Best Practice #8: Whitelist redirect destinations

const redirectUrl = decodeURIComponent(params.get('next'));

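// Option 1: Whitelist of allowed domains
const allowedDomains = ['example.com', 'app.example.com'];
const url = new URL(redirectUrl, window.location.origin);

if (url.protocol === 'https:' && allowedDomains.includes(url.hostname)) {
  window.location = url.href;  // ✅ Safe - navigate to the URL that was validated
} else {
  throw new Error('Invalid redirect destination');
}

// Option 2: Only allow same-origin URLs
// Resolving first catches tricks such as '//evil.com' and '/\evil.com',
// which a plain startsWith('/') check would let through
const resolved = new URL(redirectUrl, window.location.origin);
if (resolved.origin === window.location.origin) {
  window.location = resolved.href;  // ✅ Safe - same origin
}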
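
The redirect check can also be packaged as a pure function, which makes it easy to unit test outside the browser. This is a sketch: `safeRedirectTarget`, the origin, and the host list are assumptions, and `next` is expected to be already decoded exactly once.

```javascript
// Hypothetical helper: return a validated absolute URL, or null if not allowed.
function safeRedirectTarget(next, origin, allowedHosts) {
  let url;
  try {
    url = new URL(next, origin); // resolves relative targets against our origin
  } catch (e) {
    return null; // not parseable as a URL
  }
  if (url.protocol !== 'https:') return null;            // blocks javascript:, data:, http:
  if (!allowedHosts.includes(url.hostname)) return null; // blocks foreign hosts
  return url.href; // use the parsed form, not the raw input
}

const origin = 'https://app.example.com';
const hosts = ['example.com', 'app.example.com'];

console.log(safeRedirectTarget('/dashboard?tab=1', origin, hosts));
console.log(safeRedirectTarget('https://evil.com/phishing', origin, hosts)); // null
console.log(safeRedirectTarget('//evil.com', origin, hosts));                // null
```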

Performance Considerations

Decoding Large Strings

URL decoding is generally fast, but with very large strings (e.g., Base64-encoded data in URLs), performance matters.

Best Practice #9: Avoid large data in URLs

// ❌ Bad - large data in URL
const largeData = encodeURIComponent(JSON.stringify(bigObject));
window.location = `/api/process?data=${largeData}`;

// ✅ Better - use POST body
fetch('/api/process', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(bigObject)
});

Caching Decoded Values

If you're decoding the same parameter multiple times:

// ❌ Inefficient - decoding repeatedly
function getUser() {
  return decodeURIComponent(params.get('user'));
}

console.log(getUser());
console.log(getUser());
console.log(getUser());

// ✅ Better - decode once, cache result
const cachedUser = decodeURIComponent(params.get('user'));

console.log(cachedUser);
console.log(cachedUser);
console.log(cachedUser);

Lazy Decoding

For query strings with many parameters you might not use:

// ✅ Good - read only what you need
const params = new URLSearchParams(window.location.search);

if (needsUserInfo) {
  const user = params.get('user');  // Already decoded by URLSearchParams - don't decode again
}
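
Because `URLSearchParams` hands back decoded values, running `decodeURIComponent` on them is a second decode. The sketch below, using a made-up query string, shows how that breaks on a value that contains a literal percent sign.

```javascript
// A made-up query string: note=50%25%20off encodes the text "50% off".
const params = new URLSearchParams('note=50%25%20off&user=Ada%20L');

const note = params.get('note'); // already decoded

// Decoding a second time fails because "% o" is not a valid escape.
let secondDecode;
try {
  secondDecode = decodeURIComponent(note);
} catch (e) {
  secondDecode = e.name;
}

console.log(note);         // 50% off
console.log(secondDecode); // URIError
```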

Testing and Validation Strategies

Comprehensive Test Cases

Best Practice #10: Test these edge cases

const testCases = [
  // Basic cases
  { input: 'hello%20world', expected: 'hello world' },
  { input: 'hello+world', expected: 'hello+world' },  // + not decoded by decodeURIComponent
  
  // Special characters
  { input: '%21%40%23%24%25', expected: '!@#$%' },
  
  // International text
  { input: '%E4%B8%AD%E6%96%87', expected: '中文' },
  { input: '%F0%9F%98%80', expected: '😀' },
  
  // Multi-layer encoding
  { input: 'hello%2520world', expected: 'hello%20world' },  // Decode once
  
  // Already decoded
  { input: 'hello world', expected: 'hello world' },
  
  // Empty string
  { input: '', expected: '' },
  
  // Malformed encoding (should error or handle gracefully)
  { input: 'hello%2', shouldError: true },
  { input: 'hello%ZZ', shouldError: true },
];

testCases.forEach(({ input, expected, shouldError }) => {
  try {
    const result = decodeURIComponent(input);
    if (shouldError) {
      console.error(`Expected error for: ${input}`);
    } else {
      console.assert(result === expected, `Failed: ${input}`);
    }
  } catch (e) {
    if (!shouldError) {
      console.error(`Unexpected error for: ${input}`);
    }
  }
});
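
The `hello+world` case above deserves a closer look: in form-encoded query strings a `+` stands for a space, but `decodeURIComponent` leaves it alone. A minimal sketch of form-style decoding follows; `decodeFormComponent` is a hypothetical helper, while `URLSearchParams` is the standard API that applies the same rule.

```javascript
// Hypothetical helper: decode a single application/x-www-form-urlencoded value.
// Replace '+' before percent-decoding so that an encoded plus (%2B) survives.
function decodeFormComponent(text) {
  return decodeURIComponent(text.replace(/\+/g, ' '));
}

console.log(decodeURIComponent('hello+world'));                 // hello+world
console.log(decodeFormComponent('hello+world'));                // hello world
console.log(decodeFormComponent('a%2Bb'));                      // a+b
console.log(new URLSearchParams('q=hello+world').get('q'));     // hello world
```

Which rule applies depends on where the string came from: query strings produced by HTML forms use the `+` convention, while path segments do not.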

Validation Functions

// Validate that a string is properly percent-encoded
function isValidPercentEncoded(str) {
  // Check for invalid percent sequences
  const invalidPattern = /%(?![0-9A-Fa-f]{2})/;
  if (invalidPattern.test(str)) {
    return false;
  }
  
  // Try to decode - if it throws, it's invalid
  try {
    decodeURIComponent(str);
    return true;
  } catch (e) {
    return false;
  }
}

// Check if a string needs decoding
function needsDecoding(str) {
  return /%[0-9A-Fa-f]{2}/.test(str);
}

// Usage
if (needsDecoding(userInput) && isValidPercentEncoded(userInput)) {
  const decoded = decodeURIComponent(userInput);
}

Best Practices Summary

#	Best Practice	Why It Matters
1	Always assume UTF-8	Modern web is international
2	Test with multi-byte characters	Catches encoding bugs early
3	Decode until stable (carefully)	Handles accidental multi-encoding
4	Know your decoding context	Prevents over-decoding
5	Always sanitize after decoding	Prevents XSS attacks
6	Validate paths after decoding	Prevents path traversal
7	Use parameterized queries	Prevents SQL injection
8	Whitelist redirect destinations	Prevents open redirects
9	Avoid large data in URLs	Better performance
10	Test edge cases thoroughly	Robust applications

Debugging Tools and Techniques

Visual Inspection

Use our URL Decoder tool to quickly inspect encoded strings:

Input:  %E4%B8%AD%E6%96%87%20test%20%21
Output: 中文 test !

Browser DevTools

// In browser console
const url = new URL(window.location.href);
console.table([...url.searchParams]);  // Shows all params decoded

// Or inspect individual parameters
url.searchParams.forEach((value, key) => {
  console.log(`${key}: ${value}`);
});

Logging Middleware

For Express.js:

app.use((req, res, next) => {
  console.log('Query params (decoded):', req.query);
  console.log('Raw query string:', req.url.split('?')[1]);
  next();
});

Common Anti-Patterns to Avoid

❌ Anti-Pattern 1: Manual Percent Decoding

// ❌ Don't do this!
function manualDecode(str) {
  return str.replace(/%20/g, ' ')
            .replace(/%21/g, '!')
            .replace(/%40/g, '@');
  // ... you'll never cover all cases
}

// ✅ Use built-in functions
const decoded = decodeURIComponent(str);

❌ Anti-Pattern 2: Validating Before Decoding

// ❌ Wrong order - the check runs on the encoded form
if (userInput.includes('admin')) {
  throw new Error('Forbidden');
}
const decoded = decodeURIComponent(userInput);  // '%61dmin' slipped past the check

// ✅ Correct order - decode once, validate the decoded value, then use that same value
const value = decodeURIComponent(userInput);
if (value.includes('admin')) {
  throw new Error('Forbidden');
}

❌ Anti-Pattern 3: Ignoring Errors

// ❌ Silent failure
let result;
try {
  result = decodeURIComponent(input);
} catch (e) {
  result = input;  // Silently returns potentially dangerous input
}

// ✅ Proper error handling
try {
  result = decodeURIComponent(input);
} catch (e) {
  console.error('Invalid URL encoding:', e);
  throw new Error('Invalid input encoding');
}

Conclusion

URL decoding is more than just reversing percent-encoding. Professional developers:

- Understand UTF-8 and handle international text properly
- Recognize and handle multi-layer encoding scenarios
- Prioritize security through validation and sanitization
- Test thoroughly with edge cases
- Use the right tools for debugging

By following these best practices, you'll build robust applications that handle URLs correctly and securely, avoiding the common pitfalls that plague poorly-designed systems.
