Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/gruns/furl

🌐 URL parsing and manipulation made easy.
https://github.com/gruns/furl

library manipulating-urls python python3 url url-manipulation url-parsing urls

Last synced: 1 day ago
JSON representation

🌐 URL parsing and manipulation made easy.

Awesome Lists containing this project

README

        


furl






## furl is a small Python library that makes parsing and
manipulating URLs easy.

Python's standard
[urllib](https://docs.python.org/3/library/urllib.html) and
[urlparse](https://docs.python.org/3/library/urllib.parse.html) modules
provide a number of URL related functions, but using these functions to
perform common URL operations proves tedious. Furl makes parsing and
manipulating URLs easy.

Furl is well tested, [Unlicensed](http://unlicense.org/) in the public
domain, and supports Python 3 and PyPy3.

πŸ‘₯ Furl is looking for a lead contributor and maintainer. Would you love
to lead furl, and making working with URLs a joy for everyone in Python?
Please [reach out]([email protected]) and let me know! πŸ™Œ

Code time: Paths and query arguments are easy. Really easy.

```python
>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f /= 'path'
>>> del f.args['one']
>>> f.args['three'] = '3'
>>> f.url
'http://www.google.com/path?two=2&three=3'
```

Or use furl's inline modification methods.

```python
>>> furl('http://www.google.com/?one=1').add({'two':'2'}).url
'http://www.google.com/?one=1&two=2'

>>> furl('http://www.google.com/?one=1&two=2').set({'three':'3'}).url
'http://www.google.com/?three=3'

>>> furl('http://www.google.com/?one=1&two=2').remove(['one']).url
'http://www.google.com/?two=2'
```

Encoding is handled for you. Unicode, too.

```python
>>> f = furl('http://www.google.com/')
>>> f.path = 'some encoding here'
>>> f.args['and some encoding'] = 'here, too'
>>> f.url
'http://www.google.com/some%20encoding%20here?and+some+encoding=here,+too'
>>> f.set(host=u'ドパむン.γƒ†γ‚Ήγƒˆ', path=u'Π΄ΠΆΠΊ', query=u'β˜ƒ=☺')
>>> f.url
'http://xn--eckwd4c7c.xn--zckzah/%D0%B4%D0%B6%D0%BA?%E2%98%83=%E2%98%BA'
```

Fragments also have a path and a query.

```python
>>> f = furl('http://www.google.com/')
>>> f.fragment.path.segments = ['two', 'directories']
>>> f.fragment.args = {'one': 'argument'}
>>> f.url
'http://www.google.com/#two/directories?one=argument'
```

## Installation

Installing furl with pip is easy.

```
$ pip install furl
```

## API

* [Basics](#basics)
* [Scheme, Username, Password, Host, Port, Network Location, and Origin](#scheme-username-password-host-port-network-location-and-origin)
* [Path](#path)
* [Manipulation](#manipulation)
* [Query](#query)
* [Manipulation](#manipulation-1)
* [Parameters](#parameters)
* [Fragment](#fragment)
* [Encoding](#encoding)
* [Inline manipulation](#inline-manipulation)
* [Miscellaneous](#miscellaneous)

### Basics

furl objects let you access and modify the various components of a URL.

```
scheme://username:password@host:port/path?query#fragment
```

* __scheme__ is the scheme string (all lowercase) or None. None means no
scheme. An empty string means a protocol relative URL, like
`//www.google.com`.
* __username__ is the username string for authentication.
* __password__ is the password string for authentication with __username__.
* __host__ is the domain name, IPv4, or IPv6 address as a string. Domain names
are all lowercase.
* __port__ is an integer or None. A value of None means no port specified and
the default port for the given __scheme__ should be inferred, if possible
(e.g. port 80 for the scheme `http`).
* __path__ is a Path object comprised of path segments.
* __query__ is a Query object comprised of key:value query arguments.
* __fragment__ is a Fragment object comprised of a Path object and Query object
separated by an optional `?` separator.

### Scheme, Username, Password, Host, Port, Network Location, and Origin

__scheme__, __username__, __password__, and __host__ are strings or
None. __port__ is an integer or None.

```python
>>> f = furl('http://user:[email protected]:99/')
>>> f.scheme, f.username, f.password, f.host, f.port
('http', 'user', 'pass', 'www.google.com', 99)
```

furl infers the default port for common schemes.

```python
>>> f = furl('https://secure.google.com/')
>>> f.port
443

>>> f = furl('unknown://www.google.com/')
>>> print(f.port)
None
```

__netloc__ is the string combination of __username__, __password__, __host__,
and __port__, not including __port__ if it's None or the default port for the
provided __scheme__.

```python
>>> furl('http://www.google.com/').netloc
'www.google.com'

>>> furl('http://www.google.com:99/').netloc
'www.google.com:99'

>>> furl('http://user:[email protected]:99/').netloc
'user:[email protected]:99'
```

__origin__ is the string combination of __scheme__, __host__, and __port__, not
including __port__ if it's None or the default port for the provided __scheme__.

```python
>>> furl('http://www.google.com/').origin
'http://www.google.com'

>>> furl('http://www.google.com:99/').origin
'http://www.google.com:99'
```

### Path

URL paths in furl are Path objects that have __segments__, a list of zero or
more path segments that can be manipulated directly. Path segments in
__segments__ are percent-decoded and all interaction with __segments__ should
take place with percent-decoded strings.

```python
>>> f = furl('http://www.google.com/a/large%20ish/path')
>>> f.path
Path('/a/large ish/path')
>>> f.path.segments
['a', 'large ish', 'path']
>>> str(f.path)
'/a/large%20ish/path'
```

#### Manipulation

```python
>>> f.path.segments = ['a', 'new', 'path', '']
>>> str(f.path)
'/a/new/path/'

>>> f.path = 'o/hi/there/with%20some%20encoding/'
>>> f.path.segments
['o', 'hi', 'there', 'with some encoding', '']
>>> str(f.path)
'/o/hi/there/with%20some%20encoding/'

>>> f.url
'http://www.google.com/o/hi/there/with%20some%20encoding/'

>>> f.path.segments = ['segments', 'are', 'maintained', 'decoded', '^`<>[]"#/?']
>>> str(f.path)
'/segments/are/maintained/decoded/%5E%60%3C%3E%5B%5D%22%23%2F%3F'
```

A path that starts with `/` is considered absolute, and a Path can be absolute
or not as specified (or set) by the boolean attribute __isabsolute__. URL Paths
have a special restriction: they must be absolute if a __netloc__ (username,
password, host, and/or port) is present. This restriction exists because a URL
path must start with `/` to separate itself from the __netloc__, if
present. Fragment Paths have no such limitation and __isabsolute__ and can be
True or False without restriction.

Here's a URL Path example that illustrates how __isabsolute__ becomes True and
read-only in the presence of a __netloc__.

```python
>>> f = furl('/url/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
>>> f.url
'url/path'
>>> f.host = 'blaps.ru'
>>> f.url
'blaps.ru/url/path'
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
Traceback (most recent call last):
...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). URL paths must be absolute if a netloc exists.
>>> f.url
'blaps.ru/url/path'
```

Conversely, the __isabsolute__ attribute of Fragment Paths isn't bound by the
same read-only restriction. URL fragments are always prefixed by a `#` character
and don't need to be separated from the __netloc__.

```python
>>> f = furl('http://www.google.com/#/absolute/fragment/path/')
>>> f.fragment.path.isabsolute
True
>>> f.fragment.path.isabsolute = False
>>> f.url
'http://www.google.com/#absolute/fragment/path/'
>>> f.fragment.path.isabsolute = True
>>> f.url
'http://www.google.com/#/absolute/fragment/path/'
```

A path that ends with `/` is considered a directory, and otherwise considered a
file. The Path attribute __isdir__ returns True if the path is a directory,
False otherwise. Conversely, the attribute __isfile__ returns True if the path
is a file, False otherwise.

```python
>>> f = furl('http://www.google.com/a/directory/')
>>> f.path.isdir
True
>>> f.path.isfile
False

>>> f = furl('http://www.google.com/a/file')
>>> f.path.isdir
False
>>> f.path.isfile
True
```

A path can be normalized with __normalize()__, and __normalize()__ returns the
Path object for method chaining.

```python
>>> f = furl('http://www.google.com////a/./b/lolsup/../c/')
>>> f.path.normalize()
>>> f.url
'http://www.google.com/a/b/c/'
```

Path segments can also be appended with the slash operator, like with
[pathlib.Path](https://docs.python.org/3/library/pathlib.html#operators).

```python
>>> from __future__ import division # For Python 2.x.
>>>
>>> f = furl('path')
>>> f.path /= 'with'
>>> f.path = f.path / 'more' / 'path segments/'
>>> f.url
'/path/with/more/path%20segments/'
```

For a dictionary representation of a path, use __asdict()__.

```python
>>> f = furl('http://www.google.com/some/enc%20oding')
>>> f.path.asdict()
{ 'encoded': '/some/enc%20oding',
'isabsolute': True,
'isdir': False,
'isfile': True,
'segments': ['some', 'enc oding'] }
```

### Query

URL queries in furl are Query objects that have __params__, a one dimensional
[ordered multivalue dictionary](https://github.com/gruns/orderedmultidict) of
query keys and values. Query keys and values in __params__ are percent-decoded
and all interaction with __params__ should take place with percent-decoded
strings.

```python
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query
Query('one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> str(f.query)
'one=1&two=2'
```

furl objects and Fragment objects (covered below) contain a Query object, and
__args__ is provided as a shortcut on these objects to access __query.params__.

```python
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> f.args
omdict1D([('one', '1'), ('two', '2')])
>>> f.args is f.query.params
True
```

#### Manipulation

__params__ is a one dimensional
[ordered multivalue dictionary](https://github.com/gruns/orderedmultidict) that
maintains method parity with Python's standard dictionary.

```python
>>> f.query = 'silicon=14&iron=26&inexorable%20progress=vae%20victus'
>>> f.query.params
omdict1D([('silicon', '14'), ('iron', '26'), ('inexorable progress', 'vae victus')])
>>> del f.args['inexorable progress']
>>> f.args['magnesium'] = '12'
>>> f.args
omdict1D([('silicon', '14'), ('iron', '26'), ('magnesium', '12')])
```

__params__ can also store multiple values for the same key because it's a
multivalue dictionary.

```python
>>> f = furl('http://www.google.com/?space=jams&space=slams')
>>> f.args['space']
'jams'
>>> f.args.getlist('space')
['jams', 'slams']
>>> f.args.addlist('repeated', ['1', '2', '3'])
>>> str(f.query)
'space=jams&space=slams&repeated=1&repeated=2&repeated=3'
>>> f.args.popvalue('space')
'slams'
>>> f.args.popvalue('repeated', '2')
'2'
>>> str(f.query)
'space=jams&repeated=1&repeated=3'
```

__params__ is one dimensional. If a list of values is provided as a query value,
that list is interpreted as multiple values.

```python
>>> f = furl()
>>> f.args['repeated'] = ['1', '2', '3']
>>> f.add(args={'space':['jams', 'slams']})
>>> str(f.query)
'repeated=1&repeated=2&repeated=3&space=jams&space=slams'
```

This makes sense: URL queries are inherently one dimensional -- query values
can't have native subvalues.

See the [orderedmultimdict](https://github.com/gruns/orderedmultidict)
documentation for more information on interacting with the ordered multivalue
dictionary __params__.

#### Parameters

To produce an empty query argument, like `http://sprop.su/?param=`, set the
argument's value to the empty string.

```python
>>> f = furl('http://sprop.su')
>>> f.args['param'] = ''
>>> f.url
'http://sprop.su/?param='
```

To produce an empty query argument without a trailing `=`, use `None` as the
parameter value.

```python
>>> f = furl('http://sprop.su')
>>> f.args['param'] = None
>>> f.url
'http://sprop.su/?param'
```

__encode(delimiter='&', quote_plus=True, dont_quote='')__ can be used to encode
query strings with delimiters like `;`, encode spaces as `+` instead of `%20`
(i.e. application/x-www-form-urlencoded encoded), or avoid percent-encoding
valid query characters entirely (valid query characters are
`/?:@-._~!$&'()*+,;=`).

```python
>>> f.query = 'space=jams&woofs=squeeze+dog'
>>> f.query.encode()
'space=jams&woofs=squeeze+dog'
>>> f.query.encode(';')
'space=jams;woofs=squeeze+dog'
>>> f.query.encode(quote_plus=False)
'space=jams&woofs=squeeze%20dog'
```

`dont_quote` accepts `True`, `False`, or a string of valid query characters to
not percent-enode. If `True`, all valid query characters `/?:@-._~!$&'()*+,;=`
aren't percent-encoded.

```python
>>> f.query = 'one,two/three'
>>> f.query.encode()
'one%2Ctwo%2Fthree'
>>> f.query.encode(dont_quote=True)
'one,two/three'
>>> f.query.encode(dont_quote=',')
'one,two%2Fthree'
```

For a dictionary representation of a query, use __asdict()__.

```python
>>> f = furl('http://www.google.com/?space=ja+ms&space=slams')
>>> f.query.asdict()
{ 'encoded': 'space=ja+ms&space=slams',
'params': [('space', 'ja ms'),
('space', 'slams')] }
```

### Fragment

URL fragments in furl are Fragment objects that have a Path __path__ and Query
__query__ separated by an optional `?` __separator__.

```python
>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> f.fragment
Fragment('/fragment/path?with=params')
>>> f.fragment.path
Path('/fragment/path')
>>> f.fragment.query
Query('with=params')
>>> f.fragment.separator
True
```

Manipulation of Fragments is done via the Fragment's Path and Query instances,
__path__ and __query__.

```python
>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.path.segments.append('file.ext')
>>> str(f.fragment)
'/fragment/path/file.ext?with=params'

>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.args['new'] = 'yep'
>>> str(f.fragment)
'/fragment/path?new=yep&with=params'
```

Creating hash-bang fragments with furl illustrates the use of Fragment's boolean
attribute __separator__. When __separator__ is False, the `?` that separates
__path__ and __query__ isn't included.

```python
>>> f = furl('http://www.google.com/')
>>> f.fragment.path = '!'
>>> f.fragment.args = {'a':'dict', 'of':'args'}
>>> f.fragment.separator
True
>>> str(f.fragment)
'!?a=dict&of=args'

>>> f.fragment.separator = False
>>> str(f.fragment)
'!a=dict&of=args'
>>> f.url
'http://www.google.com/#!a=dict&of=args'
```

For a dictionary representation of a fragment, use __asdict()__.

```python
>>> f = furl('http://www.google.com/#path?args=args')
>>> f.fragment.asdict()
{ 'encoded': 'path?args=args',
'separator': True,
'path': { 'encoded': 'path',
'isabsolute': False,
'isdir': False,
'isfile': True,
'segments': ['path']},
'query': { 'encoded': 'args=args',
'params': [('args', 'args')]} }
```

### Encoding

Furl handles encoding for you, and furl's philosophy on encoding is simple: raw
URL strings should always be percent-encoded.

```python
>>> f = furl()
>>> f.netloc = '%40user:%[email protected]'
>>> f.username, f.password
'@user', ':pass'

>>> f = furl()
>>> f.path = 'supply%20percent%20encoded/path%20strings'
>>> f.path.segments
['supply percent encoded', 'path strings']

>>> f.set(query='supply+percent+encoded=query+strings,+too')
>>> f.query.params
omdict1D([('supply percent encoded', 'query strings, too')])

>>> f.set(fragment='percent%20encoded%20path?and+percent+encoded=query+too')
>>> f.fragment.path.segments
['percent encoded path']
>>> f.fragment.args
omdict1D([('and percent encoded', 'query too')])
```

Raw, non-URL strings should never be percent-encoded.

```python
>>> f = furl('http://google.com')
>>> f.set(username='@prap', password=':porps')
>>> f.url
'http://%40prap:%[email protected]'

>>> f = furl()
>>> f.set(path=['path segments are', 'decoded', '<>[]"#'])
>>> str(f.path)
'/path%20segments%20are/decoded/%3C%3E%5B%5D%22%23'

>>> f.set(args={'query parameters':'and values', 'are':'decoded, too'})
>>> str(f.query)
'query+parameters=and+values&are=decoded,+too'

>>> f.fragment.path.segments = ['decoded', 'path segments']
>>> f.fragment.args = {'and decoded':'query parameters and values'}
>>> str(f.fragment)
'decoded/path%20segments?and+decoded=query+parameters+and+values'
```

Python's
[urllib.quote()](http://docs.python.org/library/urllib.html#urllib.quote) and
[urllib.unquote()](http://docs.python.org/library/urllib.html#urllib.unquote)
can be used to percent-encode and percent-decode path strings. Similarly,
[urllib.quote_plus()](http://docs.python.org/library/urllib.html#urllib.quote_plus)
and
[urllib.unquote_plus()](http://docs.python.org/library/urllib.html#urllib.unquote_plus)
can be used to percent-encode and percent-decode query strings.

### Inline manipulation

For quick, single-line URL manipulation, the __add()__, __set()__, and
__remove()__ methods of furl objects manipulate various URL components and
return the furl object for method chaining.

```python
>>> url = 'http://www.google.com/#fragment'
>>> furl(url).add(args={'example':'arg'}).set(port=99).remove(fragment=True).url
'http://www.google.com:99/?example=arg'
```

__add()__ adds items to a furl object with the optional arguments

* __args__: Shortcut for __query_params__.
* __path__: A list of path segments to add to the existing path segments, or a
path string to join with the existing path string.
* __query_params__: A dictionary of query keys and values to add to the query.
* __fragment_path__: A list of path segments to add to the existing fragment
path segments, or a path string to join with the existing fragment path
string.
* __fragment_args__: A dictionary of query keys and values to add to the
fragment's query.

```python
>>> f = furl('http://www.google.com/').add(
... path='/search', fragment_path='frag/path', fragment_args={'frag':'arg'})
>>> f.url
'http://www.google.com/search#frag/path?frag=args'
```

__set()__ sets items of a furl object with the optional arguments

* __args__: Shortcut for __query_params__.
* __path__: List of path segments or a path string to adopt.
* __scheme__: Scheme string to adopt.
* __netloc__: Network location string to adopt.
* __origin__: Origin string to adopt.
* __query__: Query string to adopt.
* __query_params__: A dictionary of query keys and values to adopt.
* __fragment__: Fragment string to adopt.
* __fragment_path__: A list of path segments to adopt for the fragment's path
or a path string to adopt as the fragment's path.
* __fragment_args__: A dictionary of query keys and values for the fragment's
query to adopt.
* __fragment_separator__: Boolean whether or not there should be a `?`
separator between the fragment path and the fragment query.
* __host__: Host string to adopt.
* __port__: Port number to adopt.
* __username__: Username string to adopt.
* __password__: password string to adopt.

```python
>>> f = furl().set(
... scheme='https', host='secure.google.com', port=99, path='index.html',
... args={'some':'args'}, fragment='great job')
>>> f.url
'https://secure.google.com:99/index.html?some=args#great%20job'
```

__remove()__ removes items from a furl object with the optional arguments

* __args__: Shortcut for __query_params__.
* __path__: A list of path segments to remove from the end of the existing path
segments list, or a path string to remove from the end of the existing
path string, or True to remove the entire path portion of the URL.
* __query__: A list of query keys to remove from the query, if they exist, or
True to remove the entire query portion of the URL.
* __query_params__: A list of query keys to remove from the query, if they
exist.
* __fragment__: If True, remove the entire fragment portion of the URL.
* __fragment_path__: A list of path segments to remove from the end of the
fragment's path segments, or a path string to remove from the end of the
fragment's path string, or True to remove the entire fragment path.
* __fragment_args__: A list of query keys to remove from the fragment's query,
if they exist.
* __username__: If True, remove the username, if it exists.
* __password__: If True, remove the password, if it exists.

```python
>>> url = 'https://secure.google.com:99/a/path/?some=args#great job'
>>> furl(url).remove(args=['some'], path='path/', fragment=True, port=True).url
'https://secure.google.com/a/'
```

### Miscellaneous

Like [pathlib.Path](https://docs.python.org/3/library/pathlib.html#operators),
path segments can be appended to a furl object's Path with the slash operator.

```python
>>> from __future__ import division # For Python 2.x.
>>> f = furl('http://www.google.com/path?example=arg#frag')
>>> f /= 'add'
>>> f = f / 'seg ments/'
>>> f.url
'http://www.google.com/path/add/seg%20ments/?example=arg#frag'
```

__tostr(query_delimiter='&', query_quote_plus=True, query_dont_quote='')__
creates and returns a URL string. `query_delimiter`, `query_quote_plus`, and
`query_dont_quote` are passed unmodified to `Query.encode()` as `delimiter`,
`quote_plus`, and `dont_quote` respectively.

```python
>>> f = furl('http://spep.ru/?a+b=c+d&two%20tap=cat%20nap%24')
>>> f.tostr()
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.tostr(query_delimiter=';', query_quote_plus=False)
'http://spep.ru/?a%20b=c%20d;two%20tap=cat%20nap$'
>>> f.tostr(query_dont_quote='$')
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
```

`furl.url` is a shortcut for `furl.tostr()`.

```python
>>> f.url
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.url == f.tostr() == str(f)
True
```

__copy()__ creates and returns a new furl object with an identical URL.

```python
>>> f = furl('http://www.google.com')
>>> f.copy().set(path='/new/path').url
'http://www.google.com/new/path'
>>> f.url
'http://www.google.com'
```

__join()__ joins the furl object's URL with the provided relative or absolute
URL and returns the furl object for method chaining. __join()__'s action is the
same as navigating to the provided URL from the current URL in a web browser.

```python
>>> f = furl('http://www.google.com')
>>> f.join('new/path').url
'http://www.google.com/new/path'
>>> f.join('replaced').url
'http://www.google.com/new/replaced'
>>> f.join('../parent').url
'http://www.google.com/parent'
>>> f.join('path?query=yes#fragment').url
'http://www.google.com/path?query=yes#fragment'
>>> f.join('unknown://www.yahoo.com/new/url/').url
'unknown://www.yahoo.com/new/url/'
```

For a dictionary representation of a URL, use __asdict()__.

```python
>>> f = furl('https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag')
>>> f.asdict()
{ 'url': 'https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag',
'scheme': 'https',
'username': None
'password': None,
'host': 'ドパむン.γƒ†γ‚Ήγƒˆ',
'host_encoded': 'xn--eckwd4c7c.xn--zckzah',
'port': 443,
'netloc': 'xn--eckwd4c7c.xn--zckzah',
'origin': 'https://xn--eckwd4c7c.xn--zckzah',
'path': { 'encoded': '/path',
'isabsolute': True,
'isdir': False,
'isfile': True,
'segments': ['path']},
'query': { 'encoded': 'args=args',
'params': [('args', 'args')]},
'fragment': { 'encoded': 'frag',
'path': { 'encoded': 'frag',
'isabsolute': False,
'isdir': False,
'isfile': True,
'segments': ['frag']},
'query': { 'encoded': '',
'params': []},
'separator': True} }
```