Quantcast
Channel: User Armin Ronacher - Stack Overflow
Viewing all articles
Browse latest Browse all 43

Answer by Armin Ronacher for How can I normalize a URL in python

$
0
0

Have a look at this module: werkzeug.utils. (now in werkzeug.urls)

The function you are looking for is called "url_fix" and works like this:

>>> from werkzeug.urls import url_fix>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

It's implemented in Werkzeug as follows:

import urllibimport urlparsedef url_fix(s, charset='utf-8'):"""Sometimes you get an URL by a user that just isn't a real    URL because it contains unsafe characters like '' and so on.  This    function can fix some of the problems in a similar way browsers    handle data entered by the user:>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'    :param charset: The target charset for the URL if the url was                    given as unicode string."""    if isinstance(s, unicode):        s = s.encode(charset, 'ignore')    scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)    path = urllib.quote(path, '/%')    qs = urllib.quote_plus(qs, ':&=')    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))

Viewing all articles
Browse latest Browse all 43

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>