Have a look at this module: werkzeug.utils. (now in werkzeug.urls
)
The function you are looking for is called "url_fix" and works like this:
>>> from werkzeug.urls import url_fix>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'
It's implemented in Werkzeug as follows:
import urllibimport urlparsedef url_fix(s, charset='utf-8'):"""Sometimes you get an URL by a user that just isn't a real URL because it contains unsafe characters like '' and so on. This function can fix some of the problems in a similar way browsers handle data entered by the user:>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29' :param charset: The target charset for the URL if the url was given as unicode string.""" if isinstance(s, unicode): s = s.encode(charset, 'ignore') scheme, netloc, path, qs, anchor = urlparse.urlsplit(s) path = urllib.quote(path, '/%') qs = urllib.quote_plus(qs, ':&=') return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))