Mercurial > kallithea
changeset 8888:f0fbb0fe4462
git: update check for invalid URL characters to work with Python versions that include an attempt at fixing the very same problem
With changes like
https://github.com/python/cpython/commit/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4
making it to Python 3.10 and being backported to previous Python versions, the
approach in a8a51a3bdb61 no longer works when combined with
urllib.parse.urlparse in d2f59de17bef: path will never contain the invalid
characters.
To catch this case anyway, add a new check to verify that the parsed URL can
roundtrip back to the original representation with urllib.parse.urlunparse.
The actual exception might vary, but one of them should always fire.
There is a risk that the new check will reject some URLs that somehow aren't
normalized. No such cases have been found yet.
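
The idea of combining the two checks can be sketched in isolation. This is a minimal illustration, not the Kallithea code: the helper name find_url_problem, its return convention, and the exact set of whitespace characters are assumptions made for the example. Which of the two branches triggers depends on whether the running Python already strips such characters in urlparse.

```python
import urllib.parse


def find_url_problem(url):
    """Return a description of why url is rejected, or None if it passes.

    Hypothetical helper for illustration only; the names and messages
    are not taken from Kallithea.
    """
    parsed = urllib.parse.urlparse(url)

    # Older style of check: only fires if the parser left the
    # characters in the path (Python versions without the stripping).
    for c in parsed.path:
        if c in '\t\r\n':
            return 'invalid whitespace character in path: %r' % c

    # Roundtrip check: fires if the parser silently dropped or
    # rewrote anything, so the rebuilt URL differs from the input.
    unparsed = urllib.parse.urlunparse(parsed)
    if unparsed != url:
        return 'url %r normalizes to %r' % (url, unparsed)

    return None


if __name__ == '__main__':
    for candidate in ('git://example.com/repo', 'git://example.com/\t'):
        print(repr(candidate), '->', find_url_problem(candidate))
```

With either parser behaviour, the URL ending in a tab should be rejected by one of the two branches, while an ordinary URL passes both.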
author | Mads Kiilerich <mads@kiilerich.com> |
---|---|
date | Tue, 18 May 2021 00:58:06 +0200 |
parents | 070b8c39736f |
children | de59ad8185e1 |
files | kallithea/lib/vcs/backends/git/repository.py |
diffstat | 1 files changed, 8 insertions(+), 0 deletions(-) |
--- a/kallithea/lib/vcs/backends/git/repository.py Sun May 09 22:40:56 2021 +0200
+++ b/kallithea/lib/vcs/backends/git/repository.py Tue May 18 00:58:06 2021 +0200
@@ -192,7 +192,11 @@
         >>> GitRepository._check_url('git://example.com/\t')
         Traceback (most recent call last):
         ...
+        urllib.error.URLError: <urlopen error Invalid ...>
+
+        The failure above will be one of, depending on the level of WhatWG support:
         urllib.error.URLError: <urlopen error Invalid whitespace character in path: '\t'>
+        urllib.error.URLError: <urlopen error Invalid url: 'git://example.com/ ' normalizes to 'git://example.com/'>
         """
         try:
             parsed_url = urllib.parse.urlparse(url)
@@ -204,6 +208,10 @@
         if os.path.isabs(url) and os.path.isdir(url):
             return
 
+        unparsed_url = urllib.parse.urlunparse(parsed_url)
+        if unparsed_url != url:
+            raise urllib.error.URLError("Invalid url: '%s' normalizes to '%s'" % (url, unparsed_url))
+
         if parsed_url.scheme == 'git':
             # Mitigate problems elsewhere with incorrect handling of encoded paths.
             # Don't trust urllib.parse.unquote but be prepared for more flexible implementations elsewhere.