changeset 8888:f0fbb0fe4462

git: update check for invalid URL characters to work with Python versions that include an attempt at fixing the very same problem

With changes like https://github.com/python/cpython/commit/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4 making it into Python 3.10 and being backported to previous Python versions, the approach in a8a51a3bdb61 no longer works when combined with urllib.parse.urlparse in d2f59de17bef: path will never contain the invalid characters.

To catch this case anyway, add a new check to verify that the parsed URL can roundtrip back to the original representation with urllib.parse.urlunparse. The actual exception might vary, but one of them should always fire.

There is a risk that the new check will reject some URLs that somehow aren't normalized. No such cases have been found yet.
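
For illustration only, the roundtrip idea can be sketched in isolation as below. The helper name check_roundtrip is hypothetical and not part of Kallithea; the real check lives inside GitRepository._check_url alongside the other validations. On Python versions without the upstream fix, the tab is expected to survive parsing, so this sketch alone would not reject it there and the older path check would still be needed.

```python
import urllib.error
import urllib.parse


def check_roundtrip(url):
    """Hypothetical helper: reject URLs that urlparse silently rewrites.

    Parses the URL, rebuilds it with urlunparse, and raises URLError if
    the rebuilt string differs from the input.
    """
    parsed_url = urllib.parse.urlparse(url)
    unparsed_url = urllib.parse.urlunparse(parsed_url)
    if unparsed_url != url:
        raise urllib.error.URLError(
            "Invalid url: '%s' normalizes to '%s'" % (url, unparsed_url))
    return parsed_url


if __name__ == '__main__':
    # A plain URL roundtrips unchanged and is accepted.
    print(check_roundtrip('git://example.com/repo').path)
    # A trailing tab is stripped by newer urlparse versions, so the
    # roundtrip differs and the URL is rejected there.
    try:
        check_roundtrip('git://example.com/\t')
    except urllib.error.URLError as e:
        print('rejected:', e.reason)
```

Comparing the rebuilt string with the input keeps the check independent of exactly which characters a given Python version strips, which is presumably why the commit prefers it over listing invalid characters.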
author Mads Kiilerich <mads@kiilerich.com>
date Tue, 18 May 2021 00:58:06 +0200
parents 070b8c39736f
children de59ad8185e1
files kallithea/lib/vcs/backends/git/repository.py
diffstat 1 files changed, 8 insertions(+), 0 deletions(-)
--- a/kallithea/lib/vcs/backends/git/repository.py	Sun May 09 22:40:56 2021 +0200
+++ b/kallithea/lib/vcs/backends/git/repository.py	Tue May 18 00:58:06 2021 +0200
@@ -192,7 +192,11 @@
         >>> GitRepository._check_url('git://example.com/\t')
         Traceback (most recent call last):
         ...
+        urllib.error.URLError: <urlopen error Invalid ...>
+
+        The failure above will be one of, depending on the level of WhatWG support:
         urllib.error.URLError: <urlopen error Invalid whitespace character in path: '\t'>
+        urllib.error.URLError: <urlopen error Invalid url: 'git://example.com/	' normalizes to 'git://example.com/'>
         """
         try:
             parsed_url = urllib.parse.urlparse(url)
@@ -204,6 +208,10 @@
         if os.path.isabs(url) and os.path.isdir(url):
             return
 
+        unparsed_url = urllib.parse.urlunparse(parsed_url)
+        if unparsed_url != url:
+            raise urllib.error.URLError("Invalid url: '%s' normalizes to '%s'" % (url, unparsed_url))
+
         if parsed_url.scheme == 'git':
             # Mitigate problems elsewhere with incorrect handling of encoded paths.
             # Don't trust urllib.parse.unquote but be prepared for more flexible implementations elsewhere.