.. _performance:

================================
Optimizing Kallithea performance
================================

When serving a large number of big repositories, Kallithea can start performing
slower than expected. Because handling large amounts of data from version
control systems is demanding, here are some tips on how to get the best
performance.


Fast storage
------------

Kallithea is often I/O bound, so a fast disk (SSD/SAN) and plenty of RAM are
usually more important than a fast CPU.


Caching
-------

Tweak the Beaker cache settings in the ``.ini`` file. Whether this has any
noticeable effect on performance is questionable.
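Before tweaking, it helps to see which cache regions are configured and how
long entries live in each. The following is a minimal sketch, assuming the
settings follow Beaker's ``beaker.cache.<region>.<option>`` naming in the
``[app:main]`` section. The region names and values in the hypothetical
``EXAMPLE_INI`` are illustrative only, not necessarily what your Kallithea
version ships with:

.. code-block:: python

    import configparser

    # Hypothetical excerpt in the style of a Kallithea .ini file; check your own
    # file for the regions and values it actually uses.
    EXAMPLE_INI = """\
    [app:main]
    beaker.cache.data_dir = %(here)s/data/cache/data
    beaker.cache.regions = short_term,long_term
    beaker.cache.short_term.type = memory
    beaker.cache.short_term.expire = 60
    beaker.cache.long_term.type = memory
    beaker.cache.long_term.expire = 36000
    """


    def cache_regions(ini_text, section="app:main"):
        """Map each configured Beaker cache region to its type and expiry (seconds)."""
        # interpolation=None leaves placeholders such as %(here)s untouched.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        conf = parser[section]
        names = [n.strip() for n in conf.get("beaker.cache.regions", "").split(",")]
        regions = {}
        for name in filter(None, names):  # skip empty names from stray commas
            prefix = "beaker.cache." + name + "."
            expire = conf.get(prefix + "expire")
            regions[name] = {
                "type": conf.get(prefix + "type"),
                "expire": int(expire) if expire is not None else None,
            }
        return regions


    if __name__ == "__main__":
        for name, settings in cache_regions(EXAMPLE_INI).items():
            print(name, settings)

To inspect a real configuration, pass the text of your own ``.ini`` file to
``cache_regions`` instead of ``EXAMPLE_INI``.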


Database
--------

SQLite is a good option when the load on the system is small. However, due to
locking issues with SQLite, it is not recommended for larger deployments.
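The locking issue stems from SQLite allowing only one writer per database file
at a time. The following minimal sketch uses Python's built-in ``sqlite3``
module against a throwaway database (Kallithea itself is not involved; the
function name is hypothetical) to show a second connection being refused while
another connection holds a write transaction. With a nonzero ``timeout`` the
second writer would wait instead of failing, but writes are still serialized:

.. code-block:: python

    import os
    import sqlite3
    import tempfile


    def concurrent_write_blocked(db_path):
        """Return True if a second writer is refused while the first holds a write lock."""
        # timeout=0 disables SQLite's busy-wait so contention shows up immediately;
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
        second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
            first.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
            first.execute("INSERT INTO log VALUES ('from first')")
            try:
                second.execute("INSERT INTO log VALUES ('from second')")
            except sqlite3.OperationalError:  # "database is locked"
                blocked = True
            else:
                blocked = False
            first.execute("COMMIT")
            # Once the first transaction has committed, the second writer gets through.
            second.execute("INSERT INTO log VALUES ('from second, retried')")
            return blocked
        finally:
            first.close()
            second.close()


    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as tmp:
            print(concurrent_write_blocked(os.path.join(tmp, "demo.db")))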

Switching to MySQL or PostgreSQL will result in an immediate performance
increase. A tool like SQLAlchemyGrate_ can be used for migrating to another
database platform.


Horizontal scaling
------------------

Scaling horizontally means running several Kallithea instances and letting them
share the load. This can give huge performance benefits when dealing with large
amounts of traffic (many users, CI servers, etc.). Kallithea can be scaled
horizontally on one (recommended) or multiple machines.
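To make "sharing the load" concrete, here is a minimal Python sketch of the two
distribution strategies named later in this section, round robin and IP hash,
plus a rule that keeps CI traffic on its own pool of instances. It only
illustrates the idea and is not how any particular load balancer is
implemented; the backend addresses and the set of CI client IPs are
hypothetical placeholders:

.. code-block:: python

    import hashlib
    import itertools

    # Hypothetical Kallithea instances on one machine; replace with your own.
    USER_BACKENDS = ["127.0.0.1:5001", "127.0.0.1:5002"]
    CI_BACKENDS = ["127.0.0.1:5003"]

    # Hypothetical source addresses of CI servers and build bots.
    CI_CLIENT_IPS = {"192.0.2.50", "192.0.2.51"}


    class RoundRobin:
        """Hand out backends in turn, ignoring who is asking."""

        def __init__(self, backends):
            self._next = itertools.cycle(backends)

        def pick(self, client_ip):
            return next(self._next)


    class IpHash:
        """Always send the same client IP to the same backend."""

        def __init__(self, backends):
            self._backends = list(backends)

        def pick(self, client_ip):
            # A stable hash (unlike Python's per-process salted hash()) gives the
            # same answer across processes and restarts.
            digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self._backends)
            return self._backends[index]


    def route(client_ip, user_pool, ci_pool):
        """Keep automated traffic on its own pool, away from interactive users."""
        pool = ci_pool if client_ip in CI_CLIENT_IPS else user_pool
        return pool.pick(client_ip)


    if __name__ == "__main__":
        users = IpHash(USER_BACKENDS)
        ci = RoundRobin(CI_BACKENDS)
        for ip in ["198.51.100.7", "198.51.100.8", "192.0.2.50"]:
            print(ip, "->", route(ip, users, ci))

In a real deployment, the same choices correspond to load balancer settings such
as nginx's ``ip_hash`` or HAProxy's ``balance roundrobin`` and
``balance source``.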

It is generally possible to run WSGI applications multithreaded, so that
several HTTP requests are served from the same Python process at once. That can
in principle give better utilization of internal caches and less process
overhead.

One danger of running multithreaded is that program execution becomes much more
complex: programs must be written to handle all possible combinations of
events, and problems may depend on timing and be impossible to reproduce.

Kallithea cannot promise to be thread-safe, and neither does the embedded
Mercurial backend make any strong promises when used the way Kallithea uses it.
Instead, we recommend scaling by using multiple server processes.

Web servers with multiple worker processes (such as ``mod_wsgi`` with the
``WSGIDaemonProcess`` ``processes`` parameter) will work out of the box.
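The text above only names ``mod_wsgi``. As an assumption not covered by this
document, the same "processes, not threads" policy can be expressed for
gunicorn, another WSGI server with worker processes, whose configuration file
is plain Python. This is a sketch of such a file (the file name and values are
hypothetical), not Kallithea code:

.. code-block:: python

    # gunicorn.conf.py: hypothetical file name; gunicorn reads it with "-c".
    import os

    # Listen only locally; a front-end web server or load balancer proxies to it.
    bind = "127.0.0.1:5000"

    # Scale with separate worker processes. (2 x cores) + 1 is the starting
    # point suggested in gunicorn's documentation; tune for your load.
    workers = (os.cpu_count() or 1) * 2 + 1

    # Keep each worker process single-threaded, since Kallithea does not
    # promise thread safety.
    threads = 1

    # Sync workers are killed if one request takes longer than this many
    # seconds; large clones and pulls may need a generous value (an assumed
    # figure, adjust for your repositories).
    timeout = 300

It could then be started with something like
``gunicorn --paste my.ini -c gunicorn.conf.py``, where ``my.ini`` is your
Kallithea configuration file.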

In order to scale horizontally on multiple machines, you need to do the
following:

    - Each instance's ``data`` storage needs to be configured on shared disk
      storage, preferably together with the repositories. This ``data``
      directory contains template caches, sessions and the Whoosh index, and is
      used for task locking (sharing it keeps that locking safe across multiple
      instances). Set the ``cache_dir``, ``index_dir``, ``beaker.cache.data_dir``
      and ``beaker.cache.lock_dir`` variables in each ``.ini`` file to a location
      shared by all Kallithea instances.
    - If using several Celery instances, the message broker should be common to
      all of them (e.g., one shared RabbitMQ server).
    - Load balance using round robin or IP hash. It is recommended to write load
      balancer rules that separate regular user traffic from automated processes
      such as CI servers or build bots.
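Keeping those four settings identical across instances by hand is easy to get
wrong. The following is a minimal consistency check, assuming the settings live
in the ``[app:main]`` section and that ``%(here)s`` (PasteDeploy's placeholder
for the directory containing the ``.ini`` file) is the only interpolation used
in them. The function names are hypothetical. It reports every setting whose
resolved path differs between instances:

.. code-block:: python

    import configparser
    import os

    # The settings that must point at shared storage on every instance.
    SHARED_KEYS = ("cache_dir", "index_dir",
                   "beaker.cache.data_dir", "beaker.cache.lock_dir")


    def parse_shared_settings(ini_text, here, section="app:main"):
        """Return {key: resolved path or None} for SHARED_KEYS in one .ini text."""
        # Supplying "here" as a default lets %(here)s resolve like PasteDeploy does.
        parser = configparser.ConfigParser(defaults={"here": here})
        parser.read_string(ini_text)
        result = {}
        for key in SHARED_KEYS:
            value = parser.get(section, key, fallback=None)
            result[key] = os.path.normpath(value) if value else None
        return result


    def load_shared_settings(ini_path, section="app:main"):
        """Read one instance's .ini file from disk."""
        here = os.path.dirname(os.path.abspath(ini_path))
        with open(ini_path, encoding="utf-8") as f:
            return parse_shared_settings(f.read(), here, section)


    def find_mismatches(settings_by_instance):
        """Return {key: {instance: value}} for keys that differ between instances."""
        mismatches = {}
        for key in SHARED_KEYS:
            values = {name: s[key] for name, s in settings_by_instance.items()}
            if len(set(values.values())) > 1:
                mismatches[key] = values
        return mismatches


    if __name__ == "__main__":
        import sys
        found = find_mismatches({p: load_shared_settings(p) for p in sys.argv[1:]})
        for key, values in found.items():
            print(f"{key} differs between instances: {values}")

Run it with the ``.ini`` files of all instances as arguments. It compares path
strings only, so the shared storage must be mounted at the same path on every
machine for the check to be meaningful.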


Serve static files directly from the web server
-----------------------------------------------

With the default ``static_files`` ini setting, the Kallithea WSGI application
will take care of serving the static files from ``kallithea/public/`` at the
root of the application URL.

The actual serving of the static files is very fast and unlikely to be a
problem in a Kallithea setup; the responses generated by Kallithea from
database and repository content take significantly more time and resources.

To serve static files from the web server, use something like this Apache config
snippet::

        Alias /images/ /srv/kallithea/kallithea/kallithea/public/images/
        Alias /css/ /srv/kallithea/kallithea/kallithea/public/css/
        Alias /js/ /srv/kallithea/kallithea/kallithea/public/js/
        Alias /codemirror/ /srv/kallithea/kallithea/kallithea/public/codemirror/
        Alias /fontello/ /srv/kallithea/kallithea/kallithea/public/fontello/

Then disable serving of static files in the ``.ini`` ``app:main`` section::

        static_files = false

If Kallithea is installed as a package, you should be able to find the files
under ``site-packages/kallithea``, either in your Python installation or in your
virtualenv. When upgrading, make sure to also update the web server
configuration if necessary.
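Rather than maintaining the ``Alias`` lines by hand, they could be generated
from the installed package and compared with the web server configuration after
each upgrade. The following is a minimal sketch, assuming the static files live in a
``public`` directory next to the ``kallithea`` package's ``__init__.py`` (as
the ``kallithea/public/`` path above suggests) and that one ``Alias`` per
top-level directory is wanted. Files directly inside ``public/`` are not
covered, and the helper names are hypothetical:

.. code-block:: python

    import importlib.util
    import os


    def find_public_dir(package="kallithea"):
        """Return the package's public/ directory, or None if it cannot be found."""
        spec = importlib.util.find_spec(package)  # locates without importing
        if spec is None or spec.origin is None:
            return None
        public = os.path.join(os.path.dirname(spec.origin), "public")
        return public if os.path.isdir(public) else None


    def alias_lines(public_dir):
        """One Apache Alias directive per top-level directory under public_dir."""
        lines = []
        for name in sorted(os.listdir(public_dir)):
            path = os.path.join(public_dir, name)
            if os.path.isdir(path):
                lines.append(f"Alias /{name}/ {path}/")
        return lines


    if __name__ == "__main__":
        public = find_public_dir()
        if public is None:
            print("kallithea package not found in this Python environment")
        else:
            print("\n".join(alias_lines(public)))

Run it with the same Python interpreter (or virtualenv) that runs Kallithea, so
that it finds the same installed package.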

It might also be possible to improve performance by configuring the web server
to compress responses (whether served from static files or generated by
Kallithea). However, compression might also imply buffering of responses, which
is more likely to be a problem: large responses (clones or pulls) would have to
be fully processed and spooled to disk or memory before the client sees any
response. See the documentation for your web server.


.. _SQLAlchemyGrate: https://github.com/shazow/sqlalchemygrate