view docs/contributing.rst @ 6532:33b71a130b16

templates: properly escape inline JavaScript values TLDR: Kallithea has issues with escaping values for use in inline JS. Despite judicious poking of the code, no actual security vulnerabilities have been found, just lots of corner-case bugs. This patch fixes those, and hardens the code against actual security issues. The long version: To embed a Python value (typically a 'unicode' plain-text value) in a larger file, it must be escaped in a context specific manner. Example: >>> s = u'<script>alert("It\'s a trap!");</script>' 1) Escaped for insertion into HTML element context >>> print cgi.escape(s) &lt;script&gt;alert("It's a trap!");&lt;/script&gt; 2) Escaped for insertion into HTML element or attribute context >>> print h.escape(s) &lt;script&gt;alert(&#34;It&#39;s a trap!&#34;);&lt;/script&gt; This is the default Mako escaping, as usually used by Kallithea. 3) Encoded as JSON >>> print json.dumps(s) "<script>alert(\"It's a trap!\");</script>" 4) Escaped for insertion into a JavaScript file >>> print '(' + json.dumps(s) + ')' ("<script>alert(\"It's a trap!\");</script>") The parentheses are not actually required for strings, but may be needed to avoid syntax errors if the value is a number or dict (object). 5) Escaped for insertion into a HTML inline <script> element >>> print h.js(s) ("\x3cscript\x3ealert(\"It's a trap!\");\x3c/script\x3e") Here, we need to combine JS and HTML escaping, further complicated by the fact that "<script>" tag contents can either be parsed in XHTML mode (in which case '<', '>' and '&' must additionally be XML escaped) or HTML mode (in which case '</script>' must be escaped, but not using HTML escaping, which is not available in HTML "<script>" tags). Therefore, the XML special characters (which can only occur in string literals) are escaped using JavaScript string literal escape sequences. (This, incidentally, is why modern web security best practices ban all use of inline JavaScript...) Unsurprisingly, Kallithea does not do (5) correctly. In most cases, Kallithea might slap a pair of single quotes around the HTML escaped Python value. A typical benign example: $('#child_link').html('${_('No revisions')}'); This works in English, but if a localized version of the string contains an apostrophe, the result will be broken JavaScript. In the more severe cases, where the text is user controllable, it leaves the door open to injections. In this example, the script inserts the string as HTML, so Mako's implicit HTML escaping makes sense; but in many other cases, HTML escaping is actually an error, because the value is not used by the script in an HTML context. The good news is that the HTML escaping thwarts attempts at XSS, since it's impossible to inject syntactically valid JavaScript of any useful complexity. It does allow JavaScript errors and gibberish to appear on the page, though. In these cases, the escaping has been fixed to use either the new 'h.js' helper, which does JavaScript escaping (but not HTML escaping), OR the new 'h.jshtml' helper (which does both), in those cases where it was unclear if the value might be used (by the script) in an HTML context. Some of these can probably be "relaxed" from h.jshtml to h.js later, but for now, using h.jshtml fixes escaping and doesn't introduce new errors. In a few places, Kallithea JSON encodes values in the controller, then inserts the JSON (without any further escaping) into <script> tags. This is also wrong, and carries actual risk of XSS vulnerabilities. However, in all cases, security vulnerabilities were narrowly avoided due to other filtering in Kallithea. (E.g. many special characters are banned from appearing in usernames.) In these cases, the escaping has been fixed and moved to the template, making it immediately visible that proper escaping has been performed. Mini-FAQ (frequently anticipated questions): Q: Why do everything in one big, hard to review patch? Q: Why add escaping in specific case FOO, it doesn't seem needed? Because the goal here is to have "escape everywhere" as the default policy, rather than identifying individual bugs and fixing them one by one by adding escaping where needed. As such, this patch surely introduces a lot of needless escaping. This is no different from how Mako/Pylons HTML escape everything by default, even when not needed: it's errs on the side of needless work, to prevent erring on the side of skipping required (and security critical) work. As for reviewability, the most important thing to notice is not where escaping has been introduced, but any places where it might have been missed (or where h.jshtml is needed, but h.js is used). Q: The added escaping is kinda verbose/ugly. That is not a question, but yes, I agree. Hopefully it'll encourage us to move away from inline JavaScript altogether. That's a significantly larger job, though; with luck this patch will keep us safe and secure until such a time as we can implement the real fix. Q: Why not use Mako filter syntax ("${val|h.js}")? Because of long-standing Mako bug #140, preventing use of 'h' in filters. Q: Why not work around bug #140, or even use straight "${val|js}"? Because Mako still applies the default h.escape filter before the explicitly specified filters. Q: Where do we go from here? Longer term, we should stop doing variable expansions in script blocks, and instead pass data to JS via e.g. data attributes, or asynchronously using AJAX calls. Once we've done that, we can remove inline JavaScript altogether in favor of separate script files, and set a strict Content Security Policy explicitly blocking inline scripting, and thus also the most common kind of cross-site scripting attack.
author Søren Løvborg <sorenl@unity3d.com>
date Tue, 28 Feb 2017 17:19:00 +0100
parents 9c6f717823e1
children 2c3d30095d5e
line wrap: on
line source

.. _contributing:

=========================
Contributing to Kallithea
=========================

Kallithea is developed and maintained by its users. Please join us and scratch
your own itch.


Infrastructure
--------------

The main repository is hosted on Our Own Kallithea (aka OOK) at
https://kallithea-scm.org/repos/kallithea/, our self-hosted instance
of Kallithea.

For now, we use Bitbucket_ for `pull requests`_ and `issue tracking`_. The
issue tracker is for tracking bugs, not for support, discussion, or ideas --
please use the `mailing list`_ or :ref:`IRC <readme>` to reach the community.

We use Weblate_ to translate the user interface messages into languages other
than English. Join our project on `Hosted Weblate`_ to help us.
To register, you can use your Bitbucket or GitHub account. See :ref:`translations`
for more details.


Getting started
---------------

To get started with development::

        hg clone https://kallithea-scm.org/repos/kallithea
        cd kallithea
        virtualenv ../kallithea-venv
        source ../kallithea-venv/bin/activate
        pip install --upgrade pip setuptools
        pip install -e .
        paster make-config Kallithea my.ini
        paster setup-db my.ini --user=user --email=user@example.com --password=password --repos=/tmp
        paster serve my.ini --reload &
        firefox http://127.0.0.1:5000/

You can also start out by forking https://bitbucket.org/conservancy/kallithea
on Bitbucket_ and create a local clone of your own fork.


Running tests
-------------

After finishing your changes make sure all tests pass cleanly. Install the test
dependencies, then run the testsuite by invoking ``py.test`` from the
project root::

    pip install -r dev_requirements.txt
    py.test

Note that testing on Python 2.6 also requires ``unittest2``.

You can also use ``tox`` to run the tests with all supported Python versions
(currently Python 2.6--2.7).

When running tests, Kallithea uses `kallithea/tests/test.ini` and populates the
SQLite database specified there.

It is possible to avoid recreating the full test database on each invocation of
the tests, thus eliminating the initial delay. To achieve this, run the tests as::

    paster serve kallithea/tests/test.ini --pid-file=test.pid --daemon
    KALLITHEA_WHOOSH_TEST_DISABLE=1 KALLITHEA_NO_TMP_PATH=1 py.test
    kill -9 $(cat test.pid)

In these commands, the following variables are used::

    KALLITHEA_WHOOSH_TEST_DISABLE=1 - skip whoosh index building and tests
    KALLITHEA_NO_TMP_PATH=1 - disable new temp path for tests, used mostly for testing_vcs_operations

You can run individual tests by specifying their path as argument to py.test.
py.test also has many more options, see `py.test -h`. Some useful options
are::

    -k EXPRESSION         only run tests which match the given substring
                          expression. An expression is a python evaluable
                          expression where all names are substring-matched
                          against test names and their parent classes. Example:
    -x, --exitfirst       exit instantly on first error or failed test.
    --lf                  rerun only the tests that failed at the last run (or
                          all if none failed)
    --ff                  run all tests but run the last failures first. This
                          may re-order tests and thus lead to repeated fixture
                          setup/teardown
    --pdb                 start the interactive Python debugger on errors.
    -s, --capture=no      don't capture stdout (any stdout output will be
                          printed immediately)


Contribution guidelines
-----------------------

Kallithea is GPLv3 and we assume all contributions are made by the
committer/contributor and under GPLv3 unless explicitly stated. We do care a
lot about preservation of copyright and license information for existing code
that is brought into the project.

Contributions will be accepted in most formats -- such as pull requests on
Bitbucket, something hosted on your own Kallithea instance, or patches sent by
email to the `kallithea-general`_ mailing list.

When contributing via Bitbucket, please make your fork of
https://bitbucket.org/conservancy/kallithea/ `non-publishing`_ -- it is one of
the settings on "Repository details" page. This ensures your commits are in
"draft" phase and makes it easier for you to address feedback and for project
maintainers to integrate your changes.

.. _non-publishing: https://www.mercurial-scm.org/wiki/Phases#Publishing_Repository

Make sure to test your changes both manually and with the automatic tests
before posting.

We care about quality and review and keeping a clean repository history. We
might give feedback that requests polishing contributions until they are
"perfect". We might also rebase and collapse and make minor adjustments to your
changes when we apply them.

We try to make sure we have consensus on the direction the project is taking.
Everything non-sensitive should be discussed in public -- preferably on the
mailing list.  We aim at having all non-trivial changes reviewed by at least
one other core developer before pushing. Obvious non-controversial changes will
be handled more casually.

For now we just have one official branch ("default") and will keep it so stable
that it can be (and is) used in production. Experimental changes should live
elsewhere (for example in a pull request) until they are ready.


Coding guidelines
-----------------

We don't have a formal coding/formatting standard. We are currently using a mix
of Mercurial's (https://www.mercurial-scm.org/wiki/CodingStyle), pep8, and
consistency with existing code. Run ``scripts/run-all-cleanup`` before
committing to ensure some basic code formatting consistency.

We support both Python 2.6.x and 2.7.x and nothing else. For now we don't care
about Python 3 compatibility.

We try to support the most common modern web browsers. IE9 is still supported
to the extent it is feasible, IE8 is not.

We primarily support Linux and OS X on the server side but Windows should also work.

HTML templates should use 2 spaces for indentation ... but be pragmatic. We
should use templates cleverly and avoid duplication. We should use reasonable
semantic markup with element classes and IDs that can be used for styling and testing.
We should only use inline styles in places where it really is semantic (such as
``display: none``).

JavaScript must use ``;`` between/after statements. Indentation 4 spaces. Inline
multiline functions should be indented two levels -- one for the ``()`` and one for
``{}``.
Variables holding jQuery objects should be named with a leading ``$``.

Commit messages should have a leading short line summarizing the changes. For
bug fixes, put ``(Issue #123)`` at the end of this line.

Use American English grammar and spelling overall. Use `English title case`_ for
page titles, button labels, headers, and 'labels' for fields in forms.

.. _English title case: https://en.wikipedia.org/wiki/Capitalization#Title_case

Template helpers (that is, everything in ``kallithea.lib.helpers``)
should only be referenced from templates. If you need to call a
helper from the Python code, consider moving the function somewhere
else (e.g. to the model).

Notes on the SQLAlchemy session
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each HTTP request runs inside an independent SQLAlchemy session (as well
as in an independent database transaction). Database model objects
(almost) always belong to a particular SQLAlchemy session, which means
that SQLAlchemy will ensure that they're kept in sync with the database
(but also means that they cannot be shared across requests).

Objects can be added to the session using ``Session().add``, but this is
rarely needed:

* When creating a database object by calling the constructor directly,
  it must explicitly be added to the session.

* When creating an object using a factory function (like
  ``create_repo``), the returned object has already (by convention)
  been added to the session, and should not be added again.

* When getting an object from the session (via ``Session().query`` or
  any of the utility functions that look up objects in the database),
  it's already part of the session, and should not be added again.
  SQLAlchemy monitors attribute modifications automatically for all
  objects it knows about and syncs them to the database.

SQLAlchemy also flushes changes to the database automatically; manually
calling ``Session().flush`` is usually only necessary when the Python
code needs the database to assign an "auto-increment" primary key ID to
a freshly created model object (before flushing, the ID attribute will
be ``None``).


"Roadmap"
---------

We do not have a road map but are waiting for your contributions. Refer to the
wiki_ for some ideas of places we might want to go -- contributions in these
areas are very welcome.


Thank you for your contribution!
--------------------------------


.. _Weblate: http://weblate.org/
.. _issue tracking: https://bitbucket.org/conservancy/kallithea/issues?status=new&status=open
.. _pull requests: https://bitbucket.org/conservancy/kallithea/pull-requests
.. _bitbucket: http://bitbucket.org/
.. _mailing list: http://lists.sfconservancy.org/mailman/listinfo/kallithea-general
.. _kallithea-general: http://lists.sfconservancy.org/mailman/listinfo/kallithea-general
.. _Hosted Weblate: https://hosted.weblate.org/projects/kallithea/kallithea/
.. _wiki: https://bitbucket.org/conservancy/kallithea/wiki/Home