annotate kallithea/lib/indexers/__init__.py @ 7811:0a277465fddf

scripts: initial run of import cleanup using isort
author Mads Kiilerich <mads@kiilerich.com>
date Wed, 07 Aug 2019 00:25:02 +0200
parents 19af3fef3b34
children e2b9731cb2fb
rev   line source
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
1 # -*- coding: utf-8 -*-
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
2 # This program is free software: you can redistribute it and/or modify
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
3 # it under the terms of the GNU General Public License as published by
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
4 # the Free Software Foundation, either version 3 of the License, or
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
5 # (at your option) any later version.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
6 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
7 # This program is distributed in the hope that it will be useful,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
10 # GNU General Public License for more details.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
11 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
12 # You should have received a copy of the GNU General Public License
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
13 # along with this program. If not, see <http://www.gnu.org/licenses/>.
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
14 """
5376
0ad053c172fa cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents: 5375
diff changeset
15 kallithea.lib.indexers
0ad053c172fa cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents: 5375
diff changeset
16 ~~~~~~~~~~~~~~~~~~~~~~
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
17
4212
24c0d584ba86 General renaming to Kallithea
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4211
diff changeset
18 Whoosh indexing module for Kallithea
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
19
4211
1948ede028ef RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4208
diff changeset
20 This file was forked by the Kallithea project in July 2014.
1948ede028ef RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4208
diff changeset
21 Original author and date, and relevant copyright and licensing information is below:
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
22 :created_on: Aug 17, 2010
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
23 :author: marcink
4211
1948ede028ef RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4208
diff changeset
24 :copyright: (c) 2013 RhodeCode GmbH, and others.
4208
ad38f9f93b3b Correct licensing information in individual files.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4187
diff changeset
25 :license: GPLv3, see LICENSE.md for more details.
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
26 """
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
27
7811
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
28 import logging
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
29 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
30 import sys
5998
037efd94e955 cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents: 5997
diff changeset
31 from os.path import dirname
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
32
7811
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
33 from whoosh.analysis import IDTokenizer, LowercaseFilter, RegexTokenizer
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
34 from whoosh.fields import BOOLEAN, DATETIME, ID, NUMERIC, STORED, TEXT, FieldType, Schema
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
35 from whoosh.formats import Characters
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
36 from whoosh.highlight import ContextFragmenter, HtmlFormatter
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
37 from whoosh.highlight import highlight as whoosh_highlight
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
38
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
39 from kallithea.lib.utils2 import LazyProperty
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
40
0a277465fddf scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents: 7626
diff changeset
41
4175
e9f6b533a8f6 Remove wrong/unnecessary/unfixable comment(s)
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 4116
diff changeset
42 # Add location of top level folder to sys.path
5998
037efd94e955 cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents: 5997
diff changeset
43 sys.path.append(dirname(dirname(dirname(os.path.realpath(__file__)))))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
44
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
45
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
46 log = logging.getLogger(__name__)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
47
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
48 # CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
49 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
50
6478
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
51 # CUSTOM ANALYZER wordsplit + lowercase filter, for emailaddr-like text
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
52 #
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
53 # This is useful to:
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
54 # - avoid removing "stop words" from text
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
55 # - search case-insensitively
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
56 #
6864
7691290837d2 codingstyle: trivial whitespace fixes
Lars Kruse <devel@sumpfralle.de>
parents: 6478
diff changeset
57 EMAILADDRANALYZER = RegexTokenizer() | LowercaseFilter()
6478
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
58
6475
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
59 # CUSTOM ANALYZER raw-string + lowercase filter
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
60 #
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
61 # This is useful to:
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
62 # - avoid tokenization
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
63 # - avoid removing "stop words" from text
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
64 # - search case-insensitively
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
65 #
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
66 ICASEIDANALYZER = IDTokenizer() | LowercaseFilter()
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
67
6476
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
68 # CUSTOM ANALYZER raw-string
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
69 #
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
70 # This is useful to:
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
71 # - avoid tokenization
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
72 # - avoid removing "stop words" from text
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
73 #
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
74 IDANALYZER = IDTokenizer()
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
75
6477
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
76 # CUSTOM ANALYZER wordsplit + lowercase filter, for pathname-like text
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
77 #
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
78 # This is useful to:
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
79 # - avoid removing "stop words" from text
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
80 # - search case-insensitively
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
81 #
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
82 PATHANALYZER = RegexTokenizer() | LowercaseFilter()
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
83
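The analyzers defined above differ only in how they split text and whether they lowercase it; none of them drops stop words. The following is a rough stdlib-only sketch of that behaviour, not Whoosh itself. The function names are hypothetical, and the default `RegexTokenizer` pattern is assumed to be `\w+(\.?\w+)*`.

```python
import re

# Assumed default pattern of Whoosh's RegexTokenizer; check your Whoosh version.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")
# Pattern passed explicitly to ANALYZER in the listing.
WORD_PATTERN = re.compile(r"\w+")


def regex_lowercase(text, pattern):
    """Imitate RegexTokenizer(pattern) | LowercaseFilter()."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


def id_tokens(text, lowercase=False):
    """Imitate IDTokenizer(), optionally followed by LowercaseFilter()."""
    return [text.lower() if lowercase else text]


# ANALYZER-like: split on word characters, lowercase, keep "the".
analyzer = regex_lowercase("The Quick_Fox v1.2", WORD_PATTERN)
# PATHANALYZER / EMAILADDRANALYZER-like: dotted words stay together.
path_like = regex_lowercase("Docs/README.txt", DEFAULT_PATTERN)
# ICASEIDANALYZER-like: one token, lowercased.
icase_id = id_tokens("Group/MyRepo", lowercase=True)
# IDANALYZER-like: one token, case preserved.
raw_id = id_tokens("Group/MyRepo")
```

The contrast between the last two results mirrors why the schemas keep both `repository` (case-insensitive) and `repository_rawname` (exact case).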
6864
7691290837d2 codingstyle: trivial whitespace fixes
Lars Kruse <devel@sumpfralle.de>
parents: 6478
diff changeset
84 # INDEX SCHEMA DEFINITION
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
85 SCHEMA = Schema(
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
86 fileid=ID(unique=True),
6478
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
87 owner=TEXT(analyzer=EMAILADDRANALYZER),
6476
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
88 # this field preserves case of repository name for exact matching
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
89 repository_rawname=TEXT(analyzer=IDANALYZER),
6475
caef0be39948 search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6474
diff changeset
90 repository=TEXT(stored=True, analyzer=ICASEIDANALYZER),
6477
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
91 path=TEXT(stored=True, analyzer=PATHANALYZER),
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
92 content=FieldType(format=Characters(), analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
93 scorable=True, stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
94 modtime=STORED(),
6477
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
95 extension=TEXT(stored=True, analyzer=PATHANALYZER)
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
96 )
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
97
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
98 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
99 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
100 FRAGMENTER = ContextFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
101
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
102 CHGSETS_SCHEMA = Schema(
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
103 raw_id=ID(unique=True, stored=True),
2693
66c778b8cb54 Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents: 2673
diff changeset
104 date=NUMERIC(stored=True),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
105 last=BOOLEAN(),
6478
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
106 owner=TEXT(analyzer=EMAILADDRANALYZER),
6476
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
107 # this field preserves case of repository name for exact matching
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
108 # and uniqueness in the index table
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
109 repository_rawname=ID(unique=True),
8b7c0ef62427 search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6475
diff changeset
110 repository=ID(stored=True, analyzer=ICASEIDANALYZER),
6478
c0b2410d63a5 search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6477
diff changeset
111 author=TEXT(stored=True, analyzer=EMAILADDRANALYZER),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
112 message=FieldType(format=Characters(), analyzer=ANALYZER,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
113 scorable=True, stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
114 parents=TEXT(),
6477
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
115 added=TEXT(analyzer=PATHANALYZER),
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
116 removed=TEXT(analyzer=PATHANALYZER),
168cc92c1b53 search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6476
diff changeset
117 changed=TEXT(analyzer=PATHANALYZER),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
118 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
119
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
120 CHGSET_IDX_NAME = 'CHGSET_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
121
3062
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
122 # used only to generate queries in journal
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
123 JOURNAL_SCHEMA = Schema(
6474
2ff913970025 journal: make "username:" filtering condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 6473
diff changeset
124 username=ID(),
3062
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
125 date=DATETIME(),
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
126 action=TEXT(),
6473
73e3599971da journal: make "repository:" filtering condition work as expected (Issue #261)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 5998
diff changeset
127 repository=ID(),
3062
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
128 ip=TEXT(),
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
129 )
a08624dd675e Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents: 2718
diff changeset
130
2718
82fb2a161ddf fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
131
2319
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
132 class WhooshResultWrapper(object):
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
133 def __init__(self, search_type, searcher, matcher, highlight_items,
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
134 repo_location):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
135 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
136 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
137 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
138 self.highlight_items = highlight_items
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
139 self.fragment_size = 200
2319
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
140 self.repo_location = repo_location
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
141
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
142 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
143 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
144 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
145 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
146 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
147 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
148 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
149 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
150 return docs_id
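The `doc_ids` loop above drains a matcher through its `is_active()`, `id()` and `next()` methods and keeps the result for later calls. A minimal sketch of that pattern follows, using a hypothetical `FakeMatcher` stand-in and `functools.cached_property` in place of `LazyProperty`, which is assumed here to compute once and cache the value.

```python
from functools import cached_property


class FakeMatcher:
    """Hypothetical stand-in exposing the three matcher methods the loop uses."""

    def __init__(self, docnums):
        self._docnums = list(docnums)
        self._pos = 0

    def is_active(self):
        return self._pos < len(self._docnums)

    def id(self):
        return self._docnums[self._pos]

    def next(self):
        self._pos += 1


class Results:
    """Hypothetical, reduced version of the result wrapper."""

    def __init__(self, matcher):
        self.matcher = matcher

    @cached_property
    def doc_ids(self):
        # Drain the matcher once; later accesses reuse the cached list.
        ids = []
        while self.matcher.is_active():
            ids.append(self.matcher.id())
            self.matcher.next()
        return ids


results = Results(FakeMatcher([3, 7, 11]))
first = results.doc_ids
second = results.doc_ids  # served from the cache, matcher is already exhausted
```

Caching matters here because a matcher can only be consumed once; without it, a second access would return an empty list.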
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
151
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
152 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
153 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
154
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
155 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
156 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
157
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
158 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
159 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
160
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
161 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
162 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
163 Allows iteration over results and lazily generates content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
164
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
165 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
166 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
167 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
168 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
169
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
170 def __getitem__(self, key):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
171 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
172 Slicing of WhooshResultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
173 """
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
174 i, j = key.start, key.stop
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
175
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
176 slices = []
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
177 for docid in self.doc_ids[i:j]:
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
178 slices.append(self.get_full_content(docid))
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
179 return slices
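
The slicing above reads key.start and key.stop, so it assumes that key is always a slice object, which is what a paginator would typically pass. As a minimal sketch only (the PagedResults class and its load callback are hypothetical and not part of this module), a wrapper that accepts both slice and integer keys could look like this:

```python
class PagedResults(object):
    """Hypothetical stand-in for a result wrapper; not the class annotated here."""

    def __init__(self, doc_ids, load):
        self.doc_ids = list(doc_ids)
        self.load = load  # assumed callable that turns a doc id into a full result

    def __len__(self):
        return len(self.doc_ids)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Only the requested page is loaded, never the whole result set.
            return [self.load(docid) for docid in self.doc_ids[key]]
        # Plain integer index: load a single result.
        return self.load(self.doc_ids[key])


results = PagedResults(range(10), load=lambda docid: {'docid': docid})
page = results[2:4]
```

Passing the slice straight through to the list also covers open-ended slices such as results[5:], where start or stop is None.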
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
180
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
181 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
182 res = self.searcher.stored_fields(docid[0])
5375
0210d0b769d4 cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents: 4422
diff changeset
183 log.debug('result: %s', res)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
184 if self.search_type == 'content':
5997
b313d735d9c8 cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents: 5376
diff changeset
185 full_repo_path = os.path.join(self.repo_location, res['repository'])
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
186 f_path = res['path'].split(full_repo_path)[-1]
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
187 f_path = f_path.lstrip(os.sep)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
188 content_short = self.get_short_content(res, docid[1])
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
189 res.update({'content_short': content_short,
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
190 'content_short_hl': self.highlight(content_short),
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
191 'f_path': f_path
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
192 })
2718
82fb2a161ddf fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
193 elif self.search_type == 'path':
5997
b313d735d9c8 cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents: 5376
diff changeset
194 full_repo_path = os.path.join(self.repo_location, res['repository'])
2718
82fb2a161ddf fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
195 f_path = res['path'].split(full_repo_path)[-1]
82fb2a161ddf fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
196 f_path = f_path.lstrip(os.sep)
82fb2a161ddf fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
197 res.update({'f_path': f_path})
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
198 elif self.search_type == 'message':
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
199 res.update({'message_hl': self.highlight(res['message'])})
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
200
5375
0210d0b769d4 cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents: 4422
diff changeset
201 log.debug('result: %s', res)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
202
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
203 return res
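
Both the 'content' and the 'path' branches above strip the repository prefix from the indexed path with split() and lstrip(). A minimal sketch of that step in isolation, using made-up paths (the relative_file_path helper name is hypothetical and does not exist in this module):

```python
import os


def relative_file_path(repo_location, repository, indexed_path):
    """Strip the absolute repository prefix from an indexed path."""
    full_repo_path = os.path.join(repo_location, repository)
    # Keep everything after the last occurrence of the repository path ...
    f_path = indexed_path.split(full_repo_path)[-1]
    # ... and drop the leading separator left over from the split.
    return f_path.lstrip(os.sep)


indexed = os.path.join('/srv/repos', 'myrepo', 'docs', 'index.rst')
f_path = relative_file_path('/srv/repos', 'myrepo', indexed)
```

If the indexed path does not contain the repository prefix, split() returns it as a single element, so the path comes back unchanged apart from any leading separator.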
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
204
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
205 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
206
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
207 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
208
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
209 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
210 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
211 Split the matched content into chunks around each match,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
212 without letting chunks overlap, so that occurrences close to
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
213 each other are not highlighted twice.
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
214 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
215 memory = [(0, 0)]
2673
d5e42c00f3c1 white space cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
216 if self.matcher.supports('positions'):
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
217 for span in self.matcher.spans():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
218 start = span.startchar or 0
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
219 end = span.endchar or 0
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
220 start_offseted = max(0, start - self.fragment_size)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
221 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
222
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
223 if start_offseted < memory[-1][1]:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
224 start_offseted = memory[-1][1]
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
225 memory.append((start_offseted, end_offseted,))
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
226 yield (start_offseted, end_offseted,)
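
The clamping in get_chunks can be tried out without a Whoosh matcher. The sketch below is an illustration under assumptions: plain (start, end) tuples stand in for the span objects, non_overlapping_chunks is a hypothetical name, and a single last_end variable replaces the memory list, since only the end of the previous chunk is ever read.

```python
def non_overlapping_chunks(spans, fragment_size):
    """Yield (start, end) windows around each span without overlap."""
    last_end = 0
    for start, end in spans:
        window_start = max(0, start - fragment_size)
        window_end = end + fragment_size
        # Clamp the start so this window begins where the previous one ended.
        if window_start < last_end:
            window_start = last_end
        last_end = window_end
        yield (window_start, window_end)


content = 'x' * 100
chunks = list(non_overlapping_chunks([(10, 15), (20, 25), (80, 85)],
                                     fragment_size=10))
# Same joining step as get_short_content, applied to the sample content.
short = ''.join(content[s:e] for s, e in chunks)
```

With these sample spans the second window would start at 10, inside the first window, so it is moved to start at 25 and no character of the content is emitted twice.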
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
227
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
228 def highlight(self, content, top=5):
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
229 if self.search_type not in ['content', 'message']:
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
230 return ''
3915
a42bfe8a9335 moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents: 3339
diff changeset
231 hl = whoosh_highlight(
2389
324b838250c9 UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents: 2388
diff changeset
232 text=content,
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
233 terms=self.highlight_items,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
234 analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
235 fragmenter=FRAGMENTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
236 formatter=FORMATTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
237 top=top
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
238 )
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
239 return hl