annotate kallithea/lib/indexers/__init__.py @ 7811:0a277465fddf
scripts: initial run of import cleanup using isort
author | Mads Kiilerich <mads@kiilerich.com> |
---|---|
date | Wed, 07 Aug 2019 00:25:02 +0200 |
parents | 19af3fef3b34 |
children | e2b9731cb2fb |
rev | line source |
---|---|
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
1 # -*- coding: utf-8 -*- |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
2 # This program is free software: you can redistribute it and/or modify |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
3 # it under the terms of the GNU General Public License as published by |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
4 # the Free Software Foundation, either version 3 of the License, or |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
5 # (at your option) any later version. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
6 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
7 # This program is distributed in the hope that it will be useful, |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
10 # GNU General Public License for more details. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
11 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
12 # You should have received a copy of the GNU General Public License |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
13 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
14 """ |
5376
0ad053c172fa
cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents:
5375
diff
changeset
|
15 kallithea.lib.indexers |
0ad053c172fa
cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents:
5375
diff
changeset
|
16 ~~~~~~~~~~~~~~~~~~~~~~ |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
17 |
4212
24c0d584ba86
General renaming to Kallithea
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4211
diff
changeset
|
18 Whoosh indexing module for Kallithea |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
19 |
4211
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
20 This file was forked by the Kallithea project in July 2014. |
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
21 Original author and date, and relevant copyright and licensing information is below: |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
22 :created_on: Aug 17, 2010 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
23 :author: marcink |
4211
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
24 :copyright: (c) 2013 RhodeCode GmbH, and others. |
4208
ad38f9f93b3b
Correct licensing information in individual files.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4187
diff
changeset
|
25 :license: GPLv3, see LICENSE.md for more details. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
26 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
27 |
7811
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
28 import logging |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
29 import os |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
30 import sys |
5998
037efd94e955
cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents:
5997
diff
changeset
|
31 from os.path import dirname |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
32 |
7811
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
33 from whoosh.analysis import IDTokenizer, LowercaseFilter, RegexTokenizer |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
34 from whoosh.fields import BOOLEAN, DATETIME, ID, NUMERIC, STORED, TEXT, FieldType, Schema |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
35 from whoosh.formats import Characters |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
36 from whoosh.highlight import ContextFragmenter, HtmlFormatter |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
37 from whoosh.highlight import highlight as whoosh_highlight |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
38 |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
39 from kallithea.lib.utils2 import LazyProperty |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
40 |
0a277465fddf
scripts: initial run of import cleanup using isort
Mads Kiilerich <mads@kiilerich.com>
parents:
7626
diff
changeset
|
41 |
4175
e9f6b533a8f6
Remove wrong/unnecessary/unfixable comment(s)
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4116
diff
changeset
|
42 # Add location of top level folder to sys.path |
5998
037efd94e955
cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents:
5997
diff
changeset
|
43 sys.path.append(dirname(dirname(dirname(os.path.realpath(__file__))))) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
44 |
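As a rough illustration of what source line 43 computes, the sketch below applies the same three `dirname` calls to a made-up module path. The path `/srv/app/...` is an assumption for illustration only, and `posixpath` is used instead of `os.path` so the result is the same on every platform.

```python
import posixpath

# Hypothetical location of this module in a checkout (an assumption, not from the source).
module_file = "/srv/app/kallithea/lib/indexers/__init__.py"

# Each dirname() call removes one trailing path component:
# first the file name, then "indexers", then "lib".
top_level = posixpath.dirname(posixpath.dirname(posixpath.dirname(module_file)))

print(top_level)
```

Under that assumed layout, the directory appended to `sys.path` is the one that contains `lib`.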
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
45 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
46 log = logging.getLogger(__name__) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
47 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
48 # CUSTOM ANALYZER wordsplit + lowercase filter |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
406
diff
changeset
|
49 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter() |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
50 |
6478
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
51 # CUSTOM ANALYZER wordsplit + lowercase filter, for emailaddr-like text |
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
52 # |
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
53 # This is useful to: |
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
54 # - avoid removing "stop words" from text |
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
55 # - search case-insensitively |
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
56 # |
6864
7691290837d2
codingstyle: trivial whitespace fixes
Lars Kruse <devel@sumpfralle.de>
parents:
6478
diff
changeset
|
57 EMAILADDRANALYZER = RegexTokenizer() | LowercaseFilter() |
6478
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
58 |
6475
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
59 # CUSTOM ANALYZER raw-string + lowercase filter |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
60 # |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
61 # This is useful to: |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
62 # - avoid tokenization |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
63 # - avoid removing "stop words" from text |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
64 # - search case-insensitively |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
65 # |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
66 ICASEIDANALYZER = IDTokenizer() | LowercaseFilter() |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
67 |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
68 # CUSTOM ANALYZER raw-string |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
69 # |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
70 # This is useful to: |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
71 # - avoid tokenization |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
72 # - avoid removing "stop words" from text |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
73 # |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
74 IDANALYZER = IDTokenizer() |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
75 |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
76 # CUSTOM ANALYZER wordsplit + lowercase filter, for pathname-like text |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
77 # |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
78 # This is useful to: |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
79 # - avoid removing "stop words" from text |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
80 # - search case-insensitively |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
81 # |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
82 PATHANALYZER = RegexTokenizer() | LowercaseFilter() |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
83 |
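The five analyzers defined above differ only in how they split text and whether they fold case. The sketch below approximates that behaviour with the standard library alone, so it does not import Whoosh. All function names are hypothetical, and the second regular expression is assumed to match the default pattern of Whoosh's `RegexTokenizer`.

```python
import re

# Pattern passed explicitly to RegexTokenizer for ANALYZER.
WORD = re.compile(r"\w+")
# Assumed default pattern of RegexTokenizer (keeps dotted words such as "readme.rst" together).
DOTTED_WORD = re.compile(r"\w+(\.?\w+)*")


def analyzer(text):
    """Approximates ANALYZER: split on word characters, then lowercase."""
    return [m.group(0).lower() for m in WORD.finditer(text)]


def dotted_analyzer(text):
    """Approximates EMAILADDRANALYZER and PATHANALYZER: dotted words, lowercased."""
    return [m.group(0).lower() for m in DOTTED_WORD.finditer(text)]


def icase_id_analyzer(text):
    """Approximates ICASEIDANALYZER: the whole string as one lowercased token."""
    return [text.lower()]


def id_analyzer(text):
    """Approximates IDANALYZER: the whole string as one token, case preserved."""
    return [text]


print(analyzer("The Cat"))
print(dotted_analyzer("John.Doe@Example.com"))
print(dotted_analyzer("docs/README.rst"))
print(icase_id_analyzer("Group/Repo"))
print(id_analyzer("Group/Repo"))
```

None of these pipelines includes a stop-word filter, so a word such as "the" stays searchable, which is the point made in the comments above.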
6864
7691290837d2
codingstyle: trivial whitespace fixes
Lars Kruse <devel@sumpfralle.de>
parents:
6478
diff
changeset
|
84 # INDEX SCHEMA DEFINITION |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
85 SCHEMA = Schema( |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
86 fileid=ID(unique=True), |
6478
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
87 owner=TEXT(analyzer=EMAILADDRANALYZER), |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
88 # this field preserves case of repository name for exact matching |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
89 repository_rawname=TEXT(analyzer=IDANALYZER), |
6475
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
90 repository=TEXT(stored=True, analyzer=ICASEIDANALYZER), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
91 path=TEXT(stored=True, analyzer=PATHANALYZER), |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
92 content=FieldType(format=Characters(), analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
93 scorable=True, stored=True), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
94 modtime=STORED(), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
95 extension=TEXT(stored=True, analyzer=PATHANALYZER) |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
96 ) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
97 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
98 IDX_NAME = 'HG_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
99 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n') |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
100 FRAGMENTER = ContextFragmenter(200) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
101 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
102 CHGSETS_SCHEMA = Schema( |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
103 raw_id=ID(unique=True, stored=True), |
2693
66c778b8cb54
Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents:
2673
diff
changeset
|
104 date=NUMERIC(stored=True), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
105 last=BOOLEAN(), |
6478
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
106 owner=TEXT(analyzer=EMAILADDRANALYZER), |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
107 # this field preserves case of repository name for exact matching |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
108 # and unique-ness in index table |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
109 repository_rawname=ID(unique=True), |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
110 repository=ID(stored=True, analyzer=ICASEIDANALYZER), |
6478
c0b2410d63a5
search: prevent username related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6477
diff
changeset
|
111 author=TEXT(stored=True, analyzer=EMAILADDRANALYZER), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
112 message=FieldType(format=Characters(), analyzer=ANALYZER, |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
113 scorable=True, stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
114 parents=TEXT(), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
115 added=TEXT(analyzer=PATHANALYZER), |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
116 removed=TEXT(analyzer=PATHANALYZER), |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
117 changed=TEXT(analyzer=PATHANALYZER), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
118 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
119 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
120 CHGSET_IDX_NAME = 'CHGSET_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
121 |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
122 # used only to generate queries in journal |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
123 JOURNAL_SCHEMA = Schema( |
6474
2ff913970025
journal: make "username:" filtering condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6473
diff
changeset
|
124 username=ID(), |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
125 date=DATETIME(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
126 action=TEXT(), |
6473
73e3599971da
journal: make "repository:" filtering condition work as expected (Issue #261)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
5998
diff
changeset
|
127 repository=ID(), |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
128 ip=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
129 ) |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
130 |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
131 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
132 class WhooshResultWrapper(object): |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
133 def __init__(self, search_type, searcher, matcher, highlight_items, |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
134 repo_location): |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
135 self.search_type = search_type |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
136 self.searcher = searcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
137 self.matcher = matcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
138 self.highlight_items = highlight_items |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
139 self.fragment_size = 200 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
140 self.repo_location = repo_location |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
141 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
142 @LazyProperty |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
143 def doc_ids(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
144 docs_id = [] |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
145 while self.matcher.is_active(): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
146 docnum = self.matcher.id() |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
147 chunks = [offsets for offsets in self.get_chunks()] |
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
148 docs_id.append([docnum, chunks]) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
149 self.matcher.next() |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
150 return docs_id |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
151 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
152 def __str__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
153 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids)) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
154 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
155 def __repr__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
156 return self.__str__() |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
157 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
158 def __len__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
159 return len(self.doc_ids) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
160 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
161 def __iter__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
162 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
163 Allows iteration over results and lazily generates content |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
164 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
165 *Requires* implementation of ``__getitem__`` method. |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
166 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
167 for docid in self.doc_ids: |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
168 yield self.get_full_content(docid) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
169 |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
170 def __getitem__(self, key): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
171 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
172 Returns the full content for a slice of the results |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
173 """ |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
174 i, j = key.start, key.stop |
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
175 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
176 slices = [] |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
177 for docid in self.doc_ids[i:j]: |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
178 slices.append(self.get_full_content(docid)) |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
179 return slices |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
180 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
181 def get_full_content(self, docid): |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
182 res = self.searcher.stored_fields(docid[0]) |
5375
0210d0b769d4
cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents:
4422
diff
changeset
|
183 log.debug('result: %s', res) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
184 if self.search_type == 'content': |
5997
b313d735d9c8
cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents:
5376
diff
changeset
|
185 full_repo_path = os.path.join(self.repo_location, res['repository']) |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
186 f_path = res['path'].split(full_repo_path)[-1] |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
187 f_path = f_path.lstrip(os.sep) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
188 content_short = self.get_short_content(res, docid[1]) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
189 res.update({'content_short': content_short, |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
190 'content_short_hl': self.highlight(content_short), |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
191 'f_path': f_path |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
192 }) |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
193 elif self.search_type == 'path': |
5997
b313d735d9c8
cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents:
5376
diff
changeset
|
194 full_repo_path = os.path.join(self.repo_location, res['repository']) |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
195 f_path = res['path'].split(full_repo_path)[-1] |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
196 f_path = f_path.lstrip(os.sep) |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
197 res.update({'f_path': f_path}) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
198 elif self.search_type == 'message': |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
199 res.update({'message_hl': self.highlight(res['message'])}) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
200 |
5375
0210d0b769d4
cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents:
4422
diff
changeset
|
201 log.debug('result: %s', res) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
202 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
203 return res |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
204 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
205 def get_short_content(self, res, chunks): |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
206 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
207 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks]) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
208 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
209 def get_chunks(self): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
210 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
211 Yield (start, end) chunks of content around each match, |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
212 clipping each chunk so it does not overlap the previous one and |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
213 nearby occurrences are not highlighted twice. |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
214 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
215 memory = [(0, 0)] |
2673
d5e42c00f3c1
white space cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
216 if self.matcher.supports('positions'): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
217 for span in self.matcher.spans(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
218 start = span.startchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
219 end = span.endchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
220 start_offseted = max(0, start - self.fragment_size) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
221 end_offseted = end + self.fragment_size |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
222 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
223 if start_offseted < memory[-1][1]: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
224 start_offseted = memory[-1][1] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
225 memory.append((start_offseted, end_offseted,)) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
226 yield (start_offseted, end_offseted,) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
227 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
228 def highlight(self, content, top=5): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
229 if self.search_type not in ['content', 'message']: |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
230 return '' |
3915
a42bfe8a9335
moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents:
3339
diff
changeset
|
231 hl = whoosh_highlight( |
2389
324b838250c9
UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents:
2388
diff
changeset
|
232 text=content, |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
233 terms=self.highlight_items, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
234 analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
235 fragmenter=FRAGMENTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
236 formatter=FORMATTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
237 top=top |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
238 ) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
239 return hl |
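
The non-overlapping chunking done by ``get_chunks`` (lines 209-226) can be illustrated outside the class. This is a minimal sketch, not the module's code: the function name ``non_overlapping_chunks`` and the plain ``(startchar, endchar)`` tuples are assumptions made for illustration, standing in for the span objects returned by ``self.matcher.spans()``. It also assumes the spans arrive sorted by position.

```python
def non_overlapping_chunks(spans, fragment_size):
    """Yield (start, end) windows around each match span without overlap.

    `spans` is an iterable of (startchar, endchar) pairs, assumed to be
    sorted by position (hypothetical stand-in for whoosh span objects).
    """
    last_end = 0  # end offset of the previously yielded chunk
    for start, end in spans:
        # Widen the match by `fragment_size` characters on both sides.
        chunk_start = max(0, start - fragment_size)
        chunk_end = end + fragment_size
        # Clip the window so it begins where the previous chunk ended.
        if chunk_start < last_end:
            chunk_start = last_end
        last_end = chunk_end
        yield (chunk_start, chunk_end)


text = "alpha beta gamma beta delta"
spans = [(6, 10), (17, 21)]  # the two occurrences of "beta"
chunks = list(non_overlapping_chunks(spans, fragment_size=4))

# Joining the slices mirrors what get_short_content does with the chunks.
short = "".join(text[s:e] for s, e in chunks)
print(chunks)
print(short)
```

Because the second window is clipped to start at the end of the first, the joined excerpt contains each character at most once, so a later highlighting pass does not see the same nearby occurrence twice.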