annotate kallithea/lib/indexers/__init__.py @ 6477:168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
Before this revision, the pathname-related conditions below
unintentionally dropped "stop words":

- path:, extension: (for "File contents" or "File names")
- added:, removed:, changed: (for "Commit messages")

Therefore, pathname-related conditions containing "this", "a", "you",
and so on were completely ignored, even when those words were valid
pathname components.

To prevent pathname-related conditions from removing "stop words",
this revision explicitly specifies an "analyzer" for the pathname-related
fields of SCHEMA and CHGSETS_SCHEMA.

The difference between PATHANALYZER and the default analyzer of TEXT is
whether "stop words" are preserved. Tokenization is still applied to
pathnames.

This revision requires fully rebuilding the index tables, because the
indexing schemas have changed.
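
To illustrate the behaviour described above, here is a minimal pure-Python sketch that imitates the two analyzer pipelines without importing Whoosh. It is not the project's code. The token pattern and the `minsize=2` cutoff reflect my understanding of Whoosh's defaults, the `STOP_WORDS` set is a small illustrative subset, and the function names `default_text_analyzer` and `path_analyzer` are hypothetical.

```python
import re

# Illustrative subset of English stop words (assumption: the real list
# used by Whoosh is longer).
STOP_WORDS = frozenset(["a", "an", "and", "is", "of", "the", "this", "to", "you"])

# Assumed to match the default pattern of Whoosh's RegexTokenizer.
TOKEN_RE = re.compile(r"\w+(\.?\w+)*")


def tokenize(text):
    """Split text into word-like tokens, keeping dotted names together."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]


def default_text_analyzer(text, minsize=2):
    """Imitate a TEXT field default: tokenize, lowercase, drop stop words."""
    lowered = (tok.lower() for tok in tokenize(text))
    return [t for t in lowered if len(t) >= minsize and t not in STOP_WORDS]


def path_analyzer(text):
    """Imitate PATHANALYZER: tokenize and lowercase, keep every token."""
    return [tok.lower() for tok in tokenize(text)]


path = "docs/This/a/you/README.txt"
print(default_text_analyzer(path))  # ['docs', 'readme.txt']
print(path_analyzer(path))          # ['docs', 'this', 'a', 'you', 'readme.txt']
```

With the stop-word filter in place, the components "This", "a", and "you" never reach the index or the query, which is why a condition such as `path:this` matched nothing useful before this revision.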
author | FUJIWARA Katsunori <foozy@lares.dti.ne.jp> |
---|---|
date | Mon, 23 Jan 2017 02:17:38 +0900 |
parents | 8b7c0ef62427 |
children | c0b2410d63a5 |
rev | line source |
---|---|
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
1 # -*- coding: utf-8 -*- |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
2 # This program is free software: you can redistribute it and/or modify |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
3 # it under the terms of the GNU General Public License as published by |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
4 # the Free Software Foundation, either version 3 of the License, or |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
5 # (at your option) any later version. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
6 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
7 # This program is distributed in the hope that it will be useful, |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
10 # GNU General Public License for more details. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
11 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
12 # You should have received a copy of the GNU General Public License |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
13 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
14 """ |
5376
0ad053c172fa
cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents:
5375
diff
changeset
|
15 kallithea.lib.indexers |
0ad053c172fa
cleanup: make module self-naming consistent
Mads Kiilerich <madski@unity3d.com>
parents:
5375
diff
changeset
|
16 ~~~~~~~~~~~~~~~~~~~~~~ |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
17 |
4212
24c0d584ba86
General renaming to Kallithea
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4211
diff
changeset
|
18 Whoosh indexing module for Kallithea |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
19 |
4211
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
20 This file was forked by the Kallithea project in July 2014. |
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
21 Original author and date, and relevant copyright and licensing information is below: |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
22 :created_on: Aug 17, 2010 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
23 :author: marcink |
4211
1948ede028ef
RhodeCode GmbH is not the sole author of this work
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4208
diff
changeset
|
24 :copyright: (c) 2013 RhodeCode GmbH, and others. |
4208
ad38f9f93b3b
Correct licensing information in individual files.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4187
diff
changeset
|
25 :license: GPLv3, see LICENSE.md for more details. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
26 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
27 |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
28 import os |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
29 import sys |
2102
04d26165c3d9
Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents:
1995
diff
changeset
|
30 import logging |
5998
037efd94e955
cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents:
5997
diff
changeset
|
31 from os.path import dirname |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
32 |
4175
e9f6b533a8f6
Remove wrong/unnecessary/unfixable comment(s)
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4116
diff
changeset
|
33 # Add location of top level folder to sys.path |
5998
037efd94e955
cleanup: get rid of dn as shortcut for os.path.dirname
domruf <dominikruf@gmail.com>
parents:
5997
diff
changeset
|
34 sys.path.append(dirname(dirname(dirname(os.path.realpath(__file__))))) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
35 |
6475
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
36 from whoosh.analysis import RegexTokenizer, LowercaseFilter, IDTokenizer |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
37 from whoosh.fields import TEXT, ID, STORED, NUMERIC, BOOLEAN, Schema, FieldType, DATETIME |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
38 from whoosh.formats import Characters |
3915
a42bfe8a9335
moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents:
3339
diff
changeset
|
39 from whoosh.highlight import highlight as whoosh_highlight, HtmlFormatter, ContextFragmenter |
4186
7e5f8c12a3fc
First step in two-part process to rename directories to kallithea.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
4175
diff
changeset
|
40 from kallithea.lib.utils2 import LazyProperty |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
41 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
42 log = logging.getLogger(__name__) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
43 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
44 # CUSTOM ANALYZER wordsplit + lowercase filter |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
406
diff
changeset
|
45 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter() |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
46 |
6475
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
47 # CUSTOM ANALYZER raw-string + lowercase filter |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
48 # |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
49 # This is useful to: |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
50 # - avoid tokenization |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
51 # - avoid removing "stop words" from text |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
52 # - search case-insensitively |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
53 # |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
54 ICASEIDANALYZER = IDTokenizer() | LowercaseFilter() |
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
55 |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
56 # CUSTOM ANALYZER raw-string |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
57 # |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
58 # This is useful to: |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
59 # - avoid tokenization |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
60 # - avoid removing "stop words" from text |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
61 # |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
62 IDANALYZER = IDTokenizer() |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
63 |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
64 # CUSTOM ANALYZER wordsplit + lowercase filter, for pathname-like text |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
65 # |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
66 # This is useful to: |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
67 # - avoid removing "stop words" from text |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
68 # - search case-insensitively |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
69 # |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
70 PATHANALYZER = RegexTokenizer() | LowercaseFilter() |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
71 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
72 #INDEX SCHEMA DEFINITION |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
73 SCHEMA = Schema( |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
74 fileid=ID(unique=True), |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
75 owner=TEXT(), |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
76 # this field preserves case of repository name for exact matching |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
77 repository_rawname=TEXT(analyzer=IDANALYZER), |
6475
caef0be39948
search: make "repository:" condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6474
diff
changeset
|
78 repository=TEXT(stored=True, analyzer=ICASEIDANALYZER), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
79 path=TEXT(stored=True, analyzer=PATHANALYZER), |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
80 content=FieldType(format=Characters(), analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
81 scorable=True, stored=True), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
82 modtime=STORED(), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
83 extension=TEXT(stored=True, analyzer=PATHANALYZER) |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
84 ) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
85 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
86 IDX_NAME = 'HG_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
87 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n') |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
88 FRAGMENTER = ContextFragmenter(200) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
89 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
90 CHGSETS_SCHEMA = Schema( |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
91 raw_id=ID(unique=True, stored=True), |
2693
66c778b8cb54
Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents:
2673
diff
changeset
|
92 date=NUMERIC(stored=True), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
93 last=BOOLEAN(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
94 owner=TEXT(), |
6476
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
95 # this field preserves case of repository name for exact matching |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
96 # and unique-ness in index table |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
97 repository_rawname=ID(unique=True), |
8b7c0ef62427
search: make "repository:" condition work case-insensitively as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6475
diff
changeset
|
98 repository=ID(stored=True, analyzer=ICASEIDANALYZER), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
99 author=TEXT(stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
100 message=FieldType(format=Characters(), analyzer=ANALYZER, |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
101 scorable=True, stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
102 parents=TEXT(), |
6477
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
103 added=TEXT(analyzer=PATHANALYZER), |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
104 removed=TEXT(analyzer=PATHANALYZER), |
168cc92c1b53
search: prevent pathname related conditions from removing "stop words"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6476
diff
changeset
|
105 changed=TEXT(analyzer=PATHANALYZER), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
106 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
107 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
108 CHGSET_IDX_NAME = 'CHGSET_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
109 |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
110 # used only to generate queries in journal |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
111 JOURNAL_SCHEMA = Schema( |
6474
2ff913970025
journal: make "username:" filtering condition work as expected
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
6473
diff
changeset
|
112 username=ID(), |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
113 date=DATETIME(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
114 action=TEXT(), |
6473
73e3599971da
journal: make "repository:" filtering condition work as expected (Issue #261)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
5998
diff
changeset
|
115 repository=ID(), |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
116 ip=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
117 ) |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
118 |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
119 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
120 class WhooshResultWrapper(object): |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
121 def __init__(self, search_type, searcher, matcher, highlight_items, |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
122 repo_location): |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
123 self.search_type = search_type |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
124 self.searcher = searcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
125 self.matcher = matcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
126 self.highlight_items = highlight_items |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
127 self.fragment_size = 200 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
128 self.repo_location = repo_location |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
129 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
130 @LazyProperty |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
131 def doc_ids(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
132 docs_id = [] |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
133 while self.matcher.is_active(): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
134 docnum = self.matcher.id() |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
135 chunks = [offsets for offsets in self.get_chunks()] |
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
136 docs_id.append([docnum, chunks]) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
137 self.matcher.next() |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
138 return docs_id |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
139 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
140 def __str__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
141 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids)) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
142 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
143 def __repr__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
144 return self.__str__() |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
145 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
146 def __len__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
147 return len(self.doc_ids) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
148 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
149 def __iter__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
150 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
151 Allows Iteration over results,and lazy generate content |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
152 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
153 *Requires* implementation of ``__getitem__`` method. |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
154 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
155 for docid in self.doc_ids: |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
156 yield self.get_full_content(docid) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
157 |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
158 def __getitem__(self, key): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
159 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
160 Slicing of resultWrapper |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
161 """ |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
162 i, j = key.start, key.stop |
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
163 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
164 slices = [] |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
165 for docid in self.doc_ids[i:j]: |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
166 slices.append(self.get_full_content(docid)) |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
167 return slices |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
168 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
169 def get_full_content(self, docid): |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
170 res = self.searcher.stored_fields(docid[0]) |
5375
0210d0b769d4
cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents:
4422
diff
changeset
|
171 log.debug('result: %s', res) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
172 if self.search_type == 'content': |
5997
b313d735d9c8
cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents:
5376
diff
changeset
|
173 full_repo_path = os.path.join(self.repo_location, res['repository']) |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
174 f_path = res['path'].split(full_repo_path)[-1] |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
175 f_path = f_path.lstrip(os.sep) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
176 content_short = self.get_short_content(res, docid[1]) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
177 res.update({'content_short': content_short, |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
178 'content_short_hl': self.highlight(content_short), |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
179 'f_path': f_path |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
180 }) |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
181 elif self.search_type == 'path': |
5997
b313d735d9c8
cleanup: get rid of jn as shortcut for os.path.join
domruf <dominikruf@gmail.com>
parents:
5376
diff
changeset
|
182 full_repo_path = os.path.join(self.repo_location, res['repository']) |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
183 f_path = res['path'].split(full_repo_path)[-1] |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
184 f_path = f_path.lstrip(os.sep) |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
185 res.update({'f_path': f_path}) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
186 elif self.search_type == 'message': |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
187 res.update({'message_hl': self.highlight(res['message'])}) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
188 |
5375
0210d0b769d4
cleanup: pass log strings unformatted - avoid unnecessary % formatting when not logging
Mads Kiilerich <madski@unity3d.com>
parents:
4422
diff
changeset
|
189 log.debug('result: %s', res) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
190 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
191 return res |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
192 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
193 def get_short_content(self, res, chunks): |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
194 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
195 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks]) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
196 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
197 def get_chunks(self): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
198 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
199 Yield (start, end) character chunks around each match, |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
200 clamped so that chunks never overlap and nearby |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
201 occurrences are not highlighted twice. |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
202 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
203 memory = [(0, 0)] |
2673
d5e42c00f3c1
white space cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
204 if self.matcher.supports('positions'): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
205 for span in self.matcher.spans(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
206 start = span.startchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
207 end = span.endchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
208 start_offseted = max(0, start - self.fragment_size) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
209 end_offseted = end + self.fragment_size |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
210 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
211 if start_offseted < memory[-1][1]: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
212 start_offseted = memory[-1][1] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
213 memory.append((start_offseted, end_offseted,)) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
214 yield (start_offseted, end_offseted,) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
215 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
216 def highlight(self, content, top=5): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
217 if self.search_type not in ['content', 'message']: |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
218 return '' |
3915
a42bfe8a9335
moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents:
3339
diff
changeset
|
219 hl = whoosh_highlight( |
2389
324b838250c9
UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents:
2388
diff
changeset
|
220 text=content, |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
221 terms=self.highlight_items, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
222 analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
223 fragmenter=FRAGMENTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
224 formatter=FORMATTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
225 top=top |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
226 ) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
227 return hl |
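
The chunking done by `get_chunks` and `get_short_content` above can be sketched outside the class as follows. This is a minimal illustration, not the project's code: `non_overlapping_chunks` and `short_content` are hypothetical names, and plain `(start, end)` tuples stand in for the span objects returned by the matcher.

```python
from typing import Iterable, Iterator, Tuple

Span = Tuple[int, int]


def non_overlapping_chunks(spans: Iterable[Span],
                           fragment_size: int) -> Iterator[Span]:
    """Pad each match by fragment_size on both sides, then clamp the
    start of each chunk to the end of the previous chunk so that two
    close matches never produce overlapping text."""
    last_end = 0  # plays the role of memory[-1][1] in the original
    for start, end in spans:
        chunk_start = max(0, start - fragment_size)
        chunk_end = end + fragment_size
        if chunk_start < last_end:
            # This chunk would overlap the previous one: start where it ended.
            chunk_start = last_end
        last_end = chunk_end
        yield (chunk_start, chunk_end)


def short_content(content: str, chunks: Iterable[Span]) -> str:
    """Join the chunked slices of content into one excerpt."""
    return ''.join(content[start:end] for start, end in chunks)


# Hypothetical match positions (character offsets) in a 120-character text.
content = 'abcdefghij' * 12
chunks = list(non_overlapping_chunks([(10, 14), (20, 24), (100, 104)],
                                     fragment_size=10))
excerpt = short_content(content, chunks)
```

Here the second match lies within `fragment_size` of the first, so its chunk starts where the first chunk ends instead of repeating the shared text. The third match is far enough away to get its own full window.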