annotate rhodecode/lib/indexers/__init__.py @ 4116:ffd45b185016 rhodecode-2.2.5-gpl
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
This imports changes between changesets 21af6c4eab3d and 6177597791c2 in
RhodeCode's original repository, including only changes to Python files and HTML.
RhodeCode clearly licensed its changes to these files under GPLv3
in their /LICENSE file, which states the following:
The Python code and integrated HTML are licensed under the GPLv3 license.
(See:
https://code.rhodecode.com/rhodecode/files/v2.2.5/LICENSE
or
http://web.archive.org/web/20140512193334/https://code.rhodecode.com/rhodecode/files/f3b123159901f15426d18e3dc395e8369f70ebe0/LICENSE
for an online copy of that LICENSE file)
Conservancy reviewed these changes and confirmed that they can be licensed as
a whole to the Kallithea project under GPLv3-only.
While some of the contents committed herein are clearly licensed
GPLv3-or-later, on the whole we must assume they are GPLv3-only, since the
statement above from RhodeCode indicates that they intend GPLv3-only as their
license, per GPLv3§14 and other relevant sections of GPLv3.
author | Bradley M. Kuhn <bkuhn@sfconservancy.org> |
---|---|
date | Wed, 02 Jul 2014 19:03:13 -0400 |
parents | 5293d4bbb1ea |
children | e9f6b533a8f6 |
rev | line source |
---|---|
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
1 # -*- coding: utf-8 -*- |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
2 # This program is free software: you can redistribute it and/or modify |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
3 # it under the terms of the GNU General Public License as published by |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
4 # the Free Software Foundation, either version 3 of the License, or |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
5 # (at your option) any later version. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
6 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
7 # This program is distributed in the hope that it will be useful, |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
10 # GNU General Public License for more details. |
1203
6832ef664673
source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents:
1198
diff
changeset
|
11 # |
903
04c9bb9ca6d6
code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents:
894
diff
changeset
|
12 # You should have received a copy of the GNU General Public License |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1203
diff
changeset
|
13 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
14 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
15 rhodecode.lib.indexers.__init__ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
17 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
18 Whoosh indexing module for RhodeCode |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
19 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
20 :created_on: Aug 17, 2010 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
21 :author: marcink |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
22 :copyright: (c) 2013 RhodeCode GmbH. |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
23 :license: GPLv3, see LICENSE for more details. |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
24 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
25 |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
26 import os |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
27 import sys |
2102
04d26165c3d9
Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents:
1995
diff
changeset
|
28 import logging |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
29 from os.path import dirname as dn, join as jn |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
30 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
31 #to get the rhodecode import |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
32 sys.path.append(dn(dn(dn(os.path.realpath(__file__))))) |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
33 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
34 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
35 from whoosh.fields import TEXT, ID, STORED, NUMERIC, BOOLEAN, Schema, FieldType, DATETIME |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
36 from whoosh.formats import Characters |
3915
a42bfe8a9335
moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents:
3339
diff
changeset
|
37 from whoosh.highlight import highlight as whoosh_highlight, HtmlFormatter, ContextFragmenter |
2109 | 38 from rhodecode.lib.utils2 import LazyProperty |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
39 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
40 log = logging.getLogger(__name__) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
41 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
42 # CUSTOM ANALYZER wordsplit + lowercase filter |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
406
diff
changeset
|
43 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter() |
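As a rough illustration of what the `ANALYZER` pipeline does to text, the sketch below approximates it with the standard `re` module instead of Whoosh. The name `approx_analyze` is hypothetical. Whoosh's real tokenizer yields token objects that can carry positions and character offsets, which this sketch leaves out.

```python
import re

# Hypothetical stand-in for RegexTokenizer(expression=r"\w+") | LowercaseFilter().
# It only mimics the split-then-lowercase behaviour; it is not the Whoosh API.
_WORD = re.compile(r"\w+")


def approx_analyze(text):
    """Split text into runs of word characters and lowercase each run."""
    return [match.group(0).lower() for match in _WORD.finditer(text)]


if __name__ == "__main__":
    print(approx_analyze("def Get_Chunks(self): return Memory[-1]"))
```

Because `\w` matches letters, digits and the underscore, identifiers such as `Get_Chunks` stay in one piece while punctuation is dropped. No stop words are removed at this stage.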
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
44 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
45 #INDEX SCHEMA DEFINITION |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
46 SCHEMA = Schema( |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
47 fileid=ID(unique=True), |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
48 owner=TEXT(), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
49 repository=TEXT(stored=True), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
50 path=TEXT(stored=True), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
51 content=FieldType(format=Characters(), analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
52 scorable=True, stored=True), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
53 modtime=STORED(), |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
54 extension=TEXT(stored=True) |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
55 ) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
56 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
57 IDX_NAME = 'HG_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
58 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n') |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
59 FRAGMENTER = ContextFragmenter(200) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
60 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
61 CHGSETS_SCHEMA = Schema( |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
62 raw_id=ID(unique=True, stored=True), |
2693
66c778b8cb54
Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents:
2673
diff
changeset
|
63 date=NUMERIC(stored=True), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
64 last=BOOLEAN(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
65 owner=TEXT(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
66 repository=ID(unique=True, stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
67 author=TEXT(stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
68 message=FieldType(format=Characters(), analyzer=ANALYZER, |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
69 scorable=True, stored=True), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
70 parents=TEXT(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
71 added=TEXT(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
72 removed=TEXT(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
73 changed=TEXT(), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
74 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
75 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
76 CHGSET_IDX_NAME = 'CHGSET_INDEX' |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
77 |
3062
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
78 # used only to generate queries in journal |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
79 JOURNAL_SCHEMA = Schema( |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
80 username=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
81 date=DATETIME(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
82 action=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
83 repository=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
84 ip=TEXT(), |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
85 ) |
a08624dd675e
Implemented filtering of admin journal based on Whoosh Query language
Marcin Kuzminski <marcin@python-works.com>
parents:
2718
diff
changeset
|
86 |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
87 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
88 class WhooshResultWrapper(object): |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
89 def __init__(self, search_type, searcher, matcher, highlight_items, |
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
90 repo_location): |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
91 self.search_type = search_type |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
92 self.searcher = searcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
93 self.matcher = matcher |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
94 self.highlight_items = highlight_items |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
95 self.fragment_size = 200 |
2319
4c239e0dcbb7
fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents:
2109
diff
changeset
|
96 self.repo_location = repo_location |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
97 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
98 @LazyProperty |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
99 def doc_ids(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
100 docs_id = [] |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
101 while self.matcher.is_active(): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
102 docnum = self.matcher.id() |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
103 chunks = [offsets for offsets in self.get_chunks()] |
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
104 docs_id.append([docnum, chunks]) |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
105 self.matcher.next() |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
106 return docs_id |
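The `doc_ids` property drains the matcher, so it can only be computed once. The `@LazyProperty` decorator presumably caches that first result for `__len__`, `__str__` and `__iter__` to reuse. The sketch below shows the idea under that assumption: `lazy_property`, `FakeMatcher` and `Results` are hypothetical stand-ins, not the `rhodecode.lib.utils2` or Whoosh implementations.

```python
class lazy_property(object):
    """Hypothetical caching descriptor: compute once, then store on the instance."""

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self._func(obj)
        # Shadow the descriptor with the computed value, so later lookups
        # hit the instance __dict__ and never call the function again.
        obj.__dict__[self.__name__] = value
        return value


class FakeMatcher(object):
    """Stand-in exposing the three matcher methods used by doc_ids."""

    def __init__(self, docnums):
        self._docnums = list(docnums)
        self._pos = 0

    def is_active(self):
        return self._pos < len(self._docnums)

    def id(self):
        return self._docnums[self._pos]

    def next(self):
        self._pos += 1


class Results(object):
    def __init__(self, matcher):
        self.matcher = matcher

    @lazy_property
    def doc_ids(self):
        ids = []
        while self.matcher.is_active():
            ids.append(self.matcher.id())
            self.matcher.next()
        return ids


results = Results(FakeMatcher([3, 7, 9]))
first = results.doc_ids    # drains the matcher
second = results.doc_ids   # served from the cache, matcher is not touched
```

Without the caching step, the second access would find the matcher exhausted and return an empty list.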
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
107 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
108 def __str__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
109 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids)) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
110 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
111 def __repr__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
112 return self.__str__() |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
113 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
114 def __len__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
115 return len(self.doc_ids) |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
116 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
117 def __iter__(self): |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
118 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
119 Allows Iteration over results,and lazy generate content |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
120 |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
121 *Requires* implementation of ``__getitem__`` method. |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
122 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
123 for docid in self.doc_ids: |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
124 yield self.get_full_content(docid) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
125 |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
126 def __getitem__(self, key): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
127 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
128 Slicing of resultWrapper |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
129 """ |
1198
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
130 i, j = key.start, key.stop |
02a7f263a849
fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
131 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
132 slices = [] |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
133 for docid in self.doc_ids[i:j]: |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
134 slices.append(self.get_full_content(docid)) |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
135 return slices |
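`__getitem__` reads `key.start` and `key.stop`, so it expects a `slice` object (as a paginator would pass) and would raise `AttributeError` for a plain integer index. Here is a minimal sketch of the same slice-only pattern. It uses a hypothetical `LazyResults` class and a stand-in loader instead of a Whoosh searcher.

```python
class LazyResults(object):
    """Hypothetical wrapper: holds cheap ids, builds full results on demand."""

    def __init__(self, ids, load):
        self.ids = list(ids)
        self.load = load      # callable turning one id into a full result
        self.loaded = 0       # counts how many results were materialised

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, key):
        # Only slices are supported: key must have .start and .stop,
        # so an int index raises AttributeError.
        i, j = key.start, key.stop
        page = []
        for item_id in self.ids[i:j]:
            self.loaded += 1
            page.append(self.load(item_id))
        return page


results = LazyResults(range(100), lambda n: {'docnum': n})
page = results[10:13]   # only three results are built
```

Building results only for the requested slice keeps paging cheap, because stored fields are fetched for one page at a time rather than for every hit.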
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
136 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
137 def get_full_content(self, docid): |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
138 res = self.searcher.stored_fields(docid[0]) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
139 log.debug('result: %s' % res) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
140 if self.search_type == 'content': |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
141 full_repo_path = jn(self.repo_location, res['repository']) |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
142 f_path = res['path'].split(full_repo_path)[-1] |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
143 f_path = f_path.lstrip(os.sep) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
144 content_short = self.get_short_content(res, docid[1]) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
145 res.update({'content_short': content_short, |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
146 'content_short_hl': self.highlight(content_short), |
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
147 'f_path': f_path |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
148 }) |
2718
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
149 elif self.search_type == 'path': |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
150 full_repo_path = jn(self.repo_location, res['repository']) |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
151 f_path = res['path'].split(full_repo_path)[-1] |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
152 f_path = f_path.lstrip(os.sep) |
82fb2a161ddf
fixes issue #524
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
153 res.update({'f_path': f_path}) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
154 elif self.search_type == 'message': |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
155 res.update({'message_hl': self.highlight(res['message'])}) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
156 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
157 log.debug('result: %s' % res) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
158 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
159 return res |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
160 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
161 def get_short_content(self, res, chunks): |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
162 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
163 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks]) |
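`get_short_content` concatenates the slices of the stored `content` field named by each `(start, end)` character-offset pair. A small standalone sketch with made-up data follows. The function `short_content` and the sample text are illustrative only.

```python
def short_content(content, chunks):
    """Join the character ranges of content given as (start, end) pairs."""
    return ''.join(content[start:end] for start, end in chunks)


text = "import os\nimport sys\nimport logging\n"
chunks = [(0, 9), (21, 35)]   # "import os" and "import logging"
snippet = short_content(text, chunks)
```

Python slicing clips an end offset that runs past the end of the string, so a window that extends beyond the file content does not raise an error.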
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
164 |
479
149940ba96d9
fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents:
478
diff
changeset
|
165 def get_chunks(self): |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
166 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
167 Smart function that implements chunking the content |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
168 but not overlap chunks so it doesn't highlight the same |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
169 close occurrences twice. |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
170 """ |
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
171 memory = [(0, 0)] |
2673
d5e42c00f3c1
white space cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
172 if self.matcher.supports('positions'): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
173 for span in self.matcher.spans(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
174 start = span.startchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
175 end = span.endchar or 0 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
176 start_offseted = max(0, start - self.fragment_size) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
177 end_offseted = end + self.fragment_size |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
178 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
179 if start_offseted < memory[-1][1]: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
180 start_offseted = memory[-1][1] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
181 memory.append((start_offseted, end_offseted,)) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
182 yield (start_offseted, end_offseted,) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
183 |
478
7010af6efde5
Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents:
436
diff
changeset
|
184 def highlight(self, content, top=5): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2389
diff
changeset
|
185 if self.search_type not in ['content', 'message']: |
556
65b2f150beb7
Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
547
diff
changeset
|
186 return '' |
3915
a42bfe8a9335
moved make-index command to paster_commands module
Marcin Kuzminski <marcin@python-works.com>
parents:
3339
diff
changeset
|
187 hl = whoosh_highlight( |
2389
324b838250c9
UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents:
2388
diff
changeset
|
188 text=content, |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
189 terms=self.highlight_items, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
190 analyzer=ANALYZER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
191 fragmenter=FRAGMENTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
192 formatter=FORMATTER, |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
193 top=top |
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
194 ) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
195 return hl |
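
The chunking done by `get_chunks` and `get_short_content` above can be illustrated outside the wrapper class. The following is a minimal standalone sketch, not the project's implementation: `Span` is a hypothetical stand-in for the span objects returned by `self.matcher.spans()` (only `startchar` and `endchar` are modelled), and `fragment_size` is passed as a plain argument instead of being read from `self`. It assumes spans arrive in ascending order of position.

```python
from collections import namedtuple

# Hypothetical stand-in for a matcher span; only the two attributes
# that get_chunks reads are modelled here.
Span = namedtuple('Span', ['startchar', 'endchar'])


def get_chunks(spans, fragment_size):
    """Yield (start, end) windows around each span, clamped so that a
    window never reaches back into the previously yielded one."""
    last_end = 0  # plays the role of memory[-1][1] in the annotated source
    for span in spans:
        start = span.startchar or 0
        end = span.endchar or 0
        start_offset = max(0, start - fragment_size)
        end_offset = end + fragment_size
        # Avoid overlapping the previous window, so close occurrences
        # are not included twice.
        if start_offset < last_end:
            start_offset = last_end
        last_end = end_offset
        yield (start_offset, end_offset)


def get_short_content(content, chunks):
    """Join the slices of content selected by the chunk windows."""
    return ''.join(content[start:end] for start, end in chunks)


if __name__ == '__main__':
    text = 'abcdefghijklmnopqrstuvwxyz'
    windows = list(get_chunks([Span(5, 7), Span(9, 11)], 3))
    print(windows)
    print(get_short_content(text, windows))
```

With two matches that sit close together, the second window starts where the first one ended, so the joined excerpt contains each character at most once.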