annotate rhodecode/lib/indexers/__init__.py @ 1540:191f3f08236d beta

fixes #258 RhodeCode 1.2 assumes egg folder is writable
author Marcin Kuzminski <marcin@python-works.com>
date Tue, 11 Oct 2011 01:08:23 +0200
parents c3172bc09503
children 752b0a7b7679 203af05539e0
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
1 # -*- coding: utf-8 -*-
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
2 """
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
3 rhodecode.lib.indexers.__init__
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
5
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
6 Whoosh indexing module for RhodeCode
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
7
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
8 :created_on: Aug 17, 2010
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
9 :author: marcink
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
10 :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
11 :license: GPLv3, see COPYING for more details.
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
12 """
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
13 # This program is free software: you can redistribute it and/or modify
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
14 # it under the terms of the GNU General Public License as published by
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
15 # the Free Software Foundation, either version 3 of the License, or
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
16 # (at your option) any later version.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
17 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
18 # This program is distributed in the hope that it will be useful,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
21 # GNU General Public License for more details.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
22 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
23 # You should have received a copy of the GNU General Public License
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
24 # along with this program. If not, see <http://www.gnu.org/licenses/>.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
25 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
26 import sys
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
27 import traceback
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
28 from os.path import dirname as dn, join as jn
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
29
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
30 #to get the rhodecode import
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
31 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
32
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
33 from string import strip
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
34 from shutil import rmtree
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
35
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
36 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
37 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
38 from whoosh.index import create_in, open_dir
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
39 from whoosh.formats import Characters
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
40 from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
41
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
42 from webhelpers.html.builder import escape
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
43 from sqlalchemy import engine_from_config
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
44 from vcs.utils.lazy import LazyProperty
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
45
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
46 from rhodecode.model import init_model
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
47 from rhodecode.model.scm import ScmModel
1407
2744f5b01d00 Allowing indexing job to resolve repos path on its own if not given.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1354
diff changeset
48 from rhodecode.model.repo import RepoModel
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
49 from rhodecode.config.environment import load_environment
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
50 from rhodecode.lib import LANGUAGES_EXTENSIONS_MAP
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
51 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
52
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
53 #EXTENSIONS WE WANT TO INDEX CONTENT OFF
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
54 INDEX_EXTENSIONS = LANGUAGES_EXTENSIONS_MAP.keys()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
55
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
56 #CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
57 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
58
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
59
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
60 #INDEX SCHEMA DEFINITION
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
61 SCHEMA = Schema(owner=TEXT(),
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
62 repository=TEXT(stored=True),
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
63 path=TEXT(stored=True),
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
64 content=FieldType(format=Characters(ANALYZER),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
65 scorable=True, stored=True),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
66 modtime=STORED(), extension=TEXT(stored=True))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
67
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
68
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
69 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
70 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
71 FRAGMENTER = SimpleFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
72
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
73
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
74 class MakeIndex(BasePasterCommand):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
75
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
76 max_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
77 min_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
78
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
79 usage = "CONFIG_FILE"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
80 summary = "Creates index for full text search given configuration file"
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
81 group_name = "RhodeCode"
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
82 takes_config_file = -1
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
83 parser = Command.standard_parser(verbose=True)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
84
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
85 def command(self):
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
86
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
87 from pylons import config
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
88 add_cache(config)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
89 engine = engine_from_config(config, 'sqlalchemy.db1.')
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
90 init_model(engine)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
91
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
92 index_location = config['index_dir']
1409
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
93 repo_location = self.options.repo_location \
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
94 if self.options.repo_location else RepoModel().repos_path
1183
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
95 repo_list = map(strip, self.options.repo_list.split(',')) \
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
96 if self.options.repo_list else None
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
97
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
98 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
99 # WHOOSH DAEMON
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
100 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
101 from rhodecode.lib.pidlock import LockHeld, DaemonLock
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
102 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
103 try:
1540
191f3f08236d fixes #258 RhodeCode 1.2 assumes egg folder is writable
Marcin Kuzminski <marcin@python-works.com>
parents: 1409
diff changeset
104 l = DaemonLock(file_=jn(dn(dn(index_location)), 'make_index.lock'))
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
105 WhooshIndexingDaemon(index_location=index_location,
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
106 repo_location=repo_location,
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
107 repo_list=repo_list)\
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
108 .run(full_index=self.options.full_index)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
109 l.release()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
110 except LockHeld:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
111 sys.exit(1)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
112
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
113 def update_parser(self):
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
114 self.parser.add_option('--repo-location',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
115 action='store',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
116 dest='repo_location',
1408
93cffcb6fd54 Adding documentation for indexer's self-resolving repos location.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1407
diff changeset
117 help="Specifies repositories location to index OPTIONAL",
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
118 )
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
119 self.parser.add_option('--index-only',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
120 action='store',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
121 dest='repo_list',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
122 help="Specifies a comma separated list of repositores "
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
123 "to build index on OPTIONAL",
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
124 )
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
125 self.parser.add_option('-f',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
126 action='store_true',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
127 dest='full_index',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
128 help="Specifies that index should be made full i.e"
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
129 " destroy old and build from scratch",
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
130 default=False)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
131
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
132 class ResultWrapper(object):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
133 def __init__(self, search_type, searcher, matcher, highlight_items):
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
134 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
135 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
136 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
137 self.highlight_items = highlight_items
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
138 self.fragment_size = 200 / 2
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
139
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
140 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
141 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
142 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
143 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
144 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
145 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
146 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
147 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
148 return docs_id
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
149
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
150 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
151 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
152
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
153 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
154 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
155
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
156 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
157 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
158
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
159 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
160 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
161 Allows Iteration over results,and lazy generate content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
162
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
163 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
164 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
165 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
166 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
167
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
168 def __getitem__(self, key):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
169 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
170 Slicing of resultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
171 """
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
172 i, j = key.start, key.stop
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
173
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
174 slice = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
175 for docid in self.doc_ids[i:j]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
176 slice.append(self.get_full_content(docid))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
177 return slice
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
178
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
179
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
180 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
181 res = self.searcher.stored_fields(docid[0])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
182 f_path = res['path'][res['path'].find(res['repository']) \
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
183 + len(res['repository']):].lstrip('/')
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
184
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
185 content_short = self.get_short_content(res, docid[1])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
186 res.update({'content_short':content_short,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
187 'content_short_hl':self.highlight(content_short),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
188 'f_path':f_path})
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
189
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
190 return res
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
191
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
192 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
193
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
194 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
195
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
196 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
197 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
198 Smart function that implements chunking the content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
199 but not overlap chunks so it doesn't highlight the same
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
200 close occurrences twice.
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
201
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
202 :param matcher:
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
203 :param size:
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
204 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
205 memory = [(0, 0)]
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
206 for span in self.matcher.spans():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
207 start = span.startchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
208 end = span.endchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
209 start_offseted = max(0, start - self.fragment_size)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
210 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
211
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
212 if start_offseted < memory[-1][1]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
213 start_offseted = memory[-1][1]
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
214 memory.append((start_offseted, end_offseted,))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
215 yield (start_offseted, end_offseted,)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
216
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
217 def highlight(self, content, top=5):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
218 if self.search_type != 'content':
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
219 return ''
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
220 hl = highlight(escape(content),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
221 self.highlight_items,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
222 analyzer=ANALYZER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
223 fragmenter=FRAGMENTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
224 formatter=FORMATTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
225 top=top)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
226 return hl