annotate rhodecode/lib/indexers/__init__.py @ 2165:dc2584ba5fbc

merged beta into default branch
author Marcin Kuzminski <marcin@python-works.com>
date Wed, 28 Mar 2012 19:54:16 +0200
parents 82a88013a3fd 8ecfed1d8f8b
children 63e58ef80ef1
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
1 # -*- coding: utf-8 -*-
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
2 """
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
3 rhodecode.lib.indexers.__init__
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
5
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
6 Whoosh indexing module for RhodeCode
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
7
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
8 :created_on: Aug 17, 2010
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
9 :author: marcink
1824
89efedac4e6c 2012 copyrights
Marcin Kuzminski <marcin@python-works.com>
parents: 1810
diff changeset
10 :copyright: (C) 2010-2012 Marcin Kuzminski <marcin@python-works.com>
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
11 :license: GPLv3, see COPYING for more details.
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
12 """
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
13 # This program is free software: you can redistribute it and/or modify
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
14 # it under the terms of the GNU General Public License as published by
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
15 # the Free Software Foundation, either version 3 of the License, or
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
16 # (at your option) any later version.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
17 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
18 # This program is distributed in the hope that it will be useful,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
21 # GNU General Public License for more details.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
22 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
23 # You should have received a copy of the GNU General Public License
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
24 # along with this program. If not, see <http://www.gnu.org/licenses/>.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
25 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
26 import sys
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
27 import traceback
2102
04d26165c3d9 Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents: 1995
diff changeset
28 import logging
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
29 from os.path import dirname as dn, join as jn
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
30
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
31 #to get the rhodecode import
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
32 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
33
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
34 from string import strip
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
35 from shutil import rmtree
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
36
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
37 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
38 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
39 from whoosh.index import create_in, open_dir
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
40 from whoosh.formats import Characters
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
41 from whoosh.highlight import highlight, HtmlFormatter, ContextFragmenter
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
42
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
43 from webhelpers.html.builder import escape
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
44 from sqlalchemy import engine_from_config
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
45
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
46 from rhodecode.model import init_model
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
47 from rhodecode.model.scm import ScmModel
1407
2744f5b01d00 Allowing indexing job to resolve repos path on its own if not given.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1354
diff changeset
48 from rhodecode.model.repo import RepoModel
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
49 from rhodecode.config.environment import load_environment
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
50 from rhodecode.lib.utils2 import LazyProperty
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
51 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache,\
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
52 load_rcextensions
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
53
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
54 # CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
55 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
56
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
57
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
58 #INDEX SCHEMA DEFINITION
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
59 SCHEMA = Schema(
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
60 owner=TEXT(),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
61 repository=TEXT(stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
62 path=TEXT(stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
63 content=FieldType(format=Characters(), analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
64 scorable=True, stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
65 modtime=STORED(),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
66 extension=TEXT(stored=True)
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
67 )
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
68
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
69 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
70 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
71 FRAGMENTER = ContextFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
72
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
73
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
74 class MakeIndex(BasePasterCommand):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
75
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
76 max_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
77 min_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
78
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
79 usage = "CONFIG_FILE"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
80 summary = "Creates index for full text search given configuration file"
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
81 group_name = "RhodeCode"
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
82 takes_config_file = -1
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
83 parser = Command.standard_parser(verbose=True)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
84
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
85 def command(self):
2102
04d26165c3d9 Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents: 1995
diff changeset
86 logging.config.fileConfig(self.path_to_ini_file)
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
87 from pylons import config
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
88 add_cache(config)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
89 engine = engine_from_config(config, 'sqlalchemy.db1.')
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
90 init_model(engine)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
91 index_location = config['index_dir']
1409
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
92 repo_location = self.options.repo_location \
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
93 if self.options.repo_location else RepoModel().repos_path
1183
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
94 repo_list = map(strip, self.options.repo_list.split(',')) \
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
95 if self.options.repo_list else None
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
96 load_rcextensions(config['here'])
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
97 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
98 # WHOOSH DAEMON
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
99 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
100 from rhodecode.lib.pidlock import LockHeld, DaemonLock
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
101 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
102 try:
1540
191f3f08236d fixes #258 RhodeCode 1.2 assumes egg folder is writable
Marcin Kuzminski <marcin@python-works.com>
parents: 1409
diff changeset
103 l = DaemonLock(file_=jn(dn(dn(index_location)), 'make_index.lock'))
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
104 WhooshIndexingDaemon(index_location=index_location,
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
105 repo_location=repo_location,
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
106 repo_list=repo_list,)\
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
107 .run(full_index=self.options.full_index)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
108 l.release()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
109 except LockHeld:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
110 sys.exit(1)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
111
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
112 def update_parser(self):
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
113 self.parser.add_option('--repo-location',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
114 action='store',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
115 dest='repo_location',
1408
93cffcb6fd54 Adding documentation for indexer's self-resolving repos location.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1407
diff changeset
116 help="Specifies repositories location to index OPTIONAL",
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
117 )
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
118 self.parser.add_option('--index-only',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
119 action='store',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
120 dest='repo_list',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
121 help="Specifies a comma separated list of repositores "
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
122 "to build index on OPTIONAL",
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
123 )
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
124 self.parser.add_option('-f',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
125 action='store_true',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
126 dest='full_index',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
127 help="Specifies that index should be made full i.e"
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
128 " destroy old and build from scratch",
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
129 default=False)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
130
1810
203af05539e0 implements #330 api method for listing nodes at particular revision
Marcin Kuzminski <marcin@python-works.com>
parents: 1540
diff changeset
131
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
132 class ResultWrapper(object):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
133 def __init__(self, search_type, searcher, matcher, highlight_items):
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
134 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
135 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
136 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
137 self.highlight_items = highlight_items
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
138 self.fragment_size = 200
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
139
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
140 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
141 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
142 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
143 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
144 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
145 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
146 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
147 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
148 return docs_id
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
149
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
150 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
151 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
152
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
153 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
154 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
155
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
156 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
157 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
158
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
159 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
160 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
161 Allows Iteration over results,and lazy generate content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
162
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
163 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
164 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
165 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
166 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
167
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
168 def __getitem__(self, key):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
169 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
170 Slicing of resultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
171 """
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
172 i, j = key.start, key.stop
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
173
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
174 slices = []
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
175 for docid in self.doc_ids[i:j]:
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
176 slices.append(self.get_full_content(docid))
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
177 return slices
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
178
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
179 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
180 res = self.searcher.stored_fields(docid[0])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
181 f_path = res['path'][res['path'].find(res['repository']) \
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
182 + len(res['repository']):].lstrip('/')
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
183
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
184 content_short = self.get_short_content(res, docid[1])
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
185 res.update({'content_short': content_short,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
186 'content_short_hl': self.highlight(content_short),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
187 'f_path': f_path})
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
188
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
189 return res
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
190
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
191 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
192
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
193 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
194
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
195 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
196 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
197 Smart function that implements chunking the content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
198 but not overlap chunks so it doesn't highlight the same
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
199 close occurrences twice.
1810
203af05539e0 implements #330 api method for listing nodes at particular revision
Marcin Kuzminski <marcin@python-works.com>
parents: 1540
diff changeset
200
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
201 :param matcher:
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
202 :param size:
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
203 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
204 memory = [(0, 0)]
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
205 for span in self.matcher.spans():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
206 start = span.startchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
207 end = span.endchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
208 start_offseted = max(0, start - self.fragment_size)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
209 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
210
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
211 if start_offseted < memory[-1][1]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
212 start_offseted = memory[-1][1]
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
213 memory.append((start_offseted, end_offseted,))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
214 yield (start_offseted, end_offseted,)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
215
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
216 def highlight(self, content, top=5):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
217 if self.search_type != 'content':
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
218 return ''
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
219 hl = highlight(
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
220 text=escape(content),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
221 terms=self.highlight_items,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
222 analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
223 fragmenter=FRAGMENTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
224 formatter=FORMATTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
225 top=top
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
226 )
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
227 return hl