annotate rhodecode/lib/indexers/__init__.py @ 2640:5f21a9dcb09d beta

create an index for commit messages and the ability to search them and see results
author Indra Talip <indra.talip@gmail.com>
date Fri, 20 Jul 2012 12:50:56 +0200
parents 324b838250c9
children 88b0e82bcba4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
1 # -*- coding: utf-8 -*-
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
2 """
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
3 rhodecode.lib.indexers.__init__
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
5
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
6 Whoosh indexing module for RhodeCode
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
7
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
8 :created_on: Aug 17, 2010
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
9 :author: marcink
1824
89efedac4e6c 2012 copyrights
Marcin Kuzminski <marcin@python-works.com>
parents: 1810
diff changeset
10 :copyright: (C) 2010-2012 Marcin Kuzminski <marcin@python-works.com>
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
11 :license: GPLv3, see COPYING for more details.
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
12 """
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
13 # This program is free software: you can redistribute it and/or modify
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
14 # it under the terms of the GNU General Public License as published by
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
15 # the Free Software Foundation, either version 3 of the License, or
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
16 # (at your option) any later version.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
17 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
18 # This program is distributed in the hope that it will be useful,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
21 # GNU General Public License for more details.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
22 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
23 # You should have received a copy of the GNU General Public License
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1203
diff changeset
24 # along with this program. If not, see <http://www.gnu.org/licenses/>.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
25 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
26 import sys
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
27 import traceback
2102
04d26165c3d9 Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents: 1995
diff changeset
28 import logging
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
29 from os.path import dirname as dn, join as jn
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
30
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
31 #to get the rhodecode import
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
32 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
33
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
34 from string import strip
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
35 from shutil import rmtree
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
36
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
37 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
38 from whoosh.fields import TEXT, ID, STORED, NUMERIC, BOOLEAN, Schema, FieldType
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
39 from whoosh.index import create_in, open_dir
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
40 from whoosh.formats import Characters
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
41 from whoosh.highlight import highlight, HtmlFormatter, ContextFragmenter
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
42
2389
324b838250c9 UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents: 2388
diff changeset
43 from webhelpers.html.builder import escape, literal
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
44 from sqlalchemy import engine_from_config
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
45
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
46 from rhodecode.model import init_model
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
47 from rhodecode.model.scm import ScmModel
1407
2744f5b01d00 Allowing indexing job to resolve repos path on its own if not given.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1354
diff changeset
48 from rhodecode.model.repo import RepoModel
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
49 from rhodecode.config.environment import load_environment
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
50 from rhodecode.lib.utils2 import LazyProperty
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
51 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache,\
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
52 load_rcextensions
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
53
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
54 log = logging.getLogger(__name__)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
55
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
56 # CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
57 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
58
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
59 #INDEX SCHEMA DEFINITION
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
60 SCHEMA = Schema(
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
61 fileid=ID(unique=True),
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
62 owner=TEXT(),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
63 repository=TEXT(stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
64 path=TEXT(stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
65 content=FieldType(format=Characters(), analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
66 scorable=True, stored=True),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
67 modtime=STORED(),
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
68 extension=TEXT(stored=True)
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
69 )
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
70
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
71 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
72 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
73 FRAGMENTER = ContextFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
74
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
75 CHGSETS_SCHEMA = Schema(
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
76 path=ID(unique=True, stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
77 revision=NUMERIC(unique=True, stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
78 last=BOOLEAN(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
79 owner=TEXT(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
80 repository=ID(unique=True, stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
81 author=TEXT(stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
82 message=FieldType(format=Characters(), analyzer=ANALYZER,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
83 scorable=True, stored=True),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
84 parents=TEXT(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
85 added=TEXT(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
86 removed=TEXT(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
87 changed=TEXT(),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
88 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
89
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
90 CHGSET_IDX_NAME = 'CHGSET_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
91
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
92 class MakeIndex(BasePasterCommand):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
93
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
94 max_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
95 min_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
96
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
97 usage = "CONFIG_FILE"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
98 summary = "Creates index for full text search given configuration file"
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
99 group_name = "RhodeCode"
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
100 takes_config_file = -1
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
101 parser = Command.standard_parser(verbose=True)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
102
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
103 def command(self):
2102
04d26165c3d9 Whoosh logging is now controlled by the .ini files logging setup
Marcin Kuzminski <marcin@python-works.com>
parents: 1995
diff changeset
104 logging.config.fileConfig(self.path_to_ini_file)
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
105 from pylons import config
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
106 add_cache(config)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
107 engine = engine_from_config(config, 'sqlalchemy.db1.')
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
108 init_model(engine)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
109 index_location = config['index_dir']
1409
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
110 repo_location = self.options.repo_location \
c3172bc09503 Updated contributors and fixed index line length
Marcin Kuzminski <marcin@python-works.com>
parents: 1408
diff changeset
111 if self.options.repo_location else RepoModel().repos_path
1183
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
112 repo_list = map(strip, self.options.repo_list.split(',')) \
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
113 if self.options.repo_list else None
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
114 repo_update_list = map(strip, self.options.repo_update_list.split(',')) \
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
115 if self.options.repo_update_list else None
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2105
diff changeset
116 load_rcextensions(config['here'])
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
117 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
118 # WHOOSH DAEMON
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
119 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
120 from rhodecode.lib.pidlock import LockHeld, DaemonLock
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
121 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
122 try:
1540
191f3f08236d fixes #258 RhodeCode 1.2 assumes egg folder is writable
Marcin Kuzminski <marcin@python-works.com>
parents: 1409
diff changeset
123 l = DaemonLock(file_=jn(dn(dn(index_location)), 'make_index.lock'))
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
124 WhooshIndexingDaemon(index_location=index_location,
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
125 repo_location=repo_location,
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
126 repo_list=repo_list,
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
127 repo_update_list=repo_update_list)\
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
128 .run(full_index=self.options.full_index)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
129 l.release()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
130 except LockHeld:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
131 sys.exit(1)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
132
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
133 def update_parser(self):
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
134 self.parser.add_option('--repo-location',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
135 action='store',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
136 dest='repo_location',
1408
93cffcb6fd54 Adding documentation for indexer's self-resolving repos location.
Jared Bunting <jared.bunting@peachjean.com>
parents: 1407
diff changeset
137 help="Specifies repositories location to index OPTIONAL",
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
138 )
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
139 self.parser.add_option('--index-only',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
140 action='store',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
141 dest='repo_list',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
142 help="Specifies a comma separated list of repositores "
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
143 "to build index on. If not given all repositories "
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
144 "are scanned for indexing. OPTIONAL",
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
145 )
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
146 self.parser.add_option('--update-only',
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
147 action='store',
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
148 dest='repo_update_list',
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
149 help="Specifies a comma separated list of repositores "
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2319
diff changeset
150 "to re-build index on. OPTIONAL",
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
151 )
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
152 self.parser.add_option('-f',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
153 action='store_true',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
154 dest='full_index',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
155 help="Specifies that index should be made full i.e"
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
156 " destroy old and build from scratch",
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
157 default=False)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
158
1810
203af05539e0 implements #330 api method for listing nodes at particular revision
Marcin Kuzminski <marcin@python-works.com>
parents: 1540
diff changeset
159
2319
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
160 class WhooshResultWrapper(object):
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
161 def __init__(self, search_type, searcher, matcher, highlight_items,
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
162 repo_location):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
163 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
164 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
165 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
166 self.highlight_items = highlight_items
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
167 self.fragment_size = 200
2319
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
168 self.repo_location = repo_location
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
169
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
170 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
171 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
172 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
173 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
174 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
175 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
176 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
177 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
178 return docs_id
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
179
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
180 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
181 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
182
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
183 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
184 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
185
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
186 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
187 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
188
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
189 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
190 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
191 Allows Iteration over results,and lazy generate content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
192
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
193 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
194 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
195 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
196 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
197
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
198 def __getitem__(self, key):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
199 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
200 Slicing of resultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
201 """
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
202 i, j = key.start, key.stop
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
203
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
204 slices = []
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
205 for docid in self.doc_ids[i:j]:
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
206 slices.append(self.get_full_content(docid))
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
207 return slices
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
208
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
209 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
210 res = self.searcher.stored_fields(docid[0])
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
211 log.debug('result: %s' % res)
2319
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
212 full_repo_path = jn(self.repo_location, res['repository'])
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
213 f_path = res['path'].split(full_repo_path)[-1]
4c239e0dcbb7 fixes issue #454 Search results under Windows include preceeding backslash
Marcin Kuzminski <marcin@python-works.com>
parents: 2109
diff changeset
214 f_path = f_path.lstrip(os.sep)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
215 res.update({'f_path': f_path})
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
216
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
217 if self.search_type == 'content':
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
218 content_short = self.get_short_content(res, docid[1])
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
219 res.update({'content_short': content_short,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
220 'content_short_hl': self.highlight(content_short)})
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
221 elif self.search_type == 'message':
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
222 res.update({'message_hl': self.highlight(res['message'])})
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
223
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
224 log.debug('result: %s' % res)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
225
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
226 return res
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
227
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
228 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
229
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
230 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
231
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
232 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
233 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
234 Smart function that implements chunking the content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
235 but not overlap chunks so it doesn't highlight the same
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
236 close occurrences twice.
1810
203af05539e0 implements #330 api method for listing nodes at particular revision
Marcin Kuzminski <marcin@python-works.com>
parents: 1540
diff changeset
237
1302
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
238 :param matcher:
f0e904651f21 moved LANGUAGE_EXTENSION_MAP to lib, and made whoosh indexer use the same map
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
239 :param size:
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
240 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
241 memory = [(0, 0)]
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
242 if self.matcher.supports('positions'):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
243 for span in self.matcher.spans():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
244 start = span.startchar or 0
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
245 end = span.endchar or 0
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
246 start_offseted = max(0, start - self.fragment_size)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
247 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
248
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
249 if start_offseted < memory[-1][1]:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
250 start_offseted = memory[-1][1]
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
251 memory.append((start_offseted, end_offseted,))
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
252 yield (start_offseted, end_offseted,)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
253
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
254 def highlight(self, content, top=5):
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2389
diff changeset
255 if self.search_type not in ['content', 'message']:
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
256 return ''
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
257 hl = highlight(
2389
324b838250c9 UI fixes for searching
Marcin Kuzminski <marcin@python-works.com>
parents: 2388
diff changeset
258 text=content,
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
259 terms=self.highlight_items,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
260 analyzer=ANALYZER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
261 fragmenter=FRAGMENTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
262 formatter=FORMATTER,
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
263 top=top
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
264 )
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
265 return hl