annotate rhodecode/lib/indexers/__init__.py @ 1203:6832ef664673 beta

source code cleanup: remove trailing white space, normalize file endings
author Marcin Kuzminski <marcin@python-works.com>
date Sun, 03 Apr 2011 18:23:15 +0200
parents 02a7f263a849
children a671db5bdd58
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
1 # -*- coding: utf-8 -*-
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
2 """
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
3 rhodecode.lib.indexers.__init__
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
5
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
6 Whoosh indexing module for RhodeCode
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
7
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
8 :created_on: Aug 17, 2010
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
9 :author: marcink
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
10 :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
11 :license: GPLv3, see COPYING for more details.
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
12 """
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
13 # This program is free software; you can redistribute it and/or
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
14 # modify it under the terms of the GNU General Public License
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
15 # as published by the Free Software Foundation; version 2
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
16 # of the License or (at your opinion) any later version of the license.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
17 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
18 # This program is distributed in the hope that it will be useful,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
21 # GNU General Public License for more details.
1203
6832ef664673 source code cleanup: remove trailing white space, normalize file endings
Marcin Kuzminski <marcin@python-works.com>
parents: 1198
diff changeset
22 #
903
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
23 # You should have received a copy of the GNU General Public License
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
24 # along with this program; if not, write to the Free Software
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
25 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
04c9bb9ca6d6 code docs, updates
Marcin Kuzminski <marcin@python-works.com>
parents: 894
diff changeset
26 # MA 02110-1301, USA.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
27 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
28 import sys
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
29 import traceback
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
30 from os.path import dirname as dn, join as jn
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
31
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
32 #to get the rhodecode import
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
33 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
34
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
35 from string import strip
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
36
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
37 from rhodecode.model import init_model
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
38 from rhodecode.model.scm import ScmModel
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
39 from rhodecode.config.environment import load_environment
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
40 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
41
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
42 from shutil import rmtree
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
43 from webhelpers.html.builder import escape
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
44 from vcs.utils.lazy import LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
45
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
46 from sqlalchemy import engine_from_config
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
47
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
48 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
49 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
50 from whoosh.index import create_in, open_dir
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
51 from whoosh.formats import Characters
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
52 from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
53
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
54
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
55 #EXTENSIONS WE WANT TO INDEX CONTENT OFF
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
56 INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
57 'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
58 'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
59 'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
60 'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
61 'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
62 'yaws']
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
64 #CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
65 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
66
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
67
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
68 #INDEX SCHEMA DEFINITION
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
69 SCHEMA = Schema(owner=TEXT(),
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
70 repository=TEXT(stored=True),
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
71 path=TEXT(stored=True),
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
72 content=FieldType(format=Characters(ANALYZER),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
73 scorable=True, stored=True),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
74 modtime=STORED(), extension=TEXT(stored=True))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
75
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
76
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
77 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
78 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
79 FRAGMENTER = SimpleFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
80
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
81
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
82 class MakeIndex(BasePasterCommand):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
83
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
84 max_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
85 min_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
86
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
87 usage = "CONFIG_FILE"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
88 summary = "Creates index for full text search given configuration file"
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
89 group_name = "RhodeCode"
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
90 takes_config_file = -1
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
91 parser = Command.standard_parser(verbose=True)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
92
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
93 def command(self):
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
94
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
95 from pylons import config
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
96 add_cache(config)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
97 engine = engine_from_config(config, 'sqlalchemy.db1.')
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
98 init_model(engine)
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
99
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
100 index_location = config['index_dir']
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
101 repo_location = self.options.repo_location
1183
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
102 repo_list = map(strip, self.options.repo_list.split(',')) \
514efe34c255 fixes issue #146
Marcin Kuzminski <marcin@python-works.com>
parents: 903
diff changeset
103 if self.options.repo_list else None
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
104
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
105 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
106 # WHOOSH DAEMON
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
107 #======================================================================
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
108 from rhodecode.lib.pidlock import LockHeld, DaemonLock
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
109 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
110 try:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
111 l = DaemonLock()
683
341beaa9edba Implemented whoosh index building as paster command.
Marcin Kuzminski <marcin@python-works.com>
parents: 631
diff changeset
112 WhooshIndexingDaemon(index_location=index_location,
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
113 repo_location=repo_location,
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
114 repo_list=repo_list)\
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
115 .run(full_index=self.options.full_index)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
116 l.release()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
117 except LockHeld:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
118 sys.exit(1)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
119
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
120 def update_parser(self):
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
121 self.parser.add_option('--repo-location',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
122 action='store',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
123 dest='repo_location',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
124 help="Specifies repositories location to index REQUIRED",
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
125 )
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
126 self.parser.add_option('--index-only',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
127 action='store',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
128 dest='repo_list',
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
129 help="Specifies a comma separated list of repositores "
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
130 "to build index on OPTIONAL",
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 785
diff changeset
131 )
785
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
132 self.parser.add_option('-f',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
133 action='store_true',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
134 dest='full_index',
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
135 help="Specifies that index should be made full i.e"
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
136 " destroy old and build from scratch",
277427ac29a9 complete rewrite of paster commands,
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
137 default=False)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
138
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
139 class ResultWrapper(object):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
140 def __init__(self, search_type, searcher, matcher, highlight_items):
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
141 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
142 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
143 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
144 self.highlight_items = highlight_items
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
145 self.fragment_size = 200 / 2
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
146
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
147 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
148 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
149 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
150 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
151 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
152 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
153 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
154 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
155 return docs_id
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
156
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
157 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
158 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
159
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
160 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
161 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
162
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
163 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
164 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
165
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
166 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
167 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
168 Allows Iteration over results,and lazy generate content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
169
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
170 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
171 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
172 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
173 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
174
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
175 def __getitem__(self, key):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
176 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
177 Slicing of resultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
178 """
1198
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
179 i, j = key.start, key.stop
02a7f263a849 fixed issue with latest webhelpers pagination module
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
180
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
181 slice = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
182 for docid in self.doc_ids[i:j]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
183 slice.append(self.get_full_content(docid))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
184 return slice
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
185
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
186
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
187 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
188 res = self.searcher.stored_fields(docid[0])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
189 f_path = res['path'][res['path'].find(res['repository']) \
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
190 + len(res['repository']):].lstrip('/')
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
191
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
192 content_short = self.get_short_content(res, docid[1])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
193 res.update({'content_short':content_short,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
194 'content_short_hl':self.highlight(content_short),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
195 'f_path':f_path})
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
196
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
197 return res
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
198
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
199 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
200
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
201 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
202
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
203 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
204 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
205 Smart function that implements chunking the content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
206 but not overlap chunks so it doesn't highlight the same
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
207 close occurrences twice.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
208 @param matcher:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
209 @param size:
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
210 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
211 memory = [(0, 0)]
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
212 for span in self.matcher.spans():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
213 start = span.startchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
214 end = span.endchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
215 start_offseted = max(0, start - self.fragment_size)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
216 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
217
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
218 if start_offseted < memory[-1][1]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
219 start_offseted = memory[-1][1]
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
220 memory.append((start_offseted, end_offseted,))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
221 yield (start_offseted, end_offseted,)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
222
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
223 def highlight(self, content, top=5):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
224 if self.search_type != 'content':
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
225 return ''
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
226 hl = highlight(escape(content),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
227 self.highlight_items,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
228 analyzer=ANALYZER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
229 fragmenter=FRAGMENTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
230 formatter=FORMATTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
231 top=top)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
232 return hl