annotate rhodecode/lib/indexers/daemon.py @ 3921:932c84e8fa92 beta

fixed #851 and #563 make-index crashes on non-ascii files
author Marcin Kuzminski <marcin@python-works.com>
date Thu, 30 May 2013 22:37:08 +0200
parents ba08786c49ef
children d8e02de53bbc
rev   line source
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
1 # -*- coding: utf-8 -*-
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
2 """
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
3 rhodecode.lib.indexers.daemon
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
5
1377
78e5853df5c8 fixed daemon typos
Marcin Kuzminski <marcin@python-works.com>
parents: 1206
diff changeset
6 A daemon that reads from the task table and runs tasks
947
99850ac883d1 Fixed whoosh daemon, for depracated walk method
Marcin Kuzminski <marcin@python-works.com>
parents: 902
diff changeset
7
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
8 :created_on: Jan 26, 2010
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
9 :author: marcink
1824
89efedac4e6c 2012 copyrights
Marcin Kuzminski <marcin@python-works.com>
parents: 1711
diff changeset
10 :copyright: (C) 2010-2012 Marcin Kuzminski <marcin@python-works.com>
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
11 :license: GPLv3, see COPYING for more details.
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
12 """
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
13 # This program is free software: you can redistribute it and/or modify
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
14 # it under the terms of the GNU General Public License as published by
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
15 # the Free Software Foundation, either version 3 of the License, or
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
16 # (at your option) any later version.
947
99850ac883d1 Fixed whoosh daemon, for depracated walk method
Marcin Kuzminski <marcin@python-works.com>
parents: 902
diff changeset
17 #
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
18 # This program is distributed in the hope that it will be useful,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21 # GNU General Public License for more details.
947
99850ac883d1 Fixed whoosh daemon, for depracated walk method
Marcin Kuzminski <marcin@python-works.com>
parents: 902
diff changeset
22 #
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 # You should have received a copy of the GNU General Public License
1206
a671db5bdd58 fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents: 1183
diff changeset
24 # along with this program. If not, see <http://www.gnu.org/licenses/>.
2641
cfcd981d6679 import with_statment to make daemon.py python 2.5 compatible
Indra Talip <indra.talip@gmail.com>
parents: 2640
diff changeset
25 from __future__ import with_statement
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
26
1154
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
27 import os
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
28 import sys
1154
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
29 import logging
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
30 import traceback
1154
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
31
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
32 from shutil import rmtree
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
33 from time import mktime
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
34
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
35 from os.path import dirname as dn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
36 from os.path import join as jn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
37
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
38 # add the project root to sys.path so the rhodecode package can be imported
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
39 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
40 sys.path.append(project_path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
41
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
42 from rhodecode.config.conf import INDEX_EXTENSIONS
691
7486da5f0628 Refactor codes for scm model
Marcin Kuzminski <marcin@python-works.com>
parents: 683
diff changeset
43 from rhodecode.model.scm import ScmModel
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
44 from rhodecode.model.db import Repository
3016
b3c8a3a5ce5f fixed issues some people had with whoosh indexers and unicode decode
Marcin Kuzminski <marcin@python-works.com>
parents: 2841
diff changeset
45 from rhodecode.lib.utils2 import safe_unicode, safe_str
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
46 from rhodecode.lib.indexers import SCHEMA, IDX_NAME, CHGSETS_SCHEMA, \
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
47 CHGSET_IDX_NAME
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
48
2007
324ac367a4da Added VCS into rhodecode core for faster and easier deployments of new versions
Marcin Kuzminski <marcin@python-works.com>
parents: 1995
diff changeset
49 from rhodecode.lib.vcs.exceptions import ChangesetError, RepositoryError, \
1711
b369bec5d468 fixes issue with whoosh reindexing files that were removed or renamed
Marcin Kuzminski <marcin@python-works.com>
parents: 1451
diff changeset
50 NodeDoesNotExistError
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
51
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
52 from whoosh.index import create_in, open_dir, exists_in
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
53 from whoosh.query import *
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
54 from whoosh.qparser import QueryParser
1154
36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents: 1036
diff changeset
55
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
56 log = logging.getLogger('whoosh_indexer')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
57
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
58
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
59 class WhooshIndexingDaemon(object):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
60 """
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
61 Daemon for atomic indexing jobs
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
62 """
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63
1995
b6c902d88472 bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents: 1824
diff changeset
64 def __init__(self, indexname=IDX_NAME, index_location=None,
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
65 repo_location=None, sa=None, repo_list=None,
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
66 repo_update_list=None):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
67 self.indexname = indexname
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
68
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
69 self.index_location = index_location
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
70 if not index_location:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
71 raise Exception('You have to provide index location')
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
72
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
73 self.repo_location = repo_location
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
74 if not repo_location:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
75 raise Exception('You have to provide repositories location')
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
76
1036
405b80e4ccd5 Major refactoring, removed when possible calls to app globals.
Marcin Kuzminski <marcin@python-works.com>
parents: 947
diff changeset
77 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location)
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
78
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
79 #filter repo list
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
80 if repo_list:
2841
2fa3c09f63e0 fixed problems with re-indexing non-ascii names of repositories
Marcin Kuzminski <marcin@python-works.com>
parents: 2840
diff changeset
81 #Fix non-ascii repo names to unicode
2fa3c09f63e0 fixed problems with re-indexing non-ascii names of repositories
Marcin Kuzminski <marcin@python-works.com>
parents: 2840
diff changeset
82 repo_list = map(safe_unicode, repo_list)
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
83 self.filtered_repo_paths = {}
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
84 for repo_name, repo in self.repo_paths.items():
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
85 if repo_name in repo_list:
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
86 self.filtered_repo_paths[repo_name] = repo
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
87
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
88 self.repo_paths = self.filtered_repo_paths
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
89
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
90 #filter update repo list
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
91 self.filtered_repo_update_paths = {}
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
92 if repo_update_list:
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
93 self.filtered_repo_update_paths = {}
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
94 for repo_name, repo in self.repo_paths.items():
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
95 if repo_name in repo_update_list:
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
96 self.filtered_repo_update_paths[repo_name] = repo
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
97 self.repo_paths = self.filtered_repo_update_paths
894
1fed3c9161bb fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents: 886
diff changeset
98
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
99 self.initial = True
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
100 if not os.path.isdir(self.index_location):
763
0dad296d2a57 extended trending languages to more entries, implemented new faster and "fancy"
Marcin Kuzminski <marcin@python-works.com>
parents: 691
diff changeset
101 os.makedirs(self.index_location)
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
102 log.info('Cannot run incremental index since it does not '
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
103 'yet exist, running full build')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
104 elif not exists_in(self.index_location, IDX_NAME):
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
105 log.info('Running full index build as the file content '
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
106 'index does not exist')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
107 elif not exists_in(self.index_location, CHGSET_IDX_NAME):
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
108 log.info('Running full index build as the changeset '
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
109 'index does not exist')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
110 else:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
111 self.initial = False
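The repository filtering done in the constructor can be sketched in isolation. This is a minimal illustration, not the daemon's own code: `filter_repos` is a hypothetical helper, and plain dicts stand in for the name-to-repository mapping that `repo_scan` returns.

```python
def filter_repos(repo_paths, wanted=None):
    """Return only the entries of `repo_paths` whose names are in `wanted`.

    `repo_paths` is assumed to be a dict of repo name -> repo object.
    An empty or missing `wanted` list means "no filtering", which mirrors
    the `if repo_list:` guard in the constructor.
    """
    if not wanted:
        return dict(repo_paths)
    wanted = set(wanted)  # set membership is O(1) per lookup
    return {name: repo for name, repo in repo_paths.items() if name in wanted}


if __name__ == '__main__':
    repos = {'docs': object(), 'core': object(), 'plugins': object()}
    print(sorted(filter_repos(repos, ['core', 'missing'])))
```

Names in `wanted` that do not match any scanned repository are dropped silently, which matches how the constructor builds its filtered dicts.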
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
112
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
113 def _get_index_revision(self, repo):
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
114 db_repo = Repository.get_by_repo_name(repo.name)
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
115 landing_rev = 'tip'
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
116 if db_repo:
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
117 landing_rev = db_repo.landing_rev
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
118 return landing_rev
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
119
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
120 def _get_index_changeset(self, repo):
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
121 index_rev = self._get_index_revision(repo)
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
122 cs = repo.get_changeset(index_rev)
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
123 return cs
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
124
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
125 def get_paths(self, repo):
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
126 """
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
127 Recursively walk the root dir and return a set of all paths in that dir
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
128 based on the repository walk function
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
129 """
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
130 index_paths_ = set()
567
80dc0a23edf7 fixed whoosh failure on new repository
Marcin Kuzminski <marcin@python-works.com>
parents: 561
diff changeset
131 try:
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
132 cs = self._get_index_changeset(repo)
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
133 for _topnode, _dirs, files in cs.walk('/'):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
134 for f in files:
3016
b3c8a3a5ce5f fixed issues some people had with whoosh indexers and unicode decode
Marcin Kuzminski <marcin@python-works.com>
parents: 2841
diff changeset
135 index_paths_.add(jn(safe_str(repo.path), safe_str(f.path)))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
136
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
137 except RepositoryError:
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
138 log.debug(traceback.format_exc())
567
80dc0a23edf7 fixed whoosh failure on new repository
Marcin Kuzminski <marcin@python-works.com>
parents: 561
diff changeset
139 pass
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
140 return index_paths_
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
141
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
142 def get_node(self, repo, path):
3921
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
143 """
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
144 Gets a filenode based on the given full path. It operates on strings for
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
145 hg/git compatibility.
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
146
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
147 :param repo: scm repo instance
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
148 :param path: full path including root location
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
149 :return: FileNode
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
150 """
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
151 root_path = safe_str(repo.path)+'/'
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
152 parts = safe_str(path).partition(root_path)
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
153 cs = self._get_index_changeset(repo)
3921
932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents: 3916
diff changeset
154 node = cs.get_node(parts[-1])
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
155 return node
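The path handling in `get_node` relies on `str.partition` to strip the repository root from a full path. A minimal sketch of that step follows; `relative_path` and the example paths are hypothetical and not part of the module.

```python
def relative_path(repo_root, full_path):
    """Return the part of `full_path` that follows `repo_root + '/'`."""
    root = repo_root + '/'
    # partition() returns (head, separator, tail); the tail is the path
    # inside the repository.
    _head, _sep, tail = full_path.partition(root)
    return tail


if __name__ == '__main__':
    print(relative_path('/srv/repos/demo', '/srv/repos/demo/docs/index.rst'))
```

When the root is not found, `str.partition` returns the whole string followed by two empty strings, so the tail is empty. A caller that takes the last element therefore gets an empty path rather than an exception.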
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
156
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
157 def get_node_mtime(self, node):
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
158 return mktime(node.last_changeset.date.timetuple())
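The conversion in `get_node_mtime` can be tried on its own. In this sketch a plain `datetime` stands in for `node.last_changeset.date`, and `to_mtime` is a hypothetical name.

```python
from datetime import datetime
from time import mktime


def to_mtime(dt):
    """Convert a naive datetime to a POSIX timestamp (float seconds).

    timetuple() drops microseconds, and mktime() interprets the tuple
    as local time.
    """
    return mktime(dt.timetuple())


if __name__ == '__main__':
    stamp = to_mtime(datetime(2013, 5, 30, 22, 37, 8))
    print(stamp, datetime.fromtimestamp(stamp))
```

Because microseconds are discarded, two changesets committed within the same second map to the same modification time.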
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
159
1171
2ab211e0aecd changes for #56
Marcin Kuzminski <marcin@python-works.com>
parents: 1154
diff changeset
160 def add_doc(self, writer, path, repo, repo_name):
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
161 """
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
162 Add a doc to the writer; this function itself fetches data from
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
163 the instance of the vcs backend
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
164 """
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
165
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
166 node = self.get_node(repo, path)
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
167 indexed = indexed_w_content = 0
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
168 # we just index the content of chosen files, and skip binary files
886
0736230c7d91 #92 removed content of binary files for whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 885
diff changeset
169 if node.extension in INDEX_EXTENSIONS and not node.is_binary:
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
170 u_content = node.content
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
171 if not isinstance(u_content, unicode):
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
172 log.warning(' >> %s Could not get this content as unicode '
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
173 'replacing with empty content' % path)
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
174 u_content = u''
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
175 else:
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
176 log.debug(' >> %s [WITH CONTENT]' % path)
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
177 indexed_w_content += 1
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
178
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
179 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
180 log.debug(' >> %s' % path)
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
181 # just index the file name without its content
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
182 u_content = u''
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
183 indexed += 1
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
184
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
185 p = safe_unicode(path)
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
186 writer.add_document(
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
187 fileid=p,
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
188 owner=unicode(repo.contact),
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
189 repository=safe_unicode(repo_name),
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
190 path=p,
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
191 content=u_content,
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
192 modtime=self.get_node_mtime(node),
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
193 extension=node.extension
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
194 )
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
195 return indexed, indexed_w_content
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
196
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
197 def index_changesets(self, writer, repo_name, repo, start_rev=None):
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
198 """
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
199 Add all changesets in the vcs repo starting at start_rev
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
200 to the index writer
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
201
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
202 :param writer: the whoosh index writer to add to
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
203 :param repo_name: name of the repository from which the
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
204 changeset originates including the repository group
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
205 :param repo: the vcs repository instance to index changesets for,
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
206 the presumption is the repo has changesets to index
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
207 :param start_rev: the full sha id to start indexing from
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
208 if start_rev is None then index from the first changeset in
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
209 the repo
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
210 """
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
211
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
212 if start_rev is None:
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
213 start_rev = repo[0].raw_id
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
214
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
215 log.debug('indexing changesets in %s starting at rev: %s' %
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
216 (repo_name, start_rev))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
217
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
218 indexed = 0
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
219 for cs in repo.get_changesets(start=start_rev):
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
220 log.debug(' >> %s' % cs)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
221 writer.add_document(
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2641
diff changeset
222 raw_id=unicode(cs.raw_id),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
223 owner=unicode(repo.contact),
2693
66c778b8cb54 Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents: 2648
diff changeset
224 date=cs._timestamp,
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
225 repository=safe_unicode(repo_name),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
226 author=cs.author,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
227 message=cs.message,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
228 last=cs.last,
2763
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
229 added=u' '.join([safe_unicode(node.path) for node in cs.added]).lower(),
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
230 removed=u' '.join([safe_unicode(node.path) for node in cs.removed]).lower(),
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
231 changed=u' '.join([safe_unicode(node.path) for node in cs.changed]).lower(),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
232 parents=u' '.join([p.raw_id for p in cs.parents]),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
233 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
234 indexed += 1
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
235
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
236 log.debug('indexed %d changesets for repo %s' % (indexed, repo_name))
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
237 return indexed
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
238
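The path fields built in index_changesets (added, removed, changed) are each one space-separated, lower-cased string of node paths, with every path passed through safe_unicode first. A minimal sketch of that idea in Python 3 follows. The helper names to_text and join_paths are hypothetical, and to_text is only an assumed stand-in for safe_unicode, whose actual behaviour is not shown in this listing.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical stand-in for safe_unicode: always return a str."""
    if isinstance(value, bytes):
        # Assumption: undecodable bytes are replaced with U+FFFD
        # instead of raising, so one bad path cannot abort indexing.
        return value.decode(encoding, errors="replace")
    return str(value)


def join_paths(paths):
    """Build one space-separated, lower-cased field value from paths."""
    return " ".join(to_text(p) for p in paths).lower()


# Mixed bytes/str input, including a path that is not valid UTF-8.
added = join_paths([b"docs/README.rst", "src/Main.py", b"bin/\xff.dat"])
print(added)
```

Converting each path before the join is what keeps a single non-ASCII or undecodable path from raising during the join itself.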
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
239 def index_files(self, file_idx_writer, repo_name, repo):
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
240 """
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
241 Index files for given repo_name
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
242
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
243 :param file_idx_writer: the whoosh index writer to add to
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
244 :param repo_name: name of the repository we're indexing
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
245 :param repo: instance of vcs repo
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
246 """
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
247 i_cnt = iwc_cnt = 0
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
248 log.debug('building index for %s @revision:%s' % (repo.path,
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
249 self._get_index_revision(repo)))
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
250 for idx_path in self.get_paths(repo):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
251 i, iwc = self.add_doc(file_idx_writer, idx_path, repo, repo_name)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
252 i_cnt += i
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
253 iwc_cnt += iwc
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
254
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
255 log.debug('added %s files %s with content for repo %s' %
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
256 (i_cnt + iwc_cnt, iwc_cnt, repo.path))
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
257 return i_cnt, iwc_cnt
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
258
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
259 def update_changeset_index(self):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
260 idx = open_dir(self.index_location, indexname=CHGSET_IDX_NAME)
2569
b98fd6fc67f9 Little better logging in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2388
diff changeset
261
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
262 with idx.searcher() as searcher:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
263 writer = idx.writer()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
264 writer_is_dirty = False
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
265 try:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
266 indexed_total = 0
2839
c0ddc86b4654 Fix possible exception about repo_name not defined, on whoosh indexer when using index-only option
Marcin Kuzminski <marcin@python-works.com>
parents: 2763
diff changeset
267 repo_name = None
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
268 for repo_name, repo in self.repo_paths.items():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
269 # skip indexing if there aren't any revs in the repo
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
270 num_of_revs = len(repo)
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
271 if num_of_revs < 1:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
272 continue
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
273
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
274 qp = QueryParser('repository', schema=CHGSETS_SCHEMA)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
275 q = qp.parse(u"last:t AND %s" % repo_name)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
276
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
277 results = searcher.search(q)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
278
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
279 # default to scanning the entire repo
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
280 last_rev = 0
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
281 start_id = None
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
282
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
283 if len(results) > 0:
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
284 # assuming that there is only one result, if not this
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
285 # may require a full re-index.
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
286 start_id = results[0]['raw_id']
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
287 last_rev = repo.get_changeset(revision=start_id).revision
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
288
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
289 # there are new changesets to index or a new repo to index
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
290 if last_rev == 0 or num_of_revs > last_rev + 1:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
291 # delete the docs in the index for the previous
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
292 # last changeset(s)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
293 for hit in results:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
294 q = qp.parse(u"last:t AND %s AND raw_id:%s" %
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2641
diff changeset
295 (repo_name, hit['raw_id']))
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
296 writer.delete_by_query(q)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
297
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
298 # index from the previous last changeset + all new ones
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
299 indexed_total += self.index_changesets(writer,
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
300 repo_name, repo, start_id)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
301 writer_is_dirty = True
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
302 log.debug('indexed %s changesets for repo %s' % (
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
303 indexed_total, repo_name)
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
304 )
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
305 finally:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
306 if writer_is_dirty:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
307 log.debug('>> COMMITTING CHANGES TO CHANGESET INDEX<<')
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
308 writer.commit(merge=True)
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
309 log.debug('>>> FINISHED REBUILDING CHANGESET INDEX <<<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
310 else:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
311 log.debug('>> NOTHING TO COMMIT TO CHANGESET INDEX<<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
312
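The decision made in update_changeset_index can be sketched without whoosh: find the changeset currently flagged as last in the index, compare its revision number with the repository length, and if there is anything new, delete that document and re-index from it onwards. The function below is an illustrative Python 3 model only. plan_changeset_update, repo_ids and indexed_tip_id are hypothetical names, and a plain list stands in for the repository.

```python
def plan_changeset_update(repo_ids, indexed_tip_id=None):
    """
    Model of the incremental decision.

    repo_ids: changeset ids in repository order (position == revision).
    indexed_tip_id: id of the changeset flagged as last in the index,
        or None when nothing has been indexed for this repository.
    Returns (ids_to_delete, ids_to_index).
    """
    if not repo_ids:
        # Nothing to index when the repository has no revisions.
        return [], []

    # Default to scanning the entire repository.
    last_rev = 0
    start_id = None
    if indexed_tip_id is not None:
        start_id = indexed_tip_id
        last_rev = repo_ids.index(indexed_tip_id)

    if last_rev == 0 or len(repo_ids) > last_rev + 1:
        # Drop the previous "last" document, then index from it onwards
        # so its flag is rewritten together with the new changesets.
        to_delete = [indexed_tip_id] if indexed_tip_id is not None else []
        start = 0 if start_id is None else repo_ids.index(start_id)
        return to_delete, repo_ids[start:]

    return [], []


print(plan_changeset_update(["a", "b", "c"]))        # fresh repository
print(plan_changeset_update(["a", "b", "c"], "b"))   # one new changeset
print(plan_changeset_update(["a", "b", "c"], "c"))   # already up to date
```

Note that, as in the listing, the `last_rev == 0` test is also true when the indexed tip is revision 0, so in this model a repository with a single changeset is re-indexed on every run.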
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
313 def update_file_index(self):
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
314 log.debug((u'STARTING INCREMENTAL INDEXING UPDATE FOR EXTENSIONS %s '
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
315 'AND REPOS %s') % (INDEX_EXTENSIONS, self.repo_paths.keys()))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
316
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
317 idx = open_dir(self.index_location, indexname=self.indexname)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
318 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
319 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
320 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
321 to_index = set()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
322
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
323 writer = idx.writer()
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
324 writer_is_dirty = False
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
325 try:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
326 with idx.reader() as reader:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
327
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
328 # Loop over the stored fields in the index
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
329 for fields in reader.all_stored_fields():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
330 indexed_path = fields['path']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
331 indexed_repo_path = fields['repository']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
332 indexed_paths.add(indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
333
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
334 if indexed_repo_path not in self.filtered_repo_update_paths:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
335 continue
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
336
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
337 repo = self.repo_paths[indexed_repo_path]
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
338
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
339 try:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
340 node = self.get_node(repo, indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
341 # Check if this file was changed since it was indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
342 indexed_time = fields['modtime']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
343 mtime = self.get_node_mtime(node)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
344 if mtime > indexed_time:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
345 # The file has changed, delete it and add it to
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
346 # the list of files to reindex
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
347 log.debug(
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
348 'adding to reindex list %s mtime: %s vs %s' % (
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
349 indexed_path, mtime, indexed_time)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
350 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
351 writer.delete_by_term('fileid', indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
352 writer_is_dirty = True
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
353
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
354 to_index.add(indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
355 except (ChangesetError, NodeDoesNotExistError):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
356 # This file was deleted since it was indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
357 log.debug('removing from index %s' % indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
358 writer.delete_by_term('path', indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
359 writer_is_dirty = True
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
360
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
361 # Loop over the files in the filesystem
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
362 # Assume we have a function that gathers the filenames of the
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
363 # documents to be indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
364 ri_cnt_total = 0 # indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
365 riwc_cnt_total = 0 # indexed with content
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
366 for repo_name, repo in self.repo_paths.items():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
367 # skip indexing if there aren't any revisions
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
368 if len(repo) < 1:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
369 continue
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
370 ri_cnt = 0 # indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
371 riwc_cnt = 0 # indexed with content
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
372 for path in self.get_paths(repo):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
373 path = safe_unicode(path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
374 if path in to_index or path not in indexed_paths:
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
375
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
376 # This is either a file that's changed, or a new file
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
377 # that wasn't indexed before. So index it!
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
378 i, iwc = self.add_doc(writer, path, repo, repo_name)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
379 writer_is_dirty = True
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
380 log.debug('re-indexing %s' % path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
381 ri_cnt += i
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
382 ri_cnt_total += 1
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
383 riwc_cnt += iwc
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
384 riwc_cnt_total += iwc
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
385 log.debug('added %s files %s with content for repo %s' % (
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
386 ri_cnt + riwc_cnt, riwc_cnt, repo.path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
387 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
388 log.debug('indexed %s files in total and %s with content' % (
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
389 ri_cnt_total, riwc_cnt_total)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
390 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
391 finally:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
392 if writer_is_dirty:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
393 log.debug('>> COMMITTING CHANGES TO FILE INDEX <<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
394 writer.commit(merge=True)
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
395 log.debug('>>> FINISHED REBUILDING FILE INDEX <<<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
396 else:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
397 log.debug('>> NOTHING TO COMMIT TO FILE INDEX <<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
398 writer.cancel()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
399
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
400 def build_indexes(self):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
401 if os.path.exists(self.index_location):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
402 log.debug('removing previous index')
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
403 rmtree(self.index_location)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
404
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
405 if not os.path.exists(self.index_location):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
406 os.mkdir(self.index_location)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
407
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
408 chgset_idx = create_in(self.index_location, CHGSETS_SCHEMA,
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
409 indexname=CHGSET_IDX_NAME)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
410 chgset_idx_writer = chgset_idx.writer()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
411
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
412 file_idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
413 file_idx_writer = file_idx.writer()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
414 log.debug('BUILDING INDEX FOR EXTENSIONS %s '
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
415 'AND REPOS %s' % (INDEX_EXTENSIONS, self.repo_paths.keys()))
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
416
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
417 for repo_name, repo in self.repo_paths.items():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
418 # skip indexing if there aren't any revisions
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
419 if len(repo) < 1:
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
420 continue
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
421
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
422 self.index_files(file_idx_writer, repo_name, repo)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
423 self.index_changesets(chgset_idx_writer, repo_name, repo)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
424
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
425 log.debug('>> COMMITTING CHANGES <<')
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
426 file_idx_writer.commit(merge=True)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
427 chgset_idx_writer.commit(merge=True)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
428 log.debug('>>> FINISHED BUILDING INDEX <<<')
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
429
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
430 def update_indexes(self):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
431 self.update_file_index()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
432 self.update_changeset_index()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
433
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
434 def run(self, full_index=False):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
435 """Run daemon"""
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
436 if full_index or self.initial:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
437 self.build_indexes()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
438 else:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
439 self.update_indexes()
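The commit-or-cancel handling in the `finally` block of the file-index update (lines 391-398 of the listing) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the RhodeCode implementation: `FakeWriter` and `update_index` are hypothetical names, and `FakeWriter` only mimics the `delete_by_term`, `commit` and `cancel` calls that appear in the listing so the sketch runs without an index library.

```python
class FakeWriter(object):
    """Hypothetical stand-in for an index writer; it only records calls."""

    def __init__(self):
        self.deleted = []
        self.state = 'open'

    def delete_by_term(self, field, value):
        # Remember which (field, value) pairs were asked to be removed.
        self.deleted.append((field, value))

    def commit(self, merge=True):
        self.state = 'committed'

    def cancel(self):
        self.state = 'cancelled'


def update_index(writer, indexed_paths, existing_paths):
    """Drop index entries whose files no longer exist.

    Commits only when something changed; otherwise cancels the writer
    so that no empty segment is written.
    """
    writer_is_dirty = False
    try:
        for path in sorted(indexed_paths):
            if path not in existing_paths:
                # The file was deleted since it was indexed.
                writer.delete_by_term('path', path)
                writer_is_dirty = True
    finally:
        # As in the listing, the writer is always released,
        # even if the loop above raises.
        if writer_is_dirty:
            writer.commit(merge=True)
        else:
            writer.cancel()
    return writer_is_dirty
```

Tracking a `writer_is_dirty` flag keeps the writer from being committed when nothing changed, and the `finally` clause ensures it is either committed or cancelled on every code path.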