annotate rhodecode/lib/indexers/daemon.py @ 4116:ffd45b185016 rhodecode-2.2.5-gpl

Imported some of the GPLv3'd changes from RhodeCode v2.2.5. This imports changes between changesets 21af6c4eab3d and 6177597791c2 in RhodeCode's original repository, including only changes to Python files and HTML. RhodeCode clearly licensed its changes to these files under GPLv3 in their /LICENSE file, which states the following: "The Python code and integrated HTML are licensed under the GPLv3 license." (See: https://code.rhodecode.com/rhodecode/files/v2.2.5/LICENSE or http://web.archive.org/web/20140512193334/https://code.rhodecode.com/rhodecode/files/f3b123159901f15426d18e3dc395e8369f70ebe0/LICENSE for an online copy of that LICENSE file.) Conservancy reviewed these changes and confirmed that they can be licensed as a whole to the Kallithea project under GPLv3-only. While some of the contents committed herein are clearly licensed GPLv3-or-later, on the whole we must assume they are GPLv3-only, since the statement above from RhodeCode indicates that they intend GPLv3-only as their license, per GPLv3 §14 and other relevant sections of GPLv3.
author Bradley M. Kuhn <bkuhn@sfconservancy.org>
date Wed, 02 Jul 2014 19:03:13 -0400
parents 5293d4bbb1ea
children e9f6b533a8f6
rev   line source
885 94f7585af8a1 fixes to #92, updated changelog [Marcin Kuzminski <marcin@python-works.com>; parents: 777]
      1  # -*- coding: utf-8 -*-
1206 a671db5bdd58 fixed license issue #149 [Marcin Kuzminski <marcin@python-works.com>; parents: 1183]
      2  # This program is free software: you can redistribute it and/or modify
      3  # it under the terms of the GNU General Public License as published by
      4  # the Free Software Foundation, either version 3 of the License, or
      5  # (at your option) any later version.
947 99850ac883d1 Fixed whoosh daemon, for depracated walk method [Marcin Kuzminski <marcin@python-works.com>; parents: 902]
      6  #
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
      7  # This program is distributed in the hope that it will be useful,
      8  # but WITHOUT ANY WARRANTY; without even the implied warranty of
      9  # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     10  # GNU General Public License for more details.
947 99850ac883d1 Fixed whoosh daemon, for depracated walk method [Marcin Kuzminski <marcin@python-works.com>; parents: 902]
     11  #
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     12  # You should have received a copy of the GNU General Public License
1206 a671db5bdd58 fixed license issue #149 [Marcin Kuzminski <marcin@python-works.com>; parents: 1183]
     13  # along with this program. If not, see <http://www.gnu.org/licenses/>.
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
     14  """
     15  rhodecode.lib.indexers.daemon
     16  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     17
     18  A daemon will read from task table and run tasks
     19
     20  :created_on: Jan 26, 2010
     21  :author: marcink
     22  :copyright: (c) 2013 RhodeCode GmbH.
     23  :license: GPLv3, see LICENSE for more details.
     24  """
     25
2641 cfcd981d6679 import with_statment to make daemon.py python 2.5 compatible [Indra Talip <indra.talip@gmail.com>; parents: 2640]
     26  from __future__ import with_statement
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     27
1154 36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function [Marcin Kuzminski <marcin@python-works.com>; parents: 1036]
     28  import os
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     29  import sys
1154 36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function [Marcin Kuzminski <marcin@python-works.com>; parents: 1036]
     30  import logging
885 94f7585af8a1 fixes to #92, updated changelog [Marcin Kuzminski <marcin@python-works.com>; parents: 777]
     31  import traceback
1154 36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function [Marcin Kuzminski <marcin@python-works.com>; parents: 1036]
     32
     33  from shutil import rmtree
     34  from time import mktime
     35
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     36  from os.path import dirname as dn
     37  from os.path import join as jn
     38
547 1e757ac98988 renamed project to rhodecode [Marcin Kuzminski <marcin@python-works.com>; parents: 497]
     39  #to get the rhodecode import
411 9b67cebe6609 some fixes to whoosh indexer daemon [Marcin Kuzminski <marcin@python-works.com>; parents: 407]
     40  project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
     41  sys.path.append(project_path)
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     42
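Lines 40 and 41 call `dirname` (imported as `dn`) four times on the module's real path, so the directory that contains the `rhodecode` package ends up on `sys.path`. The following is a minimal sketch of that path arithmetic. It assumes a hypothetical install location of `/srv/app` and uses `posixpath` instead of `os.path` so that the result does not depend on the platform:

```python
import posixpath

# Hypothetical value; the daemon uses os.path.realpath(__file__) instead.
module_file = "/srv/app/rhodecode/lib/indexers/daemon.py"

path = module_file
for _ in range(4):
    # The four dirname() calls remove, in order:
    # daemon.py, indexers, lib, rhodecode
    path = posixpath.dirname(path)

project_path = path
print(project_path)
```

Once that directory is appended to `sys.path`, `import rhodecode` resolves even when the script is run directly from a source checkout.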
2109 8ecfed1d8f8b utils/conf [Marcin Kuzminski <marcin@python-works.com>; parents: 2101]
     43  from rhodecode.config.conf import INDEX_EXTENSIONS
691 7486da5f0628 Refactor codes for scm model [Marcin Kuzminski <marcin@python-works.com>; parents: 683]
     44  from rhodecode.model.scm import ScmModel
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
     45  from rhodecode.model.db import Repository
3016 b3c8a3a5ce5f fixed issues some people had with whoosh indexers and unicode decode [Marcin Kuzminski <marcin@python-works.com>; parents: 2841]
     46  from rhodecode.lib.utils2 import safe_unicode, safe_str
2648 0911cf6940af little code cleanup [Marcin Kuzminski <marcin@python-works.com>; parents: 2643]
     47  from rhodecode.lib.indexers import SCHEMA, IDX_NAME, CHGSETS_SCHEMA, \
     48      CHGSET_IDX_NAME
411 9b67cebe6609 some fixes to whoosh indexer daemon [Marcin Kuzminski <marcin@python-works.com>; parents: 407]
     49
2007 324ac367a4da Added VCS into rhodecode core for faster and easier deployments of new versions [Marcin Kuzminski <marcin@python-works.com>; parents: 1995]
     50  from rhodecode.lib.vcs.exceptions import ChangesetError, RepositoryError, \
1711 b369bec5d468 fixes issue with whoosh reindexing files that were removed or renamed [Marcin Kuzminski <marcin@python-works.com>; parents: 1451]
     51      NodeDoesNotExistError
560 3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem. [Marcin Kuzminski <marcin@python-works.com>; parents: 557]
     52
2640 5f21a9dcb09d create an index for commit messages and the ability to search them and see results [Indra Talip <indra.talip@gmail.com>; parents: 2569]
     53  from whoosh.index import create_in, open_dir, exists_in
     54  from whoosh.query import *
     55  from whoosh.qparser import QueryParser
1154 36fe593dfe4b simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function [Marcin Kuzminski <marcin@python-works.com>; parents: 1036]
     56
2101 df96adcbb1f7 code garden [Marcin Kuzminski <marcin@python-works.com>; parents: 2007]
     57  log = logging.getLogger('whoosh_indexer')
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     58
1995 b6c902d88472 bumbed whoosh to 2.3.X series [Marcin Kuzminski <marcin@python-works.com>; parents: 1824]
     59
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     60  class WhooshIndexingDaemon(object):
560 3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem. [Marcin Kuzminski <marcin@python-works.com>; parents: 557]
     61      """
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     62      Daemon for atomic indexing jobs
560 3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem. [Marcin Kuzminski <marcin@python-works.com>; parents: 557]
     63      """
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     64
1995 b6c902d88472 bumbed whoosh to 2.3.X series [Marcin Kuzminski <marcin@python-works.com>; parents: 1824]
     65      def __init__(self, indexname=IDX_NAME, index_location=None,
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     66                   repo_location=None, sa=None, repo_list=None,
     67                   repo_update_list=None):
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
     68          self.indexname = indexname
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
     69
     70          self.index_location = index_location
     71          if not index_location:
     72              raise Exception('You have to provide index location')
     73
411 9b67cebe6609 some fixes to whoosh indexer daemon [Marcin Kuzminski <marcin@python-works.com>; parents: 407]
     74          self.repo_location = repo_location
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
     75          if not repo_location:
     76              raise Exception('You have to provide repositories location')
     77
1036 405b80e4ccd5 Major refactoring, removed when possible calls to app globals. [Marcin Kuzminski <marcin@python-works.com>; parents: 947]
     78          self.repo_paths = ScmModel(sa).repo_scan(self.repo_location)
894 1fed3c9161bb fixes #90 + docs update [Marcin Kuzminski <marcin@python-works.com>; parents: 886]
     79
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     80          #filter repo list
894 1fed3c9161bb fixes #90 + docs update [Marcin Kuzminski <marcin@python-works.com>; parents: 886]
     81          if repo_list:
2841 2fa3c09f63e0 fixed problems with re-indexing non-ascii names of repositories [Marcin Kuzminski <marcin@python-works.com>; parents: 2840]
     82              #Fix non-ascii repo names to unicode
     83              repo_list = map(safe_unicode, repo_list)
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     84              self.filtered_repo_paths = {}
894 1fed3c9161bb fixes #90 + docs update [Marcin Kuzminski <marcin@python-works.com>; parents: 886]
     85              for repo_name, repo in self.repo_paths.items():
     86                  if repo_name in repo_list:
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     87                      self.filtered_repo_paths[repo_name] = repo
     88
     89              self.repo_paths = self.filtered_repo_paths
894 1fed3c9161bb fixes #90 + docs update [Marcin Kuzminski <marcin@python-works.com>; parents: 886]
     90
2373 1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list [Marcin Kuzminski <marcin@python-works.com>; parents: 2372]
     91          #filter update repo list
     92          self.filtered_repo_update_paths = {}
     93          if repo_update_list:
     94              self.filtered_repo_update_paths = {}
     95              for repo_name, repo in self.repo_paths.items():
     96                  if repo_name in repo_update_list:
     97                      self.filtered_repo_update_paths[repo_name] = repo
     98              self.repo_paths = self.filtered_repo_update_paths
894 1fed3c9161bb fixes #90 + docs update [Marcin Kuzminski <marcin@python-works.com>; parents: 886]
     99
2640 5f21a9dcb09d create an index for commit messages and the ability to search them and see results [Indra Talip <indra.talip@gmail.com>; parents: 2569]
    100          self.initial = True
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    101          if not os.path.isdir(self.index_location):
763 0dad296d2a57 extended trending languages to more entries, implemented new faster and "fancy" [Marcin Kuzminski <marcin@python-works.com>; parents: 691]
    102              os.makedirs(self.index_location)
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    103              log.info('Cannot run incremental index since it does not '
    104                       'yet exist running full build')
2640 5f21a9dcb09d create an index for commit messages and the ability to search them and see results [Indra Talip <indra.talip@gmail.com>; parents: 2569]
    105          elif not exists_in(self.index_location, IDX_NAME):
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    106              log.info('Running full index build as the file content '
    107                       'index does not exist')
2640 5f21a9dcb09d create an index for commit messages and the ability to search them and see results [Indra Talip <indra.talip@gmail.com>; parents: 2569]
    108          elif not exists_in(self.index_location, CHGSET_IDX_NAME):
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    109              log.info('Running full index build as the changeset '
    110                       'index does not exist')
2640 5f21a9dcb09d create an index for commit messages and the ability to search them and see results [Indra Talip <indra.talip@gmail.com>; parents: 2569]
    111          else:
    112              self.initial = False
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    113
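`__init__` does two separate things after scanning the repositories. First, it narrows `repo_paths` by name: it applies `repo_list`, and then it applies `repo_update_list` to whatever the first filter left. Second, it decides whether the run must be a full build (`self.initial = True`). That happens when the index directory is new or when either of the two Whoosh indexes is missing. The sketch below reproduces both steps as standalone functions. The function names are hypothetical. The `safe_unicode` conversion is omitted. The index names `"files"` and `"changesets"` are hypothetical stand-ins for `IDX_NAME` and `CHGSET_IDX_NAME`. `exists_in` is injected as a parameter so that the sketch runs without Whoosh installed:

```python
import os
import tempfile


def filter_repos(repo_paths, repo_list=None, repo_update_list=None):
    """Apply the two name filters from __init__ (lines 80-98) in order.

    repo_paths maps repository name -> repository object. The update
    filter runs on the result of the first filter, as in the daemon.
    """
    if repo_list:
        # Materialise the names once. The daemon targets Python 2, where
        # map() returns a list; under Python 3 map() is a one-shot
        # iterator that successive `in` checks would progressively consume.
        wanted = set(repo_list)
        repo_paths = {name: repo for name, repo in repo_paths.items()
                      if name in wanted}
    if repo_update_list:
        wanted = set(repo_update_list)
        repo_paths = {name: repo for name, repo in repo_paths.items()
                      if name in wanted}
    return repo_paths


def needs_full_build(index_location, exists_in, file_idx, chgset_idx):
    """Return the value __init__ would assign to self.initial.

    In the daemon, exists_in is whoosh.index.exists_in(dirname, indexname).
    """
    if not os.path.isdir(index_location):
        os.makedirs(index_location)  # fresh location: nothing to update
        return True
    if not exists_in(index_location, file_idx):
        return True  # file-content index missing
    if not exists_in(index_location, chgset_idx):
        return True  # changeset index missing
    return False


if __name__ == "__main__":
    repos = {"alpha": "A", "beta": "B", "gamma": "C"}
    print(filter_repos(repos, ["alpha", "beta"], ["beta", "gamma"]))
    with tempfile.TemporaryDirectory() as tmp:
        loc = os.path.join(tmp, "index")
        print(needs_full_build(loc, lambda d, n: True, "files", "changesets"))
```

Because the update filter runs second, a repository must appear in both lists to survive when both are given. `beta` is the only one in the usage example.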
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    114      def _get_index_revision(self, repo):
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
    115          db_repo = Repository.get_by_repo_name(repo.name_unicode)
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    116          landing_rev = 'tip'
    117          if db_repo:
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
    118              _rev_type, _rev = db_repo.landing_rev
    119              landing_rev = _rev
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    120          return landing_rev
    121
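`_get_index_revision` defaults to `'tip'`. It switches to another revision only when the database row for the repository exists. In that case it unpacks `landing_rev` as a two-item `(type, rev)` pair and keeps only the `rev` part. The sketch below has the same control flow. `DbRepo` and the `_DB` dictionary are hypothetical stand-ins for the `Repository` model and `Repository.get_by_repo_name`, and the sample landing revisions are invented:

```python
from collections import namedtuple

# Hypothetical stand-in for the Repository DB model; only the attribute
# the daemon reads is modelled.
DbRepo = namedtuple("DbRepo", ["repo_name", "landing_rev"])

# Hypothetical registry standing in for Repository.get_by_repo_name().
_DB = {
    "alpha": DbRepo("alpha", ("branch", "default")),
    "beta": DbRepo("beta", ("rev", "1f0a2b3c")),
}


def get_index_revision(repo_name, lookup=_DB.get):
    """Return the revision to index for repo_name, defaulting to 'tip'."""
    db_repo = lookup(repo_name)
    landing_rev = "tip"
    if db_repo:
        # landing_rev is a (type, rev) pair; only the rev part is used
        _rev_type, _rev = db_repo.landing_rev
        landing_rev = _rev
    return landing_rev


print(get_index_revision("alpha"), get_index_revision("unknown"))
```

`_get_index_changeset` (lines 122 to 126) passes this value to `repo.get_changeset`, unless a caller already supplied `index_rev`.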
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
    122      def _get_index_changeset(self, repo, index_rev=None):
    123          if not index_rev:
    124              index_rev = self._get_index_revision(repo)
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    125          cs = repo.get_changeset(index_rev)
    126          return cs
    127
561 5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list. [Marcin Kuzminski <marcin@python-works.com>; parents: 560]
    128      def get_paths(self, repo):
2101 df96adcbb1f7 code garden [Marcin Kuzminski <marcin@python-works.com>; parents: 2007]
    129          """
    130          recursive walk in root dir and return a set of all path in that dir
560 3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem. [Marcin Kuzminski <marcin@python-works.com>; parents: 557]
    131          based on repository walk function
    132          """
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
    133          index_paths_ = set()
567 80dc0a23edf7 fixed whoosh failure on new repository [Marcin Kuzminski <marcin@python-works.com>; parents: 561]
    134          try:
3916 ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index [Marcin Kuzminski <marcin@python-works.com>; parents: 3016]
    135              cs = self._get_index_changeset(repo)
    136              for _topnode, _dirs, files in cs.walk('/'):
406 b153a51b1d3b Implemented search using whoosh. Still as experimental option. [Marcin Kuzminski <marcin@python-works.com>]
    137                  for f in files:
3016 b3c8a3a5ce5f fixed issues some people had with whoosh indexers and unicode decode [Marcin Kuzminski <marcin@python-works.com>; parents: 2841]
    138                      index_paths_.add(jn(safe_str(repo.path), safe_str(f.path)))
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    139
2648 0911cf6940af little code cleanup [Marcin Kuzminski <marcin@python-works.com>; parents: 2643]
    140          except RepositoryError:
885 94f7585af8a1 fixes to #92, updated changelog [Marcin Kuzminski <marcin@python-works.com>; parents: 777]
    141              log.debug(traceback.format_exc())
567 80dc0a23edf7 fixed whoosh failure on new repository [Marcin Kuzminski <marcin@python-works.com>; parents: 561]
    142              pass
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    143          return index_paths_
    144
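`get_paths` unpacks `cs.walk('/')` as `(topnode, dirs, files)` triples, in the same shape as `os.walk`. For each file node it joins `node.path` onto the repository's filesystem path. The sketch below uses a hypothetical `FakeChangeset` whose `walk()` yields that shape from a flat file list. It omits the `except RepositoryError` branch, which is how the daemon returns an empty set for a repository it cannot walk. It uses `posixpath.join` rather than `os.path.join` so that the output is the same on every platform:

```python
import posixpath
from collections import namedtuple

# Hypothetical node type: only the .path attribute read by get_paths.
FileNode = namedtuple("FileNode", ["path"])


class FakeChangeset(object):
    """Hypothetical changeset whose walk() yields (topnode, dirs, files)."""

    def __init__(self, files):
        self._files = [FileNode(p) for p in files]

    def walk(self, topurl="/"):
        # A single flat level is enough to show the unpacking on line 136.
        yield (topurl, [], self._files)


def get_paths(repo_path, changeset):
    """Collect the absolute path of every file node in the changeset."""
    index_paths = set()
    for _topnode, _dirs, files in changeset.walk("/"):
        for f in files:
            index_paths.add(posixpath.join(repo_path, f.path))
    return index_paths


print(sorted(get_paths("/repos/alpha", FakeChangeset(["setup.py", "docs/index.rst"]))))
```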
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
    145      def get_node(self, repo, path, index_rev=None):
3921 932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files [Marcin Kuzminski <marcin@python-works.com>; parents: 3916]
    146          """
    147          gets a filenode based on given full path.It operates on string for
    148          hg git compatability.
    149
    150          :param repo: scm repo instance
    151          :param path: full path including root location
    152          :return: FileNode
    153          """
    154          root_path = safe_str(repo.path)+'/'
    155          parts = safe_str(path).partition(root_path)
4116 ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5. [Bradley M. Kuhn <bkuhn@sfconservancy.org>; parents: 3960]
    156          cs = self._get_index_changeset(repo, index_rev=index_rev)
3921 932c84e8fa92 fixed #851 and #563 make-index crashes on non-ascii files [Marcin Kuzminski <marcin@python-works.com>; parents: 3916]
    157          node = cs.get_node(parts[-1])
561 5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list. [Marcin Kuzminski <marcin@python-works.com>; parents: 560]
    158          return node
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    159
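`get_node` turns the absolute path produced by `get_paths` back into a repository-relative path. It does this with `str.partition`, which returns a `(before, separator, after)` triple, and keeps the last element. The sketch below isolates that step as a hypothetical helper. It works on Python 3 `str`, whereas the daemon works on byte strings via `safe_str`. It also shows two edge cases that follow from `partition`'s documented behaviour:

```python
def relative_node_path(repo_path, full_path):
    """Strip the repository root from full_path, as get_node does."""
    # The trailing '/' stops '/repos/alpha' from matching '/repos/alpha-fork'.
    root_path = repo_path + "/"
    # partition() splits at the first occurrence of root_path; if it is
    # absent the result is (full_path, '', ''), so parts[-1] is ''.
    parts = full_path.partition(root_path)
    return parts[-1]


print(relative_node_path("/repos/alpha", "/repos/alpha/docs/index.rst"))
```

A path outside the repository yields an empty string rather than an error. In the daemon, that empty string would then be passed on to `cs.get_node`.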
561 5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list. [Marcin Kuzminski <marcin@python-works.com>; parents: 560]
    160      def get_node_mtime(self, node):
    161          return mktime(node.last_changeset.date.timetuple())
631 05528ad948c4 Hacking for git support,and new faster repo scan [Marcin Kuzminski <marcin@python-works.com>; parents: 629]
    162
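`get_node_mtime` converts the date of a node's last changeset to seconds since the epoch, using `time.mktime` on the datetime's `timetuple()`. The sketch below assumes, as the call suggests, that the date is a naive `datetime`. It shows two properties of that conversion: microseconds are dropped, and the tuple is interpreted in the host's local time zone. `node_mtime` is a hypothetical name:

```python
from datetime import datetime
from time import mktime


def node_mtime(changeset_date):
    """Convert a naive changeset datetime to a float epoch timestamp.

    timetuple() has no microsecond field, and mktime() reads the tuple
    as local time, so the result depends on the host's time zone.
    """
    return mktime(changeset_date.timetuple())


print(node_mtime(datetime(2014, 7, 2, 19, 3, 13)))
```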
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
163 def add_doc(self, writer, path, repo, repo_name, index_rev=None):
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
164 """
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
165 Adding doc to writer this function itself fetches data from
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
166 the instance of vcs backend
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
167 """
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
168
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
169 node = self.get_node(repo, path, index_rev)
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
170 indexed = indexed_w_content = 0
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
171 # we just index the content of chosen files, and skip binary files
886
0736230c7d91 #92 removed content of binary files for whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 885
diff changeset
172 if node.extension in INDEX_EXTENSIONS and not node.is_binary:
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
173 u_content = node.content
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
174 if not isinstance(u_content, unicode):
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
175 log.warning(' >> %s Could not get this content as unicode, '
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
176 'replacing with empty content' % path)
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
177 u_content = u''
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
178 else:
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
179 log.debug(' >> %s [WITH CONTENT]' % path)
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
180 indexed_w_content += 1
885
94f7585af8a1 fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents: 777
diff changeset
181
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
182 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
183 log.debug(' >> %s' % path)
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
184 # just index the file name without its content
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
185 u_content = u''
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
186 indexed += 1
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
187
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
188 p = safe_unicode(path)
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
189 writer.add_document(
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
190 fileid=p,
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
191 owner=unicode(repo.contact),
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
192 repository=safe_unicode(repo_name),
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
diff changeset
193 path=p,
2101
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
194 content=u_content,
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
195 modtime=self.get_node_mtime(node),
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
196 extension=node.extension
df96adcbb1f7 code garden
Marcin Kuzminski <marcin@python-works.com>
parents: 2007
diff changeset
197 )
2109
8ecfed1d8f8b utils/conf
Marcin Kuzminski <marcin@python-works.com>
parents: 2101
diff changeset
198 return indexed, indexed_w_content
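`add_doc` indexes file content only when the extension is whitelisted and the node is not binary; every other file is indexed by name with an empty body. Reading the counter update as part of the `[WITH CONTENT]` branch, a whitelisted file whose content cannot be obtained as unicode is added with empty content but increments neither counter. `index_files` then sums the per-file pairs. A minimal sketch of that bookkeeping, assuming a hypothetical `FakeNode` stand-in for a vcs file node and a made-up `INDEX_EXTENSIONS` value:

```python
from dataclasses import dataclass

# Hypothetical whitelist; the real INDEX_EXTENSIONS comes from the indexer config.
INDEX_EXTENSIONS = {"py", "txt", "rst"}


@dataclass
class FakeNode:
    """Stand-in for a vcs file node: only the attributes add_doc reads."""
    path: str
    extension: str
    is_binary: bool
    content: object  # str for decoded text, bytes when decoding failed


def content_for_index(node):
    """Return (content, indexed, indexed_w_content) following add_doc's branches."""
    if node.extension in INDEX_EXTENSIONS and not node.is_binary:
        if not isinstance(node.content, str):
            # add_doc logs a warning and indexes an empty body; no counter moves.
            return "", 0, 0
        return node.content, 0, 1
    # Everything else is indexed by file name only.
    return "", 1, 0


def count_indexed(nodes):
    """Sum the per-file counters the way index_files accumulates i and iwc."""
    indexed = indexed_w_content = 0
    for node in nodes:
        _, i, iwc = content_for_index(node)
        indexed += i
        indexed_w_content += iwc
    return indexed, indexed_w_content


nodes = [
    FakeNode("a.py", "py", False, "print(1)"),
    FakeNode("logo.png", "png", True, b"\x89PNG"),
    FakeNode("b.txt", "txt", False, b"\xff"),
]
print(count_indexed(nodes))  # (1, 1)
```

Keeping name-only documents for non-whitelisted files means they still show up in path searches even though their bodies are never tokenized.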
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
199
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
200 def index_changesets(self, writer, repo_name, repo, start_rev=None):
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
201 """
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
202 Add all changesets in the vcs repo, starting at start_rev,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
203 to the index writer.
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
204
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
205 :param writer: the whoosh index writer to add to
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
206 :param repo_name: name of the repository the changeset
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
207 originates from, including the repository group
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
208 :param repo: the vcs repository instance to index changesets for,
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
209 it is assumed to contain changesets to index
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
210 :param start_rev=None: the full sha id to start indexing from;
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
211 if start_rev is None, index from the first changeset in
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
212 the repo
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
213 """
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
214
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
215 if start_rev is None:
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
216 start_rev = repo[0].raw_id
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
217
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
218 log.debug('indexing changesets in %s starting at rev: %s' %
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
219 (repo_name, start_rev))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
220
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
221 indexed = 0
3922
d8e02de53bbc show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents: 3921
diff changeset
222 cs_iter = repo.get_changesets(start=start_rev)
d8e02de53bbc show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents: 3921
diff changeset
223 total = len(cs_iter)
d8e02de53bbc show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents: 3921
diff changeset
224 for cs in cs_iter:
d8e02de53bbc show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents: 3921
diff changeset
225 log.debug(' >> %s/%s' % (cs, total))
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
226 writer.add_document(
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2641
diff changeset
227 raw_id=unicode(cs.raw_id),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
228 owner=unicode(repo.contact),
2693
66c778b8cb54 Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents: 2648
diff changeset
229 date=cs._timestamp,
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
230 repository=safe_unicode(repo_name),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
231 author=cs.author,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
232 message=cs.message,
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
233 last=cs.last,
2763
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
234 added=u' '.join([safe_unicode(node.path) for node in cs.added]).lower(),
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
235 removed=u' '.join([safe_unicode(node.path) for node in cs.removed]).lower(),
81624c8a1035 #548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2693
diff changeset
236 changed=u' '.join([safe_unicode(node.path) for node in cs.changed]).lower(),
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
237 parents=u' '.join([p.raw_id for p in cs.parents]),
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
238 )
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
239 indexed += 1
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
240
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
241 log.debug('indexed %d changesets for repo %s' % (indexed, repo_name))
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
242 return indexed
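`index_changesets` stores the added, removed, and changed file lists as single space-separated, lowercased strings, and the parent ids as a space-separated string of raw ids. The sketch below reproduces that field shaping with plain lists; `join_paths` and `changeset_fields` are hypothetical helpers, and the dict only approximates the keyword arguments passed to `writer.add_document`.

```python
def join_paths(paths):
    """Space-join file paths and lowercase them, as done for added/removed/changed."""
    return " ".join(paths).lower()


def changeset_fields(raw_id, added, removed, changed, parent_ids, last=False):
    """Hypothetical helper: the per-changeset fields handed to add_document."""
    return {
        "raw_id": raw_id,
        "added": join_paths(added),
        "removed": join_paths(removed),
        "changed": join_paths(changed),
        "parents": " ".join(parent_ids),
        "last": last,
    }


fields = changeset_fields(
    "abc123",
    added=["Docs/README.rst"],
    removed=[],
    changed=["setup.py", "Lib/Core.py"],
    parent_ids=["p1", "p2"],
)
print(fields["changed"])  # setup.py lib/core.py
```

Joining on spaces keeps each field a single string, but a path that itself contains a space can no longer be told apart from two separate paths once joined.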
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
243
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
244 def index_files(self, file_idx_writer, repo_name, repo):
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
245 """
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
246 Index files for the given repo_name
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
247
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
248 :param file_idx_writer: the whoosh index writer to add to
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
249 :param repo_name: name of the repository we're indexing
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
250 :param repo: instance of vcs repo
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
251 """
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
252 i_cnt = iwc_cnt = 0
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
253 log.debug('building index for %s @revision:%s' % (repo.path,
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
254 self._get_index_revision(repo)))
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
255 index_rev = self._get_index_revision(repo)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
256 for idx_path in self.get_paths(repo):
4116
ffd45b185016 Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents: 3960
diff changeset
257 i, iwc = self.add_doc(file_idx_writer, idx_path, repo, repo_name, index_rev)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
258 i_cnt += i
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
259 iwc_cnt += iwc
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
260
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
261 log.debug('added %s files (%s with content) for repo %s' %
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
262 (i_cnt + iwc_cnt, iwc_cnt, repo.path))
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
263 return i_cnt, iwc_cnt
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
264
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
265 def update_changeset_index(self):
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
266 idx = open_dir(self.index_location, indexname=CHGSET_IDX_NAME)
2569
b98fd6fc67f9 Little better logging in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2388
diff changeset
267
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
268 with idx.searcher() as searcher:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
269 writer = idx.writer()
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
270 writer_is_dirty = False
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
271 try:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
272 indexed_total = 0
2839
c0ddc86b4654 Fix possible exception about repo_name not defined, on whoosh indexer when using index-only option
Marcin Kuzminski <marcin@python-works.com>
parents: 2763
diff changeset
273 repo_name = None
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
274 for repo_name, repo in self.repo_paths.items():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
275 # skip indexing if there aren't any revs in the repo
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
276 num_of_revs = len(repo)
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
277 if num_of_revs < 1:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
278 continue
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
279
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
280 qp = QueryParser('repository', schema=CHGSETS_SCHEMA)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
281 q = qp.parse(u"last:t AND %s" % repo_name)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
282
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
283 results = searcher.search(q)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
284
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
285 # default to scanning the entire repo
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
286 last_rev = 0
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
287 start_id = None
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
288
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
289 if len(results) > 0:
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
290 # assume there is only one result; if there are more,
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
291 # a full re-index may be required.
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
292 start_id = results[0]['raw_id']
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
293 last_rev = repo.get_changeset(revision=start_id).revision
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
294
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
295 # there are new changesets to index or a new repo to index
2643
2ad50c44b025 when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents: 2642
diff changeset
296 if last_rev == 0 or num_of_revs > last_rev + 1:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
297 # delete the docs in the index for the previous
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
298 # last changeset(s)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
299 for hit in results:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
300 q = qp.parse(u"last:t AND %s AND raw_id:%s" %
2642
88b0e82bcba4 rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents: 2641
diff changeset
301 (repo_name, hit['raw_id']))
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
302 writer.delete_by_query(q)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
303
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
304 # index from the previous last changeset + all new ones
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
305 indexed_total += self.index_changesets(writer,
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
306 repo_name, repo, start_id)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
307 writer_is_dirty = True
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
308 log.debug('indexed %s changesets so far, last repo %s' % (
3916
ba08786c49ef fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents: 3016
diff changeset
309 indexed_total, repo_name)
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
310 )
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
311 finally:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
312 if writer_is_dirty:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
313 log.debug('>> COMMITTING CHANGES TO CHANGESET INDEX <<')
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
314 writer.commit(merge=True)
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
315 log.debug('>>> FINISHED REBUILDING CHANGESET INDEX <<<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
316 else:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
diff changeset
317 log.debug('>> NOTHING TO COMMIT TO CHANGESET INDEX <<')
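`update_changeset_index` decides per repository whether new changesets need indexing by looking up the document flagged `last:t`. When new revisions exist, it deletes that document and restarts `index_changesets` at the same changeset, so the re-added copy gets its current `last` value and only the new tip stays flagged. A minimal sketch of the decision as a pure function, assuming revision numbers in place of the raw ids the real code passes around; `plan_changeset_update` is a hypothetical name:

```python
def plan_changeset_update(num_of_revs, last_indexed_rev=None):
    """Return the revision to start indexing from, or None if nothing to do.

    last_indexed_rev is the revision of the document flagged last:t, or None
    when the repository has never been indexed.
    """
    if num_of_revs < 1:
        return None  # empty repository: skipped entirely
    last_rev = 0 if last_indexed_rev is None else last_indexed_rev
    if last_rev == 0 or num_of_revs > last_rev + 1:
        # Restart at the previous "last" changeset so its last:t document
        # can be deleted and re-added with an up-to-date flag.
        return last_rev
    return None  # the index already covers the tip


print(plan_changeset_update(5))     # 0: never indexed, start from the first rev
print(plan_changeset_update(5, 4))  # None: up to date
print(plan_changeset_update(8, 4))  # 4: re-index rev 4 plus revs 5..7
```

As written, the `last_rev == 0` branch also fires when the only indexed changeset is revision 0, so a single-changeset repository is re-indexed from its first changeset on every run; the cost is small, but it does mark the writer dirty each time.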
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
318
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
319 def update_file_index(self):
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
320 log.debug((u'STARTING INCREMENTAL INDEXING UPDATE FOR EXTENSIONS %s '
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
diff changeset
321 'AND REPOS %s') % (INDEX_EXTENSIONS, self.repo_paths.keys()))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
322
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
323 idx = open_dir(self.index_location, indexname=self.indexname)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
324 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
325 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
326 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
327 to_index = set()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
328
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
329 writer = idx.writer()
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
330 writer_is_dirty = False
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
331 try:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
332 with idx.reader() as reader:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
333
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
334 # Loop over the stored fields in the index
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
335 for fields in reader.all_stored_fields():
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
336 indexed_path = fields['path']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
337 indexed_repo_path = fields['repository']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
338 indexed_paths.add(indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
339
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
340 if indexed_repo_path not in self.filtered_repo_update_paths:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
341 continue
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
342
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
343 repo = self.repo_paths[indexed_repo_path]
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
344
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
345 try:
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
346 node = self.get_node(repo, indexed_path)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
347 # Check if this file was changed since it was indexed
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
348 indexed_time = fields['modtime']
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
349 mtime = self.get_node_mtime(node)
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
diff changeset
350 if mtime > indexed_time:
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
351 # The file has changed, delete it and add it to
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
352 # the list of files to reindex
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
353 log.debug(
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
354 'adding to reindex list %s mtime: %s vs %s' % (
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
diff changeset
355 indexed_path, mtime, indexed_time)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
356                             )
357                             writer.delete_by_term('fileid', indexed_path)
358                             writer_is_dirty = True
359
360                             to_index.add(indexed_path)
361                     except (ChangesetError, NodeDoesNotExistError):
362                         # This file was deleted since it was indexed
363                         log.debug('removing from index %s' % indexed_path)
364                         writer.delete_by_term('path', indexed_path)
365                         writer_is_dirty = True
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
366
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
367             # Loop over the files in the filesystem
368             # Assume we have a function that gathers the filenames of the
369             # documents to be indexed
370             ri_cnt_total = 0  # indexed
371             riwc_cnt_total = 0  # indexed with content
372             for repo_name, repo in self.repo_paths.items():
373                 # skip indexing if there aren't any revisions
374                 if len(repo) < 1:
375                     continue
376                 ri_cnt = 0  # indexed
377                 riwc_cnt = 0  # indexed with content
378                 for path in self.get_paths(repo):
379                     path = safe_unicode(path)
380                     if path in to_index or path not in indexed_paths:
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
381
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
382                         # This is either a file that's changed, or a new file
383                         # that wasn't indexed before. So index it!
384                         i, iwc = self.add_doc(writer, path, repo, repo_name)
385                         writer_is_dirty = True
386                         log.debug('re indexing %s' % path)
387                         ri_cnt += i
388                         ri_cnt_total += 1
389                         riwc_cnt += iwc
390                         riwc_cnt_total += iwc
391                 log.debug('added %s files %s with content for repo %s' % (
392                     ri_cnt + riwc_cnt, riwc_cnt, repo.path)
393                 )
394             log.debug('indexed %s files in total and %s with content' % (
395                 ri_cnt_total, riwc_cnt_total)
396             )
397         finally:
398             if writer_is_dirty:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
399                 log.debug('>> COMMITING CHANGES TO FILE INDEX <<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
400                 writer.commit(merge=True)
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
401                 log.debug('>>> FINISHED REBUILDING FILE INDEX <<<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
402             else:
2840
c7c5825299fe fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 2839
403                 log.debug('>> NOTHING TO COMMIT TO FILE INDEX <<')
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
404                 writer.cancel()
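The incremental pass listed above makes one of three decisions for each path. A stored document whose file no longer resolves is deleted. One whose current mtime is newer than the stored `modtime` is deleted and queued for re-indexing. A path the index has never seen is added. The writer is committed only if one of these steps touched it; otherwise it is cancelled. What follows is a minimal sketch of that logic using only the standard library. It assumes a plain dict of path to modification time stands in for the Whoosh index. `DictIndex`, `plan_update` and `update_index` are hypothetical names, not RhodeCode or Whoosh APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DictIndex:
    """Hypothetical stand-in for the file index: path -> stored modtime."""
    docs: Dict[str, float] = field(default_factory=dict)
    commits: int = 0


def plan_update(indexed: Dict[str, float],
                current: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Return (to_delete, to_index) for one incremental pass."""
    to_delete: Set[str] = set()
    to_index: Set[str] = set()
    for path, indexed_time in indexed.items():
        mtime = current.get(path)
        if mtime is None:
            # File no longer exists: remove its document.
            to_delete.add(path)
        elif mtime > indexed_time:
            # File changed since it was indexed: remove it, then re-add it.
            to_delete.add(path)
            to_index.add(path)
    # Files the index has never seen are added as well.
    to_index.update(p for p in current if p not in indexed)
    return to_delete, to_index


def update_index(index: DictIndex, current: Dict[str, float]) -> bool:
    """Apply one pass and 'commit' only if something was staged."""
    to_delete, to_index = plan_update(index.docs, current)
    staged = dict(index.docs)  # pending changes, like an open writer
    dirty = False
    for path in to_delete:
        del staged[path]
        dirty = True
    for path in to_index:
        staged[path] = current[path]
        dirty = True
    if dirty:
        index.docs = staged  # roughly writer.commit(...)
        index.commits += 1
    # Otherwise the staged copy is dropped, roughly writer.cancel().
    return dirty


if __name__ == "__main__":
    idx = DictIndex({"a.py": 1.0, "b.py": 1.0, "gone.py": 1.0})
    print(update_index(idx, {"a.py": 2.0, "b.py": 1.0, "new.py": 1.0}))  # True
    print(sorted(idx.docs.items()))  # [('a.py', 2.0), ('b.py', 1.0), ('new.py', 1.0)]
```

The sketch differs from the annotated code in one respect. The original performs commit-or-cancel inside `finally`, so anything already staged is committed even if the loop raises. The sketch commits only after the loop completes.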
405
406     def build_indexes(self):
407         if os.path.exists(self.index_location):
408             log.debug('removing previous index')
409             rmtree(self.index_location)
410
411         if not os.path.exists(self.index_location):
412             os.mkdir(self.index_location)
413
2648
0911cf6940af little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents: 2643
414         chgset_idx = create_in(self.index_location, CHGSETS_SCHEMA,
415                                indexname=CHGSET_IDX_NAME)
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
416         chgset_idx_writer = chgset_idx.writer()
417
418         file_idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
419         file_idx_writer = file_idx.writer()
420         log.debug('BUILDING INDEX FOR EXTENSIONS %s '
421                   'AND REPOS %s' % (INDEX_EXTENSIONS, self.repo_paths.keys()))
422
423         for repo_name, repo in self.repo_paths.items():
424             # skip indexing if there aren't any revisions
425             if len(repo) < 1:
2373
1828eb7fa688 #469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents: 2372
426                 continue
427
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
428             self.index_files(file_idx_writer, repo_name, repo)
429             self.index_changesets(chgset_idx_writer, repo_name, repo)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
430
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
431         log.debug('>> COMMITING CHANGES <<')
432         file_idx_writer.commit(merge=True)
433         chgset_idx_writer.commit(merge=True)
434         log.debug('>>> FINISHED BUILDING INDEX <<<')
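`build_indexes` rebuilds from scratch. It deletes the whole index directory and recreates it, then creates the changeset and file indexes side by side. Every repository with at least one revision is fed to both writers, and both are committed only at the end. The sketch below reproduces that control flow with the standard library. It assumes JSON files in a directory stand in for the two Whoosh indexes, and a dict of hypothetical repo records stands in for `self.repo_paths`. The document field names (`repository`, `path`, `raw_id`) are illustrative only, and nothing here reflects Whoosh's on-disk format.

```python
import json
import os
import shutil
import tempfile
from typing import Dict, List

# Hypothetical repo record: {"files": [...paths...], "changesets": [...ids...]}
Repo = Dict[str, List[str]]


def build_indexes(index_location: str, repos: Dict[str, Repo]) -> Dict[str, int]:
    """Full rebuild: wipe the directory, index every non-empty repo, commit once."""
    if os.path.exists(index_location):
        shutil.rmtree(index_location)  # drop the previous index entirely
    os.mkdir(index_location)

    file_docs: List[dict] = []
    chgset_docs: List[dict] = []
    for repo_name, repo in repos.items():
        if not repo["changesets"]:  # skip repos without any revisions
            continue
        file_docs += [{"repository": repo_name, "path": p}
                      for p in repo["files"]]
        chgset_docs += [{"repository": repo_name, "raw_id": c}
                        for c in repo["changesets"]]

    # Both "indexes" are written only after every repo has been processed,
    # in the spirit of the two commit(merge=True) calls at the end.
    for name, docs in (("files.json", file_docs),
                       ("changesets.json", chgset_docs)):
        with open(os.path.join(index_location, name), "w") as fh:
            json.dump(docs, fh)
    return {"files": len(file_docs), "changesets": len(chgset_docs)}


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    repos = {"r1": {"files": ["a.py", "b.py"], "changesets": ["c1"]},
             "empty": {"files": [], "changesets": []}}
    print(build_indexes(os.path.join(base, "idx"), repos))  # {'files': 2, 'changesets': 1}
    shutil.rmtree(base)
```

Wiping the directory first keeps the rebuild simple. The trade-off is that a failure part-way through leaves search without its previous index until the next successful build.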
2388
a0ef98f2520b #453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents: 2373
435
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
436     def update_indexes(self):
437         self.update_file_index()
438         self.update_changeset_index()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
439
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
440     def run(self, full_index=False):
441         """Run daemon"""
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
442         if full_index or self.initial:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
443             self.build_indexes()
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
444         else:
2640
5f21a9dcb09d create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents: 2569
445             self.update_indexes()
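`run` only dispatches. It performs a full build when `full_index` is requested or when `self.initial` is set, and runs the two incremental updates otherwise. The commit that introduced line 442 (e01a85f9fc90, "Build full index on first run even with incremental flag") indicates that `initial` marks a first run. The sketch below assumes `initial` simply means "no index exists yet". `IndexDaemonSketch` and its `index_exists` parameter are hypothetical. The real indexing methods are replaced by stubs that only record which branch was taken.

```python
from typing import List


class IndexDaemonSketch:
    """Hypothetical stand-in that keeps only the run() dispatch logic."""

    def __init__(self, index_exists: bool) -> None:
        # Assumption: a first run is one where no index exists yet.
        self.initial = not index_exists
        self.calls: List[str] = []

    # Stubs that only record which branch was taken.
    def build_indexes(self) -> None:
        self.calls.append("build_indexes")

    def update_file_index(self) -> None:
        self.calls.append("update_file_index")

    def update_changeset_index(self) -> None:
        self.calls.append("update_changeset_index")

    def update_indexes(self) -> None:
        self.update_file_index()
        self.update_changeset_index()

    def run(self, full_index: bool = False) -> None:
        """Full rebuild when requested or on a first run, else incremental."""
        if full_index or self.initial:
            self.build_indexes()
        else:
            self.update_indexes()


if __name__ == "__main__":
    for exists, full in ((False, False), (True, False), (True, True)):
        daemon = IndexDaemonSketch(index_exists=exists)
        daemon.run(full_index=full)
        print(exists, full, daemon.calls)
```

With this dispatch, asking for an incremental run when no index exists yet still produces a full build, because there is nothing to compare stored modification times against.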