annotate rhodecode/lib/indexers/daemon.py @ 4116:ffd45b185016 rhodecode-2.2.5-gpl
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
This imports changes between changesets 21af6c4eab3d and 6177597791c2 in
RhodeCode's original repository, including only changes to Python files and HTML.
RhodeCode clearly licensed its changes to these files under GPLv3
in their /LICENSE file, which states the following:
The Python code and integrated HTML are licensed under the GPLv3 license.
(See:
https://code.rhodecode.com/rhodecode/files/v2.2.5/LICENSE
or
http://web.archive.org/web/20140512193334/https://code.rhodecode.com/rhodecode/files/f3b123159901f15426d18e3dc395e8369f70ebe0/LICENSE
for an online copy of that LICENSE file)
Conservancy reviewed these changes and confirmed that they can be licensed as
a whole to the Kallithea project under GPLv3-only.
While some of the contents committed herein are clearly licensed
GPLv3-or-later, on the whole we must assume they are GPLv3-only, since the
statement above from RhodeCode indicates that they intend GPLv3-only as their
license, per GPLv3§14 and other relevant sections of GPLv3.
author | Bradley M. Kuhn <bkuhn@sfconservancy.org> |
---|---|
date | Wed, 02 Jul 2014 19:03:13 -0400 |
parents | 5293d4bbb1ea |
children | e9f6b533a8f6 |
rev | line source |
---|---|
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
1 # -*- coding: utf-8 -*- |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
2 # This program is free software: you can redistribute it and/or modify |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
3 # it under the terms of the GNU General Public License as published by |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
4 # the Free Software Foundation, either version 3 of the License, or |
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
5 # (at your option) any later version. |
947
99850ac883d1
Fixed whoosh daemon, for depracated walk method
Marcin Kuzminski <marcin@python-works.com>
parents:
902
diff
changeset
|
6 # |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
7 # This program is distributed in the hope that it will be useful, |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
10 # GNU General Public License for more details. |
947
99850ac883d1
Fixed whoosh daemon, for depracated walk method
Marcin Kuzminski <marcin@python-works.com>
parents:
902
diff
changeset
|
11 # |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
12 # You should have received a copy of the GNU General Public License |
1206
a671db5bdd58
fixed license issue #149
Marcin Kuzminski <marcin@python-works.com>
parents:
1183
diff
changeset
|
13 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
14 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
15 rhodecode.lib.indexers.daemon |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
17 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
18 A daemon will read from task table and run tasks |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
19 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
20 :created_on: Jan 26, 2010 |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
21 :author: marcink |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
22 :copyright: (c) 2013 RhodeCode GmbH. |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
23 :license: GPLv3, see LICENSE for more details. |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
24 """ |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
25 |
2641
cfcd981d6679
import with_statment to make daemon.py python 2.5 compatible
Indra Talip <indra.talip@gmail.com>
parents:
2640
diff
changeset
|
26 from __future__ import with_statement |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
27 |
1154
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
28 import os |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
29 import sys |
1154
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
30 import logging |
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
31 import traceback |
1154
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
32 |
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
33 from shutil import rmtree |
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
34 from time import mktime |
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
35 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
36 from os.path import dirname as dn |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
37 from os.path import join as jn |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
38 |
547
1e757ac98988
renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents:
497
diff
changeset
|
39 #to get the rhodecode import |
411
9b67cebe6609
some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents:
407
diff
changeset
|
40 project_path = dn(dn(dn(dn(os.path.realpath(__file__))))) |
9b67cebe6609
some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents:
407
diff
changeset
|
41 sys.path.append(project_path) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
42 |
2109 | 43 from rhodecode.config.conf import INDEX_EXTENSIONS |
691
7486da5f0628
Refactor codes for scm model
Marcin Kuzminski <marcin@python-works.com>
parents:
683
diff
changeset
|
44 from rhodecode.model.scm import ScmModel |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
45 from rhodecode.model.db import Repository |
3016
b3c8a3a5ce5f
fixed issues some people had with whoosh indexers and unicode decode
Marcin Kuzminski <marcin@python-works.com>
parents:
2841
diff
changeset
|
46 from rhodecode.lib.utils2 import safe_unicode, safe_str |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
47 from rhodecode.lib.indexers import SCHEMA, IDX_NAME, CHGSETS_SCHEMA, \ |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
48 CHGSET_IDX_NAME |
411
9b67cebe6609
some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents:
407
diff
changeset
|
49 |
2007
324ac367a4da
Added VCS into rhodecode core for faster and easier deployments of new versions
Marcin Kuzminski <marcin@python-works.com>
parents:
1995
diff
changeset
|
50 from rhodecode.lib.vcs.exceptions import ChangesetError, RepositoryError, \ |
1711
b369bec5d468
fixes issue with whoosh reindexing files that were removed or renamed
Marcin Kuzminski <marcin@python-works.com>
parents:
1451
diff
changeset
|
51 NodeDoesNotExistError |
560
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
52 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
53 from whoosh.index import create_in, open_dir, exists_in |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
54 from whoosh.query import * |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
55 from whoosh.qparser import QueryParser |
1154
36fe593dfe4b
simplified str2bool, and moved safe_unicode out of helpers since it was not html specific function
Marcin Kuzminski <marcin@python-works.com>
parents:
1036
diff
changeset
|
56 |
2101 | 57 log = logging.getLogger('whoosh_indexer') |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
58 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
59 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
60 class WhooshIndexingDaemon(object): |
560
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
61 """ |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
62 Daemon for atomic indexing jobs |
560
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
63 """ |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
64 |
1995
b6c902d88472
bumbed whoosh to 2.3.X series
Marcin Kuzminski <marcin@python-works.com>
parents:
1824
diff
changeset
|
65 def __init__(self, indexname=IDX_NAME, index_location=None, |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
66 repo_location=None, sa=None, repo_list=None, |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
67 repo_update_list=None): |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
68 self.indexname = indexname |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
69 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
70 self.index_location = index_location |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
71 if not index_location: |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
72 raise Exception('You have to provide index location') |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
73 |
411
9b67cebe6609
some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents:
407
diff
changeset
|
74 self.repo_location = repo_location |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
75 if not repo_location: |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
76 raise Exception('You have to provide repositories location') |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
77 |
1036
405b80e4ccd5
Major refactoring, removed when possible calls to app globals.
Marcin Kuzminski <marcin@python-works.com>
parents:
947
diff
changeset
|
78 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location) |
894
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
79 |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
80 #filter repo list |
894
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
81 if repo_list: |
2841
2fa3c09f63e0
fixed problems with re-indexing non-ascii names of repositories
Marcin Kuzminski <marcin@python-works.com>
parents:
2840
diff
changeset
|
82 #Fix non-ascii repo names to unicode |
2fa3c09f63e0
fixed problems with re-indexing non-ascii names of repositories
Marcin Kuzminski <marcin@python-works.com>
parents:
2840
diff
changeset
|
83 repo_list = map(safe_unicode, repo_list) |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
84 self.filtered_repo_paths = {} |
894
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
85 for repo_name, repo in self.repo_paths.items(): |
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
86 if repo_name in repo_list: |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
87 self.filtered_repo_paths[repo_name] = repo |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
88 |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
89 self.repo_paths = self.filtered_repo_paths |
894
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
90 |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
91 #filter update repo list |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
92 self.filtered_repo_update_paths = {} |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
93 if repo_update_list: |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
94 self.filtered_repo_update_paths = {} |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
95 for repo_name, repo in self.repo_paths.items(): |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
96 if repo_name in repo_update_list: |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
97 self.filtered_repo_update_paths[repo_name] = repo |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
98 self.repo_paths = self.filtered_repo_update_paths |
894
1fed3c9161bb
fixes #90 + docs update
Marcin Kuzminski <marcin@python-works.com>
parents:
886
diff
changeset
|
99 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
100 self.initial = True |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
101 if not os.path.isdir(self.index_location): |
763
0dad296d2a57
extended trending languages to more entries, implemented new faster and "fancy"
Marcin Kuzminski <marcin@python-works.com>
parents:
691
diff
changeset
|
102 os.makedirs(self.index_location) |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
103 log.info('Cannot run incremental index since it does not ' |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
104 'yet exist running full build') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
105 elif not exists_in(self.index_location, IDX_NAME): |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
106 log.info('Running full index build as the file content ' |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
107 'index does not exist') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
108 elif not exists_in(self.index_location, CHGSET_IDX_NAME): |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
109 log.info('Running full index build as the changeset ' |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
110 'index does not exist') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
111 else: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
112 self.initial = False |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
113 |
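The filtering step in `__init__` above keeps only the scanned repositories whose names appear in a supplied list, and leaves the mapping untouched when no list is given. A minimal standalone sketch of that idea, assuming a plain dict of repository name to repository object (`filter_repos` and the sample names are hypothetical and not part of RhodeCode):

```python
def filter_repos(repo_paths, wanted_names):
    """Return the entries of repo_paths whose key is in wanted_names.

    An empty or missing wanted_names means "no filter": a copy of the
    whole mapping is returned, mirroring the `if repo_list:` guard.
    """
    if not wanted_names:
        return dict(repo_paths)
    wanted = set(wanted_names)  # set membership avoids repeated list scans
    return dict((name, repo) for name, repo in repo_paths.items()
                if name in wanted)


# Hypothetical sample data; object() stands in for a repository instance.
scanned = {u'alpha': object(), u'beta': object(), u'gamma': object()}
selected = filter_repos(scanned, [u'alpha', u'gamma', u'missing'])
print(sorted(selected))
```

Names that are requested but were not found by the scan (here `missing`) are silently dropped, which matches the behaviour of the loop above.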
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
114 def _get_index_revision(self, repo): |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
115 db_repo = Repository.get_by_repo_name(repo.name_unicode) |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
116 landing_rev = 'tip' |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
117 if db_repo: |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
118 _rev_type, _rev = db_repo.landing_rev |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
119 landing_rev = _rev |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
120 return landing_rev |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
121 |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
122 def _get_index_changeset(self, repo, index_rev=None): |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
123 if not index_rev: |
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
124 index_rev = self._get_index_revision(repo) |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
125 cs = repo.get_changeset(index_rev) |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
126 return cs |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
127 |
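The two helpers above pick the revision to index: the repository's configured landing revision when a database record exists, otherwise `'tip'`. A small sketch of that fallback, assuming the record exposes `landing_rev` as a `(type, name)` pair as the unpacking above suggests (`FakeRepoRecord` and `pick_index_revision` are hypothetical stand-ins):

```python
class FakeRepoRecord(object):
    """Hypothetical stand-in for a database row with a landing_rev pair."""

    def __init__(self, rev_type, rev):
        self.landing_rev = (rev_type, rev)


def pick_index_revision(db_repo, default='tip'):
    """Return the configured landing revision, or the default if unknown."""
    if db_repo is None:
        return default
    _rev_type, rev = db_repo.landing_rev  # only the name is needed
    return rev


print(pick_index_revision(None))
print(pick_index_revision(FakeRepoRecord('branch', 'stable')))
```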
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
128 def get_paths(self, repo): |
2101 | 129 """ |
130 recursive walk in root dir and return a set of all paths in that dir | |
560
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
131 based on repository walk function |
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
132 """ |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
133 index_paths_ = set() |
567
80dc0a23edf7
fixed whoosh failure on new repository
Marcin Kuzminski <marcin@python-works.com>
parents:
561
diff
changeset
|
134 try: |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
135 cs = self._get_index_changeset(repo) |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
136 for _topnode, _dirs, files in cs.walk('/'): |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
137 for f in files: |
3016
b3c8a3a5ce5f
fixed issues some people had with whoosh indexers and unicode decode
Marcin Kuzminski <marcin@python-works.com>
parents:
2841
diff
changeset
|
138 index_paths_.add(jn(safe_str(repo.path), safe_str(f.path))) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
139 |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
140 except RepositoryError: |
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
141 log.debug(traceback.format_exc()) |
567
80dc0a23edf7
fixed whoosh failure on new repository
Marcin Kuzminski <marcin@python-works.com>
parents:
561
diff
changeset
|
142 pass |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
143 return index_paths_ |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
144 |
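`get_paths` above iterates over `(topnode, dirs, files)` triples, the same shape that `os.walk` yields for a filesystem tree. The sketch below illustrates the collection pattern with `os.walk` on a temporary directory. It is an analogy only: the daemon walks a changeset, not the filesystem, and `collect_file_paths` is a hypothetical name.

```python
import os
import tempfile


def collect_file_paths(root):
    """Walk root recursively and return the set of all file paths under it."""
    found = set()
    for topdir, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.join(topdir, name))
    return found


# Build a tiny throwaway tree so the example is self-contained.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'docs'))
for rel in ('setup.py', os.path.join('docs', 'index.rst')):
    with open(os.path.join(demo_root, rel), 'w') as fh:
        fh.write('demo\n')

paths = collect_file_paths(demo_root)
print(len(paths))
```

Using a set rather than a list makes it cheap to compare the paths found now against the paths already stored in an index.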
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
145 def get_node(self, repo, path, index_rev=None): |
3921
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
146 """ |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
147 gets a filenode based on given full path. It operates on string for |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
148 hg git compatibility. |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
149 |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
150 :param repo: scm repo instance |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
151 :param path: full path including root location |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
152 :return: FileNode |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
153 """ |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
154 root_path = safe_str(repo.path)+'/' |
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
155 parts = safe_str(path).partition(root_path) |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
156 cs = self._get_index_changeset(repo, index_rev=index_rev) |
3921
932c84e8fa92
fixed #851 and #563 make-index crashes on non-ascii files
Marcin Kuzminski <marcin@python-works.com>
parents:
3916
diff
changeset
|
157 node = cs.get_node(parts[-1]) |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
158 return node |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
159 |
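`get_node` above turns a full path into a repository-relative one by calling `str.partition` with the repository root plus a trailing slash and taking the last element. A minimal sketch of that step in isolation (`repo_relative_path` and the sample paths are hypothetical):

```python
def repo_relative_path(repo_root, full_path):
    """Strip the repository root prefix from full_path."""
    root_prefix = repo_root + '/'
    # partition() splits at the first occurrence of the separator and
    # returns (before, separator, after).
    _before, _sep, tail = full_path.partition(root_prefix)
    return tail


print(repo_relative_path('/srv/repos/demo', '/srv/repos/demo/docs/readme.rst'))
print(repr(repo_relative_path('/srv/repos/demo', '/elsewhere/readme.rst')))
```

When the root prefix does not occur in the path, `partition` returns the whole string followed by two empty strings, so the last element is empty. A caller that may receive paths from outside the repository would need to handle that case.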
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
160 def get_node_mtime(self, node): |
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
161 return mktime(node.last_changeset.date.timetuple()) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
162 |
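`get_node_mtime` above converts the date of a node's last changeset into a Unix timestamp with `mktime(date.timetuple())`. A short sketch of that conversion on a plain `datetime`, assuming the date is a naive value in local time (`to_timestamp` and the sample date are illustrative):

```python
from datetime import datetime
from time import mktime


def to_timestamp(dt):
    """Convert a naive local-time datetime to seconds since the epoch."""
    # timetuple() drops microseconds; mktime() interprets the tuple as
    # local time and returns a float.
    return mktime(dt.timetuple())


committed = datetime(2014, 7, 2, 19, 3, 13)
stamp = to_timestamp(committed)
print(datetime.fromtimestamp(stamp))
```

The numeric value depends on the local timezone of the machine running the code, so timestamps produced this way are only comparable when they are computed on hosts with the same timezone settings.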
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
163 def add_doc(self, writer, path, repo, repo_name, index_rev=None): |
2101 | 164 """ |
165 Adding doc to writer; this function itself fetches data from | |
166 the instance of vcs backend | |
167 """ | |
168 | |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
169 node = self.get_node(repo, path, index_rev) |
2109 | 170 indexed = indexed_w_content = 0 |
2101 | 171 # we just index the content of chosen files, and skip binary files |
886
0736230c7d91
#92 removed content of binary files for whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
885
diff
changeset
|
172 if node.extension in INDEX_EXTENSIONS and not node.is_binary: |
560
3072935bdeed
rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents:
557
diff
changeset
|
173 u_content = node.content |
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
174 if not isinstance(u_content, unicode): |
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
175 log.warning(' >> %s Could not get this content as unicode ' |
2101 | 176 'replacing with empty content' % path) |
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
177 u_content = u'' |
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
178 else: |
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
179 log.debug(' >> %s [WITH CONTENT]' % path) |
2109 | 180 indexed_w_content += 1 |
885
94f7585af8a1
fixes to #92, updated changelog
Marcin Kuzminski <marcin@python-works.com>
parents:
777
diff
changeset
|
181 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
182 else: |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
411
diff
changeset
|
183 log.debug(' >> %s' % path) |
2101 | 184 # just index file name without its content |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
411
diff
changeset
|
185 u_content = u'' |
2109 | 186 indexed += 1 |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
187 |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
188 p = safe_unicode(path) |
2101 | 189 writer.add_document( |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
190 fileid=p, |
2101 | 191 owner=unicode(repo.contact), |
192 repository=safe_unicode(repo_name), | |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
193 path=p, |
2101 | 194 content=u_content, |
195 modtime=self.get_node_mtime(node), | |
196 extension=node.extension | |
197 ) | |
2109 | 198 return indexed, indexed_w_content |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
199 |
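The branching in `add_doc` decides what text is indexed and which counter is incremented. The sketch below restates that decision in Python 3 with the Whoosh writer left out; `choose_content` and the small `INDEX_EXTENSIONS` set are hypothetical stand-ins, and `str` takes the place of Python 2's `unicode`.

```python
# Hypothetical subset; the real INDEX_EXTENSIONS is defined elsewhere.
INDEX_EXTENSIONS = {"py", "txt", "rst"}


def choose_content(extension, is_binary, content):
    """Return (text_to_index, indexed, indexed_w_content) for one file."""
    if extension in INDEX_EXTENSIONS and not is_binary:
        if not isinstance(content, str):
            # Content could not be obtained as text: index an empty body.
            # As in the listing above, neither counter is incremented here.
            return "", 0, 0
        return content, 0, 1
    # Other files are indexed by name only.
    return "", 1, 0


print(choose_content("py", False, "x = 1"))
print(choose_content("png", True, b"\x89PNG"))
print(choose_content("py", False, b"\xff\xfe"))
```

One consequence worth noting: a file with an indexable extension whose content is not text still gets a document, but it is counted in neither total.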
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
200 def index_changesets(self, writer, repo_name, repo, start_rev=None): |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
201 """ |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
202 Add all changesets in the vcs repo starting at start_rev |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
203 to the index writer |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
204 |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
205 :param writer: the whoosh index writer to add to |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
206 :param repo_name: name of the repository from whence the |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
207 changeset originates including the repository group |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
208 :param repo: the vcs repository instance to index changesets for, |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
209 the presumption is the repo has changesets to index |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
210 :param start_rev=None: the full sha id to start indexing from |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
211 if start_rev is None then index from the first changeset in |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
212 the repo |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
213 """ |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
214 |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
215 if start_rev is None: |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
216 start_rev = repo[0].raw_id |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
217 |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
218 log.debug('indexing changesets in %s starting at rev: %s' % |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
219 (repo_name, start_rev)) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
220 |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
221 indexed = 0 |
3922
d8e02de53bbc
show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents:
3921
diff
changeset
|
222 cs_iter = repo.get_changesets(start=start_rev) |
d8e02de53bbc
show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents:
3921
diff
changeset
|
223 total = len(cs_iter) |
d8e02de53bbc
show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents:
3921
diff
changeset
|
224 for cs in cs_iter: |
d8e02de53bbc
show number of revisions to parse in whoosh indexing deamon logging
Marcin Kuzminski <marcin@python-works.com>
parents:
3921
diff
changeset
|
225 log.debug(' >> %s/%s' % (cs, total)) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
226 writer.add_document( |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2641
diff
changeset
|
227 raw_id=unicode(cs.raw_id), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
228 owner=unicode(repo.contact), |
2693
66c778b8cb54
Extended commit search schema with date of commit
Marcin Kuzminski <marcin@python-works.com>
parents:
2648
diff
changeset
|
229 date=cs._timestamp, |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
230 repository=safe_unicode(repo_name), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
231 author=cs.author, |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
232 message=cs.message, |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
233 last=cs.last, |
2763
81624c8a1035
#548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
234 added=u' '.join([safe_unicode(node.path) for node in cs.added]).lower(), |
81624c8a1035
#548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
235 removed=u' '.join([safe_unicode(node.path) for node in cs.removed]).lower(), |
81624c8a1035
#548 Fixed issue with non-ascii paths in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2693
diff
changeset
|
236 changed=u' '.join([safe_unicode(node.path) for node in cs.changed]).lower(), |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
237 parents=u' '.join([p.raw_id for p in cs.parents]), |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
238 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
239 indexed += 1 |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
240 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
241 log.debug('indexed %d changesets for repo %s' % (indexed, repo_name)) |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
242 return indexed |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
243 |
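The way `index_changesets` flattens each changeset into text fields can be illustrated without a repository or an index. This is a Python 3 sketch under stated assumptions: the `Changeset` namedtuple and `changeset_fields` are hypothetical, paths are plain strings rather than node objects, and only some of the fields are shown.

```python
from collections import namedtuple

# Hypothetical stand-in for a vcs changeset object.
Changeset = namedtuple(
    "Changeset", "raw_id message added removed changed parents"
)


def changeset_fields(cs):
    """Build the text fields that would be passed to the index writer."""

    def join_paths(paths):
        # Paths become one space-separated, lower-cased string.
        return " ".join(paths).lower()

    return {
        "raw_id": cs.raw_id,
        "message": cs.message,
        "added": join_paths(cs.added),
        "removed": join_paths(cs.removed),
        "changed": join_paths(cs.changed),
        # A distinct loop variable avoids shadowing the changeset itself.
        "parents": " ".join(p.raw_id for p in cs.parents),
    }


root = Changeset("a1", "initial", ["README.rst"], [], [], [])
child = Changeset("b2", "edit", [], [], ["README.rst", "Docs/Index.rst"], [root])
fields = changeset_fields(child)
print(fields["changed"])
print(fields["parents"])
```

Lower-casing the path fields means that a search on them has to use lower-cased terms as well, unless the schema's analyzer already normalizes case.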
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
244 def index_files(self, file_idx_writer, repo_name, repo): |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
245 """ |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
246 Index files for given repo_name |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
247 |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
248 :param file_idx_writer: the whoosh index writer to add to |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
249 :param repo_name: name of the repository we're indexing |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
250 :param repo: instance of vcs repo |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
251 """ |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
252 i_cnt = iwc_cnt = 0 |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
253 log.debug('building index for %s @revision:%s' % (repo.path, |
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
254 self._get_index_revision(repo))) |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
255 index_rev = self._get_index_revision(repo) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
256 for idx_path in self.get_paths(repo): |
4116
ffd45b185016
Imported some of the GPLv3'd changes from RhodeCode v2.2.5.
Bradley M. Kuhn <bkuhn@sfconservancy.org>
parents:
3960
diff
changeset
|
257 i, iwc = self.add_doc(file_idx_writer, idx_path, repo, repo_name, index_rev) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
258 i_cnt += i |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
259 iwc_cnt += iwc |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
260 |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
261 log.debug('added %s files %s with content for repo %s' % |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
262 (i_cnt + iwc_cnt, iwc_cnt, repo.path)) |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
263 return i_cnt, iwc_cnt |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
264 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
265 def update_changeset_index(self): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
266 idx = open_dir(self.index_location, indexname=CHGSET_IDX_NAME) |
2569
b98fd6fc67f9
Little better logging in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2388
diff
changeset
|
267 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
268 with idx.searcher() as searcher: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
269 writer = idx.writer() |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
270 writer_is_dirty = False |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
271 try: |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
272 indexed_total = 0 |
2839
c0ddc86b4654
Fix possible exception about repo_name not defined, on whoosh indexer when using index-only option
Marcin Kuzminski <marcin@python-works.com>
parents:
2763
diff
changeset
|
273 repo_name = None |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
274 for repo_name, repo in self.repo_paths.items(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
275 # skip indexing if there aren't any revs in the repo |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
276 num_of_revs = len(repo) |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
277 if num_of_revs < 1: |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
278 continue |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
279 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
280 qp = QueryParser('repository', schema=CHGSETS_SCHEMA) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
281 q = qp.parse(u"last:t AND %s" % repo_name) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
282 |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
283 results = searcher.search(q) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
284 |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
285 # default to scanning the entire repo |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
286 last_rev = 0 |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
287 start_id = None |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
288 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
289 if len(results) > 0: |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
290 # assuming that there is only one result, if not this |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
291 # may require a full re-index. |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
292 start_id = results[0]['raw_id'] |
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
293 last_rev = repo.get_changeset(revision=start_id).revision |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
294 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
295 # there are new changesets to index or a new repo to index |
2643
2ad50c44b025
when indexing changesets use the raw_id to locate the point from
Indra Talip <indra.talip@gmail.com>
parents:
2642
diff
changeset
|
296 if last_rev == 0 or num_of_revs > last_rev + 1: |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
297 # delete the docs in the index for the previous |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
298 # last changeset(s) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
299 for hit in results: |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
300 q = qp.parse(u"last:t AND %s AND raw_id:%s" % |
2642
88b0e82bcba4
rename changeset index key to match raw_id rather than path for greater consistency
Indra Talip <indra.talip@gmail.com>
parents:
2641
diff
changeset
|
301 (repo_name, hit['raw_id'])) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
302 writer.delete_by_query(q) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
303 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
304 # index from the previous last changeset + all new ones |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
305 indexed_total += self.index_changesets(writer, |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
306 repo_name, repo, start_id) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
307 writer_is_dirty = True |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
308 log.debug('indexed %s changesets for repo %s' % ( |
3916
ba08786c49ef
fixed #850 Whoosh indexer should use the default revision flag to make index
Marcin Kuzminski <marcin@python-works.com>
parents:
3016
diff
changeset
|
309 indexed_total, repo_name) |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
310 ) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
311 finally: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
312 if writer_is_dirty: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
313 log.debug('>> COMMITTING CHANGES TO CHANGESET INDEX<<') |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
314 writer.commit(merge=True) |
2840
c7c5825299fe
fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2839
diff
changeset
|
315 log.debug('>>> FINISHED REBUILDING CHANGESET INDEX <<<') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
316 else: |
2840
c7c5825299fe
fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2839
diff
changeset
|
317 log.debug('>> NOTHING TO COMMIT TO CHANGESET INDEX<<') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
318 |
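The test that `update_changeset_index` uses to decide whether a repository has anything new to index can be looked at on its own. The helper below is a hypothetical Python 3 restatement of that single condition; `needs_indexing` does not exist in the module.

```python
def needs_indexing(num_of_revs, last_rev):
    """Mirror the condition: last_rev == 0 or num_of_revs > last_rev + 1.

    num_of_revs: number of changesets in the repository.
    last_rev: revision number of the changeset flagged as last in the
        index, or 0 when nothing was found in the index.
    """
    return last_rev == 0 or num_of_revs > last_rev + 1


# Nothing indexed yet: scan the whole repository.
assert needs_indexing(5, 0)
# Index is current: revisions 0..4 exist and revision 4 is the stored last.
assert not needs_indexing(5, 4)
# One new changeset arrived since the last run.
assert needs_indexing(6, 4)
# Revision 0 as the stored last is indistinguishable from "nothing found".
assert needs_indexing(1, 0)
```

As the last case suggests, a repository whose indexed tip is revision 0 appears to be re-indexed on every run, since `last_rev == 0` doubles as the "no result" default. The cost is small for a single changeset.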
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
319 def update_file_index(self): |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
320 log.debug((u'STARTING INCREMENTAL INDEXING UPDATE FOR EXTENSIONS %s ' |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
321 'AND REPOS %s') % (INDEX_EXTENSIONS, self.repo_paths.keys())) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
322 |
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
323 idx = open_dir(self.index_location, indexname=self.indexname) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
324 # The set of all paths in the index |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
325 indexed_paths = set() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
326 # The set of all paths we need to re-index |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
327 to_index = set() |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
328 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
329 writer = idx.writer() |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
330 writer_is_dirty = False |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
331 try: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
332 with idx.reader() as reader: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
333 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
334 # Loop over the stored fields in the index |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
335 for fields in reader.all_stored_fields(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
336 indexed_path = fields['path'] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
337 indexed_repo_path = fields['repository'] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
338 indexed_paths.add(indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
339 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
340 if indexed_repo_path not in self.filtered_repo_update_paths: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
341 continue |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
342 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
343 repo = self.repo_paths[indexed_repo_path] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
344 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
345 try: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
346 node = self.get_node(repo, indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
347 # Check if this file was changed since it was indexed |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
348 indexed_time = fields['modtime'] |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
349 mtime = self.get_node_mtime(node) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
350 if mtime > indexed_time: |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
351 # The file has changed, delete it and add it to |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
352 # the list of files to reindex |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
353 log.debug( |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
354 'adding to reindex list %s mtime: %s vs %s' % ( |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
355 indexed_path, mtime, indexed_time) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
356 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
357 writer.delete_by_term('fileid', indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
358 writer_is_dirty = True |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
359 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
360 to_index.add(indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
361 except (ChangesetError, NodeDoesNotExistError): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
362 # This file was deleted since it was indexed |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
363 log.debug('removing from index %s' % indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
364 writer.delete_by_term('path', indexed_path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
365 writer_is_dirty = True |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
366 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
367 # Loop over the files in each repository; |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
368 # self.get_paths(repo) gathers the filenames of the |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
369 # documents to be indexed |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
370 ri_cnt_total = 0 # indexed |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
371 riwc_cnt_total = 0 # indexed with content |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
372 for repo_name, repo in self.repo_paths.items(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
373 # skip indexing if there aren't any revisions |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
374 if len(repo) < 1: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
375 continue |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
376 ri_cnt = 0 # indexed |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
377 riwc_cnt = 0 # indexed with content |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
378 for path in self.get_paths(repo): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
379 path = safe_unicode(path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
380 if path in to_index or path not in indexed_paths: |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
381 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
382 # This is either a file that's changed, or a new file |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
383 # that wasn't indexed before. So index it! |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
384 i, iwc = self.add_doc(writer, path, repo, repo_name) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
385 writer_is_dirty = True |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
386 log.debug('reindexing %s' % path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
387 ri_cnt += i |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
388 ri_cnt_total += 1 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
389 riwc_cnt += iwc |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
390 riwc_cnt_total += iwc |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
391 log.debug('added %s files %s with content for repo %s' % ( |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
392 ri_cnt + riwc_cnt, riwc_cnt, repo.path) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
393 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
394 log.debug('indexed %s files in total and %s with content' % ( |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
395 ri_cnt_total, riwc_cnt_total) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
396 ) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
397 finally: |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
398 if writer_is_dirty: |
2840
c7c5825299fe
fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2839
diff
changeset
|
399 log.debug('>> COMMITTING CHANGES TO FILE INDEX <<') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
400 writer.commit(merge=True) |
2840
c7c5825299fe
fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2839
diff
changeset
|
401 log.debug('>>> FINISHED REBUILDING FILE INDEX <<<') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
402 else: |
2840
c7c5825299fe
fixed logging messages on whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents:
2839
diff
changeset
|
403 log.debug('>> NOTHING TO COMMIT TO FILE INDEX <<') |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
404 writer.cancel() |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
405 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
406 def build_indexes(self): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
407 if os.path.exists(self.index_location): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
408 log.debug('removing previous index') |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
409 rmtree(self.index_location) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
410 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
411 if not os.path.exists(self.index_location): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
412 os.mkdir(self.index_location) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
413 |
2648
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
414 chgset_idx = create_in(self.index_location, CHGSETS_SCHEMA, |
0911cf6940af
little code cleanup
Marcin Kuzminski <marcin@python-works.com>
parents:
2643
diff
changeset
|
415 indexname=CHGSET_IDX_NAME) |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
416 chgset_idx_writer = chgset_idx.writer() |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
417 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
418 file_idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
419 file_idx_writer = file_idx.writer() |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
420 log.debug('BUILDING INDEX FOR EXTENSIONS %s ' |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
421 'AND REPOS %s' % (INDEX_EXTENSIONS, self.repo_paths.keys())) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
422 |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
423 for repo_name, repo in self.repo_paths.items(): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
424 # skip indexing if there aren't any revisions |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
425 if len(repo) < 1: |
2373
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
426 continue |
1828eb7fa688
#469 added --update-only option to whoosh to re-index only given list
Marcin Kuzminski <marcin@python-works.com>
parents:
2372
diff
changeset
|
427 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
428 self.index_files(file_idx_writer, repo_name, repo) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
429 self.index_changesets(chgset_idx_writer, repo_name, repo) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
430 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
431 log.debug('>> COMMITTING CHANGES <<') |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
432 file_idx_writer.commit(merge=True) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
433 chgset_idx_writer.commit(merge=True) |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
434 log.debug('>>> FINISHED BUILDING INDEX <<<') |
2388
a0ef98f2520b
#453 added ID field in whoosh SCHEMA that solves the issue of reindexing modified files
Marcin Kuzminski <marcin@python-works.com>
parents:
2373
diff
changeset
|
435 |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
436 def update_indexes(self): |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
437 self.update_file_index() |
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
438 self.update_changeset_index() |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
439 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
440 def run(self, full_index=False): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
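441 """Run daemon: build the indexes from scratch on the first run or when full_index is set, otherwise update them incrementally""" |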
465
e01a85f9fc90
fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents:
452
diff
changeset
|
442 if full_index or self.initial: |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
443 self.build_indexes() |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
444 else: |
2640
5f21a9dcb09d
create an index for commit messages and the ability to search them and see results
Indra Talip <indra.talip@gmail.com>
parents:
2569
diff
changeset
|
445 self.update_indexes() |
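
The incremental path in `update_file_index` above comes down to three decisions per path: drop entries whose file no longer exists, drop and re-add entries whose modification time is newer than the stored `modtime`, and add paths the index has never seen. The following is a minimal sketch of that decision logic only, assuming plain dictionaries stand in for the Whoosh index and the repository. The function `plan_incremental_update` and its arguments are hypothetical names used for illustration and are not part of this module.

```python
def plan_incremental_update(indexed, current):
    """Decide which paths to delete from and (re)add to an index.

    indexed: dict mapping path -> modtime stored in the index (assumed shape)
    current: dict mapping path -> mtime reported by the repository (assumed shape)
    Returns (to_delete, to_index) as two sets of paths.
    """
    to_delete = set()
    to_index = set()
    for path, indexed_time in indexed.items():
        if path not in current:
            # Deleted since it was indexed: drop it from the index.
            to_delete.add(path)
        elif current[path] > indexed_time:
            # Changed since it was indexed: drop the stale entry and reindex.
            to_delete.add(path)
            to_index.add(path)
    # Paths the index has never seen are new and must be indexed as well.
    to_index.update(p for p in current if p not in indexed)
    return to_delete, to_index


indexed = {"a.py": 100, "b.py": 200, "c.py": 300}
current = {"a.py": 100, "b.py": 250, "d.py": 50}
to_delete, to_index = plan_incremental_update(indexed, current)
print(sorted(to_delete))  # ['b.py', 'c.py']
print(sorted(to_index))   # ['b.py', 'd.py']
```

In the module itself the deletions are issued directly through `writer.delete_by_term(...)` and the writer is committed only when `writer_is_dirty` is set; the sketch separates planning from writing purely to keep the example self-contained.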