annotate rhodecode/lib/indexers/daemon.py @ 2031:82a88013a3fd
merge 1.3 into stable
author | Marcin Kuzminski <marcin@python-works.com> |
---|---|
date | Sun, 26 Feb 2012 17:25:09 +0200 |
parents | bf263968da47 324ac367a4da |
children | dc2584ba5fbc |
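
The annotated listing that follows relies on two small conventions. `get_paths` keys every file by an absolute path built with `jn(repo.path, f.path)`, and `get_node` recovers the repository-relative path by slicing off `repo.path` plus one separator. `get_node_mtime` turns a changeset date into a float timestamp with `mktime(date.timetuple())`. Below is a minimal standalone sketch of those conventions using only the standard library. The helper names `to_index_path`, `to_node_path`, and `date_to_mtime`, and the sample repository path, are hypothetical and do not appear in the source.

```python
from datetime import datetime
from os.path import join as jn
from time import mktime


def to_index_path(repo_path, node_path):
    # Hypothetical helper: mirrors jn(repo.path, f.path) in get_paths.
    return jn(repo_path, node_path)


def to_node_path(repo_path, index_path):
    # Hypothetical helper: mirrors path[len(repo.path) + 1:] in get_node.
    # The "+ 1" skips the single separator that join inserted.
    return index_path[len(repo_path) + 1:]


def date_to_mtime(date):
    # Hypothetical helper: mirrors mktime(...date.timetuple()) in
    # get_node_mtime. mktime interprets the tuple as local time.
    return mktime(date.timetuple())


# Sample values, assumed for illustration only.
repo_path = '/srv/repos/demo'
index_path = to_index_path(repo_path, 'docs/index.rst')
node_path = to_node_path(repo_path, index_path)

changeset_date = datetime(2012, 2, 26, 17, 25, 9)
mtime = date_to_mtime(changeset_date)
```

In this sketch the slice only round-trips when `repo_path` has no trailing separator; with a trailing separator it would drop the first character of the relative path.
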
rev | line source |
---|---|
885 | 1 # -*- coding: utf-8 -*- |
885 | 2 """ |
885 | 3 rhodecode.lib.indexers.daemon |
885 | 4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
885 | 5 |
1377 | 6 A daemon will read from task table and run tasks |
947 | 7 |
885 | 8 :created_on: Jan 26, 2010 |
885 | 9 :author: marcink |
1824 | 10 :copyright: (C) 2010-2012 Marcin Kuzminski <marcin@python-works.com> |
885 | 11 :license: GPLv3, see COPYING for more details. |
885 | 12 """ |
1206 | 13 # This program is free software: you can redistribute it and/or modify |
1206 | 14 # it under the terms of the GNU General Public License as published by |
1206 | 15 # the Free Software Foundation, either version 3 of the License, or |
1206 | 16 # (at your option) any later version. |
947 | 17 # |
406 | 18 # This program is distributed in the hope that it will be useful, |
406 | 19 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
406 | 20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
406 | 21 # GNU General Public License for more details. |
947 | 22 # |
406 | 23 # You should have received a copy of the GNU General Public License |
1206 | 24 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
406 | 25 |
1154 | 26 import os |
406 | 27 import sys |
1154 | 28 import logging |
885 | 29 import traceback |
1154 | 30 |
1154 | 31 from shutil import rmtree |
1154 | 32 from time import mktime |
1154 | 33 |
406 | 34 from os.path import dirname as dn |
406 | 35 from os.path import join as jn |
406 | 36 |
547 | 37 #to get the rhodecode import |
411 | 38 project_path = dn(dn(dn(dn(os.path.realpath(__file__))))) |
411 | 39 sys.path.append(project_path) |
406 | 40 |
631 | 41 |
691 | 42 from rhodecode.model.scm import ScmModel |
1154 | 43 from rhodecode.lib import safe_unicode |
631 | 44 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME |
411 | 45 |
2007 | 46 from rhodecode.lib.vcs.exceptions import ChangesetError, RepositoryError, \ |
1711 | 47 NodeDoesNotExistError |
560 | 48 |
1154 | 49 from whoosh.index import create_in, open_dir |
1154 | 50 |
1154 | 51 |
411 | 52 log = logging.getLogger('whooshIndexer') |
483 | 53 # create logger |
483 | 54 log.setLevel(logging.DEBUG) |
491 | 55 log.propagate = False |
483 | 56 # create console handler and set level to debug |
483 | 57 ch = logging.StreamHandler() |
483 | 58 ch.setLevel(logging.DEBUG) |
483 | 59 |
483 | 60 # create formatter |
1183 | 61 formatter = logging.Formatter("%(asctime)s - %(name)s -" |
1183 | 62 " %(levelname)s - %(message)s") |
483 | 63 |
483 | 64 # add formatter to ch |
483 | 65 ch.setFormatter(formatter) |
483 | 66 |
483 | 67 # add ch to logger |
483 | 68 log.addHandler(ch) |
406 | 69 |
1995 | 70 |
406 | 71 class WhooshIndexingDaemon(object): |
560 | 72 """ |
1377 | 73 Daemon for atomic jobs |
560 | 74 """ |
406 | 75 |
1995 | 76 def __init__(self, indexname=IDX_NAME, index_location=None, |
894 | 77 repo_location=None, sa=None, repo_list=None): |
406 | 78 self.indexname = indexname |
631 | 79 |
631 | 80 self.index_location = index_location |
631 | 81 if not index_location: |
631 | 82 raise Exception('You have to provide index location') |
631 | 83 |
411 | 84 self.repo_location = repo_location |
631 | 85 if not repo_location: |
631 | 86 raise Exception('You have to provide repositories location') |
631 | 87 |
1036 | 88 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location) |
894 | 89 |
894 | 90 if repo_list: |
894 | 91 filtered_repo_paths = {} |
894 | 92 for repo_name, repo in self.repo_paths.items(): |
894 | 93 if repo_name in repo_list: |
1171 | 94 filtered_repo_paths[repo_name] = repo |
894 | 95 |
894 | 96 self.repo_paths = filtered_repo_paths |
894 | 97 |
465 | 98 self.initial = False |
631 | 99 if not os.path.isdir(self.index_location): |
763 | 100 os.makedirs(self.index_location) |
465 | 101 log.info('Cannot run incremental index since it does not' |
465 | 102 ' yet exist running full build') |
465 | 103 self.initial = True |
631 | 104 |
561 | 105 def get_paths(self, repo): |
683 | 106 """recursive walk in root dir and return a set of all path in that dir |
560 | 107 based on repository walk function |
560 | 108 """ |
406 | 109 index_paths_ = set() |
567 | 110 try: |
947 | 111 tip = repo.get_changeset('tip') |
947 | 112 for topnode, dirs, files in tip.walk('/'): |
406 | 113 for f in files: |
561 | 114 index_paths_.add(jn(repo.path, f.path)) |
631 | 115 |
885 | 116 except RepositoryError, e: |
885 | 117 log.debug(traceback.format_exc()) |
567 | 118 pass |
631 | 119 return index_paths_ |
631 | 120 |
561 | 121 def get_node(self, repo, path): |
561 | 122 n_path = path[len(repo.path) + 1:] |
561 | 123 node = repo.get_changeset().get_node(n_path) |
561 | 124 return node |
631 | 125 |
561 | 126 def get_node_mtime(self, node): |
561 | 127 return mktime(node.last_changeset.date.timetuple()) |
631 | 128 |
1171 | 129 def add_doc(self, writer, path, repo, repo_name): |
683 | 130 """Adding doc to writer this function itself fetches data from |
683 | 131 the instance of vcs backend""" |
561 | 132 node = self.get_node(repo, path) |
560 | 133 |
886 | 134 #we just index the content of chosen files, and skip binary files |
886 | 135 if node.extension in INDEX_EXTENSIONS and not node.is_binary: |
885 | 136 |
560 | 137 u_content = node.content |
885 | 138 if not isinstance(u_content, unicode): |
885 | 139 log.warning(' >> %s Could not get this content as unicode ' |
885 | 140 'replacing with empty content', path) |
885 | 141 u_content = u'' |
885 | 142 else: |
885 | 143 log.debug(' >> %s [WITH CONTENT]' % path) |
885 | 144 |
406 | 145 else: |
436 | 146 log.debug(' >> %s' % path) |
436 | 147 #just index file name without it's content |
436 | 148 u_content = u'' |
631 | 149 |
560 | 150 writer.add_document(owner=unicode(repo.contact), |
1171 | 151 repository=safe_unicode(repo_name), |
560 | 152 path=safe_unicode(path), |
560 | 153 content=u_content, |
561 | 154 modtime=self.get_node_mtime(node), |
631 | 155 extension=node.extension) |
631 | 156 |
406 | 157 def build_index(self): |
631 | 158 if os.path.exists(self.index_location): |
560 | 159 log.debug('removing previous index') |
631 | 160 rmtree(self.index_location) |
631 | 161 |
631 | 162 if not os.path.exists(self.index_location): |
631 | 163 os.mkdir(self.index_location) |
631 | 164 |
631 | 165 idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME) |
406 | 166 writer = idx.writer() |
894 | 167 |
1171 | 168 for repo_name, repo in self.repo_paths.items(): |
406 | 169 log.debug('building index @ %s' % repo.path) |
631 | 170 |
561 | 171 for idx_path in self.get_paths(repo): |
1171 | 172 self.add_doc(writer, idx_path, repo, repo_name) |
631 | 173 |
561 | 174 log.debug('>> COMMITING CHANGES <<') |
406 | 175 writer.commit(merge=True) |
406 | 176 log.debug('>>> FINISHED BUILDING INDEX <<<') |
631 | 177 |
406 | 178 def update_index(self): |
406 | 179 log.debug('STARTING INCREMENTAL INDEXING UPDATE') |
631 | 180 |
631 | 181 idx = open_dir(self.index_location, indexname=self.indexname) |
406 | 182 # The set of all paths in the index |
406 | 183 indexed_paths = set() |
406 | 184 # The set of all paths we need to re-index |
406 | 185 to_index = set() |
631 | 186 |
406 | 187 reader = idx.reader() |
406 | 188 writer = idx.writer() |
631 | 189 |
406 | 190 # Loop over the stored fields in the index |
406 | 191 for fields in reader.all_stored_fields(): |
406 | 192 indexed_path = fields['path'] |
406 | 193 indexed_paths.add(indexed_path) |
631 | 194 |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
195 repo = self.repo_paths[fields['repository']] |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
196 |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
197 try: |
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
198 node = self.get_node(repo, indexed_path) |
1711
b369bec5d468
fixes issue with whoosh reindexing files that were removed or renamed
Marcin Kuzminski <marcin@python-works.com>
parents:
1451
diff
changeset
|
199 except (ChangesetError, NodeDoesNotExistError): |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
200 # This file was deleted since it was indexed |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
201 log.debug('removing from index %s' % indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
202 writer.delete_by_term('path', indexed_path) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
203 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
204 else: |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
205 # Check if this file was changed since it was indexed |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
206 indexed_time = fields['modtime'] |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
207 mtime = self.get_node_mtime(node) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
208 if mtime > indexed_time: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
209 # The file has changed, delete it and add it to the list of |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
210 # files to reindex |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
211 log.debug('adding to reindex list %s' % indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
212 writer.delete_by_term('path', indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
213 to_index.add(indexed_path) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
214 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
215 # Loop over the files in the filesystem |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
216 # Assume we have a function that gathers the filenames of the |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
217 # documents to be indexed |
1171
2ab211e0aecd
changes for #56
Marcin Kuzminski <marcin@python-works.com>
parents:
1154
diff
changeset
|
218 for repo_name, repo in self.repo_paths.items(): |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
219 for path in self.get_paths(repo): |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
220 if path in to_index or path not in indexed_paths: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
221 # This is either a file that's changed, or a new file |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
222 # that wasn't indexed before. So index it! |
1171
2ab211e0aecd
changes for #56
Marcin Kuzminski <marcin@python-works.com>
parents:
1154
diff
changeset
|
223 self.add_doc(writer, path, repo, repo_name) |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
224 log.debug('re indexing %s' % path) |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
225 |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
226 log.debug('>> COMMITING CHANGES <<') |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
227 writer.commit(merge=True) |
561
5f3b967d9d10
fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents:
560
diff
changeset
|
228 log.debug('>>> FINISHED REBUILDING INDEX <<<') |
631
05528ad948c4
Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents:
629
diff
changeset
|
229 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
230 def run(self, full_index=False): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
231 """Run daemon""" |
465
e01a85f9fc90
fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents:
452
diff
changeset
|
232 if full_index or self.initial: |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
233 self.build_index() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
234 else: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
235 self.update_index() |
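
The incremental pass in `update_index` above comes down to comparing what the index already stores against what the repositories contain now. Below is a minimal sketch of that decision logic only, under stated assumptions: plain dictionaries stand in for the Whoosh reader/writer and the repository objects, and `plan_incremental_update`, `indexed`, and `current` are hypothetical names introduced for illustration, not part of `daemon.py`.

```python
def plan_incremental_update(indexed, current):
    """Decide which paths to delete from, and (re)add to, a search index.

    indexed: dict mapping path -> modification time stored in the index
    current: dict mapping path -> modification time in the repository now
    Returns (to_delete, to_add) as two sets of paths.

    Both arguments are illustrative stand-ins (assumptions), not the
    structures used by the real daemon.
    """
    to_delete = set()
    to_reindex = set()

    # First pass: walk everything the index already knows about.
    for path, indexed_time in indexed.items():
        if path not in current:
            # The file was removed or renamed since it was indexed.
            to_delete.add(path)
        elif current[path] > indexed_time:
            # The file changed: drop the stale entry and queue a re-index.
            to_delete.add(path)
            to_reindex.add(path)

    # Second pass: walk the files that exist now. A path is indexed if it
    # was queued for re-indexing or has never been indexed before.
    to_add = {p for p in current if p in to_reindex or p not in indexed}
    return to_delete, to_add


if __name__ == "__main__":
    indexed = {"a.py": 10, "b.py": 10, "c.py": 10}
    current = {"a.py": 10, "b.py": 20, "d.py": 5}
    to_delete, to_add = plan_incremental_update(indexed, current)
    print(sorted(to_delete), sorted(to_add))
```

In this sketch an unchanged file (`a.py`) is left alone, a modified file (`b.py`) appears in both sets, a removed file (`c.py`) is only deleted, and a new file (`d.py`) is only added. This mirrors the branches of the loop over `reader.all_stored_fields()` and the later loop over `self.repo_paths`.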