annotate rhodecode/lib/indexers/daemon.py @ 561:5f3b967d9d10

fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
author Marcin Kuzminski <marcin@python-works.com>
date Sat, 09 Oct 2010 01:11:44 +0200
parents 3072935bdeed
children 80dc0a23edf7
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
1 #!/usr/bin/env python
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
2 # encoding: utf-8
549
f99075170eb4 more renames for rhode code !!
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
3 # whoosh indexer daemon for rhodecode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
5 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
6 # This program is free software; you can redistribute it and/or
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
7 # modify it under the terms of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
8 # as published by the Free Software Foundation; version 2
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
9 # of the License or (at your opinion) any later version of the license.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
10 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
11 # This program is distributed in the hope that it will be useful,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
14 # GNU General Public License for more details.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
15 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
16 # You should have received a copy of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
17 # along with this program; if not, write to the Free Software
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
19 # MA 02110-1301, USA.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21 Created on Jan 26, 2010
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
22
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 @author: marcink
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
24 A deamon will read from task table and run tasks
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
25 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
26 import sys
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
27 import os
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
28 from os.path import dirname as dn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
29 from os.path import join as jn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
30
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
31 #to get the rhodecode import
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
33 sys.path.append(project_path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
34
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
35 from rhodecode.lib.pidlock import LockHeld, DaemonLock
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
36 from rhodecode.model.hg_model import HgModel
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
37 from rhodecode.lib.helpers import safe_unicode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
38 from whoosh.index import create_in, open_dir
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
39 from shutil import rmtree
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
40 from rhodecode.lib.indexers import INDEX_EXTENSIONS, IDX_LOCATION, SCHEMA, IDX_NAME
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
41
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
42 from time import mktime
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
43 from vcs.exceptions import ChangesetError
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
44
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
45 import logging
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
46
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
47 log = logging.getLogger('whooshIndexer')
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
48 # create logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
49 log.setLevel(logging.DEBUG)
491
fefffd6fd5f4 Added some more tests, rewrite testing schema, to autogenerate fresh db, new index.
Marcin Kuzminski <marcin@python-works.com>
parents: 483
diff changeset
50 log.propagate = False
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
51 # create console handler and set level to debug
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
52 ch = logging.StreamHandler()
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
53 ch.setLevel(logging.DEBUG)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
54
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
55 # create formatter
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
56 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
57
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
58 # add formatter to ch
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
59 ch.setFormatter(formatter)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
60
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
61 # add ch to logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
62 log.addHandler(ch)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
64 def scan_paths(root_location):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
65 return HgModel.repo_scan('/', root_location, None, True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
66
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
67 class WhooshIndexingDaemon(object):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
68 """
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
69 Deamon for atomic jobs
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
70 """
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
71
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
72 def __init__(self, indexname='HG_INDEX', repo_location=None):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
73 self.indexname = indexname
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
74 self.repo_location = repo_location
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
75 self.repo_paths = scan_paths(self.repo_location)
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
76 self.initial = False
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
77 if not os.path.isdir(IDX_LOCATION):
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
78 os.mkdir(IDX_LOCATION)
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
79 log.info('Cannot run incremental index since it does not'
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
80 ' yet exist running full build')
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
81 self.initial = True
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
82
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
83 def get_paths(self, repo):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
84 """
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
85 recursive walk in root dir and return a set of all path in that dir
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
86 based on repository walk function
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
87 """
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
88 index_paths_ = set()
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
89 for topnode, dirs, files in repo.walk('/', 'tip'):
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
90 for f in files:
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
91 index_paths_.add(jn(repo.path, f.path))
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
92 for dir in dirs:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
93 for f in files:
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
94 index_paths_.add(jn(repo.path, f.path))
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
95
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
96 return index_paths_
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
97
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
98 def get_node(self, repo, path):
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
99 n_path = path[len(repo.path) + 1:]
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
100 node = repo.get_changeset().get_node(n_path)
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
101 return node
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
102
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
103 def get_node_mtime(self, node):
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
104 return mktime(node.last_changeset.date.timetuple())
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
105
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
106 def add_doc(self, writer, path, repo):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
107 """Adding doc to writer"""
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
108 node = self.get_node(repo, path)
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
109
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
110 #we just index the content of chosen files
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
111 if node.extension in INDEX_EXTENSIONS:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
112 log.debug(' >> %s [WITH CONTENT]' % path)
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
113 u_content = node.content
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
114 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
115 log.debug(' >> %s' % path)
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
116 #just index file name without it's content
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
117 u_content = u''
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
118
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
119 writer.add_document(owner=unicode(repo.contact),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
120 repository=safe_unicode(repo.name),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
121 path=safe_unicode(path),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
122 content=u_content,
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
123 modtime=self.get_node_mtime(node),
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
124 extension=node.extension)
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
125
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
126
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
127 def build_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
128 if os.path.exists(IDX_LOCATION):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
129 log.debug('removing previous index')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
130 rmtree(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
131
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
132 if not os.path.exists(IDX_LOCATION):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
133 os.mkdir(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
134
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
135 idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
136 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
137
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
138 for cnt, repo in enumerate(self.repo_paths.values()):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
139 log.debug('building index @ %s' % repo.path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
140
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
141 for idx_path in self.get_paths(repo):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
142 self.add_doc(writer, idx_path, repo)
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
143
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
144 log.debug('>> COMMITING CHANGES <<')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
145 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
146 log.debug('>>> FINISHED BUILDING INDEX <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
147
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
148
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
149 def update_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
150 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
151
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
152 idx = open_dir(IDX_LOCATION, indexname=self.indexname)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
153 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
154 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
155 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
156 to_index = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
157
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
158 reader = idx.reader()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
159 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
160
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
161 # Loop over the stored fields in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
162 for fields in reader.all_stored_fields():
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
163 indexed_path = fields['path']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
164 indexed_paths.add(indexed_path)
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
165
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
166 repo = self.repo_paths[fields['repository']]
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
167
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
168 try:
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
169 node = self.get_node(repo, indexed_path)
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
170 except ChangesetError:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
171 # This file was deleted since it was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
172 log.debug('removing from index %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
173 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
174
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
175 else:
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
176 # Check if this file was changed since it was indexed
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
177 indexed_time = fields['modtime']
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
178 mtime = self.get_node_mtime(node)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
179 if mtime > indexed_time:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
180 # The file has changed, delete it and add it to the list of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
181 # files to reindex
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
182 log.debug('adding to reindex list %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
183 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
184 to_index.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
185
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
186 # Loop over the files in the filesystem
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
187 # Assume we have a function that gathers the filenames of the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
188 # documents to be indexed
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
189 for repo in self.repo_paths.values():
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
190 for path in self.get_paths(repo):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
191 if path in to_index or path not in indexed_paths:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
192 # This is either a file that's changed, or a new file
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
193 # that wasn't indexed before. So index it!
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
194 self.add_doc(writer, path, repo)
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
195 log.debug('re indexing %s' % path)
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
196
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
197 log.debug('>> COMMITING CHANGES <<')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
198 writer.commit(merge=True)
561
5f3b967d9d10 fixed reindexing, and made some optimizations to reuse repo instances from repo scann list.
Marcin Kuzminski <marcin@python-works.com>
parents: 560
diff changeset
199 log.debug('>>> FINISHED REBUILDING INDEX <<<')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
200
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
201 def run(self, full_index=False):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
202 """Run daemon"""
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
203 if full_index or self.initial:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
204 self.build_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
205 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
206 self.update_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
207
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
208 if __name__ == "__main__":
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
209 arg = sys.argv[1:]
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
210 if len(arg) != 2:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
211 sys.stderr.write('Please specify indexing type [full|incremental]'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
212 'and path to repositories as script args \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
213 sys.exit()
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
214
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
215
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
216 if arg[0] == 'full':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
217 full_index = True
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
218 elif arg[0] == 'incremental':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
219 # False means looking just for changes
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
220 full_index = False
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
221 else:
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
222 sys.stdout.write('Please use [full|incremental]'
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
223 ' as script first arg \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
224 sys.exit()
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
225
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
226 if not os.path.isdir(arg[1]):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
227 sys.stderr.write('%s is not a valid path \n' % arg[1])
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
228 sys.exit()
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
229 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
230 if arg[1].endswith('/'):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
231 repo_location = arg[1] + '*'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
232 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
233 repo_location = arg[1] + '/*'
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
234
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
235 try:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
236 l = DaemonLock()
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
237 WhooshIndexingDaemon(repo_location=repo_location)\
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
238 .run(full_index=full_index)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
239 l.release()
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
240 reload(logging)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
241 except LockHeld:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
242 sys.exit(1)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
243