annotate rhodecode/lib/indexers/daemon.py @ 560:3072935bdeed

rewrote whoosh indexing to run internal repository.walk() instead of filesystem. Disabled default hg update hook (not needed since whoosh is not dependent on file system files to index)
author Marcin Kuzminski <marcin@python-works.com>
date Sat, 09 Oct 2010 00:22:19 +0200
parents 29ec9ddbe258
children 5f3b967d9d10
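The changeset description above says the indexer now walks the repository's own tree instead of the files on disk. A minimal sketch of that idea follows. It assumes only that the walk yields (top, dirs, files) triples top-down, in the same shape as os.walk(). TREE, walk() and collect_paths() are hypothetical names used for illustration; they are not the vcs library's API.

```python
import posixpath

# Hypothetical in-memory stand-in for a repository tree at one revision:
# directory path -> (sub-directory names, file names).
TREE = {
    "": (["docs", "src"], ["README"]),
    "docs": ([], ["index.rst"]),
    "src": (["lib"], ["main.py"]),
    "src/lib": ([], ["util.py"]),
}


def walk(tree, top=""):
    """Yield (top, dirs, files) top-down, in the same shape as os.walk()."""
    dirs, files = tree[top]
    yield top, dirs, files
    for name in dirs:
        # Recurse into each sub-directory so every level is visited once.
        yield from walk(tree, posixpath.join(top, name))


def collect_paths(tree, root_dir):
    """Return the set of file paths under root_dir, read from the tree."""
    paths = set()
    for top, _dirs, files in walk(tree):
        for name in files:
            paths.add(posixpath.join(root_dir, top, name))
    return paths


paths = collect_paths(TREE, "/repos/demo")
```

Because a top-down walk visits every directory itself, one loop over the files of each step is enough to collect all paths in this sketch.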
rev   line source
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
1 #!/usr/bin/env python
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
2 # encoding: utf-8
549
f99075170eb4 more renames for rhode code !!
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
3 # whoosh indexer daemon for rhodecode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
5 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
6 # This program is free software; you can redistribute it and/or
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
7 # modify it under the terms of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
8 # as published by the Free Software Foundation; version 2
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
9 # of the License or (at your option) any later version of the license.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
10 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
11 # This program is distributed in the hope that it will be useful,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
14 # GNU General Public License for more details.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
15 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
16 # You should have received a copy of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
17 # along with this program; if not, write to the Free Software
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
19 # MA 02110-1301, USA.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21 Created on Jan 26, 2010
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
22
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 @author: marcink
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
24 A daemon that reads from the task table and runs tasks
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
25 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
26 import sys
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
27 import os
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
28 from os.path import dirname as dn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
29 from os.path import join as jn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
30
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
31 # add the project root to sys.path so that rhodecode can be imported
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
33 sys.path.append(project_path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
34
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
35 from rhodecode.lib.pidlock import LockHeld, DaemonLock
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
36 from rhodecode.model.hg_model import HgModel
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
37 from rhodecode.lib.helpers import safe_unicode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
38 from whoosh.index import create_in, open_dir
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
39 from shutil import rmtree
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
40 from rhodecode.lib.indexers import INDEX_EXTENSIONS, IDX_LOCATION, SCHEMA, IDX_NAME
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
41
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
42 from time import mktime
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
43 from vcs.backends import hg
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
44
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
45 import logging
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
46
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
47 log = logging.getLogger('whooshIndexer')
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
48 # create logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
49 log.setLevel(logging.DEBUG)
491
fefffd6fd5f4 Added some more tests, rewrite testing schema, to autogenerate fresh db, new index.
Marcin Kuzminski <marcin@python-works.com>
parents: 483
diff changeset
50 log.propagate = False
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
51 # create console handler and set level to debug
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
52 ch = logging.StreamHandler()
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
53 ch.setLevel(logging.DEBUG)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
54
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
55 # create formatter
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
56 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
57
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
58 # add formatter to ch
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
59 ch.setFormatter(formatter)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
60
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
61 # add ch to logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
62 log.addHandler(ch)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
64 def scan_paths(root_location):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
65 return HgModel.repo_scan('/', root_location, None, True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
66
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
67 class WhooshIndexingDaemon(object):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
68 """
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
69 Daemon for atomic jobs
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
70 """
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
71
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
72 def __init__(self, indexname='HG_INDEX', repo_location=None):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
73 self.indexname = indexname
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
74 self.repo_location = repo_location
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
75 self.initial = False
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
76 if not os.path.isdir(IDX_LOCATION):
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
77 os.mkdir(IDX_LOCATION)
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
78 log.info('Cannot run incremental index since it does not'
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
79 ' yet exist; running full build')
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
80 self.initial = True
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
81
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
82 def get_paths(self, root_dir):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
83 """
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
84 recursively walk root_dir and return a set of all file paths in it,
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
85 based on repository walk function
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
86 """
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
87 repo = hg.MercurialRepository(root_dir)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
88 index_paths_ = set()
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
89 for topnode, dirs, files in repo.walk('/', 'tip'):
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
90 for f in files:
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
91 index_paths_.add(jn(root_dir, f.path))
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
92 for dir in dirs:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
93 for f in files:
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
94 index_paths_.add(jn(root_dir, f.path))
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
95
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
96 return index_paths_
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
97
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
98
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
99 def add_doc(self, writer, path, repo):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
100 """Adding doc to writer"""
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
101 n_path = path[len(repo.path) + 1:]
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
102 node = repo.get_changeset().get_node(n_path)
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
103
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
104 # index content only for files whose extension is in INDEX_EXTENSIONS
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
105 if node.extension in INDEX_EXTENSIONS:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
106 log.debug(' >> %s [WITH CONTENT]' % path)
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
107 u_content = node.content
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
108 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
109 log.debug(' >> %s' % path)
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
110 # index only the file name, without its content
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
111 u_content = u''
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
112
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
113 writer.add_document(owner=unicode(repo.contact),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
114 repository=safe_unicode(repo.name),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
115 path=safe_unicode(path),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
116 content=u_content,
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
117 modtime=mktime(node.last_changeset.date.timetuple()),
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
118 extension=node.extension)
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
119
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
120
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
121 def build_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
122 if os.path.exists(IDX_LOCATION):
560
3072935bdeed rewrote whoosh indexing to run internal repository.walk() instead of filesystem.
Marcin Kuzminski <marcin@python-works.com>
parents: 557
diff changeset
123 log.debug('removing previous index')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
124 rmtree(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
125
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
126 if not os.path.exists(IDX_LOCATION):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
127 os.mkdir(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
128
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
129 idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
130 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
131
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
132 for cnt, repo in enumerate(scan_paths(self.repo_location).values()):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
133 log.debug('building index @ %s' % repo.path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
134
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
135 for idx_path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
136 self.add_doc(writer, idx_path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
137 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
138
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
139 log.debug('>>> FINISHED BUILDING INDEX <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
140
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
141
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
142 def update_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
143 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
144
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
145 idx = open_dir(IDX_LOCATION, indexname=self.indexname)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
146 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
147 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
148 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
149 to_index = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
150
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
151 reader = idx.reader()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
152 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
153
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
154 # Loop over the stored fields in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
155 for fields in reader.all_stored_fields():
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
156 indexed_path = fields['path']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
157 indexed_paths.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
158
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
159 if not os.path.exists(indexed_path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
160 # This file was deleted since it was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
161 log.debug('removing from index %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
162 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
163
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
164 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
165 # Check if this file was changed since it
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
166 # was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
167 indexed_time = fields['modtime']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
168
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
169 mtime = os.path.getmtime(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
170
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
171 if mtime > indexed_time:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
172
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
173 # The file has changed, delete it and add it to the list of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
174 # files to reindex
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
175 log.debug('adding to reindex list %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
176 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
177 to_index.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
178 #writer.commit()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
179
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
180 # Loop over the files in the filesystem
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
181 # Assume we have a function that gathers the filenames of the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
182 # documents to be indexed
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
183 for repo in scan_paths(self.repo_location).values():
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
184 for path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
185 if path in to_index or path not in indexed_paths:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
186 # This is either a file that's changed, or a new file
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
187 # that wasn't indexed before. So index it!
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
188 self.add_doc(writer, path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
189 log.debug('reindexing %s' % path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
190
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
191 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
192 #idx.optimize()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
193 log.debug('>>> FINISHED <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
194
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
195 def run(self, full_index=False):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
196 """Run daemon"""
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
197 if full_index or self.initial:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
198 self.build_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
199 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
200 self.update_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
201
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
202 if __name__ == "__main__":
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
203 arg = sys.argv[1:]
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
204 if len(arg) != 2:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
205 sys.stderr.write('Please specify indexing type [full|incremental]'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
206 ' and path to repositories as script args \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
207 sys.exit(1)
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
208
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
209
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
210 if arg[0] == 'full':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
211 full_index = True
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
212 elif arg[0] == 'incremental':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
213 # False means only changed or new files are reindexed
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
214 full_index = False
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
215 else:
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
216 sys.stderr.write('Please use [full|incremental]'
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
217 ' as script first arg \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
218 sys.exit(1)
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
219
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
220 if not os.path.isdir(arg[1]):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
221 sys.stderr.write('%s is not a valid path \n' % arg[1])
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
222 sys.exit(1)
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
223 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
224 if arg[1].endswith('/'):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
225 repo_location = arg[1] + '*'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
226 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
227 repo_location = arg[1] + '/*'
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
228
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
229 try:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
230 l = DaemonLock()
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
231 WhooshIndexingDaemon(repo_location=repo_location)\
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
232 .run(full_index=full_index)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
233 l.release()
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
234 reload(logging)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
235 except LockHeld:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
236 sys.exit(1)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
237
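
The incremental branch of the daemon (source lines 169-191) compares each indexed file's stored time with its current mtime, deletes stale entries, and then reindexes anything changed or new. The decision logic can be sketched on its own, without whoosh, roughly as follows. This is a minimal illustration and not the daemon's code: `plan_incremental_update`, its parameters, and the handling of removed files are assumptions made for the example, and plain dicts and sets stand in for the index and the writer.

```python
import os


def plan_incremental_update(indexed, current_paths, getmtime=os.path.getmtime):
    """Return (to_delete, to_index) for an incremental index update.

    indexed: dict mapping path -> modification time stored in the index.
    current_paths: iterable of paths that exist in the repositories now.
    getmtime: callable returning a path's current modification time
              (hypothetical hook, so the sketch can run without real files).
    """
    to_delete = set()
    to_index = set()
    current = set(current_paths)

    for path, indexed_time in indexed.items():
        if path not in current:
            # Assumption: a file that no longer exists is dropped from the index.
            to_delete.add(path)
        elif getmtime(path) > indexed_time:
            # The file changed: remove the stale entry and queue it for reindexing.
            to_delete.add(path)
            to_index.add(path)

    # Files that were never indexed before are indexed as well.
    to_index.update(current - set(indexed))
    return to_delete, to_index


# Usage with in-memory stand-ins for the filesystem and the index.
mtimes = {'a.py': 20, 'b.py': 5, 'c.py': 7}          # current files and mtimes
indexed = {'a.py': 10, 'b.py': 5, 'gone.py': 1}      # what the index recorded
to_delete, to_index = plan_incremental_update(indexed, mtimes, mtimes.get)
```

Here `a.py` is newer than its indexed time, so it is both deleted and reindexed; `b.py` is unchanged and skipped; `c.py` is new; `gone.py` is only deleted. In the daemon, the equivalent of `to_delete` is applied with `writer.delete_by_term('path', ...)` and the equivalent of `to_index` with `self.add_doc(...)`, followed by a single `writer.commit(merge=True)`.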