annotate pylons_app/lib/indexers/daemon.py @ 452:f19d3ee89335

updated whoosh indexer to take path as second argument
author Marcin Kuzminski <marcin@python-works.com>
date Fri, 03 Sep 2010 13:15:16 +0200
parents d726f62f886e
children e01a85f9fc90
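
A minimal sketch of the two-argument command line this changeset describes, based only on the usage message in the listing below (indexing type first, path to repositories second). The helper name `parse_indexer_args` and the use of `ValueError` are illustrative assumptions and are not part of daemon.py.

```python
def parse_indexer_args(args):
    """Hypothetical helper (not in daemon.py).

    Takes the script arguments without the program name and returns
    a (full_index, repo_location) tuple.
    """
    if len(args) != 2:
        raise ValueError('expected: [full|incremental] <path to repositories>')

    mode, repo_location = args
    if mode == 'full':
        # rebuild the whole index from scratch
        return True, repo_location
    if mode == 'incremental':
        # only look for changes since the last run
        return False, repo_location
    raise ValueError('unknown indexing type: %r' % (mode,))
```

Under these assumptions, `parse_indexer_args(sys.argv[1:])` would yield the `full_index` flag for `run()` and the location to pass as `repo_location`.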
rev   line source
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
1 #!/usr/bin/env python
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
2 # encoding: utf-8
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
3 # whoosh indexer daemon for hg-app
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
5 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
6 # This program is free software; you can redistribute it and/or
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
7 # modify it under the terms of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
8 # as published by the Free Software Foundation; version 2
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
9 # of the License or (at your option) any later version of the license.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
10 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
11 # This program is distributed in the hope that it will be useful,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
14 # GNU General Public License for more details.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
15 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
16 # You should have received a copy of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
17 # along with this program; if not, write to the Free Software
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
19 # MA 02110-1301, USA.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21 Created on Jan 26, 2010
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
22
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 @author: marcink
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
24 A daemon that reads from the task table and runs tasks
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
25 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
26 import sys
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
27 import os
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
28 from os.path import dirname as dn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
29 from os.path import join as jn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
30
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
31 # add the project root to sys.path so pylons_app can be imported
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
33 sys.path.append(project_path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
34
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
35 from pidlock import LockHeld, DaemonLock
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
36 import traceback
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
37 from pylons_app.config.environment import load_environment
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
38 from pylons_app.model.hg_model import HgModel
443
e5157e2a530e added safe unicode funtion, and implemented it in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 441
diff changeset
39 from pylons_app.lib.helpers import safe_unicode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
40 from whoosh.index import create_in, open_dir
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
41 from shutil import rmtree
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
42 from pylons_app.lib.indexers import ANALYZER, INDEX_EXTENSIONS, IDX_LOCATION, \
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
43 SCHEMA, IDX_NAME
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
44
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
45 import logging
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
46 import logging.config
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
47 logging.config.fileConfig(jn(project_path, 'development.ini'))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
48 log = logging.getLogger('whooshIndexer')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
49
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
50 def scan_paths(root_location):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
51 return HgModel.repo_scan('/', root_location, None, True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
52
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
53 class WhooshIndexingDaemon(object):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
54 """Daemon for atomic jobs"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
55
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
56 def __init__(self, indexname='HG_INDEX', repo_location=None):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
57 self.indexname = indexname
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
58 self.repo_location = repo_location
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
59
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
60 def get_paths(self, root_dir):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
61 """Recursively walk root_dir and return a set of all file paths in it,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
62 excluding files in .hg dir"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63 index_paths_ = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
64 for path, dirs, files in os.walk(root_dir):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
65 if path.find('.hg') == -1:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
66 for f in files:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
67 index_paths_.add(jn(path, f))
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
68
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
69 return index_paths_
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
70
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
71 def add_doc(self, writer, path, repo):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
72 """Add a document for the given path to the index writer"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
73
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
74 ext = unicode(path.split('/')[-1].split('.')[-1].lower())
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
75 # index content only for files with a chosen extension
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
76 if ext in INDEX_EXTENSIONS:
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
77 log.debug(' >> %s [WITH CONTENT]' % path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
78 fobj = open(path, 'rb')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
79 content = fobj.read()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
80 fobj.close()
443
e5157e2a530e added safe unicode funtion, and implemented it in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 441
diff changeset
81 u_content = safe_unicode(content)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
82 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
83 log.debug(' >> %s' % path)
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
84 # index just the file name, without its content
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
85 u_content = u''
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
86
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
87
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
88
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
89 try:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
90 os.stat(path)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
91 writer.add_document(owner=unicode(repo.contact),
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
92 repository=u"%s" % repo.name,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
93 path=u"%s" % path,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
94 content=u_content,
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
95 modtime=os.path.getmtime(path),
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
96 extension=ext)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
97 except OSError, e:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
98 import errno
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
99 if e.errno == errno.ENOENT:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
100 log.debug('path %s does not exist or is a broken symlink' % path)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
101 else:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
102 raise
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
103
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
104
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
105 def build_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
106 if os.path.exists(IDX_LOCATION):
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
107 log.debug('removing previous index')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
108 rmtree(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
109
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
110 if not os.path.exists(IDX_LOCATION):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
111 os.mkdir(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
112
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
113 idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
114 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
115
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
116 for cnt, repo in enumerate(scan_paths(self.repo_location).values()):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
117 log.debug('building index @ %s' % repo.path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
118
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
119 for idx_path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
120 self.add_doc(writer, idx_path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
121 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
122
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
123 log.debug('>>> FINISHED BUILDING INDEX <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
124
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
125
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
126 def update_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
127 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
128
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
129 idx = open_dir(IDX_LOCATION, indexname=self.indexname)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
130 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
131 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
132 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
133 to_index = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
134
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
135 reader = idx.reader()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
136 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
137
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
138 # Loop over the stored fields in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
139 for fields in reader.all_stored_fields():
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
140 indexed_path = fields['path']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
141 indexed_paths.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
142
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
143 if not os.path.exists(indexed_path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
144 # This file was deleted since it was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
145 log.debug('removing from index %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
146 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
147
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
148 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
149 # Check if this file was changed since it
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
150 # was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
151 indexed_time = fields['modtime']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
152
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
153 mtime = os.path.getmtime(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
154
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
155 if mtime > indexed_time:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
156
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
157 # The file has changed, delete it and add it to the list of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
158 # files to reindex
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
159 log.debug('adding to reindex list %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
160 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
161 to_index.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
162 #writer.commit()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
163
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
164 # Loop over the files in the filesystem
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
165 # Assume we have a function that gathers the filenames of the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
166 # documents to be indexed
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
167 for repo in scan_paths(self.repo_location).values():
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
168 for path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
169 if path in to_index or path not in indexed_paths:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
170 # This is either a file that's changed, or a new file
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
171 # that wasn't indexed before. So index it!
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
172 self.add_doc(writer, path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
173 log.debug('reindexing %s' % path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
174
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
175 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
176 #idx.optimize()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
177 log.debug('>>> FINISHED <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
178
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
179 def run(self, full_index=False):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
180 """Run daemon"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
181 if full_index:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
182 self.build_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
183 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
184 self.update_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
185
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
186 if __name__ == "__main__":
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
187 arg = sys.argv[1:]
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
188 if len(arg) != 2:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
189 sys.stderr.write('Please specify indexing type [full|incremental]'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
190 ' and path to repositories as script args\n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
191 sys.exit(1)
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
192
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
193
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
194 if arg[0] == 'full':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
195 full_index = True
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
196 elif arg[0] == 'incremental':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
197 # False means index only what has changed
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
198 full_index = False
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
199 else:
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
200 sys.stderr.write('Please use [full|incremental]'
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
201 ' as the first script arg\n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
202 sys.exit(1)
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
203
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
204 if not os.path.isdir(arg[1]):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
205 sys.stderr.write('%s is not a valid path \n' % arg[1])
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
206 sys.exit(1)
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
207 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
208 if arg[1].endswith('/'):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
209 repo_location = arg[1] + '*'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
210 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
211 repo_location = arg[1] + '/*'
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
212
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
213 try:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
214 l = DaemonLock()
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
215 WhooshIndexingDaemon(repo_location=repo_location)\
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
216 .run(full_index=full_index)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
217 l.release()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
218 except LockHeld:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
219 sys.exit(1)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
220
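
The `__main__` block above validates two positional arguments by hand: an indexing type (`full` or `incremental`) and a path to the repositories, which it turns into a glob pattern ending in `/*`. As a rough sketch only, the same checks could be expressed with the standard-library `argparse` module. This is a swapped-in technique, not what this revision does. The function name `parse_args` and the argument names `index_type` and `repos_path` are hypothetical, and the sketch assumes a POSIX path separator, as the original does.

```python
import argparse
import os


def parse_args(argv):
    """Return (full_index, repo_location) for an argument list.

    Hypothetical helper: mirrors the manual checks in the __main__ block,
    but lets argparse report usage errors on stderr and exit non-zero.
    """
    parser = argparse.ArgumentParser(description='whoosh indexer (sketch)')
    parser.add_argument('index_type', choices=['full', 'incremental'],
                        help='rebuild everything or only look for changes')
    parser.add_argument('repos_path', help='directory holding repositories')
    args = parser.parse_args(argv)

    if not os.path.isdir(args.repos_path):
        # parser.error() prints the message to stderr and raises SystemExit
        parser.error('%s is not a valid path' % args.repos_path)

    # Build the glob pattern the daemon expects; os.path.join avoids a
    # doubled separator when the path already ends with one.
    repo_location = os.path.join(args.repos_path, '*')
    return args.index_type == 'full', repo_location
```

With this shape, the caller would pass `sys.argv[1:]` and hand the two returned values to `WhooshIndexingDaemon(repo_location=...)` and `.run(full_index=...)` exactly as the original block does.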