annotate pylons_app/lib/indexers/daemon.py @ 483:a9e50dce3081 celery

Removed config names from whoosh and celery, celery is now configured based on the config name it's using on celeryconfig. And whoosh uses it's own logger configured just for whoosh Test creates a fresh whoosh index now, for more accurate checks fixed tests for searching
author Marcin Kuzminski <marcin@python-works.com>
date Fri, 17 Sep 2010 22:54:30 +0200
parents e01a85f9fc90
children fefffd6fd5f4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
1 #!/usr/bin/env python
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
2 # encoding: utf-8
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
3 # whoosh indexer daemon for hg-app
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
5 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
6 # This program is free software; you can redistribute it and/or
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
7 # modify it under the terms of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
8 # as published by the Free Software Foundation; version 2
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
9 # of the License or (at your opinion) any later version of the license.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
10 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
11 # This program is distributed in the hope that it will be useful,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
14 # GNU General Public License for more details.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
15 #
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
16 # You should have received a copy of the GNU General Public License
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
17 # along with this program; if not, write to the Free Software
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
19 # MA 02110-1301, USA.
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21 Created on Jan 26, 2010
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
22
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 @author: marcink
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
24 A deamon will read from task table and run tasks
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
25 """
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
26 import sys
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
27 import os
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
28 from os.path import dirname as dn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
29 from os.path import join as jn
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
30
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
31 #to get the pylons_app import
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
33 sys.path.append(project_path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
34
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
35 from pidlock import LockHeld, DaemonLock
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
36 from pylons_app.model.hg_model import HgModel
443
e5157e2a530e added safe unicode funtion, and implemented it in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 441
diff changeset
37 from pylons_app.lib.helpers import safe_unicode
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
38 from whoosh.index import create_in, open_dir
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
39 from shutil import rmtree
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
40 from pylons_app.lib.indexers import INDEX_EXTENSIONS, IDX_LOCATION, SCHEMA, IDX_NAME
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
41
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
42 import logging
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
43
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
44 log = logging.getLogger('whooshIndexer')
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
45 # create logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
46 log.setLevel(logging.DEBUG)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
47
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
48 # create console handler and set level to debug
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
49 ch = logging.StreamHandler()
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
50 ch.setLevel(logging.DEBUG)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
51
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
52 # create formatter
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
53 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
54
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
55 # add formatter to ch
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
56 ch.setFormatter(formatter)
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
57
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
58 # add ch to logger
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
59 log.addHandler(ch)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
60
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
61 def scan_paths(root_location):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
62 return HgModel.repo_scan('/', root_location, None, True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
63
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
64 class WhooshIndexingDaemon(object):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
65 """Deamon for atomic jobs"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
66
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
67 def __init__(self, indexname='HG_INDEX', repo_location=None):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
68 self.indexname = indexname
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
69 self.repo_location = repo_location
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
70 self.initial = False
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
71 if not os.path.isdir(IDX_LOCATION):
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
72 os.mkdir(IDX_LOCATION)
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
73 log.info('Cannot run incremental index since it does not'
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
74 ' yet exist running full build')
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
75 self.initial = True
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
76
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
77 def get_paths(self, root_dir):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
78 """recursive walk in root dir and return a set of all path in that dir
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
79 excluding files in .hg dir"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
80 index_paths_ = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
81 for path, dirs, files in os.walk(root_dir):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
82 if path.find('.hg') == -1:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
83 for f in files:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
84 index_paths_.add(jn(path, f))
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
85
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
86 return index_paths_
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
87
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
88 def add_doc(self, writer, path, repo):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
89 """Adding doc to writer"""
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
90
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
91 ext = unicode(path.split('/')[-1].split('.')[-1].lower())
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
92 #we just index the content of choosen files
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
93 if ext in INDEX_EXTENSIONS:
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
94 log.debug(' >> %s [WITH CONTENT]' % path)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
95 fobj = open(path, 'rb')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
96 content = fobj.read()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
97 fobj.close()
443
e5157e2a530e added safe unicode funtion, and implemented it in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 441
diff changeset
98 u_content = safe_unicode(content)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
99 else:
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
100 log.debug(' >> %s' % path)
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
101 #just index file name without it's content
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
102 u_content = u''
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
103
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
104
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
105
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
106 try:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
107 os.stat(path)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
108 writer.add_document(owner=unicode(repo.contact),
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
109 repository=u"%s" % repo.name,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
110 path=u"%s" % path,
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
111 content=u_content,
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
112 modtime=os.path.getmtime(path),
441
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
113 extension=ext)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
114 except OSError, e:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
115 import errno
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
116 if e.errno == errno.ENOENT:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
117 log.debug('path %s does not exist or is a broken symlink' % path)
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
118 else:
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
119 raise e
c59c4d4323e7 added support for broken symlinks in whoosh indexer
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
120
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
121
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
122 def build_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
123 if os.path.exists(IDX_LOCATION):
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
124 log.debug('removing previos index')
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
125 rmtree(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
126
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
127 if not os.path.exists(IDX_LOCATION):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
128 os.mkdir(IDX_LOCATION)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
129
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
130 idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
131 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
132
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
133 for cnt, repo in enumerate(scan_paths(self.repo_location).values()):
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
134 log.debug('building index @ %s' % repo.path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
135
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
136 for idx_path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
137 self.add_doc(writer, idx_path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
138 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
139
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
140 log.debug('>>> FINISHED BUILDING INDEX <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
141
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
142
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
143 def update_index(self):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
144 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
145
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
146 idx = open_dir(IDX_LOCATION, indexname=self.indexname)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
147 # The set of all paths in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
148 indexed_paths = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
149 # The set of all paths we need to re-index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
150 to_index = set()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
151
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
152 reader = idx.reader()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
153 writer = idx.writer()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
154
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
155 # Loop over the stored fields in the index
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
156 for fields in reader.all_stored_fields():
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
157 indexed_path = fields['path']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
158 indexed_paths.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
159
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
160 if not os.path.exists(indexed_path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
161 # This file was deleted since it was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
162 log.debug('removing from index %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
163 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
164
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
165 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
166 # Check if this file was changed since it
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
167 # was indexed
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
168 indexed_time = fields['modtime']
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
169
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
170 mtime = os.path.getmtime(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
171
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
172 if mtime > indexed_time:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
173
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
174 # The file has changed, delete it and add it to the list of
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
175 # files to reindex
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
176 log.debug('adding to reindex list %s' % indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
177 writer.delete_by_term('path', indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
178 to_index.add(indexed_path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
179 #writer.commit()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
180
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
181 # Loop over the files in the filesystem
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
182 # Assume we have a function that gathers the filenames of the
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
183 # documents to be indexed
411
9b67cebe6609 some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents: 407
diff changeset
184 for repo in scan_paths(self.repo_location).values():
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
185 for path in self.get_paths(repo.path):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
186 if path in to_index or path not in indexed_paths:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
187 # This is either a file that's changed, or a new file
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
188 # that wasn't indexed before. So index it!
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
189 self.add_doc(writer, path, repo)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
190 log.debug('reindexing %s' % path)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
191
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
192 writer.commit(merge=True)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
193 #idx.optimize()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
194 log.debug('>>> FINISHED <<<')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
195
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
196 def run(self, full_index=False):
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
197 """Run daemon"""
465
e01a85f9fc90 fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents: 452
diff changeset
198 if full_index or self.initial:
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
199 self.build_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
200 else:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
201 self.update_index()
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
202
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
203 if __name__ == "__main__":
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
204 arg = sys.argv[1:]
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
205 if len(arg) != 2:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
206 sys.stderr.write('Please specify indexing type [full|incremental]'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
207 'and path to repositories as script args \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
208 sys.exit()
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
209
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
210
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
211 if arg[0] == 'full':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
212 full_index = True
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
213 elif arg[0] == 'incremental':
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
214 # False means looking just for changes
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
215 full_index = False
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
216 else:
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
217 sys.stdout.write('Please use [full|incremental]'
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
218 ' as script first arg \n')
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
219 sys.exit()
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
220
452
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
221 if not os.path.isdir(arg[1]):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
222 sys.stderr.write('%s is not a valid path \n' % arg[1])
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
223 sys.exit()
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
224 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
225 if arg[1].endswith('/'):
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
226 repo_location = arg[1] + '*'
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
227 else:
f19d3ee89335 updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents: 451
diff changeset
228 repo_location = arg[1] + '/*'
451
d726f62f886e updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents: 443
diff changeset
229
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
230 try:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
231 l = DaemonLock()
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
232 WhooshIndexingDaemon(repo_location=repo_location)\
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 411
diff changeset
233 .run(full_index=full_index)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
234 l.release()
483
a9e50dce3081 Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents: 465
diff changeset
235 reload(logging)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
236 except LockHeld:
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
237 sys.exit(1)
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
238