Mercurial > kallithea
annotate pylons_app/lib/indexers/daemon.py @ 465:e01a85f9fc90
fixed initial whoosh indexer. Build full index on first run even with incremental flag
author | Marcin Kuzminski <marcin@python-works.com> |
---|---|
date | Wed, 08 Sep 2010 01:33:38 +0200 |
parents | f19d3ee89335 |
children | a9e50dce3081 |
rev | line source |
---|---|
406 | 1 #!/usr/bin/env python |
406 | 2 # encoding: utf-8 |
406 | 3 # whoosh indexer daemon for hg-app |
406 | 4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com> |
406 | 5 # |
406 | 6 # This program is free software; you can redistribute it and/or |
406 | 7 # modify it under the terms of the GNU General Public License |
406 | 8 # as published by the Free Software Foundation; version 2 |
406 | 9 # of the License or (at your option) any later version of the license. |
406 | 10 # |
406 | 11 # This program is distributed in the hope that it will be useful, |
406 | 12 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
406 | 13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
406 | 14 # GNU General Public License for more details. |
406 | 15 # |
406 | 16 # You should have received a copy of the GNU General Public License |
406 | 17 # along with this program; if not, write to the Free Software |
406 | 18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, |
406 | 19 # MA 02110-1301, USA. |
406 | 20 """ |
406 | 21 Created on Jan 26, 2010 |
406 | 22 |
406 | 23 @author: marcink |
406 | 24 A daemon will read from task table and run tasks |
406 | 25 """ |
406 | 26 import sys |
406 | 27 import os |
406 | 28 from os.path import dirname as dn |
406 | 29 from os.path import join as jn |
406 | 30 |
406 | 31 #to get the pylons_app import |
411 | 32 project_path = dn(dn(dn(dn(os.path.realpath(__file__))))) |
411 | 33 sys.path.append(project_path) |
406 | 34 |
411 | 35 from pidlock import LockHeld, DaemonLock |
411 | 36 import traceback |
406 | 37 from pylons_app.config.environment import load_environment |
406 | 38 from pylons_app.model.hg_model import HgModel |
443 | 39 from pylons_app.lib.helpers import safe_unicode |
406 | 40 from whoosh.index import create_in, open_dir |
406 | 41 from shutil import rmtree |
436 | 42 from pylons_app.lib.indexers import ANALYZER, INDEX_EXTENSIONS, IDX_LOCATION, \ |
411 | 43     SCHEMA, IDX_NAME |
411 | 44 |
406 | 45 import logging |
411 | 46 import logging.config |
411 | 47 logging.config.fileConfig(jn(project_path, 'development.ini')) |
411 | 48 log = logging.getLogger('whooshIndexer') |
406 | 49 |
406 | 50 def scan_paths(root_location): |
406 | 51     return HgModel.repo_scan('/', root_location, None, True) |
406 | 52 |
406 | 53 class WhooshIndexingDaemon(object): |
406 | 54     """Daemon for atomic jobs""" |
406 | 55 |
411 | 56     def __init__(self, indexname='HG_INDEX', repo_location=None): |
406 | 57         self.indexname = indexname |
411 | 58         self.repo_location = repo_location |
465 | 59         self.initial = False |
465 | 60         if not os.path.isdir(IDX_LOCATION): |
465 | 61             os.mkdir(IDX_LOCATION) |
465 | 62             log.info('Cannot run incremental index since it does not' |
465 | 63                      ' yet exist; running full build') |
465 | 64             self.initial = True |
406 | 65 |
406 | 66     def get_paths(self, root_dir): |
406 | 67         """Recursively walk root_dir and return a set of all paths in it, |
406 | 68         excluding files in the .hg dir""" |
406 | 69         index_paths_ = set() |
406 | 70         for path, dirs, files in os.walk(root_dir): |
406 | 71             if path.find('.hg') == -1: |
406 | 72                 for f in files: |
406 | 73                     index_paths_.add(jn(path, f)) |
406 | 74 |
406 | 75         return index_paths_ |
406 | 76 |
406 | 77     def add_doc(self, writer, path, repo): |
406 | 78         """Adding doc to writer""" |
406 | 79 |
436 | 80         ext = unicode(path.split('/')[-1].split('.')[-1].lower()) |
436 | 81         #we just index the content of chosen files |
436 | 82         if ext in INDEX_EXTENSIONS: |
436 | 83             log.debug(' >> %s [WITH CONTENT]' % path) |
406 | 84             fobj = open(path, 'rb') |
406 | 85             content = fobj.read() |
406 | 86             fobj.close() |
443 | 87             u_content = safe_unicode(content) |
406 | 88         else: |
436 | 89             log.debug(' >> %s' % path) |
436 | 90             #just index file name without its content |
436 | 91             u_content = u'' |
441 | 92 |
441 | 93 |
441 | 94 |
441 | 95         try: |
441 | 96             os.stat(path) |
441 | 97             writer.add_document(owner=unicode(repo.contact), |
406 | 98                                 repository=u"%s" % repo.name, |
406 | 99                                 path=u"%s" % path, |
406 | 100                                 content=u_content, |
436 | 101                                 modtime=os.path.getmtime(path), |
441 | 102                                 extension=ext) |
441 | 103         except OSError, e: |
441 | 104             import errno |
441 | 105             if e.errno == errno.ENOENT: |
441 | 106                 log.debug('path %s does not exist or is a broken symlink' % path) |
441 | 107             else: |
441 | 108                 raise |
441 | 109 |
406 | 110 |
406 | 111     def build_index(self): |
406 | 112         if os.path.exists(IDX_LOCATION): |
436 | 113             log.debug('removing previous index') |
406 | 114             rmtree(IDX_LOCATION) |
406 | 115 |
406 | 116         if not os.path.exists(IDX_LOCATION): |
406 | 117             os.mkdir(IDX_LOCATION) |
406 | 118 |
406 | 119         idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME) |
406 | 120         writer = idx.writer() |
406 | 121 |
411 | 122         for cnt, repo in enumerate(scan_paths(self.repo_location).values()): |
406 | 123             log.debug('building index @ %s' % repo.path) |
406 | 124 |
406 | 125             for idx_path in self.get_paths(repo.path): |
406 | 126                 self.add_doc(writer, idx_path, repo) |
406 | 127         writer.commit(merge=True) |
406 | 128 |
406 | 129         log.debug('>>> FINISHED BUILDING INDEX <<<') |
406 | 130 |
406 | 131 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
132 def update_index(self): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
133 log.debug('STARTING INCREMENTAL INDEXING UPDATE') |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
134 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
135 idx = open_dir(IDX_LOCATION, indexname=self.indexname) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
136 # The set of all paths in the index |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
137 indexed_paths = set() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
138 # The set of all paths we need to re-index |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
139 to_index = set() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
140 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
141 reader = idx.reader() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
142 writer = idx.writer() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
143 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
144 # Loop over the stored fields in the index |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
145 for fields in reader.all_stored_fields(): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
146 indexed_path = fields['path'] |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
147 indexed_paths.add(indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
148 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
149 if not os.path.exists(indexed_path): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
150 # This file was deleted since it was indexed |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
151 log.debug('removing from index %s' % indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
152 writer.delete_by_term('path', indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
153 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
154 else: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
155 # Check if this file was changed since it |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
156 # was indexed |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
157 indexed_time = fields['modtime'] |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
158 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
159 mtime = os.path.getmtime(indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
160 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
161 if mtime > indexed_time: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
162 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
163 # The file has changed, delete it and add it to the list of |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
164 # files to reindex |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
165 log.debug('adding to reindex list %s' % indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
166 writer.delete_by_term('path', indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
167 to_index.add(indexed_path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
168 #writer.commit() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
169 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
170 # Loop over the files in the filesystem |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
171 # Assume we have a function that gathers the filenames of the |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
172 # documents to be indexed |
411
9b67cebe6609
some fixes to whoosh indexer daemon
Marcin Kuzminski <marcin@python-works.com>
parents:
407
diff
changeset
|
173 for repo in scan_paths(self.repo_location).values(): |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
174 for path in self.get_paths(repo.path): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
175 if path in to_index or path not in indexed_paths: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
176 # This is either a file that's changed, or a new file |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
177 # that wasn't indexed before. So index it! |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
178 self.add_doc(writer, path, repo) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
179 log.debug('reindexing %s' % path) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
180 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
181 writer.commit(merge=True) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
182 #idx.optimize() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
183 log.debug('>>> FINISHED <<<') |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
184 |
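
The selection rule used in the loop above (source lines 173-179) can be looked at in isolation. The following is only a minimal sketch, not the daemon's actual code: `select_paths_to_index` and its parameter names are hypothetical, and plain Python collections stand in for the Whoosh writer, `scan_paths`, and `get_paths`. It shows that a path is (re)indexed when it was flagged as changed or when it has never been indexed.

```python
def select_paths_to_index(current_paths, to_index, indexed_paths):
    """Return the paths that need (re)indexing, in scan order.

    current_paths -- paths found on disk right now (hypothetical input)
    to_index      -- paths already known to have changed
    indexed_paths -- paths that are present in the existing index
    """
    return [
        path for path in current_paths
        # Same condition as source line 175: changed file, or new file.
        if path in to_index or path not in indexed_paths
    ]


if __name__ == "__main__":
    current = ['a.py', 'b.py', 'c.py']
    changed = {'b.py'}            # modified since the last run
    already = {'a.py', 'b.py'}    # known to the index
    print(select_paths_to_index(current, changed, already))
    # prints ['b.py', 'c.py']
```

Unchanged, already-indexed files such as `a.py` are skipped, which is what keeps an incremental run cheaper than a full rebuild.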
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
185 def run(self, full_index=False): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
186 """Run daemon""" |
465
e01a85f9fc90
fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents:
452
diff
changeset
|
187 if full_index or self.initial: |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
188 self.build_index() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
189 else: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
190 self.update_index() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
191 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
192 if __name__ == "__main__": |
451
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
193 arg = sys.argv[1:] |
452
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
194 if len(arg) != 2: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
195 sys.stderr.write('Please specify indexing type [full|incremental]' |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
196 ' and path to repositories as script args \n') |
451
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
197 sys.exit(1) |
452
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
198 |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
199 |
451
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
200 if arg[0] == 'full': |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
201 full_index = True |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
202 elif arg[0] == 'incremental': |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
203 # False means looking just for changes |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
204 full_index = False |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
205 else: |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
206 sys.stderr.write('Please use [full|incremental]' |
452
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
207 ' as script first arg \n') |
451
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
208 sys.exit(1) |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
209 |
452
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
210 if not os.path.isdir(arg[1]): |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
211 sys.stderr.write('%s is not a valid path \n' % arg[1]) |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
212 sys.exit(1) |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
213 else: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
214 if arg[1].endswith('/'): |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
215 repo_location = arg[1] + '*' |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
216 else: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
451
diff
changeset
|
217 repo_location = arg[1] + '/*' |
451
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
443
diff
changeset
|
218 |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
219 try: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
220 l = DaemonLock() |
436
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
411
diff
changeset
|
221 WhooshIndexingDaemon(repo_location=repo_location)\ |
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
411
diff
changeset
|
222 .run(full_index=full_index) |
406
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
223 l.release() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
224 except LockHeld: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
225 sys.exit(1) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
226 |
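
The argument checks in the `__main__` block (source lines 193-217) can be gathered into one function, which makes them easier to test. This is a hedged sketch rather than the project's code: `parse_indexer_args` is a hypothetical name, and it raises `ValueError` instead of writing to `sys.stderr` and exiting. One small difference from the script is that it collapses any number of trailing slashes before appending `/*`.

```python
import os
import tempfile


def parse_indexer_args(argv):
    """Validate `[full|incremental] <path>` style arguments.

    Returns a (full_index, repo_location) tuple and raises ValueError
    on bad input. The function name and error texts are illustrative.
    """
    if len(argv) != 2:
        raise ValueError('expected: [full|incremental] <path to repositories>')

    mode, path = argv
    if mode not in ('full', 'incremental'):
        raise ValueError('first argument must be "full" or "incremental"')
    if not os.path.isdir(path):
        raise ValueError('%s is not a valid path' % path)

    # Build the glob pattern with exactly one '/' before the '*'.
    repo_location = path.rstrip('/') + '/*'
    return mode == 'full', repo_location


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    # With and without a trailing slash the same pattern is produced.
    print(parse_indexer_args(['incremental', tmp + '/']))
    print(parse_indexer_args(['full', tmp]))
    os.rmdir(tmp)
```

A caller could catch `ValueError`, write the message to `sys.stderr`, and exit with a non-zero status so that cron jobs or wrapper scripts can detect the failure.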