annotate rhodecode/lib/indexers/__init__.py @ 662:373ee7031003 beta

Fixed annotation bug, added history to annotation. Multiple fixes for raw_id length; removed unneeded function from index daemon.
author Marcin Kuzminski <marcin@python-works.com>
date Sat, 06 Nov 2010 16:14:49 +0100
parents 05528ad948c4
children 341beaa9edba
rev   line source
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
1 import os
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
2 import sys
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
3 from os.path import dirname as dn, join as jn
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
4
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
5 #so that the rhodecode package can be imported
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
6 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
7
547
1e757ac98988 renamed project to rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 497
diff changeset
8 from rhodecode.config.environment import load_environment
629
7e536d1af60d Code refactoring,models renames
Marcin Kuzminski <marcin@python-works.com>
parents: 604
diff changeset
9 from rhodecode.model.hg import HgModel
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
10 from shutil import rmtree
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
11 from webhelpers.html.builder import escape
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
12 from vcs.utils.lazy import LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
13
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
14 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
15 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
16 from whoosh.index import create_in, open_dir
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
17 from whoosh.formats import Characters
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
18 from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
19
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
20 import traceback
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
21
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
22
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
23 #LOCATION WHERE WE KEEP THE INDEX
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
24 IDX_LOCATION = jn(dn(dn(dn(dn(os.path.abspath(__file__))))), 'data', 'index')
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
25
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
26 #EXTENSIONS WE WANT TO INDEX CONTENT OF
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
27 INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
28 'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
29 'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
30 'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
31 'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
32 'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
33 'yaws']
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
34
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
35 #CUSTOM ANALYZER wordsplit + lowercase filter
436
28f19fa562df updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents: 406
diff changeset
36 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
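As a rough illustration of what the analyzer on line 36 is meant to do (split text on `\w+` runs, then lowercase each token), here is a minimal sketch that uses only the standard `re` module rather than Whoosh itself. The helper name `tokenize_lowercase` is hypothetical and is not part of Whoosh or RhodeCode; it only mimics the described behaviour.

```python
import re

# Same pattern the listing passes to RegexTokenizer.
WORD_RE = re.compile(r"\w+")


def tokenize_lowercase(text):
    """Return the word-character runs found in text, lowercased."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]
```

Punctuation and whitespace act as separators here, while digits and underscores stay inside tokens because `\w` matches them.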
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
37
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
38
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
39 #INDEX SCHEMA DEFINITION
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
40 SCHEMA = Schema(owner=TEXT(),
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
41 repository=TEXT(stored=True),
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
42 path=TEXT(stored=True),
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
43 content=FieldType(format=Characters(ANALYZER),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
44 scorable=True, stored=True),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
45 modtime=STORED(), extension=TEXT(stored=True))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
46
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
47
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
48 IDX_NAME = 'HG_INDEX'
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
49 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
50 FRAGMENTER = SimpleFragmenter(200)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
51
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
52 from paste.script import command
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
53 import ConfigParser
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
54
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
55 class MakeIndex(command.Command):
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
56
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
57 max_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
58 min_args = 1
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
59
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
60 usage = "CONFIG_FILE"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
61 summary = "Creates index for full text search given configuration file"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
62 group_name = "Whoosh indexing"
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
63
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
64 parser = command.Command.standard_parser(verbose=True)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
65 # parser.add_option('--repo-location',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
66 # action='store',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
67 # dest='repo_location',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
68 # help="Specifies repositories location to index",
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
69 # )
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
70 parser.add_option('-f',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
71 action='store_true',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
72 dest='full_index',
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
73 help="Specifies that index should be made full, i.e."
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
74 " destroy old and build from scratch",
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
75 default=False)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
76 def command(self):
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
77 config_name = self.args[0]
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
78
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
79 p = config_name.split('/')
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
80 if len(p) == 1:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
81 root = '.'
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
82 else:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
83 root = '/'.join(p[:-1])
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
84 print root
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
85 config = ConfigParser.ConfigParser({'here':root})
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
86 config.read(config_name)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
87 print dict(config.items('app:main'))['index_dir']
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
88 index_location = dict(config.items('app:main'))['index_dir']
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
89 #return
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
90
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
91 #=======================================================================
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
92 # WHOOSH DAEMON
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
93 #=======================================================================
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
94 from rhodecode.lib.pidlock import LockHeld, DaemonLock
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
95 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
96 try:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
97 l = DaemonLock()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
98 WhooshIndexingDaemon(index_location=index_location)\
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
99 .run(full_index=self.options.full_index)
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
100 l.release()
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
101 except LockHeld:
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
102 sys.exit(1)
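In the listing, `l.release()` on line 100 only runs if the indexing daemon returns normally, so an exception inside `run()` would leave the lock held. One possible way to guarantee release is a `try`/`finally` block. The sketch below uses hypothetical stand-ins (`DemoLock`, `DemoLockHeld`, `run_locked`) and makes no claim about the real `rhodecode.lib.pidlock` API.

```python
class DemoLockHeld(Exception):
    """Raised when the lock is already taken (hypothetical stand-in)."""


class DemoLock(object):
    """Tiny in-process lock, used only for illustration."""
    _held = False

    def __init__(self):
        if DemoLock._held:
            raise DemoLockHeld()
        DemoLock._held = True

    def release(self):
        DemoLock._held = False


def run_locked(job):
    """Run job() while holding the lock and always release it afterwards."""
    try:
        lock = DemoLock()
    except DemoLockHeld:
        return False  # another run is already in progress
    try:
        job()
    finally:
        lock.release()  # runs even if job() raises
    return True
```

With this shape, a failing job still propagates its exception, but the next invocation is not blocked by a stale lock.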
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
103
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
104
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
105 class ResultWrapper(object):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
106 def __init__(self, search_type, searcher, matcher, highlight_items):
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
107 self.search_type = search_type
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
108 self.searcher = searcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
109 self.matcher = matcher
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
110 self.highlight_items = highlight_items
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
111 self.fragment_size = 200 / 2
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
112
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
113 @LazyProperty
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
114 def doc_ids(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
115 docs_id = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
116 while self.matcher.is_active():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
117 docnum = self.matcher.id()
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
118 chunks = [offsets for offsets in self.get_chunks()]
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
119 docs_id.append([docnum, chunks])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
120 self.matcher.next()
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
121 return docs_id
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
122
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
123 def __str__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
124 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
125
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
126 def __repr__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
127 return self.__str__()
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
128
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
129 def __len__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
130 return len(self.doc_ids)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
131
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
132 def __iter__(self):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
133 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
134 Allows iteration over results and lazily generates content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
135
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
136 *Requires* implementation of ``__getitem__`` method.
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
137 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
138 for docid in self.doc_ids:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
139 yield self.get_full_content(docid)
406
b153a51b1d3b Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff changeset
140
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
141 def __getslice__(self, i, j):
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
142 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
143 Slicing of ResultWrapper
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
144 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
145 slice = []
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
146 for docid in self.doc_ids[i:j]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
147 slice.append(self.get_full_content(docid))
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
148 return slice
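`__getslice__` is only consulted by Python 2; Python 3 passes a `slice` object to `__getitem__` instead. The following is a minimal sketch of an equivalent that works under both, assuming a hypothetical `LazyResults` class in place of `ResultWrapper` and a hypothetical `load` callable in place of `get_full_content`.

```python
class LazyResults(object):
    """Illustrative container that builds result content only on access."""

    def __init__(self, doc_ids, load):
        self.doc_ids = doc_ids
        self.load = load  # callable that turns one doc id into content

    def __len__(self):
        return len(self.doc_ids)

    def __getitem__(self, key):
        # Slices produce a list of loaded items, integers a single item.
        if isinstance(key, slice):
            return [self.load(doc_id) for doc_id in self.doc_ids[key]]
        return self.load(self.doc_ids[key])
```

Only the requested page of ids is passed to `load`, which is the property that makes paging over large result sets cheap.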
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
149
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
150
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
151 def get_full_content(self, docid):
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
152 res = self.searcher.stored_fields(docid[0])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
153 f_path = res['path'][res['path'].find(res['repository']) \
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
154 + len(res['repository']):].lstrip('/')
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
155
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
156 content_short = self.get_short_content(res, docid[1])
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
157 res.update({'content_short':content_short,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
158 'content_short_hl':self.highlight(content_short),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
159 'f_path':f_path})
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
160
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
161 return res
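The `f_path` expression on lines 153-154 cuts everything up to and including the repository name out of the stored path. A small standalone sketch of that idea follows; `relative_path` is a hypothetical helper and the example paths are made up. Unlike the listing, it adds an explicit guard for the case where the repository name does not occur in the path, which is an assumption about the desired behaviour.

```python
def relative_path(full_path, repo_name):
    """Return the part of full_path that follows repo_name, without leading '/'."""
    idx = full_path.find(repo_name)
    if idx == -1:
        # Repository name not present: fall back to the whole path.
        return full_path.lstrip('/')
    return full_path[idx + len(repo_name):].lstrip('/')
```

For example, `relative_path('/srv/repos/myrepo/docs/index.rst', 'myrepo')` yields the path relative to the repository root.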
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
162
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
163 def get_short_content(self, res, chunks):
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
164
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
165 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
166
479
149940ba96d9 fixed search chunking bug and optimized chunk size
Marcin Kuzminski <marcin@python-works.com>
parents: 478
diff changeset
167 def get_chunks(self):
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
168 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
169 Smart function that chunks the content
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
170 without overlapping chunks, so that nearby occurrences
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
171 are not highlighted twice.
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
172 Uses ``self.matcher`` to obtain the match spans.
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
173 Uses ``self.fragment_size`` as padding around each span.
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
174 """
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
175 memory = [(0, 0)]
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
176 for span in self.matcher.spans():
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
177 start = span.startchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
178 end = span.endchar or 0
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
179 start_offseted = max(0, start - self.fragment_size)
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
180 end_offseted = end + self.fragment_size
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
181
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
182 if start_offseted < memory[-1][1]:
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
183 start_offseted = memory[-1][1]
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
184 memory.append((start_offseted, end_offseted,))
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
185 yield (start_offseted, end_offseted,)
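The chunking logic in `get_chunks` can be tried in isolation. The sketch below mirrors it as a free function, assuming match spans are given as plain `(startchar, endchar)` tuples instead of Whoosh span objects; the name `non_overlapping_chunks` is hypothetical.

```python
def non_overlapping_chunks(spans, fragment_size):
    """Yield (start, end) windows around each (startchar, endchar) span.

    Each window is padded by fragment_size on both sides, and its start is
    clamped to the end of the previous window so that windows do not overlap.
    """
    last_end = 0
    for start, end in spans:
        win_start = max(0, start - fragment_size)
        win_end = end + fragment_size
        if win_start < last_end:
            win_start = last_end  # avoid repeating already-covered text
        last_end = win_end
        yield (win_start, win_end)
```

With spans `[(10, 15), (50, 55), (300, 305)]` and a fragment size of 100, the second window starts where the first one ends, while the third is far enough away to keep its full padding.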
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
186
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
187 def highlight(self, content, top=5):
556
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
188 if self.search_type != 'content':
65b2f150beb7 Added searching for file names within the repository in rhodecode
Marcin Kuzminski <marcin@python-works.com>
parents: 547
diff changeset
189 return ''
478
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
190 hl = highlight(escape(content),
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
191 self.highlight_items,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
192 analyzer=ANALYZER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
193 fragmenter=FRAGMENTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
194 formatter=FORMATTER,
7010af6efde5 Reimplemented searching for speed on large files and added paging for search results
Marcin Kuzminski <marcin@python-works.com>
parents: 436
diff changeset
195 top=top)
631
05528ad948c4 Hacking for git support,and new faster repo scan
Marcin Kuzminski <marcin@python-works.com>
parents: 629
diff changeset
196 return hl