dbTalk Databases Forums  

Sphinx indexing problem

comp.databases.postgresql comp.databases.postgresql


Discuss Sphinx indexing problem in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mladen Gogala
 
Posts: n/a

Default Sphinx indexing problem - 05-23-2010 , 05:22 PM






I am trying to create a Sphinx index on a fairly large Postgres table. My
problem is the fact that the Postgres API is trying to put the entire
result set into the memory:

[root@medo etc]# ../bin/indexer --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test1'...
ERROR: index 'test1': sql_query: out of memory for query result
(DSN=pgsql://news:***@medo:5432/news).
total 0 docs, 0 bytes
total 712.593 sec, 0 bytes/sec, 0.00 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

Corresponding log entries on the Postgres side are:

STATEMENT: SELECT segment_id,air_date,start_time,end_time,source_typ e,
market_name,station_name,program_name, content_text FROM news_segments
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection

The table I am using is pretty large:

news=> select count(*) from news_segments;
count
----------
14366286
(1 row)

Time: 233759.639 ms

Is there anything I can do to prevent the API from attempting to put the
entire query result in memory? I can partition the table, create
separate indexes and merge them, but that is a large unnecessary
maintenance. I also suspect that the other queries with a large result
set will start to fail.

I temporarily solved my problem by using "range query" option offered by
sphinx:

sql_query_range = \
SELECT min(segment_id),max(segment_id) FROM news_segments
sql_range_step=10000
sql_query = \
SELECT
segment_id,air_date,start_time,end_time,source_typ e, \
market_name,station_name,program_name,
segment_text \
FROM news_segments \
WHERE segment_id>=$start and segment_id<$end

Segment_id is a numeric field and the query will be executed many times,
which is less than optimal. It does make the thing work, though. Would it
be possible to set maximum memory for the query result caching from the
API itself?

--
http://mgogala.byethost5.com

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.