How to design a lightweight user autocomplete system

70 1

I've recently designed a system that needs to have a user option to design the autocomplete mode of the automatic association. There are some trouble in designing data structures. Currently, I'm going to do this directly with MySQL 's LIKE, but if the user and traffic grow, it's very slow.

I used a redis cache, and a friend with redis should read that with redis to achieve autocomplete. But I'm going to say that this article isn't my problem. Because it can only complete the keyword association function, but its keyword is stateless, there's no index marking its id. In other words, it can't complete the association process of the keyword and user.

Another disadvantage is that the cost of the index is too high, and to do a word segmentation for each word, then it's stored in the cache. And the user name can be modified, and I think the cost of rebuilding the index is too much.

I tried another scheme, using the intersection of redis to search for search, which is also a popular approach. For example, a user name is called Hello, with php code showing how to handle this user name, which is for convenience, and I only deal with the english alphabet

$user_name = strtolower('Hello');
$user_id = 1001;
$len = strlen($user_name);
for ($pos = 0; $pos <$len; $pos ++) {
 $key = md5($user_name[$pos]);
 $redis->sAdd('user_name:'. $key. ':'. $pos, $user_id);
}

In the above code, I hash each letter with md5 and then make a redis index with his position value $pos. So if all the user names are indexed in this way, we can use the intersection function of redis to do the search.

$keyword = strtolower('he');//测试的关键词
$indexes = array();//索引集合
$len = strlen($keyword);
for ($pos = 0; $pos <$len; $pos ++) {
 $key = md5($keyword[$pos]);
 $indexes[] = 'user_name:'. $key. ':'. $pos;
}
//得出结果仅需一步, 求它们的交集
$result = $redis->sInter($indexes);

But there are a few flaws.

  • I think it's right, because I just want to implement a user name automatic association instead of search. Such systems also maintain a large set of indexes that will be removed every time the user name is updated and then inserted into a new one.
  • Can't do LIMIT and OFFSET, how to turn the page, if the intersection is very large, it isn't to break the system.
  • The function of fuzzy query is weak and can only deal with words. But this isn't the point. It's a good way to solve the above two points.

I don't know what you've.

2 Answers

76 4

Antirez 's method isn't the same as you, he is done automatically, so only one value is required, and you need more than one value, and it isn't guaranteed that multiple values. And you need to be able to change it, use that one.

And you're too much complexity. The complexity of sinter is o ( n * m ), and n represents the minimum set length, m represents the number of sets. When user input is shorter, such as he, you m = 2, n and n should be larger as the amount of data grows. M and n will become very large when users enter a long number of lengths. So I think it isn't possible.

My suggestion is to use a structure similar to the trie tree, and it's easy to use redis. Cut all usename into multiple prefixes. Like"foo,"cut.

f
fo
foo

Several prefixes, each of which maintains a set, followed by uname: the form of uid, so that you'll be able to check your uid.
So if there are two users:

100001:foo
100002:fou

Your storage is

f => foo:100001,fou:100002
fo => foo:100001,fou:100002
foo => foo:100001
fou => fou:100001

It's time to query a query. When you modify it, you've to delete the in all prefixes: uid. For example, foo 's user name changes to bar, and then the foo is removed, and the corresponding collection of the new name is maintained.

srem f foo:100001
srem fo foo:100001
srem foo foo:100001

In addition, if you replace the set with zset, score can be used to save heat, such as fou and foo on the top, if the user input f selects foo, then the score added to the f.

zincrby f 1 foo

That's where the user is always chosen.

111 1

http://github. com/huacnlee/redis-sear..

  • Not every word is indexed, except for each word in zset, but the repeated data will be merged, and I've the fuzzy query as the result of the word segmentation;
  • The ability to associate a user with some special keywords is the goal of my next stage ( plus ).

Memory usage:
89m, 7500 + actual data in the prefix matching index, 2086 + participle index, also with 1 million actual data in the actual data in the

...