introduction
Speaking of redis
data structures, you may be familiar with the five basic data types: String
, Hash
, List
, Set
, Sorted Set
. Then addition, there are three major derived data structure, we usually are little contact, namely: bitmaps
, hyperloglog
, geo
Also, I think, these three data structures, can only be icing on the cake. Really in the project, I really haven't used it. Let's take a look at the definition and use of these three data structures
bitmaps
definition
Speaking of this bitmaps
, it is actually String
, but it can operate on String
the bits. And then, this bit operations, has its own command, and so the operation String
of the redis
command not much different! Can be understood like this
Bitmaps is an array in units of bits, each unit of the array can only store 0 and 1
Here is an example, for example, if we want to do a set operation, key
for w
, value
for h
, then execute the following command
127.0.0.1:6379> set w h
OK
127.0.0.1:6379> get w
"h"
Then h
the ASCII is 0110 1000
Next, you can use the bit command getbit
command to fetch, fetch the content of each bit.
127.0.0.1:6379> getbit w 0 # getbit w 0
(integer) 0
127.0.0.1:6379> getbit w 1 # getbit w 1
(integer) 1
127.0.0.1:6379> getbit w 2 # getbit w 2
(integer) 1
127.0.0.1:6379> getbit w 3 # getbit w 3
(integer) 0
use
According to online rumors, this structure is used to count the number of active users in a certain period of time, and the structure used bitmap
is less set
space- saving than the traditional structure. However, this use has great limitations, as I will talk about later. Let me talk about the online statement. Suppose there are 30 users, of which 5 users 2018-10-04
log in on this day. Suppose the userid of these 5 users is 2, 4, 8, 11, 12. Then, we assume that the key is users:2018-10-04
, and its value
value is used to record user login information. So in order to record that the above 5 users have logged in, we value
set the 2nd, 4th, 8th, 11th, and 12th bits of the value to 1, that is, execute the following command
127.0.0.1:6379> setbit users:2018-10-04 2 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 4 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 8 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 11 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 12 1
(integer) 0
At this time, for example, if you want to determine the user with userid=11, on 2018-10-04, if you have logged in, execute the following command
127.0.0.1:6379> getbit users:2018-10-04 11
(integer) 1
The result is 1, which means the user has logged in. If the returned result is 0, it means that the user has never logged in. If you want to count, 2018-10-04, the number of users logged in on this day, you can execute the following command
127.0.0.1:6379> bitcount users:2018-10-04
(integer) 5
The above command can be counted, 2018-10-04, 5 users logged in on this day. Ok, you won't be able to understand much if you find it here. Let me talk about it first, here userid=2 4 8 11 12
, it can be understood as an offset. For example, if the userid in the actual project is 1000002, then the offset is 2. Everyone can be flexible in the project. However, this approach has a limitation. In our actual project, if the userid is generated using uuid, then how do you generate the offset based on these userid? Could it be that you have to find a hash function to generate an offset? Even if the corresponding hash function is found, can you ensure that the hash collision will not occur, resulting in the same offset? So, everyone can understand.
HyperLogLog
definition
HyperLogLog
It is not a data structure, but an algorithm that can use a very small memory space to complete the statistics of independent totals . In fact, everyone may be relatively unfamiliar with this algorithm. We java
have a library called stream-lib
, which also implements HyperLogLog
algorithms. I probably talk about the principle of the algorithm . I don't want to go to a long-formed math paper. Everyone looks boring. This Hyper
refers to the super meaning. Its previous life is the LogLog algorithm. Here, I m just a little bit stunned, so that everyone can understand the essence. Suppose there is the following dialogue
Me: "Small song, suppose, I tossed 5 coins in a round, and after many rounds, I found that in these rounds, there were at most 2 consecutive negatives and 1 positive. Can you guess how many rounds I lost? "Small tune: "It shouldn't be a few rounds, at most seven or eight rounds." Me: "Fuck, so witty, how do you count?" Small tune: "It's very simple, the pros and cons are both 1/2, even Focusing on the second negative side and the first positive side. Isn t it 1/2 1/2 1/2!" I: "What if there are 4 negative sides and 1 positive side at most?" Xiao tune: "It should be many rounds, right? !" Me: "It's really witty!"
The above chat is between myself and my colleague, and they play each other every day! Any similarity is purely coincidental!
Well, the principle is over! It's just that his estimation algorithm is more complicated! It's not that simple! And with this estimate, the error is still relatively large! The pseudo code of the algorithm is given below.
max = 0
hashCode = hash( )
num = hashCode 0
if num > max:
max = num
2 (max + 1)
It should be noted
hashCode = hash( )
It is to map your input element into a long binary string. After mapping it into a long binary string, it can be compared to the result of the coin toss I mentioned first. As for why the final result is used (max+1)
, you can check the literature. After all, this article is talking about redis
, not about this algorithm. Moreover, this algorithm has undergone a series of evolutions. For example, the input parameter set is divided into m parts, and then m
the results of each part are averaged (avg)
, and finally the (avg + 1)
independent total is estimated by the power of 2 ! Those readers who are interested can make inquiries by themselves!
use
This structure can save memory to count various counts, such as the IP
number of registrations and the number of daily visits IP
. Of course, there are errors! The official number given by Redis is 0.81% error rate. The usage is also very simple as shown below
127.0.0.1:6379> pfadd ips:2018-10-04 "127.0.0.1" "127.0.0.2" "127.0.0.3" "127.0.0.4"
(integer) 1
127.0.0.1:6379> pfcount ips:2018-10-04
(integer) 4
The above is the demonstration. On 2018-10-04, about 4 ips logged into the system! There is a comparison chart of the occupied space with the traditional collection structure on the Internet, post it for everyone to see
Attention, once again, there are errors in using this structure! For example, if you pfadd
enter a million pieces of data, pfcount
the result may be 999,756!
Geo
definition
Geo can be used to store latitude and longitude, calculate the distance between two places, range calculations, etc. The underlying implementation is zset.
use
There are mainly the following six groups of commands
geoadd
: Add the coordinates of a certain geographic location.geopos
: Get the coordinates of a certain geographic location.geodist
: Get the distance between two geographic locations.georadius
: Obtain a set of geographic locations within a specified range according to the given geographic location coordinates.georadiusbymember
: Get a collection of geographic locations within a specified range according to a given geographic locationgeohash
: Get the geohash value of a certain geographic location.
I ll post an example of the official website document directly here. If you are interested, you can check it yourself. 1. add two coordinates to the key.
redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2
2. calculate an example between two coordinates
redis> GEODIST Sicily Palermo Catania
"166274.15156960039"
Finally, calculate the distance from the latitude and longitude (15,37)
distance 100km
and 200km
the coordinates within the range
redis> GEORADIUS Sicily 15 37 100 km
1) "Catania"
redis> GEORADIUS Sicily 15 37 200 km
1) "Palermo"
2) "Catania"