8 Eureka optimization techniques to increase efficiency by 100 times

1. Eureka's self-protection

After a service registers with Eureka, the client sends a heartbeat to Eureka every 30s by default. If Eureka receives no heartbeat from an instance for a period of time (90s by default), it removes that instance. Sometimes, however, the service is healthy and the heartbeat simply fails to reach Eureka because of network jitter. If Eureka removes the instance at that moment, it can no longer be reached through Eureka even though it is running normally, and it stays unreachable until it is registered again.

To prevent this kind of accidental eviction, Eureka provides a self-protection mechanism. It is triggered when the number of heartbeats Eureka receives within 15 minutes is less than the number it should have received multiplied by the self-protection threshold (0.85 by default). The mechanism is turned on by default, and Eureka exits self-protection after the network is restored.

The overall idea: it is better to keep some unhealthy instances than to blindly remove healthy ones.

For example, with 10 instances, Eureka should normally receive 10 * (2 * 15) = 300 heartbeats in 15 minutes (one every 30 seconds from each instance). If it receives fewer than 300 * 0.85 = 255, self-protection is triggered. There are two possible reasons for a missing heartbeat:

  1. Eureka did not receive the heartbeat because of a network failure. The service continues to send heartbeats after the network is restored.
  2. The service really went down and had no time to take itself offline. When is it removed from the registry? Only after the instances that were protected because of network jitter resume sending their heartbeats and Eureka leaves self-protection.

So when should self-protection be turned on, and when not?

  1. Personally, I would weigh it a little: turn it on when there are many services and leave it off when there are few. If more than 15% of the instances stop sending heartbeats, a network problem is the more likely cause. If fewer than 15% are missing heartbeats, it is more likely that those services are really down, and protecting a dead service means errors are returned to the client.
  2. Of course, to ensure the robustness and stability of the online system, self-protection can also be turned on under all circumstances.

The self-protection configuration is as follows:

      # both keys belong under eureka.server
      enable-self-preservation: true
      renewal-percent-threshold: 0.85
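
The threshold arithmetic from the 10-instance example can be sketched in a few lines of Java. This is only an illustration of the numbers in the text: `SelfPreservationMath` and its method names are hypothetical, not part of Eureka, and the threshold is passed as a whole percentage (85 for 0.85) to keep the arithmetic in integers.

```java
// Hypothetical helper; the names are illustrative and not part of Eureka's API.
final class SelfPreservationMath {

    // Heartbeats expected from all instances over the observation window.
    static long expectedHeartbeats(int instances, int heartbeatIntervalSeconds, int windowMinutes) {
        long perInstance = (windowMinutes * 60L) / heartbeatIntervalSeconds;
        return instances * perInstance;
    }

    // Minimum heartbeat count; receiving fewer than this would trigger self-protection.
    // thresholdPercent is 85 for a renewal-percent-threshold of 0.85.
    static long minimumHeartbeats(long expected, int thresholdPercent) {
        return expected * thresholdPercent / 100;
    }

    static boolean selfProtectionTriggered(long received, long expected, int thresholdPercent) {
        return received < minimumHeartbeats(expected, thresholdPercent);
    }

    public static void main(String[] args) {
        long expected = expectedHeartbeats(10, 30, 15);
        System.out.println("expected heartbeats: " + expected);
        System.out.println("minimum heartbeats:  " + minimumHeartbeats(expected, 85));
        System.out.println("triggered at 250:    " + selfProtectionTriggered(250, expected, 85));
    }
}
```

With the values from the example, the expected count is 300 and the minimum is 255, so 250 received heartbeats would trigger self-protection while 255 would not.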

2. Fast offline

When Eureka Server starts, it creates a scheduled task that, at regular intervals (60 seconds by default), removes from the service list the instances whose lease has not been renewed in time (90 seconds by default). Shortening the interval of this scheduled task takes dead instances offline faster and keeps clients from pulling unavailable services.

     eviction-interval-timer-in-ms: 3000  # 3s, under eureka.server

3. Cache optimization

To avoid the contention caused by reading and writing the in-memory data structure at the same time, Eureka Server uses a three-level cache, which also speeds up responses to service requests. The registry is pulled in these steps:

  1. First look for the cached registry in ReadOnlyCacheMap.
  2. If it is not there, look for the registry cached in ReadWriteCacheMap.
  3. If it is not there either, read the actual registry data from memory.
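
The lookup order above can be sketched as follows. This is a deliberately simplified model, not Eureka's actual implementation: `ThreeLevelLookup`, `invalidate`, and `syncReadOnlyCache` are hypothetical names, the registry is represented by a plain loader function, and cached values are strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified, hypothetical model of the lookup order; not Eureka's real cache class.
final class ThreeLevelLookup {
    private final Map<String, String> readOnlyCache = new ConcurrentHashMap<>();
    private final Map<String, String> readWriteCache = new ConcurrentHashMap<>();
    private final Function<String, String> registryLoader; // stands in for the in-memory registry

    ThreeLevelLookup(Function<String, String> registryLoader) {
        this.registryLoader = registryLoader;
    }

    // Step 1: read-only cache. Step 2: read-write cache. Step 3: the registry itself.
    String get(String key) {
        String cached = readOnlyCache.get(key);
        if (cached != null) {
            return cached;
        }
        String value = readWriteCache.computeIfAbsent(key, registryLoader);
        if (value != null) {
            readOnlyCache.put(key, value);
        }
        return value;
    }

    // Registry changed: drop the read-write entry so that the next load sees fresh data.
    void invalidate(String key) {
        readWriteCache.remove(key);
    }

    // Periodic copy from the read-write cache into the read-only cache.
    void syncReadOnlyCache() {
        for (String key : readOnlyCache.keySet()) {
            String latest = readWriteCache.computeIfAbsent(key, registryLoader);
            if (latest != null) {
                readOnlyCache.put(key, latest);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> registry = new ConcurrentHashMap<>();
        registry.put("ALL_APPS", "v1");
        ThreeLevelLookup cache = new ThreeLevelLookup(registry::get);
        System.out.println(cache.get("ALL_APPS"));   // loaded from the registry
        registry.put("ALL_APPS", "v2");
        cache.invalidate("ALL_APPS");
        System.out.println(cache.get("ALL_APPS"));   // still served from the read-only cache
        cache.syncReadOnlyCache();
        System.out.println(cache.get("ALL_APPS"));   // refreshed after the sync
    }
}
```

In this model a reader keeps seeing the old value until the next sync runs, which is the delay the read-only cache trades for cheaper reads.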

When the registry changes, the registry data and the data cached in ReadWriteCacheMap are updated first; the data in ReadWriteCacheMap is copied to ReadOnlyCacheMap only every 30s by default. To let clients discover services faster, we can adjust two settings.

  1. Skip ReadOnlyCacheMap and serve registry pulls directly from ReadWriteCacheMap.
	use-read-only-response-cache: false  # bypass ReadOnlyCacheMap; under eureka.server
  2. Shorten the synchronization interval between ReadWriteCacheMap and ReadOnlyCacheMap. The default is 30 seconds; it can be reduced to, say, 3 seconds, depending on your own situation.
	response-cache-update-interval-ms: 3000  # under eureka.server

When reading the source code here, I came across an expression that puzzled me:

    if (shouldUseReadOnlyResponseCache) {
        timer.schedule(getCacheUpdateTask(),
                new Date(((System.currentTimeMillis() / responseCacheUpdateIntervalMs) * responseCacheUpdateIntervalMs)
                        + responseCacheUpdateIntervalMs),
                responseCacheUpdateIntervalMs);
    }

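Why is System.currentTimeMillis() divided by responseCacheUpdateIntervalMs and then multiplied by responseCacheUpdateIntervalMs? Isn't the result still the original System.currentTimeMillis()? It is not: both values are of type long, so the division discards the remainder, and multiplying back rounds the current time down to a multiple of the interval. Adding one more interval then gives the next such boundary as the start time.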
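
A small sketch of that expression with concrete numbers, assuming a 30-second interval. `AlignedStart` and `nextBoundary` are illustrative names, not Eureka code. Because the operands are `long`, the division drops the remainder, so the result is the next multiple of the interval rather than the original timestamp.

```java
// Hypothetical helper that isolates the divide-then-multiply expression.
final class AlignedStart {

    // Next multiple of intervalMs that lies strictly after nowMs.
    static long nextBoundary(long nowMs, long intervalMs) {
        long roundedDown = (nowMs / intervalMs) * intervalMs; // integer division truncates
        return roundedDown + intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 30_000L;
        long nowMs = 100_123L;
        System.out.println("rounded down:  " + (nowMs / intervalMs) * intervalMs);
        System.out.println("next boundary: " + nextBoundary(nowMs, intervalMs));
    }
}
```

For a timestamp of 100,123 ms the rounded-down value is 90,000 ms and the next boundary is 120,000 ms.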

Using Timer here also has a hidden danger: when one Timer runs several TimerTasks and any one of them throws an exception that is not caught, the timer thread terminates and the other tasks stop running as well. It can be replaced with ScheduledExecutorService.
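
A minimal sketch of the ScheduledExecutorService alternative, using only standard JDK classes. `SchedulerIsolationDemo` and `healthyTaskSurvives` are hypothetical names chosen for this illustration. One task throws on its first run and a second task keeps running on the same scheduler, since an exception only suppresses later executions of the task that threw it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; only JDK scheduling APIs are used.
final class SchedulerIsolationDemo {

    // Schedules a failing task and a healthy task together and reports whether
    // the healthy task still completed the requested number of runs.
    static boolean healthyTaskSurvives(int runs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CountDownLatch healthyRuns = new CountDownLatch(runs);
        try {
            // Throws on its first run; only this task's later executions are suppressed.
            scheduler.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated failure");
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Keeps running regardless of the failure above.
            scheduler.scheduleAtFixedRate(healthyRuns::countDown, 0, 10, TimeUnit.MILLISECONDS);

            return healthyRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy task survived: " + healthyTaskSurvives(3));
    }
}
```

Note that the failing task itself stops silently here, so in real code it is still worth catching and logging exceptions inside each task.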

4. Tips for client development

When developing a client, if the registry is not running, the client keeps reporting connection-timeout errors for the registry. The following setting can be used during development to decouple the service from the registry.

    enabled: false  # under eureka.client

5. Make the client pull the registry more promptly

The api-client periodically pulls the registry from eureka-server, every 30 seconds by default. The pull interval can be set according to the actual situation.

   registry-fetch-interval-seconds: 3  # under eureka.client

6. client.serviceUrl.defaultZone optimization

The api-client contacts eureka-server in the order given in defaultZone: it pulls from and registers with eureka1 first and only falls back to eureka2 when eureka1 is unavailable. If eureka1 never goes down, every microservice talks to eureka1 first, which puts excessive pressure on eureka1. In production, each microservice can therefore be configured with a different defaultZone order, which balances the load manually. For example, the defaultZone of clientA is eureka1, eureka2, eureka3, and the defaultZone of clientB is eureka2, eureka3, eureka1.

      defaultZone: eureka1,eureka2,eureka3  # under eureka.client.serviceUrl
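
One way to generate a different order for each client is to rotate the server list by a per-client index. The sketch below is only an illustration: `DefaultZoneRotation` and `defaultZoneFor` are hypothetical names, the server names are the placeholders used above, and how a client obtains its index is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for building a per-client defaultZone value.
final class DefaultZoneRotation {

    // Rotates the list so that client number clientIndex starts at a different server.
    // Assumes servers is non-empty.
    static String defaultZoneFor(List<String> servers, int clientIndex) {
        List<String> rotated = new ArrayList<>(servers);
        int shift = Math.floorMod(clientIndex, servers.size());
        Collections.rotate(rotated, -shift); // a negative distance rotates to the left
        return String.join(",", rotated);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("eureka1", "eureka2", "eureka3");
        for (int client = 0; client < 3; client++) {
            System.out.println("client " + client + ": " + defaultZoneFor(servers, client));
        }
    }
}
```

Client 0 keeps the original order, client 1 gets eureka2,eureka3,eureka1, and client 2 gets eureka3,eureka1,eureka2, so first contact is spread evenly across the three servers.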

7. Client heartbeat frequency

By default, the client sends a heartbeat to the server every 30 seconds. This interval can be lowered as appropriate.

    # default is 30s; under eureka.instance
    lease-renewal-interval-in-seconds: 30

8. The time interval for the server to remove the client

By default, if the server receives no heartbeat from a client within 90s, it removes that client. To make dead instances disappear more quickly, this time can be reduced as appropriate.

    lease-expiration-duration-in-seconds: 90  # under eureka.instance
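
To see how the settings from the sections above interact, here is a back-of-the-envelope sketch. It rests on an assumption that the article does not state: that the lease expiration, the eviction task interval, the ReadOnlyCacheMap sync interval, and the client's registry fetch interval simply add up to a rough upper bound on how long a dead instance can stay visible to a client. `DiscoveryDelayEstimate` is a hypothetical name.

```java
// Rough, additive estimate; an assumption for illustration, not a measured result.
final class DiscoveryDelayEstimate {

    static int worstCaseSeconds(int leaseExpirationSeconds,
                                int evictionIntervalSeconds,
                                int cacheSyncSeconds,
                                int registryFetchSeconds) {
        return leaseExpirationSeconds + evictionIntervalSeconds
                + cacheSyncSeconds + registryFetchSeconds;
    }

    public static void main(String[] args) {
        // Defaults mentioned in this article: 90s, 60s, 30s, 30s.
        System.out.println("defaults: " + worstCaseSeconds(90, 60, 30, 30) + "s");
        // Example values from this article: 3s eviction, 3s cache sync, 3s fetch.
        System.out.println("tuned:    " + worstCaseSeconds(90, 3, 3, 3) + "s");
    }
}
```

Under this assumption the defaults add up to 210 seconds and the example values to 99 seconds, with the lease expiration as the largest remaining term.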

Other Eureka questions

Where does Eureka fail to achieve consistency, that is, the C in CAP?

  1. Self-protection: even when the network is bad, clients can still pull the registry and make calls, so the registry may list instances that are not actually available.
  2. Cache synchronization: as we saw when optimizing the cache above, the data in ReadOnlyCacheMap and ReadWriteCacheMap is not kept consistent at all times.
  3. Pulling the registry from other peers: state is synchronized between cluster nodes asynchronously, so the nodes are not guaranteed to be consistent at any given moment, although they are basically guaranteed to be consistent in the end.

Cluster synchronization: clustering does not increase Eureka's capacity; its purpose is to achieve availability.

Under what circumstances is data synchronized? We look at the following operations.

  1. Registration: the node that receives the registration synchronizes it to the next node only; that node does not pass it on further.
  2. Renewal: when a service renews its lease, the renewal is automatically synchronized to the other Eureka Servers.
  3. Offline: always synchronized to all nodes in the cluster.
  4. Eviction: not synchronized; each server has its own eviction mechanism.

Estimate how many requests Eureka Server has to handle

For example, suppose there are 20 services and 5 instances are deployed for each service. That is 20 * 5 = 100 instances.

By default, an instance sends a heartbeat every 30 seconds and pulls the registry every 30 seconds, which is 2 heartbeats and 2 pulls per minute. Eureka Server therefore receives 100 * 2 * 2 = 400 requests per minute, and 400 * 60 * 24 = 576,000 requests per day. That is more than 570,000 requests per day.

Therefore, by setting an appropriate frequency of pulling the registry and sending heartbeats, it can be ensured that the request pressure on Eureka Server in a large-scale system will not be too great.
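
The estimate above can be reproduced with a short calculation, which also shows how quickly the load grows when the intervals are shortened. `EurekaLoadEstimate` is a hypothetical helper, and it assumes that both intervals divide 60 seconds evenly.

```java
// Hypothetical helper that reproduces the arithmetic of the estimate.
final class EurekaLoadEstimate {

    // Requests per minute from heartbeats plus registry pulls.
    // Assumes both intervals divide 60 evenly.
    static long requestsPerMinute(int instances, int heartbeatIntervalSeconds, int fetchIntervalSeconds) {
        long heartbeatsPerInstance = 60L / heartbeatIntervalSeconds;
        long fetchesPerInstance = 60L / fetchIntervalSeconds;
        return instances * (heartbeatsPerInstance + fetchesPerInstance);
    }

    static long requestsPerDay(long perMinute) {
        return perMinute * 60 * 24;
    }

    public static void main(String[] args) {
        long defaults = requestsPerMinute(100, 30, 30);
        System.out.println("defaults: " + defaults + "/min, " + requestsPerDay(defaults) + "/day");
        long fastFetch = requestsPerMinute(100, 30, 3);
        System.out.println("3s fetch: " + fastFetch + "/min, " + requestsPerDay(fastFetch) + "/day");
    }
}
```

With the defaults this gives 400 requests per minute and 576,000 per day. Pulling the registry every 3 seconds instead raises it to 2,200 per minute and 3,168,000 per day for the same 100 instances.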

A problem in production: while a service is restarting it can still be accessed, but it returns errors

When restarting a service, stop the service first and then manually trigger the offline. If you do not take it offline manually, clients may still access the restarting service, which is not available. If you take it offline manually first, while the service is still running, clients may pull the service again and the manual offline has no effect.

Regional issues

When the number of users is relatively large, our services may be deployed in different regions and different data centers. In that case we want a service to call services in its own data center first and to call services in other data centers only when the local ones are unavailable. This is similar to a CDN and reduces network latency. Eureka provides two concepts for this kind of partitioning.

  1. Region: equivalent to a geographic region, such as the Beijing region.
  2. Zone: a subordinate unit of a region, such as data center A and data center B in Beijing.
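
The "same zone first, other zones as fallback" idea can be sketched as a simple filter. This is a hypothetical illustration and not how Eureka or any load balancer actually implements it: `ZoneAwareChoice`, `Instance`, and the host and zone names are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of zone-preferring instance selection.
final class ZoneAwareChoice {

    static final class Instance {
        final String host;
        final String zone;

        Instance(String host, String zone) {
            this.host = host;
            this.zone = zone;
        }
    }

    // Prefers instances in the caller's zone; falls back to all instances
    // when the caller's zone has none.
    static List<Instance> candidates(List<Instance> all, String callerZone) {
        List<Instance> sameZone = new ArrayList<>();
        for (Instance instance : all) {
            if (instance.zone.equals(callerZone)) {
                sameZone.add(instance);
            }
        }
        return sameZone.isEmpty() ? all : sameZone;
    }

    public static void main(String[] args) {
        List<Instance> all = Arrays.asList(
                new Instance("host-a1", "zone-a"),
                new Instance("host-b1", "zone-b"));
        System.out.println("caller in zone-a: " + candidates(all, "zone-a").size() + " candidate(s)");
        System.out.println("caller in zone-c: " + candidates(all, "zone-c").size() + " candidate(s)");
    }
}
```

A caller in zone-a only sees host-a1, while a caller in a zone without instances falls back to the full list.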

That is all I will write for now. If you have any questions, feel free to discuss them.

This content has been included in the headline account "Programmer Book Club".