How to prevent data loss when a program terminates suddenly

Write a log before each operation to guard against data loss caused by interruptions.

Do the following:
A. write a log
B. redis count + 1
C. clear the log

1. If execution is interrupted during A, check the log after restart and perform operation B again; no data is lost.
2. If execution is interrupted during B, then B will be executed one more time after log-based recovery. No data is lost, but the count is incremented twice, i.e. you get duplicate data.
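The A/B/C steps above can be sketched in a few lines. This is a minimal illustration, not the asker's actual code: a local file stands in for the log and a plain dict stands in for redis.

```python
import json
import os

LOG_PATH = "op.log"       # hypothetical log file
store = {"count": 0}      # stands in for redis

def increment_with_log():
    # A: write the log before doing anything
    with open(LOG_PATH, "w") as f:
        json.dump({"op": "incr", "key": "count"}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the log itself survives a crash
    # B: perform the operation (redis INCR in the real setup)
    store["count"] += 1
    # C: clear the log so a restart will not redo the operation
    os.remove(LOG_PATH)

def recover():
    # On startup: a leftover log means step B may not have run, so redo it.
    # As the answer notes, if the crash hit between B and C this
    # re-executes B and produces a duplicate increment.
    if os.path.exists(LOG_PATH):
        store["count"] += 1
        os.remove(LOG_PATH)

increment_with_log()
```

Running `recover()` when no log exists is a no-op, which is why clearing the log in step C matters.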

The question was originally about a power outage, and the answers have mostly revolved around that. In practice, though, power outages are quite rare; the more common causes of data loss are the user killing the application mid-execution, program crashes, and so on.


Looking at the comments, some people dispute this.
In fact, many databases use similar mechanisms to prevent data loss caused by a sudden abort.

1. LevelDB, for example. The main role of the log file in LevelDB is to ensure that data is not lost when the system recovers from a failure. Before a record is written to the in-memory Memtable, it is first written to the log file. So even if the system fails before the Memtable has been dumped to an SSTable file on disk, LevelDB can rebuild the Memtable contents from the log file without losing data. LevelDB and Bigtable are consistent on this point.
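That recovery path can be shown with a toy sketch (this is illustrative, not LevelDB's actual code): every write is appended to a log before updating the in-memory table, so the table can be rebuilt by replaying the log after a crash.

```python
import json

def put(log, memtable, key, value):
    # Append to the write-ahead log first...
    log.append(json.dumps({"k": key, "v": value}))
    # ...then update the in-memory Memtable.
    memtable[key] = value

def recover_memtable(log):
    # After a crash, replay the log to rebuild the Memtable.
    memtable = {}
    for line in log:
        rec = json.loads(line)
        memtable[rec["k"]] = rec["v"]
    return memtable

log = []        # stands in for the on-disk log file
memtable = {}
put(log, memtable, "a", 1)
put(log, memtable, "b", 2)
rebuilt = recover_memtable(log)   # simulate a restart: memtable was lost
```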

Related article: https://www.cnblogs.com/haipp...

2. Elasticsearch:

If we do not use fsync to flush data from the file system cache to disk, there is no guarantee that the data will survive a power outage, or even a normal exit of the program. To be reliable, Elasticsearch needs to ensure that data changes are persisted to disk.
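The distinction between the file system cache and the physical disk shows up directly in Python's I/O calls (a generic sketch, not Elasticsearch's translog code): an ordinary write() only reaches a user-space buffer, flush() pushes it to the file system cache, and os.fsync() is what actually forces it to disk.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:
    f.write("important record\n")
    f.flush()             # user-space buffer -> file system cache
    os.fsync(f.fileno())  # file system cache -> physical disk

with open(path) as f:
    content = f.read()
```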

Related article: https://www.elastic.co/guide/...

3. RabbitMQ's prevention of message loss and duplicate consumption:

Related article: https://www.jianshu.com/p/5ad...

Jul.09,2022

In general, you really don't need to worry about interruptions and data loss caused by downtime. But when the consistency requirements are strict you should: bank transfers, payment systems like Taobao or WeChat, software upgrades (a half-applied upgrade may leave the program unable to start), and so on.

It is definitely necessary to keep a log in this situation. As for how to prevent the repeated execution you mentioned, I think the key point is not to prevent repeated execution, but to make repeated execution have no side effects. For example, replacing a file is an operation with no side effects: if you replace the original B with A, then no matter how many times you copy and paste, the result is the same and the final file is A. Directly adding or deducting money from an account has side effects, and the count + 1 in your example does too. In that case the operation must be restructured, usually by splitting it into more steps, each of which has no side effects, and logging every step.

for example, in your example, I can split it like this:

  1. lock, start the operation, and log
  2. read the current redis count and log it
  3. count + 1
  4. write back the count and log it
  5. operation completed, release the lock

The whole operation is atomic because of the lock. If any step fails, that can be detected the next time execution resumes. If something goes wrong at step 2, you simply start over. If step 3 fails and step 4 has not yet run, only the step-2 log exists; on re-execution, read the current redis count and compare it with the value recorded in the log. If they are equal, execute again from step 3. If the current count is 1 larger than the logged value, the increment has already taken effect, so continue from step 4.
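That recovery check can be sketched as follows, assuming the log records the count read in step 2 (the names and storage are illustrative: a dict stands in for redis, a file stands in for the log, and the lock is omitted for brevity):

```python
import json
import os

LOG = "step.log"          # hypothetical step log
redis = {"count": 5}      # stands in for redis

def run():
    # step 2: read the current count and log it
    seen = redis["count"]
    with open(LOG, "w") as f:
        json.dump({"read": seen}, f)
    # steps 3-4: count + 1 and write it back
    redis["count"] = seen + 1
    # step 5: operation completed, clear the log
    os.remove(LOG)

def resume():
    # On restart, decide where to pick up based on the log.
    if not os.path.exists(LOG):
        return                      # nothing was in flight
    with open(LOG) as f:
        seen = json.load(f)["read"]
    if redis["count"] == seen:
        redis["count"] = seen + 1   # the increment never landed: redo it
    # if count == seen + 1, the write-back already took effect: skip it
    os.remove(LOG)

run()
```

The comparison against the logged read value is what makes the resume safe to run in either crash position.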


After thinking about it all night, I found there is no perfect solution to this problem.
Think about whether you really need to design for programs that hang frequently. Occasional power outages and low-probability data loss should be tolerable.

the most important thing is to prevent this from happening.


This is not something software should handle alone; you also have to consider the hardware. If the hardware provides no support, software handling will always have gaps, because this is an unexpected situation.

Usually the computer room provides temporary power after an outage so that you have time to process data, and a single host can be protected by adding a UPS.


Add a voltage regulator with battery backup in between.


There are many ways to recover data after a power outage. But as things stand today, most data servers are either hosted by cloud service providers, or are company-bought machines hosted in dedicated server facilities, or are maintained by the company's own operations staff (the latter two cases are generally an operations concern). So for this kind of problem, what we as software developers need to pay more attention to is how, after power is restored, to deal with issues such as data inconsistency introduced during the outage.
