Further investigation showed that the issue was caused by one of the menu's bulk-editing services consuming excessive resources.
Once the cause was identified, the development team applied changes to prevent this excessive resource usage from happening again.
We also determined that this issue could have been avoided if certain specific alerts had been enabled.
To improve system stability and performance, and to prevent situations like the one described above from recurring, we have implemented several important technical changes covering our server-level services as well as our application code and database services. The following improvements address the recent instabilities:
We implemented new server monitoring dashboards to identify and analyze the services that consume the most resources and to detect the areas that receive the highest volume of calls.
We have added new safeguards for services that perform multiple changes in a single operation, such as bulk editing.
We have added more detailed execution-time logs to critical system services. This lets us see which processes take the longest to run and prioritize improvements accordingly.
We improved data organization by moving some information to a dedicated system optimized for reading data, which keeps the system responsive, especially during peak times.
We adjusted our server setup to isolate the heaviest tasks (such as integrations with delivery platforms) and the most frequently accessed records, so that the system runs faster and more efficiently.
These are the actions we have taken to ensure that the system remains as efficient and stable as possible. We are continually investing in improvements to prevent similar issues from occurring in the future.