Suitable Status – Service instability – Incident details

Service instability

Resolved
Started 6 months ago · Lasted 10 minutes

Affected

  • Sistema Suitable – Operational from 1:10 AM to 1:20 AM
  • Online store – Operational from 1:10 AM to 1:20 AM
  • Waiter – Operational from 1:10 AM to 1:20 AM
  • Impressions – Operational from 1:10 AM to 1:20 AM
  • Whatsapp robot – Operational from 1:10 AM to 1:20 AM
  • Integração Ifood – Operational from 1:10 AM to 1:20 AM

Updates
  • Update

    Further investigation showed that the issue was caused by one of the menu bulk-editing services consuming excessive resources.

    Once identified, the development team applied changes to prevent excessive resource usage from happening again.
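
    The exact fix is internal to the Suitable codebase. Purely as a hedged illustration, the sketch below shows one common way to keep a bulk edit from monopolizing resources: processing items in small batches with a short pause between them. Every name here (apply_bulk_edit, BATCH_SIZE, apply_one) is hypothetical, not the actual service code.

        import time
        from typing import Callable, Sequence

        # Hypothetical sketch: process a bulk menu edit in bounded batches so a
        # single request cannot hold the database or CPU for its whole duration.
        BATCH_SIZE = 50        # assumed batch size; would be tuned to the real workload
        PAUSE_SECONDS = 0.1    # short pause lets other requests be served in between

        def apply_bulk_edit(items: Sequence[dict], apply_one: Callable[[dict], None]) -> None:
            """Apply `apply_one` to every item, in batches instead of one large pass."""
            for start in range(0, len(items), BATCH_SIZE):
                for item in items[start:start + BATCH_SIZE]:
                    apply_one(item)          # e.g. update one menu entry
                time.sleep(PAUSE_SECONDS)    # yield resources between batches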

    We also identified that this issue could have been avoided if specific alerts had been activated in advance.

    To improve system stability and performance, and to ensure that situations like this do not happen again, we have implemented several important technical changes covering both our server-level services and our application code and database services. The following improvements have been made to address the recent instabilities:

    • We implemented new monitoring dashboards on the server to identify and analyze the services that consume the most resources, and to detect the areas with the highest volume of calls.

    • We added new protections for services that perform many changes at once.

    • We added more detailed runtime logs to critical system services. This lets us see which processes take the longest to execute and prioritize improvements accordingly (an illustrative sketch of this kind of timing log appears after this list).

    • We improved how data is organized, moving some information to a dedicated read-optimized store, which ensures faster responses, especially during peak times.

    • We adjusted the server setup to separate the heaviest tasks (such as integrations with delivery platforms) from the most frequently accessed records, so that the system runs faster and more efficiently.
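
    As a hedged illustration of the runtime logs mentioned above (all names are assumptions, not the real implementation), the sketch below wraps a critical service call and records how long it took, so the slowest processes can be spotted and prioritized.

        import functools
        import logging
        import time

        logger = logging.getLogger("critical-services")  # hypothetical logger name

        def log_runtime(service_name: str):
            """Decorator that logs how long a critical service call takes."""
            def decorator(func):
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    start = time.perf_counter()
                    try:
                        return func(*args, **kwargs)
                    finally:
                        elapsed_ms = (time.perf_counter() - start) * 1000
                        logger.info("%s.%s took %.1f ms", service_name, func.__name__, elapsed_ms)
                return wrapper
            return decorator

        @log_runtime("menu-bulk-edit")   # hypothetical service name
        def apply_menu_changes(changes):
            ...  # real service logic would live here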

    These are the actions we have taken to ensure that the system remains as efficient and stable as possible. We are continually investing in improvements to prevent similar issues from occurring in the future.

  • Update

    Our infrastructure team worked over the weekend to investigate the causes of the instability that occurred. We are committed to resolving the issue definitively and ensuring system stability.

    In our next update, to be published tomorrow, we will share more details on the actions we have taken and our next steps.

    We are working to resolve the issue at its root cause.

  • Resolved

    According to reports from practically all customers and tests carried out by the operations team, all services were fully restored after the cleanup performed by the infrastructure team.

  • Monitoring

    After the procedure performed by our infrastructure team, our services were restored, according to tests by the operations team and customer reports.

  • Identified

    Our infrastructure team identified an issue with one of our Google-hosted services that could be causing the slowdown and implemented a quick cleanup solution to restore the services.

  • Investigating

    Some customers reported that the system was slow.

    We confirmed that the problem was indeed occurring and engaged our infrastructure team to analyze what happened.