Business Continuity - How can your app agency help you prepare?
We've all heard about what has been called the largest IT outage in history, when CrowdStrike deployed a faulty content update in July 2024 that caused widespread disruption across a range of sectors, including banking, aerospace, rail, telecommunications, live broadcasting, medical care, and retail. It has raised questions about why a single company is able to cause so much disruption, whether cybersecurity tools should be embedded so deeply into a system that a corrupted configuration can stop the whole system from working, and what businesses can do to limit disruption in the future. Even before this outage, Peslo had been considering how to help clients with business continuity, so that when the unexpected happens they can move quickly to reduce disruption.
Processes & Communication
When something happens that stops your business from operating, you need to move fast to solve it. Consider:
- Do you know who to contact within your business, or at external suppliers, for help? What forms of communication can you use: if your email becomes inaccessible, do you have another way to reach people?
- Are your staff and client contact lists kept in only one location? If that location becomes unavailable, how would you reach people?
- Are there agreed processes for starting communications about urgent issues, for example a shared Slack channel or an on-call WhatsApp group?
- Do your on-call staff know how to triage issues, and when to ask for additional support? Is this process documented so everyone operates in the same way?
Whilst processes are fundamental to resolving issues smoothly, take a moment to consider whether your team would actually follow them in the heat of the moment; if they won't be followed, relying on them is futile. Work out why they wouldn't be followed, and take steps to address those reasons.
We love processes and working out how to make businesses run smoothly. Our partner agency Elevate works closely with us to help startups and small businesses implement and improve their processes, so reach out if you'd like to hear more!
Backups / Recovery
Take a moment to think: when did you last back up your critical and non-critical systems? Was it within the last 24 hours, the last month, the last year, or never? If an issue were to affect your systems so widely that the only way to continue was to restore from a backup, what would that mean for your business?
What would the impact be if:
- Your cloud hosting provider lost data - would you lose important customer data, and could you recover the platform elsewhere?
- Your version control provider had an outage or lost repositories - would you lose your company's code, or would staff be prevented from working during the outage?
- Your HR system went offline for a few days - would you still be able to contact staff?
- A key employee's laptop was lost or stolen - how much data would be lost or in the hands of a malicious actor?
It's important to never rely on a single service for your business operations, and to consider how you'd recover if a system was to fail.
We work closely with our clients to understand what should be in place to protect them, from automated backups within their cloud hosting provider, to built-in redundancy in critical systems to safeguard customer data, to mobile device management (MDM) tools that protect devices off-premises. Certain sectors, such as healthcare or finance, may also have regulatory requirements around preventing the loss of customer data.
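As a starting point for the "when did you last back up?" question, here is a minimal Python sketch of a backup freshness check you could run against your own records. It assumes you can obtain a last-successful-backup timestamp for each system (for example from your hosting provider's console or your own logs); the function name, system names, and one-day threshold are hypothetical, and in practice different systems may warrant different maximum ages.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_backups(
    last_backup: dict[str, Optional[datetime]],
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """Return the systems whose latest backup is missing or older than max_age."""
    return sorted(
        name
        for name, when in last_backup.items()
        if when is None or now - when > max_age
    )


# Hypothetical systems and timestamps, for illustration only.
now = datetime(2024, 8, 1, tzinfo=timezone.utc)
last_backup: dict[str, Optional[datetime]] = {
    "customer-database": now - timedelta(hours=6),
    "code-repositories": now - timedelta(days=40),
    "hr-records": None,  # never backed up
}
print(stale_backups(last_backup, max_age=timedelta(days=1), now=now))
# ['code-repositories', 'hr-records']
```

Running a check like this on a schedule, and alerting when the list is non-empty, turns "when did we last back up?" from a question you ask during an incident into one you already know the answer to.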
Security
One of your company's obligations is to protect client and customer data from loss, exposure, or incorrect changes, so consider how you would do this remotely if your offices suddenly had to close. For example, if you provide laptops to staff, are you using the built-in protections available to secure data, such as disk encryption, or MDM systems that let you remotely lock and track a device? Do staff store data only on their own devices, or do they use company-provided cloud systems instead? Is your team aware of common cybersecurity threats, and are they tested regularly so they don't become complacent?
We'd recommend starting by reviewing the systems you use and the security options they offer, such as two-factor authentication (2FA) and enforced password requirements.
Customer Support
If something goes wrong, even if it's outside your control, your customers may lose trust in your organisation if you seem unprepared or unable to resolve it quickly. Consider how you'd reach your customers during an outage, what your messaging would say, and how often you'd send updates. Too much communication risks distracting your team; too little risks customers contacting you individually to check in, or users moving elsewhere.
We work with our clients to implement automated services that inform both internal teams and customers when an issue occurs, such as public status pages that show whether a service is available and, if not, its current status and the latest updates.
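At its core, a status page is a set of components, each with a current state and a history of updates, plus an overall state taken from the most severe component. The Python sketch below shows that idea in memory only; the `Status` levels, `Component` class, and `render` output format are assumptions for illustration, and a real setup would typically publish this data to a hosted status page service rather than print it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity, so max() returns the worst status.
    OPERATIONAL = 0
    DEGRADED = 1
    OUTAGE = 2


@dataclass
class Component:
    name: str
    status: Status = Status.OPERATIONAL
    updates: list = field(default_factory=list)  # (timestamp, message) pairs

    def post_update(self, status: Status, message: str) -> None:
        self.status = status
        self.updates.append((datetime.now(timezone.utc), message))


def overall_status(components: list) -> Status:
    # With no components at all, report operational rather than raising.
    return max((c.status for c in components), default=Status.OPERATIONAL)


def render(components: list) -> str:
    lines = [f"Overall: {overall_status(components).name}"]
    for c in components:
        line = f"{c.name}: {c.status.name}"
        if c.updates:
            line += f" ({c.updates[-1][1]})"  # show only the latest update
        lines.append(line)
    return "\n".join(lines)


# Hypothetical components for illustration.
api = Component("API")
content = Component("App content")
content.post_update(Status.DEGRADED, "Content updates delayed, investigating")
print(render([api, content]))
```

Deriving the overall status from the worst component keeps the headline honest without anyone having to update it by hand during an incident.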
Analytics / Error Reporting
Without reporting in place, would you know the moment something broke, or would you be waiting for your customers to tell you there was an issue? Being proactive creates a positive impression and shows customers they can trust you to look after them; if you don't realise there's an issue, it may escalate and leave you with a larger situation to resolve.
We build analytics and error reporting into our clients' projects, so that if something goes wrong or doesn't work as expected, it can be prioritised for resolution and the affected customers can be contacted proactively. A key example is Perception, where we used analytics and error reporting to uncover why some users struggled to import events from their car. Some imports failed because the drive was not being selected (improved with refreshed messaging), some because users expected to use cloud storage (added as a new feature), and some because events had been partially overwritten by the car when the USB drive was low on space (also improved with new messaging). Without our error reporting, these issues would have been harder to understand and would have resulted in increased churn.
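The value in an investigation like Perception's comes from reporting each failure with a category, then counting which categories come up most often and which users were affected. A minimal Python sketch of that grouping is below. `ErrorReporter` is a hypothetical in-memory stand-in for a real error reporting service, and the category names and `classify_import_failure` rules are assumptions loosely modelled on the three causes described above, not Perception's actual code.

```python
from collections import Counter
from dataclasses import dataclass, field


def classify_import_failure(drive_selected: bool, source: str, missing_files: int) -> str:
    """Map a failed import to a category (hypothetical rules)."""
    if not drive_selected:
        return "no_drive_selected"
    if source == "cloud":
        return "cloud_storage_not_supported"
    if missing_files > 0:
        return "events_partially_overwritten"
    return "unknown"


@dataclass
class ErrorReporter:
    """In-memory stand-in for an error reporting service."""
    reports: list = field(default_factory=list)

    def report(self, category: str, user_id: str, **context) -> None:
        self.reports.append({"category": category, "user_id": user_id, **context})

    def top_categories(self, n: int = 3) -> list:
        # Most frequent failure categories first, to help prioritise fixes.
        return Counter(r["category"] for r in self.reports).most_common(n)

    def affected_users(self, category: str) -> set:
        # Users who could be contacted proactively once a fix ships.
        return {r["user_id"] for r in self.reports if r["category"] == category}


reporter = ErrorReporter()
reporter.report(classify_import_failure(False, "usb", 0), "user-1")
reporter.report(classify_import_failure(True, "usb", 4), "user-2", missing_files=4)
reporter.report(classify_import_failure(False, "usb", 0), "user-3")
print(reporter.top_categories())
# [('no_drive_selected', 2), ('events_partially_overwritten', 1)]
print(sorted(reporter.affected_users("no_drive_selected")))
# ['user-1', 'user-3']
```

Categorising at the point of failure, rather than logging a generic "import failed", is what makes the counts and affected-user lists meaningful later.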
Fallbacks if things go wrong
One of the key issues with the CrowdStrike outage in July 2024 was that affected Windows computers repeatedly crashed to the 'blue screen of death': there was no fallback that could return the system to a usable state.
Within the platforms we build for our clients, we work to understand the potential error cases and how to stop them from having a negative impact. For example, if a network request submitting user data fails, we keep hold of the data and allow the request to be retried; if we are unpacking new app content on a device and find there isn't enough space, we can show the user a message or fall back to the previous content. Ultimately, we make sure that when something goes wrong, it can be resolved easily with a retry or by following some simple in-app messaging. Within the VIKIN platform apps, for instance, if a content update can't be unpacked because of a device issue, the user can retry the action and the error is reported to our internal systems.
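The two fallbacks described above can be sketched in a few lines of Python: keep user data queued until a send succeeds so it can be retried, and keep the previous content (while reporting the error) when there isn't enough space for an update. This is a minimal illustration, not the VIKIN implementation; `SendFailed`, `PendingSubmissions`, `apply_content_update`, and the size and space figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class SendFailed(Exception):
    """Raised by the (hypothetical) transport when a request does not go through."""


@dataclass
class PendingSubmissions:
    """Holds user data until it has been sent, so a failed request can be retried."""
    send: Callable[[dict], None]
    queue: list = field(default_factory=list)

    def submit(self, data: dict) -> bool:
        # Keep the data locally first, so nothing is lost if the send fails.
        self.queue.append(data)
        return self.retry()

    def retry(self) -> bool:
        """Send queued items in order; stop at the first failure and keep the rest."""
        while self.queue:
            try:
                self.send(self.queue[0])
            except SendFailed:
                return False
            self.queue.pop(0)  # only drop an item once it has been sent
        return True


def apply_content_update(
    current: str, new: str, size_needed: int, free_space: int,
    report: Callable[[str], None],
) -> tuple[str, Optional[str]]:
    """Return (content version to use, optional message to show the user)."""
    if free_space < size_needed:
        # Not enough space: report it internally and keep the previous content.
        report(f"content_unpack_failed: needed={size_needed} free={free_space}")
        return current, "Not enough storage for new content. Free up space and tap Retry."
    return new, None


# Usage: a send function that fails on its first call (e.g. no signal), then succeeds.
calls = {"n": 0}


def flaky_send(data: dict) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise SendFailed("network unavailable")


pending = PendingSubmissions(send=flaky_send)
first = pending.submit({"form": "contact", "message": "Hello"})  # False: data kept
second = pending.retry()  # True: queue now empty

reported: list = []
version, message = apply_content_update(
    "v1", "v2", size_needed=500, free_space=100, report=reported.append
)  # keeps "v1", returns a user message, and records one report
```

The key design choice in both cases is ordering: data is only discarded after a successful send, and new content only replaces the old once it can actually be installed, so a failure always leaves the app in a working state.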
In summary, it's critical that startups consider how they would react to an unexpected event: putting processes in place, implementing automated systems that notify you when an issue occurs, managing communication with customers, and giving the system ways to fall back to a safe state when something goes wrong.
Want to hear more about how Peslo can help? Reach out here!








