Self-Healing Infrastructure Setup
AgencyStack supports a self-healing “buddy system” architecture where multiple servers monitor each other and automatically recover from failures.
How It Works
The buddy system uses a combination of DroneCI and custom scripts to:
- Monitor: Regularly check the health of other nodes in the cluster
- Detect: Identify when a node has failed or is experiencing issues
- Recover: Automatically restore or rebuild failing nodes
- Notify: Alert administrators of issues and recovery actions

Setup Instructions
Prerequisites
- At least two servers running AgencyStack
- Both servers must have public IP addresses or be mutually accessible
- DroneCI installed on all servers (included in AgencyStack by default)
Configuration Steps
- Enable DroneCI monitoring:
cd /opt/agency_stack
make enable-monitoring
- Configure buddy relationships:
Edit the /opt/agency_stack/config/buddies.json file:
{
"name": "server1.example.com",
"buddies": [
{
"name": "server2.example.com",
"ip": "192.168.1.2",
"ssh_key": "/opt/agency_stack/config/buddy_keys/server2.key",
"check_interval_minutes": 5,
"recovery_actions": ["restart", "rebuild", "notify"]
}
],
"notification_email": "admin@example.com",
"notification_slack_webhook": "https://hooks.slack.com/services/XXX/YYY/ZZZ"
}
- Generate and exchange SSH keys:
cd /opt/agency_stack
make generate-buddy-keys
# Then manually exchange keys between servers
- Start the buddy monitoring system:
make start-buddy-system
Recovery Actions
The system supports multiple recovery actions:
- restart: Restart failed services
- rebuild: Completely rebuild the server from scratch
- restore: Restore from the latest backup
- notify: Send notifications but take no action
DroneCI Integration
DroneCI runs monitoring pipelines that:
- Check server health metrics
- Run buddy system monitoring scripts
- Execute scheduled backups
- Perform security scans
You can view the DroneCI dashboard at https://drone.your-agency-stack-domain.com
Advanced Configuration
For more advanced configuration options, including custom monitoring checks and recovery scripts, see the Advanced Monitoring Guide.
Troubleshooting
If you encounter issues with the buddy system:
- Check the buddy system logs:
/var/log/agency_stack/buddy-system.log - Verify DroneCI is running:
docker ps | grep drone - Check connectivity between servers
- Review the DroneCI pipeline logs
For assistance, contact support@nerdofmouth.com.