Supervisor
Overview
The @supervisor
and @issue
decorators are part of a monitoring system that helps identify and automatically fix issues in the application. They work together to create a self-healing mechanism where:
@supervisor
functions detect potential issues in the system@issue
handlers attempt to fix those issues automatically
@supervisor Decorator
Purpose
The @supervisor
decorator is used to create monitoring functions that periodically check for potential issues or anomalies in the system. These functions yield messages when they detect problems, which are then recorded in the database.
Syntax
@supervisor(delta=timedelta(hours=6))
def supervise_something():
# Check for issues
if issue_detected:
# Simple message
yield "Something is wrong"
# Or with code and parameters for issue handlers
yield "Specific issue description", "issue-code", {"param1": value1, "param2": value2}
Parameters
delta
: Atimedelta
object that specifies how often the supervisor should run. For example,delta=timedelta(hours=6)
means the supervisor will run every 6 hours.auto
: Boolean that determines if the supervisor should run automatically (default isTrue
).raises
: Boolean that determines if the supervisor should raise exceptions (default isFalse
).
Return Value
A supervisor function should be a generator that yields:
- A simple string message describing the issue, or
- A tuple containing: 1. A string message, 2. An issue code, 3. A dictionary of parameters for the issue handler
Example
@supervisor(delta=timedelta(minutes=10))
def supervise_pending_bags_to_be_delivered():
"""
Supervisor to check for bags that are paid but not delivered.
This helps identify issues in the subscription/plan financing creation process.
"""
utc_now = timezone.now()
# Filter bags that are paid but not delivered, updated between 5 and 30 minutes ago
pending_bags = Bag.objects.filter(
status="PAID",
was_delivered=False,
updated_at__lte=utc_now - timedelta(minutes=30),
updated_at__gte=utc_now - timedelta(minutes=5),
)
for bag in pending_bags:
invoice = bag.invoices.filter(status="FULFILLED").order_by("-paid_at").first()
if invoice:
yield (
f"Bag {bag.id} for user {bag.user.email} in academy {bag.academy.name} has not been delivered",
"pending-bag-delivery",
{"bag_id": bag.id},
)
@issue Decorator
Purpose
The @issue
decorator creates handler functions that attempt to fix issues detected by supervisors. When a supervisor yields an issue with a specific code, the corresponding issue handler is triggered to fix the problem.
Syntax
@issue(supervisor_function, delta=timedelta(minutes=30), attempts=3)
def fix_specific_issue(param1, param2):
# Attempt to fix the issue
if issue_fixed:
return True # Issue is fixed
elif issue_cannot_be_fixed:
return False # Issue cannot be fixed
else:
return None # Issue is not yet fixed, will retry later
Parameters
supervisor_function
: The supervisor function that this issue handler is associated with.delta
: Atimedelta
object that specifies how often the issue handler should retry if the issue is not fixed.attempts
: The maximum number of attempts to fix the issue before giving up.
Parameters Passed to the Handler
The parameters passed to the issue handler function come from the third element of the tuple yielded by the supervisor function. These parameters should match the parameter names in the issue handler function signature.
Return Value
An issue handler function should return:
True
: The issue is fixedFalse
: The issue cannot be fixed (will not retry)None
: The issue is not yet fixed (will retry later, up to the specified number of attempts)
Example
@issue(supervise_pending_bags_to_be_delivered, delta=timedelta(minutes=30), attempts=3)
def pending_bag_delivery(bag_id: int):
"""
Issue handler for pending bag delivery.
This function is called when a bag is detected as paid but not delivered.
It will attempt to retry the delivery process.
"""
# Check if the bag still needs to be processed
bag = Bag.objects.filter(id=bag_id, status="PAID", was_delivered=False).first()
if not bag:
# Bag was already delivered or doesn't exist, mark as fixed
return True
# Call the function to retry the delivery
res = retry_pending_bag(bag)
# Return True if the issue is fixed
if res in ["scheduled", "done"]:
return True
# Return None to retry later
return None
How They Work Together
- The
@supervisor
function runs periodically (based on itsdelta
parameter) and checks for issues. - When an issue is detected, the supervisor yields a message, code, and parameters.
- The system records this issue in the database.
- The
@issue
handler associated with the supervisor (matching the yielded code) is triggered. - The issue handler receives the parameters yielded by the supervisor and attempts to fix the issue.
- If the issue handler returns
None
, it will be retried later (based on the handler'sdelta
parameter). - If the issue handler returns
True
, the issue is marked as fixed. - If the issue handler returns
False
, the issue is marked as unfixable.
Benefits
- Separation of Concerns: Supervisors focus on detecting issues, while issue handlers focus on fixing them.
- Automatic Healing: The system can automatically detect and fix common issues without manual intervention.
- Monitoring: Issues are recorded in the database, providing visibility into system health.
- Retry Logic: Issue handlers can retry fixing issues multiple times with configurable delays.
- Scalability: New supervisors and issue handlers can be added without modifying existing code.
Best Practices
- Keep supervisors focused on detecting specific types of issues.
- Make issue handlers idempotent (can be run multiple times safely).
- Use meaningful codes that clearly identify the type of issue.
- Include all necessary parameters in the supervisor's yield statement.
- Return appropriate values from issue handlers to indicate the status of the fix.
- Set appropriate
delta
values based on the urgency and frequency of the issue. - Document the purpose and behavior of each supervisor and issue handler.
Implementation Details
The supervisors and issues are stored in the database using the Supervisor
and SupervisorIssue
models from the breathecode.monitoring
app. These models track:
- When supervisors last ran
- What issues were detected
- Whether issues have been fixed
- How many times an issue has occurred
The system automatically schedules supervisors to run based on their delta
parameter and triggers issue handlers when issues are detected.
Creating New Supervisors and Issue Handlers
To create a new supervisor and issue handler:
- Define a supervisor function decorated with
@supervisor
that yields issues. - Define an issue handler function decorated with
@issue
that fixes the issues. - Make sure the code in the supervisor's yield statement matches the name of the issue handler function (with hyphens instead of underscores).
- Ensure the parameters in the supervisor's yield statement match the parameters expected by the issue handler.
Example Use Cases
- Detecting and fixing stalled processes
- Identifying and resolving data inconsistencies
- Monitoring system resources and taking action when thresholds are exceeded
- Checking for failed operations and retrying them
- Detecting security issues and taking remedial action
This monitoring and self-healing system helps maintain the health of the application by automatically detecting and fixing common issues, reducing the need for manual intervention and improving overall system reliability.