# Roadrunner SMB — Support & Troubleshooting

**Last Updated**: 2026-04-15

---

## How to Get Help

**Support email**: support@roadrunnersmb.com  
**Response time**: best effort within 1 business day (early access period)

Before contacting support, generate a Support Report (see below). Support will request it at the start of any triage interaction.

---

## Generating a Support Report

The Support Report bundles all diagnostic information needed for triage.

1. Log in to the Admin UI
2. Click **Support** in the left navigation
3. Click **Generate Report**
4. Download the report file (`.zip`)
5. Attach it to your support email

The report includes:
- Software version and build
- Cluster state and node roster
- Configuration snapshot
- Last 72 hours of CloudWatch logs
- Diagnostic data

---

## Where Logs Come From

### CloudWatch Logs
All nodes write structured logs to the CloudWatch log group shown in your CloudFormation outputs (`CloudWatchLogGroup`). Each ECS task has its own log stream.

To access logs in the AWS console:
1. Go to CloudWatch → Log groups
2. Find the log group (e.g. `/rrsmb/prod`)
3. Each log stream corresponds to one ECS task

### Admin UI Logs page
The **Logs** page in the Admin UI shows logs from the node that handled your Admin UI request. For multi-node log viewing, use CloudWatch.

---

## Common First-Use Issues

### Admin UI is unreachable

**Symptom**: Browser cannot reach the Admin UI at the `AdminUIUrl` (port 443 via ALB in Marketplace mode, or port 8888 via NLB in single-node mode)

**Check**:
1. Is the machine you're using within the `AdminCidr` you specified? Check the ALB security group inbound rules for port 443 (Marketplace) or the NLB/security group for port 8888 (single-node).
2. Is the load balancer provisioned (ALB in Marketplace mode, NLB in single-node mode)? Check the CloudFormation status: the stack must be `CREATE_COMPLETE`.
3. Are all expected ECS tasks running (3 in a multi-node deployment)? Check ECS → cluster → service → tasks.
4. Are the load balancer's target health checks passing? Check EC2 → Target Groups → `AdminTargetGroup` → Targets.
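
For check 1, a quick way to confirm whether a client address falls inside the `AdminCidr` range is Python's standard `ipaddress` module. This is a minimal sketch: the function name `ip_in_cidr` and the example addresses (taken from the reserved documentation ranges) are illustrative assumptions, not part of the product. The address to test is typically the public or NAT egress address that AWS sees, not the machine's private LAN address.

```python
import ipaddress


def ip_in_cidr(client_ip, admin_cidr):
    """Return True if client_ip falls inside the admin_cidr range."""
    # strict=False tolerates a CIDR written with host bits set, e.g. 203.0.113.5/24.
    network = ipaddress.ip_network(admin_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # Example values only; substitute your own address and AdminCidr.
    print(ip_in_cidr("203.0.113.25", "203.0.113.0/24"))
    print(ip_in_cidr("198.51.100.7", "203.0.113.0/24"))
```

If the function returns `False` for your address, the security group will drop the connection before it reaches the Admin UI.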

---

### Domain join fails during First-Time Setup

**Symptom**: The First-Time Setup (FTS) wizard shows "Join failed" or times out

**Check**:
1. **DNS**: Can the task resolve the AD domain? Check DHCP options or Route 53 Resolver rules.
2. **Reachability**: Can the task reach the domain controller IPs on ports 88 (Kerberos), 389 (LDAP), and 445 (SMB)? Check security groups and NACLs.
3. **Credentials**: Are the AD join account's credentials correct, and does the account have permission to join computers to the domain?
4. **Computer name conflict**: If a previous deployment left a stale computer account in AD, delete it before re-joining.
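
The DNS and reachability checks can be approximated with a short standard-library script. This is a sketch under assumptions: the helper names (`resolve`, `tcp_port_open`, `probe_domain`) are hypothetical, and the result is only meaningful when the script runs from a host in the same subnets and security groups as the ECS tasks, since a laptop or bastion takes a different network path.

```python
import socket

# Ports listed in the reachability check.
DC_PORTS = {88: "Kerberos", 389: "LDAP", 445: "SMB"}


def resolve(hostname):
    """Return sorted unique IP addresses for hostname, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe_domain(domain):
    """Map (ip, port) to reachability for every address the domain resolves to."""
    results = {}
    for ip in resolve(domain):
        for port in DC_PORTS:
            results[(ip, port)] = tcp_port_open(ip, port)
    return results
```

Calling `probe_domain("corp.example.com")` (a placeholder domain) returns an empty dictionary when DNS resolution fails, which points at check 1. Entries with a `False` value point at security groups or NACLs.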

---

### Share mount fails: "Network path not found"

**Symptom**: Windows Explorer shows an error when browsing to `\\<NlbDnsName>\<sharename>`

**Check**:
1. Is the share name spelled correctly?
2. Is the share showing as active in the Admin UI Shares page?
3. Wait 2 minutes after share creation — the NLB health check may still be converging.
4. Try `\\<task-ENI-IP>\<sharename>` directly (bypassing NLB) to isolate whether the issue is NLB or Samba.

---

### Share mount fails: "Access denied"

**Symptom**: Windows shows access denied when opening the share

**Check**:
1. Is the user's AD account in the share's permission list? Check the Shares page → share → Permissions.
2. Is the username in the expected format? Try both the down-level form `DOMAIN\username` and the UPN form `username@domain.com`.
3. Check CloudWatch logs for `ACL` or `AUTH` entries at the time of the access attempt.
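
To illustrate the two login formats from check 2, the sketch below derives both forms from whatever the user typed. The function name `login_name_variants` and the example domain names are hypothetical. It also assumes the UPN suffix equals the AD DNS domain, which is common but not guaranteed in every directory.

```python
def login_name_variants(username, netbios_domain, dns_domain):
    """Return the down-level (DOMAIN\\user) and UPN (user@domain) forms of a login."""
    # Strip any domain qualifier the user already typed.
    bare = username.split("\\")[-1].split("@")[0]
    return [netbios_domain + "\\" + bare, bare + "@" + dns_domain]


if __name__ == "__main__":
    # Example values only; substitute your own NetBIOS and DNS domain names.
    for name in login_name_variants("alice@corp.example.com", "CORP", "corp.example.com"):
        print(name)
```

If one form works and the other does not, include both attempts and their results when contacting support.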

---

### Storage: Disconnected shown in Admin UI

**Symptom**: Dashboard shows "Storage: Disconnected" for the cluster

**Cause**: This usually means one or more enabled shares have a missing EFS path. A common cause is leftover test shares whose EFS access points were removed.

**Fix**:
1. Go to **Shares** in Admin UI
2. Look for shares with a red/error indicator
3. Delete those shares
4. If the issue persists, check CloudWatch logs for `MATERIALIZE_ERROR` entries

---

### Cluster page shows a node as yellow/unhealthy

**Symptom**: One or more nodes show a yellow health state in the Cluster page

**What it means**: The node has temporarily stopped serving SMB traffic while it resolves a health check failure (CTDB recovery, identity check, etc.). The node will recover automatically when the underlying condition resolves.

**If a node is yellow for more than 5 minutes**:
1. Check CloudWatch logs for that task's log stream — look for `FENCE_REASON` or `probe_` log entries
2. Check AD/winbind health: look for `IDENTITY_NOT_READY` in logs
3. If the node is stuck, ECS will eventually replace it
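
When a task's log stream has been saved to a text file, the markers from steps 1 and 2 can be pulled out with a few lines of Python. This is a minimal sketch using plain substring matching. The function name `find_marker_lines` and the sample lines are hypothetical, and the real log line format may differ.

```python
# Markers named in the troubleshooting steps for a yellow node.
MARKERS = ("FENCE_REASON", "probe_", "IDENTITY_NOT_READY")


def find_marker_lines(lines, markers=MARKERS):
    """Return {marker: [matching lines]} using plain substring matching."""
    hits = {marker: [] for marker in markers}
    for line in lines:
        text = line.rstrip("\n")
        for marker in markers:
            if marker in text:
                hits[marker].append(text)
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; the real log format may differ.
    sample = [
        "2026-04-15T10:00:01Z INFO probe_ctdb ok",
        "2026-04-15T10:00:05Z WARN FENCE_REASON=identity",
        "2026-04-15T10:00:06Z WARN IDENTITY_NOT_READY winbind unavailable",
    ]
    for marker, matched in find_marker_lines(sample).items():
        print(marker, len(matched))
```

The function accepts any iterable of lines, so an open file object can be passed directly.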

---

### Cluster not forming (all nodes stuck at startup)

**Symptom**: All nodes appear to be starting but the Cluster page shows no healthy nodes

**Check**:
1. EFS mount: can nodes mount EFS? Check CloudWatch logs for EFS mount errors.
2. AD join: if the domain join wasn't completed via FTS, winbindd will not be healthy and nodes will not become eligible.
3. DynamoDB: is the reclock DynamoDB table accessible? Check IAM role permissions.
4. Check the `fence-agent` logs in CloudWatch for `PREFLIGHT_FAIL` or `INELIGIBLE` messages.

---

## Useful CloudWatch Log Queries

Find eligibility state changes (Logs Insights):
```
fields @timestamp, @message
| filter @message like "eligibility"
| sort @timestamp desc
| limit 50
```

Find CTDB events:
```
fields @timestamp, @message
| filter @message like "ctdb" or @message like "CTDB"
| sort @timestamp desc
| limit 100
```

Find ACL errors:
```
fields @timestamp, @message
| filter @message like "ACL_" and @message like "fail"
| sort @timestamp desc
| limit 50
```
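
The queries above share one shape, so they can also be generated and run from a script. The sketch below builds the query text and a time window using only the standard library. The function names are hypothetical and nothing here contacts AWS.

```python
from datetime import datetime, timedelta, timezone


def build_insights_query(pattern, limit=50):
    """Build a Logs Insights query in the same shape as the examples above."""
    if '"' in pattern:
        raise ValueError("pattern must not contain double quotes")
    return (
        "fields @timestamp, @message\n"
        f'| filter @message like "{pattern}"\n'
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )


def start_query_params(log_group, pattern, hours=1, limit=50, now=None):
    """Return query parameters covering the last `hours` hours (epoch seconds)."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),
        "endTime": int(end.timestamp()),
        "queryString": build_insights_query(pattern, limit),
    }


if __name__ == "__main__":
    # "/rrsmb/prod" is the example log group name used earlier in this guide.
    params = start_query_params("/rrsmb/prod", "eligibility", hours=6)
    print(params["queryString"])
```

The dictionary keys match the keyword arguments of the CloudWatch Logs `start_query` call in boto3, so with boto3 installed and credentials configured it could be passed as `boto3.client("logs").start_query(**params)`. That step is not exercised here.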

---

## Escalating to Support

When emailing support@roadrunnersmb.com, include:

1. **Support Report** (generated from Admin UI)
2. **CloudFormation stack name** and region
3. **Description** of the issue: what you did, what you expected, what happened
4. **Time** of the first occurrence (UTC preferred)
5. Any **error messages** shown in the Admin UI or on the Windows client
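
For item 4, a local timestamp can be converted to UTC with Python's standard `datetime` module. This is a small sketch: the function name `to_utc_iso` is hypothetical, and the input must carry an explicit UTC offset so the conversion is unambiguous.

```python
from datetime import datetime, timezone


def to_utc_iso(local_timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to a UTC string."""
    parsed = datetime.fromisoformat(local_timestamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must include a UTC offset, e.g. -07:00")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # 09:30 at UTC-07:00 is 16:30 UTC on the same day.
    print(to_utc_iso("2026-04-15T09:30:00-07:00"))  # 2026-04-15T16:30:00Z
```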

The more context you provide, the faster we can diagnose. The Support Report contains most of what we need.
