How Scanning Works
A detailed look at the Brokenly crawl pipeline — from sitemap fetch to link status assignment.
Understanding the crawl pipeline helps you get the most out of Brokenly and explains why a given link has the status it has.
The Crawl Pipeline
1. Sitemap Fetch
Brokenly fetches your sitemap XML. If the URL is a sitemap index (a file pointing to multiple sitemaps), we fetch each sub-sitemap and merge the page list.
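The sitemap-index handling described above can be sketched roughly as follows. This is a minimal illustration, not Brokenly's actual implementation: the function name `collect_page_urls` and the injected `fetch` callable are hypothetical, and the sketch expands only one level of sub-sitemaps.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def _locs(xml_text: str):
    """Parse sitemap XML and return (root tag, list of <loc> values)."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return root.tag, locs


def collect_page_urls(sitemap_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Return page URLs, expanding a sitemap index into its sub-sitemaps.

    `fetch` is a hypothetical callable that downloads a URL and returns
    the response body as text.
    """
    tag, locs = _locs(fetch(sitemap_url))
    if tag == SITEMAP_NS + "sitemapindex":
        pages: List[str] = []
        for sub_url in locs:
            _, sub_pages = _locs(fetch(sub_url))
            pages.extend(sub_pages)
    else:
        pages = locs
    # Merge: drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(pages))
```

Passing `fetch` in as a parameter keeps the parsing logic separate from the network layer, so the merge behaviour can be exercised with canned XML.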
2. Page Crawl
Each page in the sitemap is visited and its HTML is downloaded. Brokenly tracks how many pages it has crawled so far — you'll see this in the live crawl progress banner.
3. Link Extraction
Every outbound <a> tag on the page is extracted. Brokenly classifies a link as an affiliate link based on known affiliate network URL patterns — Amazon Associates, ShareASale, CJ, Impact, and others — plus common affiliate redirect domains.
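As a rough illustration of extraction and classification, the sketch below collects anchor tags with Python's standard `html.parser` and matches them against a small pattern list. The pattern list is an assumption made for this example (a handful of well-known affiliate hosts plus Amazon's `tag` query parameter); the patterns Brokenly actually uses are not documented here, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import parse_qs, urljoin, urlparse

# Illustrative only: a few known affiliate hosts, not Brokenly's real list.
AFFILIATE_HOSTS = ("amzn.to", "shareasale.com", "anrdoezrs.net", "jdoqocy.com")


class AnchorCollector(HTMLParser):
    """Collects the absolute href of every <a> tag in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def _host_matches(host: str, domain: str) -> bool:
    return host == domain or host.endswith("." + domain)


def is_affiliate(url: str) -> bool:
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(_host_matches(host, d) for d in AFFILIATE_HOSTS):
        return True
    # Amazon Associates links carry the associate ID in a `tag` parameter.
    return _host_matches(host, "amazon.com") and "tag" in parse_qs(parsed.query)


def extract_affiliate_links(page_url: str, html: str) -> List[str]:
    collector = AnchorCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    # Outbound means the link points at a different host than the page.
    outbound = [u for u in collector.links if urlparse(u).hostname != page_host]
    return [u for u in outbound if is_affiliate(u)]
```

Because only the static HTML is parsed, links injected later by JavaScript would not be seen by a sketch like this one.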
4. Link Verification
Each identified affiliate link is checked with an HTTP request. Brokenly follows redirects, records the final destination URL and the HTTP status code, and classifies the result:
- 2xx → Healthy
- 4xx/5xx → Broken
- 403 specifically → Blocked
- Redirects ending at the merchant's homepage → Redirects to Homepage
- Redirects ending at the merchant's search page → Redirects to Search
- Timeouts, rate limits, or unexpected errors → Could Not Verify
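The mapping above can be written as a small decision function. The sketch below is one possible reading, not Brokenly's code: the name `classify_link` is hypothetical, and the homepage and search heuristics (an empty path, or a `search` path segment, after a redirect) are assumptions chosen for illustration. Note that 403 has to be tested before the general 4xx/5xx rule, and that HTTP 429 is treated here as a rate limit.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_link(status: Optional[int], original_url: str, final_url: str) -> str:
    """Map the outcome of a followed request to a link status.

    `status` is the final HTTP status code, or None if the request
    timed out or failed before a response arrived.
    """
    if status is None or status == 429:  # timeout, error, or rate limit
        return "Could Not Verify"
    if status == 403:  # checked before the general 4xx rule
        return "Blocked"
    if 400 <= status < 600:
        return "Broken"
    if 200 <= status < 300:
        redirected = final_url != original_url
        path = urlparse(final_url).path
        segments = [s for s in path.split("/") if s]
        # Assumed heuristic: a redirect that lands on the site root.
        if redirected and not segments:
            return "Redirects to Homepage"
        # Assumed heuristic: a redirect that lands on a /search path.
        if redirected and "search" in segments:
            return "Redirects to Search"
        return "Healthy"
    return "Could Not Verify"  # anything unexpected, e.g. an unfollowed 3xx
```

For example, `classify_link(200, "https://go.example/x", "https://shop.example/")` returns `"Redirects to Homepage"`, while the same call with a final URL of `https://shop.example/product/1` returns `"Healthy"`.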
5. Amazon Availability Check
For links Brokenly identifies as Amazon products, an extra availability check runs to detect Out of Stock and Unavailable products, which Amazon still serves with a 200 OK response. See Amazon Health Check.
Crawl Duration
Crawl time depends on:
- Number of pages in your sitemap
- Number of affiliate links per page
- Response times of the affiliate networks being checked
Most sites finish a full crawl in a few minutes. You'll see live progress — pages crawled, links found, links checked — while it runs.
Plan Limits
Your plan sets a maximum number of links that can be checked per cycle (500 on Starter through to 50,000+ on Agency). If a crawl finds more links than your remaining quota allows, Brokenly checks as many as it can and carries the rest over to the next crawl so nothing is missed permanently.
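The carry-over behaviour can be pictured with a short sketch. This is an assumption-laden illustration rather than Brokenly's scheduler: the function name `plan_cycle` is hypothetical, and checking previously deferred links first is a design choice made for this example, not documented behaviour.

```python
from typing import List, Tuple


def plan_cycle(
    found_links: List[str],
    carried_over: List[str],
    remaining_quota: int,
) -> Tuple[List[str], List[str]]:
    """Split links into those checked this cycle and those deferred.

    Links deferred from an earlier crawl are queued first (an assumption),
    so the same links are not postponed cycle after cycle.
    """
    # Merge both lists, dropping duplicates while keeping order.
    queue = list(dict.fromkeys(carried_over + found_links))
    limit = max(0, remaining_quota)
    return queue[:limit], queue[limit:]


# Example: 2 checks left in the quota, 1 link deferred from the last crawl.
to_check, deferred = plan_cycle(["a", "b", "c"], ["z"], 2)
```

Here `to_check` is `["z", "a"]` and `deferred` is `["b", "c"]`; the deferred list would be passed back in as `carried_over` on the next crawl.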
What Brokenly Does Not Index
- Pages not listed in your sitemap
- Links inside iframes
- Links injected dynamically by JavaScript after page load
- Password-protected pages
If you need Brokenly to check pages that aren't in your sitemap, contact us.