Introduction
When building web crawlers, competitive analysis, SEO audits, or AI agents, one of the first critical tasks is finding all the URLs on a website.
While traditional methods like Google search tricks, sitemap exploration, and SEO tools work, there's a faster, modern way: the Olostep Maps API.
In this guide, we’ll:
- Introduce the challenge of URL discovery
- Show how to build a live Streamlit app to scrape all URLs
- Compare it with traditional techniques (like sitemap.xml and robots.txt)
- Provide complete runnable Python code
Target Audience: Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.
Why Extract All URLs?
Finding every page on a website can help you:
- Analyze site structure (for SEO)
- Scrape website content efficiently
- Find hidden gems like orphan pages
- Monitor website changes
- Prepare data for AI agents and automation
Traditional Methods (Before Olostep)
1. Sitemaps (XML Files)
Webmasters often publish XML sitemaps to help search engines index their sites. Here's an example:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
To find a sitemap, start with the standard location:
https://example.com/sitemap.xml
Other possible sitemap locations:
- /sitemap.xml.gz
- /sitemap_index.xml
- /sitemap.php
You can also Google:
site:example.com filetype:xml
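Checking and parsing a sitemap by hand takes only a few lines of Python. Here's a minimal sketch using requests and the standard-library XML parser; it assumes an uncompressed sitemap at /sitemap.xml, and the sitemap_urls helper name is just for illustration:

import requests
import xml.etree.ElementTree as ET

def sitemap_urls(base_url):
    # Fetch the standard sitemap location and collect every <loc> entry.
    resp = requests.get(f"{base_url.rstrip('/')}/sitemap.xml", timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Sitemap files use the sitemaps.org namespace on every element.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

print(sitemap_urls("https://example.com"))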
Problems:
- Some websites don’t maintain updated sitemaps.
- Not all pages may be listed.
- JavaScript-heavy dynamic sites often leave many of their pages out of the sitemap entirely.
2. Robots.txt
Example:
User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
This is useful for spotting disallowed paths and sitemap links, but again it isn't comprehensive.
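Since robots.txt is plain text, pulling the sitemap references out of it is trivial. Here's a minimal sketch (the sitemaps_from_robots helper is just an illustration):

import requests

def sitemaps_from_robots(base_url):
    # robots.txt is plain text; "Sitemap:" lines can appear anywhere in it.
    resp = requests.get(f"{base_url.rstrip('/')}/robots.txt", timeout=10)
    resp.raise_for_status()
    return [
        line.split(":", 1)[1].strip()
        for line in resp.text.splitlines()
        if line.lower().startswith("sitemap:")
    ]

print(sitemaps_from_robots("https://example.com"))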
The Modern Solution: Olostep Maps API
✅ Find up to 100,000 URLs in seconds.
✅ No need to manually find sitemap or robots.txt.
✅ Simple API call.
✅ No server maintenance or IP bans.
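Here's what the call looks like with plain requests, stripped of any UI. The endpoint and payload match the Streamlit app below; the response shape (a JSON object with a urls list) is an assumption mirrored from that app, so double-check it against the Olostep docs:

import requests

API_KEY = "YOUR_OLOSTEP_API_KEY"  # placeholder: use your real key

response = requests.post(
    "https://api.olostep.com/v1/maps",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"url": "https://example.com"},
)
response.raise_for_status()
urls = response.json().get("urls", [])  # assumed field, mirrored from the app below
print(f"Found {len(urls)} URLs")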
👉 Full code Gist
Let’s build a full Streamlit app to demo this!
🛠️ Full Project: Website URL Extractor with Olostep Maps API + Streamlit
1. Install Requirements
pip install streamlit requests
2. Python Code
import streamlit as st
import requests

def fetch_urls(target_url, api_key):
    # Call the Olostep Maps endpoint and return the parsed JSON response.
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {"url": target_url}
    response = requests.post("https://api.olostep.com/v1/maps", headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None

st.title("🔎 Website URL Scraper")
st.markdown("Use Olostep Maps API to instantly extract all discovered URLs from any website. Great for SEO, scraping, site analysis, and more!")

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            # Render each URL as a numbered, clickable link.
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. [{u}]({u})")
            # Offer the full list as a plain-text download.
            st.download_button(
                "📄 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain"
            )
    else:
        st.warning("Please enter both an API key and a website URL.")
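3. Run the App
Save the code as app.py (any filename works) and launch it with Streamlit's CLI:
streamlit run app.py
Streamlit will open the app in your browser, by default at http://localhost:8501.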
📸 Example Output
✅ Found 35 URLs from https://docs.olostep.com
📥 Saved as discovered_urls.txt
⚡ Why Olostep Maps API Beats Traditional Methods
| Feature | Sitemap/Robots.txt | SEO Spider | Olostep Maps |
|---|---|---|---|
| Instant Response | ❌ | ❌ | ✅ |
| Handles JS-heavy Sites | ❌ | ⚠️ Partial | ✅ |
| Handles Big Sites | ❌ | ❌ Limited | ✅ |
| No Setup Needed | ❌ | ❌ | ✅ |
| Easy Pagination | ❌ | ❌ | ✅ |
📈 Conclusion
With the Olostep Maps API and a few lines of Streamlit code, you can build powerful website discovery tools in minutes.
No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.
✅ Super fast
✅ Reliable
✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.
🚀 Ready to try?
Register at 👉 Olostep.com and start building your own data pipelines today!
Written by:
Mohammad Ehsan Ansari
Growth Engineer @ Olostep
Happy scraping! 🚀