Monday, April 28, 2025

How to Find and Extract All URLs from a Website Using Olostep Maps API and Streamlit

Introduction

When building web crawlers, competitive analysis, SEO audits, or AI agents, one of the first critical tasks is finding all the URLs on a website.

While traditional methods like Google search tricks, sitemap exploration, and SEO tools work, there’s a faster, modern way: using Olostep Maps API.

In this guide, we’ll:

  • Introduce the challenge of URL discovery
  • Show how to build a live Streamlit app to scrape all URLs
  • Compare it with traditional techniques (like sitemap.xml and robots.txt)
  • Provide complete runnable Python code

Target audience: developers, growth engineers, data scientists, SEO specialists, and founders who need structured, scalable scraping.

Why Extract All URLs?

Finding every page on a website can help you:

  • Analyze site structure (for SEO)
  • Scrape website content efficiently
  • Find hidden gems like orphan pages
  • Monitor website changes
  • Prepare data for AI agents and automation

Traditional Methods (Before Olostep)

1. Sitemaps (XML Files)

Webmasters often create XML sitemaps to help Google index their sites. Here’s an example:

<urlset>
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
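
Once you have a sitemap URL, extracting its page URLs takes only a short script. Here's a minimal sketch using requests and Python's standard-library XML parser; it assumes the site actually publishes a sitemap at the address you pass in:

import requests
import xml.etree.ElementTree as ET

def sitemap_urls(sitemap_url):
    # Fetch the sitemap and pull the text of every <loc> element.
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Match <loc> whether or not the sitemap declares the usual XML namespace.
    return [el.text.strip() for el in root.iter() if el.tag.endswith("loc") and el.text]

print(sitemap_urls("https://example.com/sitemap.xml"))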

To find a site's sitemap, start with the default location: https://example.com/sitemap.xml.

Other possible sitemap locations (a quick probe script follows this list):

  • /sitemap.xml.gz
  • /sitemap_index.xml
  • /sitemap.php
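
If you'd rather not check these paths by hand, a short script can probe them. This is a minimal sketch; it uses HEAD requests, which some servers reject, so treat a miss as inconclusive rather than definitive:

import requests

# Common sitemap paths, per the list above.
CANDIDATES = ["/sitemap.xml", "/sitemap.xml.gz", "/sitemap_index.xml", "/sitemap.php"]

def find_sitemaps(base_url):
    # HEAD each candidate path and keep the ones that answer 200 OK.
    found = []
    for path in CANDIDATES:
        try:
            resp = requests.head(base_url.rstrip("/") + path, timeout=5, allow_redirects=True)
            if resp.status_code == 200:
                found.append(resp.url)
        except requests.RequestException:
            pass  # unreachable host, TLS error, etc.
    return found

print(find_sitemaps("https://example.com"))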

You can also Google:

site:example.com filetype:xml

Problems:

  • Some websites don’t maintain updated sitemaps.
  • Not all pages may be listed.
  • JavaScript-heavy, dynamic sites often omit many pages from their sitemaps.

2. Robots.txt

Example:

User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin

robots.txt is useful for finding disallowed paths and sitemap links, but again it's not comprehensive.
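
Extracting those sitemap links programmatically takes only a few lines. A minimal sketch, assuming the site serves a robots.txt at the standard path:

import requests

def sitemaps_from_robots(base_url):
    # robots.txt lists sitemaps on lines of the form "Sitemap: <url>".
    resp = requests.get(base_url.rstrip("/") + "/robots.txt", timeout=10)
    resp.raise_for_status()
    return [line.split(":", 1)[1].strip()
            for line in resp.text.splitlines()
            if line.lower().startswith("sitemap:")]

print(sitemaps_from_robots("https://example.com"))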

The Modern Solution: Olostep Maps API

✅ Find up to 100,000 URLs in seconds.

✅ No need to manually find sitemap or robots.txt.

✅ Simple API call.

✅ No server maintenance or IP bans.
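
Before wiring it into an app, here is the API call on its own. The endpoint, payload, and response shape below mirror the Streamlit app later in this post (a JSON object carrying a urls list); check Olostep's docs for the authoritative reference:

import requests

API_KEY = "YOUR_OLOSTEP_API_KEY"  # replace with your own key

resp = requests.post(
    "https://api.olostep.com/v1/maps",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"url": "https://docs.olostep.com"},
)
resp.raise_for_status()
print(resp.json().get("urls", []))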

👉 Full code Gist

Let’s build a full Streamlit app to demo this!

🛠️ Full Project: Website URL Extractor with Olostep Maps API + Streamlit

1. Install Requirements

pip install streamlit requests

2. Python Code

import streamlit as st
import requests

def fetch_urls(target_url, api_key):
    # Call the Olostep Maps endpoint and return the parsed JSON response.
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {"url": target_url}
    response = requests.post("https://api.olostep.com/v1/maps", headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None

st.title("🔎 Website URL Scraper")

st.markdown("Use Olostep Maps API to instantly extract all discovered URLs from any website. Great for SEO, scraping, site analysis, and more!")

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            # Render each discovered URL as a numbered, clickable link.
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. [{u}]({u})")

            # Offer the full list as a plain-text download.
            st.download_button(
                "📄 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain"
            )
    else:
        st.warning("Please enter both an API key and a website URL.")
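Save the script as app.py (any filename works) and launch it with Streamlit:

streamlit run app.py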

📸 Example Output

✅ Found 35 URLs from https://docs.olostep.com

📥 Saved as discovered_urls.txt

⚡ Why Olostep Maps API Beats Traditional Methods

| Feature                 | Sitemap/Robots.txt | SEO Spider   | Olostep Maps |
|-------------------------|--------------------|--------------|--------------|
| Instant Response        | ❌                 | ❌           | ✅           |
| Handles JS-heavy Sites  | ❌                 | ⚠️ (Partial) | ✅           |
| Handles Big Sites       | ❌                 | ❌ (Limit)   | ✅           |
| No Setup Needed         | ✅                 | ❌           | ✅           |
| Easy Pagination         | ❌                 | ❌           | ✅           |

📈 Conclusion

Using Olostep Maps API + a few lines of Streamlit code, you can build powerful website discovery tools in minutes.

No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.

✅ Super fast

✅ Reliable

✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.

🚀 Ready to try?

Register at 👉 Olostep.com and start building your own data pipelines today!


Written by:

Mohammad Ehsan Ansari

Growth Engineer @ Olostep

Happy scraping! 🚀
