# Obsidian URL Name Extractor
**Forked from**: [obsidian-url-namer](https://github.com/zfei/obsidian-url-namer) by [zfei](https://github.com/zfei)
This is a plugin for Obsidian (https://obsidian.md) that retrieves HTML titles to name raw URL links.
## What's New in This Fork
- **Improved URL detection**: Liberal regex pattern that correctly handles all valid URLs including domains with any valid TLD, DOI links, and academic article URLs
- **Progressive complexity**: Smart request strategy that tries simple approaches first, only adding complexity when needed to avoid triggering anti-bot systems
- **Configurable URL regex**: Customize the URL matching pattern in plugin settings
- **Site-specific title extraction**: Define custom title regex patterns for websites with non-standard HTML
- **Multiple fallback methods**: Archive.org and Microlink API support for bot-protected sites
- **Better error handling**: Clear error messages and graceful degradation
## Usage
Select the text that contains the URLs to be named, execute the command *Name the URL links in the selected text*.
It's recommended to name few URLs at a time. In the case when the URL requests are taking some time, please **DO NOT** change the text selection or the content itself, before the command is done. Otherwise, the eventual result will be out of order.
Easier with the command binded to a keyboard shortcut.

## Settings
### URL Regex Pattern
Customize the regular expression used to detect URLs in your text. The default pattern is:
```regex
https?:\/\/[^\s\]\)]+
```
This pattern is compatible with all devices including iOS. It matches any `http://` or `https://` URL and automatically filters out URLs already in markdown links `[text](url)`. The plugin validates URLs using the native URL constructor, so invalid URLs are safely ignored.
### Site-Specific Title Patterns
For websites that don't use standard `
` tags or use lazy-loaded content, you can define custom patterns. Each pattern consists of:
- **URL Match**: A string to identify the website (e.g., `example.com`)
- **Title Regex**: A regex pattern to extract the title (e.g., `` tags.
## Known Limitations
### Bot Protection
Some websites use Cloudflare, AWS WAF, or other bot protection systems. The plugin handles these cases through its progressive approach:
1. Attempts simple request (avoids triggering protection)
2. Falls back to browser emulation if needed
3. Uses external services if bot protection is detected
**Common protection systems:**
- Cloudflare challenge pages
- AWS WAF cookie challenges
- JavaScript-based bot detection
**When you'll see errors:**
If bot protection is detected and no fallback methods are enabled:
```
Error: Bot protection detected. Enable a fallback method in settings.
```
**Recommended approach:**
Enable Microlink fallback in settings for the most reliable experience with protected sites.
### Archive.org Fallback
When enabled in settings, the plugin will automatically attempt to fetch titles from Archive.org's Wayback Machine if a site blocks direct access. This:
- Works for sites with Cloudflare or other bot protection
- May use slightly outdated content (shows date of archived snapshot)
- Adds a small delay while checking for archived versions
- Won't work for very recent URLs that haven't been archived yet
### Microlink Fallback
When enabled in settings, the plugin uses [Microlink API](https://microlink.io) to fetch titles from protected sites. This is the **recommended fallback** for protected sites.
**Features:**
- More reliable than Archive.org for recent content
- Works with most protected sites
- No account required for free tier
**Limitations:**
- ⚠️ **Free tier: 50 requests/day** — when exhausted, falls back to Archive.org if enabled
- URLs are sent to a third-party service (Microlink)
- Optional API key field for users with paid plans
### How Title Fetching Works
The plugin uses a progressive complexity approach to maximize compatibility:
1. **Simple request** — Clean HTTP request with minimal headers
- Works for the majority of websites
- Avoids triggering anti-bot systems
- Faster response times
- Obsidian's requestUrl automatically handles HTTP redirects (301, 302, etc.)
2. **Complex browser emulation** — Full browser-like headers if simple request fails
- User-Agent, Accept, Referer, and other browser headers
- Used automatically when simple approach doesn't work
3. **External fallback services** — When bot protection is detected:
- **Microlink API** — Headless browser service (requires sending URLs to third-party)
- **Archive.org** — Wayback Machine archived snapshots
Both fallback methods are **disabled by default**. Enable them in settings if you frequently encounter protected sites.
### HTML Entity Decoding
Page titles are automatically decoded for common HTML entities:
- Basic entities: `&`, `<`, `>`, `"`, `'`, ` `
- Typographic quotes: `’`, `‘`, `”`, `“`
- Dashes: `–`, `—`
- Special characters: `…`, `•`
- Numeric entities: `{` (decimal) and `«` (hexadecimal)
### Fallback Priority
When both Archive.org and Microlink fallbacks are enabled, you can choose the priority order:
- **Microlink → Archive.org** (recommended): More reliable for recent content
- **Archive.org → Microlink** (privacy-focused): Tries non-profit Archive.org first
## Troubleshooting
### URLs aren't being detected
Check your URL regex pattern in settings. The default pattern requires `http://` or `https://` prefix:
```regex
https?:\/\/[^\s\]\)]+
```
The plugin automatically skips URLs already in markdown links `[text](url)`.
### Microlink rate limit errors
The free tier of Microlink allows 50 requests per day. When this limit is reached:
1. Error message: "Microlink daily limit reached (50/day)"
2. Plugin automatically tries Archive.org fallback if enabled
3. Consider adding an API key in settings for higher limits
### Site-specific issues
For websites with non-standard HTML or lazy-loaded titles, configure site-specific patterns in settings. See the Site-Specific Title Patterns section above.
## Compilation
- Clone this repo.
- `npm i` or `yarn` to install dependencies
- `npm run build` to compile, or `npm run dev` to start compilation in watch mode.
## Installation
- After compiled, rename the `dist` directory to `obsidian-url-name-extractor` and move it into the vault's plugin directory `VaultFolder/.obsidian/plugins/`.
## Credits
Original plugin created by [zfei](https://github.com/zfei). This fork adds configurable settings, multiple fallback methods for bot-protected sites, and improved URL handling.