AZLyrics Scraper & Translator avatar

AZLyrics Scraper & Translator

Pricing

from $1.50 / 1,000 song scrapeds

Go to Apify Store
AZLyrics Scraper & Translator

AZLyrics Scraper & Translator

Scrape song lyrics from AZLyrics in bulk. Search by artist, song title, or keywords. Extract full artist catalogs with metadata. Translate lyrics to any language with AI. The cheapest per-song lyrics scraper on Apify — built for NLP researchers, dataset builders, and music data projects.

Pricing

from $1.50 / 1,000 song scrapeds

Rating

0.0

(0)

Developer

Marielise

Marielise

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

11 days ago

Last modified

Share

AZLyrics Bulk Lyrics Scraper

Scrape song lyrics in bulk from AZLyrics.com at the cheapest per-song rate on the Apify platform. At $0.0015 per song, scraping 10,000 songs costs just $15. Built for NLP researchers, computational linguists, and dataset builders who need large lyrics corpora without breaking the bank. Translate lyrics to any language with built-in AI translation.

AZLyrics hosts one of the largest collections of song lyrics on the web, covering hundreds of thousands of artists across every genre and language. This Actor extracts clean, plain-text lyrics along with structured metadata (artist, album, title) ready for direct use in machine learning pipelines, text analysis, and research datasets.

Features

  • Bulk-optimized scraping -- scrape entire artist discographies or target individual songs
  • Clean text output -- lyrics extracted as clean plain text, free of HTML artifacts, ads, and formatting noise
  • AI-powered translation -- translate lyrics to any language via OpenAI-compatible APIs, with built-in service or bring your own key (BYOK) for lower cost
  • Structured metadata -- song title, artist name, album name, and source URL for every record
  • Artist catalog extraction -- full discography listings with album groupings and related artists
  • Multi-strategy search -- provide artist + song fields, free-text queries, or direct AZLyrics URLs
  • Google fallback -- when direct URL guesses fail, the Actor falls back to site:azlyrics.com Google search
  • Anti-bot resilience -- PlaywrightCrawler with Firefox, fingerprint generation, session rotation, and residential proxy support to handle Cloudflare and reCAPTCHA challenges

How to Use

1. Scrape a specific song

Provide the artist name and song title for the most reliable direct lookup.

{
"artistName": "Radiohead",
"songTitle": "Creep",
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

2. Scrape all songs by an artist

Set only the artist name. The Actor locates the artist page, extracts the full catalog, then scrapes lyrics for every song (up to maxResults).

{
"artistName": "Radiohead",
"maxResults": 0,
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

Use a natural query. The Actor tries multiple URL patterns and falls back to Google if needed.

{
"searchQuery": "Pomme Soleil Soleil",
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

4. Direct AZLyrics URLs

Provide one or more AZLyrics URLs. The Actor auto-detects whether each URL is a song page, artist page, or letter index.

{
"startUrls": [
{ "url": "https://www.azlyrics.com/lyrics/radiohead/creep.html" },
{ "url": "https://www.azlyrics.com/r/radiohead.html" }
],
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

5. Scrape with AI translation

Add translateTo to translate every song's lyrics. Works out of the box at $0.008/song, or provide your own OpenAI-compatible API key for $0.002/song.

{
"artistName": "Pomme",
"translateTo": "English",
"maxResults": 10,
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

With your own API key (lower cost):

{
"artistName": "Pomme",
"translateTo": "English",
"translateApiKey": "sk-your-openai-key",
"maxResults": 10,
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

You can also use non-OpenAI providers by setting translateBaseUrl and translateModel:

{
"artistName": "Pomme",
"translateTo": "English",
"translateApiKey": "your-api-key",
"translateBaseUrl": "https://generativelanguage.googleapis.com/v1beta/openai/",
"translateModel": "gemini-2.0-flash",
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

Input Parameters

ParameterTypeDefaultDescription
artistNamestring--Artist or band name. Used alone to scrape all songs, or with songTitle for a specific song.
songTitlestring--Song title to search for. Requires artistName.
searchQuerystring--Free-text search query. The Actor guesses artist/song from the text.
startUrlsarray--Direct AZLyrics URLs to scrape. Format: [{ "url": "..." }]
maxResultsinteger50Maximum number of songs to scrape. Set to 0 for unlimited.
translateTostring"English"Target language for AI translation (e.g. "Spanish", "Japanese"). Leave empty to skip.
translateApiKeystring--Your OpenAI-compatible API key (BYOK). Omit to use the built-in service.
translateModelstring"gpt-4o-mini"Chat completions model. Only relevant when using your own API key.
translateBaseUrlstring--API base URL for non-OpenAI providers.
proxyConfigurationobject--Proxy settings. Residential proxies are strongly recommended.

You must provide at least one of artistName, searchQuery, or startUrls.

Output

The Actor produces two types of records in the same dataset. Use the dataset views in the Apify Console to filter them.

Song result

Each song produces one record with lyrics and metadata.

{
"title": "Creep",
"artist": "Radiohead",
"album": "Pablo Honey",
"lyrics": "When you were here before\nCouldn't look you in the eye\nYou're just like an angel\nYour skin makes me cry\n\nYou float like a feather\nIn a beautiful world\nI wish I was special\nYou're so very special\n\nBut I'm a creep\nI'm a weirdo\nWhat the hell am I doing here?\nI don't belong here",
"translatedLyrics": null,
"url": "https://www.azlyrics.com/lyrics/radiohead/creep.html",
"scrapedAt": "2025-02-15T10:30:00.000Z"
}
FieldTypeDescription
titlestringSong title
artiststringArtist or band name
albumstring or nullAlbum name when available on the page
lyricsstringFull lyrics as clean plain text
translatedLyricsstring or nullAI-translated lyrics, or null if translation was not requested
urlstringSource URL on AZLyrics
scrapedAtstringISO 8601 timestamp

Artist catalog result

When an artist page is scraped, the Actor produces a catalog record with the full discography and then enqueues individual song pages.

{
"type": "artist-catalog",
"artist": "Radiohead",
"artistUrl": "https://www.azlyrics.com/r/radiohead.html",
"totalSongs": 162,
"songs": [
{
"title": "You",
"url": "https://www.azlyrics.com/lyrics/radiohead/you.html",
"album": "Pablo Honey"
},
{
"title": "Creep",
"url": "https://www.azlyrics.com/lyrics/radiohead/creep.html",
"album": "Pablo Honey"
}
],
"relatedArtists": ["Thom Yorke", "Atoms for Peace", "Muse"],
"scrapedAt": "2025-02-15T10:25:00.000Z"
}
FieldTypeDescription
typestringAlways "artist-catalog"
artiststringArtist or band name
artistUrlstringArtist page URL
totalSongsnumberTotal songs in catalog
songsarrayList of { title, url, album } entries
relatedArtistsarrayNames of related artists listed on the page
scrapedAtstringISO 8601 timestamp

Dataset views

The Apify Console provides four pre-configured views:

  • Songs -- table of scraped songs with title, artist, album, URL (filtered to song records only)
  • Lyrics -- table with full lyrics text and translations side-by-side
  • Artist Catalogs -- summary table of artist catalogs with song counts and related artists
  • Catalog Songs (expanded) -- the songs array from each catalog unwound into individual rows

Pricing

This Actor uses Pay Per Event (PPE) pricing. You only pay for successfully scraped data. There are no platform fees, no subscriptions, and no minimum spend.

EventPriceDescription
Song scraped$0.0015One song with lyrics and metadata successfully extracted
Artist catalog scraped$0.015One artist's full discography listing extracted
Translation (BYOK)$0.002Lyrics translated using your own OpenAI API key
Translation (integrated)$0.008Lyrics translated using the built-in AI service

Cost examples

ScenarioSongsCatalogsActor costEst. proxy costTotal
Single song lookup10$0.002~$0.01~$0.01
Small artist501$0.09~$0.05~$0.14
Large artist2001$0.32~$0.20~$0.52
10 artists1,00010$1.65~$0.75~$2.40
Bulk corpus10,000100$16.50~$7.50~$24.00
Large corpus50,000500$82.50~$40.00~$122.50
1,000 songs + translation (BYOK)1,00010$3.65~$0.75~$4.40
1,000 songs + translation (integrated)1,00010$9.65~$0.75~$10.40

Proxy costs are estimates based on Apify residential proxy pricing and assume a moderate retry rate. Actual costs depend on blocking rates and geographic targeting.

Proxy Requirements

Residential proxies are required for reliable operation. AZLyrics employs Cloudflare protection and Google reCAPTCHA. Datacenter proxies will be blocked within a few requests.

When configuring the Actor, select Apify Proxy with the RESIDENTIAL group or provide your own residential proxy URLs. Budget approximately $0.50--1.00 in proxy costs per 1,000 songs scraped.

The Actor uses conservative request pacing (3--7 second random delays, maximum 12 requests per minute, 3 concurrent sessions) combined with session rotation, browser fingerprint generation, and automatic CAPTCHA detection with retry. These measures are effective only when paired with residential proxy IPs.

How It Works

  1. URL resolution -- the Actor converts your input (artist name, song title, search query, or direct URLs) into AZLyrics page URLs. For name-based inputs, it constructs direct URLs from the normalized name. If the direct URL returns a 404 or empty page, it falls back to a site:azlyrics.com Google search.

  2. Artist page processing -- when an artist page is found, the Actor extracts the full song catalog organized by album, pushes the catalog record to the dataset, and enqueues individual song URLs for lyrics extraction.

  3. Lyrics extraction -- on each song page, the Actor waits for Cloudflare challenges to clear, then extracts lyrics from the DOM using comment-marker navigation (AZLyrics wraps lyrics between HTML comments rather than using semantic markup). The raw text is cleaned: HTML stripped, whitespace normalized, and line breaks preserved.

  4. Translation -- if translateTo is set, each song's lyrics are sent to an OpenAI-compatible chat completions API for translation. The translated text is stored alongside the original.

  5. Anti-bot handling -- the Actor uses PlaywrightCrawler with real Firefox to pass browser fingerprint checks. Sessions are rotated through a pool of 20, each limited to 15 requests or 5 minutes. CAPTCHA and block responses are detected automatically, causing the session to be discarded and the request retried with a fresh IP.

Use Cases

  • NLP and text analysis -- build lyrics corpora for sentiment analysis, topic modeling, language studies, and text classification
  • Music information retrieval -- structured lyrics data for MIR research pipelines and music recommendation systems
  • Dataset building -- create training datasets for generative models, search indexes, or annotation projects
  • Discography research -- catalog complete artist discographies with album groupings and song counts
  • Translation datasets -- generate parallel lyrics corpora across languages for machine translation research
  • Cultural analysis -- study lyrical trends, vocabulary evolution, and thematic patterns across genres, decades, or regions

Tips for Reliable Large Runs

  • Use maxResults to break large jobs into manageable batches (e.g. 200 songs per run)
  • Run during off-peak hours (UTC late night / early morning) for lower blocking rates
  • Monitor the run log for CAPTCHA retry rates; if retries spike, pause and resume later
  • For very large corpora (10,000+ songs across many artists), use the Apify scheduler to spread runs over multiple days

FAQ

Can I use datacenter proxies? You can try, but expect very high failure rates. AZLyrics actively blocks datacenter IP ranges. Residential proxies are strongly recommended.

How fast is the scraping? With default settings (3--7 second delays, 3 concurrent sessions, max 12 requests/minute), expect roughly 10--15 songs per minute. The Actor prioritizes reliability over speed.

What happens when a CAPTCHA is triggered? The Actor detects the CAPTCHA response, discards the current session (IP + cookies), and retries the request with a fresh session. Most CAPTCHA blocks resolve automatically within 1--2 retries.

Are the lyrics cleaned? Yes. The Actor extracts raw text content from the browser DOM, strips all HTML, trims whitespace, and normalizes line breaks. The output is clean plain text ready for NLP pipelines.

Can I scrape lyrics without artist catalog data? Yes. Provide individual song page URLs via startUrls. The Actor will scrape only those songs without generating catalog records.

What if an artist name does not match? The Actor first constructs a direct URL from the normalized artist name. If that page returns no results (404 or empty song list), it falls back to a Google site:azlyrics.com search to find the correct page.

Can I use a non-OpenAI translation provider? Yes. Set translateBaseUrl to any OpenAI-compatible API endpoint and translateModel to the model you want. For example, use Google Gemini by setting the base URL to https://generativelanguage.googleapis.com/v1beta/openai/ and the model to gemini-2.0-flash.

What is the difference between integrated and BYOK translation? With integrated translation ($0.008/song), the Actor uses a built-in OpenAI API key -- you do not need your own. With BYOK translation ($0.002/song), you provide your own API key and pay your LLM provider directly, so the Actor charges a lower fee.