Scraper – A Powerful Python Script That Allows You To Scrape Messages And Media From Telegram Channels Using The Telethon Library

April 12, 2025


A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.


Features 🚀

Scrape messages from multiple Telegram channels
Download media files (photos, documents)
Real-time continuous scraping
Export data to JSON and CSV formats
SQLite database storage
Resume capability (saves progress)
Media reprocessing for failed downloads
Progress tracking
Interactive menu interface

Prerequisites 📋

Before running the script, you'll need:

Python 3.7 or higher
Telegram account
API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

Go to https://my.telegram.org/auth
Log in with your phone number
Click on “API development tools”
Fill in the form:
App title: Your app name
Short name: Your app short name
Platform: Can be left as “Desktop”
Description: Brief description of your app
Click “Create application”
You'll receive:
api_id: A number
api_hash: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script!
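One way to keep the credentials out of the source code is to read them from environment variables. The sketch below is only an illustration, not how the script itself stores them: the variable names TELEGRAM_API_ID and TELEGRAM_API_HASH are hypothetical, and the validation only checks what is stated above (a number, and a string of letters and numbers).

```python
import os
import re


def load_credentials(env=os.environ):
    """Read and validate Telegram API credentials from a mapping.

    TELEGRAM_API_ID and TELEGRAM_API_HASH are hypothetical variable
    names chosen for this sketch.
    """
    raw_id = env.get("TELEGRAM_API_ID", "")
    api_hash = env.get("TELEGRAM_API_HASH", "")

    # api_id is described as a number.
    if not raw_id.isdigit():
        raise ValueError("api_id must be a number")
    # api_hash is described as a string of letters and numbers.
    if not re.fullmatch(r"[0-9A-Za-z]+", api_hash):
        raise ValueError("api_hash must contain only letters and numbers")

    return int(raw_id), api_hash
```

Calling load_credentials() with no arguments reads from the process environment; passing a plain dict makes the function easy to try out without touching real settings.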

Setup and Running 🔧

Clone the repository:

git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper

Install requirements:

pip install -r requirements.txt

Run the script:

python telegram-scraper.py

On first run, you'll be prompted to enter:
Your API ID
Your API Hash
Your phone number (with country code) or bot; when prompted a second time, use the phone number option
Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

The script will attempt to retrieve the entire channel history, starting from the oldest messages
Initial scraping can take several minutes or even hours, depending on:
The total number of messages in the channel
Whether media downloading is enabled
The size and number of media files
Your internet connection speed
Telegram’s rate limiting
The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
Progress percentage is displayed in real-time to track the scraping status
Messages are stored in the database as they’re scraped, so you can start analyzing available data even before the scraping is complete
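The pagination-with-state idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the state file format, the function names, and the fetch_page callable are all hypothetical. With Telethon, the fetch step would probably be built on the client's message iteration, asking for messages newer than the last saved ID.

```python
import json
from pathlib import Path


def load_last_id(state_path):
    """Return the last saved message ID, or 0 if no state file exists yet."""
    path = Path(state_path)
    if not path.exists():
        return 0
    return json.loads(path.read_text())["last_message_id"]


def save_last_id(state_path, message_id):
    """Persist the ID of the newest message scraped so far."""
    Path(state_path).write_text(json.dumps({"last_message_id": message_id}))


def scrape_in_pages(fetch_page, state_path, page_size=100):
    """Fetch messages oldest-first in pages, checkpointing after each page.

    fetch_page(after_id, limit) is a hypothetical callable that returns a
    list of (message_id, text) tuples with IDs greater than after_id,
    sorted in ascending order.
    """
    last_id = load_last_id(state_path)
    scraped = []
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            break
        scraped.extend(page)
        last_id = page[-1][0]
        # Checkpoint: an interrupted run restarts from this ID.
        save_last_id(state_path, last_id)
    return scraped
```

Because the checkpoint is written after every page, a second call with the same state file only returns messages that arrived after the first call finished.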

Usage 📝

The script provides an interactive menu with the following options:

[A] Add new channel
Enter the channel ID or channelname
[R] Remove channel
Remove a channel from the scraping list
[S] Scrape all channels
One-time scraping of all configured channels
[M] Toggle media scraping
Enable/disable downloading of media files
[C] Continuous scraping
Real-time monitoring of channels for new messages
[E] Export data
Export to JSON and CSV formats
[V] View saved channels
List all saved channels
[L] List account channels
List all channels, with their IDs, for the account
[Q] Quit

Channel IDs 📢

You can use either:
Channel username (e.g., channelname)
Channel ID (e.g., -1001234567890)
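Since both forms arrive as typed text, a script has to tell them apart before looking the channel up. The helper below is a hypothetical sketch of one way to do that (parse_channel is not a function from the script): anything that parses as an integer is treated as a channel ID, everything else as a username.

```python
def parse_channel(value):
    """Return an int for numeric channel IDs, else a username without '@'."""
    text = value.strip()
    try:
        # Channel IDs such as -1001234567890 parse cleanly as integers.
        return int(text)
    except ValueError:
        # Anything else is assumed to be a username; drop a leading '@'.
        return text.lstrip("@")
```

For example, parse_channel("-1001234567890") gives the integer -1001234567890, while parse_channel("@channelname") gives the string "channelname".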

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel:
Location: ./channelname/channelname.db
Table: messages
id: Primary key
message_id: Telegram message ID
date: Message timestamp
sender_id: Sender’s Telegram ID
first_name: Sender’s first name
last_name: Sender’s last name
username: Sender’s username
message: Message text
media_type: Type of media (if any)
media_path: Local path to downloaded media
reply_to: ID of replied message (if any)
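The table layout above could be created with Python's built-in sqlite3 module roughly as shown here. The column names come from the list above; the column types, the UNIQUE constraint on message_id, and the helper names are assumptions made for this sketch and may differ from what the script actually does.

```python
import sqlite3

# Column names follow the documented table; the types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER UNIQUE,
    date TEXT,
    sender_id INTEGER,
    first_name TEXT,
    last_name TEXT,
    username TEXT,
    message TEXT,
    media_type TEXT,
    media_path TEXT,
    reply_to INTEGER
)
"""

INSERT = """
INSERT OR IGNORE INTO messages
    (message_id, date, sender_id, first_name, last_name, username,
     message, media_type, media_path, reply_to)
VALUES
    (:message_id, :date, :sender_id, :first_name, :last_name, :username,
     :message, :media_type, :media_path, :reply_to)
"""


def open_channel_db(path):
    """Open (or create) a channel database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_message(conn, row):
    """Insert one message dict; a repeated message_id is silently skipped."""
    conn.execute(INSERT, row)
    conn.commit()
```

Committing after each insert is what would make partially scraped data readable while a long scrape is still running, at the cost of some write speed.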

Media Storage 📁

Media files are stored in:
Location: ./channelname/media/
Files are named using message ID or original filename

Exported Data 📊

Data can be exported in two formats:

CSV: ./channelname/channelname.csv
Human-readable spreadsheet format
Easy to import into Excel/Google Sheets

JSON: ./channelname/channelname.json
Structured data format
Ideal for programmatic processing
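For readers who want to produce the same kind of export themselves, the following sketch reads a channel database and writes both files using only the standard library. It assumes a messages table with a message_id column as described earlier; export_messages and its parameters are hypothetical names, not the script's own export routine.

```python
import csv
import json
import sqlite3


def export_messages(db_path, csv_path, json_path):
    """Write every row of the messages table to a CSV and a JSON file."""
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM messages ORDER BY message_id")
    columns = [col[0] for col in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]
    conn.close()

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)

    # ensure_ascii=False keeps non-Latin message text readable.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    return len(records)
```

The function returns the number of exported rows, which is handy for a quick sanity check against the database.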

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to:
Monitor channels in real-time
Automatically download new messages
Download media as it’s posted
Run indefinitely until interrupted (Ctrl+C)
Maintain state between runs

Media Handling

The script can download:
Photos
Documents
Other media types supported by Telegram

It automatically retries failed downloads and skips existing files to avoid duplicates.
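Skipping files that already exist comes down to choosing a predictable target path and checking it before downloading. The sketch below shows one possible approach. The naming rule (original filename when available, otherwise the message ID), the .bin fallback extension, and both function names are assumptions for illustration only.

```python
from pathlib import Path


def media_target(media_dir, message_id, original_name=None, default_ext=".bin"):
    """Pick the local path for a media file.

    Assumed rule: use the original filename when there is one, otherwise
    fall back to the message ID plus a placeholder extension.
    """
    if original_name:
        # Keep only the final path component so the file stays in media_dir.
        name = Path(original_name).name
    else:
        name = f"{message_id}{default_ext}"
    return Path(media_dir) / name


def needs_download(path):
    """Return True if the file is missing or empty (e.g. an aborted download)."""
    return not path.exists() or path.stat().st_size == 0
```

Treating a zero-byte file as missing means a download that was cut off before any data arrived gets retried rather than skipped.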

Error Handling 🛠️

The script includes:
Automatic retry mechanism for failed media downloads
State preservation in case of interruption
Flood control compliance
Error logging for failed operations
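A retry mechanism of the kind listed above is often written as a small wrapper with growing delays between attempts. The version below is a generic sketch, not the script's implementation: the function name, the default of three attempts, and the doubling delay are all assumptions. A real scraper would likely also honor any wait time that Telegram's flood control requests.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between tries. The last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts - 1:
                raise
            # Log the failure before waiting and trying again.
            print(f"attempt {attempt + 1} failed: {exc}; retrying")
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so the delay can be replaced in tests; in normal use the default time.sleep is fine.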

Limitations ⚠️

Respects Telegram’s rate limits
Can only access public channels or channels you are a member of
Media download size limits apply as per Telegram’s restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License – see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:
Respect Telegram’s Terms of Service
Obtain necessary permissions before scraping
Use responsibly and ethically
Comply with data protection regulations