Monday, June 9, 2025

Scraper – A Powerful Python Script That Allows You To Scrape Messages And Media From Telegram Channels Using The Telethon Library




A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.


Features 🚀

  • Scrape messages from multiple Telegram channels
  • Download media files (photos, documents)
  • Real-time continuous scraping
  • Export data to JSON and CSV formats
  • SQLite database storage
  • Resume functionality (saves progress)
  • Media reprocessing for failed downloads
  • Progress monitoring
  • Interactive menu interface

Prerequisites 📋

Before running the script, you'll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

  1. Go to https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on “API development tools”
  4. Fill in the form:
    • App title: Your app name
    • Short name: Your app short name
    • Platform: Can be left as “Desktop”
    • Description: Brief description of your app
  5. Click “Create application”
  6. You'll receive:
    • api_id: A number
    • api_hash: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script!
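One way to keep the credentials out of the source tree is to read them from environment variables. The sketch below only illustrates that idea and is not taken from the script, which prompts for the values on first run. The variable names TELEGRAM_API_ID and TELEGRAM_API_HASH and the helper load_credentials are assumptions made for this example.

```python
import os


def load_credentials(env=None):
    """Return (api_id, api_hash) read from an environment-style mapping.

    TELEGRAM_API_ID and TELEGRAM_API_HASH are hypothetical names chosen
    for this example; they are not defined by the script or by Telethon.
    """
    env = os.environ if env is None else env
    raw_id = env.get("TELEGRAM_API_ID", "").strip()
    api_hash = env.get("TELEGRAM_API_HASH", "").strip()
    # api_id is a number, api_hash is a string of letters and numbers.
    if not raw_id.isdigit():
        raise ValueError("TELEGRAM_API_ID must be a number")
    if not api_hash:
        raise ValueError("TELEGRAM_API_HASH is missing")
    return int(raw_id), api_hash
```

Passing a plain dictionary instead of os.environ makes the helper easy to check without touching the real environment.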

Setup and Running 🔧

  1. Clone the repository:
git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper
  2. Install requirements:
pip install -r requirements.txt
  3. Run the script:
python telegram-scraper.py
  4. On first run, you'll be prompted to enter:
    • Your API ID
    • Your API Hash
    • Your phone number (with country code)
    • Your phone number (with country code) or bot, but use the phone number option when prompted the second time
    • Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

  • The script will attempt to retrieve the entire channel history, starting from the oldest messages
  • Initial scraping can take several minutes or even hours, depending on:
    • The total number of messages in the channel
    • Whether media downloading is enabled
    • The size and number of media files
    • Your internet connection speed
    • Telegram’s rate limiting
  • The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
  • Progress percentage is displayed in real time to track the scraping status
  • Messages are stored in the database as they’re scraped, so you can start analyzing available data even before the scraping is complete
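The pagination-and-resume behavior described above can be sketched with nothing but SQLite. This is a minimal illustration under stated assumptions, not the script's actual code: the two-column table, the helper names, and the fetch_page callback are all hypothetical. With Telethon, such a callback could plausibly wrap client.iter_messages(channel, min_id=after_id, reverse=True, limit=limit), but that wiring is left out here.

```python
import sqlite3


def init_db(conn):
    """Create a simplified messages table (assumed layout for this sketch)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id INTEGER UNIQUE, message TEXT)"
    )
    conn.commit()


def last_scraped_id(conn):
    """Return the highest stored message_id, or 0 if nothing is stored."""
    row = conn.execute("SELECT MAX(message_id) FROM messages").fetchone()
    return row[0] or 0


def scrape(conn, fetch_page, page_size=100):
    """Pull pages oldest-first, starting after the last stored message.

    fetch_page(after_id, limit) is a hypothetical callback that returns a
    list of (message_id, text) tuples with message_id > after_id, in
    ascending order, and an empty list when nothing is left.
    """
    after_id = last_scraped_id(conn)
    while True:
        batch = fetch_page(after_id, page_size)
        if not batch:
            break
        conn.executemany(
            "INSERT OR IGNORE INTO messages (message_id, message) VALUES (?, ?)",
            batch,
        )
        conn.commit()  # commit per page so an interruption loses little work
        after_id = batch[-1][0]
```

Because the resume point is derived from the database itself, rerunning scrape after an interruption continues from the last committed page, and the rows already stored can be queried while scraping is still in progress.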

Usage 📝

The script provides an interactive menu with the following options:

  • [A] Add new channel
    • Enter the channel ID or channelname
  • [R] Remove channel
    • Remove a channel from the scraping list
  • [S] Scrape all channels
    • One-time scraping of all configured channels
  • [M] Toggle media scraping
    • Enable/disable downloading of media files
  • [C] Continuous scraping
    • Real-time monitoring of channels for new messages
  • [E] Export data
    • Export to JSON and CSV formats
  • [V] View saved channels
    • List all saved channels
  • [L] List account channels
    • List all channels with IDs for the account
  • [Q] Quit

Channel IDs 📢

You can use either:

  • Channel username (e.g., channelname)
  • Channel ID (e.g., -1001234567890)
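Telling the two forms apart can be as simple as trying to parse the input as an integer. The helper below is a hypothetical sketch of that idea; how the script itself interprets the input is not shown in this document.

```python
def parse_channel(value):
    """Return an int for a numeric channel ID, else the username without '@'.

    parse_channel is a hypothetical helper written for this example.
    """
    text = value.strip()
    try:
        return int(text)  # e.g. "-1001234567890" -> -1001234567890
    except ValueError:
        return text.lstrip("@")  # e.g. "@channelname" -> "channelname"
```

Returning an integer for IDs and a string for usernames keeps the distinction visible to whatever code resolves the channel afterwards.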

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel:

  • Location: ./channelname/channelname.db
  • Table: messages
    • id: Primary key
    • message_id: Telegram message ID
    • date: Message timestamp
    • sender_id: Sender’s Telegram ID
    • first_name: Sender’s first name
    • last_name: Sender’s last name
    • username: Sender’s username
    • message: Message text
    • media_type: Type of media (if any)
    • media_path: Local path to downloaded media
    • reply_to: ID of replied message (if any)
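For readers who want to query the database directly, the table layout can be approximated as follows. The column names come from the list above; the column types and the helper open_channel_db are assumptions for this sketch, and the script's real schema may differ.

```python
import sqlite3

# Column names follow the documented table; the types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER,
    date       TEXT,
    sender_id  INTEGER,
    first_name TEXT,
    last_name  TEXT,
    username   TEXT,
    message    TEXT,
    media_type TEXT,
    media_path TEXT,
    reply_to   INTEGER
)
"""


def open_channel_db(path=":memory:"):
    """Open (or create) a database and make sure the messages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn
```

Pointing open_channel_db at a path such as ./channelname/channelname.db leaves an existing table untouched, because the statement uses IF NOT EXISTS.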

Media Storage 📁

Media files are stored in:

  • Location: ./channelname/media/
  • Files are named using the message ID or the original filename

Exported Data 📊

Data can be exported in two formats:

  1. CSV: ./channelname/channelname.csv
    • Human-readable spreadsheet format
    • Easy to import into Excel/Google Sheets
  2. JSON: ./channelname/channelname.json
    • Structured data format
    • Ideal for programmatic processing
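An export of this kind can be reproduced with the standard library alone. The function below is a hedged sketch, not the script's export code: export_messages is a hypothetical name, and it assumes only that a table called messages with a message_id column exists.

```python
import csv
import json
import sqlite3


def export_messages(conn, csv_path, json_path):
    """Write every row of the messages table to a CSV and a JSON file.

    Returns the number of exported rows. Column names are read from the
    query result, so no particular schema is hard-coded here.
    """
    cur = conn.execute("SELECT * FROM messages ORDER BY message_id")
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur]

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    return len(rows)
```

Writing the JSON with ensure_ascii=False keeps non-Latin message text readable in the output file.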

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to:

  • Monitor channels in real time
  • Automatically download new messages
  • Download media as it’s posted
  • Run indefinitely until interrupted (Ctrl+C)
  • Maintain state between runs

Media Handling

The script can download:

  • Photos
  • Documents
  • Other media types supported by Telegram

It automatically retries failed downloads and skips existing files to avoid duplicates.
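The naming and skip-existing rules can be illustrated with a pair of small helpers. Both are hypothetical and written for this example; in particular, treating an empty file as a failed download is an assumption, not documented behavior.

```python
from pathlib import Path


def media_target(channel, message_id, original_name=None, root="."):
    """Build <root>/<channel>/media/<name>, falling back to the message ID."""
    # Path(...).name drops any directory part of the original filename.
    name = Path(original_name).name if original_name else str(message_id)
    return Path(root) / channel / "media" / name


def needs_download(target):
    """True when the file is absent or empty (assumed to mean a failed try)."""
    return not (target.exists() and target.stat().st_size > 0)
```

A download loop would call needs_download first and skip the transfer when it returns False, which is what avoids duplicate files on a rerun.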

Error Handling 🛠️

The script includes:

  • Automatic retry mechanism for failed media downloads
  • State preservation in case of interruption
  • Flood control compliance
  • Error logging for failed operations
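A retry mechanism of this sort is often written as a small wrapper with exponential backoff. The sketch below shows the general technique and is not the script's implementation; the function name, attempt count, and delays are assumptions. With Telethon, flood control typically surfaces as telethon.errors.FloodWaitError, whose seconds attribute gives the wait Telegram asks for, so a real implementation would likely sleep for that long instead of using a fixed backoff.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    The delay doubles after each failure (base_delay, 2x, 4x, ...). The
    last failure is re-raised so the caller can log it. The sleep
    parameter exists so the wait can be replaced in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Wrapping a single media download in retry keeps the retry policy in one place instead of scattering try/except blocks through the scraping loop.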

Limitations ⚠️

  • Respects Telegram’s rate limits
  • Can only access public channels or channels you're a member of
  • Media download size limits apply as per Telegram’s restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License – see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:

  • Respect Telegram’s Terms of Service
  • Obtain necessary permissions before scraping
  • Use responsibly and ethically
  • Comply with data protection regulations
