MicromOne: Web Scraping: What It Is, When to Use It, and Why to Do It Ethically

One of the most common challenges in working with data is getting access to it. Often, the data we need is not available in a downloadable format (such as CSV or Excel) but is embedded in web pages. This is where web scraping comes into play.

What Is Web Scraping?

Web scraping is a technique that allows you to extract data from websites using code. Web pages are written in HTML (HyperText Markup Language), a language that uses tags to structure content (headings, paragraphs, tables, links, etc.).

Since HTML is essentially text, it can be read and analyzed by programs called parsers, which make it possible to automatically locate and retrieve the desired information.

In practice, instead of manually copying and pasting data from a website, we can write a script that does it for us.
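To make the idea of a parser concrete, here is a minimal sketch that uses `html.parser` from Python's standard library to pull every link out of a small HTML snippet. The `LinkExtractor` class and the `SAMPLE_HTML` string are illustrative names and data invented for this example; real projects often use a third-party parsing library instead.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> tag we are inside, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Invented sample page, standing in for HTML copied from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Reports</h1>
  <p>Download the <a href="/data/first.html">first report</a> or the
     <a href="/data/second.html">second report</a>.</p>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.links)
```

The script does what a person would otherwise do by hand: it reads the page, finds the tags of interest, and copies out their contents.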

How Do You Obtain HTML Data?

HTML data can be collected in two main ways:

  • By manually downloading the HTML source code of a web page

  • By programmatically accessing the website via HTTP requests (for example, using a GET request)
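The second option can be sketched with `urllib.request` from the standard library. This is only an illustration: the `USER_AGENT` value, the helper names `build_request` and `fetch_html`, and the `example.com` URL are assumptions made for the example, and the network call itself is left commented out.

```python
import urllib.request

# Hypothetical identifier; use one that describes your own project.
USER_AGENT = "example-research-script/0.1"


def build_request(url):
    """Create a GET request that identifies the script via User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method="GET"
    )


def fetch_html(url, timeout=10):
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# example.com is a placeholder domain reserved for documentation.
request = build_request("https://example.com/")
print(request.get_method(), request.full_url)

# Uncomment to actually download the page:
# html = fetch_html("https://example.com/")
```

Setting a descriptive User-Agent is a design choice rather than a requirement: it lets the site operator see who is sending the requests.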

Once the HTML is obtained, it can be analyzed and transformed into structured data ready for analysis.
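As an illustration of that last step, the sketch below turns an HTML table into a list of dictionaries and then into CSV text, using only the standard library. The table contents, the `TableParser` class, and the `table_to_records` helper are invented for this example; it assumes a simple table whose first row holds the column names.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None     # cells of the row being read
        self._cell = None    # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample data, standing in for a table found on a web page.
SAMPLE_TABLE = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Alphaville</td><td>1200</td></tr>
  <tr><td>Betatown</td><td>3400</td></tr>
</table>
"""

records = table_to_records(SAMPLE_TABLE)

# Write the records as CSV text (an open file would work the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "population"])
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```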

When Not to Use Web Scraping

It’s important to clarify a crucial point: web scraping is not always allowed.

Many websites impose specific restrictions in their Terms and Conditions, and ignoring them can lead to legal issues. For this reason, it’s essential to do your homework before starting any scraping activity.

Here are some fundamental guidelines to follow:

  • Always check the website’s Terms and Conditions

  • Consider whether your data usage is personal, academic, or commercial

  • Act ethically and responsibly

  • If the website offers a public API, use it instead of scraping

  • Send HTTP requests at a reasonable frequency

  • Avoid large bursts of simultaneous requests, which can overload a server and resemble a denial-of-service (DoS) attack

  • Stay informed about laws and regulations related to web scraping
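One way to keep the request frequency reasonable is to enforce a minimum pause between consecutive requests. The `Throttle` class below is a hypothetical helper written for this example, and the 0.2-second interval is an arbitrary value chosen so the demo finishes quickly; a real script would usually wait longer and follow any limits the website states.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None    # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=0.2)
urls = [
    "https://example.com/page/1",   # placeholder URLs
    "https://example.com/page/2",
    "https://example.com/page/3",
]

start = time.monotonic()
for url in urls:
    throttle.wait()
    # A real script would send its GET request for `url` here.
    print("would fetch", url)
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")
```

Because the requests are sent one at a time with a pause in between, the script never produces a burst of simultaneous traffic.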

Web scraping is not just a technical matter, but also an ethical one. There are excellent articles that explore this topic further, such as “Ethics in Web Scraping” on Towards Data Science.

API vs Web Scraping

Whenever possible, choose an API over scraping.

Why?

  • APIs are more stable: they don’t depend on a website’s layout

  • They are specifically designed to provide data

  • They offer data that is already structured and easy to use

  • They usually cope better with higher request volumes
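To illustrate the point about structured data, here is a small sketch that reads a JSON response of the kind an API might return. The response body and its field names (`items`, `name`, `price`) are invented for this example and do not describe any real service.

```python
import json

# Hypothetical response body from a JSON API; the field names are invented.
API_RESPONSE = (
    '{"items": [{"name": "Widget", "price": 9.5},'
    ' {"name": "Gadget", "price": 12.0}]}'
)

data = json.loads(API_RESPONSE)

# The values arrive already named and typed; no HTML tags to search through.
prices = {item["name"]: item["price"] for item in data["items"]}
print(prices)
```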

Web scraping, on the other hand, is fragile: even a small change in the website’s HTML code (a redesign, a new tag, a different class) can completely break your script.
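A small sketch shows how this can happen. The scraper below looks for a `span` element with the class `price`; both HTML snippets, the class names, and the `scrape_price` helper are invented for this example.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Capture the text of the first <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.price = None
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes and self.price is None:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.price = data.strip()
            self._inside = False


def scrape_price(html):
    """Return the price text, or None if the expected element is missing."""
    scraper = PriceScraper()
    scraper.feed(html)
    return scraper.price


OLD_PAGE = '<div><span class="price">9.50</span></div>'
NEW_PAGE = '<div><span class="product-cost">9.50</span></div>'  # after a redesign

print(scrape_price(OLD_PAGE))   # the class matches, so the value is found
print(scrape_price(NEW_PAGE))   # the class was renamed, so nothing is found
```

The page still shows the same value to a human visitor, but the script silently returns `None` until someone updates it.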

Golden rule: if an official API exists, use it.

Key Terms to Know

For beginners, here are some essential concepts:

  • HTML (HyperText Markup Language): the markup language used to create web pages

  • Parser: a tool that analyzes HTML code to extract information

  • Web Scraping: a technique for extracting data from websites using code


Web scraping is a powerful and highly useful tool for anyone working with data, but it must be used deliberately and responsibly. Understanding how HTML works, when to use scraping, and when to avoid it is essential to becoming an effective and, above all, ethical data wrangler.

If you want to work with web data, remember: respect websites, respect the rules, and use an official API instead of scraping whenever one exists.