Subject
- #Web Page
- #Indexing
- #Web Service
- #Search Engine
- #Web Crawling
Created: 2024-04-26 17:29
Crawling is the process by which a search engine or web crawler (also called a robot or spider) automatically explores web pages on the internet and collects information from them. This process allows search engines to understand and index the content of countless web pages and incorporate them into search results. The following is a detailed explanation of the concept and process of crawling.
[Image: Hypothetical crawling bot - Source: ChatGPT-4]
Web Crawler: The software or bot that performs crawling. Crawlers access websites, follow links, and collect and analyze the content of each page.
Indexing: The process of storing the content of web pages collected through crawling in a search engine's database. During this process, the search engine analyzes each page's content, metadata, keywords, link structure, and so on.
Sitemap: A file that describes the structure and pages of a website. Crawlers refer to sitemaps to crawl a website's pages efficiently (see the sketch below).
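To make the sitemap idea concrete, here is a minimal Python sketch that fetches a sitemap.xml file and extracts the page URLs it lists, which a crawler could then use to seed its queue. The sitemap URL is a placeholder, and real sites may also publish nested sitemap index files that this sketch does not handle.

```python
# A minimal sketch of reading a sitemap to seed a crawl.
# "https://example.com/sitemap.xml" is a placeholder URL, not a real sitemap.
import urllib.request
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def read_sitemap(sitemap_url: str) -> list[str]:
    """Fetch a sitemap.xml file and return the page URLs it lists."""
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    # Each <url><loc> entry is a page the site wants crawlers to know about.
    return [loc.text for loc in tree.iter(SITEMAP_NS + "loc")]

if __name__ == "__main__":
    for url in read_sitemap("https://example.com/sitemap.xml"):
        print(url)
```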
Starting Page: Crawlers typically start from a known website or root domain. This page serves as the starting point for the crawl.
Link Following: Crawlers follow the links within a web page to reach other pages. Crawling proceeds along the link structure that connects web pages.
Content Collection: Crawlers collect text, images, metadata, and other content from each page. This information is stored in the search engine's index and used to generate search results.
Iteration: Crawlers repeat these steps across many pages, discovering new links and collecting additional pages as they go (all four steps are illustrated in the sketch below).
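The following is a minimal sketch of these four steps using only Python's standard library: it starts from one page, follows links, collects page content, and iterates until a page limit is reached. The start URL is a placeholder, and a production crawler would add politeness delays, robots.txt checks, deduplication by content, and much more robust error handling.

```python
# A minimal single-domain crawler sketch: starting page, link following,
# content collection, and iteration.
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href attributes from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url: str, max_pages: int = 10) -> dict:
    """Breadth-first crawl from start_url, staying on the same host."""
    host = urlparse(start_url).netloc
    frontier = deque([start_url])   # pages waiting to be visited
    seen = {start_url}
    collected = {}                  # url -> raw HTML ("content collection")

    while frontier and len(collected) < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        collected[url] = html

        # Link following: queue new same-host links (the "iteration" step).
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return collected

if __name__ == "__main__":
    pages = crawl("https://example.com/")  # placeholder start page
    print(f"collected {len(pages)} pages")
```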
Google Search Engine: Google crawls the entire web to generate search results. It utilizes various web crawlers for this purpose, and the crawled information is converted into searchable data through Google's indexing process.
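As a toy illustration of how crawled text becomes searchable data, the sketch below builds an inverted index that maps each word to the pages containing it. This shows the idea only; it is in no way Google's actual indexing pipeline, and the sample pages are made up.

```python
# A toy inverted index: crawled text mapped to the pages that contain it.
import re
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

# Made-up sample pages standing in for crawled content.
pages = {
    "https://example.com/a": "Crawlers follow links between pages",
    "https://example.com/b": "Search engines index crawled pages",
}
index = build_index(pages)
print(index["pages"])   # both URLs
print(index["index"])   # only page b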
Specialized Crawling: There are also crawlers specialized in specific topics or industries, for example, Google Scholar for crawling academic papers, or real estate websites that crawl property listings.
Price Comparison Websites: Crawlers collect price information from online stores and provide price comparison services. For instance, they crawl product information from various online shopping malls to help users find the cheapest prices (a sketch of this idea follows the note below).
※ In Korea, when you search on "Naver Shopping," you can see information from various shopping sites, not just Naver. Sites like "Danawa" are similar.
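Here is a small sketch of the price-comparison idea: fetch the same product's page from several stores and report the cheapest. The store URLs and the price pattern are hypothetical; every real site needs its own extraction logic.

```python
# A price-comparison sketch. The store URLs and the "12,900 won"-style
# price pattern below are hypothetical placeholders.
import re
import urllib.request

# Hypothetical product pages for the same item at different stores.
STORES = {
    "store-a": "https://store-a.example/item/123",
    "store-b": "https://store-b.example/item/123",
}

def extract_price(html: str) -> int | None:
    """Assumes prices appear like '12,900 won'; adjust per site."""
    match = re.search(r"([\d,]+)\s*won", html)
    return int(match.group(1).replace(",", "")) if match else None

def cheapest(stores: dict[str, str]) -> tuple[str, int] | None:
    """Fetch each store's page and return (store, price) for the lowest price."""
    prices = {}
    for store, url in stores.items():
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip stores that are unreachable
        price = extract_price(html)
        if price is not None:
            prices[store] = price
    return min(prices.items(), key=lambda kv: kv[1]) if prices else None

print(cheapest(STORES))
```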
Social Media Crawling: Crawling publicly available information from social media can be used for trend analysis, marketing strategy development, and more. For example, there are tools that collect and analyze information on specific hashtags or topics from Twitter.
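A simple form of such trend analysis is counting hashtag frequency over collected posts, as in the sketch below. The posts here are made-up samples; collecting real social media data must go through each platform's official API and comply with its terms of service.

```python
# A sketch of hashtag trend analysis over already-collected public posts.
# The posts are made-up samples, not real social media data.
import re
from collections import Counter

posts = [
    "Trying out a new crawler today #python #webcrawling",
    "Search engines are fascinating #searchengine #webcrawling",
    "Indexing pipeline finally works #python",
]

# Count how often each hashtag appears across all posts.
hashtags = Counter(
    tag.lower() for post in posts for tag in re.findall(r"#\w+", post)
)
print(hashtags.most_common(3))
# [('#python', 2), ('#webcrawling', 2), ('#searchengine', 1)]
```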
Crawling is a core technology that enables various web services, including search engines. However, crawlers must comply with each website's Robots Exclusion Protocol (robots.txt) and privacy policies.
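Python's standard library includes a parser for robots.txt, so a polite crawler can check permission before fetching a page. In this sketch, "MyCrawler" and the example URLs are placeholders.

```python
# Checking robots.txt before fetching, using Python's standard library.
# "MyCrawler" and the example URLs are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

url = "https://example.com/private/page.html"
if robots.can_fetch("MyCrawler", url):
    print("allowed to crawl", url)
else:
    print("robots.txt disallows", url)
```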