Web_Crawler_API/README.md
2023-05-31 13:31:01 +03:00


Web Crawler API

The Web Crawler API is a simple API that crawls websites and stores the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the HTML content to extract links from web pages. The API is built with the Laravel framework.

Features

  • Crawls websites and stores the crawled data in the database.
  • Supports setting the depth of the crawling process.
  • Prevents duplicate URLs from being crawled.
  • Retrieves and saves the HTML content of crawled pages.
  • Extracts valid URLs from the crawled pages.

Prerequisites

  • PHP >= 7.4
  • Composer
  • Laravel framework
  • MongoDB
  • Docker
  • Docker Compose
  • GuzzleHttp
  • MongoDB PHP driver (extension - mongodb.so)
  • jenssegers/mongodb package

Getting Started

  1. Clone the repository:

    git clone https://git.dayanhub.com/kfir/rank_exam

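After cloning, the usual Laravel setup steps apply. This is a sketch assuming a standard Laravel project layout; the `.env.example` file and the application-key step are common Laravel conventions, not confirmed by this README:

```shell
cd rank_exam
composer install          # install PHP dependencies, including GuzzleHttp and jenssegers/mongodb
cp .env.example .env      # create a local environment file, if an example file is provided
php artisan key:generate  # generate the Laravel application key
```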
Services

Server

Run the server - php artisan serve

MongoDB

Start MongoDB - docker-compose up -d
Run migrations - php artisan migrate
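Since `docker-compose up -d` is used to start MongoDB, the repository presumably ships a compose file. A minimal sketch of what such a `docker-compose.yml` could look like is shown below; the image tag, port mapping, and volume name are assumptions, not taken from the repository:

```yaml
services:
  mongodb:
    image: mongo:6              # assumed MongoDB image version
    ports:
      - "27017:27017"           # expose the default MongoDB port
    volumes:
      - mongo_data:/data/db     # persist data between restarts

volumes:
  mongo_data:
```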

Configuration

Use the .env file to configure the database connection.
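With the jenssegers/mongodb package, the connection is typically configured through standard Laravel database variables like the following; the host and database name here are placeholders, not values from the repository:

```
DB_CONNECTION=mongodb
DB_HOST=127.0.0.1
DB_PORT=27017
DB_DATABASE=web_crawler
```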

API Endpoints

GET /api/crawl: Crawls a website and stores the crawled data in the database. Required query parameter: url. Optional query parameter: depth (default: 1).

GET /api: Retrieves all crawled data from the database.

DELETE /api/crawl/{id}: Deletes a specific crawled data record from the database.

DELETE /api/crawl: Deletes all crawled data records from the database.
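Assuming the server is running at the `php artisan serve` default address (http://127.0.0.1:8000), the endpoints above can be exercised with curl as follows; the example URL and the record id are placeholders:

```shell
# Crawl a site two levels deep and store the results
curl "http://127.0.0.1:8000/api/crawl?url=https://example.com&depth=2"

# List all crawled records
curl "http://127.0.0.1:8000/api"

# Delete a single record by id
curl -X DELETE "http://127.0.0.1:8000/api/crawl/<id>"

# Delete all records
curl -X DELETE "http://127.0.0.1:8000/api/crawl"
```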