# Web Crawler API

The Web Crawler API is a simple API that allows you to crawl websites and store the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the HTML content to extract links from web pages. The API is built with the Laravel framework.

## Features

- Crawls websites and stores the crawled data in the database.
- Supports setting the depth of the crawling process.
- Prevents duplicate URLs from being crawled.
- Retrieves and saves the HTML content of crawled pages.
- Extracts valid URLs from the crawled pages.

## Prerequisites

- PHP >= 7.4
- Composer
- Laravel framework
- MongoDB
- Docker
- Docker Compose
- GuzzleHttp
- MongoDB PHP driver (the `mongodb.so` extension)
- jenssegers/mongodb package

## Getting Started

1. Clone the repository:

```bash
git clone https://git.dayanhub.com/kfir/rank_exam
```

## Services

### Server

Run the application server:

```bash
php artisan serve
```

### MongoDB

Start MongoDB with Docker Compose:

```bash
docker-compose up -d
```

Run the database migrations:

```bash
php artisan migrate
```

## Configuration

Use the `.env` file to set up the database connection (see the sample configuration at the end of this README).

## API Endpoints

### GET /api/crawl

Crawls a website and stores the crawled data in the database.

Parameters:

- `url` (required): The URL of the website to crawl.
- `depth` (optional): The depth of the crawling process (default: 0).
- `refresh` (optional): If set to 1, the crawler re-crawls a URL that already has stored results (default: 0).

### GET /api

Retrieves all crawled data from the database.

### DELETE /api/crawl/{id}

Deletes a specific crawled data record from the database.

### DELETE /api/crawl

Deletes all crawled data records from the database.
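## Example Requests

For quick testing, the endpoints can be exercised with `curl`. A minimal sketch, assuming the app is served locally on port 8000 (the `php artisan serve` default); the target URL, depth, and record id below are illustrative placeholders:

```bash
# Crawl https://example.com two levels deep, refreshing any stored results
curl "http://localhost:8000/api/crawl?url=https://example.com&depth=2&refresh=1"

# Retrieve all crawled records
curl "http://localhost:8000/api"

# Delete a single record by id (use an id returned by GET /api)
curl -X DELETE "http://localhost:8000/api/crawl/<record-id>"

# Delete all crawled records
curl -X DELETE "http://localhost:8000/api/crawl"
```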
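## Sample Configuration

A minimal sketch of the database-related `.env` entries referenced in the Configuration section, assuming a local MongoDB started via `docker-compose` and the `jenssegers/mongodb` connection driver; the database name and credentials are placeholders to adjust for your setup:

```bash
# Placeholder values - adjust to match your MongoDB instance
DB_CONNECTION=mongodb
DB_HOST=127.0.0.1
DB_PORT=27017
DB_DATABASE=crawler
DB_USERNAME=
DB_PASSWORD=
```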