# Web Crawler API

The Web Crawler API is a simple API that crawls websites and stores the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the HTML content of each page to extract links. The API is built with the Laravel framework.

## Features

- Crawls websites and stores the crawled data in the database.
- Supports setting the depth of the crawling process.
- Prevents duplicate URLs from being crawled.
- Retrieves and saves the HTML content of crawled pages.
- Extracts valid URLs from the crawled pages.
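As a rough illustration of the fetch-and-extract step, a link extractor built on GuzzleHttp and PHP's DOM extension might look like the sketch below (the function name and structure are illustrative assumptions, not code from this repository):

```php
<?php

use GuzzleHttp\Client;

// Sketch only: fetch a page and collect the absolute links in its <a> tags.
function extractLinks(string $url): array
{
    $client = new Client(['timeout' => 10]);
    $html = (string) $client->get($url)->getBody();

    $dom = new DOMDocument();
    // Suppress warnings triggered by malformed real-world HTML.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Keep only well-formed absolute URLs.
        if (filter_var($href, FILTER_VALIDATE_URL)) {
            $links[] = $href;
        }
    }

    return array_unique($links);
}
```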
## Prerequisites

- PHP >= 7.4
- Composer
- Laravel framework
- MongoDB
- Docker
- Docker Compose
- GuzzleHttp
- MongoDB PHP driver (mongodb.so extension)
- jenssegers/mongodb package
## Getting Started

1. Clone the repository:

```bash
git clone https://git.dayanhub.com/kfir/rank_exam
```
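The clone step is the only one spelled out here; the rest of the setup follows the standard Laravel workflow. A plausible continuation (assumed, not taken from this README) is:

```bash
cd rank_exam
composer install          # install PHP dependencies, including GuzzleHttp
cp .env.example .env      # create the local environment file
php artisan key:generate  # generate the application key
```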
## Services

### Server

Run the development server: `php artisan serve`

### MongoDB

Start MongoDB: `docker-compose up -d`

Run the migrations: `php artisan migrate`
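The compose file itself is not shown in this README; a minimal `docker-compose.yml` compatible with the commands above might look like this sketch (the image tag and volume name are assumptions, and the repository's actual file may differ):

```yaml
version: "3.8"
services:
  mongo:
    image: mongo:6
    ports:
      # Expose MongoDB's default port to the host.
      - "27017:27017"
    volumes:
      # Persist data across container restarts.
      - mongo-data:/data/db
volumes:
  mongo-data:
```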
## Configuration

Use the `.env` file to set up the database connection.
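With the jenssegers/mongodb package, a typical connection block in `.env` looks like the following (the database name and credentials are placeholders, not values from the project):

```env
DB_CONNECTION=mongodb
DB_HOST=127.0.0.1
DB_PORT=27017
# Placeholder database name and empty credentials.
DB_DATABASE=crawler
DB_USERNAME=
DB_PASSWORD=
```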
## API Endpoints

### GET /api/crawl

Crawls a website and stores the crawled data in the database.

Parameters:

- `url` (required): The URL of the website to crawl.
- `depth` (optional): The depth of the crawling process (default: 0).
- `refresh` (optional): If set to 1, the crawler refreshes the stored results for a URL that was already crawled; by default, previously crawled URLs are skipped.
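For example, a crawl request against the development server (default `php artisan serve` address) could look like:

```bash
curl "http://127.0.0.1:8000/api/crawl?url=https://example.com&depth=2"
```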
### GET /api

Retrieves all crawled data from the database.

### DELETE /api/crawl/{id}

Deletes a specific crawled data record from the database.

### DELETE /api/crawl

Deletes all crawled data records from the database.
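Usage sketches for the remaining endpoints, again assuming the default development server address (the record id is a placeholder for whatever identifier `GET /api` returns):

```bash
curl "http://127.0.0.1:8000/api"                       # list all crawled records
curl -X DELETE "http://127.0.0.1:8000/api/crawl/<id>"  # delete one record (placeholder id)
curl -X DELETE "http://127.0.0.1:8000/api/crawl"       # delete all records
```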