# Web Crawler API
The Web Crawler API is a simple API that crawls websites and stores the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the returned HTML to extract the links on each page. The API is built with the Laravel framework.
## Features
- Crawls websites and stores the crawled data in the database.
- Supports setting the depth of the crawling process.
- Prevents duplicate URLs from being crawled.
- Retrieves and saves the HTML content of crawled pages.
- Extracts valid URLs from the crawled pages.
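
The depth limit and duplicate prevention listed above can be illustrated with a small sketch. This is not the project's PHP implementation: the in-memory `LINKS` map, the `crawl` function, and the reading of depth 1 as "the start page plus the pages it links to" are all assumptions made for illustration. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical in-memory "site": each page maps to the links found on it.
declare -A LINKS=(
  ["/"]="/about /blog"
  ["/about"]="/ /blog"
  ["/blog"]="/blog/post-1"
  ["/blog/post-1"]="/"
)

declare -A SEEN=()   # URLs that were already crawled
ORDER=()             # crawl order, kept for display

crawl() {
  local url=$1 depth=$2 link
  # Duplicate prevention: skip any URL that was crawled before.
  [[ -n ${SEEN[$url]:-} ]] && return 0
  SEEN[$url]=1
  ORDER+=("$url")
  # Depth limit: stop following links once the budget is used up.
  (( depth <= 0 )) && return 0
  # Unquoted on purpose so the space-separated link list is split into words.
  for link in ${LINKS[$url]:-}; do
    crawl "$link" $(( depth - 1 ))
  done
}

crawl "/" 1
echo "${ORDER[*]}"
```

With a depth of 1 the sketch visits `/`, `/about`, and `/blog`, and does not follow the link to `/blog/post-1`.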
## Prerequisites
- PHP >= 7.4
- Composer
- Laravel framework
- MongoDB
- Docker
- Docker Compose
- GuzzleHttp
- MongoDB PHP driver (the `mongodb.so` extension)
- jenssegers/mongodb package
## Getting Started
1. Clone the repository:
```bash
git clone https://git.dayanhub.com/kfir/rank_exam
```
## Services
### Server
Run the server: `php artisan serve`
### MongoDB
- Start MongoDB: `docker-compose up -d`
- Run the migrations: `php artisan migrate`
## Configuration
Use the `.env` file to configure the database connection.
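
As a rough sketch, a local setup might use `.env` entries like the ones below. The keys are Laravel's standard `DB_*` variables. Whether this project's `config/database.php` reads exactly these keys is an assumption, as are the host, the port, and the database name `crawler`.

```bash
# Assumed values for a MongoDB instance started locally with docker-compose.
DB_CONNECTION=mongodb
DB_HOST=127.0.0.1
DB_PORT=27017
# Hypothetical database name; use whatever your setup expects.
DB_DATABASE=crawler
# Left empty on the assumption that local MongoDB runs without authentication.
DB_USERNAME=
DB_PASSWORD=
```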
## API Endpoints
- `GET /api/crawl`: Crawls a website and stores the crawled data in the database. Required query parameter: `url`. Optional query parameter: `depth` (default: 1).
- `GET /api`: Retrieves all crawled data from the database.
- `DELETE /api/crawl/{id}`: Deletes a specific crawled data record from the database.
- `DELETE /api/crawl`: Deletes all crawled data records from the database.
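
The endpoints above could be called from a shell with `curl`, for example through small wrapper functions like these. The function names are hypothetical, and the base URL assumes the server was started with `php artisan serve` on its default address; override `BASE_URL` if yours differs. The block only defines the functions and makes no requests by itself.

```bash
#!/usr/bin/env bash
# Assumed default address of `php artisan serve`; override by exporting BASE_URL.
BASE_URL="${BASE_URL:-http://127.0.0.1:8000}"

# GET /api/crawl?url=...&depth=...
# -G sends the --data-urlencode values as a URL-encoded query string.
crawl() {
  curl -sS -G "$BASE_URL/api/crawl" \
    --data-urlencode "url=$1" \
    --data-urlencode "depth=${2:-1}"
}

# GET /api: list all crawled data.
list_all() {
  curl -sS "$BASE_URL/api"
}

# DELETE /api/crawl/{id}: delete one record by its id.
delete_one() {
  curl -sS -X DELETE "$BASE_URL/api/crawl/$1"
}

# DELETE /api/crawl: delete every record.
delete_all() {
  curl -sS -X DELETE "$BASE_URL/api/crawl"
}
```

For instance, `crawl "https://example.com" 2` would request a crawl of that site with a depth of 2, and `list_all` would return whatever has been stored so far. The response format is not documented here, so the sketch prints the raw response body.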