This is a web crawler built for a home assignment.

Kfir Dayan a9e810b8fb improved - nameing convention and spaces		2023-06-01 08:50:34 +03:00

README.md

Web Crawler API

The Web Crawler API is a simple API that crawls websites and stores the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the returned HTML to extract links from web pages. The API is built with the Laravel framework.

Features

Crawls websites and stores the crawled data in the database.
Supports setting the depth of the crawling process.
Prevents duplicate URLs from being crawled.
Retrieves and saves the HTML content of crawled pages.
Extracts valid URLs from the crawled pages.
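
To make the link-extraction and duplicate-prevention features concrete, here is a minimal shell sketch of the idea. It is not the project's code, which does this in PHP on top of GuzzleHttp; `extract_links` is a hypothetical helper, and the regular expression only handles double-quoted absolute `http`/`https` links, so treat it as an illustration rather than a real HTML parser.

```shell
# Hypothetical helper (not part of this repository): reads HTML on stdin
# and prints every absolute http(s) link exactly once.
extract_links() {
  grep -oE 'href="https?://[^"]+"' |   # keep only absolute http(s) href attributes
    sed -E 's/^href="//; s/"$//' |     # strip the href="..." wrapper
    sort -u                            # drop duplicate URLs
}
```

For example, `curl -s https://example.com | extract_links` would list the distinct absolute links on that page. A depth-limited crawler would feed each of those links back into the same step until the requested depth is reached.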

Prerequisites

PHP >= 7.4
Composer
Laravel framework
MongoDB
Docker
Docker Compose
GuzzleHttp
MongoDB PHP driver (extension - mongodb.so)
jenssegers/mongodb package

Getting Started

Clone the repository:

git clone https://git.dayanhub.com/kfir/rank_exam

Services

Server

Run the server - php artisan serve

MongoDB

Start MongoDB - docker-compose up -d
Run the migrations - php artisan migrate
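
The commands above can be chained in one place. The sketch below assumes the order MongoDB first, then migrations, then the server, which the README does not state explicitly. `run` and `start_stack` are hypothetical helper names, and `DRY_RUN` is a variable introduced here only so the sequence can be previewed without executing anything.

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed start-up order; each step runs only if the previous one succeeded.
start_stack() {
  run docker-compose up -d &&   # start MongoDB as defined in docker-compose.yaml
  run php artisan migrate &&    # apply the database migrations
  run php artisan serve         # start the Laravel development server (stays in the foreground)
}
```

Calling `start_stack` after setting `DRY_RUN=1` only prints the three commands. Without it, the commands run from the project root.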

Configuration

Use the .env file to set up the database connection.
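
As a rough sketch, a MongoDB connection in `.env` might look like the fragment below. The key names are Laravel's conventional `DB_*` variables and are an assumption here: which keys are actually read depends on `config/database.php` in this repository. The database name `crawler` and the empty credentials are placeholders, and `27017` is MongoDB's default port.

```shell
# .env (sketch; every value below is a placeholder)
DB_CONNECTION=mongodb   # assumed connection name for the jenssegers/mongodb driver
DB_HOST=127.0.0.1
DB_PORT=27017           # MongoDB default port
DB_DATABASE=crawler     # hypothetical database name
DB_USERNAME=
DB_PASSWORD=
```

Copying `.env.example` to `.env` and adjusting these values to match the `docker-compose.yaml` service is a reasonable starting point.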

API Endpoints

GET /api/crawl:

Crawls a website and stores the crawled data in the database. Required query parameter: url. Optional query parameter: depth (default: 1).
Parameters:
- `url` (required): The URL of the website to crawl.
- `depth` (optional): The depth of the crawling process (default: 0).
- `refresh` (optional): If set to 1, the crawler will refresh the results for an existing URL (default: false).
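
A request to this endpoint can be sketched with `curl`. The base URL assumes the default address of `php artisan serve`, and `crawl` and the `CURL` override are hypothetical names introduced for this example. Because the description above gives two different defaults for `depth`, the sketch always passes it explicitly. The response format is not documented here, so none is shown.

```shell
BASE_URL="${BASE_URL:-http://127.0.0.1:8000}"   # assumption: default `php artisan serve` address
CURL="${CURL:-curl}"                             # override to preview the request, e.g. CURL=echo

# crawl URL DEPTH [REFRESH]: hypothetical helper wrapping GET /api/crawl
crawl() {
  local url="$1" depth="$2" refresh="${3:-}"
  # -G sends the --data-urlencode pairs as a query string on a GET request
  set -- -sS -G "$BASE_URL/api/crawl" \
    --data-urlencode "url=$url" \
    --data-urlencode "depth=$depth"
  if [ -n "$refresh" ]; then
    set -- "$@" --data-urlencode "refresh=$refresh"
  fi
  "$CURL" "$@"
}
```

For example, `crawl "https://example.com" 2` crawls two levels deep, and `crawl "https://example.com" 2 1` additionally asks the crawler to refresh an already stored URL.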

GET /api:

Retrieves all crawled data from the database.

DELETE /api/crawl/{id}:

Deletes a specific crawled data record from the database.

DELETE /api/crawl:

Deletes all crawled data records from the database.
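
The listing and deletion endpoints can be exercised the same way. This is a minimal sketch: the base URL again assumes the default `php artisan serve` address, and the function names and the `CURL` override are hypothetical. The `{id}` value is whatever identifier the listing endpoint returns for a record; its exact format is not specified in this README.

```shell
BASE_URL="${BASE_URL:-http://127.0.0.1:8000}"   # assumption: default `php artisan serve` address
CURL="${CURL:-curl}"                             # override to preview the request, e.g. CURL=echo

# GET /api: retrieve all crawled records
list_crawls() {
  "$CURL" -sS "$BASE_URL/api"
}

# DELETE /api/crawl/{id}: delete one record; $1 is the record id
delete_crawl() {
  "$CURL" -sS -X DELETE "$BASE_URL/api/crawl/$1"
}

# DELETE /api/crawl: delete every record
delete_all_crawls() {
  "$CURL" -sS -X DELETE "$BASE_URL/api/crawl"
}
```

A typical session would call `list_crawls`, pick an id from the output, and pass it to `delete_crawl`.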