Web Crawler API

The Web Crawler API is a simple API for crawling websites and storing the crawled data in a database. It uses GuzzleHttp to send HTTP requests and parses the returned HTML to extract links from web pages. The API is built with the Laravel framework.

Features

  • Crawls websites and stores the crawled data in the database.
  • Supports setting the depth of the crawling process.
  • Prevents duplicate URLs from being crawled.
  • Retrieves and saves the HTML content of crawled pages.
  • Extracts valid URLs from the crawled pages.

Prerequisites

  • PHP >= 7.4
  • Composer
  • Laravel framework
  • MongoDB
  • Docker
  • Docker Compose
  • GuzzleHttp
  • MongoDB PHP driver (the mongodb.so extension)
  • jenssegers/mongodb package
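
If the driver and package are not installed yet, they can usually be added like this (a sketch; the exact steps depend on your PHP installation):

    pecl install mongodb                  # builds mongodb.so; enable it in your php.ini
    composer require jenssegers/mongodb   # installs the Laravel MongoDB package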

Getting Started

  1. Clone the repository:

    git clone https://git.dayanhub.com/kfir/rank_exam
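
  2. Install the PHP dependencies and prepare the environment file (standard Laravel setup steps, shown here as a sketch; adjust to your environment):

    cd rank_exam
    composer install
    cp .env.example .env
    php artisan key:generate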

Services

Server

Run the server - php artisan serve

MongoDB

Run MongoDB - docker-compose up -d
Run the migrations - php artisan migrate
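
For reference, a minimal mongo service definition like the following would match these commands (an illustrative sketch, not necessarily the repository's actual docker-compose.yaml):

    services:
      mongodb:
        image: mongo:6
        ports:
          - "27017:27017"
        volumes:
          - mongo-data:/data/db
    volumes:
      mongo-data: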

Configuration

Use the .env file to set up the database connection.
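
With the jenssegers/mongodb package, the connection entries usually look along these lines (illustrative values; the database name here is hypothetical):

    DB_CONNECTION=mongodb
    DB_HOST=127.0.0.1
    DB_PORT=27017
    DB_DATABASE=crawler
    DB_USERNAME=
    DB_PASSWORD=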

API Endpoints

GET /api/crawl:

Crawls a website and stores the crawled data in the database.
Parameters:
- `url` (required): The URL of the website to crawl.
- `depth` (optional): The depth of the crawling process (default: 0).
- `refresh` (optional): If set to 1, re-crawls a URL that already exists in the database instead of returning the stored results (default: 0).
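
For example, to crawl a site two levels deep (assuming the server is running locally via php artisan serve on its default port 8000):

    curl "http://localhost:8000/api/crawl?url=https://example.com&depth=2"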

GET /api:

Retrieves all crawled data from the database.
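
For example:

    curl "http://localhost:8000/api"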

DELETE /api/crawl/{id}:

Deletes a specific crawled data record from the database.
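
For example, with {id} replaced by a record identifier taken from the GET /api response (the value below is hypothetical):

    curl -X DELETE "http://localhost:8000/api/crawl/6477f0c2e4b0a1b2c3d4e5f6"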

DELETE /api/crawl:

Deletes all crawled data records from the database.
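
For example:

    curl -X DELETE "http://localhost:8000/api/crawl"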