work in progress
parent a2df1212b1
commit 6e60bb513a
7 changed files with 152 additions and 99 deletions
66
README.md
|
@@ -1,66 +1,8 @@
|
||||||
<p align="center"><a href="https://laravel.com" target="_blank"><img src="https://raw.githubusercontent.com/laravel/art/master/logo-lockup/5%20SVG/2%20CMYK/1%20Full%20Color/laravel-logolockup-cmyk-red.svg" width="400" alt="Laravel Logo"></a></p>
|
Run the server - php artisan serve
|
||||||
|
|
||||||
<p align="center">
|
run mongo - run docker-compose up -d
|
||||||
<a href="https://github.com/laravel/framework/actions"><img src="https://github.com/laravel/framework/workflows/tests/badge.svg" alt="Build Status"></a>
|
|
||||||
<a href="https://packagist.org/packages/laravel/framework"><img src="https://img.shields.io/packagist/dt/laravel/framework" alt="Total Downloads"></a>
|
|
||||||
<a href="https://packagist.org/packages/laravel/framework"><img src="https://img.shields.io/packagist/v/laravel/framework" alt="Latest Stable Version"></a>
|
|
||||||
<a href="https://packagist.org/packages/laravel/framework"><img src="https://img.shields.io/packagist/l/laravel/framework" alt="License"></a>
|
|
||||||
</p>
|
|
||||||
|
|
||||||
## About Laravel
|
migrate - php artisan migrate
|
||||||
|
|
||||||
Laravel is a web application framework with expressive, elegant syntax. We believe development must be an enjoyable and creative experience to be truly fulfilling. Laravel takes the pain out of development by easing common tasks used in many web projects, such as:
|
use .env file to set up the database connection
|
||||||
|
|
||||||
- [Simple, fast routing engine](https://laravel.com/docs/routing).
|
|
||||||
- [Powerful dependency injection container](https://laravel.com/docs/container).
|
|
||||||
- Multiple back-ends for [session](https://laravel.com/docs/session) and [cache](https://laravel.com/docs/cache) storage.
|
|
||||||
- Expressive, intuitive [database ORM](https://laravel.com/docs/eloquent).
|
|
||||||
- Database agnostic [schema migrations](https://laravel.com/docs/migrations).
|
|
||||||
- [Robust background job processing](https://laravel.com/docs/queues).
|
|
||||||
- [Real-time event broadcasting](https://laravel.com/docs/broadcasting).
|
|
||||||
|
|
||||||
Laravel is accessible, powerful, and provides tools required for large, robust applications.
|
|
||||||
|
|
||||||
## Learning Laravel
|
|
||||||
|
|
||||||
Laravel has the most extensive and thorough [documentation](https://laravel.com/docs) and video tutorial library of all modern web application frameworks, making it a breeze to get started with the framework.
|
|
||||||
|
|
||||||
You may also try the [Laravel Bootcamp](https://bootcamp.laravel.com), where you will be guided through building a modern Laravel application from scratch.
|
|
||||||
|
|
||||||
If you don't feel like reading, [Laracasts](https://laracasts.com) can help. Laracasts contains over 2000 video tutorials on a range of topics including Laravel, modern PHP, unit testing, and JavaScript. Boost your skills by digging into our comprehensive video library.
|
|
||||||
|
|
||||||
## Laravel Sponsors
|
|
||||||
|
|
||||||
We would like to extend our thanks to the following sponsors for funding Laravel development. If you are interested in becoming a sponsor, please visit the Laravel [Patreon page](https://patreon.com/taylorotwell).
|
|
||||||
|
|
||||||
### Premium Partners
|
|
||||||
|
|
||||||
- **[Vehikl](https://vehikl.com/)**
|
|
||||||
- **[Tighten Co.](https://tighten.co)**
|
|
||||||
- **[Kirschbaum Development Group](https://kirschbaumdevelopment.com)**
|
|
||||||
- **[64 Robots](https://64robots.com)**
|
|
||||||
- **[Cubet Techno Labs](https://cubettech.com)**
|
|
||||||
- **[Cyber-Duck](https://cyber-duck.co.uk)**
|
|
||||||
- **[Many](https://www.many.co.uk)**
|
|
||||||
- **[Webdock, Fast VPS Hosting](https://www.webdock.io/en)**
|
|
||||||
- **[DevSquad](https://devsquad.com)**
|
|
||||||
- **[Curotec](https://www.curotec.com/services/technologies/laravel/)**
|
|
||||||
- **[OP.GG](https://op.gg)**
|
|
||||||
- **[WebReinvent](https://webreinvent.com/?utm_source=laravel&utm_medium=github&utm_campaign=patreon-sponsors)**
|
|
||||||
- **[Lendio](https://lendio.com)**
|
|
||||||
|
|
||||||
## Contributing
|
|
||||||
|
|
||||||
Thank you for considering contributing to the Laravel framework! The contribution guide can be found in the [Laravel documentation](https://laravel.com/docs/contributions).
|
|
||||||
|
|
||||||
## Code of Conduct
|
|
||||||
|
|
||||||
In order to ensure that the Laravel community is welcoming to all, please review and abide by the [Code of Conduct](https://laravel.com/docs/contributions#code-of-conduct).
|
|
||||||
|
|
||||||
## Security Vulnerabilities
|
|
||||||
|
|
||||||
If you discover a security vulnerability within Laravel, please send an e-mail to Taylor Otwell via [taylor@laravel.com](mailto:taylor@laravel.com). All security vulnerabilities will be promptly addressed.
|
|
||||||
|
|
||||||
## License
|
|
||||||
|
|
||||||
The Laravel framework is open-sourced software licensed under the [MIT license](https://opensource.org/licenses/MIT).
|
|
||||||
|
|
|
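The setup steps in the new README (start MongoDB, configure the connection in `.env`, migrate, serve) could be backed by a `.env` along the following lines. This is only a sketch under stated assumptions: `DB_DATABASE`, `MONGO_INITDB_ROOT_USERNAME` and `MONGO_INITDB_ROOT_PASSWORD` are the variable names referenced by the docker-compose file in this commit, and port 27017 matches its port mapping. `DB_CONNECTION`, `DB_HOST`, `DB_PORT` and every value shown are placeholders that do not come from the repository.

```shell
# .env (sketch; all values are placeholders, not taken from the repository)

# Assumed name of the MongoDB connection defined in config/database.php
DB_CONNECTION=mongodb
# Assumed host: MongoDB reached through the port published by docker-compose
DB_HOST=127.0.0.1
# Matches the 27017:27017 mapping in the docker-compose file
DB_PORT=27017
# Hypothetical database name; docker-compose passes it to MONGO_INITDB_DATABASE
DB_DATABASE=crawler
# Placeholder root credentials consumed by the MongoDB container
MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=change-me
```

Docker Compose reads a `.env` file in the project directory when substituting `${...}` variables, so with a file like this in place the README order would be `docker-compose up -d`, then `php artisan migrate`, then `php artisan serve`.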
@@ -4,19 +4,66 @@
|
||||||
|
|
||||||
use App\Models\WebCrawl;
|
use App\Models\WebCrawl;
|
||||||
use Illuminate\Http\Request;
|
use Illuminate\Http\Request;
|
||||||
|
use GuzzleHttp\Client;
|
||||||
|
|
||||||
class WebCrawlController extends Controller
|
class WebCrawlController extends Controller
|
||||||
{
|
{
|
||||||
|
|
||||||
|
protected $webCrawl;
|
||||||
/**
|
/**
|
||||||
* Display a listing of the resource.
|
* Display a listing of the resource.
|
||||||
*
|
*
|
||||||
* @return \Illuminate\Http\Response
|
* @return \Illuminate\Http\Response
|
||||||
*/
|
*/
|
||||||
public function index()
|
public function index()
|
||||||
{
|
{
|
||||||
|
$allCrawls = WebCrawl::all();
|
||||||
|
|
||||||
//Return the results in JSON format
|
//Return the results in JSON format
|
||||||
// return response()->json($webCrawl);
|
return response()->json($allCrawls);
|
||||||
|
}
|
||||||
|
|
||||||
|
public function crawlWebsite($url, $depth) {
|
||||||
|
|
||||||
|
|
||||||
|
// // Use GuzzleHttp client to send HTTP requests
|
||||||
|
$client = new Client();
|
||||||
|
$response = $client->get($url);
|
||||||
|
if ($response->getStatusCode() >= 200 && $response->getStatusCode() < 300) {
|
||||||
|
|
||||||
|
$body = $response->getBody()->getContents();
|
||||||
|
// get:
|
||||||
|
// links from the page
|
||||||
|
// full content
|
||||||
|
// depth
|
||||||
|
// url
|
||||||
|
// visitedUrls
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
// // Check if the HTTP response is successful (status code 2xx)
|
||||||
|
|
||||||
|
// // Insert a page into the database if the HTTP response status is successful
|
||||||
|
// $webCrawl = new WebCrawl();
|
||||||
|
// $webCrawl->url = $url;
|
||||||
|
// $webCrawl->content = $response->getBody()->getContents();
|
||||||
|
// $webCrawl->save();
|
||||||
|
// }
|
||||||
|
// Crawl the links on the page
|
||||||
|
|
||||||
|
echo 'Crawling completed!';
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
public function getOne($url)
|
||||||
|
{
|
||||||
|
$webCrawl = WebCrawl::where('url', $url)->first();
|
||||||
|
echo 'here!';die;
|
||||||
|
if ($webCrawl) {
|
||||||
|
return $webCrawl;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@@ -24,9 +71,21 @@ public function index()
|
||||||
*
|
*
|
||||||
* @return \Illuminate\Http\Response
|
* @return \Illuminate\Http\Response
|
||||||
*/
|
*/
|
||||||
public function create()
|
private function create($response, $url, $depth, $visitedUrls, $links)
|
||||||
{
|
{
|
||||||
//
|
$webCrawl = new WebCrawl();
|
||||||
|
$webCrawl->url = $url;
|
||||||
|
$webCrawl->content = $response->getBody()->getContents();
|
||||||
|
$webCrawl->depth = $depth;
|
||||||
|
$webCrawl->visited_urls = $visitedUrls;
|
||||||
|
$webCrawl->status_code = $response->getStatusCode();
|
||||||
|
$webCrawl->status = $response->getReasonPhrase();
|
||||||
|
$webCrawl->created_at = $response->getHeader('Date')[0];
|
||||||
|
$webCrawl->updated_at = $response->getHeader('Last-Modified')[0];
|
||||||
|
$webCrawl->links = $links;
|
||||||
|
$webCrawl->save();
|
||||||
|
return $webCrawl;
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@@ -80,8 +139,16 @@ public function update(Request $request, WebCrawl $webCrawl)
|
||||||
* @param \App\Models\WebCrawl $webCrawl
|
* @param \App\Models\WebCrawl $webCrawl
|
||||||
* @return \Illuminate\Http\Response
|
* @return \Illuminate\Http\Response
|
||||||
*/
|
*/
|
||||||
public function destroy(WebCrawl $webCrawl)
|
public function destroy($id)
|
||||||
{
|
{
|
||||||
//
|
$webCrawl = WebCrawl::where("_id", $id);
|
||||||
|
echo '<pre>';
|
||||||
|
echo 'fff';
|
||||||
|
print_r($webCrawl);die;
|
||||||
|
if ($webCrawl) {
|
||||||
|
$webCrawl->delete();
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
35
app/Providers/CrawlerServiceProvider.php
Normal file
|
@@ -0,0 +1,35 @@
|
||||||
|
<?php
|
||||||
|
|
||||||
|
namespace App\Providers;
|
||||||
|
|
||||||
|
use Illuminate\Support\ServiceProvider;
|
||||||
|
|
||||||
|
class CrawlerServiceProvider extends ServiceProvider
|
||||||
|
{
|
||||||
|
/**
|
||||||
|
* Register services.
|
||||||
|
*
|
||||||
|
* @return void
|
||||||
|
*/
|
||||||
|
public function register()
|
||||||
|
{
|
||||||
|
//
|
||||||
|
}
|
||||||
|
|
||||||
|
public function crawlWebsite($url, $depth) {
|
||||||
|
$visitedUrls = [];
|
||||||
|
echo 'HERE!';die;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Bootstrap services.
|
||||||
|
*
|
||||||
|
* @return void
|
||||||
|
*/
|
||||||
|
public function boot()
|
||||||
|
{
|
||||||
|
//
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
}
|
|
@@ -13,17 +13,12 @@
|
||||||
*/
|
*/
|
||||||
public function up()
|
public function up()
|
||||||
{
|
{
|
||||||
Schema::create('web_crawls', function (Blueprint $table) {
|
Schema::create('webCrawl', function (Blueprint $table) {
|
||||||
$table->id();
|
$table->id();
|
||||||
$table->string('url');
|
$table->string('url');
|
||||||
$table->string('content');
|
$table->string('content');
|
||||||
$table->string('depth');
|
|
||||||
$table->string('visited_urls');
|
|
||||||
$table->string('status_code');
|
|
||||||
$table->string('status');
|
|
||||||
$table->string('created_at');
|
$table->string('created_at');
|
||||||
$table->string('updated_at');
|
$table->string('updated_at');
|
||||||
$table->string('links');
|
|
||||||
$table->timestamps();
|
$table->timestamps();
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
|
@@ -6,12 +6,14 @@ services:
|
||||||
ports:
|
ports:
|
||||||
- 27017:27017
|
- 27017:27017
|
||||||
volumes:
|
volumes:
|
||||||
- mongodb_data:/data/db
|
- data:/data/db
|
||||||
- ./init-scripts/init.js:/docker-entrypoint-initdb.d/mongo-init.js
|
- ./init-scripts/init.js:/docker-entrypoint-initdb.d/mongo-init.js
|
||||||
environment:
|
environment:
|
||||||
- MONGO_INITDB_DATABASE=${DB_DATABASE}
|
- MONGO_INITDB_DATABASE=${DB_DATABASE}
|
||||||
- MONGO_INITDB_ROOT_USERNAME=${MONGO_INITDB_ROOT_USERNAME}
|
- MONGO_INITDB_ROOT_USERNAME=${MONGO_INITDB_ROOT_USERNAME}
|
||||||
- MONGO_INITDB_ROOT_PASSWORD=${MONGO_INITDB_ROOT_PASSWORD}
|
- MONGO_INITDB_ROOT_PASSWORD=${MONGO_INITDB_ROOT_PASSWORD}
|
||||||
platform: linux/arm64/v8
|
platform: linux/arm64/v8
|
||||||
|
expose:
|
||||||
|
- 27017
|
||||||
volumes:
|
volumes:
|
||||||
mongodb_data:
|
data:
|
|
@@ -4,12 +4,42 @@
|
||||||
use Illuminate\Support\Facades\Route;
|
use Illuminate\Support\Facades\Route;
|
||||||
use GuzzleHttp\Client;
|
use GuzzleHttp\Client;
|
||||||
use App\Http\Controllers\WebCrawlController;
|
use App\Http\Controllers\WebCrawlController;
|
||||||
|
use GuzzleHttp\Psr7\Response;
|
||||||
|
|
||||||
|
Route::get('/crawl', function (Request $request) {
|
||||||
|
// invoke the WebCrawlController methods for the requested URL
|
||||||
|
$url = $request->input('url');
|
||||||
|
// check that the url is present and is a valid URL
|
||||||
|
if (!$url || !filter_var($url, FILTER_VALIDATE_URL)) {
|
||||||
|
return response()->json([
|
||||||
|
'error' => 'Missing required parameter `url`'
|
||||||
|
], 400);
|
||||||
|
}
|
||||||
|
$depth = $request->input('depth', 3); // default depth is 3 if not provided
|
||||||
|
|
||||||
|
$crawlerController = new WebCrawlController();
|
||||||
|
$isAlreadyDone = $crawlerController->getOne($url);
|
||||||
|
if(!$isAlreadyDone){
|
||||||
|
$crawlerController->crawlWebsite($url, $depth);
|
||||||
|
} else {
|
||||||
|
return response()->json([
|
||||||
|
'error' => 'This URL has already been crawled',
|
||||||
|
'data' => $isAlreadyDone
|
||||||
|
], 400);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Route::post('/crawl/{id}', function (String $id, Request $request, Response $response) {
|
||||||
// Route::get('/crawl', function (Request $request) {
|
// $id = $request->input('id');
|
||||||
// // invoke WebCrawlController index method
|
// $crawlerController = new WebCrawlController();
|
||||||
|
// if(!$crawlerController->destroy($id)) {
|
||||||
|
// return response()->json([
|
||||||
|
// 'error' => 'Url Not Found',
|
||||||
|
// ], 404);
|
||||||
|
// } else {
|
||||||
|
// return response()->json([
|
||||||
|
// 'success' => 'This URL has been deleted',
|
||||||
|
// ], 200);
|
||||||
|
// }
|
||||||
// });
|
// });
|
||||||
|
|
||||||
Route::get('/crawl', [WebCrawlController::class, 'index']);
|
|
|
@@ -1,18 +0,0 @@
|
||||||
<?php
|
|
||||||
|
|
||||||
use Illuminate\Support\Facades\Broadcast;
|
|
||||||
|
|
||||||
/*
|
|
||||||
|--------------------------------------------------------------------------
|
|
||||||
| Broadcast Channels
|
|
||||||
|--------------------------------------------------------------------------
|
|
||||||
|
|
|
||||||
| Here you may register all of the event broadcasting channels that your
|
|
||||||
| application supports. The given channel authorization callbacks are
|
|
||||||
| used to check if an authenticated user can listen to the channel.
|
|
||||||
|
|
|
||||||
*/
|
|
||||||
|
|
||||||
Broadcast::channel('App.Models.User.{id}', function ($user, $id) {
|
|
||||||
return (int) $user->id === (int) $id;
|
|
||||||
});
|
|