This project analyzes product data from multiple sources: Amazon, Best Buy, Google Shopping, and Walmart. It ranks the top products by price, ratings, review counts, and review sentiment. The goal is to help users make informed purchasing decisions by identifying the best-value products across these platforms.
To run this project, you need Python installed on your machine, along with the following packages and tools:
pip install pandas numpy matplotlib textblob
pip install loguru
pip install scrapfly-sdk
pipx install poetry
pip install jmespath
pip install parsel
python -m textblob.download_corpora
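After installing, a quick check that the packages can be found may save a failed first run. The sketch below is not part of the project; it assumes the import names match the pip packages listed above (for example, that scrapfly-sdk is imported as scrapfly), and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip packages listed above
# (for example, scrapfly-sdk is assumed to be imported as "scrapfly").
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "textblob",
    "loguru", "scrapfly", "jmespath", "parsel",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only confirms that each module can be located; it does not verify versions or that the TextBlob corpora were downloaded.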
To run the program with its UI, execute the main script from the command line inside the directory you downloaded. A new window will open; it may appear behind your current window.
Follow the on-screen prompts to proceed. You can exit the program whenever you are at the main menu.
There is an option to refresh the scraped data. Check the boxes for the data scripts you want to run, choose the product you want analyzed from the dropdown box, then click the search button directly below it. Downloading all the data may take several minutes; you can check the terminal at any time to see progress.
All data files are saved automatically in the same directory as the Python files you run. The program uses four processed data files built from the scraped data, one per source. Each file contains combined information for five products (iPhone, iPad, MacBook, Nintendo Switch, and PlayStation). Previously extracted files already exist in the directory:
amazon_product_reviews.csv
bestbuy_combined.csv
google_shopping_combined.csv
walmart_products.json
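To confirm which of the four files above are present before starting an analysis, something like the following sketch could be used. It relies only on the standard library, makes no assumption about the columns or JSON layout inside each file, and the helper names find_data_files and read_data_file are hypothetical rather than part of the project.

```python
import csv
import json
from pathlib import Path

# File names taken from the list above.
DATA_FILES = [
    "amazon_product_reviews.csv",
    "bestbuy_combined.csv",
    "google_shopping_combined.csv",
    "walmart_products.json",
]


def find_data_files(directory):
    """Map each expected file name to its path, or None if it is absent."""
    base = Path(directory)
    return {
        name: (base / name if (base / name).is_file() else None)
        for name in DATA_FILES
    }


def read_data_file(path):
    """Read a CSV file into a list of dicts, or parse a JSON file as-is."""
    path = Path(path)
    if path.suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling find_data_files(".") from the project directory shows at a glance which sources still need to be scraped.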
parent_asin. Customer opinion data is used for sentiment analysis, combined with pricing and rating scores, to find which product-review combination has the best-rated quality.
load_data(): Loads data from files stored in the same directory as the script.
process_data(data_frames, product_name): Processes data to find top products based on specified criteria.
process_amazon_data(df): Retrieves the top three Amazon reviews with product information (processed in amazon_data_process.py).
bestbuy_top_cheapest(df, title), google_top_cheapest(df, title): Process Best Buy and Google Shopping data to find the top three products with the best combination scores of ratings, prices, and review counts/rating counts.
process_walmart_top_cheapest(df): Processes Walmart data to find the top three products according to price, rating, and review sentiment.
perform_sentiment_analysis(df): Analyzes sentiments of product reviews.
This project is licensed under the MIT License.