56k Czech movie reviews were collected using the /data_preparation/data_collector_movie_review_scraper.py
multithreaded HTML scraping module. The reviews were filtered with the
langdetect module to remove reviews written in Slovak, and Czech stopwords were removed using a stopword collection. To balance the dataset with equal numbers of negative and positive reviews, the
final dataset was reduced to 11.5k positive and 11.5k negative reviews. The collected data was also stemmed before training the models.
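
The cleaning and balancing steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the scraper or training scripts: the function name `prepare_reviews`, the `"pos"`/`"neg"` labels, and the whitespace tokenization are assumptions, and stemming is left out. The language detector is passed in as a callable; in practice `langdetect.detect`, which returns codes such as `"cs"` or `"sk"`, could be supplied.

```python
import random


def prepare_reviews(reviews, detect_language, stopwords, seed=0):
    """Drop Slovak reviews, strip stopwords, and balance the classes.

    reviews: iterable of (text, label) pairs, label is "pos" or "neg"
             (hypothetical labels chosen for this sketch).
    detect_language: callable mapping text to a language code.
    stopwords: set of lowercase Czech stopwords.
    """
    cleaned = {"pos": [], "neg": []}
    for text, label in reviews:
        # Remove reviews detected as Slovak.
        if detect_language(text) == "sk":
            continue
        # Naive whitespace tokenization (an assumption of this sketch).
        tokens = [t for t in text.lower().split() if t not in stopwords]
        if tokens:
            cleaned[label].append(" ".join(tokens))

    # Downsample the larger class so both classes have the same size.
    n = min(len(cleaned["pos"]), len(cleaned["neg"]))
    rng = random.Random(seed)
    balanced = [(t, "pos") for t in rng.sample(cleaned["pos"], n)]
    balanced += [(t, "neg") for t in rng.sample(cleaned["neg"], n)]
    rng.shuffle(balanced)
    return balanced
```

Downsampling the larger class is one simple way to get equal class sizes, at the cost of discarding data.
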
Using the Scikit-Learn Python library,
Logistic regression and
Support Vector Machine ML models were trained and tested
for text sentiment analysis.
The scripts for training and testing are located here:
The overall sentiment score for the specified text input is calculated as a weighted average of these 3 model predictions, weighted by each model's precision score.
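
A precision-weighted average of model predictions could look like the sketch below. The exact formula used by the application is not given here, so this assumes each prediction is a number (for example 1.0 for positive and 0.0 for negative) and that the weights are the models' precision scores normalized by their sum. The function name and the dictionary keys are hypothetical.

```python
def weighted_sentiment(predictions, precisions):
    """Combine per-model predictions into one score.

    predictions: dict mapping model name -> predicted score.
    precisions:  dict mapping model name -> precision score used as weight.
    """
    if predictions.keys() != precisions.keys():
        raise ValueError("predictions and precisions must cover the same models")
    total_weight = sum(precisions.values())
    if total_weight <= 0:
        raise ValueError("precision scores must sum to a positive value")
    # Each prediction contributes in proportion to its model's precision.
    weighted_sum = sum(predictions[m] * precisions[m] for m in predictions)
    return weighted_sum / total_weight


# Illustrative values only, not measured results.
score = weighted_sentiment(
    {"model_a": 1.0, "model_b": 0.0, "model_c": 1.0},
    {"model_a": 0.8, "model_b": 0.6, "model_c": 0.6},
)
```

With these made-up numbers the two models voting positive hold 1.4 of the total weight of 2.0, so the combined score leans positive.
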
The Flask web application is currently hosted at http://czester.herokuapp.com; its source code can be found in /flask_webapp/.
The application backend is written in Python using the
Flask framework, with
Bootstrap for template styling. The app also provides users with a simple API. The stats module is a result of an integration between
Flask where the statistics data persistence layer can be either
a remote Heroku Postgres database or a local Sqlite3 database.
If you provide the app with an environment variable named
DATABASE_URL containing the Heroku Postgres DB URL, such as
postgres://YourPostgresUrl, the remote
Heroku Postgres will be used; otherwise a local
Sqlite3 db instance will be used.
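
The backend selection described above can be illustrated with a small sketch. This is not the application's actual code: the function name `select_stats_backend` and the local file name `stats.db` are assumptions made for the example. Only the `DATABASE_URL` variable name comes from the description.

```python
import os

# Hypothetical local database file name, used only in this sketch.
DEFAULT_SQLITE_PATH = "stats.db"


def select_stats_backend(environ=None):
    """Return (backend_name, connection_target) for the stats module.

    If DATABASE_URL is set and non-empty, the remote Postgres URL is used;
    otherwise the local Sqlite3 file is used.
    """
    env = os.environ if environ is None else environ
    url = env.get("DATABASE_URL", "").strip()
    if url:
        return ("postgres", url)
    return ("sqlite3", DEFAULT_SQLITE_PATH)
```

Passing the environment as an argument keeps the function easy to test, for example `select_stats_backend({"DATABASE_URL": "postgres://YourPostgresUrl"})`.
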
1) create and activate a standard Python virtual or pipenv environment
2) pip3 install the requirements from
3) set the working directory to the path where you cloned this repo (make sure it's the path where the Heroku
Procfile is located)