This repository is dedicated to the Reviews Analysis model.
The current model produces several insights for each review:
- A decision on whether the review is fake
- Review polarity
- A list of keywords (called tags in the model)
- A review summary, which can cut the number of sentences by up to half without loss of meaning
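The summarization step can be illustrated with a minimal extractive sketch: score each sentence by the corpus frequency of its words and keep the top-ranked half in their original order. The function name `summarize` and the frequency-based scoring are illustrative assumptions, not the repository's actual summarizer.

```python
import re
from collections import Counter

def summarize(text, ratio=0.5):
    """Extractive summary sketch: keep the highest-scoring half of the sentences.

    A sentence's score is the mean frequency of its words across the whole
    text, so sentences built from common, on-topic words outrank filler.
    This is an illustration, not the model's actual summarization method.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sent):
        tokens = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the kept indices so the summary preserves the original sentence order.
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```

On a four-sentence review this keeps the two most representative sentences, halving the length as described above.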
The dataset and approach we've used are based on studies from the University of Illinois Chicago (UIC). This model relies only on review-content signals of fakeness.
Briefly, there are several main points to consider:
- Lexical features such as word n-grams, part-of-speech n-grams, and other lexical attributes
- Content and style similarity between reviews from different reviewers
- Semantic inconsistency (a kind of feature we have not used yet). For example, a reviewer writes "My wife and I bought this car ..." in one review and "My husband really loves ..." in another. (I heard this example from a friend at a company that actively detects fake reviews.)
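The first two feature families can be sketched with plain word-bigram counts and cosine similarity between two reviewers' texts. The helper names (`ngram_counts`, `cosine`) are illustrative and not part of the model code; the real pipeline also uses part-of-speech n-grams and other lexical attributes.

```python
import math
from collections import Counter

def ngram_counts(text, n=2):
    """Count word n-grams (default: bigrams) in a text, a basic lexical feature."""
    tokens = text.lower().split()
    return Counter(" ".join(g) for g in zip(*(tokens[i:] for i in range(n))))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts).

    Values near 1.0 mean two reviews share most of their n-grams, which can
    signal one author writing under multiple reviewer accounts.
    """
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Comparing `cosine(ngram_counts(r1), ngram_counts(r2))` across reviewer pairs then flags suspiciously similar reviews.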
The dataset we've used contains Amazon products and their reviews, each labeled as fake or genuine. It was provided by Bing Liu (firstname.lastname@example.org) for personal research purposes. At the dataset owner's request, the dataset is not included in this repository.