Classifying Illegal Activities on Tor Network Based on Web Textual Contents

Cite (ACL): Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, and Ivan de Paz. 2017. Classifying Illegal Activities on Tor Network Based on Web Textual Contents. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 35–43, Valencia, Spain. Association for Computational Linguistics.

Anthology ID: E17-1004
Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Month: April
Year: 2017
Address: Valencia, Spain
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 35–43
Bibkey: al-nabki-etal-2017-classifying

Abstract

The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset of active Darknet domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeling each address into one of 26 classes. Using DUTA, we compared two well-known text representation techniques, each paired with three different supervised classifiers, to categorize Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that combining a TF-IDF word representation with a Logistic Regression classifier achieves 96.6% accuracy under 10-fold cross-validation and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities detect these activities.
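To make the two quantities named in the abstract more concrete, here is a minimal Python sketch of TF-IDF term weighting and of the macro F1 score, using only the standard library. It is an illustration under assumptions, not the authors' pipeline: the abstract does not specify the tokenization, the exact TF-IDF variant, or the classifier settings, so the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the function names `tfidf_vectors`, `accuracy`, and `macro_f1` are all choices made for this example. The Logistic Regression step itself is not reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / doc_length) * log(N / df).
    The paper may use a different (e.g. smoothed) formulation.
    """
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With hypothetical toy labels (not taken from DUTA), say eight pages of one class and two of another with one page of the rarer class misclassified, `accuracy` returns 0.9 while `macro_f1` is noticeably lower, because every class counts equally in the macro average regardless of its size. This is one plausible reason a macro F1 score can sit below the accuracy on an imbalanced dataset, as the reported 93.7% and 96.6% do.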