nlp

Tafrigh

A user-friendly tool to convert YouTube videos into text, SRT, or VTT files using OpenAI's Whisper or Facebook's Wit.ai.

A comprehensive suite of models for generating, diacritizing, and analyzing Arabic poetry using GPT-2, BERT, and CANINE transformers. It secured third place in the [Arabthon](https://twitter.com/arabthon/status/1538220050791940102) competition.

Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde

Authorship identification is essential for detecting the deceptive misuse of others’ content and for exposing the owners of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for …

JUSTers at SemEval-2020 Task 4: Evaluating Transformer Models against Commonsense Validation and Explanation

In this paper, we describe the effort of our team (JUSTers) in the Commonsense Validation and Explanation (ComVE) task, which is part of SemEval-2020. We evaluate five pre-trained Transformer-based language models of various sizes against the three …

QWERTY Keyboard? }.?BZQ is Better!

In this work, we provide a Genetic-based algorithm that is used to quickly find a placement for a set of objects within a given layout such that access to these objects is optimized. The given layout describes the free locations of the objects and …

Tha3aroon at NSURL-2019 Task 8: Semantic Question Similarity in Arabic

In this paper, we describe our team's effort on the semantic text question similarity task of NSURL 2019. Our top-performing system utilizes several innovative data augmentation techniques to enlarge the training data. Then, it takes ELMo pre-trained …

Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several …

Pretrained Ensemble Learning for Fine-Grained Propaganda Detection

In this paper, we describe our team’s effort on the sentence-level classification (SLC) task of fine-grained propaganda detection at the NLP4IF 2019 workshop, co-located with the EMNLP-IJCNLP 2019 conference. Our top-performing system results come from …

Team JUST at the MADAR Shared Task on Arabic Fine-Grained Dialect Identification

In this paper, we describe our team’s effort on the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. The task requires building a system capable of differentiating between 25 different Arabic dialects in addition to MSA. Our approach …

Arabic Text Diacritization Using Deep Neural Networks

Diacritization of Arabic text is both an interesting and a challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. Like many other tasks or problems in Arabic …