Ali Fadel

Machine Learning Engineer II

Amazon

Biography

I am a machine learning engineer at Amazon, working on the search domain. Beyond my professional role, I am passionate about researching natural language processing (NLP) for the Arabic language, such as diacritization, as well as investigating source code-related issues such as author identification and generation. Simultaneously, I am involved in numerous open-source initiatives, aimed at expanding my software engineering knowledge and offering valuable resources for Muslims and Arabic speakers. Additionally, I maintain a modest YouTube presence where I educate viewers on problem-solving, fundamental machine learning concepts, and other subjects through my channel, YAGs.

Interests

Natural Language Processing
Software Engineering
Open-Source Projects
Problem Solving
Content Creation

Education

BSc in Computer Science, 2019
Jordan University of Science and Technology

Experience

Machine Learning Engineer II

Amazon

Oct 2019 – Present Amman, Jordan

Significantly improved the Arabic-to-English machine translation model used in search, gaining ~10 COMET points by carefully analyzing the data processing procedures and applying robust filtering to the dataset.
Enhanced the Arabic linguistic analysis within the search pipeline by introducing an innovative stemming algorithm and implementing advanced synonym mining techniques to optimize search results.
Designed and developed multiple high-performance data pipelines capable of handling terabytes of data, delivering daily training and inferencing-ready builds for seamless integration and deployment.
Actively contributed to the design and development of various search systems and experiments, successfully supporting production traffic across multiple marketplaces and driving continuous improvement in search performance.

Research Assistant

Jordan University of Science and Technology

May 2019 – May 2021 Irbid, Jordan

Developed expertise in natural language processing (NLP) tasks, including text classification such as Semantic Text Similarity (STS), token labeling such as Arabic Text Diacritization (ATD), and sequence-to-sequence tasks such as Neural Machine Translation (NMT).
Achieved notable success in machine learning competitions, including 2nd place out of 10 in NSURL Semantic Question Similarity (Arabic), 4th place out of 19 in WANLP MADAR, and 3rd place out of 17 in SemEval ComVE.
Spearheaded the organization of multiple machine learning competitions, such as AI-SOCO at FIRE and ArEnMulti30K at WAT.
Authored numerous research papers presented at various academic conferences.

Machine Learning Engineer Intern

Samsung Electronics

Feb 2019 – May 2019 Amman, Jordan

Tackled text classification challenges by employing machine and deep learning methods, including TF-IDF, SVMs, RNNs, CNNs, and Transformers, for dialect identification in support of a multi-dialect translation system.
Enhanced the dialect identification system’s accuracy by 3% through the introduction of a novel model architecture for the Arabic language, utilizing RNNs and word embeddings.
Assessed various word embedding techniques such as Word2Vec and FastText, and visualized their effectiveness for Arabic words using the t-SNE dimensionality reduction algorithm.
Initiated a noise-cleaning project aimed at automatically removing noise from Bixby audio segments and identifying suitable samples for use as training data by the end of the internship.

Freelance Trainer

Hsoub

Jul 2017 – Feb 2019 Remote

Produced video tutorials for three courses demonstrating how to build real-life projects with the Ruby on Rails framework.
Designed an introductory course covering Ruby on Rails basics and guiding learners through building a simple Content Management System (CMS) application.
Developed an intermediate course on using Ruby on Rails to build a forum similar to HsoubIO (akin to StackOverflow).
Created an advanced course on scaling Ruby on Rails for large-scale projects, such as Twitter.
Made the courses available through the Hsoub Academy platform.

Accomplishments

Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning

deeplearning.ai Mar 2019

See certificate

Deep Learning Specialization

deeplearning.ai Jan 2019

See certificate

Machine Learning

Stanford University Aug 2018

See certificate

Systematic Program Design

The University of British Columbia Feb 2017

See certificate

CS50: Introduction to Computer Science

HarvardX Jan 2017

See certificate

Projects

Tafrigh

A user-friendly tool to convert YouTube videos into text, SRT, or VTT files using OpenAI’s Whisper or Facebook’s Wit.ai.

Taqtie

Intuitive audio and video editing GUI for content creators, enabling easy cutting and merging with a straightforward interface and minimal steps.

Islam200QA

Easy-to-use app that offers Muslims a straightforward book of beliefs in Q&A format, with a scholar providing a detailed explanation of each answer.

Sha3bor

Comprehensive suite of models for generating, diacritizing, and analyzing Arabic poetry using GPT2, BERT, and CANINE transformers. Secured third place in the Arabthon competition.

Shakkelha

Research project that advances Arabic NLP with a concise, efficient deep learning-based system for automatic diacritization of Arabic text.

KONTESTS

Unified web crawler that aggregates programming contests from multiple online judges and centralizes their schedules in a single platform.

codeforces2pdf

User-friendly tool for effortlessly extracting CodeForces contests and problems into accessible, well-formatted PDF files.

Publications

Ali Fadel, Husam Musleh, Ibraheem Tuffaha, Mahmoud Al-Ayyoub, Yaser Jararweh, Elhadj Benkhelifa, Paolo Rosso

December 2020 FIRE nlp

Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde

Authorship identification is essential to the detection of undesirable deception of others’ content misuse or exposing the owners of some anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task, named Authorship Identification of SOurce COde (AI-SOCO), is proposed with the focus on the identification of source code authors. The dataset consists of crawled source codes submitted by the top 1,000 human users with 100 correct C++ submissions or more from the CodeForces online judge platform. The participating systems are asked to predict the author of a given source code from the predefined list of code authors. In total, 60 teams registered on the task’s CodaLab page. Out of them, 14 teams submitted 94 runs. The results are surprisingly high with many teams and baselines breaking the 90% accuracy barrier. These systems used a wide range of models and techniques from pretrained word embeddings (especially, those that are tweaked to handle source code) to stylometric features.

Ali Fadel, Mahmoud Al-Ayyoub, Erik Cambria

December 2020 SemEval nlp

JUSTers at SemEval-2020 Task 4: Evaluating Transformer Models against Commonsense Validation and Explanation

In this paper, we describe our team’s (JUSTers) effort in the Commonsense Validation and Explanation (ComVE) task, which is part of SemEval2020. We evaluate five pre-trained Transformer-based language models with various sizes against the three proposed subtasks. For the first two subtasks, the best accuracy levels achieved by our models are 92.90% and 92.30%, respectively, placing our team in the 12th and 9th places, respectively. As for the last subtask, our models reach 16.10 BLEU score and 1.94 human evaluation score placing our team in the 5th and 3rd places according to these two metrics, respectively. The latter is only 0.16 away from the 1st place human evaluation score.

Ali Fadel, Ibraheem Tuffaha, Mahmoud Al-Ayyoub, Yaser Jararweh

October 2020 IDSTA nlp

QWERTY Keyboard? }.?BZQ is Better!

In this work, we provide a Genetic-based algorithm that is used to quickly find a placement for a set of objects within a given layout such that access to these objects is optimized. The given layout describes the free locations of the objects and the object handles and the access is done through a corpus of object requests. The proposed algorithm optimizes the placement of the objects by searching through a small fraction of the search space. As a case study, we use the algorithm to find a better placement for the keyboard characters than QWERTY and Dvorak Simplified characters placements. The algorithm finds a placement that is better than both QWERTY and Dvorak Simplified by 32.68% and 15.79% respectively on the training set, and 32.71% and 15.84% respectively on the testing set. This result is achieved after searching through only 500K possible solutions, which is about 1.23 × 10⁻¹⁹ percent only of the total search space. Both training and testing sets are extracted randomly from TED2013 v1.1 English corpus. Moreover, we release the dataset, code and experimental results on our GitHub repository.
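
The general idea of a genetic search over placements can be illustrated with a minimal sketch. This is not the paper's algorithm: the Manhattan-distance cost, the swap mutation, the elitist selection, and all names (`layout_cost`, `mutate`, `genetic_search`) are assumptions chosen for illustration only.

```python
import random


def layout_cost(placement, positions, corpus):
    """Toy cost (an assumption): total Manhattan distance travelled
    between the slots of consecutive corpus characters."""
    where = {ch: positions[i] for i, ch in enumerate(placement)}
    total = 0
    for a, b in zip(corpus, corpus[1:]):
        if a in where and b in where:  # skip characters with no slot
            (r1, c1), (r2, c2) = where[a], where[b]
            total += abs(r1 - r2) + abs(c1 - c2)
    return total


def mutate(placement, rng):
    """Swap two random slots so the child is still a valid permutation."""
    child = list(placement)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def genetic_search(chars, positions, corpus,
                   pop_size=30, elite=5, generations=200, seed=0):
    """Evolve placements of `chars` over `positions` to lower the toy cost."""
    rng = random.Random(seed)
    # Seed the population with the given placement plus random shuffles.
    population = [list(chars)] + [
        rng.sample(chars, len(chars)) for _ in range(pop_size - 1)
    ]

    def cost(p):
        return layout_cost(p, positions, corpus)

    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[:elite]  # elitism: the best placements always survive
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - elite)]
        population = survivors + children
    return min(population, key=cost)


if __name__ == "__main__":
    chars = list("abcdefgh")
    positions = [(r, c) for r in range(2) for c in range(4)]  # 2 x 4 toy grid
    corpus = "abcabcabc hgfhgf dead beef cafe"
    best = genetic_search(chars, positions, corpus)
    print("".join(best), layout_cost(best, positions, corpus))
```

Because the starting placement is part of the initial population and the best candidates are carried over unchanged each generation, the returned placement never costs more than the starting one under this toy cost.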

Ali Fadel, Ibraheem Tuffaha, Mahmoud Al-Ayyoub

December 2019 NSURL nlp

Tha3aroon at NSURL-2019 Task 8: Semantic Question Similarity in Arabic

In this paper, we describe our team’s effort on the semantic text question similarity task of NSURL 2019. Our top performing system utilizes several innovative data augmentation techniques to enlarge the training data. Then, it takes ELMo pre-trained contextual embeddings of the data and feeds them into an ON-LSTM network with self-attention. This results in sequence representation vectors that are used to predict the relation between the question pairs. The model is ranked in the 1st place with 96.499 F1-score (same as the second place F1-score) and the 2nd place with 94.848 F1-score (differs by 1.076 F1-score from the first place) on the public and private leaderboards, respectively.

Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, Mahmoud Al-Ayyoub

November 2019 WAT nlp

Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF) and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset and the results show that our models are either better or on par with other models, which require language-dependent post-processing steps, unlike ours. Moreover, we show that diacritics in Arabic can be used to enhance the models of NLP tasks such as Machine Translation (MT) by proposing the Translation over Diacritization (ToD) approach.
