Arabic Resources

Arabic NLP Resources for the Arabic WordNet Project

William BLACK, Sabri ELKATEB

School of Informatics, University of Manchester

Sackville Street, Manchester, M60 1QD

w.black@manchester.ac.uk, sabri.elkateb@manchester.ac.uk

Manuel BERTRAN, Xavier FARRERES, David FARWELL, Reda HALKOUM,
Horacio RODRIGUEZ

Polytechnical University of Catalonia

horacio@lsi.upc.edu, mbertran@lsi.upc.edu

Musa ALKHALIFA, Tànit ASSAF, M.A. MARTI

University of Barcelona

Gran Via 585, 08007-Barcelona

{musa,tanit}@thera-clic.com, amarti@ub.edu

Piek VOSSEN

Irion Technologies

Delftechpark 26, 2628XH, Delft, The Netherlands

piek.vossen@irion.nl

Adam PEASE

Articulate Software Inc,

420 College Ave

Angwin, CA 94508

apease@articulatesoftware.com

Christiane FELLBAUM

Princeton University, Department of Psychology,

Green Hall, Princeton, NJ 08544

fellbaum@clarity.princeton.edu


1. INTRODUCTION

This report is intended to be a guide to resources (both linguistic
data and linguistic processors and tools) that have been used (or at
least tried), or simply considered for use, during the development of
AWN.

Our intention is to maintain an evolving document for the duration of
the project, where new resources and new comments or assessments on
previous items can be added on the fly. Thus, this initial version
0 will be followed (we hope) by other increasingly useful versions.

The report is not intended to be a complete survey of Arabic NLP
resources and tools. We have focused on resources related to the
needs of AWN and on free resources.

For more in-depth information on Arabic NLP resources, besides the
content of this report and the links included in it, the following
references could be useful:

  • CADIM
    (Columbia Arabic Dialect Modeling):

http://www.ccls.columbia.edu/cadim/links.html

  • NEMLAR
    (Network for Euro-Mediterranean LAnguage Resources):

http://www.nemlar.org/

  • Linguistlist,
    Resources from Linguistlist on Arabic:

http://cf.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/search/search-all-res2.cfm?res=All&AppLanguageId=43&search1=search1

Non
Arabic-specific resource repositories (but including valuable Arabic
resources and tools) can be found in:

  • LDC
    (Linguistic Data Consortium):


Arabic Gigaword

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T12


Arabic Gigaword Second Edition:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02

  • ELSNET
    (European Network of Excellence in Human Language Technologies):

http://www.elsnet.org/

 

  • ELDA
    (Evaluation and Language resources Distribution):


An-Nahar Newspaper Text Corpus

http://www.elda.org/catalogue/en/text/W0027.html


DixAF (Bilingual Dictionary French Arabic, Arabic French)


http://www.elda.org/catalogue/en/text/M0040.html

– Arabic
Data Set

http://www.elda.org/catalogue/en/text/W0030.html

– “Le
Monde Diplomatique” Text corpus in Arabic

http://www.elda.org/catalogue/en/text/W0030.html

  • SIGLEX,
    a Special Interest Group on the Lexicon of the ACL (Association
    for Computational Linguistics):

http://www.siglex.org/

Additional
useful information and useful links can be found on the Web pages of
a number of people or institutions:

http://www-nlp.stanford.edu/links/statnlp.html:
An annotated list of resources for statistical NLP and corpus-based
computational linguistics

http://www.siglex.org/:
A Special Interest Group on the Lexicon of ACL (Association for
Computational Linguistics)


http://sites.univ-lyon2.fr/langues_promodiinar/Accueil.htm


http://elsap1.unicaen.fr/

 

A useful recent survey (very extensive, but mainly focussing on
commercial products) is:

Mahtab Nikkhou, Khalid Choukri (2005), Survey of Arabic Language
Resources and Tools in the Mediterranean Countries. NEMLAR Report,
March 2005.

For the sake of completeness, a lightly annotated bibliography of
Arabic NLP is included.


2. OPEN DOMAIN LEXICAL RESOURCES

 

2.1
Arabic Monolingual Corpora.

 

  • A reference corpus of Arabic, to be used for estimating the
    relevance of terms and roots (we say “reference” only as regards
    our project; we do not mean that the required corpus should be a
    true reference corpus, which for Arabic is probably nonexistent).


Several corpora are available from LDC.


This corpus must be pre-processed in order to use it for probability
estimation. To this end normalization and light stemming should be
sufficient (see available tools for this purpose below).
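The pre-processing mentioned above can be sketched as follows. This is only a minimal illustration of normalization followed by light stemming, not the behaviour of any of the tools listed in this report: the affix lists, the `min_len` threshold and the function names are assumptions chosen for the example.

```python
import re

# Arabic diacritics (tanween, short vowels, shadda, sukun) plus tatweel.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Illustrative affix lists (an assumption); real light stemmers use
# their own, carefully tuned lists and removal conditions.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def normalize(word: str) -> str:
    """Strip diacritics and unify common orthographic variants."""
    word = DIACRITICS.sub("", word)
    word = re.sub("[أإآ]", "ا", word)   # hamza-carrying alifs -> bare alif
    word = word.replace("ى", "ي")       # alif maqsura -> ya
    return word


def light_stem(word: str, min_len: int = 3) -> str:
    """Remove at most one prefix and one suffix, keeping >= min_len letters."""
    word = normalize(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والمعلمون", "إسلام", "ولد"]:
        print(w, "->", light_stem(w))
```

Counting the stems produced by `light_stem` over a whole corpus then gives the frequency estimates needed for judging the relevance of terms.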


Nijmegen Corpus:
http://www.let.kun.nl/wba/Content2/1.4.5_Nijmegen_Corpus.htm


News articles


Arabic Corpus

2.2
Arabic/English/… Parallel Corpora.

 

  • UN
    Bidirectional Multilingual (English, French, Arabic, Russian,
    Chinese)

http://157.150.97.21/dgaacs/unterm.nsf

http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm

 

  • ArabiCorpus

http://arabiCorpus.byu.edu

ArabiCorpus ‘is being designed to allow students and scholars to
search large untagged Arabic corpora for words and structures. It
provides information on word frequency, citations giving 10 words
before and 10 words after, and information on collocates of the word
in question’.

  • Aljazirah

http://english.aljazeera.net/HomePage

  • The
    Algerian Press Agency

www.aps.dz

New webpage with interesting parallel articles.

  • UNESCO

http://termweb.unesco.org/Default.asp?admin=1&internet=1

  • FAO

http://www.fao.org

 

  • ALECSO
    Turjman Online (with ontology)

http://www.arabization.org.ma/dictionnaire.asp

 

  • MICROSOFT

ftp://ftp.microsoft.com/developr/msdn/newup/Glossary
(27 MB of computer science glossary)

 

  • Hebrew-Arabic-English
    from the Agava Institute


Environmental Terms


A Trilingual Glossary by Yaron Batit.


  • EGYPT
    GIZA toolkit, Quran parallel corpus (En-Ar)

http://www.clsp.jhu.edu/ws99/projects/mt/

http://enlil.ff.cuni.cz/veda/projekty/clara.htm

2.3
Arabic Monolingual Dictionaries and Lexicons

 

 

  • List
    of dictionaries (in Arabic) from Dar El Ilm includes monolingual,
    bilingual and multilingual dictionaries:

http://www.malayin.com/laut.asp?catid=2

  • The best Arabic-Arabic dictionaries, from Lisan Al-Arab ($150) to
    Taj al-Arus ($265):


http://fadakbooks.com/ardia.html

  • We are putting the copious dictionary of Ibn Mandhur (8 volumes)
    on line. To get an idea of the quality of its content, look
    here:


http://dictionary.sakhr.com/


http://qamoos.sakhr.com/intro/introles.asp?lex_id=6
( In Arabic)

  • At the moment we have finished the letter Kaf, which consists of
    412 masdars (roots) and almost 400 pages in Word format. It is
    available on line at:


http://www.lsi.upc.es/~halkoum/aralisan.php

  • TANMART:



Vegetables
http://www.tammar.4t.com/vegta.htm



Plants
http://www.tammar.4t.com/herb.htm




Fruits
http://www.tammar.4t.com/fruit.htm

  • Traditional
    Medicine


http://www.khayma.com/roqia/nabaway.HTM
(5 old books online)


2.4
Arabic/English Bilingual dictionaries and lexicons

 

2.4.1
Printed bilingual dictionaries

 

From the large
quantity of dictionaries that are available, the most relevant
sources for this section are:

 

  • List
    of dictionaries (in Arabic) from Dar El Ilm includes monolingual,
    bilingual and multilingual dictionaries:

http://www.malayin.com/laut.asp?catid=2

The
list includes the popular Al Mawrid English-Arabic Dictionary and Al
Mawrid Arabic-English Dictionary (printed version with CD-ROM).

  • Jean-Jacques Schmidt, Dictionnaire français-arabe, arabe-français.
    Publishing House Dauphin, 1998. This book has been the reference
    work for many French and Canadian researchers. It is an old
    dictionary and has appeared in more than one edition.

http://www.bibliomonde.com/pages/fiche-auteur.php3?id_auteur=1508


Dictionnaire Larousse Saturne arabe – français / français – arabe.
Publishing House Larousse: 150,000 words and phrases

  • Van Mol, M. and Berghman, K. (2001), Leerwoordenboek Nederlands
    Modern Arabisch

The Dutch Language Union, Amsterdam

This MSA-Dutch dictionary is based on a corpus of 3,000,000 words.
Mark Van Mol compiled the lexical database and he may have an
electronic version of this dictionary. He is the director of the
Leuven Group (Belgium) and has many publications in ANLP.

 

http://www.kuleuven.ac.be/ilt/arabic/index_en.htm

mark.vanmol at ilt.kuleuven.ac.be

  • Hans
    Wehr, Arabic-English Dictionary: The Hans Wehr Dictionary of
    Modern Written Arabic

 

For many this is one of the most important dictionaries, and for
many years it was perhaps the only one in use. It has been cited by
numerous English-language authors.

It can be found at:

 

www.spokenlanguage.com
or

http://www.amazon.com/gp/reader/0879500034/ref=sib_dp_pt/103-9733622-2675046

We consider this dictionary to be necessary for our needs.

  • The
    Nijmegen Arabic/Dutch Dictionary

www.let.kun.nl

 

This
dictionary must be considered to be very important and useful.

2.4.2.
On-line MRD (Machine Readable Dictionaries)

 

In this section we
deal with a large quantity of information that is continuously
changing and being updated.

 

  • Basic
    Arabic Spanish Dictionary, Consejería de Educación y
    Ciencia Castilla-La Mancha

http://www.jccm.es/educacion/atenc_div/diccionario_arabe/

 

  • Edward William
    Lane’s Arabic-English Dictionary:

 

www.StudyQuran.co.uk

 

A complete version is available for free on line (even though
www.aramedia.com is selling it for more than $450).

 

This is the largest lexicon available, comprising 8 volumes (about
3200 pages). The dictionary’s author spent over 30 years compiling it.

 

As
with any Arabic dictionary it is organized by roots, and it is also
available on line.

 

Although this
dictionary was expected to be finished months ago, it was only
available as of December 2005. Because they expected heavy online
traffic they announced that the links sometimes would not be working
properly.

 

  • CRL New Mexico
    Arabic English Dictionary

 

It contains 122,920
entries in XML format including Arabic proper names and it is
organized as follows:

Arabic
word # Part of Speech # English word
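As an illustration of the field layout above, the following sketch splits one record of the form "Arabic word # Part of Speech # English word" into its three fields. It does not read the actual XML distribution, and the sample line and the tag "noun" are hypothetical values made up for the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    arabic: str
    pos: str
    english: str


def parse_entry(line: str) -> Entry:
    """Split 'Arabic word # Part of Speech # English word' into fields."""
    parts = [part.strip() for part in line.split("#")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '#'-separated fields, got {len(parts)}")
    return Entry(*parts)


if __name__ == "__main__":
    # Hypothetical sample record, not taken from the dictionary itself.
    print(parse_entry("كتاب # noun # book"))
```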

 

  • Links
    to several resources in Arabic:

http://crl.nmsu.edu/~ahmed/

 

http://crl.nmsu.edu/Resources/lang_res/arabic.html

 

 

  • Mesiti Dictionary

www.mesiti.it/arabic/search_dict.asp

 

This dictionary has been created by a group of teachers from Italy.
It allows the user to enter Arabic or English words, although the
search is done using roots. Plurals and feminine forms of adjectives
and nouns are also provided when necessary using a manify function
(Javascript).

 

  • English-Arabic
    dictionary from Germany

www.arabsun.de

 

It also allows the user to enter English and Arabic words using an
Arabic keyboard.

 

  • Basic
    illustrated dictionary Arabic-Spanish:

http://www.jccm.es/educacion/atenc_div/diccionario_arabe

  • (Flash).
    Child illustrated dictionary Arabic-Catalan:

http://www.edu365.com/agora/dic/catala_arab/

  • Arabic(Algerian)-English
    Lexicon.
  • Transcription only.
  • word
    (Arabic) – explanation (English)

 

 

  • Multilingual
    dictionary. (Mono/directional $590. Bidirectional $990).
  • Dictionaries exist
    in two versions:
  • Stand
    alone version
  • Client-server
    version
  • There
    are 4 types of dictionaries:

 

  • General
    dictionary contains approximately 300,000 words and phrasal verbs
    in common usage.
  • Specific
    dictionary contains words used by specialists and experts in a
    selected subject area.
  • Idioms
    dictionary contains fixed expressions and phrasal verbs.
  • User
    dictionary contains words added or updated by the user.
  • Main
    features :
  • Able
    to find all inflected word forms
  • Search
    of phrasal verbs
  • Suitable
    for accessing Internet pages
  • Easily
    integrated with multimedia applications
  • Multilingual
    user interface (English, French, Arabic…)

 

– Bidirectional
dictionary. ($49.9). Free sample.


English <-> Arabic lexicon ($159).

  • English/Arabic/Spanish
    dictionary from UB.
  • 83,000
    entries.
  • It
    appears to be extracted from the following source
  • English-Arabic
    word list (viewable with Explorer under Windows)
  • Structure: <Arabic
    word> <English word>
  • There
    are several explicit variants of the same word (ex: access,
    accessibility, accessing…)

– French <->Arabic
dictionary (800 words).

  • Dutch
    <-> Arabic dictionary.
  • Information about
    the compilation of the dictionary.

 

  • The webpage http://www.arabsun.de/dictionary.php includes
    English-Arabic (and German) word pairs and is very useful. If you
    search for a letter or a word you will get a long list of words
    and phrases containing that letter/word in English and Arabic in
    different domains. Its content can be extracted.
  • IT Dictionary (AR<->EN): This includes an alphabetical list in
    English and Arabic of around 1613 terms, with a special field for
    a gloss/explanation of the terms.
  • QaMoose Dictionary (AR<->EN): You can download a recent stable
    version, QaMoose v-2.1, from Arabeyes.org, the Arabic Unix
    Project.
  • Sakhr
    English-Arabic dictionary:

http://qamoos.sakhr.com/intro/mgz01.asp

This has information about the
Sakhr English-Arabic dictionary and useful information on Arabic
grammar and Arabic language technology in general.

  • UN

http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm

 

2.5
Lexicons obtained from (selective) access to online MT systems:

 

  • Almisbar (online MT)

http://www.almisbar.com/salam_trans.html

  • Tarjim

http://tarjim.ajeeb.com/ajeeb/elogin_ET.asp
(registration required)

  • Ajeeb

http://ajeeb.sakhr.com/
[only links to different sites related to Sakhr and its Arabic
solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.]

http://www.sakhr.com/Sakhr_e/Products/Idrisi.htm?Index=2&Main=Products&Sub=Idrisi

  • Use this link (with a robot, for example) and change the last
    word to maintain the free access. For example, to look up the
    translation of “car”:


http://qamoos.sakhr.com/idrisidic_1.asp?Sentence=car
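A robot of this kind only needs to substitute the query word at the end of the link. The sketch below builds such query URLs for a list of words; it deliberately does not fetch anything. The word list and the function name are assumptions for illustration, and whether the service still answers such requests (and under which terms of use) would have to be checked before running a real robot.

```python
from urllib.parse import urlencode

# Base address taken from the link above; only the query word changes.
BASE_URL = "http://qamoos.sakhr.com/idrisidic_1.asp"


def lookup_url(word: str) -> str:
    """Return the lookup URL for one English word or phrase."""
    return f"{BASE_URL}?{urlencode({'Sentence': word})}"


if __name__ == "__main__":
    # Hypothetical word list; a real run would read it from a file.
    for word in ["car", "house", "ice cream"]:
        print(lookup_url(word))
```

`urlencode` takes care of escaping spaces and non-ASCII characters, so multi-word expressions and Arabic words can be passed in the same way.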

2.6
Stopwords

Some
papers and tools related in a way to Arabic stopwords:

  • The
    Lemur Toolkit for Language Modeling and Information Retrieval

http://www.lemurproject.org/

‘The toolkit supports indexing of large-scale text databases, the
construction of simple language models for documents, queries, or
subcollections, and the implementation of retrieval systems based on
language models as well as a variety of other retrieval models. The
system is written in the C and C++ languages, and is designed as a
research system to run under Unix operating systems, although it can
also run under Windows’.

 

 

 

2.7
Gazetteers

  • World Map (AR-EN): In addition to the World Map in Arabic and
    English, this website has lists of Arabic names, important figures
    and places (useful for named entities).
  • http://www.geonames.de/indcou.html:
    This is a list of countries, rivers, mountains, oceans, seas,
    international organizations, languages, days, months, seasons,
    religions, wonders, etc. in many languages including English
    and Arabic. It is very useful for NER (Named Entity Recognition).

2.8
Online newspapers

Title / Web page / Country

Ech-Chaab http://www.ech-chaab.com/ Algeria
El Khabar http://www.elkhabar.com/accueil/ Algeria
El Moudjahid http://www.elmoudjahid-dz.com/ Algeria
El Rai http://www.elrai.com/ Algeria
El Watan http://www.elwatan.com/ Algeria
El Youm http://www.el-youm.com/ Algeria

Al Moharer http://al-moharer.freeservers.com/ Australia
Al Ayam http://www.alayam.com/ Bahrain
Al Ahram http://www.ahram.org.eg/ Egypt
Al Shaab http://www.alshaab.com/ Egypt
Asharq Al-Awsat http://www.asharqalawsat.com/ England

Al Ahali http://www.ahali-iraq.com/ Iraq
Al-Efyaa Iraq
Alquds Daily Newspaper http://www.alquds.com/ Israel
Arabynet (Yediot Achronot) www.arabynet.com Israel
Ad Dustour http://www.addustour.com/Default/Default.asp Jordan
Al Arab Al Yawm http://www.alarabalyawm.net/ Jordan
Al Rai http://www.alrai.com/ Jordan
Albawa http://www.albawa.com/ Jordan
Al Qabas http://www.alqabas.com.kw/ Kuwait
Al Rai Al Aam http://www.alqabas.com.kw/ Kuwait
Al Seyassah http://www.alarabalyawm.net/ Kuwait
Al Watan http://www.alwatan.com.kw/ Kuwait
Al Anwar http://www.alanwar.com/ar/ Lebanon
Al Liwaa http://www.aliwaa.com/ Lebanon
Al Mustaqbal http://www.almustaqbal.com/ Lebanon
Annahar http://www.annaharonline.com/ Lebanon
An-Nahar http://www.annahar.com.lb/ Lebanon
As-Safir http://assafir.com/iso/today/front/summary.html Lebanon
Al Fajr al Jadid http://www.alfajraljadeed.com Libya
Al Jamahiriyah http://www.aljamahiria.com/ Libya
Al Shames http://www.alshames.com/ Libya
Libyan Press http://www.libyanpress.com/ Libya
Al Watan http://www.alwatan.com/ Oman
Oman Arabia Daily http://www.omandaily.com/ Oman
Palestine
Al Ayyam Palestine
Al hayat Al Jadedah http://www.alhayat-j.com/ Palestine
Al Manar http://www.manar.com/ Palestine
Al Quds http://www.alquds.com/ Palestine
Al Sabar http://www.hanitzotz.com/alsabar/ Palestine
Fasl Al-Maqal http://www.fasl-almaqal.com/ Qatar
Al Watan http://www.al-watan.com/ Qatar
Al-Sharq http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 Qatar
Raya http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 Qatar
Akhbar http://www.elakhbar.org.eg/ Saudi Arabia
Al Hayat http://www.alhayat.com/ Saudi Arabia
Al Itidal http://www.ynh.com/al-itidal/ Saudi Arabia
Al Jazirah http://www.ynh.com/al-itidal/ Saudi Arabia
Al Madinah http://www.almadinah.com/ Saudi Arabia
Al Riyadth Saudi Arabia
Al-Aalam Al-Islami http://www.muslimworldleague.org/paper/1875/index.htm Saudi Arabia
Asharq Al-Awsat http://www.asharqalawsat.com/ Saudi Arabia
Okaz http://www.okaz.com.sa/okaz/
Adaraweesh http://members.tripod.com/~adaraweesh/ Sudan
Al Rayaam http://www.rayaam.net/ Sudan
Alosbua Sudanese Daily http://alosbua.com/alosbua/ Sudan
Alray Alaa’m http://www.rayaam.net/ Sudan
Alwaan http://www.qatartop.com/ Sudan
Al Baath http://www.albaath.news.sy/epublisher/user/ Syria
Al Bawaba http://www.albawaba.com/ar/countries/Syria/ Syria
Al Furat http://furat.alwehda.gov.sy/ Syria
Al Jamahir http://jamahir.alwehda.gov.sy/ Syria
Al Maukef Al Riadi http://riadi.alwehda.gov.sy/_View_news2.asp?FileName=2748410720060206235922 Syria
Al Ouruba http://ouruba.alwehda.gov.sy/ Syria
Al Thawra http://www.thawra.com/ Syria
Al Wahda http://www.thawra.com/data/wehda/ Syria
Teshreen http://tishreen.info/ Syria
Assabah Tunisia
Essahafa http://www.tunisie.com/LaPresse/ Tunisia
Akbar Al Arab http://www.akhbaralarab.co.ae/ United Arab Emirates
Al Bayan http://www.albayan.ae/servlet/Satellite?pagename=Bayan/Page/BayanPage&c=Page&cid=1039065325549 United Arab Emirates
Al Ittihad http://www.alittihad.co.ae/ United Arab Emirates
Al Khaleej http://www.alkhaleej.co.ae/ United Arab Emirates
26 September http://www.26september.com/ Yemen
Al Mathak http://www.gpc.org.ye/mathak.htm Yemen
Al Thagafiah http://www.y.net.ye/althaqafiah/ Yemen
Al-Ayyam http://www.al-ayyam.info/ Yemen
Al-Gumhuryah http://www.y.net.ye/al-gumhuryah/ Yemen
Al-Sahwa http://www.alsahwa-yemen.net/ Yemen
Al-Shoura http://www.y.net.ye/shoura/ Yemen
Al-Thawrah http://www.althawra.gov.ye/ Yemen
Al-Wahdawi http://www.alwahdawi.net/ Yemen
Attariq Yemen
Naba Al Hakekah http://www.y.net.ye/naba/ Yemen
Ray http://www.ray-yem.com/ Yemen

2.9
On line Press Agencies

  • Algérie Presse Service (ALGERIA)

www.aps.dz // Fr En Ar

  • Agence France-Presse (FRANCE)

http://www.afp.fr/arabic/home/ // Fr En Ar Sp

  • Maghreb Press Agency (MOROCCO)

http://www.map.ma/ar // Fr En Ar Sp

  • Xinhuanet Agency (CHINA)

http://www.arabic.xinhuanet.com/arabic/index.htm // En Ar

  • BBC (UK)

http://news.bbc.co.uk/hi/arabic/news/ // En Ar

The agencies listed do not publish parallel articles, except the
Chinese and Algerian agencies.

2.10
List of verbs

 

  • A
    database of 955 Arabic verbs containing their full conjugations is
    available from LOGOS.

http://www.verba.org/verbi_utf8/all_verbs_index_ar.html

The database includes, for each verb, the vowelized forms of:

      • stem
      • verbal noun
      • full conjugation in active and passive voice and perfect and
        imperfect tenses
      • imperative, conditional, jussive

620
verbs

2.11
List of roots

A list of triliteral and quadriliteral roots organized in Arabic
alphabetical order, compiled by Tim Buckwalter but not available on
his web page (www.qamus.org).

We found it at:
www.angelfire.com/tx4/lisan/roots1.htm

  • Openburhan:


http://www.openburhan.com/ob_main_frame.html

We will use this list to generate a corpus automatically. Instead of
extracting the root from each word, we go in the opposite direction:
starting from the roots and the various pattern forms, we
reconstitute a lexicon.
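The root-to-lexicon direction described here can be sketched as follows. This is only a toy illustration under stated assumptions: patterns are written with the digits 1, 2 and 3 standing for the first, second and third root consonant, only triliteral roots are handled, and the four patterns are examples chosen for the sketch. Such a procedure overgenerates, so the candidate forms would still have to be filtered against a corpus or a dictionary.

```python
# Digits mark the slots for root consonants; other characters are copied.
# The pattern inventory below is an illustrative assumption.
PATTERNS = [
    "1ا23",   # e.g. active participle
    "م12و3",  # e.g. passive participle
    "م123",   # e.g. noun of place
    "12ا3",   # e.g. a common noun pattern
]


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the slots 1, 2, 3 of a pattern with the consonants of a root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


def generate_lexicon(roots, patterns=PATTERNS):
    """Map each root to the list of candidate forms it generates."""
    return {root: [apply_pattern(root, p) for p in patterns] for root in roots}


if __name__ == "__main__":
    for root, forms in generate_lexicon(["كتب", "درس"]).items():
        print(root, "->", " ".join(forms))
```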

2.12
Electronic Books

Here we can find a vast collection of free Arabic books.

http://www.almeshkat.net/books/index.php (2282 books)

http://www.al-eman.com/Islamlib/

http://tafsir.org/books/menu.php?action=new


3. DOMAIN RESTRICTED LEXICAL RESOURCES

 

UNTERM, United Nations Terminology Database (AR-EN-ES-FR-RU-ZH):

This has 70,000 entries in the 6 official languages, and its content
can be extracted because the queries result in long lists of words in
English and Arabic. It covers over 80 different domains:

COUNTRY
NAME, AIDS, agriculture, atmospheric science, biodiversity,
bioscience, budget and management, cartography and geography, child
welfare, climate change, codes and regulations, communication, core
concept, culture, declarations, demographics, development,
disarmament, disasters, discrimination, documents, economics,
education and training, energy, environment, export controls and
sanctions, finance, fisheries, food, forestry, functional and other
titles, geoscience, governance, Greek, habitat, health and medicine,
human rights, humanitarian issues, indigenous peoples, information
technology, intellectual property, international law, international
relations, international trade, labour, landmines and mine action,
Latin, law enforcement, law of the sea, logistics and supplies,
meetings, migrations and refugees, military abbreviations, military
issues, multilateral instruments, narcotic drugs, national law,
natural resources, nuclear science, oceanography, organizational
structure, peace and security, peace operations, plans of action and
initiatives, political life, poverty, religions, science and
technology, set phrases, small arms, social issues, space, staff
matters, statistics, TALOS, terrorism, transport and communications,
water, weapons of mass destruction, women.

UNESCOTERM Search (AR-DE-EN-ES-FR-RU-ZH):

This can be used as a reference and its content can be extracted. It
includes terms related to UNESCO (administrative and financial
terms, education, conferences and meetings, etc.):

UNESCO Structures,
Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks,
Systems, Foundations), IOC: Titles, Terms and Acronyms ,
Administrative and Financial Terms , International (Days, Weeks,
Years and Decades), Campaigns and Appeals, UNESCO’s Member States,
UNESCO’s Standard-Setting Instruments, International Prizes,
(Non-Member States, Non-Self-governing Territories, Dependent
Territories etc.), UNESCO Chairs, Miscellaneous, UN and International
Legal Instruments, UNESCO Functions and Titles, (Conferences,
Meetings etc.), Terms in the field of Education, (UNESCO’s
Programmes, Projects, Initiatives), (International Programmes,
Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks,
Systems, Foundations)

3.1
Medical domain

This has the
Unified Medical Dictionary (UMD) from the World Health Organization
along with its specialized UMD dictionaries which cover more than
70 domains. Entries are arranged by alphabetical order in every
domain and one can see all the English entries with their Arabic
equivalents page by page. All medical terms were approved by the
Arab Academies in Cairo, Damascus, Baghdad and Amman. They also made
sure that the Arabic terms were selected carefully in accordance with
a very strict, clear, simplified and user-friendly methodology. An
electronic version of this edition is available on CD-ROM in a
Windows environment, and comprises about 150 000 terms.

A copy of this
CD-ROM is available from khayat@emro.who.int

The domains include (numbers refer to the number of entries in the
domains sampled):

All specialised UMD dictionaries:

Abbreviations (799 entries); Acidology (1669); Acronyms (248);
Anatomy (2000); Anesthesiology (484); Anthropology and
anthropometrics (1427); Bacteriology (1827); Biochemistry and
Chemistry (2000); Biology; Biomedical engineering; Biomedical ethics;
Biostatistics; Blood transfusion medicine; Botany; Cardiology and
cardiovascular surgery; Cell biology; Demography; Dentistry (2000);
Dermatology; Diagnostics (symptoms & signs); Embryology &
teratology; Emergency medicine; Endocrinology & metabolism;
Entomology; Environmental health; Enzymology and Zymology; Family
and community medicine; Food safety; Forensic medicine;
Gastroenterology; Genitourinary medicine, venereology and STDs;
Health services; Helminthology; Hematology; Histology; Hospital
administration; Immunology; Infectious diseases; Informatics;
Laboratory medicine; Maternal and child health; Measures;
Microbiology; Mycology; Nephrology; Neurology; Nutrition and
dietetics; Obstetrics and gynecology; Occupational medicine,
industrial medicine; Oncology; Ophthalmology and optics;
Orthopedics; Otorhinolaryngology; Parasitology; Pathology;
Pediatrics; Pharmacology and therapeutics; Physiatrics and physical
medicine; Physiology; Prefixes; Preventive medicine; Public health,
community medicine and hygiene; Reproductive health; Sexology;
Suffixes; Surgery; Taxonomy, nosology and classification (1118);
Toxicology; Transplantation; Tropical medicine; Virology; WHO
managerial terms (2000); Zoology (997).


3.2
Agriculture and related domains

  • AGROVOC is a
    multilingual structured and controlled vocabulary designed to cover
    the terminology of all subject fields in agriculture, forestry,
    fisheries, food and related domains (e.g. environment).
    At present AGROVOC contains more
    than 16,700 descriptors and more than 10,900 non-descriptors
    (synonyms).

You
can download a copy of AGROVOC from

http://www.fao.org/aims/ag_download.htm

Each descriptor has its
equivalent in other languages. Descriptors are indexing terms which
consist of one or more words representing one and the same concept.
Non-descriptors are terms which help the user to find the appropriate
descriptor(s). Non-descriptors are followed by a reference (USE
operator) to the descriptor, which is the preferred term. For
indexing purposes, it is important that only descriptor terms are
used.
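The relation between descriptors and non-descriptors can be pictured with a small data structure. The sketch below is not the AGROVOC schema or its web service: the two dictionaries and their entries are invented toy data, used only to show how a USE reference leads from a non-descriptor to the preferred term that should be used for indexing.

```python
# Toy data (an assumption, not extracted from AGROVOC).
# Each descriptor carries its equivalents in other languages.
DESCRIPTORS = {
    "maize": {"fr": "maïs", "es": "maíz"},
}

# Non-descriptors point to their descriptor through a USE reference.
USE = {
    "corn": "maize",
}


def preferred_term(term: str):
    """Return the descriptor to index with, or None if the term is unknown."""
    if term in DESCRIPTORS:
        return term
    return USE.get(term)


if __name__ == "__main__":
    for term in ["corn", "maize", "unknown term"]:
        print(term, "->", preferred_term(term))
```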

  • AGROVOC is available in 9
    languages: the five FAO official languages (which are English,
    French, Spanish, Chinese and Arabic), Czech, Portuguese, Japanese
    and Thai. Other languages like German, Italian, Korean, Hungarian,
    Slovak and Lao are currently being prepared.

It is stated
clearly in their website that AGROVOC is free of charge for
educational or other strictly non-commercial purposes.


AGROVOC
is available for downloading in MySQL, TagText, ISO2709 and Microsoft
Access formats. To download the AGROVOC database for off-line use,
please send your request to fao-agris-caris@fao.org. When sending the
request please specify the following: Full Name, Email, Organisation,
Reason for downloading AGROVOC, Comments. AGROVOC is also available
through web services. More information available here:
http://www.fao.org/aims/ag_webservices.jsp

3.3
Psychology

 

 

 

3.4
Hydrology

  • International Glossary of
    Hydrology (1418 entries): This is a multilingual resource that
    includes Arabic and English (to view Arabic characters choose
    Unicode UTF-8).

http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1

 

 

  • Lexique Hydrologique pour l’Ingénieur (English, Arabic, French,
    Romanian)

http://www.cemagref.fr
(2000 entries, PDF format)

3.5
Urbanism

Habitat and Urbanism Glossary (AR-EN-FR): This has 3850
Arabic-English-French entries in PDF.

3.6
Chemistry

Elementymology & Elements Multidict (MULTI): This is a multilingual
dictionary of the names of chemical elements in many languages. There
are alphabetical and numerical lists. Clicking on the name of an
element brings the element information page up in the main window. It
can be used as a reference.

3.7
Zoology

Zoology Dictionary (EN>AR): This has 2500 terms in alphabetical
order.

3.8
Mathematics

Glencoe Online: This is a multilingual Mathematics Glossary
(AR-EN-ES-KO-RU-UR-VI-ZH) in PDF files, in the form of an
alphabetical list with glosses.

3.9
Islamic terms

 


3.10
Finance and Banking

 

 

 

3.11
Botany

 

  • Spices
    http://stephkup.nexenservices.com/epices/affichage/liste.htm

4.
OTHER LINGUISTIC RESOURCES

 

4.1
Arabic Conjugators

This allows the online generation of individual verb forms (from I to
X) for Arabic verbs with given tri-consonantal roots (entered in
Arabic letters).

  • Arabic word
    form generator. Rudolf W. Meijer

This is more complete than the above. The Arabic root is entered in
Latin characters and the tool works off line. It has been downloaded
and it works under Windows.

  • Jerzy Łacina, Poland (MS-DOS programs):
    Muhallil, a simple analyzer of Arabic verbs;
    Musarrif, a simple generator of Arabic verbs.

 

http://www.staff.amu.edu.pl/~lacina/page4.html

http://www.verba.org/verbi_utf8/all_verbs_index_ar.html

  • Conjugation
    of Arabic verbs

www.freshmeat.net

 

  • fa.ala: This is a tool that
    conjugates Arabic verbs

 

  • Morfix Arabic Search: This is a multilingual search engine using
    Arabic morphology and cross-language search.

www.morfix.il

An interesting and useful online tool. Arabic Morfix has extensive
morphological search capabilities and is a standalone search engine.
This tool is a demonstration based on a collection of 200 articles
containing general news items from various sources. Its search takes
into account the following features: context sensitivity, expanded
morphological search, thesaurus search and the entering of queries in
Latin transcription for Arabic names.

  • Off line Conjugator

www.geocities.com/effel_dahling

  • aConCorde: a concordance program for Arabic by Andrew Roberts. A
    multilingual tool for processing a corpus. It has been downloaded
    and tested, and it works.

www.comp.leeds.co.uk

www.freshmeat.net

The tools called concordancers
have as main tasks searching, sorting and classifying words and they
are a real help in which concerns the manipulation of corpus.
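The searching and sorting that a concordancer performs can be sketched in a few lines. The example below is only an illustration and is not based on aConCorde: it assumes whitespace-separated, unvocalized text, and the names `build_concordance`, `by_frequency` and `SAMPLE` are hypothetical. It builds a keyword-in-context index (each word with its left and right neighbours) and lists the words by frequency.

```python
from collections import defaultdict


def build_concordance(text: str, width: int = 2) -> dict:
    """Map each token to a list of (left context, right context) pairs."""
    tokens = text.split()  # assumption: tokens are separated by whitespace
    index = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index[token].append((left, right))
    return dict(index)


def by_frequency(index: dict) -> list:
    """Return the indexed words, most frequent first (ties sorted by code point)."""
    return sorted(index, key=lambda word: (-len(index[word]), word))


# Small made-up sample sentence, used only to exercise the functions.
SAMPLE = "كتب الولد الدرس ثم قرأ الولد الكتاب"

if __name__ == "__main__":
    concordance = build_concordance(SAMPLE, width=1)
    for word in by_frequency(concordance):
        for left, right in concordance[word]:
            print(f"{left} [{word}] {right}")
```

A real concordancer would add proper tokenization (punctuation, clitics, diacritics) and sorting on the left or right context, which this sketch leaves out.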

4.2 Arabic Dependency Treebank


http://www.ircs.upenn.edu/arabic/

5. ARABIC NL PROCESSORS

5.1. Morphological Analyzers

 

  • Sebawai

Morphological analyzer (Kareem Darwish)

  • Xerox

 

http://www.xrce.xerox.com/competencies/content-analysis/arabic/

  • Aramorph

 

  • Buckwalter

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L02

  • Morphological Analyzer ($590)

http://www.cimos.com/

 

 

5.2. Stemmers

 

  • Al-Stem


Light stemmer (Kareem Darwish)

  • Light10


Larkey

  • Chen
  • Khoja
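To make the idea of a light stemmer concrete: instead of reducing a word to its root, it strips a small set of frequent prefixes and suffixes and keeps what remains. The sketch below is a minimal illustration under stated assumptions and is not the implementation of Al-Stem, Light10 or any other stemmer named above. The affix lists are illustrative, the minimum stem length of three letters is an assumption, and the name `light_stem` is hypothetical.

```python
# Illustrative affix lists, longest first so that the longest match wins.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # assumption: never shorten a word below three letters


def light_stem(word: str) -> str:
    """Strip at most one prefix, then suffixes repeatedly, from an Arabic word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break

    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[:-len(suffix)]
                stripped = True
                break
    return word


if __name__ == "__main__":
    for token in ["والكتاب", "المعلمون", "مكتبة", "كتب"]:
        print(token, "->", light_stem(token))
```

Because no dictionary is consulted, a stemmer of this kind can remove letters that belong to the stem; that is the usual trade-off against root-based approaches such as Khoja's.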

 


5.3. Root extractors

 

  • gendic: reduces Arabic words to
    their roots

www.freshmeat.net

 

5.4 Transliterators

 

  • Jtransliterator: a tool that
    transliterates Arabic scripts to Latin script

 

www.freshmeat.net

 

5.5 Other Arabic NL Processors

 

  • Syntactic
    Analyser ($990)

http://www.cimos.com/

 


6. OTHER ARABIC NL TOOLS

Illinois Institute of Technology

Information Retrieval System: online querying of the TREC Arabic collection

http://www.ir.iit.edu:8180/arabic-interface/index.html

  • Arabeyes:
    It includes various resources.

http://www.arabeyes.org/

Arabeyes is a meta-project aimed at fully supporting the Arabic language in the Unix/Linux environment. It is designed to be a central repository for standardizing the Arabization process. Arabeyes relies on voluntary contributions by computer professionals and enthusiasts from all over the world.


Katoob: editor for Arabic texts

Mozilla: Arabization of Mozilla

ITL: Islamic tools (date calculation, …)

BiCon: console in Arabic

Quran: tools for reading the Qur’an

QaMoose: online access to a dictionary (information extracted from the word list)

Akka: Arabization of Linux consoles

Arabbix: Arabized Linux Live-CD

Bayani: Arabized scientific plotter

Distros: Arabized Linux distributions

Duali: spell checker

FreeBSD: FreeBSD Arabization

 

lala: a localization tool for
LINUX Arabic support.

conv_ara_html: a tool for
converting Arabic numeric character references

PostArabic: Arabic shaping for
PostgreSQL

ToIpt: PHP class for writing
Farsi and Arabic text on images

mule: multilingual emacs

ClearlyU: BDF fonts useable for
Unicode text

Arabeske: an arabesque-like
pattern design tool

buckwalter2unicode: A Python
script to convert from buckwalter to Unicode

Encode::Arabic : Perl module that
can convert from and to some Arabic encodings (including buckwalter,
araTeX, …)

FriBidi: a free implementation of
the Unicode Bidi algorithm.
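Conversion between Buckwalter transliteration and Unicode, as offered by buckwalter2unicode and Encode::Arabic in the list above, is at heart a character-for-character table lookup. The sketch below illustrates that idea only; it is not taken from either tool, and the names `buckwalter_to_unicode` and `unicode_to_buckwalter` are hypothetical. It covers the basic Buckwalter symbols for the Arabic letters and diacritics and passes any other character through unchanged.

```python
# Buckwalter symbols listed in the order of the code points they stand for.
_BW_LETTERS = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # U+0621 .. U+063A (hamza .. ghain)
_BW_MORE = "_fqklmnhwYyFNKaui~o"            # U+0640 .. U+0652 (tatweel .. sukun)

BW_TO_UNI = {bw: chr(0x0621 + i) for i, bw in enumerate(_BW_LETTERS)}
BW_TO_UNI.update({bw: chr(0x0640 + i) for i, bw in enumerate(_BW_MORE)})
BW_TO_UNI["`"] = "\u0670"  # superscript (dagger) alif
BW_TO_UNI["{"] = "\u0671"  # alif wasla

UNI_TO_BW = {uni: bw for bw, uni in BW_TO_UNI.items()}


def buckwalter_to_unicode(text: str) -> str:
    """Replace each Buckwalter symbol by its Arabic character."""
    return "".join(BW_TO_UNI.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text: str) -> str:
    """Replace each Arabic character by its Buckwalter symbol."""
    return "".join(UNI_TO_BW.get(ch, ch) for ch in text)


if __name__ == "__main__":
    arabic = buckwalter_to_unicode("ktAb")
    print(arabic, unicode_to_buckwalter(arabic))
```

Since every ASCII letter in the table is treated as Buckwalter, the input to `buckwalter_to_unicode` should contain transliterated Arabic only; ordinary Latin text mixed into it would be converted as well.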


7. SLIGHTLY COMMENTED BIBLIOGRAPHY

Bibliography
on Arabic Linguistics
http://www.lib.umich.edu/area/Near.East/ALSLING.html

Selective
Bibliography on Arabic Grammar and linguistics

http://www.lib.umich.edu/area/Near.East/WFischerBibliography.pdf

7.1 Theses

 

 

Ahmed Farouk Ahmed. () Developing an Arabic Parser in a Multilingual Machine Translation System. Master's thesis, Cairo University (with Prolog code).

 

Azza Abd and El-Moniem Mohamed. Machine Translation of Noun Phrases: From English to Arabic. Master's thesis, Cairo University.

 

Kadri Y., Benyamina A. (1992). “Un système d’analyse syntaxico-sémantique du langage arabe non voyellé”, Mémoire d’ingénieur, Université d’Oran.

Kareem Darwish (2003). Probabilistic Methods for Searching OCR-Degraded Arabic Text. PhD thesis.

Mohamed Attia and Mohamed Elaraby Ahmed (2000). A large-scale computational processor of the Arabic morphology and application. Master's thesis, Cairo University.

  • MORPHO3 morphological analyzer
  • 4000 roots
  • 1000 patterns

Mona Diab (2003). Word Sense Disambiguation within a Multilingual Framework. PhD thesis.

R. Al-Shalabi (1996). Design and implementation of an Arabic morphological system to support natural language processing. Ph.D. dissertation, Computer Science Department, Illinois Institute of Technology, Chicago.

Sabri
Elkateb (2005) Design and implementation of an English Arabic
dictionary/editor. PhD thesis, The University of Manchester, United
Kingdom.

Sebawai
Morphological Analyzer.

Al-Stem Light
stemmer

7.2. Articles

Abdelhadi
Soudi, Violetta Cavalli-Sforza () “Interfacing an Arabic
Morphology and sentence generation system with an English-to-Arabic
knowledge-based Machine Translation System”.

  • KANT
    MT system

Abdelhadi Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002a) “The Arabic Noun System Generation”, in Proceedings of the International Conference on Arabic Processing, Manouba University, Tunisia.

  • Arabic broken plural
  • Lexeme-based model for broken plural
  • Implemented in Morphe

Abdelhadi Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002b) “A Prototype English-to-Arabic Interlingua-based MT System”, in Proceedings of the Processing of Arabic Workshop, Language Resources Evaluation Conference, Las Palmas, Spain.

 

Abdelhadi Soudi, Jim Cowie, Hamdy S. Soliman (1999) “Interfacing an Arabic Morphological Generator with an Interlingua-based Machine Translation System”, Carnegie Mellon University, USA.

Abdelmajid Ben Hamadou (1986) “A Compression Technique for Arabic Dictionaries: The Affix Analysis”, COLING 1986.

  • Morphological analyzer

Ahmed Rafea, Khaled Shaalan (1993) “Lexical Analysis of Inflected Arabic Words using Exhaustive Search of an Augmented Transition Network”. Software Practice & Experience, Vol. 23 (6), pp. 567-588.

  • [Begin_1
    | Begin_1] + Stem + [Last_1] + [Last_2] + [Last_3]
  • ATN
    implemented in Pascal
  • 5
    registers
  • 17
    flags
  • types
    of rules
  • Some
    details are given

Alexander
Fraser, Jinxi Xu, Ralph Weischedel (2002) “Cross-Lingual
Retrieval at BBN”, TREC 2002

Allan
Ramsay, Hanady Mansur (2000) “Arabic Morphology: a categorial
approach”

  • Recover
    diacritics missing in MSA texts

Allan Ramsay, Hanady Mansur (2004) “The parser from an Arabic Text-to-speech system”, Le traitement automatique de l’arabe, JEP-TALN, Fes, 19-21 April 2004

  • Sign-based
    system

Alshalabi, R. and Evens, M. (1998). “A Computational Morphology System for Arabic”, in Workshop on Computational Approaches to Semitic Languages, COLING-ACL98, August 16, Montreal, 1998.

Azza
Abdel Monem, Khaled Shaalan, Ahmed Rafea, Hoda Baraka. () “A
Proposed Approach for Generating Arabic from Interlingua in a
Multilingual Machine Translation System”

  • Nespole
  • Grammar
    rules: Cavalli-Sforza, Soudi
  • Morphological
    rules: Timothy

 

Azzah
Al-Maskari and Mark Sanderson, “The effect of Machine
Translation on the performance of Arabic-English QA System”

Black,
W. J., and Elkateb, S. (2004) A Prototype English-Arabic Dictionary
Based on WordNet, Proceedings of 2nd Global WordNet
Conference, GWC2004, Czech Republic, 67-74.

  • AE
    bilingual WN
  • Good
    editor
  • Using
    Prolog for WN navigation

Black, W., Elkateb, S., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C. (2006). Introducing the Arabic WordNet Project, in Proceedings of the Third International WordNet Conference, Sojka, Choi, Fellbaum and Vossen eds.

Beesley, K. R. and L. Karttunen (2000) ‘Finite-State Non-Concatenative Morphotactics’. In: Proceedings of the fifth workshop of the ACL special interest group in computational phonology, SIGPHON-2000. Luxembourg.

 

Beesley, K. R. and L. Karttunen (2003). Finite-State Morphology: Xerox Tools and Techniques. Cambridge University Press.

Berg, H. (2001) ‘Computers and the Qur’ān’. In: J. D. McAuliffe (ed.): Encyclopaedia of the Qur’ān, Vol. One. Leiden–Boston–Köln: Brill, pp. 391–395.

Chen, A., Gey, F. (2002). “Building an Arabic Stemmer for Information Retrieval”. The Eleventh Text Retrieval Conference (TREC 2002)

  • Two light stemmers:
    MT-based
    light stemmer (similar to Larkey’s)

Chiang, David, Mona Diab, Nizar Habash, Owen Rambow and Safi Sharif. 2006.
Parsing Arabic Dialects. In Proceedings of the 11th Conference of the
European Chapter of the Association for Computational Linguistics. Trento,
Italy.

Chowdhury, A., Aljlayl, M., Jensen, E., Beitzel, S., Grossman, D., Frieder, O. (2002). “IIT at TREC 2002 Linear Combinations Based on Document Structure and Varied Stemming for Arabic Retrieval.” The Eleventh Text Retrieval Conference (TREC 2002)

  • Two stemmers:
    pattern-based (how to get the patterns)
    deeper light stemmer

Darwish, K., Oard, D. W. (2002). “CLIR experiments at Maryland for TREC-2002: Evidence combination for Arabic-English retrieval”, Eleventh Text Retrieval Conference (TREC 2002).

Mona
Diab.() “Feasibility of Bootstrapping an Arabic WordNet
Leveraging Parallel Corpora and an English WordNet”

Mona
Diab (2004) “An Unsupervised Approach for Bootstrapping Arabic
Sense Tagging”

Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automated Methods for
Processing Arabic Text: From Tokenization to Base Phrase Chunking. Book
Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical
Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer
Publications, 2007.

Diab,
Mona, Kadri Hacioglu and Daniel Jurafsky (2004). Automatic Tagging of
Arabic Text: From raw text to Base Phrase Chunks. In Proceedings
of HLT-NAACL
2004.

Dichy. (2001) “On Lemmatization in Arabic – A Formal Definition of the Arabic Entries of Multilingual Lexical Databases,” Proc. of the Workshop on Arabic Language Processing, Toulouse, 2001.

 

Dichy, J. / A. Farghaly (2003) “Roots & Patterns vs. Stems: on what grounds should a multilingual database centred on Arabic be built?”, in Proceedings of the MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, September 23, 2003, New Orleans, Louisiana, U.S.A.

Elkateb, S., Black, W., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C. (2006). Introducing a WordNet for Arabic, in Proceedings of the Fifth International Conference on Language Resources 2006, Genoa, Italy.

El-Sadany, T. A. and M. A. Hashish (1989) “An Arabic Morphological System.” In IBM Systems Journal, Vol. 28, No. 4, 600-612, 1989.

Feddagi, A. (1992) ‘Arabic Morpho-syntax and semantic parsing’, Department of Computer Science, University of Manchester, 3rd International Conference on Multilingual, 10-12 Dec., 1992, Univ. of Durham.

Franz, M., McCarley, J. S. (2002). “Arabic Information Retrieval at IBM”. The Eleventh Text Retrieval Conference (TREC 2002).

  • Presentation of two models for cross-language IR (English queries, Arabic documents)

George
Anton Kiraz (1994) “Computational Analysis of Arabic
Morphology.” In Narayanan A. and Ditters E. (eds) The
linguistic Computation of Arabic

  • multi-tape
    two level FST
  • grammars
    and sample lexicon are included

George Anton Kiraz (1998) “Arabic Computational Morphology in the West.” In Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing, Cambridge, 1998.

Habash, Nizar, Owen Rambow and George Kiraz. Morphological Analysis and
Generation for Arabic Dialects. In Proceedings of the Workshop on
Computational Approaches to Semitic Languages at the Conference of American
Association for Computational Linguistics (ACL’05).

Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis,
and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the
Conference of American Association for Computational Linguistics (ACL’05).

Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In
Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez,
Morocco, 2004.

Habash, Nizar and Owen Rambow. MAGEAD: A Morphological Analyzer and
Generator for the Arabic Dialects. In Proceedings of COLING-ACL, Sydney,
Australia, 2006 (Main Volume).

Habash, Nizar and Owen Rambow. A Morphological Analyzer for MSA and the
Arabic Dialects. Presented at the Arabic Linguistic Society annual meeting,
Kalamazoo. 2006.

Habash, Nizar. “Arabic Morphological Representations for Machine
Translation.” Book Chapter. In Arabic Computational Morphology:
Knowledge-based and Empirical Methods. Editors Antal van den Bosch and
Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Habash, Nizar, Bonnie Dorr and Christof Monz. Challenges in Building an
Arabic Generation-heavy Machine Translation System and Extending it with
Statistical Components. In Proceedings of the Association for Machine
Translation in the Americas (AMTA-2006).

Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes for Statistical
Machine Translation, In Proceedings of the North American Chapter of the
Association for Computational Linguistics (NAACL), New York, 2006.

Habash, Nizar, Abdelhadi Soudi, and Tim Buckwalter. “On Arabic
Transliteration.” Book Chapter. In Arabic Computational Morphology:
Knowledge-based and Empirical Methods. Editors Antal van den Bosch and
Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Habash, Nizar. “On Arabic and its Dialects,” Multilingual Magazine. Volume
21 Number 3. 2006.

Habash, Nizar and Owen Rambow. Extracting a Tree Adjoining Grammar from the
Penn Arabic Treebank. In Proceedings of Traitement Automatique du Langage
Naturel (TALN-04). Fez, Morocco, 2004.

Habash, Nizar, Clinton Mah, Randy Calistri-Yeh, Sabiha Imran and Paraic
Sheridan. The Design and Validation of an Arabic Conceptual Interlingua for
Information Retrieval. In Proceedings of the International Conference on
Language Resources and Evaluation (LREC). 2006.

Haidar M. Harmanani, Walid T. Keirouz, Saeed Raheel () “A rule-based extensible stemmer for information retrieval with application to Arabic”.

 

Hasnah, A. / Evens, M. (2001), “Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary”, in: Proceedings of the ACL/EACL 2001 Workshop on Arabic Language Processing: Status and Prospects, July 6, 2001, Toulouse, France.

Hassan
Sawaf, Jörg Zaplo, Hermann Ney (2001) “Statistical
Classification Methods for Arabic News Articles”

  • Character
    3gram and full words
  • MaxEntropy
  • Document
    clustering
  • Mutual
    Information

 

 

HLAL Y. (1987) ‘Information system and Arabic: the use of Arabic in information system’, Linguistics and Signal & information processing, A subsidiary of Harper & Row publishing, Inc., 191-197, 1987.

Hudson, G.
(1986) “Arabic Root and Pattern Morphology without Tiers”
Journal of Linguistics, 22:85-122.

Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi. () “Arabic Morphological Analysis Techniques: A Comprehensive Survey”. 25 pages, very good. See the Sakhr link there.

 

Imad A.
Al-Sughaiyer and Ibrahim A. Al-Kharashi. () “Rule Parser for
Arabic Stemmer”

Jawad Berri, Hamza Zidoum and Yacine Atif (2001), “Web-based Arabic Morphological Analyzer.” In: A. Gelbukh (ed.): CICLing 2001, No. 2004 in Lecture Notes in Computer Science.

John Maloney and Michael Niv. () “TAGARAB: A Fast, Accurate Arabic Name Recognizer Using High-Precision Morphological Analysis”.

Judith Dror. () “Morphological Tagging of the Qur’an”, Department of Arabic Language and Literature, University of Haifa.

 

Kadri,
Y. (2003) “Recherche d’information translinguistique sur
les documents en arabe”, Rapport de prédoctoral, DIRO,
Université de Montréal.

Kenneth R. Beesley (1996). “Arabic Finite-State Morphological Analysis and Generation”. In Using Xerox tools for Arabic morphology

Kenneth R. Beesley (1998). “Arabic Morphological Analysis on the Internet”, in Proceedings of the International Conference on Multi-Lingual Computing (Arabic & English), Cambridge, G.B., 17-18 April, 1998. Using Xerox tools for Arabic morphology

Kenneth R. Beesley (2001). “Finite-State Morphological Analysis and Generation of Arabic at Xerox Research: Status and Plans in 2001”. Using Xerox tools for Arabic morphology

Kareem Darwish, Douglas W. Oard. () “Term Selection for Searching Printed Arabic”

Kareem Darwish, Douglas W. Oard. () “Probabilistic Structured Query Methods”

Kazem Taghva, Rania Elkhoury, Jeffrey S. Coombs (2005) “Arabic Stemming Without A Root Dictionary”. ITCC (1) 2005: 152-157.

More works by Kazem Taghva can be found at:

http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/t/Taghva:Kazem.html

  • ISRI
    proposal: Khoja’s without root dictionary
  • Complete
    Algorithm (without pattern sets) is provided

Khoja, Shereen and Garside, Roger (1999) “Stemming Arabic Text”, Computing Department, Lancaster University, Lancaster, 1999

http://www.comp.lancs.ac.uk/computing/users/khoja/stemmer.ps

  • System
    of Arabic stemming. Accuracy over 96%

Larkey, Leah S., Ballesteros, Lisa, and Connell, Margaret. (2002) “Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis”. In Proceedings of the 25th Annual International Conference on Research and Development in Information Retrieval (SIGIR 2002), Tampere, Finland, August 11-15, 2002, pp. 275-282.

http://ciir.cs.umass.edu/pubfiles/ir-249.pdf

  • Improves
    previous approaches to Arabic stemming using co-occurrence
    statistics

 

  • Participation
    in TREC-11, Light1, Light2, Light3, Light8

Larkey,
Leah S. and Connell, Margaret, (2002) “Arabic Information
Retrieval at UMass in TREC-10” In Voorhees, E.M. & Harman,
D.K. (Eds.), The Tenth Text Retrieval Conference, TREC 2001,
NIST Special Publication 500-250, pp. 562-570.
http://ciir.cs.umass.edu/pubfiles/ir-254.pdf

  • Participation
    of UMass in TREC-10 Cross-language track,
  • INQUERY
    + Language Modelling (LM)
  • Arabic
    corpus, normalization, using Khoja stemmer
  • several
    resources
  • AFP
    Arabic Corpus 383,872 documents
  • Ectaco
    Dictionary
  • Sakhr
    Dictionary
  • Sakhr
    SET MT
  • Place
    Name Lexicon
  • Stop
    words

Larkey,
Leah S., Ballesteros, Lisa, and Connell, Margaret. (2005) “Light
Stemming for Arabic Information Retrieval”

  • Light10
  • Lemur
    toolkit
  • Affix
    Removal
  • Statistical
    Techniques
  • See
    references
  • Good
    description of tools (stemmers & morphological analyzers)

 

Maamouri, Mohamed, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen
Rambow, Dalila Tabessi. Developing and Using a Pilot Dialectal Arabic
Treebank. In Proceedings of the International Conference on Language
Resources and Evaluation (LREC). 2006.

Mahtab Nikkhou, Khalid Choukri (2005) “Survey on Arabic Language Resources and Tools in the Mediterranean Countries”, Nemlar Report, March 2005.

Mark
Sanderson, Asaad Alberair (2001) “Keep it simple Sheffield – a
KISS approach to the Arabic track”.

  • Using
    almisbar, ajeeb MT systems

 

Rambow, Owen, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash,
Stephen Helmreich, Eduard Hovy, Lori Levin, Carnegie Keith J. Miller, Teruko
Mitamura, Florence Reeder, Advaith Siddharthan. Parallel Syntactic
Annotation of Multiple Languages. In Proceedings of the International
Conference on Language Resources and Evaluation (LREC). 2006.

Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic
Verb Classes Using Syntactic Frames and LSA. In Proceedings of the Joint
Conference of the International Committee on Computational Linguistics and
the Association for Computational Linguistics (ACL-Coling’06). Sydney,
Australia. 2006.

Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic
Verb Classes. In Proceedings of the North American Chapter of the
Association for Computational Linguistics (NAACL), New York, 2006.

René Schneider, Thomas Mandl, and Christa Womser-Hacker () “Integration of Arabic to a Cross-Lingual Retrieval Tool: Challenges and Perspectives”.

Riyad Al-Shalabi and Martha Evens (1998) “A computational morphology system for Arabic”. In Michael Rosner, editor, Proceedings of the Workshop on Computational Approaches to Semitic Languages, pages 66–72, Montreal, Quebec, August. COLING-ACL’98.

 

Saliba, B. and Al Dannan, A. (1989) “Automatic Morphological Analysis of Arabic: A study of Content Word Analysis”, in Proceedings of the Kuwait Computer Conference, Kuwait, March 3-5, 1989.

Sabri El-Kateb, William J. Black (2004) “English-Arabic Dictionary for translation”

Sabri El-Kateb, William J. Black (2001) “Towards the design of English-Arabic terminological and lexical knowledge base”

Schramm, G. (1962), An Outline of
Classical Arabic Verb Structure, Language vol. 38, pp. 360-75.

Shereen Khoja (2001) “APT: Arabic Part-of-speech Tagger”, Proceedings of the Student Workshop at the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL2001), Carnegie Mellon University, Pittsburgh, Pennsylvania. June 2001.

http://www.comp.lancs.ac.uk/computing/users/khoja/NAACL.pdf

  • Tagger
    for Arabic.

 

  • Mixed
    statistic + rule based
  • Trained
    from a corpus of 50,000 words manually tagged
  • Accuracy
    90%

 

Shereen Khoja, Roger Garside and Gerry Knowles (2001) “An Arabic Tagset for the Morphosyntactic Tagging of Arabic”, Corpus Linguistics 2001, Lancaster University, Lancaster, UK, March 2001. To appear in a book entitled A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, edited by Andrew Wilson, Paul Rayson, and Tony McEnery; Lincom-Europa, Munich.
http://www.comp.lancs.ac.uk/computing/users/khoja/CL2001.pdf

  • Proposed
    tagset for Arabic Language
  • Hierarchical
    tagset
  • 177
    tags

 

Smets, M. (1998). “Paradigmatic Treatment of Arabic Morphology”, in Workshop on Computational Approaches to Semitic Languages, COLING-ACL98, August 16, Montreal, 1998.

Soudi, A.
(2004) “Challenges in the Generation of Arabic from
Interlingua”.

 

Soudi, A.
(1999), “Interfacing an Arabic Morphological Generator with an
Interlingua-based Machine Translation System”, MS. Carnegie
Mellon University, USA.

Soudi, A.,
Eisele, A. (2004) “Generating an Arabic Full-Form Lexicon for
Bidirectional Morphology Lookup”, in Proceedings of Language
Resources Evaluation Conference (LREC),
Lisbon, Portugal.

 

Soudi, A., Cavalli-Sforza, V., Jamari, A. (2001), “A Computational Lexeme-based Treatment of Arabic Morphology”, in Proceedings of the Arabic Processing Workshop, Association for Computational Linguistics, Toulouse, France, 2001.

Tomlinson, S. (2002) “Experiments in Named Page Finding and Arabic Retrieval with Hummingbird.” Eleventh Text Retrieval Conference (TREC 2002)

Violetta
Cavalli-Sforza, Abdelhadi Soudi, and Teruko Mitamura.() “Arabic
Morphology Generation Using a Concatenative Strategy”

  • Regular
    and Hollow verbs in detail
  • Using
    MORPHE for writing rules

 

Youssef Kadri & Jian-Yun Nie (1992) “Traduction des requêtes pour la recherche d’information translinguistique anglais-arabe”. IR Laboratoire RALI, Département d’informatique et de recherche opérationnelle, Université de Montréal

Zahed
Ahmed () “Arabic weak verb formulation and computation”.

  • Arabic
    weak verb formulation using FST implemented in Prolog

 

Zajac, R. and Casper, M. (1997) “The Temple Web Translator”, 1997. Available at:

http://www.crl.nmsu.edu/Research/Projects/tide/papers/twt.aaai97.html

A list of Abdelghani Bellaachia’s works is available at:

http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/b/Bellaachia:Abdelghani.html

More works by Leah Larkey are available at:
http://ciir.cs.umass.edu/~larkey

JEP-TALN
(2004), Traitement Automatique de l’Arabe, Fès,
20 avril 2004

 

 


8. OTHER LINKS

http://129.69.218.213/arabtex/doc/arabdoc.pdf

ArabTeX documentation on typesetting Arabic, Hebrew, etc.

http://cpan.uwinnipeg.ca/dist/Encode-Arabic
Encode-Arabic,
Perl extension for encodings of Arabic can be downloaded.

http://www.arabic-domains.org/intrnational-entites.php
Links to international entities concerned with Arabic domain names, for a completely Arabic Internet.

http://www.arabismo.com/

Arabic
resources list

http://www.alburaq.net/dictionary1/transform.cfm

Online dictionary, English-Arabic and Arabic-English, with a facility for searching Arabic words by root or by free search.

http://literary.ajeeb.com/

(Registration
needed). Only links to different sites related to Sakhr and its
Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining),
etc.

http://english.ajeeb.com/

  • Registration needed
  • In English
  • Online literary dictionary
  • Virtual keyboard
  • Only links to different sites related to Sakhr and its Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.

http://www.cimos.com/

  • Can
    translate full text and words.
  • Multilingual
    NLP tools (English, French, Arabic….)

http://www.lexicool.com/

Lists of resources for many languages and language pairs.

http://www.al-bab.com/arab/comp2.htm

Provides
links to several resources like dictionaries, keyboard layouts,
translation software, etc.

http://www.languageguide.org/arabic/

– In Arabic

– Visual vocabulary classified by subject

http://wordnet.princeton.edu/links
WordNet
WEB-GUIs

www.memodata.com

Alexandria: an application for looking up a word in a dictionary by clicking on it in a web page. It covers several source and target languages.

 


9. MISCELLANEOUS

 

  • Workshops (for saving all the
    proceedings)

 

Atlas 1999, Arabic Translation and Localization Symposium (University of Tunis)

ACL Workshop on Arabic Language Processing: Status and Prospects (2001)

ACL Workshop on Computational
Approaches to Semitic Languages (2002, University of Pennsylvania)

TAL 06, France (EURAR project
DICO may be ongoing)

 

  • Gateway

 

ayna.com

alltheweb.com

alidrisi.com

hahoua.com

google.au (interesting Arabic
google version)