Arabic NLP Resources for the Arabic WordNet Project
William BLACK, Sabri ELKATEB
School of University of Sackville Street, sabri.elkateb@manchester.ac.uk Manuel BERTRAN, Politechnical University of Musa ALKHALIFA, Tànit ASSAF, M.A. MARTI, University of Barcelona Gran Via 585, 08007-Barcelona
Piek VOSSEN, Irion Technologies Delftechpark 26, Delft, The Adam PEASE Articulate Software Inc, 420 College Ave Angwin, CA 94508 Christiane Princeton University, Department of Psychology, Green
Table of Contents
2. OPEN DOMAIN LEXICAL RESOURCES
2.1 Arabic Monolingual Corpora
2.2 Arabic/English/… Parallel Corpora
2.3 Arabic Monolingual Dictionaries and Lexicons
2.4 Arabic/English Bilingual dictionaries and lexicons
2.4.1 Printed bilingual dictionaries
2.4.2 On-line MRD (Machine Readable Dictionaries)
2.5 Lexicons obtained from (selective) access to online MT systems
3. DOMAIN RESTRICTED LEXICAL RESOURCES
3.2 Agriculture and related domains
4. OTHER LINGUISTIC RESOURCES
4.2 Arabic Dependency Treebank
5.1 Morphological Analyzers
5.5 Other Arabic NL Processors
1. INTRODUCTION
This report is intended as a guide to the resources (linguistic data as well as linguistic processors and tools) that have been used, tried, or simply considered for use during the development of AWN.
Our intention is to maintain an evolving document for the duration of the project, to which new resources and new comments or assessments on existing items can be added on the fly. Thus, this initial version 0 will be followed (we hope) by other, increasingly useful versions.
The report is not intended to be a complete survey of Arabic NLP resources and tools. We have focused on resources related to the needs of AWN and on free resources.
For more in-depth information on Arabic NLP resources, besides the content of this report and the links included in it, the following references could be useful:
- CADIM
(Columbia Arabic Dialect Modeling):
http://www.ccls.columbia.edu/cadim/links.html
- NEMLAR
(Network for Euro-Mediterranean LAnguage Resources):
- Linguistlist, Resources from Linguistlist on Arabic:
http://cf.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/search/search-all-res2.cfm?res=All&AppLanguageId=43&search1=search1
- …
Non
Arabic-specific resource repositories (but including valuable Arabic
resources and tools) can be found in:
- LDC
(Linguistic Data Consortium):
–
Arabic Gigaword
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T12
–
Arabic Gigaword Second Edition:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02
- ELSNET
(European Network of Excellence in Human Language Technologies):
- ELDA
(Evaluation and Language resources Distribution):
–
An-Nahar Newspaper Text Corpus
http://www.elda.org/catalogue/en/text/W0027.html
–
DixAF (Bilingual Dictionary French Arabic, Arabic French)
http://www.elda.org/catalogue/en/text/M0040.html
– Arabic
Data Set
http://www.elda.org/catalogue/en/text/W0030.html
– “Le
Monde Diplomatique” Text corpus in Arabic
http://www.elda.org/catalogue/en/text/W0030.html
- SIGLEX http://www.siglex.org/ a Special Interest Group on the Lexicon of ACL (Association for Computational Linguistics)
- …
Additional
useful information and useful links can be found on the Web pages of
a number of people or institutions:
- Violetta Cavalli-Sforza: http://libra.sfsu.edu/~vcs/
- Kareem Darwish: http://www.glue.umd.edu/~kareem/research/
- Mona Diab: http://www-nlp.stanford.edu/~mdiab/
- Nizar Habash: http://www.nizarhabash.com/
- Shereen Khoja:
- http://www-nlp.stanford.edu/links/statnlp.html: An annotated list of resources for statistical NLP and corpus-based computational linguistics
- http://www.siglex.org/: A Special Interest Group on the Lexicon of ACL (Association for Computational Linguistics)
- Mansour Alghamdi: http://www.mghamdi.com/links.htm
- Andrew Roberts: http://www.comp.leeds.ac.uk/andyr/index.html
- Anne de Roeck: http://mcs.open.ac.uk/anr29/projects.htm#arabic (corpus)
- Everhard Ditters: http://www.cs.ru.nl/agfl/arab/ditters.html
- Mathieu Guidere: http://perso.univ-lyon2.fr/~mguidere/
- J. Dichy: http://sites.univ-lyon2.fr/langues_promodiinar/Accueil.htm
- Rali (U. Montreal, Canada): www.kacst.edu.sa
- Harold Somers: http://www.co.umist.ac.uk/~harold/
- A survey on HLT: http://cslu.cse.ogi.edu/HLTsurvey/
- Natural Language Lab at Simon Fraser University: http://natlang.cs.sfu.ca/
- Abbas Benmamoun: http://www.linguistics.uiuc.edu/e-benma/#Semitic
- Ahmed Abdelali: http://crl.nmsu.edu/%7Eahmed/
- Ophir Frieder: http://ir.iit.edu/~ophir/
- Alan W Black: http://www.cs.cmu.edu/~awb/
- Martha Evens: http://www.cs.iit.edu/~martha/
- Hamze Hassan: http://www.univ-lyon2.fr/KSLAB_ELISA/1/fiche___laboratoire/
- Husni Al-Muhtaseb: http://www.ccse.kfupm.edu.sa/~husni/: many interesting publications but nothing online.
- Khaled Shaalan: http://www.claes.sci.eg/claes/staff/shaalan.html
- Kenneth R. Beesley: http://www.xrce.xerox.com/people/beesley/home.html
- Yannis HARALAMBOUS: http://omega.enstb.org/yannis/ : encoding, transliteration
- Leah Larkey: http://ciir.cs.umass.edu/~larkey/
- Mokhtar Sellami: http://www.lri-annaba.net/CV_Sellami.htm
A
useful recent survey (very extensive but mainly focussing on
commercial products) is:
–
Mahtab Nikkhou, Khalid Choukri (2005), Survey of Arabic Language
Resources and Tools in the Mediterranean Countries. Nemlar
Report, March 2005
For
the sake of completeness a slightly commented bibliography of Arabic
NLP is included.
2. OPEN DOMAIN LEXICAL RESOURCES
2.1
Arabic Monolingual Corpora.
- A reference corpus of Arabic, to be used for estimating the relevance of terms and roots (we say “reference” only as regards our project; we do not mean that the required corpus should be a true reference corpus, which for Arabic probably does not exist).
–
Several corpora are available from LDC.
–
This corpus must be pre-processed before it can be used for probability estimation. Normalization and light stemming should be sufficient for this purpose (see the available tools below).
–
Nijmegen Corpus:
http://www.let.kun.nl/wba/Content2/1.4.5_Nijmegen_Corpus.htm
–
News articles
–
Arabic Corpus
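The pre-processing mentioned above (normalization followed by light stemming) can be sketched roughly as follows. This is only an illustrative sketch, not one of the stemmers listed later in this report: the affix lists are an assumed, deliberately small selection in the spirit of light stemmers, and the minimum stem length of 3 is an arbitrary choice.

```python
import re

# Short vowels, tanween, shadda, sukun (U+064B..U+0652) and tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0640]")
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza

def normalize(word):
    """Orthographic normalization commonly applied before counting terms."""
    word = _MARKS.sub("", word)
    word = _ALEF_VARIANTS.sub("\u0627", word)   # -> bare alef
    word = word.replace("\u0649", "\u064A")     # alef maqsura -> ya
    word = word.replace("\u0629", "\u0647")     # ta marbuta -> ha
    return word

# Assumed affix lists (longest first); real light stemmers use their own lists.
PREFIXES = [
    "\u0648\u0627\u0644",  # wa + al
    "\u0628\u0627\u0644",  # bi + al
    "\u0643\u0627\u0644",  # ka + al
    "\u0641\u0627\u0644",  # fa + al
    "\u0627\u0644",        # al
    "\u0644\u0644",        # li + al
    "\u0648",              # wa
]
SUFFIXES = [
    "\u0647\u0627", "\u0627\u0646", "\u0627\u062A", "\u0648\u0646",
    "\u064A\u0646", "\u064A\u0647", "\u0647", "\u064A",
]
MIN_STEM = 3  # never strip an affix if fewer than 3 letters would remain

def light_stem(word):
    """Normalize, then strip at most one prefix and one suffix."""
    word = normalize(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word
```

Term frequencies for probability estimation would then be counted over `light_stem(token)` rather than over the raw surface forms.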
2.2
Arabic/English/… Parallel Corpora.
- UN
Bidirectional Multilingual (English, French, Arabic, Russian,
Chinese)
http://157.150.97.21/dgaacs/unterm.nsf
http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm
- ArabiCorpus
http://arabiCorpus.byu.edu
It is being designed to allow students and scholars to search large untagged Arabic corpora for words and structures. It provides information on word frequency, citations giving 10 words before and 10 words after, and information on collocates of the word in question.
- Aljazirah
http://english.aljazeera.net/HomePage
- The
Algerian Press Agency
www.aps.dz
New webpage with interesting parallel articles.
- UNESCO
http://termweb.unesco.org/Default.asp?admin=1&internet=1
- FAO
http://www.fao.org
- ALESCO
Turjman Online (with ontology)
http://www.arabization.org.ma/dictionnaire.asp
- MICROSOFT
ftp://ftp.microsoft.com/developr/msdn/newup/Glossary
27 MB computer science glossary
- Hebrew-Arabic-English
from the Agava Institute
–
Environmental Terms
–
A Trilingual Glossary by Yaron Batit.
- EGYPT/GIZA Toolkit: Quran parallel corpus (En-Ar)
http://www.clsp.jhu.edu/ws99/projects/mt/
- CLARA
(Corpus Linguae Arabicae)
Arabic-Czech
http://enlil.ff.cuni.cz/veda/projekty/clara.htm
2.3
Arabic Monolingual Dictionaries and Lexicons
- Mokhtar_Assihah
Dictionary(downloadable software):
http://www.amadsoft.com/products/mokhtar_assihah.jsp
- List
of dictionaries (in Arabic) from Dar El Ilm includes monolingual,
bilingual and multilingual dictionaries:
http://www.malayin.com/laut.asp?catid=2
- The
best Arabic Arabic Dictionaries from Lisan
Al-Arab
($150 ) to Taj al-Arus ($265)
http://fadakbooks.com/ardia.html
- We are putting the copious dictionary of Ibn Mandhur (8 volumes) on line. To get an idea of the quality of its content, look here:
http://dictionary.sakhr.com/
http://qamoos.sakhr.com/intro/introles.asp?lex_id=6 (in Arabic)
- At the moment we have finished the letter Kaf, which consists of 412 masdars (roots) and almost 400 pages in Word format. It is available on line at:
http://www.lsi.upc.es/~halkoum/aralisan.php
- TANMART:
–
Vegetables http://www.tammar.4t.com/vegta.htm
–
Plants http://www.tammar.4t.com/herb.htm
–
Fruits http://www.tammar.4t.com/fruit.htm
- Traditional Medicine
http://www.khayma.com/roqia/nabaway.HTM
(5 old books online)
- Vitamins:
http://www.khayma.com/roqia/nabaway.HTM
2.4
Arabic/English Bilingual dictionaries and lexicons
2.4.1
Printed bilingual dictionaries
From the large
quantity of dictionaries that are available, the most relevant
sources for this section are:
- List
of dictionaries (in Arabic) from Dar El Ilm includes monolingual,
bilingual and multilingual dictionaries:
http://www.malayin.com/laut.asp?catid=2
The
list includes the popular Al Mawrid English-Arabic Dictionary and Al
Mawrid Arabic-English Dictionary (printed version with CD-ROM).
- Jean-Jacques Schmidt, Dictionnaire français-arabe, arabe-français. Publishing House Dauphin, 1998. This book has been the reference work for many French and Canadian researchers. It is an old dictionary and has appeared in more than one edition.
http://www.bibliomonde.com/pages/fiche-auteur.php3?id_auteur=1508
–
Dictionnaire Larousse Saturne arabe – français / français
– arabe. Publishing House Larousse: 150,000 words
and phrases
- Van Mol, M. and Berghman, K. (2001), Leerwoordenboek Nederlands Modern Arabisch. The Dutch Language Union, Amsterdam.
This MSA-Dutch dictionary is based on a corpus of 3,000,000 words. Mark Van Mol compiled the lexical database and may have an electronic version of the dictionary. He is the director of the Leuven Group (Belgium) and has many publications in ANLP.
http://www.kuleuven.ac.be/ilt/arabic/index_en.htm
mark.vanmol at ilt.kuleuven.ac.be
- Hans Wehr, Arabic-English Dictionary: The Hans Wehr Dictionary of Modern Written Arabic
For many this is one of the most important dictionaries, and for many years it was perhaps the only one in use; it has been quoted by numerous English-language authors.
It can be found at:
http://www.amazon.com/gp/reader/0879500034/ref=sib_dp_pt/103-9733622-2675046
We consider this dictionary necessary for our needs.
- The
Nijmegen Arabic/Dutch Dictionary
This
dictionary must be considered to be very important and useful.
2.4.2.
On-line MRD (Machine Readable Dictionaries)
In this section we
deal with a large quantity of information that is continuously
changing and being updated.
- Basic Arabic Spanish Dictionary, Consejería de Educación y Ciencia Castilla-La Mancha
http://www.jccm.es/educacion/atenc_div/diccionario_arabe/
- Edward William Lane’s Arabic-English Dictionary:
A complete version is available for free on line (even though www.aramedia.com sells it for over $450).
This is the largest lexicon available, comprising 8 volumes (about 3200 pages). Its author spent over 30 years compiling it.
As with any Arabic dictionary, it is organized by roots.
Although the online version was expected to be finished months earlier, it only became available in December 2005. Because heavy online traffic was expected, the site announced that the links might sometimes not work properly.
- CRL New Mexico
Arabic English Dictionary
It contains 122,920
entries in XML format including Arabic proper names and it is
organized as follows:
Arabic
word # Part of Speech # English word
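As a rough illustration of how entries with the three fields above could be loaded, the sketch below parses one `#`-separated record. It assumes the fields have already been flattened from the XML into one line per entry; the function name and the part-of-speech tag `N` in the usage example are hypothetical and not taken from the CRL resource.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    arabic: str
    pos: str
    english: str

def parse_entry(line: str) -> Entry:
    """Split 'Arabic word # Part of Speech # English word' into its fields."""
    parts = [part.strip() for part in line.split("#")]
    if len(parts) != 3:
        raise ValueError("expected 3 '#'-separated fields, got %d" % len(parts))
    return Entry(*parts)

# Hypothetical record; the tag 'N' is an assumption for illustration only.
entry = parse_entry("\u0643\u062A\u0627\u0628 # N # book")
print(entry.pos, entry.english)
```

Grouping the parsed entries by their English field would then give a simple English-to-Arabic lookup table.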
- Links
to several resources in Arabic:
http://crl.nmsu.edu/Resources/lang_res/arabic.html
- Mesiti Dictionary
www.mesiti.it/arabic/search_dict.asp
This dictionary was created by a group of teachers from Italy. It allows the user to enter Arabic or English words, although the search is done using roots. Plurals and feminine forms of adjectives and nouns are also provided when necessary using a manify function (Javascript).
- English-Arabic
dictionary from Germany
It also allows the
user to introduce English and Arabic words using an Arabic keyboard.
- Basic
illustrated dictionary Arabic-Spanish:
http://www.jccm.es/educacion/atenc_div/diccionario_arabe
- Child illustrated dictionary Arabic-Catalan (Flash):
http://www.edu365.com/agora/dic/catala_arab/
- The
webpage http://www.geocities.com/Athens/Agora/3279/
contains:
- Arabic(Algerian)-English
Lexicon. - Transcription only.
- word
(Arabic) – explanation (English)
- The webpage http://www.cimos.com/ contains:
- Multilingual dictionary. (Monodirectional $590. Bidirectional $990).
- Dictionaries exist in two versions:
– Stand alone version
– Client-server version
- There are 4 types of dictionaries:
– General dictionary contains approximately 300,000 words and phrasal verbs in common usage.
– Specific dictionary contains words used by specialists and experts in a selected subject area.
– Idioms dictionary contains fixed expressions and phrasal verbs.
– User dictionary contains words added or updated by the user.
- Main features:
– Able to find all inflected word forms
– Search of phrasal verbs
– Suitable for accessing Internet pages
– Easily integrated with multimedia applications
– Multilingual user interface (English, French, Arabic…)
- The
webpage http://www.ectaco.com/
contains:
– Bidirectional
dictionary. ($49.9). Free sample.
- The
webpage http://www.fonsvitae.com/laneslexicon.html
contains:
–
English <-> Arabic lexicon ($159).
- The
webpage http://www.ub.edu/arab/zips/welcome.htm
contains:
- English/Arabic/Spanish
dictionary from UB. - 83,000
entries. - It
appears to be extracted from the following source
- The webpage http://www.gtoal.com/wordgames/details/arabic/wordlist.html contains:
- English-Arabic word list (viewable with Explorer on Windows)
- Structure: <Arabic word> <English word>
- There are several explicit variants of the same word (e.g. access, accessibility, accessing…)
- The
webpage http://www.freelang.com/dictionnaire/arabe.html
contains:
– French <->Arabic
dictionary (800 words).
- The webpage http://www.let.kun.nl/WBA/ contains:
- Dutch <-> Arabic dictionary.
- Information about the compilation of the dictionary.
- Saudi
Customs Dictionary (AR>EN)
has an alphabetical list of Arabic terms with English gloss and
includes many general terms.
- Webster’s
Online Arabic English Dictionary (AR<->EN):
can be used as reference and its content can be extracted.
- The webpage http://www.arabsun.de/dictionary.php includes English-Arabic (and German) word pairs and is very useful. If you search for a letter or a word, you get a long list of words and phrases containing that letter/word in English and Arabic, in different domains. Its content can be extracted.
- IT
Dictionary (AR<->EN):
This includes an alphabetical list in English and Arabic of around
1613 terms with a special field for gloss/explanation of the
terms.
- QaMoose
Dictionary (AR<->EN): You can download a recent stable version
QaMoose
v-2.1 from
Arabeyes.org, the Arabic Unix Project.
- Academy of
the Arabic Language dictionary: This can be accessed at
http://www.arabicacademy.org.eg/search.asp?sid=1
- Sakhr
English-Arabic dictionary:
http://qamoos.sakhr.com/intro/mgz01.asp
This has information about the
Sakhr English-Arabic dictionary and useful information on Arabic
grammar and Arabic language technology in general.
- Almisbar
bilingual dictionary
http://www.almisbar.com/dict_page.html
- UN
http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm
2.5
Lexicons obtained from (selective) access to online MT systems:
- Almisbar
- (online MT) http://www.almisbar.com/salam_trans.html
- Tarjim
- http://tarjim.ajeeb.com/ajeeb/elogin_ET.asp (registration required)
- Ajeeb
- http://ajeeb.sakhr.com/
- [only links to different sites related to Sakhr and its Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.]
- [http://www.sakhr.com/Sakhr_e/Products/Idrisi.htm?Index=2&Main=Products&Sub=Idrisi]
- Use this link, with a robot for example, and change the last word (the query term) to maintain free access. For example, to look up the translation of “car”:
–
http://qamoos.sakhr.com/idrisidic_1.asp?Sentence=car
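A robot of the kind mentioned above would generate one such URL per headword. The sketch below only builds the query URLs with the Python standard library and does not fetch anything; the function name is hypothetical, and the only thing taken from the link above is the base address and the `Sentence` parameter.

```python
from urllib.parse import urlencode

BASE_URL = "http://qamoos.sakhr.com/idrisidic_1.asp"

def build_lookup_url(word: str) -> str:
    """Return the lookup URL for one English word or phrase."""
    # urlencode percent-escapes unsafe characters and encodes spaces as '+'.
    return BASE_URL + "?" + urlencode({"Sentence": word})

for headword in ["car", "house", "racing car"]:
    print(build_lookup_url(headword))
```

Any actual retrieval should respect the terms of use of the service and keep the request rate low.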
2.6
Stopwords
Some papers and tools related to Arabic stopwords:
- The Lemur Toolkit for Language Modeling and Information Retrieval
‘The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows’.
- A
Web Search Engine for Indexing, Searching and Publishing Arabic
Bibliographic Databases
http://www.isoc.org/inet99/proceedings/posters/085/index.htm
- Arabic-English CLIR track http://www.glue.umd.edu/~oard/papers/trec2002overview.pdf
- Translation
Term Weighting and Combining Translation Resources in Cross-Language
Retrieval
http://metadata.sims.berkeley.edu/papers/trec2001.pdf
- Building
an Arabic Stemmer for Information Retrieval
http://metadata.sims.berkeley.edu/papers/trec2002.pdf
2.7
Gazetteers
- FAOTERM
– Name of Countries (AR-EN-ES-FR-IT-ZH)
: This is useful for named entities – names of countries in Arabic,
English and some other languages prepared by FAO (food and
agriculture organization of the UN).
- Country
Names (MULTI): This is for
the translation of a country name into 15 languages.
- World Map (AR-EN): In addition to the World Map in Arabic and English, this website has a list of Arabic names, important figures and places (useful for named entities)
- http://www.geonames.de/indcou.html: This is a list of countries, rivers, mountains, oceans, seas, international organizations, languages, days, months, seasons, religions, wonders, etc. in many languages including English and Arabic. It is very useful for NER (Named Entity Recognition).
- Intel®
‘s Trademarks and Brands (EN>AR):
This has 500 terms in PDF.
2.8
Online newspapers
2.9
On line Press Agencies
- Algérie
Presse Service (ALGERIA)
www.aps.dz
//Fr
En Ar
- Agence
France Press (FRANCE)
http://www.afp.fr/arabic/home/
//Fr En Ar Sp
- Maghreb
Press Agency (MAROC)
http://www.map.ma/ar
//Fr En Ar Sp
- Xinhuanet
Agency (CHINA)
http://www.arabic.xinhuanet.com/arabic/index.htm
//En Ar
- BBC
(UK)
http://news.bbc.co.uk/hi/arabic/news/
//En Ar
The agencies listed do not publish parallel articles, except the Chinese and Algerian agencies.
2.10
List of verbs
- A
database of 955 Arabic verbs containing their full conjugations is
available from LOGOS.
http://www.verba.org/verbi_utf8/all_verbs_index_ar.html
The
database includes for each verb the vowelized forms of:
- stem
- verbal noun
- full conjugation in the active and passive voice, in the perfect and imperfect tenses
- imperative, conditional, jussive
620
verbs
2.11
List of roots
A list of triliteral and quadriliteral roots, organized in Arabic alphabetical order, was compiled by Tim Buckwalter but is not available on his webpage www.qamus.org
We found it at: www.angelfire.com/tx4/lisan/roots1.htm
- Openburhan:
http://www.openburhan.com/ob_main_frame.html
We will use this list to generate a corpus automatically. Instead of extracting the root from a word, we take the opposite step: starting from the root and the various patterns, we generate the word forms and thereby reconstitute a lexicon.
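The generation step described above can be sketched by writing each pattern in the traditional notation, where the letters fa, ain and lam stand for the three radicals, and substituting the radicals of a given root. This is a naive illustration under strong assumptions: it handles only sound triliteral roots, ignores weak radicals, gemination and vowelization, and would wrongly replace a fa, ain or lam that appears in a pattern as an ordinary affix letter. The function names are hypothetical.

```python
# Placeholder letters of the traditional pattern notation.
FA, AIN, LAM = "\u0641", "\u0639", "\u0644"

def apply_pattern(root, pattern):
    """Substitute the three radicals of `root` into `pattern`."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    table = str.maketrans({FA: root[0], AIN: root[1], LAM: root[2]})
    return pattern.translate(table)

def generate_forms(root, patterns):
    """Return one candidate word form per pattern."""
    return [apply_pattern(root, pattern) for pattern in patterns]

ROOT_KTB = "\u0643\u062A\u0628"  # k-t-b
PATTERNS = [
    "\u0641\u0627\u0639\u0644",        # active participle pattern
    "\u0645\u0641\u0639\u0648\u0644",  # passive participle pattern
]
print(generate_forms(ROOT_KTB, PATTERNS))
```

Candidate forms produced this way still need to be filtered, for example against a corpus or a dictionary, because not every root combines with every pattern.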
2.12
Electronic Books
Here
we can find a vast and copious collection of free Arabic books.
http://www.almeshkat.net/books/index.php
(2282
books)
http://www.al-eman.com/Islamlib/
http://tafsir.org/books/menu.php?action=new
3. DOMAIN RESTRICTED LEXICAL RESOURCES
UNTERM
United Nations Terminology Database (AR-EN-ES-FR-RU-ZH)
:
This has
70,000 entries in 6 Official Languages and its content can be
extracted because the queries result in long lists of words in
English and Arabic. It covers over 80 different domains:
COUNTRY
NAME, AIDS, agriculture, atmospheric science, biodiversity,
bioscience, budget and management, cartography and geography, child
welfare, climate change, codes and regulations, communication, core
concept, culture, declarations, demographics, development,
disarmament, disasters, discrimination, documents, economics,
education and training, energy, environment, export controls and
sanctions, finance, fisheries, food, forestry, functional and other
titles, geoscience, governance, Greek, habitat, health and medicine,
human rights, humanitarian issues, indigenous peoples, information
technology, intellectual property, international law, international
relations, international trade, labour, landmines and mine action,
Latin, law enforcement, law of the sea, logistics and supplies,
meetings, migrations and refugees, military abbreviations, military
issues, multilateral instruments, narcotic drugs, national law,
natural resources, nuclear science, oceanography, organizational
structure, peace and security, peace operations, plans of action and
initiatives, political life, poverty, religions, science and
technology, set phrases, small arms, social issues, space, staff
matters, statistics, TALOS, terrorism, transport and communications,
water, weapons of mass destruction, women.
UNESCOTERM
Search (AR-DE-EN-ES-FR-RU-ZH)
:
This can be used as a reference and its content can be extracted. It includes terms related to UNESCO, such as administrative and financial terms, education, conferences and meetings, etc.
UNESCO Structures, Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks,
Systems, Foundations), IOC: Titles, Terms and Acronyms ,
Administrative and Financial Terms , International (Days, Weeks,
Years and Decades), Campaigns and Appeals, UNESCO’s Member States,
UNESCO’s Standard-Setting Instruments, International Prizes,
(Non-Member States, Non-Self-governing Territories, Dependent
Territories etc.), UNESCO Chairs, Miscellaneous, UN and International
Legal Instruments, UNESCO Functions and Titles, (Conferences,
Meetings etc.), Terms in the field of Education, (UNESCO’s
Programmes, Projects, Initiatives), (International Programmes,
Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks,
Systems, Foundations)
3.1
Medical domain
This has the
Unified Medical Dictionary (UMD) from the World Health Organization
along with its specialized UMD dictionaries which cover more than
70 domains. Entries are arranged by alphabetical order in every
domain and one can see all the English entries with their Arabic
equivalents page by page. All medical terms were approved by the
Arab Academies in Cairo, Damascus, Baghdad and Amman. They also made
sure that the Arabic terms were selected carefully in accordance with
a very strict, clear, simplified and user-friendly methodology. An
electronic version of this edition is available on CD-ROM in a
Windows environment, and comprises about 150 000 terms.
A copy of this
CD-ROM is available from khayat@emro.who.int
The domains include the following (the numbers give the number of entries for the domains sampled).
All specialised UMD dictionaries:
Abbreviations (799 entries); Acidology (1669); Acronyms (248); Anatomy (2000); Anesthesiology (484); Anthropology and anthropometrics (1427); Bacteriology (1827); Biochemistry and Chemistry (2000); Biology; Biomedical engineering; Biomedical ethics; Biostatistics; Blood transfusion medicine; Botany; Cardiology and cardiovascular surgery; Cell biology; Demography; Dentistry (2000); Dermatology; Diagnostics (symptoms & signs); Embryology & teratology; Emergency medicine; Endocrinology & metabolism; Entomology; Environmental health; Enzymology and Zymology; Family and community medicine; Food safety; Forensic medicine; Gastroenterology; Genitourinary medicine, venereology and STDs; Health services; Helminthology; Hematology; Histology; Hospital administration; Immunology; Infectious diseases; Informatics; Laboratory medicine; Maternal and child health; Measures; Microbiology; Mycology; Nephrology; Neurology; Nutrition and dietetics; Obstetrics and gynecology; Occupational medicine, industrial medicine; Oncology; Ophthalmology and optics; Orthopedics; Otorhinolaryngology; Parasitology; Pathology; Pediatrics; Pharmacology and therapeutics; Physiatrics and physical medicine; Physiology; Prefixes; Preventive medicine; Public health, community medicine and hygiene; Reproductive health; Sexology; Suffixes; Surgery; Taxonomy, nosology and classification (1118); Toxicology; Transplantation; Tropical medicine; Virology; WHO managerial terms (2000); Zoology (997).
- http://www.freewebtown.com/onlinedictionary/A_F_dic.html:
This is an online medical dictionary with alphabetical list of
Arabic and English words.
- Medical
Dictionary (AR<->EN):
This is an online medical dictionary with alphabetical list of
Arabic and English words.
3.2
Agriculture and related domains
- AGROVOC is a
multilingual structured and controlled vocabulary designed to cover
the terminology of all subject fields in agriculture, forestry,
fisheries, food and related domains (e.g. environment).
At present AGROVOC contains more
than 16,700 descriptors and more than 10,900 non-descriptors
(synonyms).
You
can download a copy of AGROVOC from
http://www.fao.org/aims/ag_download.htm
Each descriptor has its
equivalent in other languages. Descriptors are indexing terms which
consist of one or more words representing one and the same concept.
Non-descriptors are terms which help the user to find the appropriate
descriptor(s). Non-descriptors are followed by a reference (USE
operator) to the descriptor, which is the preferred term. For
indexing purposes, it is important that only descriptor terms are
used.
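The descriptor/non-descriptor relation described above amounts to a lookup that maps any entry term to its preferred term. The sketch below shows the idea with a tiny in-memory table; the terms are invented for illustration and are not taken from AGROVOC, and the function name is hypothetical.

```python
# Hypothetical sample data, not actual AGROVOC content.
DESCRIPTORS = {"maize", "cattle"}
# Non-descriptor -> descriptor, i.e. the "USE" reference.
USE = {"corn": "maize", "cows": "cattle"}

def preferred_term(term):
    """Return the descriptor to index with, or None if the term is unknown."""
    key = term.strip().lower()
    if key in DESCRIPTORS:
        return key
    return USE.get(key)

print(preferred_term("Corn"))     # resolved through the USE reference
print(preferred_term("cattle"))   # already a descriptor
print(preferred_term("tractor"))  # unknown term
```

Indexing code would call `preferred_term` on every candidate term and keep only the descriptors it returns.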
-
AGROVOC is available in 9
languages: the five FAO official languages (which are English,
French, Spanish, Chinese and Arabic), Czech, Portuguese, Japanese
and Thai. Other languages like German, Italian, Korean, Hungarian,
Slovak and Lao are currently being prepared.
It is stated
clearly in their website that AGROVOC is free of charge for
educational or other strictly non-commercial purposes.
AGROVOC
is available for downloading in MySQL, TagText, ISO2709 and Microsoft
Access formats. To download the AGROVOC database for off-line use,
please send your request to fao-agris-caris@fao.org. When sending the
request please specify the following: Full Name, Email, Organisation,
Reason for downloading AGROVOC, Comments. AGROVOC is also available
through web services. More information available here:
http://www.fao.org/aims/ag_webservices.jsp
3.3
Psychology
- http://www.arabpsynet.com/eDictBooks/A.aef.pdf:
This has 7478 Arabic terms (79 pages) from the Psychological
Sciences Dictionary followed by their translations in English and
French, but only starting with Alif. It is also available as a
zipped Executable file from
http://www.arabpsynet.com/HomePage/DictBookAEF.Ar.htm
3.4
Hydrology
-
International Glossary of
Hydrology (1418 entries): This is a multilingual resource that
includes Arabic and English (to view Arabic characters choose
Unicode UTF-8).
http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1
- Lexique Hydrologique pour l’Ingénieur (English, Arabic, French, Romanian)
http://www.cemagref.fr
(2000 entries, PDF format)
3.5
Urbanism
Habitat
and Urbanism Glossary (AR-EN-FR):
This has 3850 Arabic-English-French entries in PDF.
3.6
Chemistry
Elementymology
& Elements Multidict (MULTI):
This is a multilingual dictionary of the names of chemical elements
in many languages. There are alphabetical and numerical lists.
Clicking on the name of an element brings the element information
page up in the main window. It can be used as a reference.
3.7
Zoology
Zoology
Dictionary (EN>AR):
This has 2500 terms in alphabetical order.
3.8
Mathematics
Glencoe
Online –
This is a multilingual Mathematics Glossary (AR-EN-ES-KO-RU-UR-VI-ZH)
in pdf files, in the form of an alphabetical list with glosses.
3.9
Islamic terms
- http://www.muslimphilosophy.com/pd/dmp.pdf: This is a dictionary of Islamic Philosophical Terms in Arabic (transliteration) and English in PDF.
- http://www.usc.edu/dept/MSA/reference/glossary/term.AH.html
- http://muttaqun.com/dictionary3.html#A
3.10
Finance and Banking
- http://www.islamicfi.com/arabic/dictionary/Dic_letters.asp Ar-Ar dictionary of financial terms organized in alphabetical order
- http://www.islamicfi.com/arabic/dictionary/Dic_Subjects.asp The same dictionary organized by sub-domains (29)
3.11
Botanic
- Spices
http://stephkup.nexenservices.com/epices/affichage/liste.htm
4.
OTHER LINGUISTIC RESOURCES
4.1
Arabic Conjugators
This allows the
online generation of individual verb forms (from I to X) for Arabic
verbs with tri-consonantal given roots. ( in Arabic letters).
- Arabic word
form generator. Rudolf W. Meijer
This is more
complete than the above. It uses the Latin characters for introducing
the Arabic root and it is off line. It has been downloaded and it
works for Windows.
- Jerzy
Łacina Poland (MS-Dos programs)
http://www.staff.amu.edu.pl/~lacina/page4.html
http://www.verba.org/verbi_utf8/all_verbs_index_ar.html
- Conjugation
of Arabic verbs
- fa.ala: This is a tool that
conjugates Arabic verbs
- Morfix Arabic Search: This is a multilingual search engine using Arabic morphology and cross-language search.
An interesting and useful online tool. Arabic Morfix offers extensive morphological search and is a standalone search engine. The tool is a demonstration based on a collection of 200 articles containing general news items from various sources. Its search takes into account the following features: context sensitivity, expanded morphological search, thesaurus search and the entry of queries in Latin transcription for Arabic names.
-
Off line Conjugator
www.geocities.com/effel_dahling
- aConCorde: a concordance program for Arabic by Andrew Roberts. A multilingual tool for processing a corpus. It has been downloaded and tested, and it works.
The main tasks of concordancers are searching, sorting and classifying words; they are a real help when manipulating a corpus.
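The core search task of a concordancer, the keyword-in-context (KWIC) listing, can be sketched in a few lines. This is a generic illustration, not the way aConCorde is implemented; the function name is hypothetical, and the default window of 10 tokens simply mirrors the "10 words before and 10 words after" citations mentioned for ArabiCorpus.

```python
def kwic(tokens, keyword, window=10):
    """Return (left context, keyword, right context) for every exact match."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, token, right))
    return hits

sample = "a b c d e".split()
for left, word, right in kwic(sample, "c", window=1):
    print(" ".join(left), "[" + word + "]", " ".join(right))
```

For Arabic text the tokens would normally be normalized or stemmed first, so that inflected and cliticized forms of the same word are matched together.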
4.2
Arabic Dependency Treebank
- Prague Arabic Dependency Treebank: http://ufal.mff.cuni.cz/padt/
- Syddansk Universiteit: http://visl.sdu.dk/visl/ar/parsing/nonautomatic/treebank.php
- Penn Arabic Treebank: http://www.ircs.upenn.edu/arabic/
5. ARABIC NL PROCESSORS
5.1.
Morphological Analyzers
- Sebawai
Morphological
Analyzer (Kareem Darwish)
- Xerox
http://www.xrce.xerox.com/competencies/content-analysis/arabic/
- Aramorph
- Buckwalter
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L02
- Morphological Analyzer ($590)
- Downloadable
Morphological Analyzer from CRL, New Mexico State University:
http://crl.nmsu.edu/Resources/lang_res/arabic.html
5.2.
Stemmers
- Al-Stem
–
Light stemmer (Kareem Darwish)
- Light10
–
Larkey
- Chen
- Khoja
5.3.
Root extractors
- gendic: reduces Arabic words to
their roots
5.4
Transliterators
- Jtransliterator: a tool that
transliterates Arabic scripts to Latin script
5.5
Other Arabic NL Processors
- Syntactic
Analyser ($990)
6. OTHER ARABIC NL TOOLS
- Illinois Institute of Technology Information Retrieval System:
online querying of the TREC Arabic collection
http://www.ir.iit.edu:8180/arabic-interface/index.html
- Arabeyes: includes various resources.
Arabeyes is a meta-project that aims to fully support the Arabic
language in the Unix/Linux environment. It is designed to be a
central repository for standardizing the Arabization process.
Arabeyes relies on voluntary contributions from computer
professionals and enthusiasts all over the world.
– Katoob: editor for Arabic texts
– Mozilla: Arabization of Mozilla
– ITL: Islamic tools (date calculation,…)
– BiCon: console in Arabic
– Quran: tools for reading the Quran
– QaMoose: on-line access to a dictionary (information extracted
from the word list)
– Akka: Arabization of Linux consoles
– Arabbix: Arabized Linux Live-CD
– Bayani: Arabized scientific plotter
– Distros: Arabized Linux distributions
– Duali: spell checker
– FreeBSD: FreeBSD Arabization
- www.freshmeat.net.
It includes:
lala: a localization tool for
LINUX Arabic support.
conv_ara_html: a tool for
converting Arabic numeric character references
PostArabic: Arabic shaping for
PostgreSQL
ToIpt: PHP class for writing
Farsi and Arabic text on images
mule: multilingual emacs
ClearlyU: BDF fonts useable for
Unicode text
Arabeske: an arabesque-like
pattern design tool
buckwalter2unicode: a Python
script to convert from Buckwalter transliteration to Unicode
Encode::Arabic: Perl module that
can convert to and from some Arabic encodings (including Buckwalter,
ArabTeX, …)
FriBidi: a free implementation of
the Unicode Bidi algorithm.
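Several of the tools above convert between Buckwalter transliteration and Unicode Arabic. The sketch below shows the idea in Python using the commonly published Buckwalter symbol table; it is not the code of buckwalter2unicode or Encode::Arabic, and the names `BW2UNI`, `buckwalter_to_unicode` and `unicode_to_buckwalter` are hypothetical. Some resources use variant symbols, so the table should be checked against the resource at hand.

```python
def _build_table():
    """Map Buckwalter symbols to Arabic code points, block by block."""
    table = {}
    blocks = (
        (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza forms and letters up to ghain
        (0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel, remaining letters, diacritics
        (0x0670, "`{"),                          # dagger alif, alif wasla
    )
    for start, symbols in blocks:
        for offset, symbol in enumerate(symbols):
            table[symbol] = chr(start + offset)
    return table


BW2UNI = _build_table()
UNI2BW = {arabic: symbol for symbol, arabic in BW2UNI.items()}


def buckwalter_to_unicode(text):
    """Replace every Buckwalter symbol; other characters pass through unchanged."""
    return "".join(BW2UNI.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text):
    """Inverse mapping, from Unicode Arabic back to Buckwalter symbols."""
    return "".join(UNI2BW.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(buckwalter_to_unicode("ktAb"))
    print(unicode_to_buckwalter(buckwalter_to_unicode("Al$ms")))
```

Because the mapping is one-to-one, a round trip through both functions returns the original string, which makes the scheme convenient for storing Arabic in ASCII-only pipelines.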
7. SLIGHTLY COMMENTED BIBLIOGRAPHY
Bibliography
on Arabic Linguistics
http://www.lib.umich.edu/area/Near.East/ALSLING.html
Selective
Bibliography on Arabic Grammar and linguistics
http://www.lib.umich.edu/area/Near.East/WFischerBibliography.pdf
7.1
Theses
Ahmed Farouk
Ahmed. () Developing an Arabic Parser in a Multilingual Machine
Translation System.
Master Thesis. Cairo University
(with PROLOG CODE)
Azza Abd and
El-Moniem Mohamed. Machine Translation of Noun Phrases:
From English to Arabic. Master Thesis.
Cairo University.
Kadri Y., Benyamina A. (1992).
“Un système d’analyse syntaxico-sémantique
du langage arabe non voyellé”, Mémoire
d’ingénieur, Université d’Oran.
Kareem
Darwish (2003). Probabilistic Methods for Searching OCR-Degraded
Arabic Text. PhD thesis.
Mohamed
Attia and Mohamed Elaraby Ahmed (2000). A large-scale
computational processor of the Arabic morphology and application.
Master thesis, Cairo University
- MORPHO3 morphological analyzer - 4000 roots - 1000 patterns
Mona
Diab (2003). Word Sense Disambiguation within a Multilingual
Framework. PhD thesis.
R. Al-shalabi
(1996). Design and implementation of an
Arabic morphological system to support natural language processing.
Ph.D. Dissertation. Computer Science
Department, Illinois Institute of Technology. Chicago, 1996.
Sabri
Elkateb (2005) Design and implementation of an English Arabic
dictionary/editor. PhD thesis, The University of Manchester, United
Kingdom.
Sebawai
Morphological Analyzer.
Al-Stem Light
stemmer
7.2.
Articles
Abdelhadi
Soudi, Violetta Cavalli-Sforza () “Interfacing an Arabic
Morphology and sentence generation system with an English-to-Arabic
knowledge-based Machine Translation System”.
- KANT
MT system
Abdelhadi
Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002a) “The
Arabic Noun System Generation”, in Proceedings of the
International Conference on Arabic Processing, Manouba
University, Tunisia.
- Arabic broken plural - Lexeme-based model for broken plural -
Implemented in Morphe
Abdelhadi
Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002b) “A
Prototype English-to-Arabic Interlingua-based MT System”, in
Proceedings of the Processing of Arabic Workshop, Language
Resources Evaluation Conference, Las Palmas, Spain.
Abdelhadi
Soudi, Jim Cowie, Hamdy S. Soliman (1999) “Interfacing an Arabic
Morphological Generator with an Interlingua-based Machine Translation
System”, Carnegie Mellon University, USA.
Abdelmajid
Ben Hamadou (1986) “A Compression Technique for Arabic
Dictionaries: The Affix Analysis”, COLING 1986.
- Morphological Analyzer
Ahmed
Rafea, Khaled Shaalan (1993) “Lexical Analysis of Inflected
Arabic Words using Exhaustive Search of an Augmented Transition
Network”. Software Practice & Experience, Vol 23 (6),
pp. 567-588.
- [Begin_1
| Begin_1] + Stem + [Last_1] + [Last_2] + [Last_3] - ATN
implemented in Pascal - 5
registers - 17
flags - types
of rules - Some
details are given
Alexander
Fraser, Jinxi Xu, Ralph Weischedel (2002) “Cross-Lingual
Retrieval at BBN”, TREC 2002
Allan
Ramsay, Hanady Mansur (2000) “Arabic Morphology: a categorial
approach”
- Recover
diacritics missing in MSA texts
Allan
Ramsay, Hanady Mansur (2004) “The parser from an Arabic
Text-to-Speech system”, Le traitement automatique de l’arabe,
JEP-TALN, Fes, 19-21 April 2004
- Sign-based
system
Alshalabi, R.
and Evens, M. (1998). “A Computational Morphology System for
Arabic”, In Workshop on Computational Approaches to Semitic
Languages COLING-ACL98, August 16, Montreal, 1998.
Azza
Abdel Monem, Khaled Shaalan, Ahmed Rafea, Hoda Baraka. () “A
Proposed Approach for Generating Arabic from Interlingua in a
Multilingual Machine Translation System”
- Nespole
- Grammar
rules: Cavalli-Sforza, Soudi - Morphological
rules: Timothy
Azzah
Al-Maskari and Mark Sanderson, “The effect of Machine
Translation on the performance of Arabic-English QA System”
Black,
W. J., and Elkateb, S. (2004) A Prototype English-Arabic Dictionary
Based on WordNet, Proceedings of 2nd Global WordNet
Conference, GWC2004, Czech Republic, 67-74.
- AE
bilingual WN - Good
editor - Using
Prolog for WN navigation
Black,
W., Elkateb, S., Rodriguez, H, Alkhalifa, M., Vossen, P., Pease, A.
and Fellbaum, C., (2006). Introducing the
Arabic WordNet Project, in Proceedings of the Third
International WordNet Conference, Sojka, Choi, Fellbaum and Vossen
eds.
Beesley, K. R.
and L. Karttunen: (2000) ‘Finite-State Non-Concatenative
Morphotactics’. In: Proceedings of the fifth workshop of the
ACL special interest group in computational phonology, SIGPHON-2000.
Luxembourg.
Beesley, K. R.
and L. Karttunen (2003). Finite-State Morphology: Xerox Tools and
Techniques. Cambridge University Press.
Berg, H.
(2001) ‘Computers and the Qur’ān’. In: J. D.
McAuliffe (ed.): Encyclopaedia of the Qur’ān, Vol.
One. Leiden–Boston–Köln: Brill, pp. 391–395.
Chen,
A., Gey, F. (2002). “Building an Arabic Stemmer for Information
Retrieval”. The Eleventh Text Retrieval Conference (TREC
2002)
- Two light stemmers: MT-based; light stemmer (similar to Larkey’s)
Chiang, David, Mona Diab, Nizar Habash, Owen Rambow and Safi Sharif. 2006.
Parsing Arabic Dialects. In Proceedings of the 11th Conference of the
European Chapter of the Association for Computational Linguistics. Trento,
Italy.
Chowdhury,
A., Aljlayl, M., Jensen, E., Beitzel, S., Grossman, D., Frieder, O.
(2002). “IIT at TREC 2002 Linear Combinations Based on Document
Structure and Varied Stemming for Arabic Retrieval.” The
Eleventh Text Retrieval Conference (TREC 2002)
- Two stemmers: pattern-based (how to get the patterns); deeper light
stemmer
Darwish, K., Oard, D. W.
(2002). “CLIR experiments at Maryland for TREC-2002: Evidence
combination for Arabic-English retrieval”,
Eleventh Text Retrieval Conference (TREC 2002).
Mona
Diab.() “Feasibility of Bootstrapping an Arabic WordNet
Leveraging Parallel Corpora and an English WordNet”
Mona
Diab (2004) “An Unsupervised Approach for Bootstrapping Arabic
Sense Tagging”
Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automated Methods for
Processing Arabic Text: From Tokenization to Base Phrase Chunking. Book
Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical
Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer
Publications, 2007.
Diab,
Mona, Kadri Hacioglu and Daniel Jurafsky (2004). Automatic Tagging of
Arabic Text: From raw text to Base Phrase Chunks. In Proceedings
of HLT-NAACL 2004.
Dichy. (2001)
“On Lemmatization in Arabic – A Formal Definition of the
Arabic Entries of Multilingual Lexical Databases,” Proc. of
the Workshop on Arabic Language Processing, Toulouse, 2001.
Dichy, J. / A.
Farghaly (2003) “Roots & Patterns vs. Stems: on what
grounds should a multilingual database centred on Arabic be built?”,
in Proceedings of the MT Summit IX Workshop on Machine Translation
for Semitic Languages: Issues and Approaches, September 23, 2003,
New Orleans, Louisiana, U.S.A.
Elkateb,
S., Black, W., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and
Fellbaum, C. (2006). Introducing a WordNet for
Arabic, in Proceedings of the Fifth International Conference
on Language Resources and Evaluation (LREC 2006), Genoa, Italy.
El-Sadany, T.
A. and M. A. Hashish, (1989) “An Arabic Morphological
System.” In IBM Systems Journal, Vol. 28, No. 4, 600-612,
1989.
Feddagi, A., (1992) ‘Arabic
Morpho-syntax and semantic parsing’, Department of Computer
Science, University of Manchester, 3rd International Conference on
Multilingual, 10-12 Dec., 1992, Univ. of Durham.
Franz,
M., McCarley, J. S. (2002). “Arabic Information Retrieval at
IBM”. The Eleventh Text Retrieval Conference (TREC 2002).
- Presentation of two models for cross-language IR (English queries,
Arabic documents)
George
Anton Kiraz (1994) “Computational Analysis of Arabic
Morphology.” In Narayanan A. and Ditters E. (eds) The
linguistic Computation of Arabic
- multi-tape
two level FST - grammars
and sample lexicon are included
George Anton
Kiraz, (1998) “Arabic Computational Morphology in the West.”
In Proceedings of the 6th International Conference and
Exhibition on Multi-lingual Computing, Cambridge, 1998.
Habash, Nizar, Owen Rambow and George Kiraz. Morphological Analysis and
Generation for Arabic Dialects. In Proceedings of the Workshop on
Computational Approaches to Semitic Languages at the Conference of American
Association for Computational Linguistics (ACL’05).
Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis,
and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the
Conference of American Association for Computational Linguistics (ACL’05).
Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In
Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez,
Morocco, 2004.
Habash, Nizar and Owen Rambow. MAGEAD: A Morphological Analyzer and
Generator for the Arabic Dialects. In Proceedings of COLING-ACL, Sydney,
Australia, 2006 (Main Volume).
Habash, Nizar and Owen Rambow. A Morphological Analyzer for MSA and the
Arabic Dialects. Presented at the Arabic Linguistic Society annual meeting,
Kalamazoo. 2006.
Habash, Nizar. “Arabic Morphological Representations for Machine
Translation.” Book Chapter. In Arabic Computational Morphology:
Knowledge-based and Empirical Methods. Editors Antal van den Bosch and
Abdelhadi Soudi. Kluwer/Springer Publications, 2007.
Habash, Nizar, Bonnie Dorr and Christof Monz. Challenges in Building an
Arabic Generation-heavy Machine Translation System and Extending it with
Statistical Components. In Proceedings of the Association for Machine
Translation in the Americas (AMTA-2006).
Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes for Statistical
Machine Translation, In Proceedings of the North American Chapter of the
Association for Computational Linguistics (NAACL), New York, 2006.
Habash, Nizar, Abdelhadi Soudi, and Tim Buckwalter. “On Arabic
Transliteration.” Book Chapter. In Arabic Computational Morphology:
Knowledge-based and Empirical Methods. Editors Antal van den Bosch and
Abdelhadi Soudi. Kluwer/Springer Publications, 2007.
Habash, Nizar. “On Arabic and its Dialects,” Multilingual Magazine. Volume
21 Number 3. 2006.
Habash, Nizar and Owen Rambow. Extracting a Tree Adjoining Grammar from the
Penn Arabic Treebank. In Proceedings of Traitement Automatique du Langage
Naturel (TALN-04). Fez, Morocco, 2004.
Habash, Nizar, Clinton Mah, Randy Calistri-Yeh, Sabiha Imran and Paraic
Sheridan. The Design and Validation of an Arabic Conceptual Interlingua for
Information Retrieval. In Proceedings of the International Conference on
Language Resources and Evaluation (LREC). 2006.
Haidar M.
Harmanani, Walid T. Keirouz, Saeed Raheel () “A rule-based
extensible stemmer for information retrieval with application to
Arabic”.
Hasnah, A. /
Evens, M. (2001), “Arabic/English Cross Language Information
Retrieval Using a Bilingual Dictionary”, in: Proceedings of
the ACL/EACL 2001 Workshop on Arabic Language Processing: Status and
Prospects, July 6, 2001, Toulouse, France.
Hassan
Sawaf, Jörg Zaplo, Hermann Ney (2001) “Statistical
Classification Methods for Arabic News Articles”
- Character
3gram and full words
- MaxEntropy
- Document
clustering - Mutual
Information
HLAL Y. (1987)
‘Information system and Arabic: the use of Arabic in
information system’, Linguistics and Signal &
information processing, A subsidiary of Harper & Row
publishing, Inc. 191-197, 1987.
Hudson, G.
(1986) “Arabic Root and Pattern Morphology without Tiers”
Journal of Linguistics, 22:85-122.
Imad A.
Al-Sughaiyer and Ibrahim A. Al-Kharashi. () “Arabic
Morphological Analysis Techniques: A Comprehensive Survey”. 25
pages, very good. See the Sakhr link there.
Imad A.
Al-Sughaiyer and Ibrahim A. Al-Kharashi. () “Rule Parser for
Arabic Stemmer”
Jawad Berri,
Hamza Zidoum and Yacine Atif (2001),
“Web-based Arabic Morphological Analyzer.”
In: A.Gelbukh (ed.): CICLing
2001, No. 2004 in Lecture Notes in Computer
John
Maloney and Michael Niv. () “TAGARAB: A Fast, Accurate Arabic
Name Recognizer
Using High-Precision
Morphological Analysis”.
Judith Dror.
() “Morphological Tagging of the Qur’an”, Department
of Arabic Language and Literature, University of Haifa.
Kadri,
Y. (2003) “Recherche d’information translinguistique sur
les documents en arabe”, Rapport de prédoctoral, DIRO,
Université de Montréal.
Kenneth R. Beesley (1996). “Arabic Finite-State Morphological Analysis
and Generation”.
- Using Xerox tools for Arabic morphology
Kenneth R. Beesley (1998). “Arabic Morphological Analysis on the
Internet”, In Proceedings of the International Conference on
Multi-Lingual Computing (Arabic & English), Cambridge
G.B., 17-18 April, 1998.
- Using Xerox tools for Arabic morphology
Kenneth R. Beesley (2001). “Finite-State Morphological Analysis and
Generation of Arabic at Xerox Research: Status and Plans in 2001”.
- Using Xerox tools for Arabic morphology
Kareem
Darwish, Douglas W. Oard. () “Term Selection for Searching Printed
Arabic”
Kareem
Darwish, Douglas W. Oard. () “Probabilistic Structured Query
Methods”
Kazem Taghva, Rania Elkhoury, Jeffrey S. Coombs
(2005) “Arabic Stemming Without A Root Dictionary”. ITCC
(1) 2005: 152-157.
More works by Kazem Taghva can be found at:
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/t/Taghva:Kazem.html
- ISRI
proposal: Khoja’s without root dictionary - Complete
Algorithm (without pattern sets) is provided
Khoja,
Shereen and Garside, Roger (1999) “Stemming Arabic Text”,
Computing Department, Lancaster University, Lancaster, 1999.
http://www.comp.lancs.ac.uk/computing/users/khoja/stemmer.ps
- System
of Arabic stemming. Accuracy over 96%
Larkey,
Leah S., Ballesteros, Lisa, and Connell, Margaret. (2002) “Improving
Stemming for Arabic Information Retrieval: Light Stemming and
Co-occurrence Analysis” In Proceedings of the 25th Annual
International Conference on Research and Development in Information
Retrieval (SIGIR 2002), Tampere, Finland, August 11-15, 2002, pp.
275-282.
http://ciir.cs.umass.edu/pubfiles/ir-249.pdf
- Improves
previous approaches to Arabic stemming using co-occurrence
statistics
- Participation
in TREC-11, Light1, Light2, Light3, Light8
Larkey,
Leah S. and Connell, Margaret, (2002) “Arabic Information
Retrieval at UMass in TREC-10” In Voorhees, E.M. & Harman,
D.K. (Eds.), The Tenth Text Retrieval Conference, TREC 2001,
NIST Special Publication 500-250, pp. 562-570.
http://ciir.cs.umass.edu/pubfiles/ir-254.pdf
- Participation
of UMass in TREC-10 Cross-language track, - INQUERY
+ Language Modelling (LM) - Arabic
corpus, normalization, using Khoja stemmer - several
resources - AFP
Arabic Corpus 383,872 documents - Ectaco
Dictionary - Sakhr
Dictionary - Sakhr
SET MT - Place
Name Lexicon - Stop
words
Larkey,
Leah S., Ballesteros, Lisa, and Connell, Margaret. (2005) “Light
Stemming for Arabic Information Retrieval”
- Light10
- Lemur
toolkit - Affix
Removal - Statistical
Techniques - See
references - Good
description of tools (stemmers & morphological analyzers)
Maamouri, Mohamed, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen
Rambow, Dalila Tabessi. Developing and Using a Pilot Dialectal Arabic
Treebank. In Proceedings of the International Conference on Language
Resources and Evaluation (LREC). 2006.
Mahtab
Nikkhou, Khalid Choukri (2005) “Survey on Arabic Language
Resources and Tools in the Mediterranean Countries”, Nemlar
Report, March 2005.
Mark
Sanderson, Asaad Alberair (2001) “Keep it simple Sheffield – a
KISS approach to the Arabic track”.
- Using
almisbar, ajeeb MT systems
Rambow, Owen, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash,
Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko
Mitamura, Florence Reeder, Advaith Siddharthan. Parallel Syntactic
Annotation of Multiple Languages. In Proceedings of the International
Conference on Language Resources and Evaluation (LREC). 2006.
Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic
Verb Classes Using Syntactic Frames and LSA. In Proceedings of the Joint
Conference of the International Committee on Computational Linguistics and
the Association for Computational Linguistics (ACL-Coling’06). Sydney,
Australia. 2006.
Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic
Verb Classes. In Proceedings of the North American Chapter of the
Association for Computational Linguistics (NAACL), New York, 2006.
René
Schneider, Thomas Mandl, and Christa Womser-Hacker () “Integration
of Arabic to a Cross-Lingual Retrieval Tool: Challenges and
Perspectives”.
Riyad
Al-Shalabi and Martha Evens (1998) “A computational morphology
system for Arabic”. In Michael Rosner, editor, Proceedings of
the Workshop on Computational Approaches to Semitic languages,
pages 66–72, Montreal, Quebec, August. COLING-ACL’98.
Saliba, B. and
Al Dannan, A. (1989) “Automatic Morphological Analysis of
Arabic: A study of Content Word Analysis”, In Proceedings of
the Kuwait Computer Conference, Kuwait, March 3-5, 1989.
Sabri
El-Kateb, William J. Black (2004) “English-Arabic Dictionary for
translation”
Sabri
El-Kateb, William J. Black (2001) “Towards the design of
English-Arabic terminological and lexical knowledge base”
Schramm, G. (1962), An Outline of
Classical Arabic Verb Structure, Language vol. 38, pp. 360-75.
Shereen
Khoja (2001) “APT: Arabic Part-of-speech Tagger”
Proceedings of the Student Workshop at the Second Meeting of the
North American Chapter of the Association for Computational
Linguistics (NAACL2001), Carnegie Mellon University, Pittsburgh,
Pennsylvania. June 2001.
http://www.comp.lancs.ac.uk/computing/users/khoja/NAACL.pdf
- Tagger
for Arabic.
- Mixed
statistic + rule based - Trained
from a corpus of 50,000 words manually tagged - Accuracy
90%
Shereen
Khoja, Roger Garside and Gerry Knowles (2001) “An Arabic Tagset
for the Morphosyntactic Tagging of Arabic”, Corpus
Linguistics 2001, Lancaster University, Lancaster, UK, March
2001. To appear in a book entitled A Rainbow of Corpora: Corpus
Linguistics and the Languages of the World, edited by Andrew
Wilson, Paul Rayson, and Tony McEnery; Lincom-Europa, Munich.
http://www.comp.lancs.ac.uk/computing/users/khoja/CL2001.pdf
- Proposed
tagset for Arabic Language - Hierarchical
tagset - 177
tags
Smets, M.
(1998). “Paradigmatic Treatment of Arabic Morphology”, In
Workshop on Computational Approaches to Semitic Languages
COLING-ACL98, August 16, Montreal, 1998.
Soudi, A.
(2004) “Challenges in the Generation of Arabic from
Interlingua”.
Soudi, A.
(1999), “Interfacing an Arabic Morphological Generator with an
Interlingua-based Machine Translation System”, MS. Carnegie
Mellon University, USA.
Soudi, A.,
Eisele, A. (2004) “Generating an Arabic Full-Form Lexicon for
Bidirectional Morphology Lookup”, in Proceedings of Language
Resources Evaluation Conference (LREC), Lisbon, Portugal.
Soudi, A.,
Cavalli-Sforza, V., Jamari, A. (2001), “A Computational
Lexeme-based Treatment of Arabic Morphology”, in Proceedings
of The Arabic Processing Workshop, Association For Computational
Linguistics, Toulouse, France, 2001.
Tomlinson,
S. (2002) “Experiments in Named Page Finding and Arabic
Retrieval with Hummingbird.”
Eleventh
Text Retrieval Conference (TREC 2002)
Violetta
Cavalli-Sforza, Abdelhadi Soudi, and Teruko Mitamura.() “Arabic
Morphology Generation Using a Concatenative Strategy”
- Regular
and Hollow verbs in detail - Using
MORPHE for writing rules
Youssef Kadri
& Jian-Yun Nie, (1992) “Traduction des requêtes pour
la recherche d’information translinguistique anglais-arabe”.
IR Laboratoire RALI, Département d’informatique
et de recherche opérationnelle, Université de
Montréal
Zahed
Ahmed () “Arabic weak verb formulation and computation”.
- Arabic
weak verb formulation using FST implemented in Prolog
Zajac, R. and
Casper, M. (1997) “The Temple Web Translator”.
Available at:
http://www.crl.nmsu.edu/Research/Projects/tide/papers/twt.aaai97.html
Abdelghani
Bellaachia: a list of Bellaachia’s works is available at:
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/b/Bellaachia:Abdelghani.html
More
works by Leah Larkey are available at:
http://ciir.cs.umass.edu/~larkey
JEP-TALN
(2004), Traitement Automatique de l’Arabe, Fès,
20 avril 2004
8. OTHER LINKS
http://129.69.218.213/arabtex/doc/arabdoc.pdf
ArabTeX
link document on typesetting Arabic, Hebrew, etc.
http://cpan.uwinnipeg.ca/dist/Encode-Arabic
Encode-Arabic,
Perl extension for encodings of Arabic can be downloaded.
http://www.arabic-domains.org/intrnational-entites.php
Links
to International entities concerned with Arabic domain names for
completely Arabic internet
http://www.arabismo.com/
Arabic
resources list
http://www.alburaq.net/dictionary1/transform.cfm
Online English-Arabic and Arabic-English dictionary, with a search
facility for Arabic words by root or by free search.
http://literary.ajeeb.com/
(Registration
needed). Only links to different sites related to Sakhr and its
Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining),
etc.
http://english.ajeeb.com/
(Registration needed.) In English.
- On-line literary dictionary
- Virtual keyboard
- Links only to different sites related to Sakhr and its Arabic
solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.
- Can translate full text and words.
- Multilingual NLP tools (English, French, Arabic, …)
http://www.lexicool.com/
Lists
of several resources of many languages and different language pairs.
http://www.al-bab.com/arab/comp2.htm
Provides
links to several resources like dictionaries, keyboard layouts,
translation software, etc.
http://www.languageguide.org/arabic/
– In Arabic
– Visual vocabulary
classified on subjects
http://wordnet.princeton.edu/links
WordNet
WEB-GUIs
www.memodata.com
Alexandria:
an application that lets the user look up a word in a dictionary by
clicking on it in a web page. It covers many language pairs.
9. MISCELLANEOUS
- Workshops (for saving all the
proceedings)
Atlas 1999, Arabic Translation
and Localization Symposium (University of Tunis)
ACL Workshop on Arabic Language
Processing: Status and Prospects (2001)
ACL Workshop on Computational
Approaches to Semitic Languages (2002, University of Pennsylvania)
TAL 06, France (EURAR project
DICO may be ongoing)
- Gateway
ayna.com
alltheweb.com
alidrisi.com
hahoua.com
google.au (interesting Arabic
google version)