
Using Comparable Corpora for Under-Resourced Areas of Machine Translation


143.14 CHF
Free shipping

Delivery time: 7-14 business days

  • 10476062


Description

1 Introduction
2 Cross-language comparability and its applications for MT
  2.1 Introduction: Definition and use of the concept of comparability
  2.2 Development and calibration of comparability metrics on parallel corpora
    2.2.1 Application of corpus comparability: Selecting coherent parallel corpora for domain-specific MT training
    2.2.2 Methodology
      2.2.2.1 Description of calculation method
      2.2.2.2 Symmetric vs. asymmetric calculation of distance
      2.2.2.3 Calibrating the distance metric
    2.2.3 Validation of the scores: cross-language agreement for source vs. target sides of TMX files
    2.2.4 Discussion
  2.3 Exploration of comparability features in document-aligned comparable corpora: Wikipedia
    2.3.1 Overview: Wikipedia as a source of comparable corpora
    2.3.2 Previous work on using Wikipedia as a linguistic resource
    2.3.3 Methodology
      2.3.3.1 Document pre-processing
      2.3.3.2 Similarity measures
      2.3.3.3 Eliciting human judgments
    2.3.4 Results and analysis
      2.3.4.1 Responses to the questionnaire
      2.3.4.2 Inter-assessor agreement
      2.3.4.3 Correlation of similarity measures to human judgments
      2.3.4.4 Classification task
    2.3.5 Discussion
      2.3.5.1 Features of 'Similar' articles
      2.3.5.2 Measuring cross-language similarity
    2.3.6 Section conclusions
  2.4 Metrics for identifying comparability levels in non-aligned documents
    2.4.1 Using parallel and comparable corpora for MT
    2.4.2 Related work
    2.4.3 Comparability metrics
      2.4.3.1 Lexical mapping based metric
      2.4.3.2 Keyword based metric
      2.4.3.3 Machine translation based metrics
    2.4.4 Experiments and evaluation
      2.4.4.1 Data sources
      2.4.4.2 Experimental results
    2.4.5 Metric application to equivalent extraction
    2.4.6 Discussion
      2.4.6.1 Advantages and disadvantages of the metrics
      2.4.6.2 Using semi-parallel equivalents in MT systems
    2.4.7 Conclusion
3 Collecting comparable corpora
  3.1 Introduction
  3.2 Previous work in collecting comparable corpora
    3.2.1 Web crawling
    3.2.2 Identifying comparable text
  3.3 ACCURAT techniques to collect comparable documents
    3.3.1 Comparable corpora collection from Wikipedia
      3.3.1.1 Extracting comparable articles
      3.3.1.2 Measuring similarity in inter-language linked documents
    3.3.2 Comparable corpora collection from news articles
    3.3.3 Comparable corpora collection from narrow domains
      3.3.3.1 Acquiring comparable documents
      3.3.3.2 Aligning comparable document pairs
4 Extracting data from comparable corpora
  4.1 Introduction
  4.2 Term extraction, tagging, and mapping for under-resourced languages
    4.2.1 Related work
    4.2.2 Term Extraction, tagging, and mapping with the ACCURAT toolkit
      4.2.2.1 Term candidate extraction with CollTerm
        4.2.2.1.1 Linguistic filtering
        4.2.2.1.2 Minimum frequency filter
        4.2.2.1.3 Statistical ranking
        4.2.2.1.4 Cut-off method
      4.2.2.2 Term tagging in documents
        4.2.2.3.1 Term tagging evaluation for Latvian and Lithuanian
        4.2.2.3.2 Term tagging evaluation for Croatian
      4.2.2.3 Term mapping
      4.2.2.4 Comparable corpus term mapping task
      4.2.2.5 Discussion
    4.2.3 Experiments with English and Romanian term extraction
      4.2.3.1 Single-word term extraction
      4.2.3.2 Multi-word term extraction
      4.2.3.3 Experiments and results
    4.2.4 Multi-word term extraction and context-based mapping for English-Slovene
      4.2.4.1 Resources and tools used
        4.2.4.1.1 Comparable corpus
        4.2.4.1.2 Seed lexicon
        4.2.4.1.3 LUIZ
        4.2.4.1.4 ccExtractor
      4.2.4.2 Experimental setup
        4.2.4.2.1 Term extraction
        4.2.4.2.2 Term mapping
        4.2.4.2.3 Extension of the Seed lexicon
      4.2.4.3 Evaluation of the results
        4.2.4.3.1 Evaluation of term extraction
        4.2.4.3.2 Evaluation of term mapping
      4.2.4.4 Discussion
  4.3 Named entity recognition using TildeNER
    4.3.2 Annotated corpora
    4.3.3 System design
      4.3.3.1 Feature function selection
      4.3.3.2 Data pre-processing
      4.3.3.3 NER model bootstrapping
      4.3.3.4 Refinement methods
    4.3.4 Evaluation
      4.3.4.1 Non-comparative evaluation
      4.3.4.2 Experimental comparative evaluation
    4.3.5 Discussion
  4.4 Lexica extraction
    4.4.1 Related work
    4.4.2 Experiments on bilingual lexicon extraction
      4.4.2.1

Properties

Width: 160
Weight: 663 g
Height: 240
Length: 25
Pages: 323
Languages: English
Authors: Andrejs Vasiļjevs, Bogdan Babych, Dan Tufiş, Inguna Skadiņa, Nikola Ljubešić, Robert Gaizauskas


eUniverse.ch © 2021 Nova Online Media Retailing GmbH