Faculty of Arts and Philosophy
Margot Dedeyne
LeT’s get personal: Comparing manually created user profiles with automatic text processing techniques to derive personal interests from tweets
Master’s thesis submitted to obtain the degree of Master in Multilingual Communication, 2015
Supervisor: Ms Orphée De Clercq, Department of Translation, Interpreting and Communication
ACKNOWLEDGEMENTS
First and foremost, I take this opportunity to express my deepest gratitude and appreciation to my supervisor, Ms Orphée De Clercq, without whose continuous guidance and encouragement I would never have been able to finish this dissertation. She always provided me with the necessary help for my research and, in addition, she was always available to answer all of my questions.
I am also grateful to my linguistic supervisor, Prof. Bernard De Clerck, for his linguistic remarks, which greatly improved my master’s thesis.
Furthermore, I would also like to express my sincerest gratitude to the twenty people who contributed to this research and made this dissertation possible.
Finally, I thank all my friends and family for helping me to survive all the stress from this year and persuading me to never give up.
TABLE OF CONTENTS
List of tables and figures
Abstract
1 Introduction
2 Literature study
2.1 Named Entity Recognition on standard text
2.1.1 Learning methods and approaches to perform NER
2.1.2 Features used for NER
2.1.3 Challenges and issues for NER
2.2 Named Entity Recognition on Twitter
2.2.1 Challenges and issues for NER in social media
2.2.2 Targetable Named Entities
3 Research
3.1 Data collection
3.2 Data annotation
3.3 User profiling
4 User profiling analysis
4.1 Manual creation of user profiles
4.1.1 Men
4.1.2 Women
4.2 Automatic analysis
4.2.1 Men
4.2.2 Women
5 Discussion and conclusion
6 Bibliography
7 Appendix
7.1 Mails
7.2 Overview of the main categories
7.3 The top 100 of the most frequently occurring nouns and named entities per test subject
7.3.1 Top 100 of the most frequently occurring named entities of the men
7.3.2 Top 100 of the most frequently occurring nouns of the men
7.3.3 Top 100 of the most frequently occurring named entities of the women
7.3.4 Top 100 of the most frequently occurring nouns of the women
LIST OF TABLES AND FIGURES
Tables
1. Overview of the features used for NER
2. Overview of the test subjects’ dataset
3. Overview of the main categories of the men
4. Overview of the main categories of the women
5. Overview of the main categories of Man 1
6. Overview of the main categories of Man 2
7. Overview of the main categories of Man 3
8. Overview of the main categories of Man 4
9. Overview of the main categories of Man 5
10. Overview of the main categories of Man 6
11. Overview of the main categories of Man 7
12. Overview of the main categories of Man 8
13. Overview of the main categories of Man 9
14. Overview of the main categories of Man 10
15. Overview of the main categories of Woman 1
16. Overview of the main categories of Woman 2
17. Overview of the main categories of Woman 3
18. Overview of the main categories of Woman 4
19. Overview of the main categories of Woman 5
20. Overview of the main categories of Woman 6
21. Overview of the main categories of Woman 7
22. Overview of the main categories of Woman 8
23. Overview of the main categories of Woman 9
24. Overview of the main categories of Woman 10
Figures
1. Annotation scheme for named entities, with categories for main type, subtype, usage and metonymic roles
2. The architecture of the NER system
3. Overview of the main categories and subcategories used for the named entity annotation
ABSTRACT
The main purpose of this dissertation is to explore whether it is possible to manually and automatically find and extract useful information by applying NER and a noun-based analysis to a corpus of tweets in order to determine the interests of several active Twitter users, such as a person’s ideology or the kind of job he/she has. Consequently, the main research question of this dissertation is whether an automatic analysis based on frequent nouns and named entities can automate or at least help to accelerate the manual process of determining people’s interests. The corpus used for this dissertation consists of Twitter data from twenty active Twitter users, namely ten men and ten women. One hundred tweets per person were manually selected for annotation based on named entities, after which the selected tweets were annotated by classifying the named entities into main and subcategories. In a next step, user profiles were manually created for each test subject in order to determine his or her interests and, based on the output of the LeT’s Preprocessing Toolkit, a top 100 of frequent nouns and named entities was derived. Finally, we sought to explore whether this top 100 would allow us to derive the same information about the Twitter users as the manually created user profiles. The results revealed that the manual analysis was more accurate in creating a profile of the test subjects than the automatic analyses. In addition, the noun-based analysis provided more detailed and specific information than the named-entity-based analysis, since more than 50% of the named entities appeared to be userIDs, cities or countries, on the basis of which it proved difficult to determine main interests. It could be concluded that automatic analyses can contribute to speeding up the process of determining people’s interests, but they cannot be used to automate the process completely.
1 INTRODUCTION
Messages posted on social media such as Facebook or Twitter are often informal and as a result too noisy for Natural Language Processing (NLP) purposes, but at the same time the low barrier and the immediate ‘here and now’ nature of these media make them valuable sources of information. They might provide very up-to-date insights into the tastes, likes and responses to events or products of their users. In other words, social media present an interesting but challenging avenue of research for the extraction of data from output that is not readily accessible or fit for NLP. In this dissertation we will explore to what extent we can find and extract useful information using Named Entity Recognition (NER) on Twitter data in order to map people’s interests or responses to specific products and/or services.
This kind of information can be of great value for businesses as it gives them the chance to create or monitor virtual customer environments where brand or product-related communities are formed. For this reason, social media can actually contribute to the success and growth of such companies by identifying target groups for products, by anticipating or closely monitoring fluctuating product or service assessments and by assessing the impact on brand value. As such, Twitter can be considered a very easy way to not only learn about the needs and preferences of the audience, but also to receive immediate feedback, both positive and negative, about certain products or services (Sanchez et al., 2012).
However, using social networking sites also entails some risks for the users. Private information, for example, can be misused, which raises privacy issues such as identity theft or discrimination (Lyon, 2001). Research has begun to explore what kinds of personally identifiable information (e.g. phone numbers, email addresses, postal addresses, social security numbers, etc.) people share through services such as Facebook and MySpace (Kolek & Saunders, 2008; Lenhart & Madden, 2007). Today, thousands of personal blogs, social media profiles, tweets, etc. can be found online. These all allow users to express several facets of themselves, such as their opinions, interests and very private information. Thanks to technology the boundaries between someone’s professional and personal life have become blurrier: private information becomes available not only to friends and family, but also to clients, recruiters, and sometimes even unintended audiences. Most of the time social media users do not realize how much information they share on the internet and what the consequences may be. In this dissertation, we want to find out how much information Twitter users share on Twitter and which part of this data exactly relates to their interests and opinions.
This brings us to the following research questions of this dissertation:
- Can we manually determine the interests of our test subjects based solely on their tweets?
- Based on the output of the LeT’s Preprocessing Toolkit, can we automatically find and extract useful information about personal interests?
- Can the automatic analysis automate or at least help to accelerate the manual process?
The structure of this dissertation is as follows: Chapter 2 presents a literature overview in which the application of NER on standard text and on Twitter is discussed in greater detail. In order to answer our research questions, we first needed a corpus comprising tweets from several active Twitter users. In Chapter 3 we present the different steps that were taken to collect and annotate this corpus. In a next step, which is presented in Chapter 4, 2,000 tweets, corresponding to 100 tweets per test subject, were annotated and classified into main and subcategories, after which a user profile was manually created for each test subject, which allowed us to express the interests of our test subjects based on their tweet output. In a final step we explored whether basic NLP systems can automate or at least help to accelerate this process. Finally, in Chapter 5 the results of our research and the answers to the initial research questions will be provided, as well as some recommendations for future work.
2 LITERATURE STUDY
In this dissertation we will explore whether it is possible to manually and automatically determine the interests of several Twitter users using named entity recognition and a noun-based analysis on a corpus of tweets. Traditionally, named entity recognition is concerned with finding persons, locations or organizations. However, we will not only explore these traditional named entities but also focus on non-traditional entities such as films, songs or books, also known as targetable named entities, since these kinds of entities may become more and more important. After all, social media are a place where people read about various topics and where opinions and interests are directly expressed. In addition, non-traditional entities are updated more frequently: new films, for example, are released every week.
In this chapter an overview will be given of Named Entity Recognition (NER) on standard text as well as on Twitter. We will discuss which learning methods and features are used for NER and concentrate on several issues and challenges regarding NER on Twitter data.
2.1 NAMED ENTITY RECOGNITION ON STANDARD TEXT
As Habib and van Keulen (2012, p.1) state, named entities are ‘atomic elements (mentions) in text belonging to predefined categories such as the names of persons, locations, etc.’, while according to Jurafsky and Martin (2000, p. 727), named entities mean ‘everything that can be referred to with a proper name’.
The term ‘Named Entity’ was first used at the Sixth Message Understanding Conference (MUC-6) (Nadeau and Sekine, 2007; Konkol, 2012), where the first named entity annotation guidelines were elaborated to extract this type of information from texts. Named Entity Recognition (NER, also known as entity identification, entity chunking and entity extraction) is considered one of the main tasks in Natural Language Processing (NLP) (Konkol, 2012).
Jurafsky and Martin (2000, p. 727) define Named Entity Recognition as the combined task of finding spans of text that constitute proper names and then classifying the entities being referred to according to their type. Desmet and Hoste (2014, p.1) state that NER is the task of automatically identifying names in text and classifying them into a predefined set of categories.
Desmet and Hoste (2014) also suggest that these categories are usually application-dependent, which means that they will differ across various domains. Although NER can be used for information extraction or full-text searching and filtering, finding named entities is also considered a very important preprocessing step for other NLP tasks such as Question Answering, Text Summarization or Sentiment Analysis. For automatic Text Summarization for example, NER helps with identifying important text segments (Hassel, 2003). In addition, Information Retrieval or Question Answering systems are usually built to deal with information about named entities, so incorporating a named entity recognizer in these systems simplifies the task of finding the answers or entities (Mollá, 2006). As a consequence, these kinds of NLP tasks can obtain better performance thanks to NER (Konkol, 2012).
There are three universally accepted categories in NER, namely Persons (e.g. Smith, Luke, etc.), Organizations (e.g. European Union, Google Inc., etc.) and Locations (e.g. Amsterdam, Europe, etc.) (Desmet and Hoste, 2014; Nadeau and Sekine, 2007). Since MUC-6, these three types have been called Enamex tags (entity name expressions). In addition to this tag, two other predominant tags were introduced, namely Numex tags (numeric expressions such as percentages or currencies) and Timex tags (time expressions such as dates or times). However, each of the Enamex categories can be further divided into several subtypes. The ACE guidelines¹ (Doddington et al., 2004), for example, describe subtypes for most categories (such as address, city or country for locations). At the CoNLL-2002 conference (Tjong Kim Sang, 2002) a fourth type, Miscellaneous, was added to indicate the proper names that were not part of the classic Enamex tags.
¹ A program which develops information extraction technologies to find entities, relations and events.
However, Desmet and Hoste (2014), who developed a state-of-the-art fine-grained NER system and corpus for Dutch, distinguish six main types, each with their own subtypes: persons, organizations, locations, products, events and miscellaneous named entities. For persons the subtypes comprise for example clergy, law or science; organizations can be commercial or governmental; locations can be subdivided into, for instance, countries, continents or regions; products include, for example, languages or shares; and finally events can be human or natural. An overview of this subdivision is presented in Figure 1.
Figure 1. Annotation scheme for named entities, with categories for main type, subtype, usage and metonymic roles (Desmet and Hoste, 2014, p.6)
2.1.1 Learning methods and approaches to perform NER
For the task of Named Entity Recognition (NER) different approaches or methods can be used. The two most well-known are handcrafted and machine learning systems. In handcrafted or rule-based systems every parameter or rule is made by hand. As the name implies, these systems recognize and classify named entities by applying rules. Although creating a rule-based system is time-consuming, it can be very useful for tasks for which training examples are not available. Machine learning systems, on the other hand, learn from data (i.e. training data), and three approaches can be discerned according to the particular data that is required to find the parameters:
• The first category is supervised learning, which is implemented when the system needs a corpus with marked entities. Although it has a lot of advantages, supervised learning also has a few shortcomings, the main one being the need for a large annotated corpus;
• Semi-supervised learning is used when the system needs only a small amount of marked entities because it attempts to improve the performance by using unmarked text;
• The last category is called unsupervised learning, which occurs when the parameters are estimated from unmarked text.
The focus in this study will be on supervised learning, which will be further discussed in the following sections.
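Before turning to supervised learning, the handcrafted alternative described above can be made concrete with a minimal sketch. The two rules, the trigger words and the function name `rule_based_ner` are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
import re

# Hypothetical handcrafted rules: each rule pairs a regular expression with
# the entity type it assigns. A real rule-based system would need many more.
RULES = [
    # A title followed by one or more capitalized words is taken as a person.
    (re.compile(r"\b(?:Mr|Ms|Mrs|Prof|Dr)\.? ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"), "PERSON"),
    # A capitalized word after a place preposition is taken as a location.
    (re.compile(r"\b(?:in|to|from) ([A-Z][a-z]+)"), "LOCATION"),
]


def rule_based_ner(text):
    """Return (entity, type) pairs found by applying every rule to the text."""
    entities = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            entities.append((match.group(1), label))
    return entities


print(rule_based_ner("Prof. Bernard De Clerck travelled from Ghent to Kortrijk."))
```

Every rule has to be written and maintained by hand, which illustrates why such systems are time-consuming to build, but no annotated training data is needed.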
2.1.1.1 Supervised learning
Today, supervised machine learning (SL) can be considered the most dominant technique for addressing several NER problems (Nadeau and Sekine, 2007; Desmet and Hoste, 2014; Konkol, 2012). In short, the SL method consists of a system that reads a large annotated corpus and creates or learns several disambiguation rules which are founded on discriminative features.
According to Nadeau and Sekine (2007, p.7) the idea of SL is to study the features of positive and negative examples of named entities over a large collection of annotated documents and design rules that capture instances of a given type. Desmet and Hoste (2014, p.16) claim that supervised machine learning demands that the information in a training corpus is shown as a collection of instances which have to be classified into a predefined set of classes.
In other words, for supervised learning the following information is needed:
• A training corpus which consists of a collection of annotated text material, e.g. Ik ga met de trein naar [Kortrijk] (‘I am taking the train to [Kortrijk]’) and Mijn naam is [Margot] (‘My name is [Margot]’);
• A classification or a division into categories, for example the six main types defined by Desmet and Hoste (2014), e.g. Kortrijk=LOCATION; Margot=PERSON;
• Features, which transform the classified training data into feature vectors; these training data are then referred to as instances:
‘Kortrijk’: starts with a capital, is part of a list of possible locations, is not part of a list of possible persons, … (Kortrijk, 1, 1, 0, LOCATION)
‘Margot’: starts with a capital, is not part of a list of possible locations, is part of a list of possible persons, … (Margot, 1, 0, 1, PERSON)
• An algorithm or machine learner, which is necessary to classify the different main types of named entities. In their study, Desmet and Hoste (2014) used three different techniques, namely Memory-Based Learning (MBL), Support Vector Machines (SVM) and Conditional Random Fields (CRF). Next to these techniques, Hidden Markov Models or Maximum Entropy models (Konkol, 2012) can also be used.
2.1.2 Features used for NER
Konkol (2012) distinguishes two categories of features based on the context the feature uses, namely local features (which only use a small neighbourhood of the classified word) and global features. Next to these two classes, also external features and list-lookup features are distinguished in his study, which will be explained in the next sections. Desmet and Hoste (2014) on the other hand, made a division between features for main type named entities and features for the subtypes of named entities. In this section, the most common features used for recognizing and classifying named entities will be discussed in greater detail.
The list-lookup strategy or gazetteer lookup strategy recognizes only entities stored in lists (also called gazetteers, lexicons or dictionaries) and can be used to indicate the relation ‘is a’ (for example, Belgium is a country). When a word such as Belgium belongs to a list of countries, then the probability that Belgium is indeed a country is high. Some of the advantages of the list-lookup approach are that the technique itself is simple, fast and applicable to any language. However, it cannot deal with ambiguity: due to word polysemy, the probability that a listed word actually belongs to the class can almost never be 1. Another disadvantage is that the gazetteers themselves are language dependent, so a list compiled for one language cannot simply be reused for another.
Before candidate words can be matched to elements of an existing list they have to be stemmed or lemmatized. Preprocessing techniques such as these are considered crucial for the use of gazetteers. The stemming technique tries to discover the root of each word, which means removing all affixes. Lemmatization on the other hand, is a task where the output is not a stem but a lemma, or the basic word form of a word. Stemming as well as lemmatization depend on the language used: highly inflectional languages for example simply need these two techniques to diminish the amount of different word forms.
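The list lookup and the normalization step described above can be sketched as follows. The two gazetteers are tiny hypothetical lists, and `normalize` is only a crude stand-in for real stemming or lemmatization (it lowercases and strips punctuation); the resulting tuples mirror the (Kortrijk, 1, 1, 0) style of feature vector used in section 2.1.1.1.

```python
# Tiny, hypothetical gazetteers; real lists contain thousands of entries.
LOCATIONS = {"kortrijk", "belgium", "amsterdam"}
PERSONS = {"margot", "luke", "smith"}


def normalize(token):
    """Crude stand-in for stemming or lemmatization: lowercase, strip punctuation."""
    return token.lower().strip(".,;:!?")


def feature_vector(token):
    """Map a token to (token, capitalized, in_location_list, in_person_list)."""
    key = normalize(token)
    return (
        token,
        int(token[:1].isupper()),  # orthographic feature
        int(key in LOCATIONS),     # list-lookup feature
        int(key in PERSONS),       # list-lookup feature
    )


for word in ["Kortrijk", "Margot", "trein"]:
    print(feature_vector(word))
```

A supervised learner would receive such vectors together with the gold label (LOCATION, PERSON, ...) of each instance.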
Another feature distinguished by both Nadeau and Sekine (2007) and Konkol (2012) is affixes, such as word endings. Affix features are usually language independent and, in addition, named entities of the same type often share the same word ending or prefix.
A feature that Desmet and Hoste (2014) as well as Konkol (2012) mention in their studies is orthographic information, which depends on the appearance of the word. Orthographic features determine whether a word is capitalized, whether it contains punctuation marks, etc. An advantage of this feature is that it can be effective for several languages since it is considered language independent.
In contrast to local features, global features use the whole document or corpus. Two examples of global features distinguished by Konkol (2012) are meta-information and previous appearances. E-mails or news articles are typical examples of texts that usually include meta-information. Meta-information, when available, can include various things that can be used as features for named entity recognition: email headers for example are good indicators of person names, news articles usually start with a location, etc. Still, using meta-information can have a negative impact on adaptability because in some domains or other data sources meta-information will not be available.
Marked previous appearances of named entities can come in very handy since new appearances of those named entities will presumably be classified in the same class. This feature can also function as a postprocessing step: after the named entities are classified they can still be altered to another class with a higher probability.
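Used as a postprocessing step, the previous-appearances idea can be sketched as below. Relabelling by majority vote is our own simplifying assumption, not necessarily the procedure described by Konkol (2012), and the function name `relabel_by_majority` is hypothetical.

```python
from collections import Counter, defaultdict


def relabel_by_majority(predictions):
    """Give every mention of an entity the label most often assigned to it.

    `predictions` is a list of (entity, label) pairs in document order.
    """
    votes = defaultdict(Counter)
    for entity, label in predictions:
        votes[entity][label] += 1
    # Replace each label by the most frequent label for that entity.
    return [(entity, votes[entity].most_common(1)[0][0]) for entity, _ in predictions]


predictions = [
    ("Washington", "LOCATION"),
    ("Washington", "PERSON"),
    ("Washington", "LOCATION"),
    ("Kortrijk", "LOCATION"),
]
print(relabel_by_majority(predictions))
```

In this toy input the single PERSON label for Washington is overruled by the two LOCATION labels.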
Next to meta-information and previous appearances, Nadeau and Sekine (2007) also distinguish coreferences and aliases as global features. They suggest that coreferences are ‘the occurrences of a given word or word sequence referring to a given entity within a document’. When features are derived from coreferences, the context of every occurrence is taken into account. An entity has aliases when it is written in several ways in a document.
However, it is very hard to find coreferences and aliases in a document: this problem can be compared to the problem of finding all the occurrences of an entity in a text. Another major problem is to recognize entity mentions in more than one document.
The last category distinguished by Konkol (2012) is external features. Wikipedia is the most important example in this class since it is updated manually, so new entities are continuously added, and it also includes redirection pages with different spelling variations of the same entity.
In the table below, an overview is given of the features mentioned in the previous paragraph:
Features                Konkol (2012)   Nadeau and Sekine (2007)   Desmet and Hoste (2014)
Orthographic features   x                                          x
Affixes                 x               x                          x
Words                   x                                          x
Stemming                x               x
Lemmatization           x               x
Meta information        x               x
List-lookup features    x
Previous appearances    x
External features       x
Context                                                            x
Coreferences                            x
Aliases                                 x
Table 1. Overview of the features used for NER
2.1.3 Challenges and issues for NER
There are a lot of factors that can significantly influence the performance of named entity recognition systems. Desmet and Hoste (2014), for example, argue that one of the most important challenges for NER research is ‘to classify named entities into a hierarchy of subtypes instead of the coarse main type categories’ (Desmet and Hoste, 2014, p.2). A division into subtypes would be interesting for applications using Question Answering or Information Retrieval.
In Konkol’s opinion (2012) the factors that most strongly influence the performance of NER systems are the language, the domain and the searched entities.
The language itself can be seen as the most important factor. Since the very first rule-based systems were specifically built for a certain language, it was impossible to adapt them to different languages. Supervised machine learning trains a model on an annotated corpus, which is usually accessible for languages such as English, but not for most minority languages such as Dutch. However, thanks to research most features can now be determined regardless of the language, and because of this, many systems can be used for different languages.
Next to language, applying NER systems to a different domain can also alter the performance according to Konkol (2012) and Desmet and Hoste (2014). In fact, the problem of domain adaptation can be seen as a very important challenge of NER (Konkol, 2012; Desmet and Hoste, 2014; Nadeau and Sekine, 2007). Usually, the systems are trained on one domain and, consequently, systems using those kinds of corpora do not perform very well on unseen genres (Desmet and Hoste, 2014). Although it would be ideal if they could also be used for other domains, this is not always the case, as the performance of most NER systems drops even when the domains differ only slightly.
Another important factor consists of the searched entities (Konkol, 2012). Some classes are easier to find than others: countries for example are easier than organizations, but this all hinges on the definition of the class. What is more, there are also several levels of detail: some researchers use coarse-grained categories, others use fine-grained ones. At MUC-6 for instance, three types of tags were defined (namely Enamex, Timex and Numex) with two or three subtypes for each class, while at CoNLL-2002 four types of named entities were distinguished (namely persons, organizations, locations and miscellaneous). Desmet and Hoste (2014) on the other hand, created a fine-grained annotation scheme for named entities in Dutch. In their study, they describe six main types (namely persons, organizations, locations, products, events and miscellaneous entities), each with their own subtypes.
Another challenge is the metonymic usage of entities (Desmet and Hoste, 2014). Sometimes entities can be very hard to classify due to ambiguity: it is possible that entities like these can be part of more than one category according to their usage. Two forms of metonymy can be distinguished: nickname metonymy, which means that the name of an entity is applied to refer to another entity (e.g. a city referring to a government: Washington for example is a metonym for the federal government of the U.S.), and cross-type metonymy, when several aspects of one entity are referred to (e.g. organizations and their facility, such as The White House). Metonymic usage detection can be very useful for other tasks such as coreference resolution or information extraction.
2.2 NAMED ENTITY RECOGNITION ON TWITTER
Twitter is an online social networking service that allows users to not only publish tweets, i.e. posts with a 140 character limit, but also to read tweets of the people they follow. In addition to this, Twitter can be considered one of the fastest growing social networking services regarding users and documents: today Twitter counts 288 million monthly active users².
What is more important for this study is that performing NER on Twitter data can be very useful in order to determine people’s interests. In the following sections, we will present some of the challenges and issues for NER in social media and we will give more information about targetable named entities.
2.2.1
Challenges and issues for NER in social media
Ritter et al. (2011) found that it is rather difficult to perform named entity recognition on tweets. In their study they give two reasons for this problem. Firstly, tweets usually contain a very wide range of named entity types: not only products, but also films, companies, etc. In addition to this, except for people and locations, almost all types of named entities appear rather infrequently in tweets. This means that even a large dataset of manually annotated tweets will include only a few training examples of each type. Secondly, since there is a 140 character limit for tweets, sufficient context information is not always available to determine the entity types. This is shown in the few examples presented below, which were taken from the corpus collected in the framework of this dissertation (see Chapter 3 for more information).
- Wake me up when de septemberverklaring ends.
- @flyingjonas Aha, leuk :-) Da's daar echt fijn ;-) En dikke merci! #drukdrukdruk
- @VanDammeEDU @rvdwalle @DeMoorBart Dat delen we. Spijtig die onnodige polemiek.
² https://about.twitter.com/company (accessed 25-03-2015)
Ritter et al. (2011) also state that off-the-shelf named entity recognizers (i.e. tools trained on news corpora) perform poorly on tweets. The reason for this is that named entity recognizers trained on standard data rely heavily on capitalization, which is far less reliable in tweets. In addition, there are a lot of different styles of capitalization in social media. Sometimes capitalization in tweets is informative, while at other times it is merely used for emphasis. In order to solve this problem, the context of the message can be used to decide whether a tweet is capitalized in an informative manner, for example by means of a capitalization classifier (Ritter et al., 2011), which has been shown to improve performance.
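The idea of deciding whether the capitalization of a tweet is informative can be illustrated with a simple heuristic. This is not the classifier of Ritter et al. (2011), which is a trained model; the function name and both thresholds below are hypothetical and serve only as a sketch.

```python
def capitalization_is_informative(tweet, low=0.0, high=0.6):
    """Heuristically decide whether capital letters in a tweet can be trusted.

    The thresholds are hypothetical. A tweet counts as uninformative when none
    of its words is capitalized or when most of them are (e.g. emphasis in caps).
    """
    # Keep only tokens that start with a letter (skips @userIDs, #hashtags, emoticons).
    words = [w for w in tweet.split() if w[:1].isalpha()]
    if not words:
        return False
    capitalized = sum(1 for w in words if w[0].isupper())
    fraction = capitalized / len(words)
    return low < fraction <= high


print(capitalization_is_informative("Wake me up when de septemberverklaring ends."))
print(capitalization_is_informative("THIS IS SO GOOD"))
```

A NER system could fall back on case-insensitive features whenever such a check returns False.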
Another problem regarding named entity recognition for Twitter and other online services was addressed by Jung (2012). Several machine learning algorithms have been used to create NER systems. These all require that a certain number of entities are labeled in advance to create a gold-standard dataset. This process occurs offline, which is not suitable for social media such as Facebook or Twitter where texts are dynamically streamed.
Next to this, social networking services require a limited text, which we call a microtext, that users can only publish once. As mentioned before, the character limit of one tweet is only 140 characters. Consequently, many NER systems have difficulty accurately identifying the context of each microtext. To solve this problem, Jung (2012) proposes to create microtext clusters so that contextual relationships between microtexts can be found. His research shows that the best NER results are obtained by using such microtext clusters.
2.2.2 Targetable Named Entities
Present NER systems recognize named entities such as persons, locations or organizations very well. Still, many systems have great difficulty recognizing other types of named entities such as films (for example Frozen or Harry Potter), songs or books. Named entity recognition for these kinds of entities may become more and more important since social media are a place where people read about various topics and where they can share information about a wide range of interests. Therefore, named entity recognition for these ‘non-traditional’ entities can be seen as an important challenge. Ashwini and Choi (2014) present three possible problems that can occur in this case.
Firstly, non-traditional entities are more difficult to recognize since some of them tend not to appear in the form of noun phrases, which traditional named entities do. Nevertheless, entities like these, which are targetable named entities, can be part of a closed set so that the space is already known for certain types. By means of this set, a system is able to anticipate the general form of non-traditional entities.
Below we present a few examples of tweets containing targetable named entities, which are underlined.
- Nogal een zwak einde vind ik #witse
- 'a song to dream away from ... #OfMonstersandMen #ILike :p
- Het is van FC De Kampioenen: De Film geleden dat ik nog zó'n kutfilm heb gezien. #Lucy
Secondly, creating a large dataset containing enough entities can be very difficult. Ashwini and Choi (2014) argue that non-traditional entities make up only a small portion of the data in random streams, which is not sufficient to build a good statistical model (Ritter et al., 2011; Gattani et al., 2013; quoted by Ashwini and Choi, 2014, p.1).
For this reason, they suggest that in order to create an efficient statistical model for this kind of entity, a more focused dataset has to be created.
A third problem suggested by Ashwini and Choi (2014) is that it is very difficult to construct a good statistical model that can be used for a long period of time, since non-traditional entities, in contrast with traditional ones, are updated more frequently: new books, for example, are published almost every week. In this case, retraining models and annotating more data for new entities can be very useful. However, retraining is a complex process and, moreover, it is not always clear which data should be annotated. This is why a dynamically adjustable system has to be created that does not need to be retrained.
In general, many NER systems have to retrain their statistical models when new entities arrive. Ashwini and Choi (2014), however, found a solution to this problem. In their study they present a new approach to recognize targetable named entities, which is more flexible for entities that are updated regularly. For this new approach, they distinguish three stages in the named entity recognition system (normalization, candidate identification and entity classification). They argue that these stages are general enough to analyse other targetable named entities.
Figure 2. The architecture of the NER system (Ashwini and Choi, 2014, p.3)
Another challenge in the NER process involves the so-called ‘noise’, which is usually introduced by hyperlinks, but also by hashtags consisting of more than one word.
As Figure 2 shows, in the first step of the named entity recognition system of Ashwini and Choi (2014) the tokens are normalized, which involves a few rules. Hyperlinks, punctuation marks and articles, for example, are first removed, as well as irrelevant hashtags, retweet markers and user IDs. As a result, tweets then appear in their normalized form with only relevant information, which brings us to an advantage of normalization, namely that it generates more relevant features.
The second step is called candidate identification, which means that ‘candidate entities are identified by matching token sequences in the tweet to entries in a gazetteer’ (Ashwini and Choi, 2014). After this, every candidate entity is automatically tagged with four possible types of matching, namely full title match, main title match, sub-title match and sequel match, to be used as features for the third step.
Finally, in the third step or entity classification a statistical model trained on several features is used to determine whether each candidate entity is a valid entity or not. Three different types of features are used to create this model: firstly orthographic features such as hashtags, number of tokens or capitalizations. Features such as these are considered more useful than the other two features when tweets are written in a certain manner, for example when capitalization and hashtags are used in the right way. Secondly, n-gram features are used to indicate the context of the candidate entities, and the third type of features used for the statistical model are syntactic features, such as word-forms.
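The first two stages described above can be illustrated with a minimal Python sketch. It is not the implementation of Ashwini and Choi (2014): the gazetteer entries, the list of articles and the function names are assumptions chosen for illustration, hashtags are simplified by keeping the word and dropping the `#` marker, and the third stage (the trained classifier) is left out because it requires annotated training data.

```python
import re

# Hypothetical mini-gazetteer of film titles (illustrative entries only).
GAZETTEER = {"frozen", "harry potter", "lucy"}
# Assumed list of English and Dutch articles to strip during normalization.
ARTICLES = {"the", "a", "an", "de", "het", "een"}


def normalize(tweet):
    """Stage 1: remove hyperlinks, retweet markers, user IDs, punctuation and articles."""
    text = re.sub(r"https?://\S+", " ", tweet)  # hyperlinks
    text = re.sub(r"\bRT\b", " ", text)         # retweet markers
    text = re.sub(r"@\w+", " ", text)           # user IDs
    text = text.replace("#", " ")               # simplification: keep the hashtag word
    tokens = re.findall(r"\w+", text.lower())   # drops punctuation marks
    return [t for t in tokens if t not in ARTICLES]


def find_candidates(tokens, gazetteer=GAZETTEER, max_len=4):
    """Stage 2: match token sequences of up to max_len tokens against the gazetteer."""
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            phrase = " ".join(tokens[start:end])
            if phrase in gazetteer:
                candidates.append((start, end, phrase))
    return candidates


tokens = normalize("RT @user: Het is van Frozen geleden http://t.co/x #Lucy")
print(tokens)
print(find_candidates(tokens))
```

Each candidate is returned as a (start, end, phrase) triple, so that a later classification step could look at the tokens surrounding the match when computing orthographic or n-gram features.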
3
RESEARCH
The main purpose of this research is to explore to what extent we can automatically find and extract useful information using Natural Language Processing (NLP) techniques on a corpus of Twitter data. Based on tweets, we want to find out whether it is possible to determine a person’s political ideology, if he/she likes watching tv, what kind of job he/she has, etc.
In order to conduct this research we first of all needed to collect Twitter data from several active Twitter users. In a next step, these tweets were annotated and manually classified into main and subcategories. Based on these categories, a user profile for each test subject was created. These steps allowed us to express the interests of our test subjects. In a final step, we explored if basic NLP systems can automate or at least help to accelerate this process.
3.1
Data collection
Before continuing, we would like to point out that this data collection was performed in collaboration with another Master’s student³ working on a similar subject, i.e. sentiment analysis of the different aspects people tweet about.
In order to collect Twitter data for our research, we focused on active Twitter users, i.e. people who had posted at least 3,000 tweets in total and who tweet to a large extent in Dutch. In a first step we consulted our friends and also browsed Twitter.com for well-known active Twitter users.
Our two main criteria were:
• To only contact working people, but because this resulted in too few test subjects, we decided to also include students;
• To obtain an equal number of men and women.
³ Virginie Bardyn, It’s not just Twitter. Sentiment Analysis in Twitter: contrasting a manual and an automatic approach (2015)
Taking into account these two criteria, eleven men and thirteen women were contacted in order to obtain their permission to download and use their tweets for our research. An example mail sent to our test subjects can be found in Appendix 7.1. In order to respect our test subjects’ privacy, we will refer to them in the remainder of this dissertation as Man 1, Man 2, … Man 10 and Woman 1, Woman 2, … Woman 10.
As soon as we received their permission, around 3,000 tweets per person were automatically downloaded. This number is restricted to around 3,000 because that is approximately the maximum number of tweets per user that the Twitter API⁴ allows to be downloaded. The table below reveals how many tweets we were able to download per Twitter user.
Test subject  # tweets    Test subject  # tweets
Man 1         3246        Woman 1       3205
Man 2         3232        Woman 2       3235
Man 3         3237        Woman 3       3221
Man 4         853         Woman 4       2533
Man 5         3203        Woman 5       1984
Man 6         3004        Woman 6       3162
Man 7         3211        Woman 7       3236
Man 8         375         Woman 8       3232
Man 9         3185        Woman 9       3242
Man 10        2459        Woman 10      3205
Table 2. Overview of the test subjects’ dataset
As can be derived from these figures, for some users we were not able to download 3,000 tweets. Some of them did not use Twitter regularly enough or were new to Twitter. Nevertheless, we decided that this amount of data would be sufficient to conduct our research.
In a next step one hundred tweets per person were manually selected for annotation. In this respect, it was crucial to select tweets that would be of interest to this research: we specifically searched for tweets including a sufficient number of named entities. The 2,000 resulting tweets were annotated into main and subcategories as will be described in closer detail in the next section.
⁴ Application Programming Interface: a defined way for a program to accomplish a task by retrieving or modifying data. The Twitter API is used to make websites, widgets, applications, etc.
3.2
Data annotation
As mentioned in the previous section, one hundred tweets per person were manually selected based on the presence of named entities. Since Twitter is an online community where people write about a very wide range of topics of interest, targetable named entities such as books or movies were also included.
In a next step, these 2,000 tweets were annotated by manually classifying the entities occurring in the respective tweets into main and subcategories. Subcategories are directly triggered by particular named entities whereas the main categories group these entities into various topics. A subcategory was only defined when it appeared in the tweets of at least three of our individual test subjects, with the exception of subcategories that were clearly characteristic of a particular person. Figure 3 gives a schematic overview of the different main and subcategories.
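The selection rule for subcategories can be expressed as a short sketch. The annotation in this research was done manually, so the code below is only an illustration of the rule; the function name, the (test subject, subcategory) pair format and the `always_keep` parameter for the manually chosen exceptions are assumptions.

```python
from collections import defaultdict


def frequent_subcategories(annotations, min_users=3, always_keep=()):
    """Return subcategories used by at least min_users distinct test subjects.

    annotations: iterable of (test_subject, subcategory) pairs (assumed format).
    always_keep: subcategories retained as manual exceptions.
    """
    users_per_subcategory = defaultdict(set)
    for subject, subcategory in annotations:
        users_per_subcategory[subcategory].add(subject)
    return sorted(
        subcategory
        for subcategory, users in users_per_subcategory.items()
        if len(users) >= min_users or subcategory in always_keep
    )


# Toy annotations, for illustration only.
annotations = [
    ("Man 1", "politics"), ("Woman 4", "politics"), ("Woman 10", "politics"),
    ("Man 5", "horseback riding"), ("Man 2", "culture"),
]
print(frequent_subcategories(annotations, always_keep={"horseback riding"}))
```

Counting distinct test subjects rather than tweets prevents a single very active user from creating a subcategory on his or her own, unless it is explicitly kept as an exception.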
The first main category that can be distinguished is the category Current affairs. The subcategories comprised in this category are economics, politics, culture, home affairs and foreign affairs. Below, a few examples of this main category can be found.
1. Exclusieve beelden van Jan Jambon⁵, vlak na zijn keisupermegaprovocerende stunt: (Man 1; subcategory politics);
2. Mooie bekroning voor Kailash en Malala. Samen in de bres voor recht op onderwijs voor alle kinderen. #Nobelprijs #vrede (Woman 4; subcategory foreign affairs);
3. @FlorVDE wat maakt het uit, wat er al dan #nietinhetregeerakkoord staat? schrijf het er achteraf gewoon bij, N-VA-style! (Woman 10; politics).
⁵ For the overview of the main and subcategories, the named entities that triggered a particular subcategory were underlined.
⁶ For the overview of the main and subcategories, every hyperlink was replaced by a placeholder word in order to avoid confusion.
Current affairs: Economics, Politics, Culture, Home affairs, Foreign affairs
Professional activity: Studies, Work
Public transport: NMBS, De Lijn
Television: Series, Films, Programmes, General
Music: Performances, Radio, General
Sport: Cyclo-cross, Athletics, Volleyball, Tennis, Box, Cycling, Football, Hockey, Horseback riding
Pastime: Holiday, Travelling, Leisure, Games
Personal: Health, Losing weight, Relationships, Birthdays, Disposition, Holidays, House, Religion
Technology: Social media, Technology
Food: Vegetarian, Cooking, General, Restaurant
Clothes
Literature
Trivia
Figure 3. Overview of the main categories and subcategories used for the named entity annotation.
The second main category is Professional activity, which can be divided into the subcategories studies and work. Below a few examples are presented:
1. Yes yes yes, examenkaart bijna vol, nog 1 examen #GoGoGo (Man 4; subcategory studies);
2. Blijkbaar krijg je als interim-bediende ook een 13e maand. Nee, dat wist ik niet want niemand zegt mij ooit iets. Toch: hoera! (Woman 2; subcategory work);
3. Les om 8u is erg, inhaalles om 8u is erger. #ugh (Woman 5; subcategory studies).
Another topic is Sport. This main category is split up into the following subcategories: cyclo-cross, athletics, volleyball, tennis, box, cycling and football. As mentioned before, only the subcategories that appear in the tweets of at least three of the twenty Twitter users were used for this research. However, we decided to also incorporate the two subcategories horseback riding and hockey, since they proved to be characteristic of two test subjects, namely Man 5 and Woman 3. A few examples of the category Sport are:
1. Krrriebels, nerveusiteit en spanning. Binnen een uurtje beginnen de wereldruiterspelen van Normandië! #WEG2014 (Man 5; subcategory horseback riding);
2. Wat was het EK hockey fantastisch! Wou dat het nog bezig was :) Veel bijgeleerd in een zeer leuke setting. (SHDB). (Woman 3; subcategory hockey);
3. RT @directvelo: Jens Vandenbogaerde, zegekoning bij de beloften #directvelo (Man 8; subcategory cycling).
The fourth main category distinguished for this research is Public Transport. Since this is a topic many Twitter users write about, a subdivision was made between NMBS and De Lijn. Below a few examples of this main category are presented.
1. Grote fan van de supertram van @delijn. Eindelijk het gevoel dat er eens écht iets gedaan wordt aan het vaak chronische tekort aan plaatsen. (Man 3; subcategory De Lijn);
2. De NMBS vindt het niet nodig dat mensen na 23u nog thuisgeraken vanuit Antwerpen, dus hebben we Taxi Mama nog moeten bellen. #failnmbs (Woman 7; subcategory NMBS);
3. RT @Ultimedia: @nmbs deze actie is ronduit schandalig, geen trein, geen infornatie. Misschien moeten jullie pluimvee ipv mensen gaan vervoeren (Man 5; subcategory NMBS).
For this research, not only standard named entities were used but also targetable named entities or non-traditional entities. The largest number of targetable named entities can be found in the main category Television, which can be split up into the subcategories series, films, programmes and a general category. Below, we present some examples of the category Television.
1. Kijken op @een hoe Jeroen Meus in een Afrikaanse township met de locals een schaap slacht en volledig opeet, het doet me wel iets. #goedvolk (Man 2; subcategory programmes);
2. Despicable me 2, u was fantastisch :D (Man 5; subcategory films);
3. Nieuwe parks en recreation! nieuwe episodes! en niemand vond het nodig mij dat te melden? (Woman 10; subcategory series).
Based on the Twitter data of the test subjects, another topic was distinguished, namely Food. Since there are many ways to talk and write about food, this category was divided into a restaurant, cooking, vegetarian and general subcategory. A few examples found in the tweets of the twenty Twitter users are:
1. Gaan lunchen in Faim Fatale in Gent, de moeite. (Man 2; subcategory restaurant);
2. ''Kot''made wraps. Njammiee (Man 4; subcategory cooking);
3. @MelkMuylle Yum. Kaimokka met slagroom. (Woman 9; subcategory general).
The next main category is Music, which also includes quite a lot of targetable named entities, such as song titles. The subcategories comprised in this category are radio, performances and a general subcategory. Below, a few examples are presented.
1. Wakker worden met Liszt & Chopin (hun muziek, niet hun opgedolven lijken) en pyjamaloos soezen in het ontluikende licht. Ochtend perfectie. (Woman 9; subcategory general);
2. Verdoemmeee @SherpaBeNL , beetje meewerken graag. #arcticmonkeys (Man 10; subcategory performance);
3. oooh, radioplus, wat ben je eigenlijk toch waardeloze crap! ik probeer hier wel naar de tijdloze 100 te luisteren he! (Woman 10; subcategory radio).
Another frequently occurring category is Pastime. The following subcategories can be distinguished: holiday, travelling, games and leisure. Below, a few examples of this main category are mentioned.
1. And that's why England is awesome. #uktrip (Woman 5; subcategory travelling);
2. Genieten van t'zonnetje :D (@ Scott's Bar w/ 5 others) (Man 4; subcategory leisure);
3. Super Bad Mario 2! JA! (Man 1; subcategory games).
Twitter is a social networking site where people share a great part of their private life. For this reason, the main category Personal was distinguished as well. The most frequent subcategories comprised in this category are health, losing weight, relationships, birthdays, disposition, holidays, religion and house. Below, some examples are presented.
1. De lijn tussen in hele goeie doen zijn en ziek worden is toch heel dun. Noodgedwongen even rusten om er daarna een mooi einde aan te breien! (Man 8; subcategory health);
2. Godverdomme. Ik heb een nieuw gewichtsdoel voor ogen en heb nog niet eens het oorspronkelijke doel bereikt. KUT. (Woman 8; subcategory losing weight);
3. Volgens mij wil de dalai lama geen opvolger omdat hij het ook niet meer kan verdragen dat er mensen zijn die ''daila lama'' zeggen. (Man 1; subcategory religion).
Nowadays, Technology is a very popular topic to write about, as can be derived from the 2,000 selected tweets. This category was divided into a subcategory social media and a general subcategory. Some examples of this category are:
1. Mashable.. wat een voddenwebsite is dat geworden zeg :/ #crap (Man 5; subcategory general);
2. @Alineagain Ik haat Facebook! (Woman 7; subcategory social media).
Next to the previously mentioned main categories, Literature was distinguished as a category as well. It does not have any subcategories. Below some examples are presented.
1. Heel benieuwd naar de nieuwe Houellebecq, vanaf morgen in de boekhandels. (Man 2);
2. Well played, De Standaard, well played. (Woman 8);
3. @LesleyArens 'The Circle' van Dave Eggers net gelezen, is super! (Man 7).
Another main category without any subcategories is Clothes, about which especially women write rather frequently. A few examples that were found in the tweets of our test subjects are presented below.
1. @_Prince_R @TomDeeCee Ik ben altijd erg opgezet met juwelen en lingerie. (Woman 9);
2. Fan van Essentiel en ook voor 40 plus @CoolsKat @reyerslaat :-) (Woman 6).
The last main category is called Trivia, which has a weather subcategory. A few examples are:
1. Herfstzon #zalig #gent (Man 3);
2. pfff, tis om er depressief van te worden :( #HetWeer (Man 4).
In the process of manually classifying the named entities occurring in the tweets into main and subcategories, we also encountered some doubtful cases of which a few are presented below:
1. Fan van Essentiel en ook voor 40 plus @CoolsKat @reyerslaat :-) (Woman 6; main category Clothes).
We decided to classify this tweet in the category Clothes since in this case the word ‘Essentiel’ is of greater importance than ‘@reyerslaat’.
2. Tom Helson @CafeCorsari machtig moedig en warm (Woman 6; subcategory performances).
This particular tweet was categorized in the main category Music given that we believed that the essence of the tweet is about the performance in the television programme Café Corsari, and not about the programme itself.
3. Wees toch eens blij dat die stream van Apple niet werkt, anders zou je plots zomaar met U2 geconfronteerd kunnen worden (Man 1; main category Technology).
We state that this tweet is about the malfunction of the Apple stream, so, as a result, this tweet was classified in the category Technology.
3.3
User profiling
Based on the gold standard annotations, we first manually created a user profile for each test subject.
In a next step, the Twitter data was automatically processed using the LeT’s Preprocessing Toolkit (LeT’s). For this purpose, we used all the tweets we had available per test subject, i.e. around 3,000 tweets (see Table 2 for the exact numbers). With LeT’s, all Twitter data was tokenized and part-of-speech tagged and, in a final step, named entity recognition was performed. The principle of NER was explained in Chapter 2. Tokenization is ‘the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens’. Part-of-speech tagging is then ‘the task of assigning each token in a text its correct grammatical category (e.g. noun, verb, adjective, adverb, etc.) depending on its context’ (Van de Kauter et al., 2013, p. 105).
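To make the notion of tokenization more concrete, the following sketch splits a tweet into tokens with a regular expression. It is not the tokenizer of the LeT’s Preprocessing Toolkit, whose interface is not described here; the pattern and the function name are assumptions for illustration only.

```python
import re

# Assumed token pattern for tweets, tried from left to right at each position.
TOKEN_PATTERN = re.compile(
    r"https?://\S+"         # hyperlinks
    r"|[@#]\w+"             # user mentions and hashtags
    r"|\w+(?:['’-]\w+)*"    # words, including internal apostrophes and hyphens
    r"|[^\w\s]"             # any other single symbol, e.g. punctuation
)


def tokenize(tweet):
    """Break a tweet up into a list of tokens."""
    return TOKEN_PATTERN.findall(tweet)


print(tokenize("@Alineagain Ik haat Facebook! #crap"))
```

Keeping user mentions, hashtags and hyperlinks as single tokens is a design choice that suits Twitter data, because splitting them would produce fragments that a part-of-speech tagger cannot label meaningfully.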
Based on the output of this system, the top 100 most frequently occurring nouns and named entities were derived for every individual test subject. In a final step we assessed whether having the top 100 most frequently occurring nouns and named entities available allows us to derive the same information as present in the manually created user profiles.
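Deriving such a frequency list from tagged output can be sketched as follows. The input format (a list of token and tag pairs) and the tag labels `NOUN` and `NE` are assumptions for illustration and do not reflect the actual output format of LeT’s.

```python
from collections import Counter


def top_terms(tagged_tokens, keep_tags=("NOUN", "NE"), n=100):
    """Return the n most frequent tokens whose tag is in keep_tags.

    tagged_tokens: list of (token, tag) pairs (assumed format).
    Tokens are lowercased so that spelling variants are counted together.
    """
    counts = Counter(
        token.lower() for token, tag in tagged_tokens if tag in keep_tags
    )
    return counts.most_common(n)


# Toy example with hypothetical tags.
tagged = [("trein", "NOUN"), ("NMBS", "NE"), ("rijdt", "VERB"), ("trein", "NOUN")]
print(top_terms(tagged, n=2))
```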
4
USER PROFILING ANALYSIS
For this research, a user profile was manually created for each test subject, which is presented in Section 4.1. In a next step, we assess whether an automatic analysis yields the same information (see Section 4.2).
4.1
Manual creation of user profiles
For our research, we manually selected one hundred tweets per person that included a sufficient number of named entities, which were then annotated into main and subcategories. For the manual creation of the user profiles, which is the next step, we took into account those main categories which appeared at least ten times in the tweets and which are therefore clearly characteristic of a particular person. As soon as a certain main category frequently appeared in the Twitter data, the subcategories were explored as well. An overview per test subject of the frequencies of the main categories is presented in Appendix 7.2.
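The threshold of ten tweets can be illustrated with a small sketch. The profiles in this research were created manually, so the function below is only an illustration of the selection rule; its name and the dictionary format are assumptions, while the counts are those reported for Man 1 in Table 3.

```python
def significant_categories(frequencies, min_count=10):
    """Return (category, count) pairs with count >= min_count, most frequent first."""
    selected = [(cat, n) for cat, n in frequencies.items() if n >= min_count]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Main category counts of Man 1 (categories without tweets are omitted).
man_1 = {
    "Current affairs": 18, "Professional activity": 4, "Sport": 3,
    "Television": 21, "Food": 4, "Music": 9, "Pastime": 8,
    "Technology": 3, "Literature": 2,
}
print(significant_categories(man_1))
# → [('Television', 21), ('Current affairs', 18)]
```

Since one hundred tweets were selected per test subject, each count can also be read directly as a percentage of that person's annotated tweets.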
In this section, all twenty user profiles are described in closer detail.
4.1.1
Men
The table below reveals the frequency of the main categories which appeared in the Twitter data of the men, the most significant categories of which were indicated in bold.
                      Man 1   Man 2   Man 3   Man 4   Man 5   Man 6   Man 7   Man 8   Man 9   Man 10
Current affairs       18      13      2       7       2       4       7       7       3       /
Professional activity 4       /       3       15      3       7       7       1       5       10
Sport                 3       /       4       3       38      31      4       66      35      28
Public transport      /       /       17      7       8       2       1       /       3       1
Television            21      23      37      8       10      16      7       /       7       24
Food                  4       12      3       11      3       /       4       1       1       15
Music                 9       29      9       7       2       8       3       /       22      9
Pastime               8       1       2       14      6       /       1       4       1       3
Technology            3       4       2       8       10      /       /       /       1       /
Literature            2       6       1       /       4       1       18      /       /       /
Clothes               /       /       /       /       /       /       /       1       /       /
Weather               /       /       1       2       1       /       /       7       3       /
Table 3. Overview of the main categories of the men
• Man 1 tweets a lot about Television (21 tweets) and Current affairs (18 tweets). Based on the analysis of the subcategories, we can state that he is quite fond of watching science fiction films and, in addition to this, he also takes a particular interest in politics (both 12 tweets to be exact). Having examined this subcategory, we can assume that he is often frustrated with the political system in Belgium, as the following tweet illustrates: ‘Ik vind dat er in die nieuwe regering vooral te weinig ménsen zitten’.
• Man 2 seems particularly interested in Music, since this category accounts for almost 30% of his Twitter data (29 tweets to be exact). Having analysed the different subcategories, we can establish that he very often turns on the radio. Next to this, we could presume that he has an interest in Food and that he spends quite some time in front of the TV (12 and 23 tweets respectively). Another topic he seems to be quite interested in is Current affairs, which was recognised in 13 tweets. Based on the analysis of the subcategories, we can assume he takes a keen interest in culture (10 tweets), which can be illustrated by the following tweet: ‘Wat moet mee naar het Vlaams Paviljoen op Wereldtentoonstelling 2020 in Dubai? Chokri Ben Chikha’.
• It is clear from his Twitter feed that Man 3 likes to watch Television (37 tweets). Based on the analysis of the subcategories, we could conclude that één might be his favourite TV channel. On top of that, it can be assumed that Man 3 is often frustrated with Public transport in Belgium, especially with the NMBS (17 tweets to be exact).
• Man 4 seems to be mostly concerned with his Professional activities, which account for 15% of his tweets. What is more, after having examined his Twitter data, we can state that he is a student (13 tweets), of which the following tweet is a good indication: ‘Voor alles geslaagd :D’. Next, the main category Pastime has a frequency of 14%, which could be an indication of a quite busy private life. In addition, since the category Food has a frequency of 11%, we can assume that he is quite interested in having a healthy lifestyle and regularly goes out for dinner.
• Man 5 apparently has a broad range of interests, since almost every main category was distinguished in his tweets. Based on the 38% frequency of the category Sport, he can be considered a real sports fanatic, and in particular a fan of horseback riding (38 tweets). On top of that, he seems specifically interested in following programmes (8 tweets) such as Vive le vélo and Villa Vanthilt. Moreover, the 10% frequency of the category Technology could be an indication of him being very active on social media and hence being a frequent user of technology.
• Just like the other men, Man 6 seems to tweet quite often about Sport (31 tweets). Having examined the frequencies of the various subcategories, we could assume that he is particularly interested in football, which accounts for more than 20% of his tweets (21 tweets to be exact). In addition to this, Television could also be considered one of his main interests (16 tweets), particularly programmes (13 tweets) such as De slimste mens ter wereld and Goed volk.
• Man 7 appears to be mainly interested in Literature since this category accounts for almost 20% of his tweets (18 tweets exactly). What is more, after having analysed the subcategories that were recognised in his Twitter feed, we can conclude that he specifically likes to read opinion pieces (10 tweets), which is illustrated by the following tweet: ‘RT @FVMas: De altijd scherpzinnige @wduyck lijkt oplossing al gevonden te hebben. Knappe opinie.’.
• Man 8 can be considered the most passionate of all men about Sport given that this main category was distinguished in 66 tweets. The overview of the different subcategories reveals that he is a real cycling enthusiast, since this is the only sport he seems to tweet about. Next, the Weather is also a category that he frequently mentions in his tweets (namely in 7 tweets), which is logical considering the fact that this is often linked with cycling, as can be derived from his Twitter feed.
• Considering the fact that the category Sport has a 35% frequency in his Twitter data, we could assume that Man 9 is a real sports fanatic. After analysing the subcategories, we can state that football is obviously his main interest, seeing that it accounts for more than 30% of his tweets (32 tweets exactly). Furthermore, another interest of his appears to be Music (namely 22 tweets). Tweets such as ‘Hoera nieuwe cd's! #metronomy #radiohead’ indicate that he regularly buys new CDs.
• Man 10 does not seem to write about a wide range of interests. Having analysed the main categories that were recognised in his tweets, we can conclude that he mostly writes about Sport (28 times), and what is more, only about football. Furthermore, Man 10 also spends quite some time watching TV (24 tweets), as can be derived from the Twitter data. Based on the frequency of the subcategories, we can conclude that he likes to follow series (12 tweets) such as Dexter, Suits and Modern Family. In addition to this, he also seems to be quite a food lover given that this category accounts for 15% of his tweets.
4.1.2
Women
The table below reveals the frequency of the main categories which appeared in the Twitter data of the women, the most significant categories of which were indicated in bold.
                      Woman 1  Woman 2  Woman 3  Woman 4  Woman 5  Woman 6  Woman 7  Woman 8  Woman 9  Woman 10
Current affairs       1        10       1        46       2        38       1        10       2        3
Professional activity 2        1        9        12       4        2        10       1        2        3
Sport                 14       3        61       6        37       7        6        /        /        1
Public transport      5        1        /        1        /        /        6        /        /        /
Television            32       7        2        /        12       9        29       10       2        10
Food                  11       17       1        1        3        3        7        /        16       22
Music                 15       9        /        4        23       6        17       13       3        4
Pastime               1        5        2        2        5        1        7        1        1        9
Personal              6        3        /        1        2        /        1        4        5        5
Technology            /        12       /        /        1        2        3        4        3        /
Literature            /        /        /        1        /        1        1        13       8        1
Clothes               /        3        /        /        /        1        2        1        5        /
Weather               /        /        /        /        1        /        /        /        1        1
Table 4. Overview of the main categories of the women
• Woman 1 seems to watch quite a lot of Television since this category accounts for more than 30% of her Twitter feed (32 tweets exactly). The overview of the subcategories indicates that she particularly likes watching television programmes (23 tweets) and, after having analysed this subcategory, we can conclude that she appears to be quite fond of the Eurovision Song Contest, which is illustrated by the following tweet: ‘Ik geef het niet graag toe, maar verdorie, Nederland. Met jullie stomme goeie liedjes ook weer. #Eurovision’. Next, it is also clear that Woman 1 is interested in Music as well (15 tweets), and what is more, tweets such as ‘'k ben eens naar Ibiza geweest, en ik heb de Vengaboys daar geen enkele keer gehoord, en ik was een beetje teleurgesteld. #Top500vande90s’ seem to be an indication of her love for 90s music. In addition to this, we found proof of her having a passion for cooking, since the category Food was recognised in 11 tweets. Finally, it can also be assumed that she loves to either practise sports or watch football games on TV (14 tweets). After having analysed this main category, we can say that she is especially interested in the World Cup (14 tweets), a presumption which can be confirmed by the following tweet: ‘United States of Whatever. #BELUSA’.
• After analysing the Twitter data of Woman 2, we found out that she appears to be particularly interested in Food and cooking (17 tweets). In addition, the main category Technology accounts for 12% of her tweets, which might be an indication of a quite substantial presence on social media. On top of that, she takes a great interest in Current affairs as well (10% of her Twitter feed), and in particular in foreign affairs, which can be illustrated by the following tweet: ‘David Cameron spreekt zich uit over Verviers. Opportunist, niemand vraagt je wat. De eigenlijke boodschap: stem niet op Farage maar op mij’.
• Based on the frequencies of the main categories that were recognised in the tweets of Woman 3, it can be concluded that she does not write about a very wide range of interests. However, what is remarkable is the 61% frequency of the main category Sport in her Twitter data, on the basis of which we can state that she can be considered the most passionate about sport of all women. After having analysed the distribution of the various subcategories, we can conclude that she is particularly interested in hockey and cycling (18 and 22 tweets respectively), illustrated by the two following tweets: ‘Wie gunt dit Greg Van Avermaet niet? Klasserenner. #EnecoTour’ and ‘BAM! Knal erop ;) Echt jammer voor #hockeybe... @MatthDV: @Lorrietje Ze moeten hun geld sparen voor het volgende golftoernooi he ;)’.
• Considering the fact that the category Current affairs has a 46% frequency in her Twitter data, we could assume that Woman 4 likes to keep up with current events. The analysis of the subcategories reveals that she is particularly interested in politics, which is illustrated by the following tweet: ‘Afscheid van Europa voor @HVanRompuy. Prachtig werk geleverd. Tijd voor nieuwe horizonten. @cdenv’. Further, it is quite clear that Woman 4 has a job, given that the main category Professional activity was recognised in a total of 12 tweets.
• Woman 5 seems to be quite interested in Sport, since this main category accounts for almost 40% of her tweets (37 tweets exactly). Having analysed the subcategories, we could state that she is mainly interested in football (14 tweets). Still, Woman 5 does not only appear to be a big sports lover: since the main category Music has a frequency of 23%, we can assume she regularly listens to music. What is more, it can be concluded that she very frequently goes to festivals (7 tweets), which can be illustrated by the following tweet: ‘Nog aan 't nagenieten. Dat het perfect was. Dat ze 505 speelden. En dat er nooit iets meer waarheid geweest is dan I wanna be yours. #RW14’. Further, this test subject also tweeted a lot about Television (12 tweets), and in particular about programmes (8 tweets) such as De slimste mens ter wereld and Belgium’s got talent.
• Based on her Twitter feed, Woman 6 appears to be interested in a broad range of topics. Taking into account the frequencies of the main categories, it is quite clear that she is mainly interested in Current affairs, since this category was recognised in 38 tweets, which accounts for almost 40% of her Twitter data. What is more, we can presume she has a job in politics, since this subcategory was recognised in 28 tweets, which can be confirmed by the following tweet: ‘Een sterke @RuttenGwendolyn brengt ons straks vooruit’.
• Considering the fact that the category Television accounts for almost 30% of the Twitter feed of Woman 7 (29 tweets exactly), we could assume that she watches TV quite regularly. The analysis of the subcategories reveals that she is mostly interested in television programmes such as De Rechtbank, Familieraad and Zo man zo vrouw. Next, the analysis of the main categories also reveals that she is a student, since this is the only Professional activity she seems to tweet about (namely 10 tweets). On top of that, almost 20% of her Twitter feed (17 tweets to be exact) was classified in the main category Music, which can be an indication of her keen interest in music, and in particular in artists such as Stromae and Sam Smith.
•
Taking into account the various main categories, we can state that Woman 8 appears to be interested in a rather broad range of topics. The 13% frequency of the category Music suggests a love for music, which is supported by the following tweet ‘Wat een nummer. Na 10 jaar nog niets van sleet op. ’. Next, the category Literature was also recognised in 13 tweets. The analysis of the subcategories reveals that she is often quite frustrated with the publication of some articles, which can be illustrated by the following tweet ‘@schwalbekoenig Wat een degoutant, sensatiegericht artikel alweer. Bah.’. In addition to this, Current affairs can also be considered one of her main interests (10 tweets), and in particular home affairs (5 tweets). Finally, she seems to tweet a lot about Television as well, a category which accounts for 10% of her tweets.
•
From the analysis of the Twitter feed of Woman 9, we found out that she could be considered one of the few women who love Food, seeing that this main category accounts for almost 20% of her tweets (16 tweets exactly). What is more, after having analysed the various subcategories in her tweets, we can conclude that she gives her followers cooking advice on a regular basis. This presumption can be illustrated by the following tweet ‘@WieteM @FakePlasticRuby Of in hele dunne plakjes even koken en dan als lasagnebladen gebruiken( Yum in combo met blauwschimmelkaas)’.
•
Considering that the category Food has a frequency of 22%, we can state that Woman 10 is quite a food lover as well. In addition to this, another of her interests is Television, since this main category accounts for 10% of her Twitter data. What is more, tweets such as ‘te veel schmink, te weinig pintjes. die mensen weten niet hoe televisiekijken moet! #hallotelevisie’ are an indication that she quite often likes to give her opinion about programmes in particular (4 tweets).
In the process of manually creating user profiles for each test subject, it was notable that almost every man tweets, to a large extent, about the main categories Sport and Television. After having examined the distribution of the various subcategories, we could conclude that football can be considered the main interest of most men. Furthermore, and as was expected in advance, the large majority of the men do not appear to tweet about Clothes: only one tweet about this category was recognised in their Twitter data.
Based on the analysis of the main categories that were recognised in the Twitter feed of the women who participated in this research, it can be concluded that they mainly write about other topics than most men. Sport for example, and in particular football, seems to account for a lower frequency in their tweets, which can also be concluded for the category Public transport. Still, it is rather notable that 50% of all women regularly seem to tweet about clothes.
However, we would like to emphasize that these findings only concern the ten men and ten women who participated in our research and, as a consequence, these do not relate to men and women in general.
4.2 Automatic analysis
As mentioned before, based on the output of the LeT’s Preprocessing Toolkit, a top 100 of most frequently occurring nouns and named entities was derived for each test subject. In this section, we assess whether using this top 100 of most frequent nouns and named entities allows us to obtain the same information as the manually created user profiles. The top 100 most frequent nouns and named entities for each test subject is presented in Appendix 7.3.
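As a rough illustration of how such a frequency list can be derived, the sketch below counts tagged tokens and keeps the n most frequent ones. The input format (a list of token/tag pairs), the noun tag "N" and the function name top_n are assumptions made for this example only; they do not reproduce the actual output format of the LeT’s Preprocessing Toolkit.

```python
from collections import Counter


def top_n(tagged_tokens, wanted_tags, n=100):
    """Return the n most frequent tokens whose tag is in wanted_tags.

    tagged_tokens: iterable of (token, tag) pairs (hypothetical format).
    """
    counts = Counter(token for token, tag in tagged_tokens if tag in wanted_tags)
    return counts.most_common(n)


# Toy input: the tokens are borrowed from the examples discussed in the text,
# the tags and counts are invented for this illustration.
sample = [
    ("regering", "N"), ("Obama", "B-PER"), ("artikel", "N"),
    ("regering", "N"), ("is", "V"), ("Obama", "B-PER"), ("regering", "N"),
]

top_nouns = top_n(sample, {"N"}, n=2)
top_entities = top_n(sample, {"B-PER", "B-LOC", "B-ORG"}, n=2)
print(top_nouns)
print(top_entities)
```

The parameter n is kept explicit so that a longer list than the top 100 can be requested without changing the function.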
4.2.1 Men
•
The manually created user profile reveals that Man 1 is particularly interested in Current affairs and Television (respectively 18 and 21 tweets). The automatic analyses, on the other hand, only partially give us the same information. It is quite clear that he writes rather often about Current affairs, and in particular about politics, since words such as artikel, Jung-un, Obama and regering all appear in the top 50 of most frequently occurring nouns and named entities. However, the same cannot be said for the main category Television. In fact, having analysed the most frequent nouns, we can state that it is not possible to conclude that he is a very passionate television viewer, probably because he writes about a very broad range of specific series or programmes which are mentioned only a few times and, consequently, all appear at the bottom of the list. The named entity based analysis, however, does provide a better insight into what kind of series and programmes he follows, since words such as VRT, Jurassic Park and De Kampioenen are ranked rather high.
•
Taking into account the manual analysis of the Twitter data of Man 2, we can state that two of his main topics of interest are Music and Television (respectively 29 and 23 tweets), which largely corresponds to the results of both automatic analyses. After having examined the lists of frequent nouns and named entities, we can conclude that he might be a real music lover, as words such as podcast, spotify, radio and Shazam indicate. Still, from examining the output of the two automatic analyses, we found out that it is not as easy to determine that he regularly watches television as well, since most films and programmes are only to be found at the bottom of the top 100 most frequently occurring nouns and named entities.
•
Both the manual and the automatic analyses indicate that Man 3 spends quite some time in front of the tv, given that this main category was recognised in 37 tweets and considering the fact that the lists of frequently occurring nouns and named entities are filled with films and television programmes such as Iedereen Beroemd and Dag Sinterklaas. However, not only films and programmes have a rather high frequency in his Twitter feed, but also De lijn and nmbs as well as other words concerning Public transport in Belgium, which reveal his frustration with both public transport services.
•
We could immediately derive from the analyses of the tweets of Man 4 that he is a student, given that the majority of the tweets as well as the frequently occurring nouns and named entities deal with school and university life, which is illustrated by the following tweet ‘En het laatste examen is gedaaaan! #vakantie :p’. In addition to this, a great number of words concerning cooking and Food that were recognised in his tweets, such as Amadeus, bar and wraps, suggest that he is interested in having a healthy lifestyle and regularly goes to a restaurant.
•
On the basis of the results of all three analyses, it can be concluded that Man 5 takes a very particular interest in horseback riding, which accounts for 38% of his feed; this is also reflected in the predominant presence of words such as Ludo Dierickx, paardensport, dressuur and paard in the lists of most frequent nouns and named entities. What is more, the analyses not only reveal that he is a true sports lover: words such as tv, media and Apple indicate that he is quite a Technology enthusiast as well as a fervent tv watcher.
•
The manually created user profile of Man 6 only reveals two main categories, namely Sport and Television (respectively 31 and 16 tweets), which can be illustrated by the presence of words in his feed such as sportwereld_be, Club, Goed volk and De slimste mens ter wereld. The automatic analyses based on nouns and named entities, however, are more indicative of his Professional activities and interests: both analyses display a rather high ranking of the words Brugge and Roeselare and, as a result, we can assume that he either works or lives in either of the two cities. In addition to this, the result of the named entity based analysis also includes quite a few newspapers, which are indicators of another but less apparent category, namely Current affairs.
•
Based on the manual analysis of the Twitter data of Man 7, we can conclude that he seems to be mainly interested in Literature, since this category accounts for almost 20% of his feed. Next to this, the noun-based automatic analysis also reveals that he regularly tweets about his Professional activities, given the presence of words such as Gent, UGent and studies. The named entity based analysis, on the other hand, does not exactly reveal various topics of interest but rather quite a large number of cities and countries such as Duitsland, Frans and Brugge, on the basis of which it can be concluded that he might have a passion for travelling.
•
Man 8 can be considered the most passionate about Sport of all men, and in particular about cycling since this is the only sport he seems to tweet about. What is more, all three analyses obtain the same result: his Twitter feed is made up of almost solely words with reference to cycling (for instance training, fiets, koers and tijdrit).
•
Based on the results of all three analyses, we can assume that Man 9 is a big Sport fanatic, just like the majority of the men who participated in our research. What is more, after analysing the various subcategories, and considering the fact that almost 50% of the top 100 of most frequently occurring nouns and named entities deals with football, we can state that it can be considered his favourite sport. Next, as can be derived from the manual and noun-based analyses, Music, which accounts for 22% of his feed, is another topic he seems to tweet about regularly. The named entity based analysis, on the other hand, reveals another interest of his, namely Russian culture, which can be illustrated by words such as Rusland, Russisch and Novosibirsk.
•
The results of all three analyses immediately reveal that Man 10 does not seem to tweet about a wide range of interests. After examining the manual and noun-based analyses, we can conclude that he mainly writes about Sport, and what is more, only about football, which accounts for 28% of his feed. In addition to this, we can assume that he spends quite some time in front of the tv as well, which can be illustrated by words such as VT4 and FIFA. However, the named entity based analysis only demonstrates his love for cooking and Food (some examples are resto, Starbucks and Spar), which was revealed by the manually created user profile as well.
4.2.2 Women
•
On the basis of the manual analysis, we can conclude that Woman 1 is mainly interested in Television, Music, Food and Sports (respectively 32, 15, 11 and 14 tweets). The automatic analyses based on the most frequently occurring nouns and named entities however, only reveal a focus on tv of which De grote sprong, Miss België and De slimste mens ter wereld are some examples, as well as on Music, which is illustrated by the following tweet ‘Ja, godverdomme, ja! In-FUCKING-somnia van Faithless op 1 in de #Top500vande90s! Alles kapot!’.
•
Based on the frequency of words concerning Technology and social media such as Facebook or smartphone in the Twitter feed of Woman 2, we can consider her to be quite a technology addict and, in addition, we assume that she is very active on social media as well, which is confirmed by the results of all three analyses. In addition to this, the manual analysis also revealed a passion for Food and an interest in Current affairs (respectively 17 and 10 tweets). Next, based on the examination of the lists of most frequent nouns and named entities, it can be assumed that she has quite a passion for travelling, since these are filled with countries and cities such as Parijs, China and New York.
•
The manual as well as the automatic analyses all have the same outcome, namely that Woman 3 can be considered the most passionate about Sport of all women, and in particular about hockey and cycling, which can be illustrated by the following tweet ‘RT @ReemtBorcherts: Sneu voor de Belgische hockeymannen. Ze leveren hun beste prestatie ooit op CT, maar zakken op de wereldranglijst. M ...’. Still, the analyses of her Twitter feed not only reveal her being a real sports fan, but some high ranked nouns and named entities concerning school and studies such as Gent and les indicate that she is presumably a student.
•
On the basis of the analyses of the Twitter feed of Woman 4, we can state that she has one major interest, that is Current affairs since this category accounts for 46% of her feed. In fact, the top 100 of the most frequently occurring nouns and named entities that were recognised in her Twitter data is made up of almost solely words concerning politics such as minister, Vlaanderen and mobiliteit. As a consequence, it can be assumed that she either works in politics or takes particular interest in the topic.
•
After having examined the results of the manual and automatic analyses of the Twitter feed of Woman 5, we can conclude that three main interests can be distinguished, namely Sport, Music and Television (respectively in 37, 23 and 12 tweets). Based on the analysis of the various subcategories, we can state that she is mainly interested in football (14 tweets) and, in addition, that she very frequently goes to festivals as well. What is more, the noun-based analysis reveals that she is probably a student, considering the fact that the top 40 includes words such as examen, klas and kot.
•
Based on her Twitter feed, Woman 6 appears to be interested in a broad range of topics. Still, taking into account the frequencies of the main categories and the lists of most frequent nouns and named entities, it is quite clear that, similar to Woman 4, she either has a job in politics or considers it her main topic of interest. Words that support this presumption are Vlaanderen, woonbonus and Parlement, and it can also be illustrated by the tweet ‘Meest progressieve regering ooit @openvld meest hervormde regering ooit’.
•
The manually created user profile of Woman 7 reveals that she tweets about quite a lot of topics, namely Television, Music and Professional activities, that is her studies, of which respectively 29, 17 and 10 tweets were recognised in her feed. However, the two automatic analyses do not have the same outcome. In fact, it is very difficult to automatically derive her interests, since the lists of frequently occurring nouns and named entities contain a strikingly large number of userIDs and names. This can be an indication that she probably spends quite some time on Twitter retweeting and reacting to the tweets of the people she follows.
•
The manually created user profile of Woman 8 managed quite well to give an idea of her main interests, which are Music, Literature, Television and Current affairs (13, 13, 10 and 10 tweets to be exact). In the automatic analyses on the other hand, high ranked words only point out a keen interest in politics (for example De Wever, N-VA and UPolitea) and what is more, probably even her political ideology.
•
The automatic analyses and the manual one each create a different profile of the interests of Woman 9. Based on the results of the manual analysis, she is one of the few women who have a serious interest in Food and cooking (16 tweets), but the automatic analyses both reveal quite a large number of userIDs in her Twitter feed, which suggests that she spends more time on Twitter retweeting than actually keeping her followers up to date about her life. This finding is also supported by another result of the named entity based analysis, namely that she appears to be into social media and Technology, given the high ranking of words such as Facebook, Twitter and Google.
•
As mentioned in the previous section, the manually created user profile of Woman 10 gives an insight into what her main interests could be, namely Food and Television (respectively 22 and 10 tweets). However, once more the automatic analyses contradict the result of the manually created profile: based on the quantity of userIDs and names in the feed of Woman 10, we can, just as for Woman 9, consider her someone who is fond of retweeting and reacting to tweets. In addition to this, after examining the results of the named entity based analysis, we can also assume that she loves travelling and exploring other countries, given the high frequency of words such as Londen and Eindhoven.
5 DISCUSSION AND CONCLUSION
Messages posted on social media such as Twitter can sometimes be considered too informal for NLP purposes. However, they also prove to be excellent sources of information since nowadays, millions of people have a Twitter profile and, as a consequence, their opinions and interests can be found online. The main purpose of this dissertation was to explore whether it is possible to manually and automatically find and extract useful information by using NER and a noun-based analysis on a corpus of tweets in order to determine the interests of several active Twitter users, such as a person’s political ideology, what kind of job he/she has, etc. By means of a literature study we first gave an overview of NER on standard text as well as on Twitter. In addition, different learning methods and features were discussed and several issues and challenges researchers face when performing NER were presented.
Consequently, we formulated the following research questions:
- Can we manually determine the interests of our test subjects based solely on their tweets?
- Based on the output of the LeT’s Preprocessing Toolkit, can we automatically find and extract useful information about personal interests?
- Can the automatic analysis automate or at least help to accelerate the manual process?
To determine the interests of our test subjects, we first of all needed to collect Twitter data from several active Twitter users, i.e. people who had posted at least 3,000 tweets. Taking into account two criteria (namely to obtain an equal number of men and women, and to contact working people as well as students), ten men and ten women were contacted in order to obtain their permission to use their tweets for our research. Next, 3,000 tweets per person were automatically downloaded, of which one hundred tweets per person were manually selected for annotation based on the presence of named entities. The 2,000 resulting tweets were then annotated by classifying the named entities into main and subcategories. The main categories classify the various named entities into several topics, whereas the subcategories are triggered by particular named entities. For our research, a subcategory was only defined when it appeared in the tweets of at least three of our test subjects, except for subcategories that were characteristic of one particular person.
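The rule for defining subcategories can be sketched as follows. This is only an illustration under stated assumptions: the function name frequent_subcategories and the toy annotations are invented for the example, and the exception for subcategories that are characteristic of a single person is not modelled.

```python
from collections import defaultdict


def frequent_subcategories(labels_per_subject, min_subjects=3):
    """Return the candidate subcategories used by at least min_subjects subjects."""
    subjects_per_label = defaultdict(set)
    for subject, labels in labels_per_subject.items():
        for label in labels:
            # A set is used so that a subject counts once per label,
            # however many of his or her tweets carry that label.
            subjects_per_label[label].add(subject)
    return sorted(
        label
        for label, subjects in subjects_per_label.items()
        if len(subjects) >= min_subjects
    )


# Invented toy annotations (not the actual annotation data).
toy_annotations = {
    "Man 8": ["cycling", "cycling"],
    "Man 9": ["football"],
    "Man 10": ["football"],
    "Woman 3": ["hockey", "cycling"],
    "Woman 5": ["football", "festival"],
}

kept = frequent_subcategories(toy_annotations)
print(kept)
```

In this toy example only football is kept, since it is the only label that occurs with three different test subjects.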
In a next step, we manually created a user profile for each test subject. For the creation of these profiles we took into account the main categories that appeared at least ten times in the tweets and which can thus be considered significant for a person’s interests. In this manner, i.e. based on tweets, we were able to manually determine the interests of our test subjects. In the process of creating those profiles, it was remarkable that almost every man tweets about the main categories Sport and Television. Based on the distribution of the various subcategories, we can state that football can be considered the main interest of most men. The women who participated in our research appeared to write mainly about other topics than most men. Sport and Public transport, for example, had a much lower frequency in their tweets. However, the category Clothes was recognised in 50% of the women’s Twitter feeds.
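The threshold of ten tweets can be illustrated with a minimal sketch. The function name build_profile is an assumption made for this example; the counts are those reported for Woman 7 (Television, Music and Professional activity in the running text, Public transport as read from Table 21).

```python
def build_profile(category_counts, min_tweets=10):
    """Keep the main categories that reach min_tweets, most frequent first."""
    kept = {cat: n for cat, n in category_counts.items() if n >= min_tweets}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Number of annotated tweets per main category for Woman 7.
woman_7 = {
    "Television": 29,
    "Music": 17,
    "Professional activity": 10,
    "Public transport": 6,
}

profile = build_profile(woman_7)
print(profile)
```

Public transport is left out of the resulting profile because it stays below the threshold, which matches the three interests named for Woman 7 in the manual analysis.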
For our second research question, all collected Twitter data, around 3,000 tweets per person, was processed using the LeT’s Preprocessing Toolkit. Based on the output of this system, we derived a top 100 of most frequently occurring nouns and named entities. In addition, we sought to explore whether having this top 100 would allow us to derive the same information about the Twitter users as from the manually created user profiles.
Having examined the most frequent nouns and named entities, we can state that by using the output of the LeT’s Preprocessing Toolkit it is definitely possible to automatically find and extract useful information about personal interests. However, after manually creating user profiles and automatically determining personal interests, we can conclude that a manual analysis appears to be much more accurate than automatic analyses in creating a complete profile of our test subjects. In addition, and as could be expected, manual profiling gives a much better and more extensive insight into the likes and interests of people.
Comparing both forms of automatic analysis, noun-based versus named entity based, we can state that the noun-based analysis provides us with more detailed and specific information about the interests of our test subjects than the named entity based analysis. The main problem with the frequent named entities was that more than 50%, in some cases more than 60%, of the named entities appeared to be userIDs, names, cities, countries or fillers, on the basis of which it was very difficult, sometimes impossible, to determine the main interests of the Twitter users. Cities, for example, can be an indication of a passion for travelling, which is a subcategory of Pastime, as well as of an interest in Current affairs. Next, names and userIDs indicate that the user likes to retweet and react to other tweets, but proved to be useless for formulating a certain interest. In this respect, a manual analysis of the user profiles is quite useful, since it is in some cases necessary to look at the context of the named entities in order to be able to determine certain interests.
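The share of uninformative entries in a frequency list could be estimated as in the sketch below. The kind labels (userID, name, city and so on) are hypothetical hand-assigned labels, not tags produced by the toolkit, and the five entries are merely examples taken from the lists in Appendix 7.3.

```python
def uninformative_share(entities, uninformative_kinds):
    """Fraction of list entries whose kind says little about personal interests."""
    if not entities:
        return 0.0
    hits = sum(1 for _, kind in entities if kind in uninformative_kinds)
    return hits / len(entities)


# (entity, kind) pairs; the kinds are assigned by hand for this illustration.
sample = [
    ("@UrgentFM", "userID"),
    ("Gent", "city"),
    ("Jan", "name"),
    ("VRT", "organisation"),
    ("Fringe", "product"),
]

share = uninformative_share(sample, {"userID", "name", "city", "country", "filler"})
print(f"{share:.0%}")
```

For this toy list three of the five entries fall into the uninformative kinds.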
We think it might be better to take into account a top 200 or 300 of frequent nouns and named entities, as some very specific indications for certain categories (for example certain programmes or films) are logically mentioned only a few times and, as a result, only appear in the very bottom of the lists. Therefore, considering a more extensive list might provide a better insight into people’s interests.
Nevertheless, through an extensive analysis of the top 100 nouns and named entities of each person, we were in certain cases able to determine additional interests which were not recognised in the manual analysis of the Twitter data.
For our third and last research question we wanted to explore whether using automatic analysis can automate or at least help to speed up the process of determining personal interests based on tweets. It can be concluded that, as mentioned before, automatic analyses based on frequent nouns and named entities can certainly help to find and extract useful information. However, these analyses do not seem to provide a complete and detailed profile of a particular person but rather one or two main interests at the most.
In conclusion, we can state that automatic analyses can in some cases contribute to speeding up the process of determining personal interests; however, they cannot be used to completely automate the process.
Since we believe that the tools used for our research still leave room for improvement, we made some recommendations for future work. Firstly, it would be interesting for future work to pass on the findings of our research to the test subjects. In that way we can explore whether the manually created user profiles as well as the automatic analyses of their tweets reflect in some way their main interests and, in addition, whether these succeed in giving a clear insight into their life. Another recommendation for improving the tools would be to keep them from splitting first names and last names, such as Kris and Peeters, as well as from splitting the names of countries or cities which consist of two words, such as New and York or Great and Britain. Finally, it would also be interesting to remove fillers, conjunctions, articles and punctuation marks such as question marks or full stops, on the basis of which it proved to be difficult to determine the interests of the test subjects.
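The last two recommendations can be sketched in a few lines. This is a minimal illustration, assuming that tokens arrive in tweet order with B-/I- style tags such as those shown in Appendix 7.3; the function names merge_entities and drop_noise and the small filler list are hypothetical and not part of the LeT’s Preprocessing Toolkit.

```python
import string


def merge_entities(tagged_tokens):
    """Join each B-X token with the I-X tokens that directly follow it."""
    entities = []
    current, current_type = [], None
    for token, tag in tagged_tokens:
        prefix, _, etype = tag.partition("-")
        if prefix == "B":
            # A new entity starts: store the one collected so far.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], etype
        elif prefix == "I" and current and etype == current_type:
            current.append(token)
        else:
            # Untagged token or stray I- tag: close the open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


def drop_noise(entities, fillers):
    """Remove entries that are only punctuation or that appear in fillers."""
    kept = []
    for name, etype in entities:
        only_punctuation = all(ch in string.punctuation for ch in name)
        if not only_punctuation and name.lower() not in fillers:
            kept.append((name, etype))
    return kept


tokens = [
    ("Kris", "B-PER"), ("Peeters", "I-PER"), ("in", "O"),
    ("New", "B-LOC"), ("York", "I-LOC"), ("!", "B-PRO"), ("De", "B-PRO"),
]

merged = merge_entities(tokens)
cleaned = drop_noise(merged, fillers={"de", "the", "van"})
print(cleaned)
```

Merging before counting means that Kris Peeters and New York each enter the frequency list as a single item, while isolated punctuation marks and articles are discarded.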
6 BIBLIOGRAPHY
Ashwini, Sandeep (& Choi, Jinho D.). (2014). Targetable Named Entity Recognition in Social Media. Cornell University Library.
Bast, Hannah (& Butt, Waleed). (2011). Named Entity Recognition, Efficient Natural Language Processing. Published PowerPoint. Albert-Ludwigs-Universität Freiburg.
Culnan, Mary J. (& McHugh, Patrick J. & Zubillaga, Jesus I.). (2010). How large U.S. Companies Can Use Twitter and Other Social Media to Gain Business Value. MIS Quarterly Executive. Volume 9, Issue 4, 243.
Desmet, Bart (& Hoste, Véronique). (2014). Fine-grained Dutch Named Entity Recognition. Language Resources and Evaluation, 2, 48, 307-343.
Doddington, George (& Mitchell, Alexis & Przybocki, Mark & Ramshaw, Lance & Strassel, Stephanie & Weischedel, Ralf). (2004). Automatic Content Extraction (ACE) Program-Tasks, Data and Evaluation. Proceeding Conference on Language Resources and Evaluation.
Habib, Mena B (& van Keulen, Maurice). (2012). Unsupervised Improvement of Named Entity Extraction in Short Informal Context Using Disambiguation Clues. Faculty of EEMCS, University of Twente, Enschede, The Netherlands.
Hassel, Martin. (2003). Exploitation of Named Entities in Automatic Text Summarization for Swedish. Proceedings of NODALIDA 03-14th Nordic Conference on Computational Linguistics.
Humphreys, Lee (& Gill, Phillipa & Krishnamurthy, Balachander). (2010). How much is too much? Privacy issues on Twitter. Proceedings of the International Communication Association Conference.
Jung, Jason J. (2012). Online named entity recognition method for microtexts in social networking services: a case study of Twitter. Expert systems with applications, 9, 39, 8066-8070.
Jurafsky, Daniel (& Martin, James H.). (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ, USA: Prentice Hall PTR.
Konkol, Michal. (2012). Named Entity Recognition. Published PhD study report. Západočeská univerzita v Plzni.
Krishnan, Vijay (& Ganapathy, Vignesh). (2005). Named Entity Recognition.
Lyon, D. (2001). Surveillance society: Monitoring in everyday life. Buckingham: Open University Press.
Mollá, Diego (& van Zaanen, Menno & Smith, Daniel). (2006). Named Entity Recognition for Question Answering. Centre for Language Technology, Macquarie University, Sydney, Australia.
Nadeau, David (& Sekine, Satoshi). (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 1, 30, 3-26.
Ratinov, Lev (& Roth, Dan). (2009). Design Challenges and Misconceptions in Named Entity Recognition. Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL). 147-155. Boulder, Colorado: Association for Computational Linguistics.
Ritter, Alan (& Clark, Sam & Mausam & Etzioni, Oren). (2011). Named Entity Recognition in Tweets: An Experimental Study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1524-1534.
Sánchez, Patricia (& Levin, Avner & Del Riego, Alissa). (2012). Blurred Boundaries: Social Media Privacy and the Twenty-First-Century Employee. American Business Law Journal, Volume 49, Issue 1, 63-124.
Tjong Kim Sang E (2002a) Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition. Proceedings of the 6th Conference on Natural Language Learning, 155-158.
Tjong Kim Sang E, De Meulder F (2003) Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. Proceedings of the 7th Conference on Natural Language Learning, 142-147.
Twitter, Inc. (2015). About. [Online] http://about.twitter.com/company [25.03.15]
Van de Kauter, Marjan (& Coorman, Geert & Lefever, Els & Desmet, Bart & Macken, Lieve & Hoste, Veronique). (2013). LeT’s Preprocess: The multilingual LT3 linguistic preprocessing toolkit. Computational Linguistics in the Netherlands Journal 3, 103-120.
7 APPENDIX
7.1 Mails
Dear Sir or Madam
I am a student of multilingual communication at Ghent University and am currently working on my master’s thesis. In my research I want to examine how sentiment analysis can best be applied to tweets and which problems this involves.
In addition, I would like to use a thorough text analysis to investigate what exactly people reveal about themselves when they tweet. That is why I am looking for some active Twitter users like you. The intention is to inform you after this analysis of what I have discovered about you and possibly to attach a questionnaire to it.
If you are willing to take part in my research, please send me a short e-mail with your permission. We will then download and analyse your most recent tweets via the Twitter API. It goes without saying that neither your tweets nor your personal data will be passed on to third parties. Thank you in advance for considering my proposal.
Kind regards
Margot Dedeyne
7.2 Overview of the main categories
Main category          Tweets
Current affairs        18
Public transport       -
Professional activity  4
Personal               6
Pastime                8
Technology             3
Television             21
Literature             2
Food                   4
Clothes                -
Music                  9
Weather                -
Sport                  3
Table 5. Overview of the main categories of Man 1
Current affairs
13
Professional
Public transport Personal
activity Pastime
1
Technology
4
Television
23
Literature
6
Food
12
Clothes
Music
29
Weather
Sport
Table 6. Overview of the main categories of Man 2
Current affairs
2
Public transport
17
Professional
3
Personal
3
Pastime
2
Technology
2
Television
37
Literature
1
Food
3
Clothes
Music
9
Weather
Sport
4
activity
1
Table 7. Overview of the main categories of Man 3
Current affairs
7
Public transport
Professional
15
Personal
Pastime
14
Technology
Television
8
Literature
Food
11
Clothes
Music
7
Weather
Sport
3
7
activity 8
2
Table 8. Overview of the main categories of Man 4
Current affairs
2
Public transport
8
Professional
3
Personal
3
Pastime
6
Technology
10
Television
10
Literature
4
Food
3
Clothes
Music
2
Weather
Sport
38
activity
1
Table 9. Overview of the main categories of Man 5
Current affairs
4
Public transport
2
Professional
7
Personal
1
activity Pastime Television
Technology 16
Food
Literature
1
Clothes
Music
8
Sport
31
Weather
Table 10. Overview of the main categories of Man 6
Current affairs
7
Public transport
1
Professional
7
Personal
6
Pastime
1
Technology
Television
7
Literature
Food
4
Clothes
Music
3
Weather
Sport
4
activity
18
Table 11. Overview of the main categories of Man 7
Current affairs
7
Public transport
Professional
1
Personal
4
Technology
6
activity Pastime Television Food
Literature 1
Music Sport
Clothes
1
Weather
7
66
Table 12. Overview of the main categories of Man 8
Current affairs
3
Public transport
Professional
5
Personal
Pastime
1
Technology
Television
7
Literature
Food
1
Clothes
Music
22
Weather
Sport
35
3
activity 1
3
Table 13. Overview of the main categories of Man 9
Current affairs
Public transport
1
10
Personal
8
Pastime
3
Technology
Television
24
Literature
Food
15
Clothes
Music
9
Weather
Sport
28
Professional activity
Table 14. Overview of the main categories of Man 10
Main category          Tweets
Current affairs        1
Public transport       5
Professional activity  2
Personal               6
Pastime                1
Technology             -
Television             32
Literature             -
Food                   11
Clothes                -
Music                  15
Weather                -
Sport                  14
Table 15. Overview of the main categories of Woman 1
Current affairs
10
Public transport
1
Professional
1
Personal
3
Pastime
5
Technology
12
Television
7
Literature
Food
17
Clothes
Music
9
Weather
Sport
3
activity
3
Table 16. Overview of the main categories of Woman 2
Current affairs
1
Public transport
Professional
9
Personal
Pastime
2
Technology
Television
2
Literature
Food
1
Clothes
activity
Music Sport
Weather 61
Table 17. Overview of the main categories of Woman 3
Current affairs
46
Public transport
1
Professional
12
Personal
1
2
Technology
activity Pastime Television
Literature
Food
1
Clothes
Music
4
Weather
Sport
6
1
Table 18. Overview of the main categories of Woman 4
Current affairs
2
Public transport
Professional
4
Personal
2
Pastime
5
Technology
1
Television
12
Literature
Food
3
Clothes
Music
23
Weather
Sport
37
activity
1
Table 19. Overview of the main categories of Woman 5
Current affairs
38
Public transport
Professional
2
Personal
Pastime
1
Technology
2
Television
9
Literature
1
Food
3
Clothes
1
Music
6
Weather
Sport
7
activity
Table 20. Overview of the main categories of Woman 6
Current affairs
1
Public transport
6
Professional
10
Personal
1
Pastime
7
Technology
3
Television
29
Literature
1
Food
7
Clothes
2
Music
17
Weather
Sport
6
activity
Table 21. Overview of the main categories of Woman 7
Current affairs
10
Public transport
Professional
1
Personal
4
Pastime
1
Technology
4
Television
10
Literature
13
Clothes
1
activity
Food Music
13
Weather
Sport
Table 22. Overview of the main categories of Woman 8
Current affairs
2
Public transport
Professional
2
Personal
5
Pastime
1
Technology
3
Television
2
Literature
8
Food
16
Clothes
5
Music
3
Weather
1
activity
Sport
Table 23. Overview of the main categories of Woman 9
Current affairs
3
Public transport
Professional
3
Personal
Pastime
9
Technology
Television
10
Literature
Food
22
Clothes
Music
4
Weather
Sport
1
5
activity
1
1
Table 24. Overview of the main categories of Woman 10
7.3 The top 100 of the most frequently occurring nouns and named entities per test subject
7.3.1 Top 100 of the most frequently occurring named entities of the men
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
FREQ 38 26 20 17 16 14 14 14 13 13 12 12 11 10 10 10 10 9 9 9 8 8 8 8 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5
MAN 1 MAN 2 MAN 3 Category Named entity FREQ Category Named entity FREQ Category Named entity FREQ Category I-PRO ! 131 B-PER RT 131 B-PER RT 46 B-MISC B-PRO The 90 I-PRO : 45 B-LOC Gent 46 I-MISC B-EVE WK 54 B-LOC Gent 22 B-ORG VRT 35 B-LOC B-PER Jezus 48 B-PRO The 21 I-PER De 32 I-MISC I-PER Van 42 B-PRO My 17 B-LOC Brussel 31 B-LOC B-PER Obama 41 I-PRO #lastfm 13 B-LOC Brugge 28 I-MISC B-PRO Star 41 I-PRO Artists 13 B-PER Lowieke 14 B-LOC I-PER De 41 I-PRO Top 13 B-PRO De 14 B-LOC B-ORG Rode 38 I-PRO 3 12 B-MISC Gent-Sint-Pieters 13 B-PER I-PRO Wars 35 I-PER van 11 B-LOC Vlaamse 13 B-PRO I-ORG Duivels 32 I-PRO The 11 B-PER Thijs 13 I-PRO I-PRO . 31 B-MISC #Spotify 11 B-PRO The 12 B-LOC B-LOC Brussel 28 I-PRO of 11 I-PER Van 12 B-LOC B-LOC Amerika 28 I-PRO the 10 B-LOC Antwerpen 9 B-PER B-PER Jan 23 B-PRO Fringe 10 B-LOC Facebook 9 I-MISC B-PER Kim 23 I-PRO . 10 B-LOC Gentse 9 I-PER B-PRO De 18 B-PRO De 10 B-ORG Studio 8 B-ORG B-LOC Antwerpen 16 I-PER De 9 B-LOC @MNMbe 8 B-PER I-PRO IK 15 B-ORG Radio 9 B-LOC Twitter 8 I-ORG I-PRO of 14 I-PRO 9 B-MISC @ThijsVandepoele 7 I-PER B-LOC Vlaamse 14 I-PRO on 9 B-PER Jan 6 B-PER I-PER Peeters 13 B-LOC Gentse 8 B-LOC Leuven 5 B-PER I-PRO De 13 I-PER de 8 B-LOC West-Vlaanderen 5 B-PER I-PRO The 12 B-LOC @UrgentFM 8 I-PRO ! 5 I-PER B-LOC Gent 12 B-PER Breaking 7 B-LOC #Thuisopeen 4 B-LOC B-LOC Leuven 12 I-PER Bad 7 B-ORG De 4 B-LOC B-PER Eddy 11 B-PER De 7 B-PER Luc 4 B-LOC B-PER Geert 10 B-ORG The 7 B-PER Twitter 4 B-LOC I-PER & 10 I-ORG Radio 7 I-ORG Brussel 4 B-LOC I-PER Jong-un 10 I-PRO and 7 I-PER @Jeroenhu 4 B-MISC I-PRO DAT 9 B-ORG BBC 7 I-PRO Life 4 I-LOC B-LOC België 9 I-PER @UrgentFM 6 B-LOC Sint 3 B-LOC B-LOC Game 9 I-PER @VGeperst 6 B-LOC Sinterklaas 3 B-LOC B-PER Bart 9 I-PRO ! 
6 B-ORG Rode 3 B-LOC B-PER George 9 I-PRO A 6 B-PER @NMBS 3 B-LOC B-PER Jurassic 8 B-ORG RT 6 B-PER Bart 3 B-MISC B-PER Kris 8 I-PER Van 6 B-PER De 3 B-MISC B-PER Michel 8 I-PRO Bad 6 B-PER Harry 3 B-MISC B-PRO Het 8 I-PRO Night 6 B-PER Lowie 3 B-ORG B-PRO Nederlands 8 I-PRO World 6 B-PER Tom 3 B-PER I-PRO : 8 I-PRO van 6 B-PRO Music 3 B-PER I-PRO Of 7 B-PER #Shazam 6 I-PER @Vannieuwkerke 3 B-PER I-PRO vs. 7 B-PER Check 6 I-PER Vos 3 B-PRO B-LOC New 7 B-PER The 6 I-PRO For 3 I-LOC B-LOC Sharknado 7 I-ORG United 5 B-LOC Brussel-Noord 3 I-MISC B-MISC @MadameDeSable 7 I-PER @OMGFacts 5 B-LOC Koekelare 3 I-ORG B-ORG Delhaize 7 I-PER @SaraLogghe 5 B-LOC Oostende 3 I-PER B-ORG The 7 I-PER Guns 5 I-ORG Duivels 3 I-PER B-PER Didier 7 I-PRO Geluidsarchitect 5 I-PER @MNMbe 3 I-PER B-PER Filip 7 I-PRO Wolf 5 I-PER Vertonghen 3 I-PRO
MAN 4 Named entity I'm at Lichtervelde Lichtervelde Brugge Zwembad Kortemark XD RT De Standaard Brussel Brussels Oud Groenhove Cortemarck Hogeschool-Universiteit @StijnPollet Brussel @CarlDieryckx @K_Meersschaert @CarlDieryckx De De #Lichtervelde België Bruxelles Den Sint Starbucks Arend Gent Haha Station Torhout @StijnPollet Hogeschool-Universiteit XD De Br(ik Cinema XD I'm Katelijnestraat Station Ja @StijnPollet Keizer Koten at
MAN 5 FREQ Category Named entity 260 B-PER RT 40 I-PER @ProjectVygo 31 B-PRO The 21 B-LOC België 19 B-LOC Belgische 19 B-LOC Kortrijk 16 B-LOC Belgen 15 I-PER De 15 I-PRO ! 14 B-LOC Gent 14 I-EVE Cup 12 B-PER Daniel 12 B-PER Pieter 11 I-PER Devos 10 B-LOC Nederland 10 B-PRO De 9 B-ORG Google 9 I-PER Deusser 8 B-EVE Nations 8 B-LOC Belg 8 B-LOC Le 8 B-PER Ludo 8 I-PRO of 7 B-LOC Belgisch 7 B-MISC I'm 7 B-PER De 7 I-EVE Champions 7 I-EVE Tour 7 I-MISC at 7 I-PER @GCT_events 7 I-PER @Noelle_Floyd 6 B-LOC Normandië 6 B-ORG FEI 6 B-PER The 6 B-PRO Windows 6 I-EVE Prix 6 I-LOC Mans 6 I-ORG Cup 6 I-PER @Bram_VDP 6 I-PER @EricDupain 6 I-PER Geerts 6 I-PER Van 6 I-PER de 6 I-PRO . 6 I-PRO : 6 I-PRO Grand 6 I-PRO on 6 I-PRO the 5 B-LOC Belgian 5 B-LOC London
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
MAN 1 FREQ Category Named entity 5 B-PER Gert 5 B-PER God 5 B-PER Hitler 5 B-PER Jos 5 B-PER Serge 5 B-PER The 5 B-PER Wouter 5 B-PRO FC 5 I-LOC York 5 I-ORG Van 5 I-PER Janssens 5 I-PER Park 5 I-PER Reynders 5 I-PER Simonart 5 I-PRO 2 5 I-PRO EEN 5 I-PRO Het 5 I-PRO Kampioenen 5 I-PRO the 4 B-LOC Belgische 4 B-LOC Duitse 4 B-LOC Gentse 4 B-LOC Limburg 4 B-LOC Londen 4 B-LOC Parijs 4 B-LOC Rusland 4 B-LOC VS 4 B-MISC @LisaLeysen 4 B-MISC Kerstmis 4 B-MISC RT's 4 B-ORG De 4 B-ORG Google 4 B-ORG U2 4 B-ORG VRT 4 B-ORG Vlaams 4 B-PER Beyoncé 4 B-PER Chris 4 B-PER Jeff 4 B-PER Karel 4 B-PER Kirsten 4 B-PER Laurent 4 B-PER Sherlock 4 B-PER Thrones 4 B-PER Will 4 B-PRO Engels 4 B-PRO Hmm 4 B-PRO IK 4 I-ORG Belang 4 I-PER Baeten 4 I-PER Bontinck
FREQ 7 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
MAN 2 MAN 3 Category Named entity FREQ Category Named entity FREQ Category I-PRO | 4 B-LOC #ExpeditieVRT 2 B-LOC B-PER @UrgentFM 4 B-LOC @ExpeditieVRT 2 B-LOC B-PER Senne 4 B-LOC Belgische 2 B-LOC B-PER Tom 4 B-LOC Kortrijk 2 B-MISC I-ORG Gent 4 B-LOC Mechelen 2 B-MISC I-PER Geluidsarchitect 4 B-LOC Vlaanderen 2 B-MISC I-PRO Knick 4 B-MISC Ongedane 2 B-MISC B-LOC Italiaanse 4 B-PER @EliseVandv 2 B-MISC B-MISC @YouTube-video 4 B-PER Hot 2 B-ORG B-MISC I'm 4 B-PER Marc 2 B-ORG B-PER @Ljosmyndun 4 B-PER Marianne 2 B-ORG B-PER Bart 4 B-PER Sinterklaas 2 B-PER B-PER John 4 B-PER Stefaan 2 B-PER B-PER Minard 4 B-PER Tinder 2 B-PER B-PER Wow 4 B-PRO Frans 2 B-PER B-PRO Breaking 4 B-PRO Station 2 B-PER B-PRO Het 4 I-MISC Zaken 2 B-PER B-PRO WAAR 4 I-PER @ExpeditieVRT 2 B-PRO I-MISC at 4 I-PER @Ihsane_CL 2 B-PRO I-ORG Brussel 4 I-PER @JoyThielemans 2 I-LOC I-ORG Goldfinch 4 I-PER @Julieedv 2 I-LOC I-PER @HistoricalPics 4 I-PER @Ljosmyndun 2 I-MISC I-PER @JRRRMY 4 I-PER @TMSVL 2 I-ORG I-PER @MNMbe 4 I-PER @VRTmess 2 I-ORG I-PER Lux 4 I-PER Marijke 2 I-ORG I-PRO 2 4 I-PER de 2 I-PER I-PRO 2014 4 I-PER van 2 I-PER I-PRO ? 4 I-PRO ? 
2 I-PER I-PRO Detective 4 I-PRO De 2 I-PER I-PRO Part 3 B-LOC #IedereenBeroemd 2 I-PER I-PRO Story 3 B-LOC #VRT 2 I-PRO I-PRO at 3 B-LOC @ThijsVandepoele 1 B-EVE I-PRO by 3 B-LOC @ZUIDDAG 1 B-EVE I-PRO in 3 B-LOC Brussel-Zuid 1 B-EVE I-PRO to 3 B-LOC Ieper 1 B-EVE B-LOC @MNMbe 3 B-LOC Overpoort 1 B-EVE B-LOC Antwerpen 3 B-LOC Pukkelpop 1 B-EVE B-LOC Brussel 3 B-LOC Schaarbeek 1 B-EVE B-LOC Facebook 3 B-LOC Schaerbeek 1 B-EVE B-LOC Kortrijk 3 B-LOC Torhout 1 B-EVE B-LOC Nice 3 B-LOC VS 1 B-EVE B-LOC Sardinië 3 B-MISC #GhostRockers 1 B-EVE B-MISC American 3 B-MISC #Lichtfestival 1 B-EVE B-MISC Dag 3 B-MISC @Vannieuwkerke 1 B-EVE B-MISC S1.E1 3 B-MISC Twitter 1 B-EVE B-MISC S1.E9 3 B-MISC VRT-directie 1 B-EVE B-PER Chris 3 B-MISC VRT-gebouw 1 B-EVE B-PER Da's 3 B-MISC VRT-toren 1 B-EVE B-PER Dag 3 B-MISC West-Vlaming 1 B-EVE B-PER David 3 B-ORG #VRT 1 B-EVE
MAN 4 MAN 5 Named entity FREQ Category Named entity Amadeo 5 B-MISC @ProjectVygo La 5 B-MISC BK Veldegem 5 B-ORG The Auditorium 5 B-PER Android Bruxelles-Central 5 B-PER Glenn Mercedes-Benz 5 B-PER Tis OS 5 B-PER Vygo Scott's 5 B-PRO Het @CarlDieryckx 5 B-PRO RT @Sophieke14 5 I-PER @EvelineGoe @StijnPollet 5 I-PER @FocusWTV @Valentienx 5 I-PER @HorsUsNews Café 5 I-PER Cup Kotdrink 5 I-PRO Mobilia 5 I-PRO ? Mountain 5 I-PRO Store School 5 I-PRO Studios Kortemark 5 I-PRO for Ola 4 B-EVE EK Brussel-Centraal 4 B-EVE Global GO 4 B-EVE Grote Bar 4 B-LOC Aachen @CarlDieryckx 4 B-LOC Aken @StijnPollet 4 B-LOC Duitsland Markten 4 B-LOC Instagram @ThijsKevin 4 B-LOC Mechelen @Vannieuwkerke 4 B-LOC New Ah 4 B-LOC Nice Lion 4 B-LOC Roeselare Standaard 4 B-LOC Vlaamse ! 4 B-MISC @FienVerschuren http://t.co/0Aw9ruc5T8 4 B-MISC Apple's http://t.co/1N2MGHyX0O 4 B-MISC HOWEST http://t.co/9uZtcblMwF 4 B-ORG Apple http://t.co/C1mEMDa 4 B-PER Audi http://t.co/HjDTj6LKe3 4 B-PER Evy http://t.co/K8LkYCkrzm 4 B-PER Hippodroom http://t.co/MOcrkG0GON 4 B-PER Jorinde http://t.co/OSSi5C9CZs 4 B-PER Sjef http://t.co/STr1qT4zRj 4 I-EVE Nations http://t.co/U4JkFghmU0 4 I-EVE Prijs http://t.co/W8YDPBaHaD 4 I-ORG Event http://t.co/ZI6PvmdxAL 4 I-ORG Horse http://t.co/hNmpq70Jyh 4 I-ORG Nations http://t.co/jxDlvB11 4 I-PER @Blackletterday http://t.co/k50h5xKviT 4 I-PER @BoktNieuws http://t.co/qryvnjr0MH 4 I-PER @EquirexNL http://t.co/ueO5vDcJB6 4 I-PER @VLPvzw http://t.co/y318cm2nQ3 4 I-PER Philippaerts http://t.co/zRn0ZCbPRD 4 I-PER Verwimp
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
MAN 6 FREQ Category Named entity 432 B-PER RT 40 I-PER @HistoricalPics 30 B-LOC Brugge 28 B-LOC #Brugge 27 I-PER @ManUtd 25 I-PER @StadBrugge 22 I-PER @FVMas 20 B-LOC Roeselare 20 B-PRO De 19 I-PER De 18 B-ORG RT 18 I-PER @ICMA 17 B-ORG Club 17 B-ORG Stad 17 I-PRO ! 16 I-PER @ClubBrugge 15 I-PRO the 14 B-PRO Het 14 I-PER @Gapugent 14 I-PER @Vannieuwkerke 14 I-PER Van 13 B-LOC #Roeselare 13 B-PRO The 13 I-ORG Brugge 13 I-PER @DDucheyne 12 B-PER Dirk 11 B-LOC RT 11 B-PRO RT 11 I-PRO : 10 I-ORG @HistoricalPics 10 I-PER @FocusWTV 10 I-PER van 9 B-LOC Anderlecht 9 B-LOC België 9 I-ORG Roeselare 9 I-PER @DenysJan 9 I-PER @StevenVBe 9 I-PRO Nieuwsblad 9 I-PRO of 7 B-LOC Belgische 7 B-LOC Vlaanderen 7 I-PER @Fact 7 I-PRO on 6 B-LOC Brugse 6 B-LOC West-Vlaamse 6 B-PER Check 6 B-PER De 6 B-PER God 6 B-PER Jan 6 B-PER Kevin
FREQ 83 21 19 15 15 11 9 9 9 8 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4
MAN 7 Category Named entity FREQ Category B-PER RT 1 B-EVE B-LOC Gent 9 B-PRO B-PER @DenysJan 9 I-PRO I-PER @Gapugent 7 B-PER I-PER De 7 I-PER B-LOC België 6 B-MISC B-MISC @CarlvKeirsbilck 6 B-ORG B-PER @CarlvKeirsbilck 5 B-LOC B-PER @HeidiWulff 4 I-PRO B-PER @Kroy_Wendy 3 B-LOC B-MISC HR 2 B-LOC B-ORG @KMOquality 2 B-LOC B-ORG UGent 2 B-LOC B-PER God 2 B-LOC B-PER Jan 2 B-LOC I-PER @Innduceme 2 B-LOC I-PRO and 2 B-LOC I-PRO in 2 B-LOC I-PRO of 2 B-MISC B-LOC Anderlecht 2 B-PER B-MISC @Kroy_Wendy 2 B-PER B-MISC @LesleyArens 2 B-PER B-MISC That's 2 B-PER B-ORG RT 2 B-PRO B-PER @DDucheyne 2 B-PRO B-PER Sorry 2 I-PER B-PER Van 2 I-PER B-PRO De 2 I-PRO I-ORG @HeidiWulff 2 I-PRO I-PER van 1 B-EVE B-LOC #Gent 1 B-EVE B-LOC @DenysJan 1 B-EVE B-LOC RT 1 B-LOC B-LOC Vlaamse 1 B-LOC B-MISC Da's 1 B-LOC B-PER Ge 1 B-LOC B-PRO Engels 1 B-LOC B-PRO The 1 B-LOC B-LOC @GertPeersman 1 B-LOC B-LOC Belgische 1 B-LOC B-LOC Europese 1 B-LOC B-LOC Franse 1 B-LOC B-LOC US 1 B-LOC B-MISC RT 1 B-LOC B-ORG @DenysJan 1 B-LOC B-ORG KUL 1 B-LOC B-PER @DeVisKar 1 B-LOC B-PER @FVMas 1 B-LOC B-PER @VeroniekC 1 B-LOC B-PER Bart 1 B-LOC
MAN 8 MAN 9 MAN 10 Named entity FREQ Category Named entity FREQ Category Named entity EK 147 B-PER RT 158 B-LOC Gent RT 59 I-PRO #lastfm 91 B-MISC I'm @Team_3M 59 I-PRO Artists 91 I-MISC at Jens 58 B-PRO My 61 I-MISC Kot Vandenbogaerde 58 I-PRO 3 46 B-MISC Lowie's http://t.co/KIqOg9U4. 58 I-PRO Top 42 B-PER RT Sporza 46 I-PRO : 36 B-PER De Belgie 17 B-LOC Russische 26 B-LOC Schoonmeersen : 16 I-PRO ! 26 I-MISC Ruimte Brussel-Opwijk 15 B-LOC Rusland 26 I-MISC Stille Apfff 15 I-PER Van 26 I-PRO at BK 15 I-PRO The 25 B-PRO I'm Benicassim 13 B-LOC Russisch 23 B-ORG HoGent Frankrijk 13 B-PER @Twezus 23 I-MISC Schoonmeersen Kuurne 13 B-PER Photo 21 B-MISC BYB Mojacar 13 B-PRO The 21 I-ORG Schoonmeersen Puivelde 11 B-LOC België 20 I-MISC Escape TT 11 I-PER De 20 I-MISC Fitness #KBK 10 B-LOC Duitse 19 B-PER Gebouw Legley 10 I-PER @UberFacts 19 I-PER C Ma 10 I-PRO . 15 I-PER Schoonmeersen Specialized 10 I-PRO of 14 B-LOC Sint Toeme 10 I-PRO the 14 B-MISC Charlies Te 9 B-PRO De 14 B-ORG Wervik Thx 9 I-EVE Cup 14 I-LOC Denijsplaats Shiv 8 B-EVE World 13 B-PER BYB TT 8 B-LOC Belgische 13 I-PER De koop 8 B-LOC My 13 I-PER Therminal to 8 B-LOC Novosibirsk 12 B-LOC Oud Parijs-Roubaix 8 B-LOC Russen 12 B-ORG De Ronde 8 B-LOC Siberië 11 B-PRO The http://t.co/W1wgeL7dJd 8 B-LOC Vlaamse 11 I-PRO ! #BOONEN_Tom 8 B-ORG N-VA 11 I-PRO the @AnPost_CRC 8 B-ORG Rode 9 B-LOC Anderlecht @Preventielab 8 B-PER Jan 9 I-MISC Station @Ruben_Pols 8 B-PER The 9 I-PER Coulissen Aah 8 I-LOC Top 9 I-PRO The Alehop... 7 B-EVE WK 8 I-PRO of Angreau 7 B-LOC Anderlecht 7 B-LOC Belgium Anvaing 7 B-LOC Duitsland 7 B-LOC Brussels Apffff 7 B-LOC Nederlandse 7 B-LOC Ghent Ardense 7 B-ORG Bayern 7 I-MISC Gent-Sint-Pieters Ardooie... 7 B-ORG FIFA 7 I-PER Van Belgisch 7 B-ORG KSK 6 B-LOC België Belgische 7 B-ORG The 6 B-LOC Kortrijk België 7 B-ORG VRT 6 B-LOC Overpoortstraat Beselare 7 B-PER Twitter 6 B-LOC SL Borlo 7 B-PER Van 6 B-PER Resto Brest 7 B-PRO RT 6 B-PRO De Brussel 6 B-LOC Antwerpen 6 I-ORG Therminal
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3
MAN 6 MAN 7 Category Named entity FREQ Category Named entity FREQ Category B-PER There 4 B-PER Check 1 B-LOC I-PER @FCBarcelona 4 B-PER Fientje 1 B-LOC I-PER @LambrechtAnnick 4 B-PER Leuk 1 B-LOC I-PER blogt| 4 I-MISC Moh 1 B-LOC I-PRO Be 4 I-MISC My 1 B-LOC I-PRO Bruges 4 I-MISC Nee 1 B-LOC I-PRO and 4 I-ORG @DenysJan 1 B-LOC B-PER John 4 I-ORG @Gapugent 1 B-LOC B-PRO Nederlands 4 I-ORG Cool 1 B-LOC I-PER @AMAnet 4 I-ORG of 1 B-LOC I-PER @BramDoolaege 4 I-ORG van 1 B-LOC I-PER @KMOquality 4 I-PER @DDucheyne 1 B-LOC I-PER @LucOuvrein 4 I-PER @DenysJan 1 B-LOC I-PER Bruyne 4 I-PER @FVMas 1 B-LOC I-PRO Morgen 4 I-PER @Inno_elke 1 B-LOC I-PRO Standaard 4 I-PER @LucSels 1 B-LOC I-PRO in 4 I-PER Pas 1 B-LOC I-PRO to 4 I-PRO Psychology 1 B-LOC B-LOC New 3 B-EVE Creativity 1 B-LOC B-LOC VIDEO 3 B-LOC @FonsLeroyVDAB 1 B-LOC B-LOC Westlaan 3 B-LOC Brugge 1 B-LOC B-MISC C'mon 3 B-LOC Brussel 1 B-LOC B-MISC HR 3 B-LOC HR 1 B-LOC B-MISC RT 3 B-LOC UK 1 B-LOC B-PER David 3 B-LOC West-Vlaanderen 1 B-LOC B-PER Freddy 3 B-MISC @DDucheyne 1 B-LOC B-PER George 3 B-MISC @DenysJan 1 B-MISC B-PER Johan 3 B-MISC @DevlieghereJ 1 B-MISC B-PER Luc 3 B-MISC @KMOquality 1 B-MISC B-PER Ronnie 3 B-ORG @CarlvKeirsbilck 1 B-MISC B-PER Thomas 3 B-ORG @DeVisKar 1 B-MISC B-PER Tom 3 B-PER @DevlieghereJ 1 B-MISC B-PER Van 3 B-PER @Elisameuleman 1 B-MISC B-PER Wouter 3 B-PER @LerenHoeZo 1 B-MISC B-PRO Come 3 B-PER @SakaliSa 1 B-MISC B-PRO Engels 3 B-PER Anne 1 B-MISC B-PRO Nieuws 3 B-PER Filip 1 B-MISC I-LOC City 3 B-PER Frederik 1 B-MISC I-ORG #Brugge 3 B-PER Hugo 1 B-MISC I-ORG @Globe_Pics 3 B-PER Juffrouw 1 B-MISC I-PER @HRMagBE 3 B-PER Karel 1 B-MISC I-PER @LesleyArens 3 B-PER Knap 1 B-MISC I-PER @Nieuwsblad_be 3 B-PER Michael 1 B-MISC I-PER @SportBrugge 3 B-PER Moh 1 B-MISC I-PER @TheAtlantic 3 B-PER Schoon 1 B-MISC I-PER O'Sullivan 3 B-PRO @DenysJan 1 B-MISC I-PER Tour 3 B-PRO @Kroy_Wendy 1 B-MISC I-PRO You 3 B-PRO Frans 1 B-MISC I-PRO a 3 I-EVE Forum 1 B-MISC B-EVE Ronde 3 I-EVE World 1 B-MISC
MAN 8 MAN 9 MAN 10 Named entity FREQ Category Named entity FREQ Category Named entity Buienradar 6 B-LOC Vlaanderen 6 I-PER Brug Deerlijk 6 B-PER @MrDarkley 6 I-PRO Zone Diegem 6 B-PER David 5 B-LOC Barcelona Enghien 6 B-PER Kim 5 B-LOC Gentse Feb 6 B-PER Peter 5 B-LOC Kortrijksepoortstraat Gentse 6 B-PER Romelu 5 B-LOC New Junioren 6 I-ORG Duivels 5 B-ORG BankFIM Kemmel 6 I-ORG Kieldrecht 5 B-ORG Stille La 6 I-PER @TheLadBible 5 B-PER Ma Loir 6 I-PER Smiths 5 I-LOC Airport Maarja 6 I-PER and 5 I-LOC Feesten Montenaken 6 I-PER de 5 I-LOC York Oost-Vlaanderen 6 I-PRO and 5 I-MISC Trechterzaal PK 5 B-LOC Gent 5 I-ORG Banksimulatie Perenchies 5 B-LOC Moskou 5 I-ORG Ruimte Rekkem 5 B-LOC Sint-Niklaas 4 B-LOC Schoonmeersstraat Siberische 5 B-MISC Da's 4 B-LOC Station Silly 5 B-MISC RT 4 B-LOC Voskenslaan Tielt 5 B-PER @sNarah 4 B-MISC Lycus VTT 5 B-PER Bayern 4 B-MISC McDonald's‎ Veldegem 5 B-PER Chelsea 4 B-MISC Starbucks Wervik 5 B-PER Garry 4 B-PER Café Where 5 B-PER Michel 4 B-PER Lukaku Wodecq 5 B-PER Obama 4 B-PER Tis Zonnebeke 5 B-PER Poetin 4 B-PRO Level Zwevegem 5 B-PRO CC 4 I-LOC Kortrijk #E3H 5 I-ORG of 4 I-MISC Beach @Jensvdb 5 I-PER @ThePoke 4 I-PER @Vannieuwkerke Atletiek-Tennis-Bo 5 I-PER @bwinBE 4 I-PRO 2 Awtch... 5 I-PER KLAP 4 I-PRO Buzzy BK 5 I-PER Lukaku 4 I-PRO Days Binche-Doornik-Binche 5 I-PER den 4 I-PRO Snacks Hoho... 
5 I-PRO Ermenrike 4 I-PRO Spar Kuurne-Brussel-Kuurne 4 B-LOC @JGobelijn 4 I-PRO Three Kuurne-brussel-kuurne 4 B-LOC Brussel 3 B-LOC Antalya M'n 4 B-LOC RT 3 B-LOC BRU Selectie 4 B-LOC Russian 3 B-LOC Bibliotheek Team3M 4 B-LOC Waasland 3 B-LOC La Wat'n 4 B-LOC Zaventem 3 B-LOC Washington http://t.co/6yJLOMURA6 4 B-MISC Twitter 3 B-MISC @JacobDegrande http://t.co/H24VyBlB 4 B-ORG Europees 3 B-MISC Fanny's http://t.co/N5L2r43SD8 4 B-ORG Real 3 B-MISC Trainspotter http://t.co/OXCND3Pw 4 B-PER Adnan 3 B-MISC VT4 http://t.co/TZLo7Dt05d 4 B-PER De 3 B-ORG Club http://t.co/Unjbgaa1 4 B-PER James 3 B-PER Android http://t.co/dwYk2mRqBn 4 B-PER Jeff 3 B-PER Blok http://t.co/g8SH8FMp 4 B-PER Luc 3 B-PER Maria http://t.co/kE73QaXhtI 4 B-PER Paul 3 B-PER Metro http://t.co/qqL7xwqP1p 4 B-PER Pirlo 3 B-PER Mmm http://t.co/rEuyT91ciT 4 B-PER Sorry 3 B-PER Mmmh
7.3.2 Top 100 of the most frequently occurring named entities of the women
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
FREQ 48 40 28 28 22 16 13 13 13 13 12 11 10 10 9 9 9 9 9 9 9 8 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5
WOMAN 1 WOMAN 2 WOMAN 3 WOMAN 4 WOMAN 5 Category Named entity FREQ Category Named entity FREQ Category Named entity FREQ Category Named entity FREQ Category Named entity B-LOC Mechelen 381 B-PER RT 87 B-PER P 163 B-PER RT 47 B-PER RT B-PER RT 88 I-PER @Licht_Uit 80 B-LOC Belgische 70 B-LOC Vlaanderen 20 B-LOC Gent B-LOC #Eurovision 24 B-LOC Gent 56 B-LOC België 57 B-LOC Vlaamse 18 B-LOC Brugge I-PRO ! 24 B-LOC Kortrijk 52 B-EVE WK 31 B-PER Hilde 12 B-EVE WK B-PRO The 17 B-PRO The 47 B-LOC Gent 27 B-LOC Antwerpen 12 B-LOC Belgische I-PER van 16 B-LOC Antwerpen 47 I-PER Van 27 B-LOC Vlaams 12 B-PER Alex B-LOC Vilvoorde 16 B-LOC België 46 I-PER Haha 22 B-LOC Brugge 12 I-PRO ! B-MISC Da's 16 I-PER De 39 B-EVE Spelen 22 B-LOC Gent 11 B-PER Jake B-PER Merci 15 B-PRO RT 39 B-PER RT 21 B-LOC Brussel 11 I-PER Bugg B-PER Sorry 13 I-PER @Slate 39 B-PRO @TimvanDijk1991 19 B-MISC @MP_Peeters 10 B-LOC Belgen B-PER Jan 12 I-PRO The 35 B-EVE EK 16 I-PER Crevits 10 B-PRO The I-PRO Of 10 B-LOC Belgium 33 I-PER De 14 B-LOC West-Vlaamse 10 I-PER De B-LOC #QBeachHouse 9 B-LOC Brussel 29 B-PER Haha 14 B-PER Jan 9 B-LOC België B-PER Stromae 9 B-LOC Nederland 23 B-LOC Nederland 13 B-LOC Oostende 9 I-PER Van B-LOC @Qmusic_BE 9 B-PRO De 21 I-PRO Haha 13 B-LOC West-Vlaanderen 8 B-PER Arctic B-LOC Disneyland 8 B-LOC Belgische 19 B-PER Tom 13 B-MISC CD& 8 I-PER Monkeys B-PER Allez 8 B-LOC China 18 B-LOC Belgen 13 I-PER De 7 B-MISC @KaatBuysse B-PER God 8 B-LOC New 18 B-PER @Eva_DP 12 I-PER Van 7 B-PER @Gaylevh B-PER Hashtag 8 B-LOC Parijs 16 B-MISC P 11 B-LOC Torhout 7 B-PER Albert I-PRO . 8 B-ORG Google 16 B-PER Gijs 11 I-PER & 7 B-PER De I-PRO ? 8 B-PER Ge 16 B-PER Jolien 11 I-PER van 7 B-PER Hazard B-PER Harry 8 I-PER van 15 B-LOC Belgisch 11 I-PRO ! 
7 B-PER Van B-ORG The 8 I-PRO : 15 B-LOC Nederlandse 10 B-LOC #Brugge 7 I-PER Turner B-PER Hey 7 B-LOC Brussels 15 B-MISC OS 10 B-LOC Limburg 7 I-PER van B-PER Koen 7 B-ORG The 14 B-LOC Oost-Vlaanderen 10 B-ORG Vlaams 6 B-LOC Nederland B-PER Twitter 7 B-PER Haha 14 B-MISC Jaar 10 I-PER @DaanSchalck 6 B-ORG Club B-PRO IK 7 B-PER Hey 14 B-PER @Saraa_vd 9 B-LOC Kortrijk 6 B-ORG RT I-PER . 7 B-PER The 14 B-PER De 8 B-LOC Europa 6 B-ORG Rode I-PER De 7 B-PER Van 14 B-PER Sven 8 B-ORG Vlaamse 6 B-PER Aquarius I-PER Van 7 I-PER @Salon 14 B-PER Van 8 B-PER Proficiat 6 B-PER Courtois I-PRO IK 7 I-PER @TIME 14 I-EVE Gold 8 I-ORG Parlement 6 B-PER Federer I-PRO The 7 I-PER @TheOnion 14 I-EVE Spelen 8 I-PER @Crevits 6 B-PER Haha B-LOC Antwerpen 7 I-PRO ! 13 B-LOC Belg 7 B-LOC Antwerpse 6 I-ORG Duivels B-LOC Dilbeek 6 B-LOC Facebook 13 B-MISC #RedPanthers 7 B-LOC E19 6 I-PER Bruyne B-LOC Leila 6 B-LOC West-Vlaanderen 13 B-ORG UGent 7 B-LOC Europese 5 B-EVE Olympische B-MISC #FouteUur 6 B-PER Delfine 13 B-PER @MCKefkes 7 B-LOC Ieper 5 B-LOC Brazilië B-MISC @WimOosterlinck 6 B-PER Hopelijk 13 B-PER Andy 7 B-PER Crevits 5 B-LOC Komaan B-PER @Omaya_ 6 I-LOC York 12 B-LOC Londen 7 B-PER Luc 5 B-ORG IKEA B-PER Conchita 6 I-ORG THE 12 B-LOC Rio 7 I-PER @FocusWTV 5 B-PER Gilbert B-PER Jezus 6 I-PER @Youssef_Kobo 12 B-ORG Red 6 B-LOC RT 5 B-PER Kevin I-MISC Da's 6 I-PER Persoon 12 B-PER @MatthDV 6 B-LOC Vietnam 5 B-PER Kim I-PER Potter 5 B-LOC Brugge 12 B-PER Evi 6 B-LOC Zeebrugge 5 B-PER Nadal I-PRO #Eurovision 5 B-LOC India 12 B-PER Kim 6 B-MISC Minister 5 B-PER Niels I-PRO A 5 B-LOC Londen 12 I-PER Hoecke 6 B-PER Bart 5 B-PER Nys I-PRO De 5 B-LOC Twitter 12 I-PRO ! 
6 B-PER David 5 B-PER The I-PRO W 5 B-MISC Here's 11 B-EVE Olympische 6 B-PER Kris 5 B-PER Vadis B-LOC Brugge 5 B-MISC Kerstmis 11 B-EVE Tour 6 B-PER Kurt 5 I-EVE Spelen B-LOC Brussel 5 B-MISC Zalig 11 B-LOC Vlaamse 6 B-PER Schultz 5 I-PER Goffin B-LOC Grote 5 B-ORG BBC 11 B-MISC Da's 6 I-PER @MP_Peeters 5 I-PER RT B-MISC #Cavalcade 5 B-PER @Smienos 11 B-PER Hans 6 I-PER Peeters 4 B-LOC @Clijsterskim
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3
WOMAN 1 WOMAN 2 WOMAN 3 WOMAN 4 WOMAN 5 Category Named entity FREQ Category Named entity FREQ Category Named entity FREQ Category Named entity FREQ Category Named entity B-ORG #testXperiaZ1 5 I-PRO A 11 B-PRO De 6 I-PER Schultz 4 B-LOC Belg B-PER @SuzieQew 5 I-PRO In 11 I-MISC Jaar 5 B-LOC A12 4 B-LOC Damn B-PER Beyoncé 5 I-PRO Of 11 I-MISC het 5 B-LOC België 4 B-LOC Wimbledon B-PER Da's 4 B-LOC #Kortrijk 11 I-MISC van 5 B-LOC Brugse 4 B-MISC RT B-PER Johan 4 B-LOC Belgen 10 B-LOC #Gent 5 B-LOC Mechelen 4 B-PER Clijsters B-PER Leila 4 B-LOC Belgisch 10 B-LOC @Extrasportbe 5 B-LOC Nederland 4 B-PER David B-PER Nyan 4 B-LOC NL 10 B-LOC EK 5 B-LOC Roeselare 4 B-PER Mertens B-PER Van 4 B-ORG @SuperJ4n 10 B-MISC @BELRedPanthers 5 B-LOC Singapore 4 B-PER Serieus B-PRO FC 4 B-ORG OnePlus 10 B-ORG @TimvanDijk1991 5 B-ORG RT 4 B-PER Tis I-LOC Markt 4 B-PER @Chimiel 10 B-PER @MrgrtMaxine 5 B-PER @Crevits 4 B-PER Tsonga I-PER @Qmusic_BE 4 B-PER Bart 10 B-PER Clijsters 5 B-PER Danke 4 B-PER Wilmots I-PER Cat 4 B-PER Eddy 10 B-PER Delfine 5 B-PER De 4 B-PRO Engels I-PER Wurst 4 B-PER Gij 10 B-PER Gilbert 5 B-PER Melanie 4 B-PRO RT I-PRO C 4 B-PER Hahaha 10 B-PER Tim 5 B-PER Zon 4 I-MISC ! 
I-PRO In 4 B-PER Hallo 10 B-PER Wow 5 B-PRO Het 4 I-ORG Brugge I-PRO Kampioenen 4 B-PER Ooohhh 10 B-PRO Het 5 I-LOC Limburg 4 I-PER @UberFacts I-PRO Time 4 B-PER Sorry 10 I-PER Persoon 5 I-ORG Gent 4 I-PER Gendt B-LOC @LeilaVDM 4 B-PER Verdomme 9 B-PER @NestenSchepens 5 I-ORG Regering 4 I-PER Odjidja B-LOC Belgen 4 B-PRO @SuperJ4n 9 B-PER Kirsten 5 I-PER @BrouwersKarin 3 B-LOC @Gaylevh B-LOC Gent 4 B-PRO Nederlands 9 B-PER Michel 5 I-PER @HavenGent 3 B-LOC @OttoJanHam B-LOC Italië 4 I-ORG Music 9 B-PER Mss 4 B-LOC E40 3 B-LOC @Vannieuwkerke B-LOC Poolse 4 I-ORG One 9 B-PER Philippe 4 B-LOC Franse 3 B-LOC Argentinië B-LOC Rusland 4 I-ORG Play 9 I-PER Murray 4 B-LOC Izegem 3 B-LOC Barcelona B-LOC Russisch 4 I-PER @AnoukTorbeyns 9 I-PER van 4 B-LOC Limburgse 3 B-LOC Belgium B-LOC Twitter 4 I-PER @BoerCharl 8 B-EVE World 4 B-LOC Nederlandse 3 B-LOC Britse B-MISC #Eurovision 4 I-PER @Chimiel 8 B-LOC Boom 4 B-LOC Nieuwpoort 3 B-LOC Brugge-Gent B-MISC #HetFouteUur 4 I-PER @IndyTech 8 B-LOC India 4 B-LOC Oosterweel 3 B-LOC Duitsland B-PER @LeilaVDM 4 I-PER @Kroy_Wendy 8 B-LOC Leuven 4 B-LOC R4 3 B-LOC English B-PER @NMBS 4 I-PER @SachaDierckx 8 B-LOC Ruddervoorde 4 B-MISC #3Dplan 3 B-LOC Europa B-PER Amai 4 I-PER Ja 8 B-MISC @Extrasportbe 4 B-PER Filip 3 B-LOC Faculteit B-PER Ge 4 I-PER Van 8 B-MISC PSW 4 B-PER Jo 3 B-LOC Genk B-PER Hello 4 I-PRO JE 8 B-MISC Sociale 4 B-PER Knap 3 B-LOC Hoogerheide B-PER Jennifer 4 I-PRO To 8 B-ORG @Sporza 4 B-PER Rik 3 B-LOC Londen B-PER Naar't 4 I-PRO in 8 B-PER @Cedrinho 4 B-PER Straks 3 B-LOC Spanje B-PER Nou 4 I-PRO of 8 B-PER @JannisBal 4 B-PER Tim 3 B-MISC @TomMeeusen B-PER Pokémon 3 B-EVE World 8 B-PER D 4 B-PER Twitter 3 B-MISC Federer B-PER Sergio 3 B-LOC Antwerpse 8 B-PER Twitter 4 B-PRO De 3 B-MISC I'm B-PER Serieus 3 B-LOC Asse 8 I-EVE League 4 B-PRO RT 3 B-ORG Ajax B-PRO De 3 B-LOC Belg 8 I-EVE Open 4 I-MISC & 3 B-ORG Studio I-PER @AnkeBuckinx 3 B-LOC Den 8 I-MISC Wetenschappen 4 I-PER . 
3 B-ORG VDB I-PER Lawrence 3 B-LOC Duitsland 8 I-PER #Nys 4 I-PER @DeBrabandere 3 B-PER Andy I-PER Nou 3 B-LOC EU 8 I-PER 't 4 I-PER @JokeSchauvliege 3 B-PER Bart I-PRO NIET 3 B-LOC Leie 8 I-PER @MatthDV 4 I-PER @LoesDewulf 3 B-PER Cav I-PRO V 3 B-LOC Luxemburg 8 I-PER Acker 4 I-PER @essenscia_NL 3 B-PER Chelsea I-PRO the 3 B-LOC Luxemburgse 8 I-PER Alphen 4 I-PER Haegen 3 B-PER Denis B-LOC @HooverphonicOff 3 B-LOC Marokkanen 8 I-PER Anne-Laure 4 I-PER Hilde 3 B-PER Erik B-LOC @Qmusic_be 3 B-LOC Nederlander 8 I-PER Vos 4 I-PRO @crevits 3 B-PER Goffin B-LOC Belg 3 B-LOC Nederlandse 8 I-PER Wuyts 3 B-LOC #Gent 3 B-PER Jan B-LOC België 3 B-LOC Rusland 7 B-EVE US 3 B-LOC #Torhout 3 B-PER Jennifer B-LOC Boedapest 3 B-LOC Turkse 7 B-LOC Europees 3 B-LOC #VlaParl 3 B-PER Julien
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
FREQ 329 227 74 61 58 58 56 54 41 39 39 36 36 34 33 31 28 28 27 26 24 23 21 21 21 20 20 19 19 18 18 17 16 16 15 15 14 14 14 14 13 13 12 12 11 11 11 10 10 10
WOMAN 6 WOMAN 7 Category Named entity FREQ Category Named entity B-PER RT 404 B-PER RT B-LOC Brugge 102 I-PER @JarettSays B-LOC Vlaamse 52 B-PRO The I-PER Van 45 I-PRO and B-LOC #Brugge 39 I-PRO the B-PER Mercedes 34 I-PER @BuzzFeed B-LOC Vlaanderen 32 I-PRO of I-PER Volcem 24 I-PRO The B-LOC RT 24 I-PRO You B-LOC Brugse 21 B-LOC Belgium I-PER De 21 B-LOC RT B-LOC Facebook 21 I-PER Lawrence B-ORG Vlaams 21 I-PER RT B-PER Open 19 B-PER Jennifer I-LOC @Barttommelein 19 I-PRO A I-ORG Brugge 18 I-PRO to B-ORG N-VA 17 B-LOC Gent I-ORG Parlement 17 B-PER @Alineagain B-ORG Club 17 B-PER Harry B-PER Woonbonus 17 I-PRO : B-PRO De 17 I-PRO Of B-LOC Vlaams 16 I-PER Potter I-PER @RuttenGwendolyn 16 I-PRO Potter I-PER @VincentVQ 15 I-PER @Hypable I-PER Block 15 I-PRO ! B-PER Maggie 15 I-PRO in I-PER VLD 15 I-PRO on B-MISC @MercedesVVolcem 14 B-PER Chris B-PER Bart 13 B-PRO Harry B-ORG NVA 13 B-PRO RT I-PER Vld 13 I-PRO ? B-LOC Hoefijzerlaan 12 B-PER James B-MISC Ramblas 12 I-PER @MichaelAusiello I-PER @FocusWTV 12 I-PRO Are B-LOC Gent 11 B-MISC X-Men I-PER @ClubBrugge 11 I-MISC ' B-LOC @Barttommelein 11 I-PER @JustJared B-LOC België 11 I-PRO . B-PER Van 11 I-PRO To B-PRO Het 10 B-LOC Zottegem I-PER @Open_VLD_Brugge 10 B-MISC I'm I-PRO Standaard 10 B-MISC I've B-LOC Spa 10 B-ORG RT I-PRO Nieuwsblad 10 B-PER @Matjas_ B-LOC @MercedesVVolcem 10 I-ORG THE B-PER De 10 I-PER @MarkRuffalo B-PER Freya 10 I-PER @ThijsVandepoele B-LOC Amsterdam 10 I-PRO On B-LOC Kortrijk 10 I-PRO With B-LOC Oostende 9 B-LOC Belgische
WOMAN 8 WOMAN 9 WOMAN 10 FREQ Category Named entity FREQ Category Named entity FREQ Category Named entity 36 B-PER Ge 31 B-PER @Julieedv 40 B-PER RT 27 B-ORG N-VA 13 B-LOC Facebook 25 B-LOC Antwerpen 26 B-PER RT 12 B-LOC @Julieedv 24 B-LOC Leuven 24 B-PRO The 12 B-PER God 15 B-LOC Vlaams-Brabant 20 B-LOC Gent 11 B-PER @Eva_Mouton 12 B-PER @Dirt_sA 18 I-PER De 11 B-PER Sorry 11 B-LOC @Eva_Mouton 17 B-ORG RT 10 B-PER @Dirt_sA 10 B-LOC Brussel 16 B-LOC Brussel 10 B-PER @MrtnDb 10 B-PER @Eva_Mouton 16 I-PRO 9 B-LOC Leuven 9 B-LOC Gent 14 I-PER Van 9 I-PRO ! 9 I-PRO ! 12 B-PRO De 8 B-PER RT 7 B-LOC Londen 12 I-PRO . 7 B-LOC @Eva_Mouton 5 B-LOC Kortrijk 11 I-PRO ! 7 B-PER @MelkMuylle 5 B-LOC Pukkelpop 11 I-PRO IK 7 B-PER Katrijn 5 B-PER Ine 10 B-LOC België 7 I-PER @Julieedv 4 B-LOC Amsterdam 10 B-LOC Ninove 6 I-PER @Eva_Mouton 4 B-LOC Facebook 10 B-PER @Eva_Mouton 6 I-PRO IK 4 B-LOC Kinepolis 10 B-PER @Kroy_Wendy 5 B-MISC TV 4 B-LOC London 10 B-PER Allez 5 B-PRO IK 4 B-LOC Rotterdam 10 B-PER Bart 5 I-PER Lap 4 B-PER @AnnlsMns 10 B-PER Karel 5 I-PRO DAT 4 B-PER @Julieedv 10 I-PER Wever 5 I-PRO MIJN 4 B-PER @RSchraey 10 I-PRO the 4 B-LOC Bertem 4 B-PER @Zezunja 9 B-LOC Aalst 4 B-MISC @FakePlasticRuby 4 B-PER Harry 9 B-LOC Vlaanderen 4 B-PER @Omaya_ 4 I-PER @Dirt_sA 8 B-ORG Vlaams 4 B-PER @SuzieQew 4 I-PER @__Mart_ 8 B-PER @PieterDHooghe 4 B-PER @Verwijd 4 I-PER Potter 8 B-PER De 4 B-PER Beste 4 I-PRO DAT 8 B-PER God 4 B-PER Twitter 3 B-LOC Bar 8 B-PER Merci 4 I-PER @Dirt_sA 3 B-LOC Brussels 8 I-PRO of 4 I-PER @Verwijd 3 B-LOC Eindhoven 7 B-MISC @LisaLeysen 4 I-PER De 3 B-LOC Geluidshuis 7 B-PER @SuzieQew 3 B-LOC Gent 3 B-LOC Ingeborg 7 B-PRO Het 3 B-LOC Sint 3 B-LOC Limburg 7 I-ORG Belang 3 B-MISC @AtheistHoly 3 B-LOC Rotselaar 7 I-PER Eetvelt 3 B-MISC @DiamondCityLove 3 B-LOC Stockholm 7 I-PER Euh 3 B-MISC @Julieedv 3 B-LOC Vlaams 6 B-LOC @Eva_Mouton 3 B-MISC @MelkMuylle 3 B-ORG N-VA 6 B-MISC @FakePlasticRuby 3 B-MISC IRL 3 B-ORG VRT 6 B-PER Euh 3 B-ORG @DonDams 3 B-PER 
@FakePlasticRuby 6 B-PER Evelien 3 B-ORG Colruyt 3 B-PER @InesP__ 6 B-PER Sofie 3 B-ORG DAT 3 B-PER @Kroy_Wendy 6 I-PER . 3 B-PER @DjQCo 3 B-PER @Sstrid 6 I-PER Ge 3 B-PER @DonDams 3 B-PER @VonDerDeckenTok 6 I-PER van 3 B-PER @IkbenHamza 3 B-PER @__Mart_ 6 I-PRO ? 3 B-PER @_Prince_R 3 B-PER @nietnuLaura 6 I-PRO BEN 3 B-PER Bart 3 B-PER GIJ 6 I-PRO MIJN 3 B-PER Bas 3 B-PER Milow 6 I-PRO van 3 B-PER Eva 3 I-LOC Gewest 5 B-LOC Antwerpen 3 B-PER Gij 3 I-LOC Stan
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 10 10 10 10 9 9 9 9 9 9 9 9 9 9 9 8 8 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
WOMAN 6 WOMAN 7 Category Named entity FREQ Category Named entity B-LOC West-Vlaanderen 9 B-MISC TV B-PER Landuyt 9 B-PER Emma B-PER Renaat 9 B-PRO Engels I-PER @HorecaBrugge 9 I-PER @Alineagain B-LOC Antwerpen 9 I-PER @Twezus B-LOC Nederland 9 I-PER Nolan B-LOC Zeebrugge 9 I-PRO For B-MISC CD& 8 B-LOC Amsterdam B-PER @de_NVA 8 B-LOC België B-PER Jan 8 B-MISC J.K. I-LOC #Brugge 8 B-ORG The I-PER @MercedesVVolcem 8 B-PER How I-PER @StadBrugge 8 B-PER John I-PER Brugge 8 I-MISC & I-PER van 8 I-MISC Rowling B-LOC @FocusWTV 8 I-ORG RT B-LOC Bruges 8 I-PER @SophiaBush B-LOC Brussel 8 I-PER @TomRouvrois B-LOC West-Vlaamse 8 I-PER Jackman B-ORG Vlaamse 8 I-PER Van B-PER Obama 8 I-PRO I B-PER Peeters 8 I-PRO In I-ORG #Brugge 8 I-PRO for I-PER @DavidRoads 7 B-LOC Londen I-PRO ! 7 B-PER Hugh B-LOC Belgische 7 B-PER Paul B-LOC Open 7 B-PER The B-LOC Vlamingen 7 B-PER Wow B-MISC BTW 7 B-PRO Happy B-ORG De 7 I-MISC and B-PER Proficiat 7 I-ORG THIS I-ORG Regering 7 I-PER @TomFelton I-PER Tommelein 7 I-PER @VERTIGObe I-PER Wever 7 I-PER De B-LOC Belgen 7 I-PER Haha B-LOC Europese 7 I-PRO B-LOC Ro 7 I-PRO Our B-MISC @YouTube-video 7 I-PRO at B-MISC Muyters 6 B-LOC New B-ORG Mercedes 6 B-MISC Amy B-ORG OVLD 6 B-MISC Marvel's B-ORG Stad 6 B-MISC RT B-PER #Brugge 6 B-ORG @TheAVClub B-PER Di 6 B-PER @Twezus B-PER John 6 B-PER Blake B-PER Patrick 6 B-PER Joseph B-PER Schepen 6 B-PER Kristen B-PER Volcem 6 B-PER Robert I-PER #Brugge 6 B-PER Sam I-PER @LambrechtAnnick 6 B-PER Sorry
WOMAN 8 (ranks 51-100)
Rank  FREQ  Category  Named entity
51  5  B-ORG  The
52  5  B-PER  @Coltrui
53  5  B-PER  @Couverts
54  5  B-PER  @Endimi
55  5  B-PER  Chiau
56  5  B-PER  Eefje
57  5  B-PER  Gij
58  5  B-PER  Jij
59  5  B-PER  Peter
60  5  B-PER  Stefaan
61  5  I-PRO  HET
62  5  I-PRO  NIET
63  4  B-LOC  @UPoliteia
64  4  B-LOC  Belgische
65  4  B-MISC  @PieterDHooghe
66  4  B-ORG  NMBS
67  4  B-PER  @FakePlasticRuby
68  4  B-PER  @InesP__
69  4  B-PER  @MelkMuylle
70  4  B-PER  @Sabrbouz
71  4  B-PER  @Zezunja
72  4  B-PER  Clara
73  4  B-PER  Geert
74  4  B-PER  Godverdomme
75  4  B-PER  Isabel
76  4  B-PER  Matthias
77  4  B-PER  Nada
78  4  B-PER  Serieus
79  4  B-PER  Sorry
80  4  B-PER  Veerle
81  4  B-PRO  Frans
82  4  B-PRO  RT
83  4  I-ORG  DAT
84  4  I-PER  @Zezunja
85  4  I-PER  Marant
86  4  I-PER  Och
87  4  I-PER  Surf
88  4  I-PRO  Nieuwe
89  4  I-PRO  The
90  4  I-PRO  VOOR
91  4  I-PRO  ZO
92  4  I-PRO  en
93  3  B-LOC  #Ninove
94  3  B-LOC  @JenzlWashington
95  3  B-LOC  @OttoJanHam
96  3  B-LOC  Brugge
97  3  B-LOC  Denderleeuw
98  3  B-LOC  Evelien
99  3  B-LOC  Gentse
100  3  B-LOC  RT

WOMAN 9 (ranks 51-100)
Rank  FREQ  Category  Named entity
51  3  B-PER  Wow
52  3  I-LOC  GOD
53  3  I-MISC  Oh
54  3  I-PER  &
55  3  I-PER  @MrGeens
56  3  I-PER  @SuzieQew
57  3  I-PER  Dat
58  3  I-PER  Katrijn
59  3  I-PER  Nee
60  3  I-PRO  EEN
61  3  I-PRO  HET
62  3  I-PRO  IS
63  3  I-PRO  NIET
64  2  B-LOC  @Dirt_sA
65  2  B-LOC  @ElineVanhooydon
66  2  B-LOC  @Gilles_vs
67  2  B-LOC  @SuzieQew
68  2  B-LOC  Antwerpen
69  2  B-LOC  Hallo
70  2  B-LOC  Halloween
71  2  B-LOC  Skoda
72  2  B-LOC  Vlaams
73  2  B-LOC  Vlaams-Brabant
74  2  B-LOC  Vlaamse
75  2  B-LOC  Vlaanderen
76  2  B-MISC  @Allinefreedom
77  2  B-MISC  @DeIdealeWereld
78  2  B-MISC  @Kroy_Wendy
79  2  B-MISC  @LisaLeysen
80  2  B-MISC  BTW
81  2  B-MISC  DJ
82  2  B-MISC  DJ's
83  2  B-MISC  DVD
84  2  B-MISC  I'm
85  2  B-MISC  Nieuw
86  2  B-MISC  Skoda
87  2  B-ORG  @ANK2D2
88  2  B-ORG  @Julieedv
89  2  B-ORG  @Sari2_0
90  2  B-ORG  Aldi
91  2  B-ORG  Bond
92  2  B-ORG  Google
93  2  B-ORG  OMG
94  2  B-PER  @AtheistHoly
95  2  B-PER  @Coltrui
96  2  B-PER  @Desertfix
97  2  B-PER  @ElineVanhooydon
98  2  B-PER  @FakePlasticRuby
99  2  B-PER  @HLN_BE
100  2  B-PER  @IlRedeimondani

WOMAN 10 (ranks 51-100)
Rank  FREQ  Category  Named entity
51  3  I-PER  @Eva_Mouton
52  3  I-PER  @Julieedv
53  3  I-PER  @RSchraey
54  3  I-PRO  IK
55  2  B-LOC  @Dirt_sA
56  2  B-LOC  @MissAiesj
57  2  B-LOC  @Shadeee_
58  2  B-LOC  Aalst
59  2  B-LOC  Antwerpse
60  2  B-LOC  Brits
61  2  B-LOC  Buchbar
62  2  B-LOC  Duvel
63  2  B-LOC  Gents
64  2  B-LOC  Lokerse
65  2  B-LOC  Maastricht
66  2  B-LOC  Mechelen
67  2  B-LOC  Nederland
68  2  B-LOC  Oost-Vlaanderen
69  2  B-LOC  Parijs
70  2  B-LOC  Schaarbeek
71  2  B-LOC  Thaise
72  2  B-LOC  Vlaamse
73  2  B-LOC  Vlaming
74  2  B-LOC  Wallen
75  2  B-MISC  #Pukkelpop
76  2  B-MISC  @Dirt_sA
77  2  B-MISC  @LauraJaneDaisy
78  2  B-MISC  @__Mart_
79  2  B-MISC  Ine
80  2  B-ORG  @redDreamhead
81  2  B-ORG  Colruyt
82  2  B-ORG  Mobistar
83  2  B-ORG  RT=
84  2  B-PER  @FlorVDE
85  2  B-PER  @LaurenMeco
86  2  B-PER  @MrtnDb
87  2  B-PER  @Niconarrr
88  2  B-PER  @Omaya_
89  2  B-PER  @SuzieQew
90  2  B-PER  @TIIM
91  2  B-PER  Alex
92  2  B-PER  Bill
93  2  B-PER  Bomma
94  2  B-PER  Evelien
95  2  B-PER  Gaston
96  2  B-PER  Ge
97  2  B-PER  Hm
98  2  B-PER  I'm
99  2  B-PER  Jezus
100  2  B-PER  Kathleen
7.3.3 Top 100 of the most frequently occurring nouns of the men
MAN 1 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  71  N(soort,ev,basis,onz,stan)  jaar
2  61  N(soort,mv,basis)  mensen
3  45  N(soort,ev,basis,zijd,stan)  dag
4  36  N(soort,ev,basis,onz,stan)  uur
5  33  N(soort,ev,basis,zijd,stan)  man
6  28  N(soort,ev,basis,onz,stan)  nieuws
7  26  N(soort,ev,basis,genus,stan)  keer
8  26  N(soort,ev,basis,onz,stan)  idee
9  26  N(soort,mv,basis)  @LisaLeysen
10  25  N(soort,ev,basis,onz,stan)  werk
11  23  N(soort,ev,basis,onz,stan)  leven
12  23  N(soort,mv,basis)  kinderen
13  22  N(soort,ev,basis,zijd,stan)  twitter
14  21  N(soort,ev,basis,onz,stan)  artikel
15  21  N(soort,ev,basis,zijd,stan)  week
16  20  N(soort,ev,basis,zijd,stan)  auto
17  20  N(soort,ev,dim,onz,stan)  beetje
18  19  N(soort,ev,basis,zijd,stan)  plaats
19  19  N(soort,ev,basis,zijd,stan)  tijd
20  18  N(soort,ev,basis,onz,stan)  internet
21  17  N(eigen,ev,basis,zijd,stan)  Jezus
22  16  N(soort,ev,basis,onz,stan)  geld
23  16  N(soort,ev,basis,zijd,stan)  @_katrijn
24  16  N(soort,ev,basis,zijd,stan)  facebook
25  16  N(soort,ev,basis,zijd,stan)  naam
26  16  N(soort,ev,basis,zijd,stan)  regering
27  16  N(soort,ev,basis,zijd,stan)  tv
28  15  N(soort,ev,basis,zijd,stan)  euro
29  15  N(soort,ev,basis,zijd,stan)  film
30  15  N(soort,mv,basis)  dagen
31  15  N(soort,mv,basis)  minuten
32  14  N(eigen,ev,basis,zijd,stan)  Obama
33  14  N(soort,ev,basis,zijd,stan)  keer
34  14  N(soort,ev,basis,zijd,stan)  paniek
35  13  N(soort,ev,basis,onz,stan)  weekend
36  13  N(soort,ev,basis,zijd,stan)  foto
37  13  N(soort,ev,basis,zijd,stan)  kans
38  13  N(soort,ev,basis,zijd,stan)  vakantie
39  12  N(eigen,ev,basis,onz,stan)  Brussel
40  12  N(soort,ev,basis,onz,stan)  hoofd
41  12  N(soort,ev,basis,onz,stan)  moment
42  12  N(soort,ev,basis,zijd,stan)  bus
43  12  N(soort,ev,basis,zijd,stan)  radio
44  12  N(soort,ev,basis,zijd,stan)  reclame
45  12  N(soort,ev,basis,zijd,stan)  wereld
46  12  N(soort,mv,basis)  Duivels
47  11  N(eigen,ev,basis,onz,stan)  Amerika
48  11  N(soort,ev,basis,onz,stan)  Belang
49  11  N(soort,ev,basis,onz,stan)  nummer
50  11  N(soort,ev,basis,onz,stan)  voetbal

MAN 2 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  60  N(eigen,ev,basis,onz,stan)  Gent
2  46  N(soort,ev,basis,zijd,stan)  #NowPlaying
3  42  N(soort,ev,basis,onz,stan)  @recradiocentrum
4  36  N(soort,ev,basis,onz,stan)  jaar
5  35  N(soort,ev,basis,zijd,stan)  radio
6  29  N(eigen,ev,basis,zijd,stan)  #Spotify
7  25  N(soort,ev,basis,zijd,stan)  #RunKeeper
8  24  N(soort,ev,basis,zijd,stan)  week
9  23  N(soort,mv,basis)  @bartderaes
10  20  N(soort,ev,basis,zijd,stan)  @Leen_DeSchutter
11  19  N(soort,ev,basis,zijd,stan)  @FredoFredonis
12  19  N(soort,ev,basis,zijd,stan)  dag
13  19  N(soort,ev,basis,zijd,stan)  podcast
14  18  N(soort,ev,basis,onz,stan)  @bibliotheekgent
15  18  N(soort,ev,basis,zijd,stan)  @SweetCharlie
16  17  N(soort,ev,basis,zijd,stan)  @Frankdhanis
17  16  N(soort,ev,basis,onz,stan)  uur
18  16  N(soort,ev,basis,zijd,stan)  #nowplaying
19  16  N(soort,ev,basis,zijd,stan)  #radio
20  15  N(eigen,ev,basis,zijd,stan)  #Shazam
21  15  N(eigen,ev,basis,zijd,stan)  zondag
22  15  N(soort,ev,basis,zijd,stan)  @veerledevos
23  15  N(soort,mv,basis)  @demorgen
24  15  N(soort,mv,basis)  @frednasen
25  15  N(soort,mv,basis)  mensen
26  14  N(soort,ev,basis,onz,stan)  @aaiBoek
27  13  N(eigen,ev,basis,onz,stan)  @UrgentFM
28  12  N(soort,ev,basis,onz,stan)  @stubru
29  12  N(soort,ev,basis,zijd,stan)  Radio
30  12  N(soort,ev,basis,zijd,stan)  app
31  12  N(soort,ev,basis,zijd,stan)  ijsberg
32  12  N(soort,mv,basis)  dagen
33  11  N(soort,ev,basis,zijd,stan)  @TUMULTFM
34  11  N(soort,ev,basis,zijd,stan)  film
35  11  N(soort,ev,basis,zijd,stan)  muziek
36  11  N(soort,ev,basis,zijd,stan)  tv
37  11  N(soort,mv,basis)  kinderen
38  10  N(soort,ev,basis,onz,stan)  @gertjanvanegdom
39  10  N(soort,ev,basis,onz,stan)  boek
40  10  N(soort,ev,basis,onz,stan)  interview
41  10  N(soort,ev,basis,onz,stan)  nieuws
42  10  N(soort,ev,basis,zijd,stan)  foto
43  10  N(soort,ev,basis,zijd,stan)  tijd
44  10  N(soort,mv,basis)  @kristofdhanens
45  9  N(eigen,ev,basis,zijd,stan)  @MNMbe
46  9  N(soort,ev,basis,onz,stan)  @klaasballegeer
47  9  N(soort,ev,basis,onz,stan)  album
48  9  N(soort,ev,basis,zijd,stan)  @delijn
49  9  N(soort,ev,basis,zijd,stan)  @itzStevieWonder
50  9  N(soort,ev,basis,zijd,stan)  @pieterbl

MAN 3 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  54  N(soort,ev,basis,zijd,stan)  trein
2  53  N(soort,mv,basis)  mensen
3  52  N(soort,ev,basis,onz,stan)  #thuisopeen
4  48  N(soort,ev,basis,onz,stan)  jaar
5  45  N(eigen,ev,basis,onz,stan)  Gent
6  39  N(soort,ev,basis,zijd,stan)  dag
7  33  N(soort,ev,basis,zijd,stan)  #dtv
8  33  N(soort,ev,basis,zijd,stan)  @kzenmatthias
9  31  N(soort,mv,basis)  @vrtpers
10  29  N(soort,ev,basis,onz,stan)  #neverforget
11  27  N(soort,ev,basis,zijd,stan)  week
12  25  N(soort,ev,basis,onz,stan)  succes
13  24  N(soort,ev,basis,onz,stan)  uur
14  24  N(soort,ev,basis,zijd,stan)  tijd
15  24  N(soort,mv,basis)  @nmbs
16  23  N(soort,ev,basis,onz,stan)  leven
17  23  N(soort,ev,basis,zijd,stan)  #defusie
18  22  N(eigen,ev,basis,zijd,stan)  VRT
19  21  N(soort,ev,basis,onz,stan)  #pendelpret
20  21  N(soort,ev,basis,zijd,stan)  foto
21  20  N(soort,ev,basis,zijd,stan)  plaats
22  19  N(eigen,ev,basis,onz,stan)  Brussel
23  19  N(soort,ev,basis,onz,stan)  @stubru
24  19  N(soort,ev,basis,onz,stan)  werk
25  19  N(soort,ev,basis,zijd,stan)  @een
26  19  N(soort,ev,basis,zijd,stan)  @rikea
27  19  N(soort,ev,basis,zijd,stan)  lift
28  18  N(soort,ev,basis,zijd,stan)  Dag
29  18  N(soort,ev,basis,zijd,stan)  keer
30  17  N(soort,ev,basis,genus,stan)  keer
31  17  N(soort,ev,basis,zijd,stan)  man
32  17  N(soort,ev,basis,zijd,stan)  vrouw
33  17  N(soort,mv,basis)  Mensen
34  17  N(soort,mv,basis)  kinderen
35  16  N(eigen,ev,basis,zijd,stan)  @MNMbe
36  16  N(soort,ev,basis,zijd,stan)  #dsmtw
37  16  N(soort,ev,basis,zijd,stan)  bus
38  15  N(eigen,ev,basis,zijd,stan)  @Julieedv
39  15  N(soort,ev,basis,onz,stan)  geld
40  15  N(soort,ev,basis,zijd,stan)  @zegmaarbas
41  15  N(soort,ev,basis,zijd,stan)  naam
42  15  N(soort,mv,basis)  #komeneten
43  14  N(soort,ev,basis,onz,stan)  @hautekiet
44  14  N(soort,ev,basis,zijd,stan)  @Stautemaz
45  14  N(soort,ev,basis,zijd,stan)  @delijn
46  14  N(soort,mv,basis)  Gent-Sint-Pieters
47  14  N(soort,mv,basis)  studenten
48  13  N(eigen,ev,basis,onz,stan)  Brugge
49  13  N(eigen,ev,basis,zijd,stan)  @ThijsVandepoele
50  13  N(eigen,ev,basis,zijd,stan)  Lowieke

MAN 4 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  123  N(soort,ev,basis,zijd,stan)  #RunKeeper
2  37  N(soort,mv,basis)  lengtes
3  35  N(eigen,ev,basis,onz,stan)  Lichtervelde
4  32  N(eigen,ev,basis,onz,stan)  Brugge
5  29  N(soort,ev,basis,onz,stan)  @StijnPollet
6  28  N(soort,mv,basis)  others
7  25  N(soort,ev,basis,zijd,stan)  swim
8  21  N(eigen,ev,basis,onz,stan)  Brussel
9  18  N(soort,ev,basis,zijd,stan)  @CarlDieryckx
10  17  N(soort,ev,basis,zijd,stan)  @destandaard
11  14  N(eigen,ev,basis,zijd,stan)  Kortemark
12  14  N(soort,ev,basis,zijd,stan)  @ThijsKevin
13  13  N(eigen,ev,basis,onz,stan)  Torhout
14  12  N(eigen,ev,basis,zijd,stan)  XD
15  11  N(eigen,ev,basis,onz,stan)  Brussels
16  11  N(soort,ev,basis,zijd,stan)  Hogeschool-Universiteit
17  9  N(soort,ev,basis,zijd,stan)  @SofieVerschoore
18  8  N(soort,ev,basis,zijd,stan)  @AlessandroSomer
19  7  N(soort,ev,basis,genus,stan)  keer
20  7  N(soort,ev,basis,zijd,stan)  @Valentienx
21  7  N(soort,ev,basis,zijd,stan)  @vrtderedactie
22  5  N(soort,ev,basis,zijd,stan)  trein
23  4  N(eigen,ev,basis,onz,stan)  #Lichtervelde
24  4  N(eigen,ev,basis,onz,stan)  België
25  4  N(eigen,ev,basis,zijd,stan)  @LynnS__
26  4  N(soort,ev,basis,onz,stan)  examen
27  4  N(soort,ev,basis,zijd,stan)  @tijd
28  4  N(soort,ev,basis,zijd,stan)  gie
29  4  N(soort,ev,basis,zijd,stan)  nie
30  4  N(soort,ev,basis,zijd,stan)  week
31  4  N(soort,mv,basis)  @FienVerschuren
32  4  N(soort,mv,basis)  @e_janssens
33  3  N(eigen,ev,basis,onz,stan)  Bruxelles
34  3  N(eigen,ev,basis,onz,stan)  Gent
35  3  N(eigen,ev,basis,zijd,stan)  Haha
36  3  N(soort,ev,basis,onz,stan)  internet
37  3  N(soort,ev,basis,onz,stan)  nr
38  3  N(soort,ev,basis,zijd,stan)  #Salamander
39  3  N(soort,ev,basis,zijd,stan)  #opfriscursus
40  3  N(soort,ev,basis,zijd,stan)  @alessandrosomer
41  3  N(soort,ev,basis,zijd,stan)  @be_rail
42  3  N(soort,ev,basis,zijd,stan)  @vyaggio
43  3  N(soort,ev,basis,zijd,stan)  BREAKING
44  3  N(soort,ev,basis,zijd,stan)  Dag
45  3  N(soort,ev,basis,zijd,stan)  auto
46  3  N(soort,ev,basis,zijd,stan)  haha
47  3  N(soort,ev,basis,zijd,stan)  kans
48  3  N(soort,ev,basis,zijd,stan)  keer
49  3  N(soort,ev,basis,zijd,stan)  vakantie
50  3  N(soort,ev,basis,zijd,stan)  verjaardag
MAN 1 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  11  N(soort,mv,basis)  dingen
52  11  N(soort,mv,basis)  foto's
53  10  N(soort,ev,basis,onz,stan)  café
54  10  N(soort,ev,basis,onz,stan)  water
55  10  N(soort,ev,basis,zijd,stan)  trein
56  10  N(soort,ev,basis,zijd,stan)  winter
57  10  N(soort,mv,basis)  Mensen
58  10  N(soort,mv,basis)  verkiezingen
59  10  N(soort,mv,basis)  vreemden
60  10  N(soort,mv,basis)  weken
61  9  N(eigen,ev,basis,onz,stan)  Antwerpen
62  9  N(eigen,ev,basis,onz,stan)  WK
63  9  N(soort,ev,basis,zijd,stan)  Lap
64  9  N(soort,ev,basis,zijd,stan)  Tip
65  9  N(soort,ev,basis,zijd,stan)  ebola
66  9  N(soort,ev,basis,zijd,stan)  hand
67  9  N(soort,ev,basis,zijd,stan)  toekomst
68  9  N(soort,ev,basis,zijd,stan)  vraag
69  9  N(soort,ev,dim,onz,stan)  filmpje
70  9  N(soort,mv,basis)  Goeiemorgen
71  9  N(soort,mv,basis)  jaren
72  9  N(soort,mv,basis)  media
73  8  N(eigen,ev,basis,onz,stan)  Limburg
74  8  N(soort,ev,basis,onz,stan)  aantal
75  8  N(soort,ev,basis,onz,stan)  bier
76  8  N(soort,ev,basis,onz,stan)  huis
77  8  N(soort,ev,basis,onz,stan)  verhaal
78  8  N(soort,ev,basis,onz,stan)  woord
79  8  N(soort,ev,basis,zijd,stan)  @inebenzine
80  8  N(soort,ev,basis,zijd,stan)  app
81  8  N(soort,ev,basis,zijd,stan)  broek
82  8  N(soort,ev,basis,zijd,stan)  dood
83  8  N(soort,ev,basis,zijd,stan)  straat
84  8  N(soort,ev,basis,zijd,stan)  vrouw
85  8  N(soort,ev,basis,zijd,stan)  website
86  7  N(eigen,ev,basis,onz,stan)  België
87  7  N(eigen,ev,basis,onz,stan)  Gent
88  7  N(eigen,ev,basis,onz,stan)  Leuven
89  7  N(eigen,ev,basis,onz,stan)  Nederlands
90  7  N(eigen,ev,basis,zijd,stan)  God
91  7  N(eigen,ev,basis,zijd,stan)  Sharknado
92  7  N(eigen,ev,basis,zijd,stan)  maandag
93  7  N(eigen,ev,basis,zijd,stan)  mei
94  7  N(soort,ev,basis,onz,stan)  begin
95  7  N(soort,ev,basis,onz,stan)  boek
96  7  N(soort,ev,basis,onz,stan)  deel
97  7  N(soort,ev,basis,onz,stan)  journaal
98  7  N(soort,ev,basis,onz,stan)  kind
99  7  N(soort,ev,basis,onz,stan)  miljoen
100  7  N(soort,ev,basis,onz,stan)  ongeluk

MAN 2 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  9  N(soort,ev,basis,zijd,stan)  @stefanvst
52  9  N(soort,ev,basis,zijd,stan)  @urgentfm
53  9  N(soort,ev,dim,onz,stan)  filmpje
54  8  N(eigen,ev,basis,onz,stan)  Brussel
55  8  N(eigen,ev,basis,zijd,stan)  @UrgentFM
56  8  N(soort,ev,basis,onz,stan)  festival
57  8  N(soort,ev,basis,onz,stan)  weekend
58  8  N(soort,ev,basis,onz,stan)  werk
59  8  N(soort,ev,basis,zijd,stan)  @canvastv
60  8  N(soort,ev,basis,zijd,stan)  @vrtderedactie
61  8  N(soort,ev,basis,zijd,stan)  docu
62  8  N(soort,ev,basis,zijd,stan)  studio
63  8  N(soort,mv,basis)  @XanderPeeters
64  8  N(soort,mv,basis)  @freyagoossens
65  7  N(soort,ev,basis,onz,stan)  @FreeKontent
66  7  N(soort,ev,basis,onz,stan)  @vooruit
67  7  N(soort,ev,basis,onz,stan)  geld
68  7  N(soort,ev,basis,onz,stan)  probleem
69  7  N(soort,ev,basis,onz,stan)  verhaal
70  7  N(soort,ev,basis,zijd,stan)  @PieterGDevriese
71  7  N(soort,ev,basis,zijd,stan)  @denachtraaf
72  7  N(soort,ev,basis,zijd,stan)  bib
73  7  N(soort,ev,basis,zijd,stan)  keer
74  7  N(soort,ev,basis,zijd,stan)  mail
75  7  N(soort,ev,basis,zijd,stan)  plaat
76  7  N(soort,mv,basis)  #buitvandeboekenbeurs
77  7  N(soort,mv,basis)  @senneguns
78  7  N(soort,mv,basis)  minuten
79  6  N(eigen,ev,basis,zijd,stan)  @Jonazzty
80  6  N(eigen,ev,basis,zijd,stan)  Check
81  6  N(eigen,ev,basis,zijd,stan)  Facebook
82  6  N(eigen,ev,basis,zijd,stan)  Wow
83  6  N(soort,ev,basis,genus,stan)  keer
84  6  N(soort,ev,basis,onz,stan)  Radiocentrum
85  6  N(soort,ev,basis,onz,stan)  artikel
86  6  N(soort,ev,basis,onz,stan)  geval
87  6  N(soort,ev,basis,onz,stan)  huis
88  6  N(soort,ev,basis,onz,stan)  idee
89  6  N(soort,ev,basis,onz,stan)  magazine
90  6  N(soort,ev,basis,onz,stan)  nr
91  6  N(soort,ev,basis,onz,stan)  nummer
92  6  N(soort,ev,basis,onz,stan)  project
93  6  N(soort,ev,basis,onz,stan)  woord
94  6  N(soort,ev,basis,zijd,stan)  @Mr_Planckie
95  6  N(soort,ev,basis,zijd,stan)  @VonDerDeckenTok
96  6  N(soort,ev,basis,zijd,stan)  @janseurinck
97  6  N(soort,ev,basis,zijd,stan)  @tomasdeman
98  6  N(soort,ev,basis,zijd,stan)  aflevering
99  6  N(soort,ev,basis,zijd,stan)  journalistiek
100  6  N(soort,ev,basis,zijd,stan)  maand

MAN 3 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  13  N(eigen,ev,basis,zijd,stan)  Twitter
52  13  N(soort,ev,basis,zijd,stan)  collega
53  12  N(eigen,ev,basis,zijd,stan)  @EliseVandv
54  12  N(soort,ev,basis,onz,stan)  huis
55  12  N(soort,ev,basis,onz,stan)  station
56  12  N(soort,ev,basis,zijd,stan)  @KrlnCmmrts
57  12  N(soort,ev,basis,zijd,stan)  kans
58  12  N(soort,ev,basis,zijd,stan)  straat
59  12  N(soort,mv,basis)  #nmbs
60  11  N(soort,ev,basis,onz,stan)  @AlfredoMorreel
61  11  N(soort,ev,basis,zijd,stan)  @Timothy_dev
62  11  N(soort,ev,basis,zijd,stan)  euro
63  11  N(soort,ev,basis,zijd,stan)  info
64  11  N(soort,ev,basis,zijd,stan)  ochtend
65  11  N(soort,ev,basis,zijd,stan)  wereld
66  11  N(soort,mv,basis)  Goeiemorgen
67  11  N(soort,mv,basis)  dagen
68  10  N(eigen,ev,basis,onz,stan)  Antwerpen
69  10  N(eigen,ev,basis,onz,stan)  Leuven
70  10  N(eigen,ev,basis,zijd,stan)  @GentseMarteko
71  10  N(eigen,ev,basis,zijd,stan)  @Stautemaz
72  10  N(eigen,ev,basis,zijd,stan)  Facebook
73  10  N(soort,ev,basis,onz,stan)  bed
74  10  N(soort,ev,basis,onz,stan)  weekend
75  10  N(soort,ev,basis,zijd,stan)  @Asthota
76  10  N(soort,ev,basis,zijd,stan)  @Ljosmyndun
77  10  N(soort,ev,basis,zijd,stan)  @katrienboon
78  10  N(soort,ev,basis,zijd,stan)  @letsjak
79  10  N(soort,ev,basis,zijd,stan)  @petervandeveire
80  10  N(soort,ev,basis,zijd,stan)  @sambal_be
81  10  N(soort,ev,basis,zijd,stan)  aflevering
82  10  N(soort,ev,basis,zijd,stan)  tv
83  10  N(soort,ev,basis,zijd,stan)  tweets
84  10  N(soort,mv,basis)  #debikerboys
85  10  N(soort,mv,basis)  #levensvragen
86  10  N(soort,mv,basis)  vrouwen
87  9  N(eigen,ev,basis,zijd,stan)  Sinterklaas
88  9  N(eigen,ev,basis,zijd,stan)  Thijs
89  9  N(eigen,ev,basis,zijd,stan)  maandag
90  9  N(eigen,ev,basis,zijd,stan)  september
91  9  N(soort,ev,basis,onz,stan)  @thomas_vrbrggn
92  9  N(soort,ev,basis,onz,stan)  water
93  9  N(soort,ev,basis,zijd,stan)  @Tinamaerevoet
94  9  N(soort,ev,basis,zijd,stan)  @VonDerDeckenTok
95  9  N(soort,ev,basis,zijd,stan)  hond
96  9  N(soort,ev,basis,zijd,stan)  studio
97  9  N(soort,ev,basis,zijd,stan)  tweet
98  9  N(soort,mv,basis)  @LisaLeysen
99  9  N(soort,mv,basis)  kleren
100  8  N(eigen,ev,basis,onz,stan)  West-Vlaanderen

MAN 4 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  3  N(soort,ev,basis,zijd,stan)  weg
52  3  N(soort,ev,basis,zijd,stan)  wi
53  3  N(soort,mv,basis)  vragen
54  2  N(eigen,ev,basis,onz,stan)  Brussel-Centraal
55  2  N(eigen,ev,basis,onz,stan)  Kortemark
56  2  N(eigen,ev,basis,onz,stan)  Mobilia
57  2  N(eigen,ev,basis,onz,stan)  Roeselare
58  2  N(eigen,ev,basis,onz,stan)  Veldegem
59  2  N(eigen,ev,basis,zijd,stan)  #NMBS
60  2  N(eigen,ev,basis,zijd,stan)  @Valentienx
61  2  N(eigen,ev,basis,zijd,stan)  Amadeo
62  2  N(eigen,ev,basis,zijd,stan)  Facebook
63  2  N(eigen,ev,basis,zijd,stan)  Kotdrink
64  2  N(eigen,ev,basis,zijd,stan)  Unizo
65  2  N(eigen,ev,basis,zijd,stan)  zondag
66  2  N(eigen,mv,basis)  Markten
67  2  N(soort,ev,basis,onz,stan)  #Brussel
68  2  N(soort,ev,basis,onz,stan)  @Autowereld
69  2  N(soort,ev,basis,onz,stan)  @stijnpollet
70  2  N(soort,ev,basis,onz,stan)  jaar
71  2  N(soort,ev,basis,onz,stan)  kot
72  2  N(soort,ev,basis,onz,stan)  lid
73  2  N(soort,ev,basis,onz,stan)  personeel
74  2  N(soort,ev,basis,onz,stan)  uur
75  2  N(soort,ev,basis,onz,stan)  zwembad
76  2  N(soort,ev,basis,zijd,stan)  @MercedesBenz_NL
77  2  N(soort,ev,basis,zijd,stan)  @belgacom_eva_NL
78  2  N(soort,ev,basis,zijd,stan)  @khbo
79  2  N(soort,ev,basis,zijd,stan)  @kweeet
80  2  N(soort,ev,basis,zijd,stan)  @onemorething
81  2  N(soort,ev,basis,zijd,stan)  @thijskevin
82  2  N(soort,ev,basis,zijd,stan)  afstand
83  2  N(soort,ev,basis,zijd,stan)  ahja
84  2  N(soort,ev,basis,zijd,stan)  aula
85  2  N(soort,ev,basis,zijd,stan)  chocolade
86  2  N(soort,ev,basis,zijd,stan)  economie
87  2  N(soort,ev,basis,zijd,stan)  frambozetaart
88  2  N(soort,ev,basis,zijd,stan)  ghog
89  2  N(soort,ev,basis,zijd,stan)  helft
90  2  N(soort,ev,basis,zijd,stan)  klas
91  2  N(soort,ev,basis,zijd,stan)  kortfilm
92  2  N(soort,ev,basis,zijd,stan)  maarja
93  2  N(soort,ev,basis,zijd,stan)  mail
94  2  N(soort,ev,basis,zijd,stan)  man
95  2  N(soort,ev,basis,zijd,stan)  medaille
96  2  N(soort,ev,basis,zijd,stan)  nacht
97  2  N(soort,ev,basis,zijd,stan)  ng
98  2  N(soort,ev,basis,zijd,stan)  oef
99  2  N(soort,ev,basis,zijd,stan)  snelheid
100  2  N(soort,ev,basis,zijd,stan)  tijd
MAN 5 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  34  N(soort,ev,basis,zijd,stan)  @vygo_be
2  34  N(soort,mv,basis)  @FienVerschuren
3  29  N(soort,ev,basis,onz,stan)  jaar
4  25  N(soort,ev,basis,zijd,stan)  week
5  21  N(eigen,ev,basis,onz,stan)  België
6  21  N(soort,ev,basis,zijd,stan)  @BrunoByttebier
7  21  N(soort,ev,basis,zijd,stan)  @Peter_Hoefslag
8  20  N(eigen,ev,basis,onz,stan)  Kortrijk
9  20  N(soort,ev,basis,onz,stan)  deel
10  20  N(soort,ev,basis,zijd,stan)  @ntone
11  20  N(soort,ev,basis,zijd,stan)  website
12  19  N(soort,ev,basis,onz,stan)  paard
13  19  N(soort,ev,basis,zijd,stan)  @diskwriter
14  19  N(soort,ev,basis,zijd,stan)  @shequus
15  19  N(soort,ev,basis,zijd,stan)  tijd
16  19  N(soort,mv,basis)  mensen
17  18  N(soort,ev,basis,onz,stan)  werk
18  18  N(soort,ev,basis,zijd,stan)  @LonnekeRuesink
19  18  N(soort,ev,basis,zijd,stan)  @lauradieryckx
20  18  N(soort,ev,basis,zijd,stan)  @laurensvdw
21  18  N(soort,ev,basis,zijd,stan)  dag
22  16  N(eigen,mv,basis)  Belgen
23  16  N(soort,ev,basis,onz,stan)  weekend
24  16  N(soort,ev,basis,zijd,stan)  top
25  16  N(soort,ev,dim,onz,stan)  beetje
26  15  N(soort,mv,basis)  others
27  14  N(eigen,ev,basis,onz,stan)  Gent
28  14  N(soort,ev,basis,onz,stan)  succes
29  14  N(soort,ev,basis,onz,stan)  uur
30  14  N(soort,mv,basis)  @Wendy_Scholten
31  14  N(soort,mv,basis)  @fienverschuren
32  13  N(soort,ev,basis,onz,stan)  @StijnPollet
33  13  N(soort,ev,basis,zijd,stan)  #dtv
34  13  N(soort,mv,basis)  paarden
35  12  N(soort,ev,basis,zijd,stan)  @YoungCrazyFool
36  12  N(soort,ev,basis,zijd,stan)  foto
37  12  N(soort,ev,basis,zijd,stan)  man
38  12  N(soort,ev,basis,zijd,stan)  paardensport
39  12  N(soort,ev,basis,zijd,stan)  sport
40  12  N(soort,mv,basis)  #jumpingmechelen
41  12  N(soort,mv,basis)  es
42  11  N(soort,ev,basis,onz,stan)  artikel
43  11  N(soort,ev,basis,zijd,stan)  @JensVercruysse
44  11  N(soort,ev,basis,zijd,stan)  info
45  11  N(soort,ev,basis,zijd,stan)  trein
46  11  N(soort,ev,dim,onz,stan)  filmpje
47  10  N(eigen,ev,basis,onz,stan)  Nederland
48  10  N(soort,ev,basis,zijd,stan)  #volttv
49  10  N(soort,ev,basis,zijd,stan)  @LeandraLM
50  10  N(soort,ev,basis,zijd,stan)  @sanderdenayer

MAN 6 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  81  N(soort,ev,basis,zijd,stan)  @Stad_Roeselare
2  45  N(soort,ev,basis,onz,stan)  @fanseel
3  38  N(soort,ev,basis,onz,stan)  jaar
4  35  N(eigen,ev,basis,onz,stan)  Brugge
5  31  N(eigen,ev,basis,onz,stan)  Roeselare
6  29  N(soort,ev,basis,zijd,stan)  dag
7  27  N(soort,ev,basis,zijd,stan)  @wduyck
8  26  N(soort,mv,basis)  @NietzscheQuotes
9  25  N(eigen,ev,basis,onz,stan)  #Brugge
10  24  N(soort,ev,basis,zijd,stan)  week
11  23  N(soort,mv,basis)  mensen
12  22  N(soort,ev,basis,zijd,stan)  @JBoudrez
13  22  N(soort,ev,basis,zijd,stan)  tijd
14  21  N(soort,ev,basis,zijd,stan)  Stad
15  20  N(soort,ev,basis,onz,stan)  werk
16  20  N(soort,ev,basis,zijd,stan)  @sportwereld_be
17  19  N(eigen,ev,basis,onz,stan)  #Roeselare
18  19  N(soort,ev,basis,zijd,stan)  #quote
19  19  N(soort,ev,basis,zijd,stan)  #roeselare
20  18  N(soort,mv,basis)  @Visit_Bruges
21  17  N(soort,ev,basis,zijd,stan)  #Roeselare
22  16  N(soort,ev,basis,zijd,stan)  @destandaard
23  16  N(soort,ev,basis,zijd,stan)  stad
24  14  N(soort,ev,basis,zijd,stan)  @tijd
25  14  N(soort,ev,basis,zijd,stan)  http
26  14  N(soort,ev,basis,zijd,stan)  http://t....
27  14  N(soort,mv,basis)  @jodenys
28  13  N(soort,ev,basis,onz,stan)  centrum
29  13  N(soort,ev,basis,zijd,stan)  @BruneelEline
30  13  N(soort,ev,basis,zijd,stan)  column
31  13  N(soort,ev,basis,zijd,stan)  keer
32  13  N(soort,ev,basis,zijd,stan)  man
33  13  N(soort,mv,basis)  medewerkers
34  12  N(eigen,ev,basis,zijd,stan)  Club
35  12  N(soort,ev,basis,onz,stan)  leiderschap
36  12  N(soort,ev,basis,onz,stan)  uur
37  12  N(soort,ev,basis,onz,stan)  weekend
38  12  N(soort,mv,basis)  #Bruges
39  12  N(soort,mv,basis)  @jefstaes
40  11  N(eigen,ev,basis,zijd,stan)  #Brugge
41  11  N(soort,ev,basis,onz,stan)  @RoeselareSport
42  11  N(soort,ev,basis,zijd,stan)  @sporza
43  11  N(soort,ev,basis,zijd,stan)  euro
44  11  N(soort,ev,basis,zijd,stan)  http://t.c...
45  11  N(soort,ev,basis,zijd,stan)  wereld
46  10  N(eigen,ev,basis,onz,stan)  Anderlecht
47  10  N(eigen,ev,basis,onz,stan)  Vlaanderen
48  10  N(soort,ev,basis,onz,stan)  #stubru
49  10  N(soort,ev,basis,onz,stan)  talent
50  10  N(soort,ev,basis,zijd,stan)  #ambaroe

MAN 7 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  98  N(soort,ev,basis,zijd,stan)  @wduyck
2  54  N(soort,ev,basis,onz,stan)  onderzoek
3  47  N(soort,mv,basis)  mensen
4  46  N(soort,ev,basis,zijd,stan)  @DeVisKar
5  37  N(soort,mv,basis)  @lvandingenen
6  35  N(soort,ev,basis,onz,stan)  werk
7  34  N(soort,ev,basis,onz,stan)  jaar
8  33  N(soort,ev,dim,onz,stan)  beetje
9  30  N(soort,ev,basis,zijd,stan)  @CarlvKeirsbilck
10  28  N(eigen,ev,basis,zijd,stan)  @DenysJan
11  28  N(soort,ev,basis,onz,stan)  @fanseel
12  26  N(soort,ev,basis,zijd,stan)  @thebandb
13  24  N(soort,ev,basis,onz,stan)  @thebandb
14  23  N(soort,mv,basis)  studenten
15  22  N(eigen,ev,basis,onz,stan)  Gent
16  22  N(soort,ev,basis,zijd,stan)  studie
17  21  N(soort,ev,basis,onz,stan)  probleem
18  21  N(soort,mv,basis)  @kverstegen
19  20  N(soort,ev,basis,onz,stan)  onderwijs
20  20  N(soort,ev,basis,zijd,stan)  keer
21  20  N(soort,ev,basis,zijd,stan)  twitter
22  19  N(soort,ev,basis,onz,stan)  debat
23  19  N(soort,ev,basis,onz,stan)  stuk
24  19  N(soort,ev,basis,zijd,stan)  @DevlieghereJ
25  18  N(eigen,ev,basis,zijd,stan)  @Kroy_Wendy
26  18  N(soort,ev,basis,onz,stan)  idee
27  18  N(soort,ev,basis,zijd,stan)  @Lunarreader
28  18  N(soort,ev,basis,zijd,stan)  dag
29  17  N(soort,ev,basis,zijd,stan)  @DDucheyne
30  17  N(soort,ev,basis,zijd,stan)  @MerckxKristel
31  17  N(soort,mv,basis)  @LesleyArens
32  16  N(soort,ev,basis,genus,stan)  keer
33  16  N(soort,ev,basis,zijd,stan)  psychologie
34  15  N(soort,ev,basis,onz,stan)  #ugent
35  15  N(soort,ev,basis,onz,stan)  effect
36  15  N(soort,ev,basis,zijd,stan)  @tombogman
37  15  N(soort,ev,basis,zijd,stan)  tijd
38  15  N(soort,ev,basis,zijd,stan)  vraag
39  15  N(soort,mv,basis)  vrouwen
40  15  N(soort,mv,dim)  @dodekoekjes
41  14  N(soort,ev,basis,zijd,stan)  ervaring
42  13  N(eigen,ev,basis,zijd,stan)  @HeidiWulff
43  13  N(soort,ev,basis,onz,stan)  nieuws
44  13  N(soort,ev,basis,zijd,stan)  week
45  13  N(soort,mv,basis)  @johanroels
46  12  N(soort,ev,basis,zijd,stan)  @LoosveldStijn
47  12  N(soort,ev,basis,zijd,stan)  wereld
48  12  N(soort,ev,basis,zijd,stan)  wetenschap
49  12  N(soort,mv,basis)  mannen
50  12  N(soort,mv,basis)  media

MAN 8 (ranks 1-50)
Rank  FREQ  Part-of-speech  Token
1  21  N(soort,ev,basis,zijd,stan)  training
2  17  N(soort,ev,basis,zijd,stan)  plaats
3  15  N(soort,ev,basis,onz,stan)  seizoen
4  14  N(soort,ev,basis,zijd,stan)  dag
5  14  N(soort,ev,basis,zijd,stan)  koers
6  14  N(soort,ev,basis,zijd,stan)  wedstrijd
7  10  N(soort,ev,basis,onz,stan)  weer
8  10  N(soort,ev,basis,zijd,stan)  fiets
9  9  N(soort,ev,basis,zijd,stan)  week
10  8  N(soort,ev,basis,zijd,stan)  man
11  7  N(soort,ev,basis,onz,stan)  jaar
12  7  N(soort,ev,basis,onz,stan)  weekend
13  7  N(soort,ev,basis,zijd,stan)  tijd
14  7  N(soort,ev,basis,zijd,stan)  tijdrit
15  6  N(eigen,ev,basis,onz,stan)  Sporza
16  6  N(soort,ev,basis,onz,stan)  einde
17  6  N(soort,ev,basis,zijd,stan)  @sporza
18  6  N(soort,ev,basis,zijd,stan)  info
19  6  N(soort,ev,basis,zijd,stan)  overwinning
20  6  N(soort,ev,basis,zijd,stan)  voorspelling
21  6  N(soort,mv,basis)  dagen
22  6  N(soort,mv,basis)  profs
23  6  N(soort,mv,basis)  renners
24  5  N(soort,ev,basis,zijd,stan)  aanval
25  5  N(soort,ev,basis,zijd,stan)  deugd
26  5  N(soort,ev,basis,zijd,stan)  outfit
27  5  N(soort,ev,basis,zijd,stan)  rug
28  5  N(soort,ev,basis,zijd,stan)  stage
29  5  N(soort,ev,basis,zijd,stan)  start
30  5  N(soort,ev,dim,onz,stan)  dagje
31  5  N(soort,ev,dim,onz,stan)  zonnetje
32  5  N(soort,mv,basis)  @K_Vermeerbergen
33  5  N(soort,mv,basis)  benen
34  4  N(soort,ev,basis,onz,stan)  nr
35  4  N(soort,ev,basis,onz,stan)  vliegtuig
36  4  N(soort,ev,basis,zijd,stan)  BK
37  4  N(soort,ev,basis,zijd,stan)  T
38  4  N(soort,ev,basis,zijd,stan)  conditie
39  4  N(soort,ev,basis,zijd,stan)  keer
40  4  N(soort,ev,basis,zijd,stan)  ronde
41  4  N(soort,ev,basis,zijd,stan)  val
42  4  N(soort,ev,basis,zijd,stan)  ziekte
43  4  N(soort,ev,dim,onz,stan)  wedstrijdje
44  3  N(eigen,ev,basis,onz,stan)  Belgie
45  3  N(eigen,ev,basis,onz,stan)  Brussel-Opwijk
46  3  N(eigen,ev,basis,onz,stan)  Man
47  3  N(eigen,ev,basis,zijd,stan)  @JefVnMeirhaeghe
48  3  N(eigen,ev,basis,zijd,stan)  Hoho
49  3  N(eigen,ev,basis,zijd,stan)  zondag
50  3  N(soort,ev,basis,genus,stan)  keer
MAN 5 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  10  N(soort,ev,basis,zijd,stan)  cc
52  10  N(soort,ev,basis,zijd,stan)  dressuur
53  10  N(soort,ev,basis,zijd,stan)  plaats
54  10  N(soort,ev,basis,zijd,stan)  tv
55  10  N(soort,ev,dim,onz,stan)  @ElVeetje
56  10  N(soort,ev,dim,onz,stan)  @Liesbetje
57  9  N(eigen,ev,basis,zijd,stan)  Belg
58  9  N(soort,ev,basis,genus,stan)  keer
59  9  N(soort,ev,basis,zijd,stan)  @EricDupain
60  9  N(soort,ev,basis,zijd,stan)  @de_paardenkrant
61  9  N(soort,ev,basis,zijd,stan)  mss
62  9  N(soort,ev,basis,zijd,stan)  wereld
63  9  N(soort,mv,basis)  @Sportcareers
64  8  N(soort,ev,basis,onz,stan)  idee
65  8  N(soort,ev,basis,onz,stan)  team
66  8  N(soort,ev,basis,onz,stan)  verschil
67  8  N(soort,ev,basis,onz,stan)  voetbal
68  8  N(soort,ev,basis,zijd,stan)  @KirstyDeWolf
69  8  N(soort,ev,basis,zijd,stan)  @RevorBedding
70  8  N(soort,ev,basis,zijd,stan)  @verge
71  8  N(soort,ev,basis,zijd,stan)  avond
72  8  N(soort,ev,basis,zijd,stan)  mening
73  8  N(soort,ev,basis,zijd,stan)  super
74  8  N(soort,ev,basis,zijd,stan)  woop
75  8  N(soort,ev,dim,onz,stan)  stukje
76  8  N(soort,mv,basis)  @MaartenLauwers
77  8  N(soort,mv,basis)  maanden
78  7  N(soort,ev,basis,onz,stan)  @ProjectVygo
79  7  N(soort,ev,basis,onz,stan)  @myfei_home
80  7  N(soort,ev,basis,zijd,stan)  @CarlDieryckx
81  7  N(soort,ev,basis,zijd,stan)  @e_kimedias
82  7  N(soort,ev,basis,zijd,stan)  @thenerd_be
83  7  N(soort,ev,basis,zijd,stan)  aandacht
84  7  N(soort,ev,basis,zijd,stan)  http://t....
85  7  N(soort,ev,basis,zijd,stan)  laptop
86  7  N(soort,ev,basis,zijd,stan)  mens
87  7  N(soort,ev,basis,zijd,stan)  persconferentie
88  7  N(soort,ev,basis,zijd,stan)  weg
89  7  N(soort,ev,basis,zijd,stan)  wereldbeker
90  7  N(soort,mv,basis)  foto's
91  7  N(soort,mv,basis)  media
92  7  N(soort,mv,basis)  reacties
93  7  N(soort,mv,basis)  tips
94  7  N(soort,mv,basis)  weken
95  7  N(soort,mv,basis)  wereldruiterspelen
96  6  N(eigen,ev,basis,onz,stan)  Normandië
97  6  N(eigen,ev,basis,zijd,stan)  @VincentVQ
98  6  N(eigen,ev,basis,zijd,stan)  Vygo
99  6  N(eigen,ev,basis,zijd,stan)  zondag
100  6  N(soort,ev,basis,onz,stan)  land

MAN 6 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  10  N(soort,ev,basis,zijd,stan)  @piersmorgan
52  10  N(soort,ev,dim,onz,stan)  beetje
53  9  N(eigen,ev,basis,onz,stan)  België
54  9  N(eigen,ev,basis,zijd,stan)  zaterdag
55  9  N(soort,ev,basis,genus,stan)  keer
56  9  N(soort,ev,basis,onz,stan)  @thebandb
57  9  N(soort,ev,basis,onz,stan)  Nieuwsblad
58  9  N(soort,ev,basis,onz,stan)  einde
59  9  N(soort,ev,basis,onz,stan)  idee
60  9  N(soort,ev,basis,onz,stan)  succes
61  9  N(soort,ev,basis,zijd,stan)  @Mr_P
62  9  N(soort,ev,basis,zijd,stan)  @TheTweetOfGod
63  9  N(soort,ev,basis,zijd,stan)  dank
64  9  N(soort,ev,basis,zijd,stan)  organisatie
65  9  N(soort,ev,basis,zijd,stan)  ploeg
66  9  N(soort,ev,basis,zijd,stan)  zomer
67  9  N(soort,mv,basis)  media
68  8  N(soort,ev,basis,onz,stan)  leven
69  8  N(soort,ev,basis,onz,stan)  probleem
70  8  N(soort,ev,basis,zijd,stan)  #7dag
71  8  N(soort,ev,basis,zijd,stan)  #mufc
72  8  N(soort,ev,basis,zijd,stan)  @GaryLineker
73  8  N(soort,ev,basis,zijd,stan)  @Nieuwsblad_be
74  8  N(soort,ev,basis,zijd,stan)  @StevenDeseure
75  8  N(soort,ev,basis,zijd,stan)  @annebethboudry
76  8  N(soort,ev,basis,zijd,stan)  @dcbrugge
77  8  N(soort,ev,basis,zijd,stan)  @nieuwsblad_be
78  8  N(soort,ev,basis,zijd,stan)  @vrtderedactie
79  8  N(soort,mv,basis)  @MaximeSoenen
80  8  N(soort,mv,basis)  @hinssen
81  8  N(soort,mv,basis)  kinderen
82  8  N(soort,mv,basis)  werknemers
83  7  N(eigen,ev,basis,zijd,stan)  God
84  7  N(eigen,ev,basis,zijd,stan)  VIDEO
85  7  N(soort,ev,basis,onz,stan)  @Fact
86  7  N(soort,ev,basis,zijd,stan)  @Luidprotest
87  7  N(soort,ev,basis,zijd,stan)  blog
88  7  N(soort,ev,basis,zijd,stan)  info
89  7  N(soort,ev,basis,zijd,stan)  manier
90  7  N(soort,ev,basis,zijd,stan)  naam
91  7  N(soort,ev,basis,zijd,stan)  plaats
92  7  N(soort,ev,basis,zijd,stan)  start
93  7  N(soort,mv,basis)  besturen
94  7  N(soort,mv,basis)  examens
95  7  N(soort,mv,basis)  gemeenten
96  7  N(soort,mv,basis)  werken
97  6  N(eigen,ev,basis,zijd,stan)  @DenysJan
98  6  N(eigen,ev,basis,zijd,stan)  juni
99  6  N(soort,ev,basis,onz,stan)  #nieuwsblad
100  6  N(soort,ev,basis,onz,stan)  @jobat

MAN 7 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  11  N(eigen,ev,basis,onz,stan)  België
52  11  N(soort,ev,basis,onz,stan)  aantal
53  11  N(soort,ev,basis,onz,stan)  akkoord
54  11  N(soort,ev,basis,onz,stan)  paar
55  11  N(soort,ev,basis,onz,stan)  verschil
56  11  N(soort,ev,basis,zijd,stan)  @GertPeersman
57  11  N(soort,ev,basis,zijd,stan)  naam
58  11  N(soort,ev,dim,onz,stan)  @Filo_Sofietje
59  11  N(soort,mv,basis)  @FrankVanLaeken
60  11  N(soort,mv,basis)  @demorgen
61  11  N(soort,mv,basis)  kandidaten
62  11  N(soort,mv,basis)  studies
63  10  N(eigen,ev,basis,zijd,stan)  @VeroniekC
64  10  N(soort,ev,basis,onz,stan)  @ResearchUGent
65  10  N(soort,ev,basis,onz,stan)  boek
66  10  N(soort,ev,basis,zijd,stan)  @ChristlJoris
67  10  N(soort,ev,basis,zijd,stan)  @wdebaene
68  10  N(soort,ev,basis,zijd,stan)  basis
69  10  N(soort,ev,basis,zijd,stan)  discussie
70  10  N(soort,mv,basis)  collega's
71  9  N(soort,ev,basis,onz,stan)  Akkoord
72  9  N(soort,ev,basis,onz,stan)  gevoel
73  9  N(soort,ev,basis,onz,stan)  interview
74  9  N(soort,ev,basis,onz,stan)  woord
75  9  N(soort,ev,basis,zijd,stan)  #7dag
76  9  N(soort,ev,basis,zijd,stan)  @JIMMYVDP
77  9  N(soort,ev,basis,zijd,stan)  euro
78  9  N(soort,ev,basis,zijd,stan)  feedback
79  9  N(soort,ev,basis,zijd,stan)  man
80  9  N(soort,ev,basis,zijd,stan)  rector
81  9  N(soort,mv,basis)  Mensen
82  9  N(soort,mv,basis)  dagen
83  9  N(soort,mv,basis)  data
84  9  N(soort,mv,basis)  problemen
85  8  N(eigen,ev,basis,zijd,stan)  God
86  8  N(soort,ev,basis,onz,stan)  #eurovision
87  8  N(soort,ev,basis,onz,stan)  @AnnBrusseel
88  8  N(soort,ev,basis,onz,stan)  geld
89  8  N(soort,ev,basis,onz,stan)  huis
90  8  N(soort,ev,basis,onz,stan)  leven
91  8  N(soort,ev,basis,onz,stan)  management
92  8  N(soort,ev,basis,onz,stan)  publiek
93  8  N(soort,ev,basis,onz,stan)  succes
94  8  N(soort,ev,basis,zijd,stan)  @deviskar
95  8  N(soort,ev,basis,zijd,stan)  @wille_bart
96  8  N(soort,ev,basis,zijd,stan)  analyse
97  8  N(soort,ev,basis,zijd,stan)  job
98  8  N(soort,ev,basis,zijd,stan)  kwaliteit
99  8  N(soort,ev,basis,zijd,stan)  toekomst
100  8  N(soort,ev,dim,onz,stan)  @ksavje

MAN 8 (ranks 51-100)
Rank  FREQ  Part-of-speech  Token
51  3  N(soort,ev,basis,onz,stan)  bed
52  3  N(soort,ev,basis,onz,stan)  calpe
53  3  N(soort,ev,basis,onz,stan)  deel
54  3  N(soort,ev,basis,onz,stan)  gevoel
55  3  N(soort,ev,basis,onz,stan)  nieuwsblad
56  3  N(soort,ev,basis,zijd,stan)  @ManuWemel
57  3  N(soort,ev,basis,zijd,stan)  duurtraining
58  3  N(soort,ev,basis,zijd,stan)  eetfestijn
59  3  N(soort,ev,basis,zijd,stan)  finale
60  3  N(soort,ev,basis,zijd,stan)  km
61  3  N(soort,ev,basis,zijd,stan)  koop
62  3  N(soort,ev,basis,zijd,stan)  kracht
63  3  N(soort,ev,basis,zijd,stan)  massasprint
64  3  N(soort,ev,basis,zijd,stan)  meet
65  3  N(soort,ev,basis,zijd,stan)  mn
66  3  N(soort,ev,basis,zijd,stan)  ploegmaat
67  3  N(soort,ev,basis,zijd,stan)  proloog
68  3  N(soort,ev,basis,zijd,stan)  rit
69  3  N(soort,ev,basis,zijd,stan)  rustdag
70  3  N(soort,ev,basis,zijd,stan)  vlucht
71  3  N(soort,ev,basis,zijd,stan)  wind
72  3  N(soort,ev,basis,zijd,stan)  winnaar
73  3  N(soort,ev,dim,onz,stan)  weertje
74  3  N(soort,mv,basis)  Goeiemorgen
75  3  N(soort,mv,basis)  tijdrijden
76  3  N(soort,mv,basis)  weken
77  2  N(eigen,ev,basis,onz,stan)  Frankrijk
78  2  N(eigen,ev,basis,onz,stan)  Kuurne
79  2  N(eigen,ev,basis,onz,stan)  Mojacar
80  2  N(eigen,ev,basis,onz,stan)  Puivelde
81  2  N(eigen,ev,basis,onz,stan)  spanje
82  2  N(eigen,ev,basis,zijd,stan)  @Jensvdb
83  2  N(eigen,ev,basis,zijd,stan)  Apfff
84  2  N(eigen,ev,basis,zijd,stan)  Dinsdag
85  2  N(eigen,ev,basis,zijd,stan)  Legley
86  2  N(eigen,ev,basis,zijd,stan)  Vrijdag
87  2  N(eigen,ev,basis,zijd,stan)  maart
88  2  N(soort,ev,basis,onz,stan)  @ErwinBorgonjon
89  2  N(soort,ev,basis,onz,stan)  hotel
90  2  N(soort,ev,basis,onz,stan)  id
91  2  N(soort,ev,basis,onz,stan)  klassement
92  2  N(soort,ev,basis,onz,stan)  nieuwjaar
93  2  N(soort,ev,basis,onz,stan)  parcours
94  2  N(soort,ev,basis,onz,stan)  stuk
95  2  N(soort,ev,basis,onz,stan)  topseizoen
96  2  N(soort,ev,basis,onz,stan)  werk
97  2  N(soort,ev,basis,onz,stan)  west
98  2  N(soort,ev,basis,zijd,stan)  #GREIPEL_Andre
99  2  N(soort,ev,basis,zijd,stan)  #KBK
100  2  N(soort,ev,basis,zijd,stan)  Belgie
FREQ 1 87 2 45 3 41 4 27 5 20 6 19 7 16 8 16 9 16 10 16 11 16 12 15 13 14 14 14 15 13 16 13 17 12 18 11 19 11 20 11 21 11 22 11 23 10 24 10 25 10 26 10 27 10 28 10 29 10 30 10 31 10 32 10 33 9 34 9 35 9 36 9 37 9 38 9 39 9 40 9 41 9 42 9 43 9 44 8 45 8 46 8 47 8 48 8 49 8 50 8
MAN 9 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,genus,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan)
MAN 10 Token FREQ Part-of-speech @JGobelijn 159 N(eigen,ev,basis,onz,stan) jaar 45 N(soort,ev,basis,zijd,stan) #eurosong 42 N(soort,mv,basis) #dtv 39 N(soort,ev,basis,onz,stan) uur 38 N(soort,mv,basis) dag 29 N(soort,ev,basis,genus,stan) Photo 29 N(soort,ev,basis,zijd,stan) Rusland 28 N(soort,ev,basis,zijd,stan) #krobel 26 N(eigen,ev,basis,zijd,stan) #lt 23 N(soort,ev,basis,zijd,stan) BREAKING 15 N(eigen,ev,basis,onz,stan) @Twezus 13 N(soort,ev,basis,zijd,stan) Dag 13 N(soort,ev,basis,zijd,stan) bus 12 N(eigen,ev,basis,onz,stan) @paddypower 12 N(soort,ev,basis,zijd,stan) #janbecaus 11 N(eigen,ev,basis,onz,stan) #balduzoute 11 N(soort,ev,basis,zijd,stan) België 11 N(soort,ev,basis,zijd,stan) @EgonTemm 10 N(soort,ev,basis,onz,stan) Foto 10 N(soort,ev,basis,onz,stan) mensen 10 N(soort,ev,basis,zijd,stan) minuten 10 N(soort,ev,basis,zijd,stan) #sociaalVB 10 N(soort,ev,basis,zijd,stan) Artists 10 N(soort,ev,dim,onz,stan) #UEFASuperCup 10 N(soort,mv,basis) #lastfm 9 N(eigen,ev,basis,zijd,stan) #yolosibirsk 9 N(soort,ev,basis,zijd,stan) euro 9 N(soort,ev,basis,zijd,stan) man 9 N(soort,ev,basis,zijd,stan) tijd 9 N(soort,ev,basis,zijd,stan) verjaardag 9 N(soort,ev,basis,zijd,stan) vraag 9 N(soort,ev,basis,zijd,stan) Anderlecht 9 N(soort,ev,basis,zijd,stan) #FCBCFC 8 N(eigen,ev,basis,onz,stan) keer 8 N(soort,ev,basis,onz,stan) leven 8 N(soort,ev,basis,zijd,stan) moment 8 N(soort,mv,basis) keer 7 N(eigen,ev,basis,zijd,stan) week 7 N(soort,ev,basis,zijd,stan) @FrankVanLaeken 7 N(soort,ev,basis,zijd,stan) @Pegerniemels 7 N(soort,ev,basis,zijd,stan) @mbaeten 7 N(soort,ev,basis,zijd,stan) Russen 7 N(soort,ev,basis,zijd,stan) Russisch 7 N(soort,mv,basis) Siberië 7 N(soort,mv,basis) FIFA 6 N(eigen,ev,basis,onz,stan) N-VA 6 N(eigen,ev,basis,zijd,stan) #doctorwho 6 N(soort,ev,basis,onz,stan) #herbertarmada 6 N(soort,ev,basis,onz,stan) #homofielevoetballer 6 N(soort,ev,basis,onz,stan)
Token Gent @ThomasLievrouw @kennydecruyen stadhuis others keer Ruimte campus Schoonmeersen #blok Wervik @chelseafc dag Anderlecht @JasperStragier Ghent @Sam_Dep nie kot succes @VoetCh @sporza tis beetje #durftevragen @VoetCh #dtv @arne_six @rscanderlecht orde trein tv zin Kortrijk @friescarton @TomVandewinckel Menen SL @voetch avond jup man richting @foofighters blokken België @Robainn #wijstakenniet @stubru Overpoortstraat
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 8 8 8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5
MAN 9 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,gen) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan)
Token @ChiaraVSt plaats @demorgen Duitsland Bayern VRT #makro #vergetenduivel kind miljoen #belusa #kvk @koengodderis Twitter oorlog wereld #invlaamsevelden kinderen media Antwerpen Gent Novosibirsk Sint-Niklaas Vlaanderen @MrDarkley @OldSchoolPanini Twitter #vrtjournaal @ruben_vanlent geld nummer #belalg & @HVolkaerts @JSPR_ @delijn helft klap koning match #miasanmia @eenonmens Duivels dingen graden Da's Moskou @JGobelijn Novosibirsk Obama
MAN 10 FREQ Part-of-speech 6 N(soort,ev,basis,onz,stan) 6 N(soort,ev,basis,onz,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,ev,basis,zijd,stan) 6 N(soort,mv,basis) 6 N(soort,mv,basis) 5 N(eigen,ev,basis,onz,stan) 5 N(eigen,ev,basis,zijd,stan) 5 N(eigen,ev,basis,zijd,stan) 5 N(soort,ev,basis,onz,stan) 5 N(soort,ev,basis,onz,stan) 5 N(soort,ev,basis,onz,stan) 5 N(soort,ev,basis,onz,stan) 5 N(soort,ev,basis,onz,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,ev,basis,zijd,stan) 5 N(soort,mv,basis) 5 N(soort,mv,basis) 5 N(soort,mv,basis) 5 N(soort,mv,basis) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(eigen,ev,basis,zijd,stan) 4 N(soort,ev,basis,onz,stan) 4 N(soort,ev,basis,onz,stan) 4 N(soort,ev,basis,onz,stan)
Token jaar werk @JacobDegrande @destandaard @htc Bachelorproef da keer staking #komeneten @Moljers Barcelona Kortrijksepoortstraat Lukaku #tvgidsandroid examen seizoen water weekend #COYM #belkaz @FillWerrelFan @GaryLineker @thomaslievrouw @xanderycke bachelorproef back blokdag gie goal les tram weg zetel @StefWijnants @demorgen mensen merci @JacobDegrande @LisaCoulleit_ @TomVdwinckele Ma Mmm Schoonmeersstraat Tis Voskenslaan zaterdag #fingerscrossed @rscanderlecht Gebouw
7.3.4 Top 100 of the most frequently occurring nouns of the women
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
FREQ 64 59 59 54 52 46 43 36 35 32 32 32 31 31 25 25 25 24 23 21 21 21 20 19 19 18 18 17 17 17 17 17 17 16 16 16 15 15 15 15 14 14 14 14 14 13 13 13 13 13
WOMAN 1 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,onz,stan) N(soort,ev,basis,genus,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,gen) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan)
WOMAN 2 Token FREQ Part-of-speech bus 57 N(soort,ev,basis,onz,stan) @_kleingeelvisje 57 N(soort,ev,basis,onz,stan) @catocoremans 42 N(soort,mv,basis) Dag 32 N(soort,ev,basis,zijd,stan) jaar 28 N(soort,ev,basis,zijd,stan) Mechelen 26 N(soort,mv,basis) #5dagenzonderzagen 25 N(eigen,ev,basis,onz,stan) trein 24 N(eigen,ev,basis,onz,stan) dag 24 N(soort,ev,basis,onz,stan) werk 23 N(soort,mv,basis) vrouw 21 N(soort,ev,basis,zijd,stan) beetje 20 N(soort,ev,basis,onz,stan) #degrotesprong 18 N(soort,ev,basis,zijd,stan) da's 18 N(soort,ev,dim,onz,stan) #missbelgie 18 N(soort,mv,basis) tijd 17 N(soort,ev,basis,zijd,stan) jongens 17 N(soort,ev,basis,zijd,stan) uur 17 N(soort,ev,basis,zijd,stan) keer 16 N(eigen,ev,basis,onz,stan) #dtv 16 N(eigen,ev,basis,onz,stan) @delijn 16 N(soort,ev,basis,onz,stan) man 16 N(soort,ev,basis,onz,stan) mensen 16 N(soort,ev,basis,zijd,stan) @JolienRoets 16 N(soort,ev,basis,zijd,stan) keer 15 N(soort,ev,basis,zijd,stan) @LeilaVDM 15 N(soort,ev,basis,zijd,stan) week 15 N(soort,ev,basis,zijd,stan) #tvvv 15 N(soort,ev,dim,onz,stan) @catocoremans 15 N(soort,mv,basis) #fouterockers 15 N(soort,mv,basis) @IkBenLauren 14 N(soort,ev,basis,zijd,stan) @cpossemiers 14 N(soort,mv,basis) Da's 13 N(soort,ev,basis,onz,stan) #dbssvv 13 N(soort,ev,basis,zijd,stan) tandarts 13 N(soort,ev,basis,zijd,stan) minuten 13 N(soort,ev,basis,zijd,stan) Da's 12 N(soort,ev,basis,onz,stan) #sytycd 12 N(soort,ev,basis,onz,stan) #wvsw 12 N(soort,ev,basis,onz,stan) mama 12 N(soort,ev,basis,onz,stan) nummer 12 N(soort,ev,basis,zijd,stan) @_katrijn 12 N(soort,ev,basis,zijd,stan) wereld 12 N(soort,ev,basis,zijd,stan) dagen 12 N(soort,ev,basis,zijd,stan) vrouwen 12 N(soort,ev,basis,zijd,stan) @Qmusic_BE 12 N(soort,ev,basis,zijd,stan) Merci 12 N(soort,mv,basis) huis 11 N(soort,ev,basis,onz,stan) leven 11 N(soort,ev,basis,onz,stan) #dsmtw 11 N(soort,ev,basis,zijd,stan)
WOMAN 3 Token FREQ Part-of-speech @Independent 127 N(soort,ev,basis,onz,stan) jaar 101 N(soort,ev,basis,zijd,stan) mensen 85 N(soort,ev,basis,zijd,stan) @chenlingzhang 75 N(soort,ev,basis,zijd,stan) @laurenscollee 56 N(eigen,ev,basis,onz,stan) @LaDoubleMers 48 N(soort,mv,basis) Gent 47 N(soort,ev,basis,zijd,stan) Kortrijk 46 N(soort,ev,basis,zijd,stan) werk 46 N(soort,ev,basis,zijd,stan) vrouwen 45 N(eigen,ev,basis,onz,stan) week 45 N(soort,ev,basis,zijd,stan) @jeltenieuwhuis 43 N(eigen,mv,basis) @nikmahie 43 N(soort,mv,basis) @_kleingeelvisje 42 N(eigen,ev,basis,zijd,stan) @mbaeten 39 N(soort,ev,basis,onz,stan) auto 38 N(soort,ev,basis,onz,stan) dag 38 N(soort,ev,basis,zijd,stan) wereld 38 N(soort,mv,basis) Antwerpen 36 N(soort,ev,basis,zijd,stan) België 34 N(soort,ev,basis,zijd,stan) geld 34 N(soort,ev,basis,zijd,stan) leven 33 N(eigen,ev,basis,onz,stan) man 33 N(soort,mv,basis) tijd 32 N(soort,ev,dim,onz,stan) @SweetCharlie 31 N(soort,mv,basis) @destandaard 30 N(soort,ev,basis,onz,stan) energie 30 N(soort,ev,basis,onz,stan) beetje 30 N(soort,ev,basis,zijd,stan) @RenevDensen 28 N(soort,mv,basis) kinderen 26 N(soort,ev,basis,onz,stan) file 25 N(soort,ev,basis,genus,stan) @eenonmens 25 N(soort,mv,basis) nummer 25 N(soort,mv,basis) elektriciteit 24 N(soort,ev,basis,onz,stan) foto 24 N(soort,ev,basis,zijd,stan) muziek 24 N(soort,ev,basis,zijd,stan) #lichtuit 23 N(eigen,ev,basis,onz,stan) boek 23 N(soort,ev,basis,onz,stan) nieuws 23 N(soort,ev,basis,onz,stan) uur 23 N(soort,ev,basis,zijd,stan) @AlArabiya_Eng 22 N(eigen,ev,basis,zijd,stan) @ElleEase 22 N(soort,ev,basis,onz,stan) @HuffingtonPost 22 N(soort,ev,basis,zijd,stan) @OrakelvMerksem 21 N(soort,ev,basis,onz,stan) @guardian 21 N(soort,ev,basis,zijd,stan) vrouw 21 N(soort,ev,basis,zijd,stan) @demorgen 20 N(soort,ev,basis,onz,stan) huis 20 N(soort,ev,basis,zijd,stan) moment 20 N(soort,ev,basis,zijd,stan) @volkskrant 20 N(soort,mv,basis)
WOMAN 4 Token jaar #hockeybe @BiekeCornillie @extrasportbe België @matthiasmaes dag @MrgrtMaxine sport Gent plaats Spelen #RedPanthers @MatthDV seizoen #TeamBelgium tijd @JulieBaeten @TimvanDijk1991 #cycling top EK @NestenSchepens beetje #RedLions Jaar succes @Pieter_DeDecker @BELRedPanthers brons keer dames mensen Succes #tennis wedstrijd Nederland stuk weekend week Haha voetbal start goud @VNieuwenhuyse keer nieuws @angeliquedupre richting mannen
FREQ 250 151 113 100 86 76 58 44 43 41 40 37 35 35 35 34 29 29 28 28 27 26 25 25 25 24 24 23 23 23 23 23 23 22 21 21 21 21 20 20 20 19 18 18 16 16 15 15 15 15
Part-of-speech N(soort,mv,basis) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis)
Token @crevits @MP_Peeters & jaar @cdenv Vlaanderen minister onderwijs @katrienrosseel succes werk Minister @minouesquenet @wegenenverkeer miljoen dag @crevits euro #stemcdenv fietspaden Antwerpen mensen Gent Dank mobiliteit @delijn samenwerking Brugge bezoek opening tijd bedrijven werken regering Brussel toekomst weg @JoVandeurzen project verkeer #verkeersveiligheid verkeersveiligheid nr zon Onderwijs dank spitsstrook top kinderen vrouwen
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 13 13 13 13 12 12 12 12 12 12 12 12 12 12 12 11 11 11 11 11 11 10 10 10 10 10 10 10 10 10 10 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
WOMAN 1 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,mv,basis) N(soort,mv,basis)
WOMAN 2 Token FREQ Part-of-speech foto 11 N(soort,ev,basis,zijd,stan) jongen 11 N(soort,ev,basis,zijd,stan) naam 11 N(soort,mv,basis) dingen 11 N(soort,mv,basis) Vilvoorde 11 N(soort,mv,basis) Sorry 10 N(soort,ev,basis,onz,stan) @vincentvangeel 10 N(soort,ev,basis,onz,stan) hoofd 10 N(soort,ev,basis,onz,stan) plezier 10 N(soort,ev,basis,zijd,stan) #Cavalcade 10 N(soort,ev,basis,zijd,stan) baby 10 N(soort,ev,basis,zijd,stan) gsm 10 N(soort,ev,basis,zijd,stan) #becaushits 10 N(soort,ev,basis,zijd,stan) maanden 10 N(soort,ev,basis,zijd,stan) ouders 10 N(soort,ev,basis,zijd,stan) Stromae 10 N(soort,ev,basis,zijd,stan) @gertjanvanegdom 10 N(soort,ev,basis,zijd,stan) kind 10 N(soort,ev,basis,zijd,stan) stuk 10 N(soort,mv,basis) #homofielevoetballer 10 N(soort,mv,basis) aflevering 10 N(soort,mv,basis) #Eurovision 10 N(soort,mv,basis) Allez 9 N(eigen,ev,basis,onz,stan) God 9 N(eigen,ev,basis,onz,stan) Twitter 9 N(soort,ev,basis,zijd,stan) #Eurovision 9 N(soort,ev,basis,zijd,stan) koffie 9 N(soort,ev,basis,zijd,stan) rest 9 N(soort,ev,basis,zijd,stan) stem 9 N(soort,ev,basis,zijd,stan) @NeleBollen 9 N(soort,ev,basis,zijd,stan) @petra_stevens 9 N(soort,mv,basis) Disneyland 9 N(soort,mv,basis) #FouteUur 9 N(soort,mv,basis) #perrongeluk 9 N(soort,mv,basis) bureau 8 N(eigen,ev,basis,onz,stan) station 8 N(eigen,ev,basis,onz,stan) water 8 N(eigen,ev,basis,zijd,stan) @bakkerkrof 8 N(soort,ev,basis,onz,stan) hond 8 N(soort,ev,basis,onz,stan) liefde 8 N(soort,ev,basis,onz,stan) schoonzus 8 N(soort,ev,basis,onz,stan) snor 8 N(soort,ev,basis,zijd,stan) taart 8 N(soort,ev,basis,zijd,stan) tante 8 N(soort,ev,basis,zijd,stan) tuin 8 N(soort,ev,basis,zijd,stan) zetel 8 N(soort,ev,basis,zijd,stan) zin 8 N(soort,ev,basis,zijd,stan) meisje 8 N(soort,ev,basis,zijd,stan) @VeerleGoossens 8 N(soort,ev,basis,zijd,stan) kinderen 8 N(soort,ev,basis,zijd,stan)
WOMAN 3 Token FREQ Part-of-speech http:/... 20 N(soort,mv,basis) website 19 N(soort,mv,basis) @FrankVanLaeken 18 N(eigen,ev,basis,zijd,stan) @lvandingenen 18 N(eigen,mv,basis) media 18 N(soort,ev,basis,zijd,stan) bed 18 N(soort,ev,basis,zijd,stan) hoofd 17 N(soort,ev,basis,onz,stan) idee 17 N(soort,ev,basis,zijd,stan) @BBQezel 17 N(soort,ev,basis,zijd,stan) @Sir_Neighbour 17 N(soort,ev,basis,zijd,stan) @ZaakJustitie 17 N(soort,ev,basis,zijd,stan) @r0eland 17 N(soort,ev,basis,zijd,stan) ht... 17 N(soort,mv,basis) http://t.c... 17 N(soort,mv,basis) kop 17 N(soort,mv,basis) krant 17 N(soort,mv,basis) maand 16 N(eigen,ev,basis,zijd,stan) tv 16 N(soort,ev,basis,zijd,stan) @HuubBellemakers 16 N(soort,ev,basis,zijd,stan) @cpossemiers 16 N(soort,ev,basis,zijd,stan) @gewoonxadi 16 N(soort,ev,basis,zijd,stan) dingen 16 N(soort,ev,dim,onz,stan) Brussel 16 N(soort,ev,dim,onz,stan) Nederland 15 N(eigen,ev,basis,zijd,stan) #dtv 15 N(soort,ev,basis,onz,stan) @HubertDeMeulder 15 N(soort,ev,basis,zijd,stan) http... 14 N(eigen,ev,basis,onz,stan) http://... 14 N(eigen,ev,basis,onz,stan) http://t.co... 14 N(eigen,ev,basis,zijd,stan) smartphone 14 N(soort,ev,basis,onz,stan) Mensen 14 N(soort,ev,basis,onz,stan) dagen 14 N(soort,ev,basis,onz,stan) mannen 14 N(soort,ev,basis,zijd,stan) tips 14 N(soort,ev,basis,zijd,stan) China 14 N(soort,ev,basis,zijd,stan) Parijs 14 N(soort,ev,basis,zijd,stan) Facebook 14 N(soort,ev,basis,zijd,stan) Succes 14 N(soort,mv,basis) miljoen 13 N(eigen,ev,basis,onz,stan) onderzoek 13 N(eigen,ev,basis,onz,stan) succes 13 N(eigen,ev,basis,onz,stan) @PaulienVrf 13 N(soort,ev,basis,onz,stan) @hannes_bhc 13 N(soort,ev,basis,onz,stan) @sanfyezerskiy 13 N(soort,ev,basis,zijd,stan) http://t... 13 N(soort,ev,basis,zijd,stan) http://t.... 13 N(soort,ev,basis,zijd,stan) http://t.co/... 13 N(soort,ev,basis,zijd,stan) prijs 13 N(soort,mv,basis) tip 13 N(soort,mv,basis) trein 12 N(eigen,ev,basis,zijd,stan)
WOMAN 4 Token FREQ Part-of-speech sporten 14 N(eigen,ev,basis,onz,stan) @Sportcareers 14 N(soort,ev,basis,onz,stan) @Extrasportbe 14 N(soort,ev,basis,onz,stan) Belgen 14 N(soort,ev,basis,onz,stan) finale 14 N(soort,ev,basis,zijd,stan) medaille 14 N(soort,ev,basis,zijd,stan) hockey 14 N(soort,mv,basis) @k1sas 14 N(soort,mv,basis) hockey 13 N(eigen,ev,basis,onz,stan) les 13 N(soort,ev,basis,onz,stan) man 13 N(soort,ev,basis,onz,stan) weg 13 N(soort,ev,basis,zijd,stan) da's 13 N(soort,ev,basis,zijd,stan) dagen 13 N(soort,ev,basis,zijd,stan) medailles 13 N(soort,ev,basis,zijd,stan) vrouwen 13 N(soort,ev,basis,zijd,stan) OS 13 N(soort,mv,basis) @Saraa_vd 12 N(eigen,ev,basis,onz,stan) cc 12 N(soort,ev,basis,zijd,stan) krant 12 N(soort,ev,basis,zijd,stan) naam 12 N(soort,ev,basis,zijd,stan) @Lorrietje 12 N(soort,ev,basis,zijd,stan) @ksavje 12 N(soort,mv,basis) @Eva_DP 12 N(soort,mv,basis) Proficiat 12 N(soort,mv,basis) #atletiek 12 N(soort,mv,basis) Oost-Vlaanderen 11 N(eigen,ev,basis,onz,stan) Rio 11 N(eigen,ev,basis,zijd,stan) Belg 11 N(soort,ev,basis,onz,stan) team 11 N(soort,ev,basis,onz,stan) werk 11 N(soort,ev,basis,onz,stan) zilver 11 N(soort,ev,basis,zijd,stan) #trackcycling 11 N(soort,ev,basis,zijd,stan) @HanneloreDesmet 11 N(soort,ev,basis,zijd,stan) ploeg 11 N(soort,ev,basis,zijd,stan) ronde 11 N(soort,ev,basis,zijd,stan) tweet 11 N(soort,ev,basis,zijd,stan) @KaatHannes 11 N(soort,ev,basis,zijd,stan) @Sporza 11 N(soort,mv,basis) Haha 11 N(soort,mv,basis) Londen 10 N(soort,ev,basis,onz,stan) interview 10 N(soort,ev,basis,onz,stan) omnium 10 N(soort,ev,basis,onz,stan) @JolienDhoore 10 N(soort,ev,basis,onz,stan) @vremdetweet 10 N(soort,ev,basis,onz,stan) link 10 N(soort,ev,basis,onz,stan) titel 10 N(soort,ev,basis,onz,stan) jongens 10 N(soort,ev,basis,zijd,stan) media 10 N(soort,ev,basis,zijd,stan) @Cedrinho 10 N(soort,ev,basis,zijd,stan)
Token West-Vlaanderen Succes gesprek overleg foto http://t.co... @mariannethyssen studenten Limburg debat proficiat @patrickdhaese actie haven km start dagen Oostende @filipwatteeuw aanleg campagne wereld @JOACHIMCOENS @italbers kansen wegen Torhout CD& Aantal deel initiatief #mobiliteit brug burgemeester bus infrastructuur realisatie stap leerlingen scholen gezelschap leven licht talent team vervoer volk hand http http://t.c...
FREQ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
50 49 41 34 28 23 22 22 21 20 20 19 16 16 15 15 15 15 14 14 14 13 13 13 12 12 12 12 12 12 12 12 12 11 11 10 10 10 10 9 9 9 9 9 9 8 8 8 8 8
WOMAN 5 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,genus,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,mv,basis) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,dim,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan)
Token @KaatBuysse jaar dag mensen week tijd uur others Brugge Gent #tdf trein finale tickets @stubru huis match dagen examen @LisaDemulder keer seizoen voetbal avond keer kot leven moment #bkwaregem @vrtderedactie auto les tv @Gaylevh plaats Belgen gevoel hoofd beetje België @Clijsterskim zondag bed weekend set Hazard dak kind nieuws podium
WOMAN 6 FREQ Part-of-speech 246 N(eigen,ev,basis,onz,stan) 105 N(soort,ev,basis,onz,stan) 65 N(soort,ev,basis,onz,stan) 64 N(soort,mv,basis) 62 N(eigen,ev,basis,onz,stan) 62 N(soort,ev,basis,zijd,stan) 61 N(soort,ev,basis,zijd,stan) 57 N(soort,ev,basis,onz,stan) 57 N(soort,ev,basis,zijd,stan) 56 N(soort,ev,basis,zijd,stan) 54 N(eigen,ev,basis,onz,stan) 52 N(soort,ev,basis,zijd,stan) 49 N(soort,mv,basis) 45 N(soort,ev,basis,onz,stan) 44 N(soort,ev,basis,zijd,stan) 40 N(soort,ev,basis,zijd,stan) 39 N(eigen,ev,basis,onz,stan) 38 N(soort,ev,basis,zijd,stan) 37 N(soort,mv,basis) 36 N(eigen,ev,basis,zijd,stan) 33 N(soort,ev,basis,zijd,stan) 33 N(soort,mv,basis) 32 N(soort,ev,basis,zijd,stan) 31 N(soort,ev,basis,zijd,stan) 30 N(eigen,ev,basis,zijd,stan) 29 N(soort,ev,basis,onz,stan) 29 N(soort,ev,basis,zijd,stan) 29 N(soort,ev,basis,zijd,stan) 29 N(soort,mv,basis) 28 N(eigen,ev,basis,zijd,stan) 28 N(soort,ev,basis,zijd,stan) 26 N(soort,ev,basis,onz,stan) 26 N(soort,ev,basis,zijd,stan) 25 N(soort,ev,basis,onz,stan) 25 N(soort,ev,basis,zijd,stan) 24 N(eigen,ev,basis,zijd,stan) 23 N(soort,ev,basis,zijd,stan) 23 N(soort,ev,basis,zijd,stan) 23 N(soort,mv,basis) 22 N(soort,ev,basis,onz,stan) 22 N(soort,ev,basis,onz,stan) 22 N(soort,ev,basis,zijd,stan) 22 N(soort,ev,basis,zijd,stan) 21 N(eigen,ev,basis,zijd,stan) 21 N(eigen,ev,basis,zijd,stan) 21 N(soort,ev,basis,onz,stan) 21 N(soort,ev,basis,zijd,stan) 21 N(soort,ev,basis,zijd,stan) 20 N(eigen,ev,basis,zijd,stan) 20 N(soort,ev,basis,onz,stan)
Token Brugge jaar procent @JasperPillen Vlaanderen stad euro Voorstel @MercedesVVolcem regering #Brugge toekomst mensen beleid @vrtderedactie foto @MercedesVVolcem @destandaard woningen Facebook mobiliteit @demorgen woning dag Woonbonus debat #brugge weg @vermeulenniels N-VA @youtube Parlement @tijd @openvld #7dag @RuttenGwendolyn @alexanderdecroo overheid wachtlijsten miljard miljoen burgemeester economie Mercedes NVA werk energie plaats Hoefijzerlaan geld
FREQ 47 39 33 28 27 27 26 24 22 19 18 18 18 17 17 16 15 15 15 14 14 13 13 13 12 12 12 12 12 11 11 11 11 11 11 11 10 10 10 10 10 10 10 10 10 9 9 9 9 9
WOMAN 7 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,genus,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan)
WOMAN 8 Token FREQ Part-of-speech @owlwithenvelope 84 N(soort,mv,basis) @estetollen 81 N(soort,ev,basis,onz,stan) dag 79 N(soort,ev,basis,zijd,stan) jaar 52 N(soort,ev,basis,onz,stan) @Alineagain 43 N(soort,mv,basis) @hydrogens 42 N(soort,ev,basis,zijd,stan) @Simon_Amez 42 N(soort,mv,basis) week 41 N(soort,ev,basis,zijd,stan) @EmilyMichiels 40 N(soort,ev,basis,zijd,stan) @TomRouvrois 40 N(soort,mv,basis) @CharlotteLepas 36 N(soort,ev,basis,zijd,stan) film 35 N(soort,ev,basis,genus,stan) mensen 35 N(soort,ev,basis,onz,stan) Gent 35 N(soort,mv,basis) uur 33 N(soort,ev,basis,onz,stan) @Matjas_ 30 N(soort,ev,basis,zijd,stan) #AgentsofSHIELD 30 N(soort,mv,basis) @empiremagazine 29 N(soort,ev,basis,zijd,stan) trein 29 N(soort,ev,basis,zijd,stan) Also 29 N(soort,ev,basis,zijd,stan) @julianbelgium 28 N(soort,ev,basis,onz,stan) school 28 N(soort,ev,basis,onz,stan) tijd 27 N(eigen,ev,basis,zijd,stan) @papillotes 27 N(soort,mv,basis) keer 26 N(soort,ev,basis,zijd,stan) @perjacxis 26 N(soort,ev,basis,zijd,stan) @vulture 25 N(soort,ev,basis,onz,stan) darling 24 N(soort,ev,basis,onz,stan) man 24 N(soort,ev,basis,onz,stan) @Alineagain 23 N(soort,mv,basis) #belusa 23 N(soort,mv,basis) #dtv 22 N(soort,ev,basis,onz,stan) @TomRouvrois 22 N(soort,ev,basis,zijd,stan) Film 22 N(soort,mv,basis) keer 21 N(eigen,ev,basis,onz,stan) @JenniferUpdates 21 N(soort,ev,basis,onz,stan) Zottegem 21 N(soort,ev,basis,zijd,stan) & 21 N(soort,mv,basis) @Magische_Schelp 20 N(soort,ev,basis,onz,stan) aflevering 20 N(soort,ev,basis,onz,stan) http 20 N(soort,ev,basis,zijd,stan) les 20 N(soort,ev,basis,zijd,stan) beetje 20 N(soort,mv,basis) @GimmeChemicals 19 N(soort,ev,basis,onz,stan) X-Men 19 N(soort,ev,basis,zijd,stan) Engels 19 N(soort,mv,basis) @TheAVClub 19 N(soort,mv,basis) @YouTube 18 N(soort,ev,basis,zijd,stan) @sebabraet 18 N(soort,ev,basis,zijd,stan) @EllenAsylumwise 18 N(soort,ev,basis,zijd,stan)
Token mensen jaar dag leven kinderen tijd @LisaLeysen ge man @pcasteels week keer werk @matthias_somers huis wijn vrienden keer trein twitter geld uur N-VA dingen vrouw wereld bed boek fruit maanden vrouwen moment zin dagen Gent ziekenhuis maand mannen hoofd kind naam zetel ogen idee plaats Mensen minuten @_katrijn @sanfyezerskiy @zegmaarbas
FREQ 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
8 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
WOMAN 5 Part-of-speech N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan)
WOMAN 6
Token FREQ Part-of-speech
programma 20 N(soort,ev,basis,zijd,stan)
werk 20 N(soort,ev,basis,zijd,stan)
#rvv 19 N(soort,ev,basis,onz,stan)
@evadaeleman 19 N(soort,ev,basis,onz,stan)
fan 18 N(eigen,ev,basis,onz,stan)
helft 18 N(eigen,ev,basis,zijd,stan)
iPhone 18 N(soort,ev,basis,onz,stan)
radio 18 N(soort,ev,basis,onz,stan)
@TomMeeusen 18 N(soort,ev,basis,onz,stan)
Nederland 18 N(soort,ev,basis,zijd,stan)
Albert 18 N(soort,ev,basis,zijd,stan)
Nys 17 N(soort,ev,basis,zijd,stan)
begin 17 N(soort,ev,basis,zijd,stan)
einde 17 N(soort,mv,basis)
feit 17 N(soort,mv,basis)
ma 16 N(eigen,ev,basis,onz,stan)
nr 16 N(soort,ev,basis,onz,stan)
#dsmtw 16 N(soort,ev,basis,zijd,stan)
Werchter 16 N(soort,mv,basis)
cross 16 N(soort,mv,basis)
kans 15 N(eigen,ev,basis,onz,stan)
koers 15 N(soort,ev,basis,onz,stan)
ronde 15 N(soort,ev,basis,onz,stan)
winkel 15 N(soort,ev,basis,zijd,stan)
feestje 15 N(soort,ev,basis,zijd,stan)
dingen 15 N(soort,ev,basis,zijd,stan)
examens 15 N(soort,ev,basis,zijd,stan)
jongens 15 N(soort,mv,basis)
Nadal 15 N(soort,mv,basis)
Aquarius 14 N(soort,ev,basis,onz,stan)
Belg 14 N(soort,ev,basis,onz,stan)
Federer 14 N(soort,ev,basis,zijd,stan)
donderdag 14 N(soort,ev,basis,zijd,stan)
maandag 14 N(soort,ev,basis,zijd,stan)
#belrus 14 N(soort,ev,basis,zijd,stan)
album 14 N(soort,ev,basis,zijd,stan)
idee 14 N(soort,ev,basis,zijd,stan)
internet 14 N(soort,ev,basis,zijd,stan)
plan 14 N(soort,ev,basis,zijd,stan)
respect 14 N(soort,ev,basis,zijd,stan)
verschil 14 N(soort,ev,basis,zijd,stan)
water 14 N(soort,ev,basis,zijd,stan)
woord 14 N(soort,mv,basis)
#belwal 14 N(soort,mv,basis)
#blok 14 N(soort,mv,basis)
#eerstestagedag 14 N(soort,mv,basis)
#giro 13 N(soort,ev,basis,onz,stan)
@RobbeSchepens_ 13 N(soort,ev,basis,onz,stan)
fiets 13 N(soort,ev,basis,onz,stan)
maand 13 N(soort,ev,basis,onz,stan)
Token Ramblas vooruitgang nr stadhuis Spa #Brugge @Barttommelein @bertschelfhout water @MichielVanroose voorzitter commissie woonbonus @BartSomers steden Gent @sandrinedecrom jobkorting @MP_Peeters vrouwen België @freyabos voorstel #vlareg @GertPeersman oppositie stem Muyters bedrijven @lindewin huis #villapolitica @laurestuyck @sandrinedecrom bouw horeca minister partij stilstand tijd vraag vrijheid @Visit_Bruges gemeenten jaren jobs @stad_forum Nieuwsblad nt stadion
FREQ 9 9 9 9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 6 6 6 6
WOMAN 7 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan)
WOMAN 8
Token FREQ Part-of-speech
@InnocentNoet 18 N(soort,ev,basis,zijd,stan)
@thatmarsgirl 18 N(soort,mv,basis)
mama 17 N(eigen,ev,basis,onz,stan)
@delangelisa 17 N(soort,ev,basis,zijd,stan)
fans 16 N(soort,ev,basis,onz,stan)
films 16 N(soort,ev,basis,zijd,stan)
thanks 16 N(soort,ev,basis,zijd,stan)
België 16 N(soort,ev,basis,zijd,stan)
Well 16 N(soort,mv,basis)
#vtmnieuws 15 N(eigen,ev,basis,zijd,stan)
@MonsieurKoen 15 N(soort,ev,basis,onz,stan)
@edgarwright 15 N(soort,ev,basis,onz,stan)
examen 15 N(soort,ev,basis,onz,stan)
@_EarlySunsets 15 N(soort,ev,basis,zijd,stan)
@denofgeek 15 N(soort,ev,basis,zijd,stan)
amazing 15 N(soort,mv,basis)
@BelRedDevils 14 N(eigen,ev,basis,zijd,stan)
dagen 14 N(soort,ev,basis,onz,stan)
weken 14 N(soort,ev,basis,onz,stan)
Amsterdam 14 N(soort,ev,basis,zijd,stan)
Londen 14 N(soort,ev,basis,zijd,stan)
#Whiplash 14 N(soort,ev,basis,zijd,stan)
@SonDaan 14 N(soort,mv,basis)
Wow 14 N(soort,mv,basis)
#TheMindyProject 13 N(soort,ev,basis,onz,stan)
@AlfredoMorreel 13 N(soort,ev,basis,onz,stan)
@FilmfestGent 13 N(soort,ev,basis,onz,stan)
geluk 13 N(soort,ev,basis,onz,stan)
nieuws 13 N(soort,ev,basis,zijd,stan)
weekend 13 N(soort,ev,basis,zijd,stan)
@RealHughJackman 13 N(soort,ev,basis,zijd,stan)
BREAKING 13 N(soort,ev,basis,zijd,stan)
foto 13 N(soort,ev,basis,zijd,stan)
http:/... 13 N(soort,ev,basis,zijd,stan)
it 12 N(soort,ev,basis,onz,stan)
tv 12 N(soort,ev,basis,onz,stan)
vertraging 12 N(soort,ev,basis,onz,stan)
Twitter 12 N(soort,ev,basis,onz,stan)
zondag 12 N(soort,ev,basis,zijd,stan)
#ffgent 12 N(soort,ev,basis,zijd,stan)
@Waanzinema 12 N(soort,ev,basis,zijd,stan)
@YahooMoviesUK 12 N(soort,ev,basis,zijd,stan)
@stubru 12 N(soort,ev,basis,zijd,stan)
einde 12 N(soort,ev,basis,zijd,stan)
hoofd 12 N(soort,ev,basis,zijd,stan)
@billyeichner 12 N(soort,ev,basis,zijd,stan)
@joshuahorowitz 12 N(soort,ev,basis,zijd,stan)
bachelorproef 12 N(soort,ev,basis,zijd,stan)
episode 12 N(soort,mv,basis)
http://t.co/... 12 N(soort,mv,basis)
Token mens schoenen Brussel avond @JenzlWashington @inebenzine alcohol euro jaren @SuzieQew begin nummer weekend job mening weken @Kroy_Wendy feit geval tv vraag vriend handen vragen gezicht paar probleem stuk @MargotHollevoet @nikmahie @sofie_shmofie baby deur foto antwoord plan punt recht @DeHuisvrouw Tijd facebook info kans kop mail manier mond richting @DonDams @MelissaJanssens
Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
FREQ 87 82 68 62 52 50 43 41 39 38 33 31 29 28 27 25 24 24 23 22 20 20 20 19 19 19 18 18 18 17 17 17 17 17 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15
WOMAN 9 Part-of-speech N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,ev,dim,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,dim) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis)
Token mensen @Julieedv @zegmaarbas @senorwauters @detomhelsen dag beetje @_kleingeelvisje leven @Eva_Mouton @inebenzine & dingen @Verwijd tijd twitter jaar @EllenKoenen @gingeraleplus @jeltenieuwhuis hoofd moment @mbaeten @_Prince_R keer man @ultradesign_be @FrankVanLaeken @dodekoekjes @Eva_Mouton hart @DeHuisvrouw @DonDams @LisaLeysen @Kroy_Wendy geld kind foto liefde manier plaats @tvanknopers woorden @Dirt_sA gevoel @kawemel @kristelmerckx rest wereld kinderen
FREQ 87 61 52 49 48 47 39 35 33 32 32 32 32 32 32 30 27 26 26 25 24 23 23 20 19 19 19 19 19 19 19 19 19 17 17 17 17 17 17 17 17 16 16 16 16 16 16 16 15 15
WOMAN 10 Part-of-speech N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,genus,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,onz,stan) N(soort,ev,basis,onz,stan)
Token @sanfyezerskiy mensen @_katrijn beetje @geertsimonis @Dirt_sA dag @LisaLeysen @VonDerDeckenTok @Eva_Mouton @Eva_Mouton jaar @zegmaarbas tijd @mbaeten man werk keer week Antwerpen Leuven huis @strikeplank auto @nietnuLaura @janvandepoel @RobbySallaets @anneleis @inebenzine plaats @MelissaJanssens dingen vrouwen @InesP__ @RSchraey bed leven @jandemol foto ge keer @FakePlasticRuby @Omaya_ hoofd @MargotHollevoet taart @EllenKoenen @KVerschoren Vlaams-Brabant geld
Rank 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
FREQ 15 14 14 14 14 14 13 13 13 13 13 13 12 12 12 12 12 12 12 12 12 12 11 11 11 11 11 11 11 11 11 11 10 10 10 10 10 10 10 10 10 10 9 9 9 9 9 9 9 9
WOMAN 9 Part-of-speech N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,genus,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,dim,onz,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,mv,basis) N(soort,mv,basis) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(soort,ev,basis,zijd,stan) N(eigen,ev,basis,onz,stan) N(eigen,ev,basis,zijd,stan) N(eigen,ev,basis,zijd,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan) N(soort,ev,basis,onz,stan)
WOMAN 10
Token FREQ Part-of-speech
mannen 15 N(soort,ev,basis,onz,stan)
@MrtnDb 15 N(soort,ev,basis,onz,stan)
Facebook 15 N(soort,ev,basis,zijd,stan)
geval 15 N(soort,ev,basis,zijd,stan)
@MrJelmie 15 N(soort,mv,basis)
@bockiederepper 14 N(eigen,ev,basis,zijd,stan)
God 14 N(soort,ev,basis,onz,stan)
idee 14 N(soort,ev,basis,zijd,stan)
@ine_vdw 14 N(soort,mv,basis)
mens 13 N(eigen,ev,basis,zijd,stan)
vraag 13 N(soort,ev,basis,onz,stan)
vrouwen 13 N(soort,ev,basis,zijd,stan)
@MFvLeuchtenberg 13 N(soort,ev,basis,zijd,stan)
keer 13 N(soort,ev,basis,zijd,stan)
@kristoffbertram 13 N(soort,ev,basis,zijd,stan)
huis 13 N(soort,mv,basis)
@strikeplank 12 N(soort,ev,basis,genus,stan)
seks 12 N(soort,ev,basis,zijd,stan)
week 12 N(soort,ev,basis,zijd,stan)
meisje 12 N(soort,ev,basis,zijd,stan)
@draaimeulen 12 N(soort,mv,basis)
benen 12 N(soort,mv,basis)
@PTRVDA 11 N(eigen,ev,basis,zijd,stan)
@DeIdealeWereld 11 N(soort,ev,basis,onz,stan)
bed 11 N(soort,ev,basis,onz,stan)
@FakePlasticRuby 11 N(soort,ev,basis,zijd,stan)
@jandemol 11 N(soort,ev,basis,zijd,stan)
baby 11 N(soort,ev,basis,zijd,stan)
naam 11 N(soort,ev,basis,zijd,stan)
zin 11 N(soort,ev,basis,zijd,stan)
foto's 11 N(soort,mv,basis)
kleren 11 N(soort,mv,basis)
@FakePlasticRuby 11 N(soort,mv,basis)
@ElineVanhooydon 11 N(soort,mv,basis)
stuk 11 N(soort,mv,dim)
@DeWaaslandwolf 10 N(soort,ev,basis,onz,stan)
@MelkMuylle 10 N(soort,ev,basis,onz,stan)
@Sarcist 10 N(soort,ev,basis,onz,stan)
@godverrrr 10 N(soort,ev,basis,onz,stan)
@mrlgotlucky 10 N(soort,ev,basis,zijd,stan)
mond 10 N(soort,ev,basis,zijd,stan)
persoon 10 N(soort,ev,basis,zijd,stan)
Leuven 10 N(soort,ev,basis,zijd,stan)
@nietnuLaura 10 N(soort,ev,basis,zijd,stan)
Sorry 10 N(soort,ev,dim,onz,stan)
@jinsvonstroheim 10 N(soort,ev,dim,onz,stan)
@kinkydebeer 10 N(soort,mv,basis)
deel 10 N(soort,mv,basis)
nr 10 N(soort,mv,dim)
punt 9 N(eigen,ev,basis,onz,stan)
Token idee uur @DeHuisvrouw naam mannen @__Mart_ plezier @bollegijz minuten @Zezunja @farahfouffriau @zmeanie trein vrouw wereld kinderen zoek @_evelienc @spinocle film @EmilyMichiels @IkBenLauren @Kroy_Wendy @miss_punt woord @bebatjof @lodeba collega hoera twitter @AnnlsMns @tvanknopers foto's vrienden @dodekoekjes bad kind nieuws nummer @Dirt_sA @FakePlasticRuby @erdeebee mama wijn @dropje mopje @StienVerbelen @chateaubrys meisjes Brussel