polysemous emoji 🌹

Travis Hoppe / @metasemantic

Polysemy: One word, many senses

I put money in the bank.

The ball went in with a bank shot.

Drive carefully around the snow bank.

I went fishing on the river bank.

What about emoji? 😍 😘 😂 ❤ 😭 💯

1. Data collection

Gathered all tweets that contained any of the top 200 emoji.
Approximately 80,000 per hour, 13,000,000 total.

2. Data QC (spam tweets)

Removed exactly identical tweets.
Removed tweets that only differ by index:

"Hello baby @justinbieber Can u follow me? ♥ x37"
"Hello baby @justinbieber Can u follow me? ♥ x38"
"Hello baby @justinbieber Can u follow me? ♥ x39"

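A minimal sketch of one way to catch both kinds of spam, assuming that "differ by index" means the tweets match once every run of digits is masked. The helper names spam_key and drop_near_duplicates are hypothetical and not taken from the original pipeline.

```python
import re


def spam_key(tweet: str) -> str:
    """Reduce a tweet to a key that ignores digit runs and extra whitespace."""
    masked = re.sub(r"\d+", "#", tweet)  # x37, x38, x39 all become x#
    return " ".join(masked.split())


def drop_near_duplicates(tweets):
    """Keep the first tweet seen for each key, preserving input order."""
    seen = set()
    kept = []
    for tweet in tweets:
        key = spam_key(tweet)
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept


tweets = [
    "Hello baby @justinbieber Can u follow me? ♥ x37",
    "Hello baby @justinbieber Can u follow me? ♥ x38",
    "Hello baby @justinbieber Can u follow me? ♥ x39",
    "I put money in the bank.",
    "I put money in the bank.",  # exact duplicate
]
unique = drop_near_duplicates(tweets)
```

Masking every digit run is deliberately aggressive: it would also merge tweets that differ in a meaningful number, which may or may not be acceptable for a given corpus.
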
3. Data Wrangling

Built a pipeline for repeatable processing:

remove_mentions, remove_urls, HTML_symbols, remove_apostrophe, space_symbols, special_lowercase, replace_emoji, limit_character_subset, remove_repeated_tokens, remove_twitter_mentions_hashtags, remove_emoji_modifier
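
A repeatable pipeline of this kind can be sketched as an ordered list of small text-to-text functions. The step names remove_urls and remove_mentions come from the list above, but their bodies here are guesses for illustration; collapse_whitespace and run_pipeline are hypothetical helpers, and the URL in the example is made up.

```python
import re


def remove_urls(text: str) -> str:
    # Guess at the step's behavior: drop anything that looks like an http(s) link.
    return re.sub(r"https?://\S+", " ", text)


def remove_mentions(text: str) -> str:
    # Guess at the step's behavior: drop @username mentions.
    return re.sub(r"@\w+", " ", text)


def collapse_whitespace(text: str) -> str:
    # Hypothetical helper, not in the list above: tidy up leftover gaps.
    return " ".join(text.split())


# The order is fixed in one place, so every run applies the same steps.
PIPELINE = [remove_urls, remove_mentions, collapse_whitespace]


def run_pipeline(text: str, steps=PIPELINE) -> str:
    for step in steps:
        text = step(text)
    return text


cleaned = run_pipeline(
    "Hello baby @justinbieber Can u follow me? ♥ https://example.com/x"
)
```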

Special care: emoji can carry a skin-tone modifier, which counts as an extra character.
TIL: Fitzpatrick is the name of the skin tone scale.
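
To see the extra character, and one way to strip it: Unicode encodes the skin-tone modifiers as the five code points U+1F3FB through U+1F3FF. The function below borrows the step name remove_emoji_modifier from the pipeline list, but its body is a sketch, not the original implementation.

```python
import re

# Emoji modifier code points U+1F3FB..U+1F3FF (Fitzpatrick-scale skin tones).
SKIN_TONE = re.compile("[\U0001F3FB-\U0001F3FF]")


def remove_emoji_modifier(text: str) -> str:
    """Strip skin-tone modifiers so all variants of an emoji share one token."""
    return SKIN_TONE.sub("", text)


waving = "\U0001F44B\U0001F3FD"  # waving hand followed by a medium skin tone
plain = remove_emoji_modifier(waving)

# The toned emoji is two code points; the stripped one is a single code point.
lengths = (len(waving), len(plain))
```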

4. Machine Learning

Train word2vec over the tweets, treating each emoji as a "word" in its own right.
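
For an emoji to act as a word, it has to come out of tokenization as its own token, even when it is glued to the neighboring text. A rough sketch, assuming a simplified emoji range (it covers common pictographs but ignores ZWJ sequences and variation selectors); tokenize is a hypothetical helper.

```python
import re

# Simplified emoji range: Misc Symbols + Dingbats, and the main pictograph blocks.
EMOJI = re.compile("([\u2600-\u27BF\U0001F300-\U0001FAFF])")


def tokenize(tweet: str):
    """Pad every emoji with spaces, lowercase, and split on whitespace."""
    spaced = EMOJI.sub(r" \1 ", tweet)
    return spaced.lower().split()


tokens = tokenize("omg I love this😍😍 so much💯")
```

Token lists in this shape are what a word2vec implementation typically consumes; gensim's Word2Vec, for example, accepts an iterable of token lists through its sentences argument.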

5. What can we learn?

Habits of highly emotive people...

Emoji have synonyms and antonyms

There is an optimal length to omggggggg

`word2vec` spreads vectors across the hypersphere
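
A small sketch of what comparing vectors on the hypersphere looks like: scale each vector to unit length, then take the dot product (cosine similarity). The slides do not say how synonyms and antonyms were identified, so this is only an assumed approach, and the vectors below are made-up toy values, not trained embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Toy 3-d vectors chosen by hand to show the range of the score.
base = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
opposite = [-1.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

scores = (
    cosine(base, same_direction),  # same direction, different length
    cosine(base, opposite),        # pointing the other way
    cosine(base, orthogonal),      # unrelated direction
)
```

Because length is divided out, only direction matters: scores run from 1 for the same direction, through 0, down to -1 for opposite directions.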