Postdoctoral Fellow, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD
GSIGAASMEF CFDVFKELKV HHANENIFYC PIAIMSALAM VYLGAKDSTR TQINKVVRFD KLPGFGDEIE AQCGTSVNVH
SSLRDILNQI TKPNDVYSFS LASRLYAEER YPILPEYLQC VKELYRGGLE PINFQTAADQ ARELINSWVE SQTNGIIRNV
LQPSSVDSQT AMVLVNAIVF KGLWEKAFKD EDTQAMPFRV TEQESKPVQM MYQIGLFRVA SMASEKMKIL ELPFASGTMS
MLVLLPDEVS GLEQLESIIN FEKLTEWTSS NVMEERKIKV YLPRMKMEEK YNLTSVLMAM GITDVFSSSA NLSGISSAES
LKISQAVHAA HAEINEAGRE VVGGAEAGVD AASVSEEFRA DHPFLFCIKH IATNAVLFFG RCVSP
How do we model many protein-protein interactions?
Can we predict aggregates from experimental structure?
Be able to say what is possible, and what isn't!
Algorithmic design, e.g. linear algebra, molecular dynamics...
Hardware design, specialized hardware, e.g. Anton, GRAPE.
Predicting run-time (non-trivial at model stage!).
Scaling up!
Meet The Man Who Gamed Reddit With A Bot
Download Wikipedia
Download all posts with score > 1000 for 2013 and 2014 (~5000)
Cross-reference each post to the correct Wikipedia paragraph
Build true positives (known TILs)
Build decoys (other paragraphs in TILs)
Build unknown samples (rest of Wikipedia*)
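The three sample sets above can be sketched as a single pass over the articles. This is a minimal sketch, not the pipeline actually used: `build_samples`, the `articles` dictionary and the `til_refs` set are hypothetical names, and it assumes that "decoys" means the other paragraphs of an article that a TIL post points to.

```python
def build_samples(articles, til_refs):
    """Split paragraphs into true positives, decoys and unknown samples.

    articles: dict mapping article title -> list of paragraph strings
    til_refs: set of (title, paragraph_index) pairs cross-referenced from TIL posts
    (both structures are assumptions made for this sketch)
    """
    til_titles = {title for title, _ in til_refs}
    positives, decoys, unknown = [], [], []
    for title, paragraphs in articles.items():
        for i, text in enumerate(paragraphs):
            if (title, i) in til_refs:
                positives.append(text)   # paragraph a known TIL points to
            elif title in til_titles:
                decoys.append(text)      # same article, different paragraph
            else:
                unknown.append(text)     # article never referenced by a TIL
    return positives, decoys, unknown


# Toy data, for illustration only
articles = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1"],
}
til_refs = {("A", 1)}
positives, decoys, unknown = build_samples(articles, til_refs)
```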
sqlite3, requests, bs4, pandas, numpy, scikit-learn, gensim, praw, wikipedia, nltk, stemming.porter2
>> "Good muffins cost $3.88\n in New York"
['Good', 'muffins', 'cost', 'TOKEN_MONEY', 'in', 'New', 'York', 'TOKEN_EOS']
>> "I sat on the rock"
['I', 'sat', 'on', 'rock']
>> stem("factionally")
'faction'
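The first two examples above can be reproduced with a few lines of standard-library Python. This is a rough sketch, not the tokenizer used in the talk: the regular expressions, the tiny stop-word list and the function names `tokenize` and `remove_stopwords` are all assumptions.

```python
import re

MONEY = re.compile(r"\$\d+(?:\.\d+)?")   # assumed pattern for dollar amounts
STOPWORDS = {"the", "a", "an"}           # assumed, deliberately tiny stop list


def tokenize(text):
    """Replace money amounts with a placeholder and append an end-of-sentence token."""
    text = MONEY.sub(" TOKEN_MONEY ", text)
    tokens = re.findall(r"[A-Za-z_']+", text)
    return tokens + ["TOKEN_EOS"]


def remove_stopwords(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


print(tokenize("Good muffins cost $3.88\n in New York"))
print(remove_stopwords("I sat on the rock".split()))
```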
TF-IDF
(term frequency-inverse document frequency)
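As a rough illustration, TF-IDF weights can be computed by hand. The sketch below uses one common variant (raw term frequency divided by document length, times the natural log of N over document frequency); libraries such as scikit-learn apply extra smoothing and normalization, so their numbers will differ. The function name `tf_idf` and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists. Uses tf = count / len(doc) and
    idf = ln(N / df); this is one common variant, not the only one.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))              # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, for illustration only
docs = [["rock", "cat"], ["rock", "dog"]]
weights = tf_idf(docs)
```

A term that appears in every document ("rock" here) gets weight zero, while terms specific to one document are weighted up.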
>>> model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
>>> model['computer'] # raw numpy vector of a word
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
Uses far fewer features to store relationships between words!
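The queries above rest on vector arithmetic and cosine similarity between word vectors. The sketch below shows the idea with hand-made two-dimensional vectors; the vectors, the `cosine` and `analogy` helpers and the scoring are assumptions for illustration, not gensim's implementation, and real models learn vectors with hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative):
    """Return the word closest to sum(positive) - sum(negative), excluding the inputs."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors (made up): first axis ~ "royalty", second axis ~ "gender"
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [0.2, 0.0],
}
best = analogy(toy, positive=["woman", "king"], negative=["man"])
```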
Training classifier
Test Accuracy: 0.878; Test Accuracy on TP: 0.116; Test Accuracy on TN: 0.998
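The gap between the overall score and the score on true positives is typical when negatives far outnumber positives. A small sketch of how the three numbers can be computed, assuming "accuracy on TP" means the fraction of positive samples labeled positive and "accuracy on TN" the fraction of negative samples labeled negative; `per_class_accuracy` and the toy labels are hypothetical and are not the data behind the figures above.

```python
def per_class_accuracy(y_true, y_pred):
    """Return (overall accuracy, accuracy on positives, accuracy on negatives).

    Labels are 1 for positive and 0 for negative; both classes must be present.
    """
    pairs = list(zip(y_true, y_pred))
    overall = sum(t == p for t, p in pairs) / len(pairs)
    pos_preds = [p for t, p in pairs if t == 1]
    neg_preds = [p for t, p in pairs if t == 0]
    acc_pos = sum(p == 1 for p in pos_preds) / len(pos_preds)
    acc_neg = sum(p == 0 for p in neg_preds) / len(neg_preds)
    return overall, acc_pos, acc_neg


# Toy labels: 2 positives, 8 negatives; the classifier misses one positive
y_true = [1, 1] + [0] * 8
y_pred = [1, 0] + [0] * 8
overall, acc_pos, acc_neg = per_class_accuracy(y_true, y_pred)
```

With these toy labels the overall accuracy is high even though half of the positives are missed, which is why the per-class numbers are worth reporting alongside the headline figure.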
TIL The Founder Of Japans Mcdonalds Stated | 4726
TIL Mike Kurtz An American Burglar Found Out That | 4123
TIL A Woman That Reported 100 Incidents Of | 2899
TIL During The Sentencing Of His War Crimes Trial | 1551
TIL That Art Spiegelman The Creator Of Maus A | 1144
TIL That Once Officially Labeled As Retarded | 640
TIL Before World War Ii It Was Very Rare For | 498
TIL That A Study Showed Those With A Distressed | 142
TIL Frankie Fraser A Notorious English Gangster | 135
TIL Rafael Quintero A Mexican Drug Trafficker | 68
...
/u/possible_urban_king
/r/todayilearned
...
Natural language processing (NLP).
Supervised and unsupervised learning.
Knowing the right algorithm and its limitations...
Validation and statistics.
... computer science is more than just code ...
For class participation credit, fill out this questionnaire: