I would spend a solid three paragraphs gushing over how excited I am for my semester, but instead I'll just run down my first week and a half of grad life:
Course Work
- Machine Learning: Probably the class I am most excited about this semester. We've been moving through material, and have already covered Nearest Neighbors, Decision Trees, and Maximum Likelihood Estimation. I've been taking math classes since freshman year just for shits and giggles, and this might be the first time I get to start using real math with a purpose (MWAP?). We turned in our first problem set yesterday, and I actually had to bust out my Calc III book to refresh Lagrange multipliers. I'm really happy that I am using my Calc III for something (actually using, not plugging and chugging through problems). It validates that I actually kept my Calc III book (even if I only did so because Barnes and Noble refused to buy it back). I think it's going to be a great, proof-based class, but the problem sets alternate between proofs and code, so it's a very nice balance. Coding assignments are a nice way to back up theory (unlike my undergrad Analysis class, in which my professor clearly stated "'application' in this class does not mean application to real things; it means application to math").
- Algorithms: This has been mostly review so far, even though I haven't taken an algo course yet, which I guess means my Data Structures class deserves a shout out. We got our first problem set assigned yesterday, but I haven't looked at it yet. It's sitting in the folder next to me as I type (cue creepy, cliche, haunting music). Rumor has it that the algo WPE exam has the highest failure rate, but I don't believe in stressing out prematurely, so I will wait and post when I have some grounding for how hard the course actually is.
- Computational Linguistics: A very nice course. This class sort of feels like home right now. My best description is that it's relaxing: I can't say I'm getting super fired up about the material, but that is mostly because it is familiar (and because it's from 3 to 4:30, and I'm usually a little draggy...). I had a delightful afternoon doing the first problem set today: just implementing method after method of basic text processing (parsing, tokenizing, getting word frequencies, calculating cosine similarities), all using NLTK. Basically, I just love Python. Writing Python is candy and butterflies.
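For a flavor of what that kind of text processing looks like, here is a minimal sketch of word frequencies and cosine similarity. It is not the actual assignment: it assumes a simple regex tokenizer in place of NLTK's, and the function names (`tokenize`, `word_frequencies`, `cosine_similarity`) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes.

    Assumption: a crude regex stand-in for a real tokenizer such as NLTK's.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how many times each token appears in the text."""
    return Counter(tokenize(text))


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-count vectors."""
    # Only words present in both texts contribute to the dot product.
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    a = word_frequencies("Writing Python is candy and butterflies")
    b = word_frequencies("Writing proofs is not candy")
    print(cosine_similarity(a, b))
```

Raw counts are about the simplest possible vector representation; swapping in something like TF-IDF weights would only change how the two `Counter` objects are built, not the similarity function.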
OOC (Out Of Class)
Class is class. But I am beginning to work on becoming a part of the research-critiquing and paper-reading world. This is the best part, where I am starting to feel like a Ph.D. student and not an undergrad...
- CLUNCH: We had our first Computational Linguistics Lunch last Monday, where a few second-year students reviewed some conference papers from over the summer. The first was regarding a faster method for learning Viterbi sequences; the model was nice, although during the discussion afterward, the general consensus was that it is fairly difficult to think of a situation with a large enough, dense enough feature set to make it necessary. The second one attempted to model the sentence dependencies in online question forums. I thought this one was interesting, and the use of forum data seemed potentially relevant to textual entailment, which is something I've started thinking more about as an area of research. The third paper was about consistent sentiment tagging of words in WordNet; the discussion of the sentiment analysis ended after one of the other students pointed out that the reduction in section 4.1 appears to be backward. CLUNCH ended without any conclusion as to what exactly the authors intended to show with the proof (it seemed they wanted to show NP-completeness by reducing their problem to 3-SAT) and I have not found any satisfying answer. So, if anyone has an explanation, please let me know.
- Stat NLP: I joined Penn's Stat NLP Reading Group, which starts next Monday with this paper. I have not read it yet, so obviously I have no comments. I have scanned it, and I think this group will start out slightly outside my comfort zone, but I am very excited about it. Cheers to learning through immersion.
- TE: My current Google Baby (thing I spend time Googling randomly when I should be working) is Textual Entailment/Paraphrasing/Natural Language Inference...it has a lot of names. I've mostly been going through a good review of the field, and joined Hopkins' Google group on the topic (although I haven't been reading all the papers, just some abstracts). But based on my little bit of poking around, I think this is a very cool, very compelling research area. #thinkingaboutthesisareaswaytoosoon
- ESL and MTurk: I've taken a one-week hiatus from my summer research, but plan to be back on top of it soon. Updates specific to that will stay on my old blog. (Contrary to what this laundry-list post suggests, I am trying to keep things compartmentalized.)
- Fellowships: The not-as-fun parts of academic life, but very necessary. Luckily the fellowships I am looking at are nicely spaced out (MSR in October, NSF in November, and NDSEG in December). I am trying to use research statements/proposals as an excuse for some quality reflection on my work and where I'd like to go. But it is also a bit of a time suck. So tradeoffs. Such is life.
- PennApps: I had a short detour last weekend doing my first ever hackathon. My team's hack fell short of what we wanted (we were going for a kind of open-sourced Siri) but it was a blast. I had an awesome teammate from CMU, I made it from 6am on Friday to midnight on Sunday with only 1.5 hours of sleep and two cups of coffee, and I got to play with a lot of APIs I would not have had an excuse to play with otherwise. (I also got to start building a collection of tech-startup t-shirts, which will be necessary when I am a TA next year; all decent CS TAs wear t-shirts with tech company logos on them (the really great TAs wear witty shirts like this)).
- LaTeX: I turned in my first ML assignment in LaTeX. It's good to get over the how-to-write-math-in-LaTeX learning curve right away, because it will be immensely helpful from here on out. I have also decided that answers just look correct in LaTeX. It could be total, utter gibberish, and when typeset in LaTeX, it looks damn good.
I will go ahead, now, and naively claim that this will not be my pattern, and that the rest of the semester, I will post more frequently but with less content per post. As I write this, I know that this claim won't last a week. But at least now the intent is down in writing.