Big Data and Language Technologies Seminar
General Information
Lecturer | Jun.-Prof. Dr. Martin Potthast |
Lab Advisors | Lukas Gienapp, Niklas Deckers |
Workload | 2 SWS Seminar Lecture, 2 SWS Lab |
Seminar Lecture | Monday, 13:30 - 15:00, starting 04.04.2022, in person |
Lab | Monday, 15:30 - 16:45, starting 04.04.2022, in person |
Contact | Email, or via Discord server "webis-lectures" |
Description | Information on the web is growing at an exponential pace, courtesy of social media platforms, blogs, and news. Such large-scale data sources call for high-end, scalable, distributed architectures for cognitive analysis, which shape the business decisions of many industries. In addition, deep learning has been propelled into the mainstream and is now accessible to researchers and companies alike, thanks to tools such as TensorFlow and PyTorch. The Webis research group operates large-scale high-performance compute infrastructure (totaling more than 3000 CPU cores, 10+ Petabytes of storage, and 24 high-end GPUs), which will be put to use in the course of this seminar. Students will receive application-oriented training in big data and deep learning frameworks and in language technologies, and will explore interesting research questions. This seminar requires good skills in both programming (Python) and algorithms. |
Requirements |
Please note that this course requires prior Python programming experience. In addition, some familiarity with Linux environments and knowledge of machine learning basics are highly recommended. To help you gauge your prior subject knowledge, we've provided a set of self-assessment questions below. Read through them and take note of how many you can answer in the affirmative, and how many answers you know without having to look them up.
Self-assessment questionnaire
This questionnaire is not ideal study material for catching up; however, the questions cover a broad range of topics within the course's scope and highlight potential weak points.
Python
Linux Command line/Remote work
Machine Learning Basics
|
Deliverables |
In order to successfully complete this course, you will have to:
|
Announcements
Organization
- General Note: This is a joint seminar between Uni Weimar and Uni Leipzig. We will use our camera equipment to link both seminars.
- Leipzig-specific Remark: This course will have to be credited as Citizen Science on your transcripts, because it is a topical variation of that course. This also means that it cannot be credited a second time if you have already completed Citizen Science.
- Communication
  - Lecture website - materials and announcements will be uploaded on this website.
  - Discord - there is a Discord server for this lecture to ask questions and engage in discussion. Check your email for an access code. Please join the server and choose a nickname that lets us identify you (at least your surname).
  - Email - important announcements will be sent out via email.
Lecture Notes
- Big Data and Language Technologies » Introduction » Organization, Literature [slides] [video (LE)] [video (WE)]
- Big Data and Language Technologies » Introduction » Introduction [slides] [video (LE)] [video (WE)]
- Big Data and Language Technologies » Machine Learning Basics » Regression [slides] [video]
- Big Data and Language Technologies » Machine Learning Basics » Gradient Descent [slides] [video]
- Big Data and Language Technologies » Machine Learning Basics » Recurrent Neural Networks [slides] [video]
- Big Data and Language Technologies » Deep Learning » RNNs for Machine Translation [slides] [video]
Lab Sessions
Date | Title | Description | Materials | Deliverables | Stream |
---|---|---|---|---|---|
04.04.2022 | Deep Learning in Python (Session 1) | | | | |
11.04.2022 | Deep Learning in Python (Session 2) | | | | |
18.04.2022 | No Session (Easter Monday) | | | | |
25.04.2022 | Deep Learning in Python (Session 3) | | | [lab] | |
02.05.2022 | Deep Learning on SLURM (Session 1) | | | | |
09.05.2022 | Deep Learning on SLURM (Session 2) | | Set up Cluster Access | | |
16.05.2022 | Project Fair | | | | |
23.05.2022 | Prompt Engineering (Session 1) | | | | |
30.05.2022 | Prompt Engineering (Session 2) | | | | |
06.06.2022 | No Session (Whit Monday) | | | | |
13.06.2022 | Prompt Engineering (Session 3) | | | Prompt Engineering Presentations | |
20.06.2022 | Q&A Session | | Project Exposé | | |
27.06.2022 | Group Meetings | | | | |
04.07.2022 | Mid-Term Presentations | | Project Presentation | | |
11.07.2022 | Q&A Session | | | | |
29.08.2022 | Project Deadline | Hand in your report in PDF format by email. The cutoff is 22:00 CEST. | Project Report | | |