
Commit ce8bc46

Release: 2023-06 June Bootcamp.
1 parent 6dd03c4 commit ce8bc46


64 files changed: +4244, -15647 lines changed


.gitignore

Lines changed: 0 additions & 4 deletions
```diff
@@ -140,7 +140,3 @@ dmypy.json
 
 # Cython debug symbols
 cython_debug/
-
-# Cache
-cache/
-.vscode/
```

README.md

Lines changed: 89 additions & 119 deletions
````diff
@@ -1,178 +1,148 @@
-![Georgian](assets/georgian-logo.png)
+[DEPRECATED]
+
+IMPORTANT: This code is provided as-is from June 2023. Please note that all code provided here is for illustrative purposes only. Dependent libraries have since been updated and current versions may contain vulnerabilities. We do NOT recommend running this code.
 
-# Georgian GenAI Boot Camp
+![Georgian](assets/georgian-logo.png)
 
-Welcome to the Georgian GenAI boot camp repository. This repository contains all the demos we used during our bootcamps. Content for the latest boot camp can be found under the [notebooks](https://github.com/georgian-io/genai-bootcamp/tree/main/notebooks) directory. A copy of the content from previous boot camps can be found in the [archive](https://github.com/georgian-io/genai-bootcamp/tree/main/archive).
+# Georgian GenAI Boot Camp - June 2023
 
-IMPORTANT: In the archive folder, the code provided as-is from the dates of the respective bootcamps. Please note that all code provided here is for illustrative purposes only. Dependent libraries have since been updated and current versions may contain vulnerabilities. We do NOT recommend running the code in the archive.
+Welcome to the Georgian GenAI boot camp repository. This repository contains all the demos we used during the bootcamp.
 
 ## Table of Contents
-- [Georgian GenAI Boot Camp](#georgian-genai-boot-camp)
-- [Table of Contents](#table-of-contents)
-- [Goals](#goals)
-- [Setup \& Installation](#setup--installation)
-- [Agenda](#agenda)
-- [API Access](#api-access)
-- [Bootcamp Participants:](#bootcamp-participants)
-- [Non-Bootcamp Participants:](#non-bootcamp-participants)
-- [Repository Info](#repository-info)
-- [Resources](#resources)
 
+* [Goals](#goals)
+* [Agenda](#agenda)
+* [API Access](#api-access)
+* [Setup & Installation](#setup--installation)
+* [Repository Info](#repository-info)
+* [Resources](#resources)
 
 ---
 ## Goals
-Our bootcamps usually consist of a few days of tutorials followed by a hackathon. At the end of the hackathon our goal was for participants to have:
+At the end of the hackathon our goal was for participants to have:
 - A deeper understanding of the opportunities GenAI unlocks.
 - A theoretical understanding of the latest GenAI technologies.
 - A practical understanding of the latest GenAI technologies.
 - Implemented at least one end-to-end application using GenAI.
 
 [[Back to top]](#)
 
-## Setup & Installation
-
-0. This repository requires you to have installed poetry as a dependency manager. Please follow the instructions to install poetry from [here](https://python-poetry.org/docs/#installation).
-
-1. Clone this repository and `cd genai-bootcamp`
-
-2. Environment management options
-
-a) Poetry: ```poetry shell```
-
-b) Conda: create and activate a conda env for this project:
-```bash
-conda create -n genai-bootcamp python=3.10
-conda activate genai-bootcamp
-```
-
-3. Install package
-```
-poetry install
-```
-
-4. Setup private environment files
-
-Paste the `.env` file and `google-api.json` file provided to you into root directory of this repository.
-
-Note: DO NOT COMMIT THIS FILE OR SHARE IT ANYWHERE!
-
-Note: If you have trouble setting up Poetry, you should be able to skip it and just run `pip install -r requirements.txt` instead. Please reach out to us or create an issue if this does not work.
-
-Note: Some operating systems might rename `.env` to `env`. The period at the front is important as all the notebooks expect this. Please rename the file if you run into this issue.
-
-[[Back to top]](#)
-
 ---
 ## Agenda
 
-Below you can see the agenda we followed for our boot camp in October 2023.
+Below you can see the agenda we followed for our boot camp in June 2023.
 
-<!-- omit in toc -->
 ### Day 1:
 
-<!-- omit in toc -->
-#### Introduction to LLMs & Prompt Engineering (Georgian & Vector Institute)
-* [Azin Asgarian](https://www.linkedin.com/in/azin-asgarian/), AI Technical Lead at Georgian
-* [David Emerson](https://www.linkedin.com/in/david-emerson-1b9b2225/), Applied Machine Learning Scientist at Vector Institute
-* [Akash Saravanan](https://www.linkedin.com/in/akashsara/), Applied Research at Georgian
+#### Introduction to LLMs (David Emerson from Vector)
+* LLM Trends
+* Foundation models
+* Working with LLMs
+* Intro to Prompt Engineering
 
-<!-- omit in toc -->
-#### Prompt Engineering & Evaluation (Georgian)
-* [Akash Saravanan](https://www.linkedin.com/in/akashsara/), Applied Research at Georgian
-* [Pashootan Vaezipoor](https://www.linkedin.com/in/pashootan-vaezipoor-7353212a/), Machine Learning Researcher at Georgian
+#### Customizing LLMs (David Emerson from Vector)
+* Prompt Engineering
+* Fine-tuning Approaches
 
-<!-- omit in toc -->
-#### [Guest Speakers] Google & Microsoft
-* [Erik Saarenvirta](https://www.linkedin.com/in/erik-saarenvirta/?originalSubdomain=ca), Sr. Customer Engineer, Data & AI at Google
-* [Asmita Usturge](https://www.linkedin.com/in/asmitausturge/), Senior Data Scientist and Azure Cloud Lead at Microsoft
+#### Hands-on Session (Georgian & Google)
+* Setup and example notebooks - Akash Saravanan (Georgian)
+* Prompt engineering best practices - Royal Sequeira (Georgian)
+* Google demo - Erik Saarenvirta (Google)
 
-<!-- omit in toc -->
 ### Day 2:
+#### Tools & platforms (Rodrigo Ceballos from Georgian)
+* Concepts with Langchain
+* Memory and Search
+* Interfaces with Streamlit
+* Evaluation with LabelStudio
+* Deployment with HuggingFace
+
+#### Fine-tuning, RLHF, and Deployment
+* Fine-tuning - Rohit Saha (Georgian)
+* RLHF - Akash Saravanan (Georgian)
+* Deploying LLMs - Rodrigo Ceballos (Georgian)
+
+#### LLM Privacy and Security
+* Introduction - Alex Manea (Georgian)
+* Robustness and Mitigating Bias - Angeline Yasodhara (Georgian)
+* PrivateGPT - Michael Young and Kory Fong (PrivateAI)
 
-<!-- omit in toc -->
-#### LLM Fine-Tuning & Alignment (Georgian)
-* [Rohit Saha](https://www.linkedin.com/in/rohit-saha-ai/), Applied Research Scientist at Georgian
-* [Akash Saravanan](https://www.linkedin.com/in/akashsara/), Applied Research at Georgian
+[[Back to top]](#)
 
-<!-- omit in toc -->
-#### Tools, Platforms, & Deployment (Georgian)
-* [Rodrigo Ceballos](https://www.linkedin.com/in/rodrigo-ceballos-lentini/), Machine Learning Engineer at Georgian
-* [Kyryl Truskovskyi](https://www.linkedin.com/in/kyryl-truskovskyi-275b7967/), Machine Learning Engineer at Georgian
-* [Maria Ponomarenko](https://www.linkedin.com/in/maria-ponomarenko-71b465179/), MLOps Intern at Georgian
+---
+## API Access
+To be able to run the notebooks here, you'll need access to API keys for all these services. Read on for instructions on how to set up each of the APIs that you need. Note that the OpenAI and Google APIs will charge you based on usage, so you will need to set up billing.
 
-<!-- omit in toc -->
-#### Privacy, Trust & Responsible AI (Georgian & PrivateAI)
-* [Angeline Yasodhara](https://www.linkedin.com/in/angelineyasodhara/), Applied Research Scientist at Georgian
-* [Mike Brosseau](https://www.linkedin.com/in/mikebrosseau/), Director, Product Management at Private AI
-* [Rodrigo Ceballos](https://www.linkedin.com/in/rodrigo-ceballos-lentini/), Machine Learning Engineer at Georgian
+Note that to run the examples, you only need to have one API key setup. So if you already have access to an OpenAI key, you could run all the notebooks with it (excluding the Google/HuggingFace examples). The PrivateAI API key is used only for the PrivateAI demos (in `notebooks/day-1/04-example-summarization.ipynb` and `notebooks/extra_resources/PrivateAI Demo.ipynb`).
+
+1. Create a `.env` file. In the root directory of this repo (I.E., the same directory this readme is in), create a `.env` file. Ensure that the period is present at the start of the filename. Within this file, place the following text:
+```
+OPENAI_API_KEY=""
+GOOGLE_APPLICATION_CREDENTIALS="../../google-api.json"
+HUGGINGFACEHUB_API_TOKEN=""
+HUGGINGFACEHUB_ENDPOINT="https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"
+PRIVATE_AI_API_KEY = ""
+```
 
-<!-- omit in toc -->
-### Day 3:
+2. OpenAI: Create an OpenAI account (or login) and visit the [API Keys](https://platform.openai.com/account/api-keys) page. Generate an API key here and place it in the `.env` file you created above. These examples were designed with GPT-4. If you do not have access to it, please request access through the [waitlist](https://openai.com/waitlist/gpt-4-api). Or alternatively, you can use `gpt-3.5-turbo` instead.
 
-<!-- omit in toc -->
-#### [Guest Speakers] Meta & Qdrant
-* [Kacper Lukawski](https://www.linkedin.com/in/kacperlukawski/), Developer Advocate at Qdrant
-* [Vedanuj Goswami](https://www.linkedin.com/in/vedanuj/), Research Engineer at Meta AI
+3. Google: Follow steps 1 through 4 detailed in this [link](https://cloud.google.com/vertex-ai/docs/start/client-libraries). Once you have downloaded the service account key from step 4, place it in the root directory of this repository and rename it to `google-api.json`.
 
-<!-- omit in toc -->
-### Day 4:
+4. HuggingFace: Create a HuggingFace account (or login) and visit the [Access Tokens](https://huggingface.co/settings/tokens) page in the settings menu. Generate an token (read access is sufficient) and place it in the `.env` file.
 
-<!-- omit in toc -->
-#### [Guest Speaker] LLM Observability with Arize AI
-* [Amber Roberts](https://www.linkedin.com/in/amber-roberts42/), ML Growth Lead at Arize AI
-* [Claire Longo](https://www.linkedin.com/in/claire-longo/), Head of ML Solutions Engineering at Arize AI
+5. PrivateAI: Request an API key through [this form](https://www.private-ai.com/api-key/). Add it to the `.env` file above.
 
+6. You should now have all fields in the `.env` file setup and ready to go! You can now proceed with the installation steps below.
 
 [[Back to top]](#)
 
 ---
-## API Access
-
-### Bootcamp Participants:
+## Setup & Installation
 
-To be able to run the notebooks here, you'll need access to API keys for all the different services. Fear not, we've provided you with all the API keys you need. Just download the files we've sent to you and place them in the root of this directory.
+0. This repository requires you to have installed poetry as a dependency manager. Please follow the instructions to install poetry from [here](https://python-poetry.org/docs/#installation).
 
-### Non-Bootcamp Participants:
+1. Environment management options
+
+a) Poetry: ```poetry shell```
+
+b) Conda: create and activate a conda env for this project:
+```bash
+conda create -n genai-bootcamp python=3.10
+conda activate genai-bootcamp
+```
 
-To be able to run the notebooks here, you'll need access to API keys for all these services. Read on for instructions on how to set up each of the APIs that you need. Many of these APIs (such as OpenAI) will charge you based on usage, so you will need to set up billing.
+2. Install `fiddler-auditor` (we need to install this separately as it sometimes breaks).
+```
+pip install fiddler-auditor==0.0.1
+```
 
-Note that to run the examples, you only need to have one LLM set up. So if you already have access to an OpenAI key, you could run all the notebooks with it (excluding the Google/HuggingFace examples). The PrivateAI API key is used only for the PrivateAI demo (`notebooks/extra_resources/PrivateAI Demo.ipynb`). We use AnyScale to set up LLaMa 2 access.
+3. Install package
+```
+poetry install
+```
 
-1. Create a `.env` file. In the root directory of this repo (I.E., the same directory this readme is in), create a `.env` file. Ensure that the period is present at the start of the filename. Within this file, place the following text:
+4. Check installation worked by running
 ```
-OPENAI_API_KEY=""
-GOOGLE_APPLICATION_CREDENTIALS="../../google-api.json"
-ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
-ANYSCALE_API_KEY = ""
-PRIVATE_AI_API_KEY = ""
-AWS_ACCESS_KEY_ID=""
-AWS_SECRET_ACCESS_KEY=""
-AWS_DEFAULT_REGION="us-east-1"
+pytest .
 ```
 
-1. OpenAI: Create an OpenAI account (or login) and visit the [API Keys](https://platform.openai.com/account/api-keys) page. Generate an API key here and place it in the `.env` file you created above. These examples were designed with GPT-4. If you do not have access to it, please request access through the [waitlist](https://openai.com/waitlist/gpt-4-api). Or alternatively, you can use `gpt-3.5-turbo` instead.
-2. Google: Follow steps 1 through 4 detailed in this [link](https://cloud.google.com/vertex-ai/docs/start/client-libraries). Once you have downloaded the service account key from step 4, place it in the root directory of this repository and rename it to `google-api.json`.
-3. AnyScale: Once you have billing setup, you can get your API keys from the [credentials](https://app.endpoints.anyscale.com/credentials) page.
-4. PrivateAI: Request an API key through [this form](https://www.private-ai.com/api-key/). Add it to the `.env` file above.
-5. AWS Bedrock (for Claude): Follow the instructions on [this page](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey) to obtain your AWS keys. Alternatively, you can authenticate via boto3 if you have AWS Bedrock access within your organization.
-6. You should now have all fields in the `.env` file setup and ready to go! You can now proceed with the installation steps below.
+5. Setup private environment files
 
-[[Back to top]](#)
+Paste the `.env` file and `google-api.json` file provided to you into root directory of this repository.
 
----
+Note: DO NOT COMMIT THIS FILE OR SHARE IT ANYWHERE!
 
-## Repository Info
+[[Back to top]](#)
 
-<!-- omit in toc -->
+## Repository Info
 ### Poetry
 We use [poetry](https://python-poetry.org/) as our dependency manager.
 The link above has great documentation but there is a TL;DR.
 
 - Install the package: `poetry install`
 - Add a dependency: `poetry add <python-lib>`
-- Where are dependencies specified? `pyproject.toml` include the high level requirements. The latests exact versions installed are in `poetry.lock`.
+- Where are dependencies specified? `pyproject.toml` include the high level requirements. The latest exact versions installed are in `poetry.lock`.
 
-<!-- omit in toc -->
 ### Debugging
 - If for some reason `poetry install` fails to install a library try to `pip install <lib>` and then run `poetry install` again. This solves 95% of these errors.
 
@@ -183,4 +153,4 @@ The link above has great documentation but there is a TL;DR.
 * [GenAI Interface Cookiecutter](https://github.com/rodrigo-georgian/genai-interface-cookiecutter): A cookie cutter template for you to start off with a basic UI using streamlit.
 * [Georgian AI Library (GAL)](https://github.com/georgian-io/GAL): Our library containing overviews of AI techniques.
 
-[[Back to top]](#)
+[[Back to top]](#)# t-cot
````
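The updated README has users create a `.env` file in the repository root with the keys `OPENAI_API_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`, `HUGGINGFACEHUB_API_TOKEN`, `HUGGINGFACEHUB_ENDPOINT` and `PRIVATE_AI_API_KEY`, and notes that only one API key is needed to run most examples. The snippet below is not part of this commit; it is a minimal stdlib-only Python sketch for checking which of those keys are filled in, assuming the file contains only simple `KEY="value"` lines. The helpers `parse_env`, `configured_keys` and `report` are hypothetical names, and the notebooks themselves may load the file differently (for example through a dotenv library).

```python
from pathlib import Path

# Keys listed in the June 2023 README's sample .env file.
EXPECTED_KEYS = [
    "OPENAI_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "HUGGINGFACEHUB_API_TOKEN",
    "HUGGINGFACEHUB_ENDPOINT",
    "PRIVATE_AI_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical minimal parser for simple KEY="value" lines.

    Blank lines, '#' comments and lines without '=' are skipped. Spaces
    around '=' are tolerated, since the README writes PRIVATE_AI_API_KEY = "".
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def configured_keys(values: dict[str, str]) -> list[str]:
    """Return the expected keys that are present and non-empty, in README order."""
    return [key for key in EXPECTED_KEYS if values.get(key)]


def report(env_path: Path = Path(".env")) -> str:
    """Summarise which expected keys are set; assumes it runs from the repo root."""
    if not env_path.exists():
        return f"{env_path} not found"
    filled = configured_keys(parse_env(env_path.read_text()))
    return "Configured: " + (", ".join(filled) or "none")


if __name__ == "__main__":
    print(report())
```

The report lists key names only and never echoes the secret values, so its output can be shared when asking for help without leaking credentials.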
