Data Loaders Overview#
Oxen provides a suite of pre-built data loaders for a variety of common machine learning tasks. These loaders make it easy to extract data from local or remote Oxen repositories and convert it into a format that’s ready to use with your favorite machine learning framework.
Using Loaders with Local Resositories#
For the below repo structure…
MyRepo
--- images/
------ image1.jpg
------ ...
--- labels.txt
--- train.csv
--- test.csv
…transform and load data with the following:
import oxen
repo = oxen.LocalRepo("MyRepo")
train_loader = oxen.loaders.ImageClassificationLoader(
imagery_root_dir = f"{repo.path}",
labels_file = f"{repo.path}/labels.txt",
df_file = f"{repo.path}/train.csv"
)
test_loader = oxen.loaders.ImageClassificationLoader(
imagery_root_dir = f"{repo.path}",
labels_file = f"{repo.path}/labels.txt",
df_file = f"{repo.path}/test.csv"
)
X_train, y_train, label_mapper = train_loader.run()
X_test, y_test, _ = test_loader.run()
# X_train: (50000 x 32 x 32 x 3)
# y_train: (50000,)
# mapper: {"cat": 0, "dog": 1, etc...}
Defining Custom Loaders#
Oxen loaders are constructed as a Directed Acyclic Graph (DAG) of data operations. Each node in the graph inherits oxen.Op
and defines a call()
method. The call()
method is responsible for executing the operation and returning the output.
These nodes are linked together to form a graph with specified inputs and outputs. The graph is then executed with a run() method, returning the outputs.
Example: Creating a Image Classification Loader#
class ImageClassificationLoader:
def __init__(self, imagery_root_dir, label_file, df_file, path_name, label_name):
# DEFINE INPUT NODES
data_frame = ReadDF(input=df_file)
label_list = ReadText(input=label_file)
path_name = Identity(input=path_name)
label_name = Identity(input=label_name)
imagery_root_dir = Identity(input=imagery_root_dir)
# DEFINE INTERMEDIATE NODES
paths = ExtractCol()(data_frame, path_name)
label_text = ExtractCol()(data_frame, label_name)
# DEFINE OUTPUT NODES
images = ReadImageDir()(imagery_root_dir, paths)
label_map = CreateLabelMap()(label_list, label_text)
labels = EncodeLabels()(label_text, label_map)
# Create and compile the graph
self.graph = DAG(outputs=[images, labels, label_map])
def run():
return self.graph.evaluate()
Image Classification Loading#
To use the loader we defined above
from oxen import LocalRepo
from oxen.loaders import ImageClassificationLoader
repo = LocalRepo()
# Demo data for supervised image classification
repo.clone("https://hub.oxen.ai/ba/dataloader-images")
loader = ImageClassificationLoader(
imagery_root_dir = repo.path,
label_file = f"{repo.path}/annotations/labels.txt",
df_file = f"{repo.path}/annotations/train.csv",
path_name = "file",
label_name = "hair_color"
)
X_train, y_train, mapper = loader.run()
Regression Loader#
from oxen import LocalRepo
from oxen.loaders import RegressionLoader
repo = LocalRepo()
# Demo data for supervised image classification
repo.clone("https://hub.oxen.ai/ba/dataloader-regression")
loader = RegressionLoader(
data_file = f"{repo.path}/prices.csv",
pred_name = "price",
f_names = ["sqft", "num_bed", "num_bath"]
)
X, y = loader.run()
Chat Loader#
from oxen import LocalRepo
from oxen.loaders import ChatLoader
repo = LocalRepo()
# Demo data for supervised image classification
repo.clone("https://hub.oxen.ai/ba/dataloader-chat")
loader = ChatLoader(
prompt_file = f"{repo.path}/prompt.txt",
data_file = f"{repo.path}/examples.tsv",
)
[chat_df] = loader.run()