Contribution Guide

Welcome to the OpenDataArena community! This guide will help you understand how to contribute to OpenDataArena, whether by uploading datasets, submitting evaluation results, or providing improvement suggestions.

To upload your dataset, please follow these steps:

  1. Prepare Your Dataset

    Your dataset should be uploaded to Hugging Face, ideally with the following columns:

    • instruction: A clear and concise instruction for each data point. If your original data contains an input key (common in formats like Alpaca), you must concatenate the input value with the instruction value, using a \n as a separator.
    • output: The expected output for the instruction.
    • id (optional): A unique ID for each sample.
    • Q_scores (optional): Columns for various normalized scores about Q (instruction), including Clarity, Coherence, Completeness, Complexity, Correctness, Meaningfulness, Difficulty, Deita_Complexity, Thinking_Prob.
    • QA_scores (optional): Columns for various normalized scores about QA (instruction + output), including Clarity, Coherence, Completeness, Complexity, Correctness, Meaningfulness, Relevance, IFD, Deita_Quality, Reward_Model, Fail_Rate, A_Length.

    Q_scores and QA_scores can be efficiently calculated using our OpenDataArena-Tool Data Scorer. More details about each score can be found in OpenDataArena-Tool Data Scorer Documentation. An example can be found in example_upload.jsonl for your reference.
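The concatenation rule for Alpaca-style data can be sketched as follows. This is a minimal illustration, not part of the OpenDataArena tooling: the helper name `alpaca_to_oda` is hypothetical, and placing the instruction before the input is an assumption, since the guide only specifies the `\n` separator.

```python
def alpaca_to_oda(record):
    """Convert one Alpaca-style record to the instruction/output columns.

    Hypothetical helper for illustration only.
    """
    instruction = record["instruction"]
    extra_input = record.get("input", "")
    # Assumption: the instruction comes first, then the input value,
    # joined by a single newline as the guide requires.
    if extra_input:
        instruction = instruction + "\n" + extra_input
    return {"instruction": instruction, "output": record["output"]}


example = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
converted = alpaca_to_oda(example)
print(converted)
```

Records with an empty or missing input key keep their instruction unchanged, so the same function can be applied to a whole dataset.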

  2. Upload Your Dataset

    You will upload your dataset directly to the Hugging Face Hub. For a detailed, step-by-step guide on the upload process, please refer to the official Hugging Face documentation:

    https://huggingface.co/docs/hub/datasets-adding

    A simple guide for uploading a JSONL file is provided below.

    First, install the required packages and log in to Hugging Face:

    Bash Commands

    pip install huggingface_hub datasets
    huggingface-cli login  # log in to your Hugging Face account
    python upload_your_ds.py  # run the upload script (the Python code below)

    Python Code

    from datasets import load_dataset

    # --- Configuration ---
    # Your Hugging Face username or organization name and desired dataset name.
    # Replace 'your-username' with your actual username or organization.
    repo_id = "your-username/my-awesome-viewable-dataset"

    # The path to your local JSONL file.
    local_file_path = "my_dataset.jsonl"

    # --- Load the local JSONL file into a Dataset ---
    dataset = load_dataset('json', data_files=local_file_path, split='train')
    print(f"Dataset loaded from {local_file_path}:")
    print(dataset)
    print(dataset[0])

    # --- Push the dataset to Hugging Face Hub ---
    print(f"Attempting to push dataset to {repo_id}...")
    dataset.push_to_hub(repo_id)
    print(f"Successfully pushed dataset to '{repo_id}'")
    print(f"You can view your dataset here: https://huggingface.co/datasets/{repo_id}")

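Before pushing, it may help to check locally that every record carries the expected columns. The sketch below is only an illustration: `check_records` and `load_jsonl` are hypothetical helpers, and requiring `instruction` and `output` to be non-empty strings is an assumption based on the column descriptions in step 1, not a rule stated by OpenDataArena.

```python
import json

# Assumption: these two columns are the ones every record should have.
REQUIRED_COLUMNS = ("instruction", "output")


def check_records(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for col in REQUIRED_COLUMNS:
            value = rec.get(col)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {i}: missing or empty '{col}'")
        # The id column is optional, but when present it should be unique.
        if "id" in rec:
            if rec["id"] in seen_ids:
                problems.append(f"row {i}: duplicate id {rec['id']!r}")
            seen_ids.add(rec["id"])
    return problems


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample = [
    {"id": 1, "instruction": "Add 2 and 3.", "output": "5"},
    {"id": 1, "instruction": "", "output": "n/a"},
]
problems = check_records(sample)
for p in problems:
    print(p)
```

For a real file, the same check could be run as `check_records(load_jsonl("my_dataset.jsonl"))`; an empty list means no problems were detected by these particular checks.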
  3. Submit Your Dataset Information

    Submit the following information:

    • Dataset URL: https://huggingface.co/datasets/your-username/your-dataset
    • Email: your.email@example.com
    • Username: your preferred username

Need Help?

If you encounter any issues during the contribution process or need more detailed guidance, please feel free to contact us.